Home

Jillion

Jillion, the Java Informatics Large Library for Genomics, is an open source Java library for genomics and bioinformatics. It was created by a single software engineer at the J. Craig Venter Institute (JCVI) and has been used by several projects, including the Influenza Genome Project, the Leptospira Genome Project and the Human Microbiome Project, as well as in over 20,000 viral whole and draft genome submissions to GenBank.

In September 2015, the Viral Pathogen Database and Analysis Resource (ViPR) added a new Rotavirus Genotype Detection Tool written using Jillion; the tool is over an order of magnitude faster than other similar web tools.

How is Jillion Different Than BioJava and Picard?

BioJava and Picard are other Java libraries for bioinformatics that are similar to Jillion. Each of these libraries supports some common bioinformatics read formats such as FASTA and FASTQ, but there the similarities end. BioJava focuses mainly on input reads and genome annotation, whereas Jillion focuses on genome assembly. Picard focuses mainly on SAM alignment data. Jillion supports input reads and alignments, and it also has object representations of contigs as well as parsers and writers for many common assembly file formats such as SAM/BAM and Consed's ACE format.

Sequence Support

Like BioJava, Jillion can handle various read input formats such as fasta, fastq, and scf encoded files, but Jillion can also natively handle other formats such as sff, ztr and abi chromatograms. Sequence objects have different implementations depending on the use case and type of data. For example, a NucleotideSequence object that contains only the nucleotides A, C, G and T could store each nucleotide in 2 bits. A different implementation that stores each nucleotide in 4 bits would be used if the sequence contained ambiguous bases. Since quality sequences often have consecutive quality scores of the same value, a run-length implementation can compactly store read qualities, or even contig consensus qualities, in only a few bytes.
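The two compact encodings described above can be illustrated with a minimal sketch in plain Java. This is not Jillion's actual implementation or API: the class name `CompactSequenceSketch` and its methods are hypothetical, and the bit layout (four bases per byte, lowest bits first) is an assumption chosen for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Jillion API.
final class CompactSequenceSketch {

    // The index of each base in this string is its 2-bit code: A=0, C=1, G=2, T=3.
    private static final String BASES = "ACGT";

    /** Packs a sequence containing only A, C, G and T into 2 bits per base. */
    static byte[] pack2Bit(String bases) {
        byte[] packed = new byte[(bases.length() + 3) / 4];
        for (int i = 0; i < bases.length(); i++) {
            int code = BASES.indexOf(bases.charAt(i));
            if (code < 0) {
                // Ambiguous bases would need a wider (for example 4-bit) encoding.
                throw new IllegalArgumentException("not A, C, G or T: " + bases.charAt(i));
            }
            packed[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
        return packed;
    }

    /** Restores the first {@code length} bases from a 2-bit packed array. */
    static String unpack2Bit(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (packed[i / 4] >> ((i % 4) * 2)) & 0x3;
            sb.append(BASES.charAt(code));
        }
        return sb.toString();
    }

    /** Run-length encodes quality scores as {value, runLength} pairs. */
    static List<int[]> runLengthEncode(int[] qualities) {
        List<int[]> runs = new ArrayList<>();
        for (int q : qualities) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == q) {
                last[1]++;                      // extend the current run
            } else {
                runs.add(new int[] {q, 1});     // start a new run
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        byte[] packed = pack2Bit("ACGTACGTA");
        System.out.println(packed.length + " bytes for 9 bases");
        System.out.println(unpack2Bit(packed, 9));
        for (int[] run : runLengthEncode(new int[] {40, 40, 40, 30, 30, 40})) {
            System.out.println("quality " + run[0] + " x " + run[1]);
        }
    }
}
```

The 2-bit form only works when every base is unambiguous, which is why a wider encoding is needed as soon as ambiguity codes appear. The run-length form pays off only when long runs of identical scores are common, as they tend to be in consensus qualities.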

Read and write support for Sanger chromatogram formats, compared across BioJava, Picard and Jillion:

Format	Version	BioJava Read	BioJava Write	Picard Read	Picard Write	Jillion Read	Jillion Write
Abi		✓		X		✓
Ztr	1.2	X	X	X	X	✓	✓
Scf	2	✓	X	X	X	✓	✓
Scf	3	✓	X	X	X	✓	✓

All three libraries handle fasta and fastq files to some degree, but only Jillion supports sff files. Jillion has been tested on sff files produced by 454 and Ion Torrent:

Format	Encoding	BioJava Read	BioJava Write	Picard Read	Picard Write	Jillion Read	Jillion Write
Fasta	nucleotide	✓	✓	✓	X	✓	✓
Fasta	protein	✓	✓	X	X	✓	✓
Fasta	qualities	X	X	X	X	✓	✓
Fasta	positions	X	X	X	X	✓	✓
Fasta index (fai)	nucleotide	X	X	✓	✓	✓	✓
Fasta index (fai)	protein	X	X	✓	✓	✓	✓
Fastq	sanger/solexa/illumina	✓	✓	✓	Sanger only	✓	✓
sff		X	X	X	X	✓	✓
bfa (MAQ binary fasta)		X	X	✓	✓	✓	✓
bfq (MAQ binary fastq)		X	X	✓	✓	✓	✓

Assembly Support

Jillion has objects that represent contigs produced by several assembler programs that are used internally by JCVI, including SAM and BAM alignment files, Phrap/Consed .ace files, Celera Assembler .asm files and CLC Bio Assembly Cell .cas files, among others. Each contig object holds not only the contig consensus sequence but also all the underlying read information. Coupled with support for all the various read formats, this makes it possible to analyze, edit and write out new assembly files. Even though all the underlying read data is stored for each contig, memory usage is kept low. A nucleotide sequence object for a read that has been assembled into a contig can be encoded to store only a pointer to the contig consensus sequence, the read's start offset into the consensus, and any positions where the aligned read differs from the consensus. This greatly reduces the memory needed to store the underlying contig data, since most reads in an assembly have high identity to the consensus sequence and therefore few differences.
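The reference-based encoding described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Jillion's actual classes: `ReferenceEncodedReadSketch` and its methods are hypothetical names, and the sketch assumes the read and consensus are already in gapped alignment coordinates, so each read position maps to exactly one consensus position.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the Jillion API.
final class ReferenceEncodedReadSketch {

    private final String consensus;   // shared reference, not copied per read
    private final int startOffset;    // where the read starts in the consensus
    private final int length;         // aligned length of the read
    // Read-relative offset -> read base, only where it differs from the consensus.
    private final TreeMap<Integer, Character> differences;

    private ReferenceEncodedReadSketch(String consensus, int startOffset, int length,
                                       TreeMap<Integer, Character> differences) {
        this.consensus = consensus;
        this.startOffset = startOffset;
        this.length = length;
        this.differences = differences;
    }

    /** Encodes an aligned read as its offset plus its differences from the consensus. */
    static ReferenceEncodedReadSketch encode(String consensus, int startOffset,
                                             String alignedRead) {
        if (startOffset < 0 || startOffset + alignedRead.length() > consensus.length()) {
            throw new IllegalArgumentException("read does not fit inside the consensus");
        }
        TreeMap<Integer, Character> diffs = new TreeMap<>();
        for (int i = 0; i < alignedRead.length(); i++) {
            char readBase = alignedRead.charAt(i);
            if (readBase != consensus.charAt(startOffset + i)) {
                diffs.put(i, readBase);
            }
        }
        return new ReferenceEncodedReadSketch(consensus, startOffset,
                alignedRead.length(), diffs);
    }

    /** Rebuilds the full read by copying the consensus slice and applying differences. */
    String decode() {
        StringBuilder sb = new StringBuilder(
                consensus.substring(startOffset, startOffset + length));
        for (Map.Entry<Integer, Character> e : differences.entrySet()) {
            sb.setCharAt(e.getKey(), e.getValue());
        }
        return sb.toString();
    }

    int differenceCount() {
        return differences.size();
    }

    public static void main(String[] args) {
        String consensus = "ACGTACGTAC";
        ReferenceEncodedReadSketch read = encode(consensus, 2, "GTTCGT");
        System.out.println(read.decode() + " stored with "
                + read.differenceCount() + " difference(s)");
    }
}
```

With this layout, a read that matches the consensus exactly costs only a reference, two integers and an empty map, regardless of its length. The cost grows with the number of mismatches instead of the number of bases.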

Unlike BioJava and Picard, Jillion can read and write several different assembly output formats. The Jillion contig objects include the consensus sequence as well as all the underlying sequence read data.

Format	BioJava Read	BioJava Write	Picard Read	Picard Write	Jillion Read	Jillion Write
sam	X	X	✓	✓	✓	✓
bam	X	X	✓	✓	✓	✓
Phrap/Consed .ace	X	X	X	X	✓	✓
Celera .asm	X		X		✓
CLC Bio .cas	X		X		✓
TIGR .contig	X	X	X	X	✓	✓
TIGR .tasm	X	X	X	X	✓	✓

Funding

This work has been funded in whole or in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract numbers HHSN272200900007C and U19AI110819.
