Dan Katzel edited this page Jul 15, 2017 · 1 revision

Range

Range is an immutable object representing a pair of coordinates that describes a contiguous subset of values. Ranges are used throughout the code base to represent trim points, subsequence ranges, alignment coordinates, and read or contig locations in a scaffold.

Coordinate Systems

Different users and file formats use different coordinate systems to describe genomic locations, so the Range class has methods that convert coordinates from one coordinate system to another.

These are the current coordinate systems the Range class supports:

0-based

This is exactly like array offsets in most programming languages. The coordinates begin at 0 and the last element in the range has an offset of length - 1.

         coordinate system    0  1  2  3  4  5
                            --|--|--|--|--|--|
         range elements       0  1  2  3  4  5

1-based (also known as residue-based)

This is the most common coordinate system used in human-readable genomic data. The first element has a position of 1 and the last element in the range has a position of length.

        coordinate system    1  2  3  4  5  6
                           --|--|--|--|--|--|
        range elements       0  1  2  3  4  5

Space-based

This coordinate system counts the "spaces" between elements. A range begins at the space before its first element (coordinate 0) and ends at the space after its last element (coordinate length). This coordinate system is used in the Celera Assembler ASM format.

        coordinate system   0  1  2  3  4  5  6
                           --|--|--|--|--|--|--
        range elements       0  1  2  3  4  5

Creating Range Objects

Ranges can only be built using the static factory methods (there are no public constructors).

//unless specified, the coordinate system defaults to 0-based.
Range range = Range.of(0,10);
 
assertEquals(0, range.getBegin());
assertEquals(10, range.getEnd());
 
//factory method with an explicit coordinate system
//(residue-based 1..11 covers the same elements as 0-based 0..10)
Range residueRange = Range.of(RESIDUE_BASED,1, 11);

Converting Ranges To Different Coordinate Systems

The getBegin(CoordinateSystem) and getEnd(CoordinateSystem) methods return the begin and end coordinates of a Range in the given coordinate system.

//defaults to 0-based range
Range range = Range.of(0,10);
 
assertEquals(0, range.getBegin(ZERO_BASED));
assertEquals(10, range.getEnd(ZERO_BASED));
 
assertEquals(0, range.getBegin(SPACED_BASED));
assertEquals(11, range.getEnd(SPACED_BASED));
 
 
assertEquals(1, range.getBegin(RESIDUE_BASED));
assertEquals(11, range.getEnd(RESIDUE_BASED));
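
The arithmetic behind these conversions is small enough to sketch directly. The enum below is a hypothetical stand-in written for illustration, not the library's own CoordinateSystem type; its name, fields, and methods are assumptions. It only encodes the offsets implied by the assertions above: relative to 0-based coordinates, residue-based shifts both begin and end by 1, while spaced-based shifts only the end by 1.

```java
/** Hypothetical illustration only; this is not the library's CoordinateSystem enum. */
enum CoordSketch {
    ZERO_BASED(0, 0),
    RESIDUE_BASED(1, 1),
    SPACED_BASED(0, 1);

    // Amount added to a 0-based begin/end to express it in this system.
    private final long beginShift;
    private final long endShift;

    CoordSketch(long beginShift, long endShift) {
        this.beginShift = beginShift;
        this.endShift = endShift;
    }

    /** Express a 0-based begin coordinate in this coordinate system. */
    long beginFromZeroBased(long zeroBasedBegin) {
        return zeroBasedBegin + beginShift;
    }

    /** Express a 0-based (inclusive) end coordinate in this coordinate system. */
    long endFromZeroBased(long zeroBasedEnd) {
        return zeroBasedEnd + endShift;
    }

    /** Convert a begin coordinate in this system back to 0-based. */
    long beginToZeroBased(long localBegin) {
        return localBegin - beginShift;
    }

    /** Convert an end coordinate in this system back to 0-based. */
    long endToZeroBased(long localEnd) {
        return localEnd - endShift;
    }
}
```

Converting between any two systems can then go through 0-based as a common intermediate, which would let a range keep a single internal representation.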

Intersecting Ranges

Testing if Two Ranges Intersect at All

Range range = Range.of(0,10);
Range otherRange = Range.of(5, 15);
 
assertTrue(range.intersects(otherRange));
 
Range differentRange = Range.of(15,25);
assertFalse(range.intersects(differentRange));

Creating a new Range that is the Intersection of Two Ranges

Range range = Range.of(0,10);
Range target = Range.of(5,15);
assertEquals(Range.of(5, 10), range.intersection(target));
 
//if the ranges don't intersect, an empty range is returned
Range disjointTarget = Range.of(15,25);
assertTrue(range.intersection(disjointTarget).isEmpty());
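
For readers curious about the underlying logic, here is a minimal sketch of interval intersection on inclusive 0-based coordinates. IntervalSketch is a hypothetical class invented for this example; it is not how Range is necessarily implemented, and representing the empty result as an interval whose end is before its begin is an assumption of the sketch.

```java
/** Hypothetical closed interval [begin, end]; not the library's Range class. */
final class IntervalSketch {
    final long begin;
    final long end; // inclusive

    IntervalSketch(long begin, long end) {
        this.begin = begin;
        this.end = end;
    }

    /** In this sketch, an interval whose end precedes its begin is empty. */
    boolean isEmpty() {
        return end < begin;
    }

    /** Two non-empty intervals intersect when each begins no later than the other ends. */
    boolean intersects(IntervalSketch other) {
        return !isEmpty() && !other.isEmpty()
                && begin <= other.end && other.begin <= end;
    }

    /** The shared part: the later begin through the earlier end, or empty if disjoint. */
    IntervalSketch intersection(IntervalSketch other) {
        if (!intersects(other)) {
            return new IntervalSketch(0, -1); // empty interval
        }
        return new IntervalSketch(Math.max(begin, other.begin),
                                  Math.min(end, other.end));
    }
}
```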

Merging Ranges

Ranges that overlap can be merged into one Range that encompasses both input Ranges. A helper class called Ranges has static methods for working with collections of Ranges, including methods to merge Ranges that are close to each other.

assertEquals(Arrays.asList(Range.of(0, 20)),
             Ranges.merge(
                     Range.of(0,10),
                     Range.of(5,20)));
 
//The merge() method will merge as many Ranges as it can and return a List of new Ranges that are guaranteed to not overlap.
List<Range> merged = Ranges.merge(
                     Range.of(0,10),
                     Range.of(5,20),
                     Range.of(22,30));
 
assertEquals(Arrays.asList(Range.of(0,20), Range.of(22,30)),
             merged);
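
One common way to merge intervals is to sort them by begin coordinate and sweep once, extending the most recent result whenever the next interval overlaps it. The sketch below illustrates that approach with plain long[] pairs. MergeSketch is a hypothetical class, not the Ranges helper, and it merges only intervals that actually overlap; how the library treats adjacent or nearby Ranges is not covered here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sort-and-sweep merge; not the library's Ranges.merge(). */
final class MergeSketch {

    /** Each interval is a long[]{begin, end} with an inclusive end. */
    static List<long[]> merge(List<long[]> intervals) {
        // Sort a copy by begin coordinate so the caller's list is untouched.
        List<long[]> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));

        List<long[]> merged = new ArrayList<>();
        for (long[] next : sorted) {
            long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && next[0] <= last[1]) {
                // Overlaps the previous result: extend its end if needed.
                last[1] = Math.max(last[1], next[1]);
            } else {
                // No overlap: start a new merged interval (copied, not aliased).
                merged.add(new long[] {next[0], next[1]});
            }
        }
        return merged;
    }
}
```

With the inputs 0..10, 5..20, and 22..30, the sweep joins the first two into 0..20 and leaves 22..30 on its own because it begins after 20.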
