bam diff seems to return the reads that look identical between A and B #59

@hisplan

Hi,

I have two BAM files that I'd like to compare. Each is about 5.6GB. I expect them to be identical (I'm sort of doing a reproducibility test).

I ran the following command:

bam diff --in1 a.bam --in2 b.bam --all --onlyDiffs --recPoolSize -1 --out c.bam

It generated three files:

-rw-r--r-- 1    2373574 Jun 13 15:12 c.bam
-rw-r--r-- 1    1803478 Jun 13 15:12 c_only1_a.bam
-rw-r--r-- 1    1803105 Jun 13 15:12 c_only2_b.bam

I tried to find what actually differs between the two, but the records look identical to me. My suspicion is that it has something to do with the multi-mapped reads. Do you have any idea how to resolve this?

$ samtools view c.bam | head -n1
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792	0	1	629088	1	46M1I44M	*	0	0	GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT	,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF	AS:i:78	HI:i:4	NH:i:4	nM:i:3	ZC:Z:42M1I48M	ZT:Z:AS:i:78;HI:i:3;NH:i:4;nM:i:3
$ samtools view a.bam | grep -F ":TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792"
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792	0	1	629088	1	42M1I48M	*	0	0	GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT	,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF	NH:i:4	HI:i:3	AS:i:78	nM:i:3
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792	0	1	629088	1	46M1I44M	*	0	0	GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT	,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF	NH:i:4	HI:i:4	AS:i:78	nM:i:3
$ samtools view b.bam | grep -F ":TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792"
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792	0	1	629088	1	42M1I48M	*	0	0	GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT	,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF	NH:i:4	HI:i:3	AS:i:78	nM:i:3
:TAGATCGCAAATGGTA:CTTATCAAGCGC:;A00228:279:HFWFVDMXX:1:1229:8477:28792	0	1	629088	1	46M1I44M	*	0	0	GTCCGAACTAGTATCAGGCTTCAAAATCGAATACGCCGCAGGCCCCCTTCGCCCTATTCTTCATAGCAGAATACACAAACATTATTATAAT	,:FF,FFFFFF:,,FF:FFFFFFF,F::FFF,FF,FF:FFFF:FFF,:F:F,FFFFFF,F,FF:FFF,FFF:FFFFFFFFFFFFF:FFFFF	NH:i:4	HI:i:4	AS:i:78	nM:i:3
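One way to test the multi-mapped suspicion independently of bam diff is to compare the records as multisets, keyed on every field rather than on read name and position alone. Below is a minimal sketch, assuming the SAM text from `samtools view` is available as lines of text. `normalize` and `diff_records` are hypothetical helper names, not part of bamUtil or samtools, and the two toy records only mimic the CIGAR and HI values shown above.

```python
from collections import Counter


def normalize(sam_line):
    """Build a hashable key: the 11 mandatory SAM fields plus sorted optional tags."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, tags = fields[:11], fields[11:]
    # Sorting the tags makes the comparison independent of tag order.
    return tuple(mandatory) + tuple(sorted(tags))


def diff_records(lines_a, lines_b):
    """Compare two iterables of SAM lines as multisets of normalized records."""
    def count(lines):
        # Skip blank lines and header lines (headers start with "@").
        return Counter(
            normalize(line) for line in lines
            if line.strip() and not line.startswith("@")
        )

    count_a, count_b = count(lines_a), count(lines_b)
    # Counter subtraction keeps only records with a positive surplus.
    return count_a - count_b, count_b - count_a


# Toy records (assumption: shortened stand-ins for the two alignments above).
a = [
    "r1\t0\t1\t629088\t1\t42M1I48M\t*\t0\t0\t*\t*\tNH:i:4\tHI:i:3",
    "r1\t0\t1\t629088\t1\t46M1I44M\t*\t0\t0\t*\t*\tNH:i:4\tHI:i:4",
]
# Same records in a different order, with one record's tags reordered.
b = [a[1].replace("NH:i:4\tHI:i:4", "HI:i:4\tNH:i:4"), a[0]]

only_a, only_b = diff_records(a, b)
print(sum(only_a.values()), sum(only_b.values()))  # prints: 0 0
```

If both results come back empty on the real data, the two files hold the same set of records, and the reported differences would then come from how records sharing a read name and position are paired up. Note that this sketch keeps every record in memory, which may be impractical for 5.6 GB inputs unless the comparison is restricted to one region or to the reads listed in c.bam.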
