
Conversation

@adam-azarchs
Member

Use std::unique_ptr in more places.

Replace c-style arrays with c++ std::array.

Use defaulted constructors/destructors where appropriate.

Make it easier for the compiler to vectorize comparisons in the inner
loop of stitchAlignToTranscript.

Muck around a little with compiler flags. We're linking statically, and this lets the compiler be better at dead-code elimination.
src/lib.rs Outdated
Comment on lines 268 to 274
let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
let rec = &record[0];
(rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
} else {
("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
};
println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
Member Author


Lots of unnecessary memcpy going on here.

Suggested change
let (chr, pos, cigar, cigar_ops) = if record.len() > 0 {
let rec = &record[0];
(rec.tid().to_string(), rec.pos().to_string(), format!("{}", rec.cigar()), rec.cigar().len().to_string())
} else {
("NA".to_string(), "NA".to_string(), "NA".to_string(), "NA".to_string())
};
println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), chr, pos, cigar, cigar_ops);
if record.len() > 0 {
let rec = &record[0];
println!("{:?},{:?},{:?},{},{},{},{}", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len(), rec.tid(), rec.pos(), rec.cigar(), rec.cigar().len())
} else {
println!("{:?},{:?},{:?},NA,NA,NA,NA", std::str::from_utf8(&read).unwrap(), new_now.duration_since(now), record.len())
};

Though I guess it doesn't really matter if this is just temporary benchmarking code.

But, maybe this should be an actual benchmark?

@evolvedmicrobe
Contributor

Hey @adam-azarchs, yeah, agreed it doesn't matter since this is just quick benchmarking code. For context: we're spending a bunch of time in ALIGN_AND_COUNT, and I have zero faith that it couldn't be made a lot faster, so I was taking a look at the STAR code today.

In general, we take about 280 microseconds to align a read, for a throughput of roughly 3.5K reads a second, which means that on a dataset of ~380M reads we spend almost 30 core-hours getting our alignments out. Obviously, in the cloud that's nothing, but I suspect we could do much better for standalone users and consume less energy.

I benchmarked this branch again and, as before, found that although it's better for readability (and I'm inclined to merge it for that alone), it didn't move the needle on speed. It was ~4% slower on average, but (not shown) that difference wasn't statistically significant, so it may have been a little faster or a little slower; either way it wasn't a game changer.

The plot below shows how long it took to align each of the same 10K reads on master and on your branch: there's a lot of reproducible variation between reads, but not much between branches.

There are a lot of heuristics at work in this code, but stepping through it, it looks like we should have much higher throughput than we do, so I'm looking into that a bit more.

[plot: per-read alignment timings for 10K reads, master vs. this branch]

@adam-azarchs
Member Author

I think I know why there may have been a performance regression: the loop vectorizer was having trouble figuring out that comparing two std::array<char, 4> values is just comparing two 32-bit values. I added a template specialization to make that obvious to the compiler.

@adam-azarchs
Member Author

I'm wondering whether #53 will make any difference to what we see here.

@evolvedmicrobe
Contributor

I think at the next cellranger-dev meeting we should discuss getting a benchmark you can iterate with. The CPU utilization of ALIGN_AND_COUNT remains quite low, and it might be best to divide the algorithmic and implementation optimizations among folks.

@adam-azarchs
Member Author

Low utilization means we're I/O bound, and there's definitely low-hanging fruit to be found there. To start with, I think we should give the mmap strategy another try, this time with a less buggy implementation.


3 participants