Skip to content

Conversation

ikelos
Copy link
Member

@ikelos ikelos commented Nov 30, 2024

Scanning had been rigged to chop up a layer's readable bits into the chunks that make it up from the lower layer. Unfortunately it did this on chunks in the lower layer, rather than chunks readable in the higher layer. This meant chunks in Intel were already read in page sized blocks. This vastly simplifies the reading of data and brings non-linear layer searching in sync with linear layer searching (because there's no good reason they should be different).

This doesn't directly help with the yarascanning issue that spawned its investigation, but it will vastly simplify things. The big question is whether it speeds things up or slows them down...

Since this is core functionality I'd like all eyes on it before it lands please...

Scanning had been rigged to chop up a layer's readable bits into the
chunks that make it up from the lower layer.  Unfortunately it did this
on chunks in the lower layer, rather than chunks readable in the higher
layer.  This meant chunks in Intel were already read in page sized
blocks.  This vastly simplifies the reading of data and brings
non-linear layer searching in sync with linear layer searching (because
there's no good reason they should be different).
@ikelos ikelos force-pushed the feature/scan-one-layer-deep branch from df7b8a4 to 90c048f Compare November 30, 2024 14:43
Copy link
Contributor

@gcmoreira gcmoreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a closer look later, but from a quick glance, it seems like this new implementation will always yield a list with just one tuple, which doesn't seem ideal. Is there any chance we can improve that as well? Also, I think these changes will also require updates to the docstring.

@ikelos
Copy link
Member Author

ikelos commented Dec 1, 2024

Probably it will, but I think that's ok. It allows for the gathering of data from disparate places, which might be useful one day for a different type of layer. I think that going back and ripping out how it used to work may not provide much benefit, and I'd rather do it when we do major renovations on the scanning system instead. This might not be a very useful PR, everything works as is, I don't think it provides significant speed-up (particularly since it'll need to do re-lookups again) but it's also not detrimental and should make the scanning easier to understand (even though it took me 3 attempts to get it all rigth). Happy to leave this sit here, but a useful exercise in case we ever hit similar situations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants