feat: add pyperscan support #5228

Open
wants to merge 1 commit into
base: main

ffontaine
Contributor

hyperscan runs all version checkers on a file simultaneously, which reduces processing time.

The pyperscan package is used instead of the better-known hyperscan package because pyperscan allows a tag to be added to each pattern. This feature makes it easy to retrieve the checker associated with the matched pattern.
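
To illustrate why per-pattern tags matter, here is a minimal sketch using only Python's standard `re` module. It is not the pyperscan API and not the code in this PR: the checker names, the patterns, and the `scan` helper are hypothetical, and `re` alternation merely stands in for a multi-pattern engine. Each pattern is wrapped in a named group that acts as its tag, so a single pass over the data reports which checker matched.

```python
import re

# Hypothetical checkers: the key is the tag, the value is the version pattern.
TAGGED_PATTERNS = {
    "curl": rb"curl/(?P<curl_version>[0-9]+\.[0-9]+\.[0-9]+)",
    "zlib": rb"deflate (?P<zlib_version>[0-9]+\.[0-9]+\.[0-9]+) Copyright",
}

# Wrap every pattern in a group named after its tag and join them,
# so all patterns are tried in one pass over the input.
COMBINED = re.compile(
    b"|".join(
        b"(?P<%s>%s)" % (tag.encode(), pattern)
        for tag, pattern in TAGGED_PATTERNS.items()
    )
)


def scan(data: bytes) -> list:
    """Return (tag, version) pairs for every match found in data."""
    found = []
    for match in COMBINED.finditer(data):
        # The tag is the name of the outer group that took part in the match.
        tag = next(t for t in TAGGED_PATTERNS if match.group(t) is not None)
        version = match.group(f"{tag}_version").decode()
        found.append((tag, version))
    return found
```

Without a tag, the caller would have to re-run each checker's pattern on the matched text to find out which one fired; with it, the lookup is direct.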

On my local machine, scanning an embedded firmware image takes 220 seconds with pyperscan instead of 326 seconds.

However, pyperscan is slower on a single file and is unsupported on Windows, so this PR adds a --pyperscan option (disabled by default).

Fix #2485

Contributor

@anthonyharrison anthonyharrison left a comment

Is the database built each time cve-bin-tool is run? If not, where is the database stored? Should there be an option to rebuild the database?

Need extra tests to support the new CLI option

Contributor

@alex-ter alex-ter left a comment

A few minor docs-related comments with suggested fixes.

doc/MANUAL.md Outdated
@@ -887,6 +889,17 @@ This option allows one to skip (disable) a comma-separated list of checkers and

This option allows one to enable a comma-separated list of checkers.

### --pyperscan

The pyperscan flag enables pyperscan support in the CVE Bin Tool. [pyperscan](https://github.com/vlaci/pyperscan) is an opinionated Python binding for [Hyperscan](https://www.hyperscan.io) focusing on ease of use and safety.
Contributor

I've checked a bit more, and it looks like pyperscan uses the Vectorscan fork by default, not Hyperscan (see vlaci/pyperscan#35 and, e.g., the build container configs). You might want to change that, too.

Contributor Author

Good catch, I added a note to the README file with this information.

@ffontaine
Contributor Author

> Is the database built each time cve-bin-tool is run? If not, where is the database stored? Should there be an option to rebuild the database?

Yes, the hyperscan database is built every time cve-bin-tool is run and is not stored anywhere.
In fact, the hyperscan database is handled like Python regular expressions, which are also compiled on the fly:

cls.VERSION_PATTERNS = list(map(re.compile, cls.VERSION_PATTERNS))

As it is not saved anywhere, it doesn't make sense to add an option to rebuild it.
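
For readers unfamiliar with this compile-on-the-fly approach, here is a minimal sketch of how the line above can work at class-definition time. The `Checker` base class, the `__init_subclass__` hook, and `ExampleChecker` are assumptions made for illustration and are not taken from the cve-bin-tool code base; only the `list(map(re.compile, ...))` line comes from the comment above.

```python
import re


class Checker:
    """Hypothetical base class: subclasses list raw pattern strings."""

    VERSION_PATTERNS: list = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Compile the raw strings once, when the subclass is defined.
        # Nothing is written to disk, so every run recompiles them.
        cls.VERSION_PATTERNS = list(map(re.compile, cls.VERSION_PATTERNS))


class ExampleChecker(Checker):
    # Hypothetical pattern, for illustration only.
    VERSION_PATTERNS = [r"example-([0-9]+\.[0-9]+)"]
```

Because the compiled objects live only in memory for the duration of the process, there is no cached artifact that could become stale.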

> Need extra tests to support the new CLI option

I added a simple test; tell me if more are needed.
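
As a rough idea of what a test for an opt-in flag can look like, here is a sketch built on a stand-in `argparse` parser. It is not the test added in this PR: `build_parser` and both test functions are hypothetical, and the only detail taken from the discussion is that `--pyperscan` exists and is disabled by default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in parser exposing only the flag under discussion."""
    parser = argparse.ArgumentParser(prog="cve-bin-tool")
    parser.add_argument(
        "--pyperscan",
        action="store_true",
        default=False,
        help="enable pyperscan support (disabled by default)",
    )
    return parser


def test_pyperscan_disabled_by_default():
    args = build_parser().parse_args([])
    assert args.pyperscan is False


def test_pyperscan_enabled_by_flag():
    args = build_parser().parse_args(["--pyperscan"])
    assert args.pyperscan is True
```

Both functions follow the pytest naming convention, so they would be collected automatically if placed in a test module.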

@ffontaine force-pushed the add-hyperscan branch 2 times, most recently from 8daf170 to cced951 (July 27, 2025 19:16)
hyperscan runs all version checkers on a file simultaneously, which
reduces processing time.

The pyperscan package is used instead of the better-known hyperscan
package because pyperscan allows a tag to be added to each pattern.
This feature makes it easy to retrieve the checker associated with
the matched pattern.

On my local machine, running a scan on an embedded firmware takes 220
seconds with pyperscan instead of 326 seconds.

However, pyperscan is slower on a single file and unsupported on
Windows, so add a --pyperscan option (disabled by default).

Fix intel#2485

Signed-off-by: Fabrice Fontaine <[email protected]>
Successfully merging this pull request may close these issues.

hyperscan for regex matching?
3 participants