New lexer #194

shewitt-au · 2025-08-16T15:10:01Z

New lexer

I have been working on a new lexer. I've been in no hurry, just hacking away at it when time permits, and it's now close to ready. It uses an external header-only lexing library (lexertl17), and it's fast. Really fast! I'm sure there are problems, but I haven't found any yet.

ImHex changes here

I've tested it using GitHub's CI tests on the PL, but I couldn't in ImHex: the CI runs there now need approval (they didn't use to).

Implementation details

I've used an external header-only lexing library that uses DFA tables for lexing. In debug builds these tables are built at runtime; in release builds I've set things up to build them at compile time. This isn't strictly necessary, but I figured I may as well. Faster is faster.
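To illustrate the general idea (this is not lexertl's API and not the generated code in this PR), here is a minimal C++17 sketch of a table-driven DFA lexer. The transition table comes from a `constexpr` builder, so the same function can either be evaluated by the compiler (loosely analogous to the release-build setup described above, which appears to rely on pre-build code generation rather than `constexpr`) or called at runtime as in debug builds. The token kinds, states, and the names `buildTable`, `matchLen` and `tokenize` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical token kinds for this sketch only (not ImHex's or lexertl's).
enum class Tok { Ident, Number, Error };

// DFA states: 0 = start, 1 = in identifier, 2 = in number, 3 = dead.
constexpr int kStates = 4;
constexpr int kDead = 3;

constexpr bool isAlpha(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}
constexpr bool isDigit(unsigned char c) { return c >= '0' && c <= '9'; }

// One row per state, one column per input byte.
using Table = std::array<std::array<int, 256>, kStates>;

// Builds the transition table. Because it is constexpr, the compiler can
// evaluate it ahead of time, but it can also be called at runtime.
constexpr Table buildTable() {
    Table t{};
    for (int s = 0; s < kStates; ++s)
        for (int c = 0; c < 256; ++c)
            t[s][c] = kDead;
    for (int c = 0; c < 256; ++c) {
        const auto uc = static_cast<unsigned char>(c);
        if (isAlpha(uc)) { t[0][c] = 1; t[1][c] = 1; }
        if (isDigit(uc)) { t[0][c] = 2; t[1][c] = 1; t[2][c] = 2; }
    }
    return t;
}

// Compile-time table: computed by the compiler and stored in the binary.
constexpr Table kTable = buildTable();
static_assert(kTable[0]['a'] == 1, "table was evaluated at compile time");

// Longest-match scan of one token starting at pos; returns its length
// (0 if no token matches) and writes the token kind to `kind`.
std::size_t matchLen(const Table& t, std::string_view s, std::size_t pos, Tok& kind) {
    int state = 0;
    std::size_t lastAccept = 0;
    kind = Tok::Error;
    for (std::size_t i = pos; i < s.size(); ++i) {
        state = t[state][static_cast<unsigned char>(s[i])];
        if (state == kDead) break;
        lastAccept = i - pos + 1;  // states 1 and 2 are both accepting
        kind = (state == 1) ? Tok::Ident : Tok::Number;
    }
    return lastAccept;
}

// Splits the input into tokens, skipping spaces and newlines.
std::vector<Tok> tokenize(std::string_view s, const Table& t) {
    std::vector<Tok> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (s[pos] == ' ' || s[pos] == '\n') { ++pos; continue; }
        Tok kind;
        const std::size_t n = matchLen(t, s, pos, kind);
        if (n == 0) { out.push_back(Tok::Error); ++pos; continue; }
        out.push_back(kind);
        pos += n;
    }
    return out;
}
```

Scanning is one table lookup per input byte either way; building the table ahead of time only removes the construction cost at startup, it does not change the scanning loop.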

@WerWolv I'd appreciate it if you could check out the pre-build steps I've added for these changes. They seem to work, but I got there with a Google-and-tweak approach.

For me at least (on an older computer), the speed increase is dramatic. I'd be interested in seeing how it performs on other systems.

WerWolv · 2025-08-16T17:12:18Z

Hey, thanks for the PR. Next time though, before starting on a big change like that, especially when adding new dependencies, it would be great if you could join e.g. our Discord server or otherwise have a discussion with me so we can lay out what actually should be done.

Currently I'm not really sure what kind of issues you're solving by rewriting the custom-made lexer. Is it really too slow? As far as I know, a sizable portion of the time is spent in the parser and basically everything else in the evaluator, while the lexer is more or less instantaneous unless you're giving it gigantic files, I guess. So what exactly is the use case? I'm open to improvements, but I'm not sure just rewriting it is the right approach, as quite a lot of thought has been put into the whole pipeline by many people.

shewitt-au · 2025-08-16T17:19:34Z

@WerWolv

I understand your reservations. I would suggest you give it a try. If it improves things as much for you as it does for me (on an older computer), then your reasonable reservations might be reduced. I've never used Discord. Am I too old for Discord? Perhaps I can still learn new tricks.

shewitt-au · 2025-08-16T17:22:50Z

WerWolv

Next time though, before starting on a big change like that, especially when adding new dependencies, it would be great if you could

Yeah, that sounds reasonable. But I haven't had many positive responses to my changes in the past, even when I felt they had merit. I'd already started this, so I figured I may as well finish it.

WerWolv · 2025-08-16T17:31:37Z

Having small features or fixes merged is usually no issue, but for bigger things, not having a discussion first will usually lead to you implementing things that aren't necessarily needed, or doing them in a way that doesn't align with other things. I promise you, talking to me about what you're planning to do will increase the chances of having your changes merged a thousandfold :)
You can join our Discord here if you like, it's ultimately not too much different from IRC: https://discord.gg/X63jZ36xBY

I wasn't trying to say your PR is not needed; I'm just wondering what actual problem it's trying to address. If lexing previously took 2ms and now takes 1ms, that is a 50% improvement that basically nobody will ever notice. That's why I'm wondering why you chose to rewrite it completely in the first place.

shewitt-au · 2025-08-16T17:35:57Z

@WerWolv

I'm just wondering what actual problem

For me, pattern loading and the "analyzing data" task are slow. Not horribly slow, but slow enough that I (initially, when I was new to ImHex) wondered whether it was working or the app had hung. In the case of the "analyzing data" task, the suggested pattern pops up "out of the blue" after I've loaded a file and am doing other things.

As for why the lexer: I started at the bottom. I figured something at an end of the pipeline has only one dependency, whereas something in the middle has two. The bottom it is.

For me, in a release build, the analyzing data task is twice as fast with these changes. I've measured it. And I've only changed the lexer. I can feel the difference. The pop-up that suggests a pattern pops up much faster.
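A hypothetical harness for this kind of measurement (not the one used above): time each candidate many times on the same input and compare medians, which are less sensitive to one-off stalls than a single run. The names `medianMicros` and `speedup` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Times `fn` `runs` times and returns the median wall-clock duration in
// microseconds (the upper median for an even number of runs).
// Returns 0.0 when runs == 0.
double medianMicros(const std::function<void()>& fn, std::size_t runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    if (samples.empty()) return 0.0;
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Speedup of `candidate` relative to `baseline` (2.0 means twice as fast).
// Returns 0.0 if the candidate's median is too small to measure.
double speedup(const std::function<void()>& baseline,
               const std::function<void()>& candidate, std::size_t runs = 21) {
    const double b = medianMicros(baseline, runs);
    const double c = medianMicros(candidate, runs);
    return c > 0.0 ? b / c : 0.0;
}
```

For example, `speedup([&] { oldLex(src); }, [&] { newLex(src); })` would report roughly 2.0 for a twofold improvement, where `oldLex`, `newLex` and `src` are placeholders for the two lexers and a pattern source.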

That said, if others don't get similar speed increases, that is interesting in its own right.

Enough spruiking, I guess :)

shewitt-au · 2025-11-04T04:37:44Z

@WerWolv

I've been using this PR in all my local builds. Here's why:

Old lexer:

old.mp4

New lexer:

new.mp4

When making it, I dumped the output of both lexers for all patterns and compared them. I've been using it for months. The usability difference is anything but negligible; for me it's dramatic. It does have an external dependency (header-only), but there are no additional runtime dependencies.
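The comparison described above is essentially differential testing. A minimal sketch, assuming both lexers can be wrapped to return a comparable token list; the `Token` fields, `compareLexers`, and the two toy whitespace lexers are hypothetical stand-ins, not ImHex's types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token record; a real dump would include whatever fields
// the lexers expose (kind, value, line, column, ...).
struct Token {
    std::string text;
    std::size_t line;
    bool operator==(const Token& o) const { return text == o.text && line == o.line; }
};

using Lexer = std::function<std::vector<Token>(const std::string&)>;

struct Mismatch {
    std::size_t input;  // index of the input that differed
    std::size_t token;  // index of the first differing token
};

// Runs both lexers over every input and reports the first divergence, if any.
std::optional<Mismatch> compareLexers(const Lexer& oldLex, const Lexer& newLex,
                                      const std::vector<std::string>& inputs) {
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        const auto a = oldLex(inputs[i]);
        const auto b = newLex(inputs[i]);
        const std::size_t n = std::min(a.size(), b.size());
        for (std::size_t j = 0; j < n; ++j)
            if (!(a[j] == b[j])) return Mismatch{i, j};
        if (a.size() != b.size()) return Mismatch{i, n};  // one stream is longer
    }
    return std::nullopt;
}

// Toy lexer A: splits on spaces and newlines by hand, tracking line numbers.
std::vector<Token> lexManual(const std::string& s) {
    std::vector<Token> out;
    std::size_t line = 1;
    std::string cur;
    for (char c : s) {
        if (c == ' ' || c == '\n') {
            if (!cur.empty()) { out.push_back({cur, line}); cur.clear(); }
            if (c == '\n') ++line;
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) out.push_back({cur, line});
    return out;
}

// Toy lexer B: similar contract, but std::istringstream also splits on tabs.
std::vector<Token> lexStream(const std::string& s) {
    std::vector<Token> out;
    std::istringstream lines(s);
    std::string lineText;
    std::size_t line = 1;
    while (std::getline(lines, lineText)) {
        std::istringstream words(lineText);
        std::string w;
        while (words >> w) out.push_back({w, line});
        ++line;
    }
    return out;
}
```

Reporting the first differing input and token index points straight at the pattern file and position where the lexers disagree; the two toy lexers, for instance, diverge on tab characters.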

I would be interested in hearing any thoughts on it. Perhaps there is something I could do to make it more acceptable? As things are, I can only guess why something that, for me at least, makes such a big difference seems to have attracted so little interest. Speed-wise, it wipes the floor with the old lexer.

shewitt-au mentioned this pull request Aug 16, 2025

new lexer WerWolv/ImHex#2409 (closed)

shewitt-au marked this pull request as ready for review August 16, 2025 17:08

WerWolv force-pushed the master branch from 6c011fb to f97999d September 1, 2025 20:53

Migrate code (c31ac1a)

shewitt-au force-pushed the new_lexer branch from bea2aaf to c31ac1a September 6, 2025 12:48

shewitt-au added 4 commits September 7, 2025 20:06

Add .gitignore to hide gen code (7cca893)
Fix max line len for last line without /n (0e7758e)
Merge branch 'WerWolv:master' into new_lexer (d75495e)
Merge branch 'WerWolv:master' into new_lexer (c958609)
Cleanup some includes (3212b7a)
