Skip to content

Conversation

shewitt-au
Copy link
Contributor

@shewitt-au shewitt-au commented Aug 16, 2025

New lexer

I have been working on a new lexer. In no hurry. Just hacking away at it when time permits. And it's somewhere near ready. It uses an external header-only lexing library (lexertl17). And it's fast! Really fast! I'm sure there are problems, but I have not found any yet.

ImHex changes here

I've tested it using GitHub's CI tests on the PL, but in ImHex I could not. The changes need approval (didn't used to).

Implementation details

I've used an external header-only lexing library which uses DFA tables for the lexing. In debug builds these are built at runtime. In Release build I've set things up to build them at compile time. This is not strictly necessary, but I figured may as well. Faster is faster.

@WerWolv I'd appreciate if you could check out the pre-build steps I've added for these changes. They seem to work, but I used a Google-and-tweak approach to get it to work.

For me at least (an older computer) the speed increase is dramatic. I'd be interested on seeing how it performs on other systems.

@shewitt-au shewitt-au marked this pull request as ready for review August 16, 2025 17:08
@WerWolv
Copy link
Owner

WerWolv commented Aug 16, 2025

Hey, thanks for the PR. Next time though, before starting on a big change like that, especially when adding new dependencies, it would be great if you could join e.g our Discord server or otherwise have a discussion with me so we can lay out what actually should be done.

Currently I'm not really sure what kind of issues you're solving by rewriting the custom-made lexer. Is it really too slow? As far as I know, a sizable portion is spent in the Parser and basically everything else in the evaluator while the lexer is more or less instantaneous unless you're giving it gigantic files I guess. So what exactly is the use case? I'm open for improvements but I'm not sure just rewriting it is the right approach as there has been quite some thought been put into the whole pipeline by many people

@shewitt-au
Copy link
Contributor Author

shewitt-au commented Aug 16, 2025

@WerWolv

I understand your reservations. I would suggest you give it a try. If it improves things as much for you as it does for me (an older computer) then your resonable reservations might be reduced. I've never used Discord. Am I too old for Discord? Perhaps I can still learn new tricks.

@shewitt-au
Copy link
Contributor Author

shewitt-au commented Aug 16, 2025

WerWolv

Next time though, before starting on a big change like that, especially when adding new dependencies, it would be great if you could

Yeah, that sounds resonable. But I've had not had many good opinions re my changes in the past, even when I felt they had merit. I'd already started this. Figured may as well finish it.

@WerWolv
Copy link
Owner

WerWolv commented Aug 16, 2025

Having small features or fixes merged is usually no issue but for bigger things, not having a discussion first will usually lead to you implementing things that aren't necessary needed or done in a way that doesn't align with other things. I promise you, talking to me about what you're planning to do will increase the chances of having your changes merged thousandfold :)
You can join our Discord here if you like, it's ultimately not too much different from IRC: https://discord.gg/X63jZ36xBY

I wasn't trying to say your PR is not needed, I'm just wondering what actual problem it's trying to address. Like if Lexing previously took 2ms and now it takes 1ms, that is a 50% improvement that basically nobody will ever notice. That's why I'm wondering why you chose to rewrite it completely in the first place

@shewitt-au
Copy link
Contributor Author

shewitt-au commented Aug 16, 2025

@WerWolv

I'm just wondering what actual problem

For me pattern loading, and the "analyzing data" task are slow. Not horribly slow, but slow enough that I (initially, when I was new to ImHex) wondered if it was working or the app had hung. I the case of the "analyzing data" task the suggested pattern pops up "out of the blue" after I've loaded a file and are doing other things.

As for why the lexer, I started at the bottom. I figured something at an end only has one dependency, whereas something in the middle has two. The bottom it is.

For me, in a release build, the analyzing data task is twice as fast with these changes. I've measured it. And I've only changed the lexer. I can feel the difference. The pop-up that suggests a pattern pops up much faster.

That said, if others don't get similar speed increases, that is intersting in its own right.

Enough spruiking I guess:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants