Skip to content

Conversation

Scheremo
Copy link
Contributor

@Scheremo Scheremo commented Sep 23, 2025

This PR extends the CIRCT Verilog LSP server with support for project command files (-C / --command-file) and improves robustness around source locations and diagnostics.

  • Command file support

    • Added commandFiles option to VerilogServerOptions.
    • Extended the CLI with -C/--command-file flags to pass one or more command files.
    • Each buffer’s Driver now processes command files, filtering out the current main buffer to avoid duplication.
  • Source location handling

    • Improved absolute/real path resolution when constructing LSP URIs.
    • Fallbacks added for when slang::SourceManager has no full path info.
  • Compilation setup

    • Ensure all definitions in the main buffer scope are treated as top modules.
  • Indexing improvements

    • Skip top instances not defined in the current file.
    • Added guards against invalid or empty ranges (especially macro expansions).

Currently the main thing I am unhappy with is that two compilations are created, where the first one is only to determine all module names - this is necessary, since we index only instantiated modules, and only top modules are instantiated. I tried looking, but didn't find a flag to treat all module definitions as top modules. Maybe someone else has a deeper understanding of the Slang driver and knows a way around this?

The other thing that is a bit hacky is the mangling of the command file; this is necessary because we do definition lookups based on raw pointers, so if the main document is also in the command file the pointers do not match. To prevent this I currently scan the command file and remove the main file (if present). I think we could prevent this if instead of matching raw pointers we matched against a file id / filename + offset. However, this might be slower than the current solution since it adds another layer of indirection. Let me know if you have opinions!

Refactored top module determination to only process the main buffer file; this is sufficient, no need to compile twice.
Also refactored command file mangling to instead not import the main buffer into slang compilation if it is present in any command file, and then after compilation bind it to the llvm source manager's buffer id.

@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 4 times, most recently from e45c7f8 to b4211d2 Compare September 23, 2025 13:01
Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi looks super cool! Leveraging slang -C command is great! As for testing I had a PR for adding definition provider #8280 so maybe you can check tests there (test/Tools/circt-verilog-lsp-server/find-definition.test). Non blocking but regarding location look-up we might want to store a mapping from location range to symbol using llvm::IntervalMap since it's very frequently used (like hovering #8304).

@Scheremo
Copy link
Contributor Author

Hi looks super cool! Leveraging slang -C command is great! As for testing I had a PR for adding definition provider #8280 so maybe you can check tests there (test/Tools/circt-verilog-lsp-server/find-definition.test). Non blocking but regarding location look-up we might want to store a mapping from location range to symbol using llvm::IntervalMap since it's very frequently used (like hovering #8304).

Hi @uenoku! Since you're working on virtually the same thing, why don't we join forces?
I think your implementation in #8280 looks very complete & good, is there anything preventing us from merging this?
I believe I can add -C support on top of that pretty easily. WDYT?

@uenoku
Copy link
Member

uenoku commented Sep 24, 2025

Since you're working on virtually the same thing, why don't we join forces?

That would be really cool!

I think your implementation in #8280 looks very complete & good
Thanks!

is there anything preventing us from merging this?

No, I should have merged more earlier :) Let me merge today or tomorrow.

I believe I can add -C support on top of that pretty easily. WDYT?

Yes, I agree! I think you can add SlangProjectLookup to VerilogIndex or VerilogDocument.

@Scheremo
Copy link
Contributor Author

Since you're working on virtually the same thing, why don't we join forces?

That would be really cool!

I think your implementation in #8280 looks very complete & good
Thanks!

is there anything preventing us from merging this?

No, I should have merged more earlier :) Let me merge today or tomorrow.

I believe I can add -C support on top of that pretty easily. WDYT?

Yes, I agree! I think you can add SlangProjectLookup to VerilogIndex or VerilogDocument.

Awesome! Just give me a ping once that's done and I'll work out the rest from there 😄

@uenoku
Copy link
Member

uenoku commented Sep 24, 2025

Merged the PR, sorry for the merge conflict!

@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 13 times, most recently from 8d781ef to 2a4d2c7 Compare September 25, 2025 17:24
@Scheremo
Copy link
Contributor Author

Merged the PR, sorry for the merge conflict!

No worries about that @uenoku! I adapted this version to integrate with the pointer-based lookup you implemented and added a test for cross-file resolution; I also extended the way we index by setting the top modules explicitly and only indexing the ones whose location is in the actual file that's open. Without this optimization, on a large project like the snitch_cluster with a ~400 line file list the LSP takes ~30 seconds to index (on my MBP M4 Max). With this optimization I didn't manage to eyeball-measure the time it takes anymore as it's too small (<1s).

While the regression doesn't capture performance, inter-file lookup in an appreciably large project is reasonably performant. No feeling of lag with the snitch_cluster on my end.

@Scheremo Scheremo marked this pull request as ready for review September 25, 2025 17:33
@Scheremo Scheremo changed the title DRAFT: [circt-verilog-lsp] Add support for -C command files for Slang, project-wide defintion lookup [circt-verilog-lsp] Add support for -C command files for Slang, project-wide defintion lookup Sep 25, 2025
@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 2 times, most recently from de24edc to 9e9c1be Compare September 25, 2025 19:14
@uenoku
Copy link
Member

uenoku commented Sep 26, 2025

Super cool! Thank you for rebasing it.

where the first one is only to determine all module names - this is necessary, since we index only instantiated modules, and only top modules are instantiated. I tried looking, but didn't find a flag to treat all module definitions as top modules.

Does "all module names" here mean modules defined in the main file? I think this is ok if the modules not defined in the main file is not elaborated.

To prevent this I currently scan the command file and remove the main file (if present). I think we could prevent this if instead of matching raw pointers we matched against a file id / filename + offset. However, this might be slower than the current solution since it adds another layer of indirection.

I think we can tweak the getSMLoc() to return a pointer in main buffer if slang location points to the current main file by comparing a file path. Changing the key of IntervalMap to offset looks reasonable as well 👍

I also extended the way we index by setting the top modules explicitly and only indexing the ones whose location is in the actual file that's open. Without this optimization, on a large project like the snitch_cluster with a ~400 line file list the LSP takes ~30 seconds to index (on my MBP M4 Max). With this optimization I didn't manage to eyeball-measure the time it takes anymore as it's too small (<1s).

That's super cool and clever! Thank you for adding the optimization. Relevantly in the long term it would be really cool if we can store indexed data to disk like clangd https://clangd.llvm.org/design/indexing

name = path.string();
}

auto memBuffer = llvm::MemoryBuffer::getFile(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can look up the main buffer from here instead of opening a new memory buffer

if (getFileName(uri) == name) { 
  bufferIDMap[bufferID] = 0; // Insert the id for the next call.  
  const auto *buffer = sourceMgr.getMemoryBuffer(0); // get main buffer
  return return llvm::SMLoc::getFromPointer(buffer->getBufferStart() + loc.offset());
} 

@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 5 times, most recently from 63e53c6 to 1e718cd Compare September 26, 2025 08:59
@Scheremo
Copy link
Contributor Author

Super cool! Thank you for rebasing it.

where the first one is only to determine all module names - this is necessary, since we index only instantiated modules, and only top modules are instantiated. I tried looking, but didn't find a flag to treat all module definitions as top modules.

Does "all module names" here mean modules defined in the main file? I think this is ok if the modules not defined in the main file is not elaborated.

That is a good point, and I adapted it. Only the main buffer file is parsed to figure out which top modules to compile the entire project for.

To prevent this I currently scan the command file and remove the main file (if present). I think we could prevent this if instead of matching raw pointers we matched against a file id / filename + offset. However, this might be slower than the current solution since it adds another layer of indirection.

I think we can tweak the getSMLoc() to return a pointer in main buffer if slang location points to the current main file by comparing a file path. Changing the key of IntervalMap to offset looks reasonable as well 👍

I think I'll leave changing the key of IntervalMap to a later day; I gave it some thought, and I think there are also issue with just offsets. We would really want a data structure as the key that normalizes both llvm source buffer IDs and slang buffer IDs over an offset. Seems non-obvious to me! Instead I fixed the current implementation by setting up all slang buffers as llvm membuffers as well; This is pessimistic since most might not be accessed very often, but the most stright forward way I could think of.

I also extended the way we index by setting the top modules explicitly and only indexing the ones whose location is in the actual file that's open. Without this optimization, on a large project like the snitch_cluster with a ~400 line file list the LSP takes ~30 seconds to index (on my MBP M4 Max). With this optimization I didn't manage to eyeball-measure the time it takes anymore as it's too small (<1s).

That's super cool and clever! Thank you for adding the optimization. Relevantly in the long term it would be really cool if we can store indexed data to disk like clangd https://clangd.llvm.org/design/indexing

Absolutely!

I think I've addressed most or all of the points you raised, and I am happy with the state. Please let me know if you find other points we can improve, otherwise I am okay with merging :)

@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 4 times, most recently from 4a8575e to 7b5b9c2 Compare September 26, 2025 12:20
Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, really thank you for updating the PR and working around the issue of LLVM buffers/slang source managers. I added comments mostly for code style but otherwise looks awesome!

I agree in the future we should get rid of buffers in VerilogDocument and completely rely on SlangSourceManager, and use offset (or pointer in a slang buffer) as a key of IntervalMap. I originally followed ImportVerilog's pattern to hold LLVM buffers along with slang but it looks we don't need necessary anymore (#8840) probably thanks to slang's version bump.

@Scheremo
Copy link
Contributor Author

Scheremo commented Sep 28, 2025

Hi Hideto, thanks for the comprehensive and constructive feedback! I think I addressed all of it, but please let me know if I missed any or there is further. Otherwise I am okay with merging 😄

EDIT: I also agree with eventually using only slang buffers, rather than LLVM Source buffers + Slang buffers. Should make life easier and tooling faster 😃

@Scheremo Scheremo force-pushed the pr-lsp-file-list branch 2 times, most recently from ce4ed0a to a346e6f Compare September 29, 2025 05:57
This PR extends the CIRCT Verilog LSP server with support for project command files (`-C` / `--command-file`) and improves robustness around source locations and diagnostics.

- **Command file support**
  - Added `commandFiles` option to `VerilogServerOptions`.
  - Extended the CLI with `-C`/`--command-file` flags to pass one or more command files.
  - Each buffer’s `Driver` now processes command files, filtering out the current main buffer to avoid duplication.
  - Implemented `removeFileToTemp` helper to strip the main file from command files and materialize a temporary version.

- **Source location handling**
  - Improved absolute/real path resolution when constructing LSP URIs.
  - Fallbacks added for when `slang::SourceManager` has no full path info.

- **Compilation setup**
  - Ensure all definitions in the buffer/project scope are treated as top modules.

- **Indexing improvements**
  - Skip top instances not defined in the current file.
  - Added guards against invalid or empty ranges (especially macro expansions).
@uenoku
Copy link
Member

uenoku commented Sep 29, 2025

Cool, thank you for updating this! Looks great! I'm going to merge :)

@uenoku uenoku merged commit d469223 into llvm:main Sep 29, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants