@CheeksTheGeek CheeksTheGeek commented Sep 7, 2025

  • Update identifier extraction to conform to the IEEE Verilog/SystemVerilog standard (IEEE 1800), so that the leading backslash and the terminating whitespace of an escaped identifier are handled properly.
    For the following dff.sv:
// This file is public domain, it can be freely copied without restrictions.
// SPDX-License-Identifier: CC0-1.0

`timescale 1us/1us

module dff (
  input logic clk, d,
  input logic _reset_n,  // starts with underscore hence needs __getitem__ access
  output logic q,
  output logic \!special!\  // escaped identifier hence needs __getitem__ access
);

always @(posedge clk) begin
  if (!_reset_n) begin
    q <= 1'b0;
    \!special!\ <= 1'b0;
  end else begin
    q <= d;
    \!special!\ <= ~d;  // invert d for the special signal
  end
end
endmodule

we get:

-> import pdb; pdb.set_trace()
(Pdb) dir(dut)
['!special!\\', '__abstractmethods__', '__bool__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__orig_bases__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_child_path', '_def_file', '_def_name', '_discover_all', '_discovered', '_get', '_get_handle_by_key', '_handle', '_id', '_items', '_keys', '_log', '_name', '_path', '_reset_n', '_sub_handle_key', '_sub_handles', '_type', '_values', 'clk', 'd', 'q']
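To illustrate the rule this change relies on: in IEEE 1800 an escaped identifier starts with a backslash and ends at the first whitespace character, and neither the leading backslash nor the terminating whitespace is part of the name. The following is a minimal Python sketch of that rule only. The helper name `escaped_identifier_name` is hypothetical and is not part of Icarus Verilog or cocotb.

```python
import re

# Hypothetical helper for illustration; not an Icarus Verilog or cocotb API.
# A backslash, then one or more printable non-whitespace ASCII characters,
# terminated by whitespace (or the end of the input).
_ESCAPED_ID = re.compile(r"\\([!-~]+)(?=\s|$)")


def escaped_identifier_name(token: str) -> str:
    """Return the name of an escaped identifier, without the leading
    backslash and without the terminating whitespace."""
    match = _ESCAPED_ID.match(token)
    if match is None:
        raise ValueError(f"not an escaped identifier: {token!r}")
    return match.group(1)


# The port from dff.sv: the trailing backslash belongs to the name,
# the space after it does not.
name = escaped_identifier_name("\\!special!\\ ")
print(repr(name))
```

Python's `repr` doubles the backslash, which is why the name appears as `'!special!\\'` in the `dir(dut)` listing even though the identifier itself contains a single trailing backslash.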

Fixes #1273

@caryr
Collaborator

caryr commented Sep 9, 2025

It looks like the patch for lexor.lex is not needed and I'm not sure it is even valid (why is the 'a' at the beginning of the lexor match?). The code as is produces the following intermediate code:

.port_info 4 /OUTPUT 1 "!special!\\";
v0xa0003a8b0_0 .var "!special!\\", 0 0;

Which is the correct string representation for this escaped identifier. I still need to look at what is happening in vvp to see if something bad is happening there that is ultimately causing your issues. Ideally we also want an example program that demonstrates the problem. I assume 'pdb' is using the VPI interface to get at the names so I'll look at creating a VPI example when I look at the 'vvp' part of this.

@martinwhitaker
Collaborator

We've ended up with a double backslash in the vvp code. That may be necessary for the vvp parser (I've not looked), but adding a $dumpvars shows that the double backslash makes its way through to the VCD file. So there does look to be something that needs fixing.

On a positive note, we get the correct output when using the vlog95 target.

@caryr
Collaborator

caryr commented Sep 10, 2025

I think the bug is in the vvp lexor where it is not expanding the \ items correctly. We need the \ in the string passed to vvp so it can handle things like ", octal constants, etc. correctly. The attached vvp lexor patch does some of that, but not everything that may be needed.

@caryr caryr added Bug and removed Bug labels Sep 10, 2025
@martinwhitaker
Collaborator

martinwhitaker commented Sep 13, 2025

> I think the bug is in the vvp lexor where it is not expanding the \ items correctly. We need the \ in the string passed to vvp so it can handle things like ", octal constants, etc. correctly. The attached vvp lexor patch does some of that, but not everything that may be needed.

I've had a look at this.

For literal strings (e.g. arguments to system calls), the compiler replaces all non-printable characters with escaped octal values prior to code generation and tgt-vvp passes these through unchanged. vvp converts these back to the original characters in the function __vpiStringConst::process_string_().
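The round trip described for literal strings can be sketched as follows. This is only an illustration in Python, assuming the escapes take the three-digit `\ooo` octal form; the actual encoding and the logic in `__vpiStringConst::process_string_()` may differ, and the function name `decode_octal_escapes` is hypothetical.

```python
import re

# Assumption: each escaped character is written as a backslash followed by
# exactly three octal digits (for example \012 for a newline).
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def decode_octal_escapes(text: str) -> str:
    """Replace every \\ooo octal escape with the character it encodes.

    Hypothetical stand-in for the conversion vvp performs on string
    constants; it makes a single left-to-right pass over the input.
    """
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


# A string constant as it might appear in the generated code.
print(repr(decode_octal_escapes("line1\\012line2")))
```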

For labels (which are constructed from hierarchical scope names), tgt-vvp escapes all characters that would need to be escaped in the Verilog source. vvp doesn't convert them back, but that doesn't matter because the labels are internal to vvp and not visible via the VPI.

For symbol names, tgt-vvp only escapes " and \ characters (in the function vvp_mangle_name()). Currently vvp doesn't convert them back, and that conversion is what we need to add. But we don't need to handle anything other than \" and \\.
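A minimal Python sketch of the symbol-name round trip described here, handling only `\"` and `\\`. The names `mangle_name` and `unmangle_name` are hypothetical; the real `vvp_mangle_name()` and the vvp lexer are C/C++ and may be structured differently.

```python
def mangle_name(name: str) -> str:
    """Escape only the double quote and the backslash, as the comment above
    describes for symbol names (hypothetical stand-in, not the real code)."""
    out = []
    for ch in name:
        if ch in ('"', "\\"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def unmangle_name(text: str) -> str:
    """Inverse of mangle_name: turn \\" into " and \\\\ into \\.
    Any other backslash is copied through unchanged."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ('"', "\\"):
            out.append(text[i + 1])  # drop the escaping backslash
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


mangled = mangle_name("!special!\\")
print(mangled)                 # the form written into the generated code
print(unmangle_name(mangled))  # the name VPI and VCD output should report
```

With the unescape step in place, the doubled backslash stays confined to the intermediate file instead of leaking through to VPI clients and the VCD output.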

@martinwhitaker
Collaborator

Whilst investigating this, I found that the following code compiles OK but results in a syntax error when read by vvp:

module test();

task \!special!\ ;

begin
  $display("special");
end

endtask
 
initial begin
  \!special!\ ;
end

endmodule

The problem appears to be having a trailing \ in a label. I can't immediately see why this is a problem.

Collaborator

@martinwhitaker martinwhitaker left a comment

For efficiency we could also handle the octal escapes in vvp/lexor.lex, to save doing a second pass in __vpiStringConst::process_string_(). But that's not essential.



\\[^ \t\b\f\r\n]+ {
a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] {
Collaborator

As Cary remarked, the leading a shouldn't be there and breaks the parser. The trailing [ \t\b\f\r\n] is an unnecessary complication, and will cause the compiler to mis-report line numbers if an escaped identifier is terminated by a new line. So I agree with Cary that the changes to this file are not needed.
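The line-number concern can be demonstrated with a toy scanner. This is a Python sketch, not the real flex lexer; it assumes, as is typical, that line numbers advance only in a dedicated newline rule. The function name `line_of_last_token` is hypothetical.

```python
import re


def line_of_last_token(source: str, escaped_id_pattern: str) -> int:
    """Scan `source` and return the line number recorded for its last token.

    Newlines are counted only when the scanner itself sees them, so a token
    pattern that swallows a newline hides it from the line counter.
    """
    escaped_id = re.compile(escaped_id_pattern)
    word = re.compile(r"[A-Za-z_]\w*")
    line = 1
    last_line = 0
    pos = 0
    while pos < len(source):
        match = escaped_id.match(source, pos) or word.match(source, pos)
        if match:
            last_line = line
            pos = match.end()
            continue
        if source[pos] == "\n":  # the only place the line number advances
            line += 1
        pos += 1
    return last_line


source = "\\a\nb"  # escaped identifier \a terminated by a newline, then b

# Original pattern: the terminating whitespace is left for the other rules.
print(line_of_last_token(source, r"\\[^ \t\b\f\r\n]+"))
# Proposed pattern: the terminating newline is consumed with the identifier.
print(line_of_last_token(source, r"\\[^ \t\b\f\r\n]+[ \t\b\f\r\n]"))
```

With the original pattern `b` is reported on line 2. With the pattern that also consumes the terminator, the newline never reaches the counting rule and `b` is reported on line 1.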

a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] {
assert(yylloc.lexical_pos != UINT_MAX);
yylloc.lexical_pos += 1;
yylval.text = strdupnew(yytext+1);
Collaborator

This was the only use of strdupnew(), so that function will need to be removed to keep the code warning-free.

Collaborator

Sorry, this comment should have been attached to the similar line in vvp/lexor.lex.

// Handle escape sequences
src++; // skip the backslash
switch (*src) {
case '\\': *dst++ = '\\'; break; // \\ -> \
Collaborator

With gcc 12 I get the following warning:

../../source/vvp/lexor.lex:82:53: warning: multi-line comment [-Wcomment]

which arises because the final \ in the comment is treated as a line-continuation character, so the next source line is absorbed into the comment. And indeed, testing shows that the following case is not handled:

case '"': *dst++ = '"'; break; // \" -> "
case 'n': *dst++ = '\n'; break; // \n -> newline
case 't': *dst++ = '\t'; break; // \t -> tab
case 'r': *dst++ = '\r'; break; // \r -> carriage return
Collaborator

As per my previous comments, I don't think we need to handle \n, \t, and \r.


Successfully merging this pull request may close these issues.

Escaped Identifiers not conformed to IEEE 1800
3 participants