Escaped Identifiers should conform to IEEE 1800 #1274

CheeksTheGeek · 2025-09-07T20:32:15Z

Update identifier extraction to conform to the IEEE Verilog/SystemVerilog standard, so that the leading backslash and the trailing whitespace of an escaped identifier are handled properly.
This closely follows the IEEE 1800 standard.
For this dff.sv:

// This file is public domain, it can be freely copied without restrictions.
// SPDX-License-Identifier: CC0-1.0

`timescale 1us/1us

module dff (
  input logic clk, d,
  input logic _reset_n,  // starts with underscore hence needs __getitem__ access
  output logic q,
  output logic \!special!\  // escaped identifier hence needs __getitem__ access
);

always @(posedge clk) begin
  if (!_reset_n) begin
    q <= 1'b0;
    \!special!\ <= 1'b0;
  end else begin
    q <= d;
    \!special!\ <= ~d;  // invert d for the special signal
  end
end
endmodule

we get:

-> import pdb; pdb.set_trace()
(Pdb) dir(dut)
['!special!\\', '__abstractmethods__', '__bool__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__orig_bases__', '__parameters__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', '_abc_impl', '_child_path', '_def_file', '_def_name', '_discover_all', '_discovered', '_get', '_get_handle_by_key', '_handle', '_id', '_items', '_keys', '_log', '_name', '_path', '_reset_n', '_sub_handle_key', '_sub_handles', '_type', '_values', 'clk', 'd', 'q']
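
When reading the listing above, note that Python's repr doubles every backslash, so '!special!\\' denotes a name with a single trailing backslash. That matches the IEEE 1800 rule that neither the leading backslash nor the terminating white space is part of an escaped identifier. A minimal sketch in plain Python (no simulator involved; the variable names are illustrative only):

```python
# The name as it appears in the dir(dut) listing, written as a Python literal.
name = '!special!\\'

# repr() doubles the backslash; the string itself holds only one.
print(repr(name))   # '!special!\\'
print(name)         # !special!\
print(len(name))    # 10

# Source text of the escaped identifier: a leading backslash, the name,
# then the white space that terminates it.
source_token = '\\!special!\\ '

# Dropping the leading backslash and the terminator leaves the name.
assert source_token[1:].rstrip(' \t\n') == name
```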

Fixes #1273


caryr · 2025-09-09T11:35:53Z

It looks like the patch for lexor.lex is not needed and I'm not sure it is even valid (why is the 'a' at the beginning of the lexor match?). The code as is produces the following intermediate code:

.port_info 4 /OUTPUT 1 "!special!\\";
v0xa0003a8b0_0 .var "!special!\\", 0 0;

Which is the correct string representation for this escaped identifier. I still need to look at what is happening in vvp to see if something bad is happening there that is ultimately causing your issues. Ideally we also want an example program that demonstrates the problem. I assume 'pdb' is using the VPI interface to get at the names so I'll look at creating a VPI example when I look at the 'vvp' part of this.

martinwhitaker · 2025-09-09T16:29:37Z

We've ended up with a double backslash in the vvp code. That may be necessary for the vvp parser (I've not looked), but adding a $dumpvars shows the double backslash makes its way through to the VCD file. So there does look to be something that needs fixing.

On a positive note, we get the correct output when using the vlog95 target.

caryr · 2025-09-10T02:42:37Z

I think the bug is in the vvp lexor where it is not expanding the \ items correctly. We need the \ in the string passed to vvp so it can handle things like ", octal constants, etc. correctly. The attached vvp lexor patch does some of that, but not everything that may be needed.

martinwhitaker · 2025-09-13T14:03:24Z

> I think the bug is in the vvp lexor where it is not expanding the \ items correctly. We need the \ in the string passed to vvp so it can handle things like ", octal constants, etc. correctly. The attached vvp lexor patch does some of that, but not everything that may be needed.

I've had a look at this.

For literal strings (e.g. arguments to system calls), the compiler replaces all non-printable characters with escaped octal values prior to code generation and tgt-vvp passes these through unchanged. vvp converts these back to the original characters in the function __vpiStringConst::process_string_().
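
The round trip described above for literal strings can be sketched as follows. This is an illustration only, not the Icarus Verilog code: the function names are hypothetical, and both the three-digit \ooo form and the choice to also escape " and \ are assumptions made so that the sketch is lossless.

```python
def encode_octal(text: str) -> str:
    # Hypothetical stand-in for the compiler side: anything that is not
    # plain printable ASCII, plus the quote and the backslash, becomes \ooo.
    out = []
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        if 0x20 <= byte < 0x7F and ch not in '"\\':
            out.append(ch)
        else:
            out.append("\\%03o" % byte)
    return "".join(out)


def decode_octal(text: str) -> str:
    # Hypothetical stand-in for the runtime side: each \ooo is turned
    # back into the single character it encodes.
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(chr(int(text[i + 1:i + 4], 8)))
            i += 4
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


original = "line1\nline2\t\"quoted\""
encoded = encode_octal(original)
print(encoded)      # line1\012line2\011\042quoted\042
assert decode_octal(encoded) == original
```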

For labels (which are constructed from hierarchical scope names), tgt-vvp escapes all characters that would need to be escaped in the Verilog source. vvp doesn't convert them back, but that doesn't matter because the labels are internal to vvp and not visible via the VPI.

For symbol names, tgt-vvp only escapes " and \ characters (in the function vvp_mangle_name()). Currently vvp doesn't convert them back, and that conversion is what we need to add. But we don't need to handle anything other than \" and \\.
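
The symbol-name case can be sketched as a pair of functions that escape only " and \ and then reverse exactly those two escapes. The names mangle_name and unmangle_name are hypothetical; this is not the body of vvp_mangle_name() or of any vvp function, only a model of the behaviour discussed here.

```python
def mangle_name(name: str) -> str:
    # Escape the backslash first so the backslashes added for quotes
    # are not escaped a second time.
    return name.replace("\\", "\\\\").replace('"', '\\"')


def unmangle_name(text: str) -> str:
    # Single left-to-right pass: only \" and \\ are recognised, every
    # other character (including a lone backslash) is copied unchanged.
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in '"\\':
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


name = "!special!\\"            # the identifier !special!\
mangled = mangle_name(name)
print(mangled)                  # !special!\\
assert unmangle_name(mangled) == name
```

With the reverse step in place, the name seen by a consumer is the one with a single trailing backslash, rather than the doubled form that appears in the intermediate code.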

martinwhitaker · 2025-09-13T14:15:35Z

Whilst investigating this, I found that the following code compiles OK but results in a syntax error when read by vvp:

module test();

task \!special!\ ;

begin
  $display("special");
end

endtask
 
initial begin
  \!special!\ ;
end

endmodule

The problem appears to be having a trailing \ in a label. I can't immediately see why this is a problem.

martinwhitaker

For efficiency we could also handle the octal escapes in vvp/lexor.lex, to save doing a second pass in __vpiStringConst::process_string_(). But that's not essential.

martinwhitaker · 2025-09-14T14:26:24Z

lexor.lex



-\\[^ \t\b\f\r\n]+         {
+a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] {


As Cary remarked, the leading a shouldn't be there and breaks the parser. The trailing [ \t\b\f\r\n] is an unnecessary complication, and will cause the compiler to mis-report line numbers if an escaped identifier is terminated by a new line. So I agree with Cary that the changes to this file are not needed.
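
The two patterns from the hunk above can be tried out with Python's re module, which accepts the same character class (inside a class, \b means backspace). This is only a sketch of the matching behaviour, not of the lexer itself; the sample source lines are made up for the illustration.

```python
import re

# The rule before the patch: a backslash followed by one or more
# characters that are not white space.
ORIGINAL = re.compile(r"\\[^ \t\b\f\r\n]+")
# The rule as patched: note the literal 'a' and the consumed terminator.
PATCHED = re.compile(r"a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n]")

source = "\\!special!\\ <= 1'b0;\n"

m = ORIGINAL.match(source)
assert m is not None
name = m.group()[1:]              # drop the leading backslash
print(name)                       # !special!\
assert source[m.end()] == " "     # the terminator is left in the input

# The patched rule needs a literal 'a' first, so it never matches here.
assert PATCHED.match(source) is None

# Even without the 'a', consuming the terminator swallows a newline that
# ends the identifier, which is the case the review ties to line numbers.
consuming = re.compile(r"\\[^ \t\b\f\r\n]+[ \t\b\f\r\n]")
assert consuming.match("\\foo\nbar").group().endswith("\n")
```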

martinwhitaker · 2025-09-14T14:48:53Z

lexor.lex

+a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] {
      assert(yylloc.lexical_pos != UINT_MAX);
      yylloc.lexical_pos += 1;
-      yylval.text = strdupnew(yytext+1);


This was the only use of strdupnew(), so it will need to be removed to keep the code warning-free.

Sorry, this comment should have been attached to the similar line in vvp/lexor.lex.

martinwhitaker · 2025-09-14T14:48:58Z

vvp/lexor.lex

+              // Handle escape sequences
+              src++; // skip the backslash
+              switch (*src) {
+                  case '\\': *dst++ = '\\'; break;  // \\ -> \


With gcc 12 I get the following warning:

../../source/vvp/lexor.lex:82:53: warning: multi-line comment [-Wcomment]

which is because it has treated the final \ of the // comment as a line continuation character, so the next source line becomes part of the comment. And indeed, testing shows that the case on the following line is not handled.

martinwhitaker · 2025-09-14T14:49:00Z

vvp/lexor.lex

+                  case '"': *dst++ = '"'; break;   // \" -> "
+                  case 'n': *dst++ = '\n'; break;  // \n -> newline
+                  case 't': *dst++ = '\t'; break;  // \t -> tab
+                  case 'r': *dst++ = '\r'; break;  // \r -> carriage return


As per my previous comments, I don't think we need to handle \n, \t, and \r.

Commit 430c6cd: update identifier extraction to conform to IEEE Verilog/SV Standard for proper handling of leading backslashes and trailing whitespace

caryr added Bug and removed Bug labels Sep 10, 2025

martinwhitaker requested changes Sep 14, 2025
