-
Notifications
You must be signed in to change notification settings - Fork 570
Escaped Identifiers should conform to IEEE 1800 #1274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Escaped Identifiers should conform to IEEE 1800 #1274
Conversation
…or proper handling of leading backslashes and trailing whitespace
It looks like the patch for lexor.lex is not needed and I'm not sure it is even valid (why is the 'a' at the beginning of the lexor match?). The code as is produces the following intermediate code:
Which is the correct string representation for this escaped identifier. I still need to look at what is happening in |
We've ended up with a double backslash in the vvp code. That may be necessary for the vvp parser (I've not looked), but adding a On a positive note, we get the correct output when using the vlog95 target. |
I think the bug is in the vvp lexor where it is not expanding the \ items correctly. We need the \ in the string passed to vvp so it can handle things like ", octal constants, etc. correctly. The attached vvp lexor patch does some of that, but not everything that may be needed. |
I've had a look at this. For literal strings (e.g. arguments to system calls), the compiler replaces all non-printable characters with escaped octal values prior to code generation and tgt-vvp passes these through unchanged. vvp converts these back to the original characters in the function For labels (which are constructed from hierarchical scope names), tgt-vvp escapes all characters that would need to be escaped in the Verilog source. vvp doesn't convert them back, but that doesn't matter because the labels are internal to vvp and not visible via the VPI. For symbol names, tgt-vvp only escapes |
Whilst investigating this, I found that the following code compiles OK but results in a syntax error when read by vvp:
The problem appears to be having a trailing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For efficiency we could also handle the octal escapes in vvp/lexor.lex, to save doing a second pass in __vpiStringConst::process_string_()
. But that's not essential.
|
||
|
||
\\[^ \t\b\f\r\n]+ { | ||
a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Cary remarked, the leading a
shouldn't be there and breaks the parser. The trailing [ \t\b\f\r\n]
is an unnecessary complication, and will cause the compiler to mis-report line numbers if an escaped identifier is terminated by a new line. So I agree with Cary that the changes to this file are not needed.
a\\[^ \t\b\f\r\n]+[ \t\b\f\r\n] { | ||
assert(yylloc.lexical_pos != UINT_MAX); | ||
yylloc.lexical_pos += 1; | ||
yylval.text = strdupnew(yytext+1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the only use of strdupnew()
, so it will need to be removed to keep the code warning-free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this comment should have been attached to the similar line in vvp/lexor.lex.
// Handle escape sequences | ||
src++; // skip the backslash | ||
switch (*src) { | ||
case '\\': *dst++ = '\\'; break; // \\ -> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With gcc 12 I get the following warning:
../../source/vvp/lexor.lex:82:53: warning: multi-line comment [-Wcomment]
which is because it has treated the final \
as a line continuation character. And indeed, testing shows that the following case is not handled.
case '"': *dst++ = '"'; break; // \" -> " | ||
case 'n': *dst++ = '\n'; break; // \n -> newline | ||
case 't': *dst++ = '\t'; break; // \t -> tab | ||
case 'r': *dst++ = '\r'; break; // \r -> carriage return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my previous comments, I don't think we need to handle \n
, \t
. and \r
.
This tightly follows the IEEE 1800 Standard.
for the dff.sv:
we get:
Fixes #1273