kaby76 commented Nov 21, 2025

NB: This PR is still in progress. I am currently debugging the gcc compiler (actually cc1) to understand the parser flow and the grammar it implements, specifically for the gcc extensions. The old gcc extensions currently in the c grammar are incorrect.

This PR brings the C grammar up to date with the latest ISO Specification. It also fixes several critical bugs in the correctness of the parse with respect to types.

As noted in several issues reported against the c grammar, a number of examples were parsed incorrectly. This was because the EBNF in the ISO Specification, which this grammar is based on, is ambiguous. To disambiguate the grammar, I added a symbol table to the grammar's actions.
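The classic case of this ambiguity is a statement such as `T * x;`, which declares a pointer `x` if `T` names a type and is a multiplication expression otherwise. The following is a minimal sketch in C of the kind of scoped lookup a disambiguation predicate could consult. All names here (`SymbolTable`, `declare`, `is_typedef_name`, and so on) are hypothetical and are not taken from the PR, whose actual base class is written in C#.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch; not the PR's implementation. */
enum { MAX_SYMBOLS = 64, MAX_SCOPES = 16 };

typedef enum { SYM_TYPEDEF, SYM_ORDINARY } SymKind;

typedef struct {
    const char *name;
    SymKind kind;
} Symbol;

typedef struct {
    Symbol symbols[MAX_SYMBOLS];
    int count;                    /* number of live symbols */
    int scope_start[MAX_SCOPES];  /* value of count when each scope opened */
    int depth;                    /* number of open scopes */
} SymbolTable;

/* Start with a single open scope: the file scope. */
void table_init(SymbolTable *t) {
    t->count = 0;
    t->depth = 1;
    t->scope_start[0] = 0;
}

/* Called when the parser enters a block or parameter list. */
void scope_push(SymbolTable *t) {
    assert(t->depth < MAX_SCOPES);
    t->scope_start[t->depth++] = t->count;
}

/* Called when the parser leaves it; drops that scope's symbols. */
void scope_pop(SymbolTable *t) {
    assert(t->depth > 1);
    t->count = t->scope_start[--t->depth];
}

/* Record a declared identifier in the innermost scope. */
void declare(SymbolTable *t, const char *name, SymKind kind) {
    assert(t->count < MAX_SYMBOLS);
    t->symbols[t->count].name = name;
    t->symbols[t->count].kind = kind;
    t->count++;
}

/* The predicate: search newest first so the innermost declaration wins. */
bool is_typedef_name(const SymbolTable *t, const char *name) {
    for (int i = t->count - 1; i >= 0; i--) {
        if (strcmp(t->symbols[i].name, name) == 0)
            return t->symbols[i].kind == SYM_TYPEDEF;
    }
    return false;  /* undeclared identifiers are not type names */
}
```

In this sketch an inner declaration of an ordinary identifier hides an outer typedef of the same name until its scope is popped, which is why the lookup runs from the newest entry backwards.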

At this point, any C code input must be pre-processed so that any and all definitions are included in the input. The parser must have this information to perform type checking and select the correct parse.

Fixed

Already fixed. Please close.

* Split grammar.
* Add CSharp base class for parser.
* Add missing lexer rules for string literals.
* Limit targets to just CSharp for now.
* Add disambiguation predicates, symbol table.

kaby76 commented Nov 22, 2025

BS! Some of the examples in the test suite are not strict ISO C. For example, this should not parse.

int p1, p2;
int a2(int param1, param2);

But, it does!

$ gcc -c -std=c24 p.c
gcc.exe: error: unrecognized command-line option '-std=c24'; did you mean '-std=c2x'?
11/22-08:50:09 ~/issues/g4-current/c/Generated-CSharp
$ gcc -c -std=c2x p.c
p.c:2:20: error: unknown type name 'param2'
    2 | int a2(int param1, param2);
      |                    ^~~~~~
11/22-08:50:17 ~/issues/g4-current/c/Generated-CSharp
$ cat p.c
int p1, p2;
int a2(int param1, param2);
11/22-08:51:49 ~/issues/g4-current/c/Generated-CSharp
$ ./bin/Debug/net8.0/Test.exe p.c
CSharp 0 p.c success 0.0453763
Total Time: 0.1107057
11/22-08:52:29 ~/issues/g4-current/c/Generated-CSharp
$

It should not parse, because every parameter declaration must include a type, as with int param2. int param1, param2 currently parses because param2 is parsed as a type id, as though you had given it int param1, int. We cannot both ignore this and add a symbol table to solve #4676.
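To make the point concrete: the prototype is only valid C when param2 actually names a type. A minimal sketch, assuming a hypothetical typedef that the original p.c does not contain:

```c
/* Hypothetical variant of p.c: param2 is declared as a typedef first. */
typedef int param2;

/* Now valid: the second parameter is unnamed and has type param2,
   just like writing "int a2(int param1, int);". */
int a2(int param1, param2);

/* A matching definition; the parameter name "second" is illustrative only. */
int a2(int param1, param2 second) {
    return param1 + second;
}
```

Without the typedef, a parser that consults a symbol table finds no type named param2 and has to reject the prototype, which matches the gcc error shown above.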

So, several of the examples provided should not parse, but they do.

The example does not compile with "gcc -c -std=c2x". Rejecting it is a requirement, because a symbol table is needed to disambiguate the grammar, and with the symbol table this input can't parse!
Examples must parse with new symbol table implementation.
kaby76 changed the title from "[c] Fix for #4676." to "[c] Fixed numerous issues with the C grammar." Nov 23, 2025
kaby76 changed the title from "[c] Fixed numerous issues with the C grammar." to "[c] Bringing the C grammar up to date with the latest ISO Spec, and fixing various issues." Nov 25, 2025
kaby76 changed the title from "[c] Bringing the C grammar up to date with the latest ISO Spec, and fixing various issues." to "[c] Update the C grammar to the latest ISO Spec, and fixing various issues." Nov 25, 2025