Description
This is a problem with the recently added antlr-format. The tool fails to reach a fixed point: if I take a file that was output by the tool and rerun the tool on it, the tool should return exactly the same file, but it does not. In addition, it removed a comment that should not have been removed. It's essential that the antlr-format tool outputs a format that reliably follows the coding standard.
Here are four successive versions of PlSqlLexer.g4; the first is the initial version. In the second through fourth versions, the START_CMD rule was changed by antlr-format, each time to a different format.
- Version 7fbb97b, committed 4 months ago.
- Version 7535367, committed 3 weeks ago. This file is the 1st reformat using antlr-format, which was part of the PR to reformat all the grammars.
- Version f083ee2. This file is the 2nd reformat using antlr-format, associated with my PR to perform "auto reformat".
- Version be1d809. This file is the 3rd reformat using antlr-format, committed 12 hours ago, and associated with a PR to modify PlSql.
Each version of the file was altered only by the tool.
Starting with the 1st version of START_CMD:
grammars-v4/sql/plsql/PlSqlLexer.g4
Lines 2457 to 2463 in 7fbb97b
// TODO: should starts with newline
START_CMD
//: 'STA' 'RT'? SPACE ~('\r' | '\n')* NEWLINE_EOF
// https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12002.htm
// https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12003.htm
: '@''@'?
;
After 1st application of antlr-format:
grammars-v4/sql/plsql/PlSqlLexer.g4
Lines 2469 to 2474 in 7535367
// TODO: should starts with newline
START_CMD
//: 'STA' 'RT'? SPACE ~('\r' | '\n')* NEWLINE_EOF
: // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12002.htm
'@' '@'?
; // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12003.htm
The 2nd application of antlr-format (from my PR) reformats it again.
grammars-v4/sql/plsql/PlSqlLexer.g4
Lines 2469 to 2473 in f083ee2
// TODO: should starts with newline
START_CMD
: // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12002.htm
'@' '@'?
; // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12003.htm
Finally, the last PR applies antlr-format a third time, changing the rule again.
grammars-v4/sql/plsql/PlSqlLexer.g4
Lines 2469 to 2472 in be1d809
// TODO: should starts with newline
START_CMD: // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12002.htm
'@' '@'?
; // https://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12003.htm
Analysis
The comment //: 'STA' 'RT'? SPACE ~('\r' | '\n')* NEWLINE_EOF was removed on the 2nd application of antlr-format. The formatter should not remove information.
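A rough way to catch this kind of loss automatically is to compare the set of `//` comments before and after formatting. The sketch below is only a heuristic: the function names `extract_comments` and `comments_preserved` are made up for illustration, only `//` line comments are considered, and a `//` inside a string literal would be misread as a comment.

```shell
# Print every `//` comment in a grammar file, one per line, with trailing
# whitespace stripped and sorted so that the comment's position is ignored.
# Heuristic only: a `//` inside a string literal is treated as a comment too.
extract_comments() {
    grep -o '//.*$' "$1" | sed 's/[[:space:]]*$//' | LC_ALL=C sort
}

# Succeed only if every comment found in file $1 also appears in file $2.
# On failure, print the comments that went missing.
comments_preserved() {
    before_list=$(mktemp) || return 2
    after_list=$(mktemp) || return 2
    extract_comments "$1" > "$before_list"
    extract_comments "$2" > "$after_list"
    # comm -23 keeps the lines that occur only in the first (sorted) list.
    missing=$(LC_ALL=C comm -23 "$before_list" "$after_list")
    rm -f "$before_list" "$after_list"
    if [ -n "$missing" ]; then
        printf 'Comments lost:\n%s\n' "$missing"
        return 1
    fi
    return 0
}
```

Called as `comments_preserved PlSqlLexer.g4.before PlSqlLexer.g4.after`, it would flag the START_CMD case above, since the commented-out alternative is present in one file and absent in the other.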
Of the 16 grammar files that were reformatted with my PR (f083ee2, https://github.com/antlr/grammars-v4/actions/runs/7242750577/job/19728583870#step:6:11), most seem to have only minor changes. But the formatter should output a fixed-point version of the grammar on the first try. Otherwise I will have to repeat the application until a fixed point is achieved.
I wrote a script to repeatedly apply antlr-format until a fixed point is achieved.
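The "fixed point on first try" requirement amounts to idempotence: formatting an already-formatted file must change nothing. Here is a minimal sketch of a check for that, which could be run over a list of grammars. `FORMATTER`, `is_idempotent`, and `report_unstable` are names invented for this sketch, and it assumes the formatter rewrites the file it is given in place. It also works on copies in a temporary directory, so it assumes the formatter does not depend on configuration files sitting next to the grammar.

```shell
# Assumption: FORMATTER names a command that reformats its file argument in place.
FORMATTER=${FORMATTER:-antlr-format}

# Succeed if a second formatting pass leaves the once-formatted file unchanged.
is_idempotent() {
    work=$(mktemp -d) || return 2
    cp "$1" "$work/once.g4"
    $FORMATTER "$work/once.g4" > /dev/null 2>&1
    cp "$work/once.g4" "$work/twice.g4"
    $FORMATTER "$work/twice.g4" > /dev/null 2>&1
    if cmp -s "$work/once.g4" "$work/twice.g4"; then
        result=0
    else
        result=1
    fi
    rm -rf "$work"
    return "$result"
}

# Print one line for each grammar where a second pass still changes something.
report_unstable() {
    for grammar in "$@"; do
        is_idempotent "$grammar" || echo "not idempotent: $grammar"
    done
}
```

For example, `report_unstable sql/plsql/*.g4` would list the files that need more than one pass.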
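#!/usr/bin/env bash
# Apply antlr-format repeatedly until its output stops changing.
rm -rf foo
mkdir foo
pushd foo
cp ../PlSqlLexer.g4.7fbb97b PlSqlLexer.g4
i=1
while :
do
    echo "Iteration $i"
    # Keep the input as .before and format a copy of it as .after.
    cp PlSqlLexer.g4 PlSqlLexer.g4.before
    cp PlSqlLexer.g4 PlSqlLexer.g4.after
    dos2unix *.g4
    # Discard stdout; stderr still goes to the terminal.
    antlr-format PlSqlLexer.g4.after 2>&1 1> /dev/null
    dos2unix *.g4
    # diff exits with 0 only when the two files are identical.
    if diff PlSqlLexer.g4.before PlSqlLexer.g4.after
    then
        echo "Fixed point achieved."
        break
    fi
    echo "No fixed point yet."
    # Feed the formatted output back in as the next input.
    cp PlSqlLexer.g4.after PlSqlLexer.g4
    i=$((i + 1))
done
popd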
PlSqlLexer.g4 does not reach a fixed point until antlr-format has been applied 5 times.
out.txt
What is more troubling is the possibility that some grammar (or many) has no fixed point at all. In that case the tool would produce a new version on every run, ad infinitum.
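One way to guard against that case is to cap the number of iterations and remember a checksum of every version seen so far. If a checksum recurs without being the immediately preceding one, the formatter is cycling and will never settle. The following is only a sketch under assumptions: `FORMATTER`, `MAX_ITERATIONS`, and `find_fixed_point` are hypothetical names, the formatter is assumed to rewrite its argument in place, and `cksum` collisions are ignored.

```shell
# Assumptions: FORMATTER reformats its file argument in place, and a cksum
# match is treated as "same content" (collisions are ignored in this sketch).
FORMATTER=${FORMATTER:-antlr-format}
MAX_ITERATIONS=${MAX_ITERATIONS:-20}

# Repeatedly format file $1 in place.
# Returns 0 on a fixed point, 1 on a cycle, 2 if the iteration cap is hit.
find_fixed_point() {
    file=$1
    seen=$(mktemp) || return 2
    prev=$(cksum < "$file")
    echo "$prev" > "$seen"
    i=1
    while [ "$i" -le "$MAX_ITERATIONS" ]; do
        $FORMATTER "$file" > /dev/null 2>&1
        cur=$(cksum < "$file")
        if [ "$cur" = "$prev" ]; then
            # This pass changed nothing: the previous version was a fixed point.
            echo "fixed point after $((i - 1)) change(s)"
            rm -f "$seen"
            return 0
        fi
        if grep -qxF "$cur" "$seen"; then
            # An older version has come back: the formatter is oscillating.
            echo "cycle detected at iteration $i"
            rm -f "$seen"
            return 1
        fi
        echo "$cur" >> "$seen"
        prev=$cur
        i=$((i + 1))
    done
    echo "no fixed point within $MAX_ITERATIONS iterations"
    rm -f "$seen"
    return 2
}
```

A cycle is reported as soon as any earlier version reappears, so a grammar that oscillates between two or three formats is caught after a handful of passes rather than looping forever.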