-
I have a project that I'm trying to analyze using GumTree. The project has a file encoded in ISO-8859-1. When I try to run
I realize that the file is probably expected to be encoded in UTF-8, since that is the default encoding on my Linux system. However, this is not the case for the project repository. That's why I wonder whether such encoding issues should be handled in GumTree or outside of it.
Replies: 2 comments
-
The charset can be specified with ReaderConfigurator.charset() (gumtree/core/src/main/java/com/github/gumtreediff/gen/TreeGenerator.java, lines 69 to 72 in 353bfc9). However, it is unused in GumTree itself. It is possible to detect the charset of an input file automatically, but in theory it is impossible to determine the correct charset with 100% certainty (see the libraries and discussion here: https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream).
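To illustrate why detection can only be a heuristic, here is a minimal sketch using only the JDK. It is not GumTree code; the class name CharsetGuess and the method guess are made up for this example. It tries a strict UTF-8 decode and falls back to a caller-supplied charset when the bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of GumTree.
class CharsetGuess {

    /**
     * Returns UTF-8 if the bytes decode strictly as UTF-8,
     * otherwise returns the fallback chosen by the caller.
     */
    static Charset guess(byte[] bytes, Charset fallback) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Report errors instead of silently substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: the fallback is an assumption, not a detection.
            return fallback;
        }
    }
}
```

The check only works in one direction. Bytes that are invalid UTF-8 are certainly not UTF-8, but ISO-8859-1 accepts every byte sequence, so the fallback can never be confirmed, and valid UTF-8 text would also decode "successfully" (with different characters) as ISO-8859-1. Pure ASCII input is reported as UTF-8, which is harmless since both charsets agree on it.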
-
Exactly! It would be doable to supply the charset via the command line, but it would apply to all generators, so it won't be ideal if the repo mixes two charsets. Still, it's better than nothing.
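Until such an option exists, one possible workaround outside GumTree is to transcode each file to UTF-8 before analysis, choosing the source charset per file. The sketch below uses only the JDK; the class name Utf8Normalizer and the method transcode are hypothetical, and the caller is assumed to already know each file's charset.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical pre-processing step for illustration; not part of GumTree.
class Utf8Normalizer {

    /**
     * Reads source using the charset the caller believes it is in
     * and writes a UTF-8 copy to target.
     */
    static void transcode(Path source, Charset sourceCharset, Path target) throws IOException {
        // new String(...) replaces undecodable bytes with U+FFFD rather than failing.
        String text = new String(Files.readAllBytes(source), sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the charset is passed per file, a repository that mixes ISO-8859-1 and UTF-8 files can be normalized into a UTF-8 working copy, at the cost of analyzing copies rather than the original files.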