The ooSimple-Transpiler is an educational project that takes a custom object-oriented programming language
(ooSimple
) and transpiles it into equivalent C code.
It is implemented using the ANTLR4 framework and demonstrates the key stages of language processing:
parsing, semantic analysis with symbol tables, and structured code generation.
Unlike a traditional compiler that outputs machine code, this project focuses on source-to-source translation. The transpiler allows developers to write in an object-oriented style and obtain readable, valid C programs, bridging high-level abstractions with a lower-level language.
Key highlights of this project include:
- Custom Object-Oriented Language: The
ooSimple
language supports classes, methods, inheritance, variables, and control flow constructs. - ANTLR4 Integration: Grammar-driven parsing using ANTLR4.
- Semantic Validation: Symbol table construction and type checking to ensure correctness.
- Readable C Code Output: Generates understandable C code, making the project ideal for learning both compiler principles and language translation.
ANTLR4 (Another Tool for Language Recognition) is a parser generator used to build compilers, interpreters, and translators for custom languages. In this project, it is used to define and process the ooSimple
language. Key aspects include:
-
Grammar Definition: The user defines the grammar of the target language in a
.g4
file. This grammar specifies the rules for tokenizing and parsing the input.- Example:
ooSimpleGrammar.g4
describes the structure of theooSimple
language. - Grammar rules are categorized into lexer rules (for tokens) and parser rules (for syntax structure).
- Example:
-
Lexer Generation: ANTLR4 generates a lexer from the grammar file. The lexer scans the input code and breaks it into tokens based on the rules defined in the grammar.
- Tokens are the smallest meaningful units like keywords (
class
), identifiers (myVar
), or symbols ({
,}
). - The lexer uses regular expressions defined in the grammar to recognize these tokens.
- Tokens are the smallest meaningful units like keywords (
-
Parser Generation: ANTLR4 generates a parser that organizes the tokens into a structured hierarchy according to the grammar rules.
- The parser checks the syntax of the token stream, ensuring it follows the defined grammar.
- Errors in syntax are reported during this stage.
- The output of this step is the parse tree, which represents the logical structure of the input.
-
Parse Tree Construction: The parse tree is a hierarchical representation of the input, where each node corresponds to a rule or token.
- Root Node: Represents the starting rule defined in the grammar.
- Child Nodes: Represent the components or subrules of the parent node.
- Leaf Nodes: Represent individual tokens (e.g., keywords or literals).
-
Listener or Visitor Execution: Custom logic can be implemented by traversing the parse tree using Listeners or Visitors to perform tasks like semantic analysis or code generation.
- Visitors provide a more manual approach, offering greater control over tree traversal.
The parse tree is central to how ANTLR4 processes input. Here's how it works:
- Creation: ANTLR4 builds the parse tree as the parser processes the input code. Each node in the tree corresponds to a specific grammar rule or token.
- Structure: The root node represents the starting grammar rule, and child nodes represent the components of that rule. Leaf nodes are individual tokens.
- Traversal: The parse tree can be traversed automatically using Listeners or manually using Visitors.
- Purpose: The parse tree serves as a blueprint for further processing, like validating semantics or generating code.
Listeners are a mechanism provided by ANTLR4 to traverse the parse tree. They automatically respond to events triggered during tree traversal, such as entering or exiting a node.
- Built-In Methods:
enterRule
: Called when the parser enters a specific grammar rule.exitRule
: Called when the parser exits that rule.
- Custom Implementation:
- Users extend the base listener class to add logic for handling specific nodes in the tree.
- Example: In this project, the
SymbolicTablePhase
listener constructs and validates the symbol table.
Listeners interact directly with the parse tree to process the input. Here's how:
- Invocation: As the parse tree is traversed, ANTLR4 automatically invokes the appropriate listener methods (e.g.,
enterRule
andexitRule
) for each node. - Custom Logic Execution: Listeners implement custom actions for each relevant node, such as:
- Building a symbol table (e.g.,
SymbolicTablePhase
in this project). - Generating target code (e.g.,
CodeGenerator
).
- Building a symbol table (e.g.,
- Event-Driven Processing: The listener pattern allows event-driven processing, enabling modular and efficient handling of grammar rules.
- Efficiency: Listeners reduce the complexity of manual tree traversal by focusing on specific nodes, allowing modular and organized logic.
- Integration: Each listener method corresponds to a specific rule in the grammar, ensuring a tight connection between the grammar's design and the tree's processing.
By combining the parse tree's hierarchical structure with listeners' event-driven approach, ANTLR4 provides a powerful and flexible framework for language processing.
The ooSimple
programming language is a custom object-oriented language designed with the following key features:
- Classes are declared using the keyword
class
, followed by the class name. - Multiple Inheritance: Supported using the keyword
inherits
. A class may inherit from one or more parent classes, allowing complex hierarchies. - Example:
class MyClass inherits ParentClass1, ParentClass2: ...
- Methods are defined using the keyword
def
, followed by the method name and parameters. - Methods can return values or modify the object's state.
- The
self
keyword is used as the first parameter to refer to the current instance. - The return type is specified after a colon
:
. If the method does not return a value, a dash (-
) is used to indicatevoid
. - Example:
def calculate(self, value): int return value * 2;
- Constructors are special methods named
__init__
and are used to initialize object properties. - Example:
def __init__ClassName(self, name): int self.name = name; ;
- A special
main
class can be defined to contain the entry point of the program. - Example:
class Main: def main(self): print "Hello, World!"; ;
- Variables are declared with their types, such as
int
orclass
types. - Multiple variables can be declared in a single line.
- Example:
int a, b, c; int x; int z; ;
- Standard control flow constructs are supported, such as
if-else
,while
, andreturn
. - Example of
if-else
:if (x > 0): print "Positive"; else: print "Non-positive"; endif;
- Example of
while
:while (x > 0): x = x - 1; print x; endwhile;
The ooSimpleGrammar.g4 file describe fully the language structure.
The ooSimple-Transpiler compiler processes the input ooSimple
code in a systematic and multi-step approach to produce equivalent C code. The process includes the following stages:
-
Parse Tree Generation:
- The compiler begins by parsing the
ooSimple
input code using the ANTLR4 framework. - A hierarchical parse tree is created based on the grammar rules defined in
ooSimpleGrammar.g4
. - This tree represents the syntactic structure of the input program, with nodes corresponding to grammar rules or tokens.
- The compiler begins by parsing the
-
First Tree Traversal: Symbol Table Creation:
- During the first traversal of the parse tree, a symbol table is constructed.
- This table stores metadata for program entities, including:
- Classes: Their names, inheritance relationships, and members.
- Methods: Their names, parameter types, return types, and visibility.
- Variables: Their names, types, and scopes.
- Scoping rules are enforced to manage nested blocks and inheritance.
- Semantic checks, such as duplicate declarations or undefined parent classes, are performed.
-
Second Tree Traversal: Code Generation:
- The parse tree is traversed a second time to produce equivalent C code.
- The traversal involves:
- Statement Translation: High-level constructs like
if
,while
, andreturn
are converted to C equivalents. - Variable Resolution: Ensuring all variables and methods are declared in the symbol table before use.
- Output Construction: Sequentially generating lines of valid C code to replicate the behavior of the input program.
- Statement Translation: High-level constructs like
- Type consistency and scoping rules are revalidated to ensure correctness.
-
Error Handling:
- Syntax errors are reported during the parsing stage.
- Semantic errors, such as undeclared variables or type mismatches, are reported during the first and second traversals.
This flow ensures a clear separation of concerns, making the compiler modular, extensible, and easy to debug.
The ooSimple_Compiler runs directly using the run_antlr.sh script, which automates the compilation of ooSimple files into C. While IntelliJ IDEA was used during development, it is not a requirement for running the project.
-
Script Overview:
- The project includes a script named
run_antlr.sh
located in the root directory. - This script automates the compilation process for the
ooSimple
source code.
- The project includes a script named
-
Usage Instructions:
- Open a terminal and navigate to the root directory of the project.
- Run the script using:
./run_antlr.sh
- Ensure the script has executable permissions. If not, make it executable:
chmod +x run_antlr.sh
-
Input Path:
- The script allows you to specify the path of the input
ooSimple
file. Update the input path directly within the script or modify it to accept arguments dynamically.
- The script allows you to specify the path of the input
-
Output Files:
- The generated C files are stored in the
compiler_output
directory within the project. - Ensure this directory exists and has write permissions.
- The generated C files are stored in the
-
Predefined Input Files:
- The project includes a directory named
input_code_files
that contains ready-to-useoos
files for testing.- Subdirectories:
happy_day_scenario
: Contains validoos
files that compile without errors. Examples:Factor.oos
MathOperations.oos
Vehicle.oos
InvalidSyntaxTests
: Containsoss
files designed to trigger specific syntax and semantic errors, testing the compiler's robustness in error handling.undeclaredLocalVariable.oos
assigmentDataTypesConflict.oos
methodWrongParameterType.oos
- Subdirectories:
- The project includes a directory named
- The script leverages the ANTLR4 runtime and ensures all dependencies are configured correctly.
- It compiles the grammar, processes the input file, and generates C code in one step.
GitHub: xrddev
Released under the MIT License. Originally built as part of a university project.