cparse is an LR(1) and LALR(1) parser generator for C. It consumes a grammar written in an intuitive textual format and, together with the clex lexer generator, lets you build parsers that can both validate and produce parse trees for input programs.
- Modernised LR(1) and LALR(1) construction written in portable C11.
- Predictable performance via dynamic data structures instead of fixed-size arrays.
- First/Follow set computation with useful diagnostics for malformed grammars.
- Straightforward runtime API (cparseAccept,cparse) that produces parse trees.
- Ships with a tiny test suite (make test) and a static library build (libcparse.a).
git clone https://github.com/h2337/cparse.git
cd cparse
git submodule update --init --recursive
make test   # builds libcparse.a, the test binary, and runs the testsThis project depends only on a C11 compiler. The bundled clex submodule is used for lexical analysis in the examples and tests.
#include "cparse.h"
#include "clex/clex.h"
// Token kinds supplied to clex
enum {
  TOK_RETURN,
  TOK_IDENTIFIER,
  TOK_SEMICOLON,
};
static const char *token_names[] = {
  [TOK_RETURN] = "RETURN",
  [TOK_IDENTIFIER] = "IDENTIFIER",
  [TOK_SEMICOLON] = "SEMICOL",
};
int main(void) {
  clexLexer *lexer = clexInit();
  clexRegisterKind(lexer, "return", TOK_RETURN);
  clexRegisterKind(lexer, "[a-zA-Z_]([a-zA-Z_]|[0-9])*", TOK_IDENTIFIER);
  clexRegisterKind(lexer, ";", TOK_SEMICOLON);
  const char *grammar_src =
      "S -> A IDENTIFIER SEMICOL\n"
      "A -> RETURN";
  Grammar *grammar = cparseGrammar(grammar_src);
  LALR1Parser *parser = cparseCreateLALR1Parser(grammar, lexer, token_names);
  if (cparseAccept(parser, "return answer;")) {
    ParseTreeNode *root = cparse(parser, "return answer;");
    /* ... consume parse tree ... */
    cparseFreeParseTree(root);
  }
  cparseFreeParser(parser);
  cparseFreeGrammar(grammar);
  clexLexerDestroy(lexer);
  return 0;
}| Function | Description | 
|---|---|
| Grammar *cparseGrammar(const char *grammar_source) | Parse a grammar description into an internal representation. | 
| LR1Parser *cparseCreateLR1Parser(Grammar *, clexLexer *, const char *const *token_names) | Build an LR(1) parser (the lexer remains owned by the caller). | 
| LALR1Parser *cparseCreateLALR1Parser(Grammar *, clexLexer *, const char *const *token_names) | Build an LALR(1) parser by merging LR(1) states (caller retains lexer ownership). | 
| bool cparseAccept(LR1Parser *, const char *input) | Check whether the input belongs to the grammar. | 
| ParseTreeNode *cparse(LR1Parser *, const char *input) | Parse input and return the parse tree (or NULLon failure). | 
| void cparseFreeParseTree(ParseTreeNode *) | Recursively release a parse tree allocated by cparse. | 
| void cparseFreeParser(LR1Parser *) | Release parser state (works for LALR parsers as well). | 
| void cparseFreeGrammar(Grammar *) | Release grammar data structures. | 
A parse tree node stores the grammar symbol in value, the matched token (for terminals) in token, and its children in children.
- makeor- make testsbuilds- libcparse.a, the supporting objects, and the- testsbinary.
- make testruns the regression tests.
- make examplesbuilds the sample programs under- examples/.
- make cleanremoves build artefacts.
You can link libcparse.a into your own project together with clex/clex.o and clex/fa.o, or embed the sources directly.
The examples/ directory contains small, self-contained programs that exercise the library. After running make examples you can try the expression parser:
./examples/expr_parser "8 + 5 * 2"The example registers a handful of tokens, builds an LALR(1) grammar for arithmetic expressions, validates the input, and prints the resulting parse tree.
Each line describes a production:
NonTerminal -> alternative1 | alternative2 | ...
- Tokens are whitespace separated.
- Use epsilonto denote an empty production.
- Lines beginning with #are treated as comments.
For example:
S -> A IDENTIFIER SEMICOL
A -> RETURN | epsilon
- The implementation avoids fixed limits and lazy strtok()parsing, making it suitable for larger grammars.
- First and Follow sets are computed iteratively; unexpected productions emit diagnostics on stderr.
- The repository includes a tiny test harness in tests.c. Extend it with grammar-specific checks as needed.
- Parsers borrow the clexLexeryou pass; create it up front, register token kinds, and destroy it once you are done with parsing.
Distributed under the terms of the MIT License. See LICENSE.