cparse

cparse is an LR(1) and LALR(1) parser generator for C. It consumes a grammar written in an intuitive textual format and, together with the clex lexer generator, lets you build parsers that can both validate and produce parse trees for input programs.

Highlights

Modernised LR(1) and LALR(1) construction written in portable C11.
Predictable performance via dynamic data structures instead of fixed-size arrays.
First/Follow set computation with useful diagnostics for malformed grammars.
Straightforward runtime API (cparseAccept, cparse) that produces parse trees.
Ships with a tiny test suite (make test) and a static library build (libcparse.a).

Getting Started

git clone https://github.com/h2337/cparse.git
cd cparse
git submodule update --init --recursive
make test   # builds libcparse.a, the test binary, and runs the tests

This project depends only on a C11 compiler. The bundled clex submodule is used for lexical analysis in the examples and tests.

Quick Example

#include "cparse.h"
#include "clex/clex.h"

// Token kinds supplied to clex
enum {
  TOK_RETURN,
  TOK_IDENTIFIER,
  TOK_SEMICOLON,
};

static const char *token_names[] = {
  [TOK_RETURN] = "RETURN",
  [TOK_IDENTIFIER] = "IDENTIFIER",
  [TOK_SEMICOLON] = "SEMICOL",
};

int main(void) {
  clexLexer *lexer = clexInit();
  clexRegisterKind(lexer, "return", TOK_RETURN);
  clexRegisterKind(lexer, "[a-zA-Z_]([a-zA-Z_]|[0-9])*", TOK_IDENTIFIER);
  clexRegisterKind(lexer, ";", TOK_SEMICOLON);

  const char *grammar_src =
      "S -> A IDENTIFIER SEMICOL\n"
      "A -> RETURN";

  Grammar *grammar = cparseGrammar(grammar_src);
  LALR1Parser *parser = cparseCreateLALR1Parser(grammar, lexer, token_names);

  if (cparseAccept(parser, "return answer;")) {
    ParseTreeNode *root = cparse(parser, "return answer;");
    /* ... consume parse tree ... */
    cparseFreeParseTree(root);
  }

  cparseFreeParser(parser);
  cparseFreeGrammar(grammar);
  clexLexerDestroy(lexer);
  return 0;
}

Runtime API

Function	Description
`Grammar cparseGrammar(const char grammar_source)`	Parse a grammar description into an internal representation.
`LR1Parser cparseCreateLR1Parser(Grammar , clexLexer , const char const *token_names)`	Build an LR(1) parser (the lexer remains owned by the caller).
`LALR1Parser cparseCreateLALR1Parser(Grammar , clexLexer , const char const *token_names)`	Build an LALR(1) parser by merging LR(1) states (caller retains lexer ownership).
`bool cparseAccept(LR1Parser , const char input)`	Check whether the input belongs to the grammar.
`ParseTreeNode cparse(LR1Parser , const char *input)`	Parse input and return the parse tree (or `NULL` on failure).
`void cparseFreeParseTree(ParseTreeNode *)`	Recursively release a parse tree allocated by `cparse`.
`void cparseFreeParser(LR1Parser *)`	Release parser state (works for LALR parsers as well).
`void cparseFreeGrammar(Grammar *)`	Release grammar data structures.

A parse tree node stores the grammar symbol in value, the matched token (for terminals) in token, and its children in children.

Building

make or make tests builds libcparse.a, the supporting objects, and the tests binary.
make test runs the regression tests.
make examples builds the sample programs under examples/.
make clean removes build artefacts.

You can link libcparse.a into your own project together with clex/clex.o and clex/fa.o, or embed the sources directly.

Examples

The examples/ directory contains small, self-contained programs that exercise the library. After running make examples you can try the expression parser:

./examples/expr_parser "8 + 5 * 2"

The example registers a handful of tokens, builds an LALR(1) grammar for arithmetic expressions, validates the input, and prints the resulting parse tree.

Grammar Format

Each line describes a production:

NonTerminal -> alternative1 | alternative2 | ...

Tokens are whitespace separated.
Use epsilon to denote an empty production.
Lines beginning with # are treated as comments.

For example:

S -> A IDENTIFIER SEMICOL
A -> RETURN | epsilon

Development Notes

The implementation avoids fixed limits and lazy strtok() parsing, making it suitable for larger grammars.
First and Follow sets are computed iteratively; unexpected productions emit diagnostics on stderr.
The repository includes a tiny test harness in tests.c. Extend it with grammar-specific checks as needed.
Parsers borrow the clexLexer you pass; create it up front, register token kinds, and destroy it once you are done with parsing.

License

Distributed under the terms of the MIT License. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 69 Commits
clex @ dfb0789		clex @ dfb0789
examples		examples
tools		tools
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
cparse.h		cparse.h
format.sh		format.sh
grammar.c		grammar.c
grammar.h		grammar.h
logo.svg		logo.svg
lr1_lalr1.c		lr1_lalr1.c
lr1_lalr1.h		lr1_lalr1.h
tests.c		tests.c
util.c		util.c
util.h		util.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

cparse

Highlights

Getting Started

Quick Example

Runtime API

Building

Examples

Grammar Format

Development Notes

License

About

Uh oh!

Releases

Packages

Languages

License

h2337/cparse

Folders and files

Latest commit

History

Repository files navigation

cparse

Highlights

Getting Started

Quick Example

Runtime API

Building

Examples

Grammar Format

Development Notes

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages