
regex.zig

License: MIT OR Apache-2.0

regex.zig provides a native regular expression engine for Zig. It guarantees worst-case O(m * n) search time, where m is proportional to the size of the regex and n is proportional to the size of the input being searched. Certain Perl/PCRE features are omitted, most notably backreferences and arbitrary lookahead or lookbehind assertions.

Status

This project is pre-1.0. The library is already usable, but syntax coverage, compile flags, API ergonomics, and performance features are still evolving quickly.

Why

Zig does not yet have an established native regex engine. This is also a way for me to learn and test Zig's design and philosophy on a more serious project.

Features

The current implementation includes:

  • a Pike VM execution engine
  • literals, concatenation, alternation
  • capturing groups, non-capturing groups, and named captures
  • repetition operators (?, *, +, {m}, {m,}, {m,n}) including lazy forms
  • Perl classes (\d, \w, \s) and bracket classes (including POSIX classes)
  • ASCII escapes including C-style escapes and \xNN
  • assertions and boundaries (^, $, \A, \z, \b, \B)
  • inline flags:
    • global flags (?imsUu)
    • scoped flags (?i:...)
    • flag toggles (?i-m:...)
  • compile options via Regex.compile(..., .{ .syntax = ... }) for default syntax flags:
    • case-insensitive (i)
    • multi-line (m)
    • dot-matches-new-line (s)
    • swap-greed (U)
    • Unicode scalar mode (u)
  • leftmost-first search semantics

The engine is byte-oriented and has partial Unicode support. See docs/unicode.md for the current Unicode behavior and docs/supported-syntax.md for the supported syntax.
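As a rough illustration of the inline-flag and lazy-repetition items above, the sketch below uses only the compile(), match(), and find() calls that appear elsewhere in this README. The patterns and haystacks are made up for illustration, and the comments describe the usual semantics of these constructs rather than output verified against the current build.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Global flag: (?i) applies case-insensitivity to the whole pattern.
    {
        var re = try Regex.compile(gpa, "(?i)abc", .{});
        defer re.deinit();
        std.debug.print("(?i)abc on ABC: {}\n", .{re.match("ABC")});
    }

    // Scoped flag: only the group is case-insensitive, so " world" must
    // still appear in lower case.
    {
        var re = try Regex.compile(gpa, "(?i:hello) world", .{});
        defer re.deinit();
        std.debug.print("HELLO world: {}\n", .{re.match("HELLO world")});
        std.debug.print("HELLO WORLD: {}\n", .{re.match("HELLO WORLD")});
    }

    // Lazy repetition: a+? prefers the shortest run of 'a' that still matches.
    {
        var re = try Regex.compile(gpa, "a+?", .{});
        defer re.deinit();
        if (re.find("aaa")) |m| {
            std.debug.print("a+? matched [{}, {})\n", .{ m.start, m.end });
        }
    }
}
```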

Using the Package

Fetch the dependency:

zig fetch --save git+https://github.com/quangd42/regex.zig.git

Wire it into your build.zig:

const regex_dep = b.dependency("regex", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("regex", regex_dep.module("regex"));

The package name is regex, as declared in build.zig.zon.
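For context, a complete build.zig around that snippet might look like the sketch below. It assumes Zig 0.14 or newer (where addExecutable accepts a root_module), and the executable name "app" and the path src/main.zig are placeholders, not anything this package requires.

```zig
// build.zig: a minimal sketch. "app" and "src/main.zig" are placeholder names.
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "app",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // Resolve the dependency saved by `zig fetch --save` and expose its
    // "regex" module to the executable under the import name "regex".
    const regex_dep = b.dependency("regex", .{
        .target = target,
        .optimize = optimize,
    });
    exe.root_module.addImport("regex", regex_dep.module("regex"));

    b.installArtifact(exe);
}
```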

Choosing the Query API

The three main search APIs do different amounts of work:

  • match() answers only "does this regex match anywhere?" and is the cheapest query.
  • find() returns the start/end of the leftmost match and tracks only group 0.
  • findCaptures() returns subgroup locations and does the full capture-slot work.

If you only need a boolean, prefer match(). If you only need the match span, prefer find(). Use findCaptures() only when you actually need subgroup locations.

findCaptures() returns capture data from the most recent search on a Regex, and that data becomes invalid after the next search on the same Regex. To keep it across later searches, copy it into your own storage with Captures.copy(dest), as shown in Examples.

Use findAll() to iterate over the successive non-overlapping match spans you would get from repeated find(). findAllCaptures() works similarly, but each match contains full capture data, which becomes invalid after each iteration.

If you need to configure bounds or anchoring, use the corresponding *In API with Regex.Input:

const anchored = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
const anchored_found = re.findIn(anchored);
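A fuller sketch of the same idea follows. It assumes that .anchored = true requires the match to begin exactly at the window's start offset, which is the usual meaning of an anchored search but is not spelled out in this README; the pattern and haystack are illustrative only.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "\\d+", .{});
    defer re.deinit();

    const haystack = "id=42;n=7";

    // Unanchored: the engine may begin a match at any offset.
    if (re.find(haystack)) |m| {
        std.debug.print("find: [{}, {})\n", .{ m.start, m.end });
    }

    // Anchored at offset 3, where the digits "42" begin.
    const at_digits = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
    if (re.findIn(at_digits)) |m| {
        std.debug.print("anchored at 3: [{}, {})\n", .{ m.start, m.end });
    }

    // Anchored at offset 0, where the haystack begins with 'i', not a digit.
    // Under the assumption above, this search should report no match.
    const at_zero = Regex.Input.init(haystack, .{ .start = 0, .anchored = true });
    if (re.findIn(at_zero) == null) {
        std.debug.print("anchored at 0: no match\n", .{});
    }
}
```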

For unanchored searches, the engine also uses a small literal-prefix fast path when the pattern begins with a required literal byte.
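The idea behind such a fast path can be sketched independently of the library: when every match must begin with a known byte, a plain byte scan can skip offsets that cannot start a match, and the full engine only needs to run from the surviving candidates. The helper below is hypothetical and uses only std.mem; it is not the library's implementation.

```zig
const std = @import("std");

/// Hypothetical helper: returns the next offset at or after `from` where a
/// match could begin, given that every match starts with `first_byte`.
fn nextCandidate(haystack: []const u8, from: usize, first_byte: u8) ?usize {
    if (from > haystack.len) return null;
    return std.mem.indexOfScalarPos(u8, haystack, from, first_byte);
}

pub fn main() void {
    const haystack = "xxxx date=03 xx date=18";
    var pos: usize = 0;

    // For a pattern that begins with the literal "date=", 'd' is the
    // required first byte, so only offsets holding 'd' are worth trying.
    while (nextCandidate(haystack, pos, 'd')) |i| {
        std.debug.print("candidate start at {}\n", .{i});
        pos = i + 1; // a real engine would run the VM from `i` before moving on
    }
}
```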

Examples

For a runnable demo, see src/main.zig.

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d)", .{});
    defer re.deinit();

    const haystack = "date=03/18/2026";

    // Is there a match?
    std.debug.print("match? {}\n", .{re.match(haystack)});

    // Where is the match?
    if (re.find(haystack)) |m| {
        std.debug.print("match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match in the given search window?
    if (re.findIn(.init(haystack, .{ .start = 5, .end = 15 }))) |m| {
        std.debug.print("bounded match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match and where are its capture groups?
    if (re.findCaptures(haystack)) |caps| {
        // Group 1 is the month in this MM/DD/YYYY pattern; group 2 is the day.
        std.debug.print("month: {s}\n", .{caps.get(1).?.bytes(haystack)});
        std.debug.print("year: {s}\n", .{caps.get(3).?.bytes(haystack)});

        var buf: [8]?Regex.Match = undefined;
        const stored = caps.copy(&buf);

        _ = re.find("something else");
        std.debug.print("copied full match: {s}\n", .{stored[0].?.bytes(haystack)});
    }
}

To iterate over every match, use findAll() or findAllCaptures():

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    // Iterate over all non-overlapping matches with `findAll()`:
    {
        const haystack = "Hello World, Alice and Bob";

        var re = try Regex.compile(gpa, "[A-Z][a-z]+", .{});
        defer re.deinit();

        var iter = re.findAll(haystack);
        while (iter.next()) |m| {
            std.debug.print("{s} [{}, {})\n", .{ m.bytes(haystack), m.start, m.end });
        }
    }
    // Iterate over all matches with capture groups using `findAllCaptures()`:
    // Each `Captures` yielded by `findAllCaptures()` is invalidated by the next
    // `next()` call on that iterator. Use `Captures.copy(dest)` if you need to keep
    // capture data after advancing.
    {
        const haystack = "x=12 y=34";

        var re = try Regex.compile(gpa, "(?<key>\\w+)=(?<value>\\d+)", .{});
        defer re.deinit();

        var iter = re.findAllCaptures(haystack);
        while (iter.next()) |caps| {
            std.debug.print("{s} -> {s}\n", .{
                caps.name("key").?.bytes(haystack),
                caps.name("value").?.bytes(haystack),
            });
        }
    }
}

Compile options let you set default syntax flags up front. Inline flags inside the pattern can still override them:

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "abc", .{
        .syntax = .{ .case_insensitive = true },
    });
    defer re.deinit();

    std.debug.assert(re.match("ABC"));
}
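To illustrate the override direction, the sketch below compiles with case-insensitivity on by default and then switches it off for one group. It assumes the negation-only scoped form (?-i:...) is accepted, by analogy with the (?i-m:...) toggles listed under Features; check docs/supported-syntax.md before relying on it.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Default: case-insensitive. The scoped group turns that off for "abc"
    // only, so "def" stays case-insensitive while "abc" must match exactly.
    var re = try Regex.compile(gpa, "(?-i:abc)def", .{
        .syntax = .{ .case_insensitive = true },
    });
    defer re.deinit();

    std.debug.print("abcDEF: {}\n", .{re.match("abcDEF")});
    std.debug.print("ABCdef: {}\n", .{re.match("ABCdef")});
}
```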

Documentation

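See:

  • docs/supported-syntax.md for the supported syntax
  • docs/unicode.md for the current Unicode behavior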

License

This project is available under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

You may choose either license.
