regex.zig

regex.zig provides a native regular expression engine for Zig. It guarantees worst-case O(m * n) search time, where m is proportional to the size of the regex and n is proportional to the size of the input being searched. Certain Perl/PCRE features are omitted, most notably backreferences and arbitrary lookahead or lookbehind assertions.
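
As a rough illustration of the worst-case bound, the sketch below runs a pattern that is known to cause exponential backtracking in backtracking-based engines. It uses only Regex.compile and match() as described in this README. The pattern and haystack length are illustrative choices, and the program does not measure time.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Nested repetition: a classic trigger for catastrophic backtracking
    // in engines that explore alternatives one path at a time.
    var re = try Regex.compile(gpa, "^(a+)+$", .{});
    defer re.deinit();

    // 64 'a' bytes followed by a byte that prevents an overall match.
    const haystack = ("a" ** 64) ++ "!";

    // The trailing '!' means no match is possible; the point is that the
    // search is expected to return promptly rather than hang.
    std.debug.print("match? {}\n", .{re.match(haystack)});
}
```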

Status

This project is pre-1.0. The library is already usable, but syntax coverage, compile flags, API ergonomics, and performance features are still evolving quickly.

Why

Zig does not yet have an established native regex engine. This is also a way for me to learn and test Zig's design and philosophy on a more serious project.

Features

The current implementation includes:

a Pike VM execution engine
literals, concatenation, alternation
capturing groups, non-capturing groups, and named captures
repetition operators (?, *, +, {m}, {m,}, {m,n}) including lazy forms
Perl classes (\d, \w, \s) and bracket classes (including POSIX classes)
ASCII escapes including C-style escapes and \xNN
assertions and boundaries (^, $, \A, \z, \b, \B)
inline flags:
- global flags (?imsUu)
- scoped flags (?i:...)
- flag toggles (?i-m:...)
compile options via Regex.compile(..., .{ .syntax = ... }) for default syntax flags:
- case-insensitive (i)
- multi-line (m)
- dot-matches-new-line (s)
- swap-greed (U)
- Unicode scalar mode (u)
leftmost-first search semantics

The engine is byte-oriented, with partial Unicode support. See docs/unicode.md for the current Unicode behavior and docs/supported-syntax.md for the supported syntax.

Using the Package

Fetch the dependency:

zig fetch --save git+https://github.com/quangd42/regex.zig.git

Wire it into your build.zig:

const regex_dep = b.dependency("regex", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("regex", regex_dep.module("regex"));

The package name is regex, as declared in build.zig.zon.
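
For context, a complete build.zig around that snippet might look like the sketch below. It assumes a recent Zig release (0.14 or later, where executables are created from a root module). The executable name "app" and the path src/main.zig are placeholders, not something this project prescribes.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // Hypothetical application; adjust the name and source path to your project.
    const exe = b.addExecutable(.{
        .name = "app",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // Resolve the dependency saved by `zig fetch --save` under the name "regex".
    const regex_dep = b.dependency("regex", .{
        .target = target,
        .optimize = optimize,
    });

    // Make `@import("regex")` available to the application's root module.
    exe.root_module.addImport("regex", regex_dep.module("regex"));

    b.installArtifact(exe);
}
```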

Choosing the Query API

The three main search APIs do different amounts of work:

match() answers only "does this regex match anywhere?" and is the cheapest query.
find() returns the start/end of the leftmost match and tracks only group 0.
findCaptures() returns subgroup locations and does the full capture-slot work.

If you only need a boolean, prefer match(). If you only need the match span, prefer find(). Use findCaptures() only when you actually need subgroup locations.

findCaptures() returns capture data from the most recent search on a Regex. That capture data becomes invalid after the next search on the same Regex. If you need it to survive later searches, copy it into your own storage with Captures.copy(dest), as shown in the Examples section.

Use findAll() to iterate over the successive non-overlapping match spans you would get from repeated find(). findAllCaptures() works similarly, but each match contains full capture data, which becomes invalid after each iteration.
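
One way to keep capture data from several iterations is to copy each result into its own row of caller-owned storage. The sketch below assumes that Captures.copy(dest) returns a slice-like view that is indexed by group number, as the copy example in this README suggests, and that a destination with at least one slot per group is large enough. The row count of 4 is an arbitrary cap chosen for the illustration.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const haystack = "x=12 y=34";

    var re = try Regex.compile(gpa, "(\\w+)=(\\d+)", .{});
    defer re.deinit();

    // One row per match we intend to keep. Each row needs a slot for
    // group 0 plus the two capture groups; one spare slot is left over.
    var storage: [4][4]?Regex.Match = undefined;
    var rows: [4][]const ?Regex.Match = undefined;
    var kept: usize = 0;

    var iter = re.findAllCaptures(haystack);
    while (iter.next()) |caps| {
        if (kept == storage.len) break; // out of rows, stop collecting
        // Copy before the next `next()` call invalidates `caps`.
        rows[kept] = caps.copy(&storage[kept]);
        kept += 1;
    }

    // The copies remain usable after the iterator has advanced.
    for (rows[0..kept]) |row| {
        std.debug.print("{s} -> {s}\n", .{
            row[1].?.bytes(haystack),
            row[2].?.bytes(haystack),
        });
    }
}
```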

If you need to configure bounds or anchoring, use the corresponding *In API with Regex.Input:

const anchored = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
const anchored_found = re.findIn(anchored);
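
To make the effect of anchoring more concrete, the following sketch compares an unanchored search with two anchored ones. It assumes that .anchored = true requires the match to begin exactly at .start, which is the usual meaning of an anchored search; check the library source if you depend on the details. The haystack and pattern are made up for the illustration.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    const haystack = "id=42";

    var re = try Regex.compile(gpa, "\\d+", .{});
    defer re.deinit();

    // Unanchored: the search may scan forward until it reaches the digits.
    std.debug.print("unanchored found: {}\n", .{re.find(haystack) != null});

    // Anchored at offset 0: a match would have to start at 'i'.
    const at_zero = Regex.Input.init(haystack, .{ .start = 0, .anchored = true });
    std.debug.print("anchored at 0 found: {}\n", .{re.findIn(at_zero) != null});

    // Anchored at offset 3: the digits begin exactly here.
    const at_three = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
    if (re.findIn(at_three)) |m| {
        std.debug.print("anchored at 3: [{}, {})\n", .{ m.start, m.end });
    }
}
```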

For unanchored searches, the engine also uses a small literal-prefix fast path when the pattern begins with a required literal byte.

Examples

For a runnable demo, see src/main.zig.

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d)", .{});
    defer re.deinit();

    const haystack = "date=03/18/2026";

    // Is there a match?
    std.debug.print("match? {}\n", .{re.match(haystack)});

    // Where is the match?
    if (re.find(haystack)) |m| {
        std.debug.print("match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match in the given search window?
    if (re.findIn(.init(haystack, .{ .start = 5, .end = 15 }))) |m| {
        std.debug.print("bounded match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match and where are its capture groups?
    if (re.findCaptures(haystack)) |caps| {
        std.debug.print("month: {s}\n", .{caps.get(2).?.bytes(haystack)});
        std.debug.print("year: {s}\n", .{caps.get(3).?.bytes(haystack)});

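        // Copy the capture data into caller-owned storage before searching again.
        var buf: [8]?Regex.Match = undefined;
        const stored = caps.copy(&buf);

        // This search invalidates `caps`, but the copy in `stored` stays usable.
        _ = re.find("something else");
        std.debug.print("copied full match: {s}\n", .{stored[0].?.bytes(haystack)});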
    }
}

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;
    // Iterate over all non-overlapping matches with `findAll()`:
    {
        const haystack = "Hello World, Alice and Bob";

        var re = try Regex.compile(gpa, "[A-Z][a-z]+", .{});
        defer re.deinit();

        var iter = re.findAll(haystack);
        while (iter.next()) |m| {
            std.debug.print("{s} [{}, {})\n", .{ m.bytes(haystack), m.start, m.end });
        }
    }
    // Iterate over all matches with capture groups using `findAllCaptures()`:
    // Each `Captures` yielded by `findAllCaptures()` is invalidated by the next
    // `next()` call on that iterator. Use `Captures.copy(dest)` if you need to keep
    // capture data after advancing.
    {
        const haystack = "x=12 y=34";

        var re = try Regex.compile(gpa, "(?<key>\\w+)=(?<value>\\d+)", .{});
        defer re.deinit();

        var iter = re.findAllCaptures(haystack);
        while (iter.next()) |caps| {
            std.debug.print("{s} -> {s}\n", .{
                caps.name("key").?.bytes(haystack),
                caps.name("value").?.bytes(haystack),
            });
        }
    }
}

Compile options let you set default syntax flags up front. Inline flags inside the pattern can still override them:

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "abc", .{
        .syntax = .{ .case_insensitive = true },
    });
    defer re.deinit();

    std.debug.assert(re.match("ABC"));
}
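
Case-insensitivity can also be limited to part of a pattern with a scoped inline flag instead of a compile option. The sketch below uses the (?i:...) form listed under Features; the pattern and inputs are illustrative.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Only the `abc` part is case-insensitive; `def` must match exactly.
    var re = try Regex.compile(gpa, "(?i:abc)def", .{});
    defer re.deinit();

    std.debug.print("ABCdef: {}\n", .{re.match("ABCdef")});
    std.debug.print("ABCDEF: {}\n", .{re.match("ABCDEF")});
}
```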

Documentation

See:

docs/optimizations.md for current compile-time and runtime optimizations
docs/supported-syntax.md for the syntax support entrypoint
docs/testing.md for the corpus/testing setup
docs/unicode.md for Unicode mode behavior and current limitations

License

This project is available under either of:

Apache License, Version 2.0, in LICENSE-APACHE
MIT license, in LICENSE-MIT

You may choose either license.
