regex.zig provides a native regular expression engine for Zig. It guarantees
worst-case O(m * n) search time, where m is proportional to the size of the
regex and n is proportional to the size of the input being searched. Certain
Perl/PCRE features are omitted, most notably backreferences and arbitrary
lookahead or lookbehind assertions.
This project is pre-1.0. The library is already usable, but syntax coverage, compile flags, API ergonomics, and performance features are still evolving quickly.
Zig does not yet have an established native regex engine. This is also a way for me to learn and test Zig's design and philosophy on a more serious project.
The current implementation includes:
- a Pike VM execution engine
- literals, concatenation, alternation
- capturing groups, non-capturing groups, and named captures
- repetition operators (?, *, +, {m}, {m,}, {m,n}) including lazy forms
- Perl classes (\d, \w, \s) and bracket classes (including POSIX classes)
- ASCII escapes including C-style escapes and \xNN
- assertions and boundaries (^, $, \A, \z, \b, \B)
- inline flags:
  - global flags (?imsUu)
  - scoped flags (?i:...)
  - flag toggles (?i-m:...)
- compile options via Regex.compile(..., .{ .syntax = ... }) for default syntax flags:
  - case-insensitive (i)
  - multi-line (m)
  - dot-matches-new-line (s)
  - swap-greed (U)
  - Unicode scalar mode (u)
- leftmost-first search semantics
The engine is byte-oriented and has partial Unicode support. See docs/unicode.md for the current Unicode behavior and docs/supported-syntax.md for the supported syntax.
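To make the leftmost-first item in the list above concrete, here is a minimal sketch that uses only calls shown elsewhere in this README (Regex.compile, find, Match.bytes). The pattern and haystack are illustrative choices, not taken from the project's tests. When two alternatives can both match at the same position, leftmost-first semantics should prefer the one written first, so the reported span is expected to cover "a" rather than the longer "ab".

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Both alternatives can match at offset 0; "a" is written first.
    var re = try Regex.compile(gpa, "a|ab", .{});
    defer re.deinit();

    const haystack = "ab";
    if (re.find(haystack)) |m| {
        // Under leftmost-first semantics the earlier alternative wins,
        // even though the later one would produce a longer match.
        std.debug.print("{s} [{}, {})\n", .{ m.bytes(haystack), m.start, m.end });
    }
}
```

An engine with leftmost-longest (POSIX-style) semantics would report the longer match here, so the order of alternatives matters when you write patterns for this library.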
Fetch the dependency:
zig fetch --save git+https://github.com/quangd42/regex.zig.git

Wire it into your build.zig:

const regex_dep = b.dependency("regex", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("regex", regex_dep.module("regex"));

The package name is regex, as declared in build.zig.zon.
The three main search APIs do different amounts of work:
- match() answers only "does this regex match anywhere?" and is the cheapest query.
- find() returns the start/end of the leftmost match and tracks only group 0.
- findCaptures() returns subgroup locations and does the full capture-slot work.
If you only need a boolean, prefer match(). If you only need the match span,
prefer find(). Use findCaptures() only when you actually need subgroup
locations.
findCaptures() returns capture data from the most recent search on a Regex.
That capture data becomes invalid after the next search on the same Regex.
If you need it to survive later searches, copy it into your own storage with
Captures.copy(dest) - see examples.
Use findAll() to iterate over the successive non-overlapping match spans you would get
from repeated find(). findAllCaptures() works similarly, but each match contains full
capture data, which becomes invalid after each iteration.
If you need to configure bounds or anchoring, use the corresponding *In API with
Regex.Input:
const anchored = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
const anchored_found = re.findIn(anchored);

For unanchored searches, the engine also uses a small literal-prefix fast path when the pattern begins with a required literal byte.
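The anchored snippet above can be expanded into a small runnable sketch that contrasts an unanchored search with anchored ones. It uses only the Regex.Input.init and findIn forms shown in this README. The pattern and haystack are illustrative, and the comments assume that `.anchored = true` requires the match to begin exactly at `.start`, which is the usual meaning of an anchored search but is not spelled out here.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "\\d+", .{});
    defer re.deinit();

    const haystack = "ab 123";

    // Unanchored: the search may start the match at any offset.
    if (re.find(haystack)) |m| {
        std.debug.print("unanchored: [{}, {})\n", .{ m.start, m.end });
    } else {
        std.debug.print("unanchored: no match\n", .{});
    }

    // Anchored at offset 0 (assumption: the match must begin at `.start`).
    // The byte at offset 0 is 'a', so no digit run can begin there.
    const at_zero = Regex.Input.init(haystack, .{ .start = 0, .anchored = true });
    if (re.findIn(at_zero)) |m| {
        std.debug.print("anchored at 0: [{}, {})\n", .{ m.start, m.end });
    } else {
        std.debug.print("anchored at 0: no match\n", .{});
    }

    // Anchored at offset 3, where the digits begin.
    const at_three = Regex.Input.init(haystack, .{ .start = 3, .anchored = true });
    if (re.findIn(at_three)) |m| {
        std.debug.print("anchored at 3: [{}, {})\n", .{ m.start, m.end });
    } else {
        std.debug.print("anchored at 3: no match\n", .{});
    }
}
```

Anchoring is useful when you drive the search position yourself, for example in a tokenizer that must match at the current cursor and nowhere else.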
For a runnable demo, see src/main.zig.

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "(\\d\\d)/(\\d\\d)/(\\d\\d\\d\\d)", .{});
    defer re.deinit();

    const haystack = "date=03/18/2026";

    // Is there a match?
    std.debug.print("match? {}\n", .{re.match(haystack)});

    // Where is the match?
    if (re.find(haystack)) |m| {
        std.debug.print("match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match in the given search window?
    if (re.findIn(.init(haystack, .{ .start = 5, .end = 15 }))) |m| {
        std.debug.print("bounded match at [{}, {})\n", .{ m.start, m.end });
    }

    // Where is the match and where are its capture groups?
    if (re.findCaptures(haystack)) |caps| {
        std.debug.print("month: {s}\n", .{caps.get(2).?.bytes(haystack)});
        std.debug.print("year: {s}\n", .{caps.get(3).?.bytes(haystack)});

        // Copy the capture data so it survives the next search on `re`.
        var buf: [8]?Regex.Match = undefined;
        const stored = caps.copy(&buf);
        _ = re.find("something else");
        std.debug.print("copied full match: {s}\n", .{stored[0].?.bytes(haystack)});
    }
}

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Iterate over all non-overlapping matches with `findAll()`:
    {
        const haystack = "Hello World, Alice and Bob";
        var re = try Regex.compile(gpa, "[A-Z][a-z]+", .{});
        defer re.deinit();

        var iter = re.findAll(haystack);
        while (iter.next()) |m| {
            std.debug.print("{s} [{}, {})\n", .{ m.bytes(haystack), m.start, m.end });
        }
    }

    // Iterate over all matches with capture groups using `findAllCaptures()`:
    // Each `Captures` yielded by `findAllCaptures()` is invalidated by the next
    // `next()` call on that iterator. Use `Captures.copy(dest)` if you need to keep
    // capture data after advancing.
    {
        const haystack = "x=12 y=34";
        var re = try Regex.compile(gpa, "(?<key>\\w+)=(?<value>\\d+)", .{});
        defer re.deinit();

        var iter = re.findAllCaptures(haystack);
        while (iter.next()) |caps| {
            std.debug.print("{s} -> {s}\n", .{
                caps.name("key").?.bytes(haystack),
                caps.name("value").?.bytes(haystack),
            });
        }
    }
}

Compile options let you set default syntax flags up front. Inline flags inside the pattern can still override them:

const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    var re = try Regex.compile(gpa, "abc", .{
        .syntax = .{ .case_insensitive = true },
    });
    defer re.deinit();

    std.debug.assert(re.match("ABC"));
}

See:
- docs/optimizations.md for current compile-time and runtime optimizations
- docs/supported-syntax.md for the syntax support entrypoint
- docs/testing.md for the corpus/testing setup
- docs/unicode.md for Unicode mode behavior and current limitations
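As a sketch of the override behavior described for compile options above, the following compiles with case-insensitive matching as the default and then switches it off inside a scoped group. It assumes the engine accepts a negation-only toggle such as `(?-i:...)`, by analogy with the `(?i-m:...)` form in the feature list; check docs/supported-syntax.md before relying on it.

```zig
const std = @import("std");
const Regex = @import("regex");

pub fn main() !void {
    const gpa = std.heap.page_allocator;

    // Default: case-insensitive. The scoped toggle `(?-i:...)` is assumed to
    // turn that default off for the group it wraps.
    var re = try Regex.compile(gpa, "(?-i:abc)", .{
        .syntax = .{ .case_insensitive = true },
    });
    defer re.deinit();

    // If the inline flag overrides the compile option, only the exact-case
    // input should match.
    std.debug.print("abc: {}\n", .{re.match("abc")});
    std.debug.print("ABC: {}\n", .{re.match("ABC")});
}
```

Setting the default in compile options and overriding it locally keeps patterns short when most of a regex shares one mode and only a small part needs another.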
This project is available under either of:
- Apache License, Version 2.0, in LICENSE-APACHE
- MIT license, in LICENSE-MIT
You may choose either license.