feat: Migrate JSON parser from nlohmann/json to simdjson #1

@scc-tw

Description

Labels: good first issue, help wanted, C++, performance, build
Difficulty: easy → medium (self-contained)
Mentors: @maintainers will review & guide


Why this matters

We only use nlohmann::json in one place: src/json_deserializer.cpp. We don’t re-serialize or edit JSON; we only deserialize into typed C++ structs. Replacing that implementation with simdjson (On-Demand API) should improve parsing speed and reduce memory use, because On-Demand reads values lazily instead of building a full DOM first. It also keeps JSON details out of public headers.


What you’ll do

  1. Replace the internals of src/json_deserializer.cpp to parse with simdjson On-Demand instead of nlohmann::json.
  2. Preserve public behavior (same returned structs; same observable error semantics or clearly mapped equivalents).
  3. Update the build to add simdjson as a private dependency and remove nlohmann/json if it’s no longer needed.
  4. Add or adjust a few tests (happy path + edge cases).

Scope is intentionally small: one source file + minimal build/test tweaks.


Getting started

1) Build the project

Follow our build guide: https://cycraft-corp.github.io/hakka_json/Architecture/Building/

2) Add simdjson to the build

  • Prefer find_package(simdjson CONFIG); fall back to FetchContent with a pinned version if the package isn’t available.
  • We use Conan 2. Please add simdjson as a private requirement so downstream users don’t need to know about it.

CMake example

find_package(simdjson CONFIG QUIET)
if (NOT simdjson_FOUND)
  include(FetchContent)
  FetchContent_Declare(
    simdjson
    GIT_REPOSITORY https://github.com/simdjson/simdjson.git
    GIT_TAG v4.0.7 # pin a version; keep it in sync with the Conan requirement
  )
  FetchContent_MakeAvailable(simdjson)
endif()

# Replace 'hakka_json_core' with the target that owns json_deserializer.cpp
target_link_libraries(hakka_json_core PRIVATE simdjson::simdjson)

Conan example (conanfile.txt)

[requires]
simdjson/4.0.7

[generators]
CMakeDeps
CMakeToolchain

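Keep simdjson PRIVATE on our targets; don’t expose its headers in PUBLIC interfaces. If the recipe is a conanfile.py rather than a conanfile.txt, the Conan 2 way to hide the requirement from downstream consumers is self.requires("simdjson/4.0.7", visible=False).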

3) Implement the migration (inside src/json_deserializer.cpp)

Use simdjson’s On-Demand API to parse directly from the input buffer and populate our C++ structs. Avoid building a DOM unless absolutely necessary.

4) Map errors clearly

Translate parsing errors to our project error codes (or to the current exception scheme, if that’s what the API uses). Suggested mapping:

| simdjson source                     | Project error code                    | Notes                                   |
|-------------------------------------|---------------------------------------|-----------------------------------------|
| UTF8_ERROR                          | HAKKA_JSON_UTF8_ERROR                 | Invalid UTF-8 in input                  |
| NUMBER_OUT_OF_RANGE                 | HAKKA_JSON_NUMBER_OUT_OF_RANGE        | Numeric overflow/underflow              |
| DEPTH_ERROR                         | HAKKA_JSON_RECURSION_DEPTH_EXCEEDED   | Too deeply nested                       |
| NO_SUCH_FIELD, INCORRECT_TYPE, etc. | HAKKA_JSON_INVALID_JSON (or specific) | Missing/incorrect type/structure issues |
| any other parse error               | HAKKA_JSON_INVALID_JSON               | Fallback                                |

Tests to add/run

Create or update tests that feed real-world samples directly into the deserializer:

  • ✅ Valid minimal payload → struct populated
  • ✅ Missing required field → appropriate error
  • ✅ Wrong type (string vs number, etc.) → appropriate error
  • ✅ Large integer boundary → error (or clamped) per current behavior
  • ✅ Non-UTF-8 input → HAKKA_JSON_UTF8_ERROR
  • ✅ Very large but valid document (performance sanity)
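Some of the edge-case inputs above are easier to generate in code than to store as files. The helpers below are a sketch with hypothetical names and hypothetical field names (name, count); they only build the input strings and make no claim about which error the deserializer returns for them.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Integer one past INT64_MAX, for the "large integer boundary" case.
std::string int64_overflow_payload() {
  // Compute in unsigned arithmetic so the increment itself cannot overflow.
  const std::uint64_t v =
      static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max()) + 1u;
  return "{\"count\":" + std::to_string(v) + "}";
}

// String value containing a byte that never appears in valid UTF-8.
std::string invalid_utf8_payload() {
  std::string s = "{\"name\":\"";
  s.push_back(static_cast<char>(0xFF));
  s += "\"}";
  return s;
}

// Valid array of n integers, for the "very large but valid" sanity check.
std::string large_array_payload(std::size_t n) {
  std::string s = "[";
  for (std::size_t i = 0; i < n; ++i) {
    if (i != 0) { s += ','; }
    s += std::to_string(i);
  }
  s += ']';
  return s;
}
```

Each helper returns a std::string that can be passed to the deserializer from whatever test framework the project already uses.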

Run them with:

cd build
ctest -C Debug

Acceptance criteria

  • src/json_deserializer.cpp no longer includes nlohmann/json.hpp and uses simdjson On-Demand.
  • Public behavior is preserved (same structs; same observable error cases).
  • Tests pass locally and in CI.
  • Build finds/links simdjson privately; nlohmann/json is removed if unused elsewhere.

Submission checklist (PR)

  • Focused PR touching src/json_deserializer.cpp (+ build/test files as needed)
  • Clear description of what changed and why
  • Notes on error-mapping decisions (if any)
  • (Optional) A quick before/after parse-time note

Where to ask for help

Open a draft PR or start a discussion—maintainers are happy to help you land it. Please be kind and follow our Code of Conduct. 🙏


Thanks for considering this contribution—your first PR here can make the project faster for everyone. 🚀
