Skip to content

Conversation

Stranger6667
Copy link
Owner

This is yet another attempt to improve the schema tree layout without going fully into building a compiler & interpreter, as noted in #641 (which will require way more effort).

The core goal is to prevent excessive memory consumption by recursive references, which are evaluated lazily and keep expanding on each deeper recursion cycle (#675).

Solving the above issue should also remove the need to store Registry inside Validator, and, in principle, it should be possible to pass Registry by reference. It will unblock a better API for jsonschema where the user won't need to pass the Registry by value (likely cloning it).

Inspired by changes I made in the css-inline crate a few years ago.

Copy link

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 93.38731% with 49 lines in your changes missing coverage. Please review.

Project coverage is 86.14%. Comparing base (44520ad) to head (5e3a3bf).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
crates/jsonschema/src/compiler_v2.rs 91.45% 24 Missing ⚠️
crates/jsonschema/src/ir/number.rs 47.22% 19 Missing ⚠️
crates/jsonschema/src/ir/mod.rs 98.87% 4 Missing ⚠️
crates/jsonschema-referencing/src/registry.rs 0.00% 1 Missing ⚠️
crates/jsonschema/src/ir/display.rs 98.18% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   85.71%   86.14%   +0.43%     
==========================================
  Files          95      100       +5     
  Lines       15531    16362     +831     
==========================================
+ Hits        13312    14095     +783     
- Misses       2219     2267      +48     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

codspeed-hq bot commented May 22, 2025

CodSpeed Performance Report

Merging #749 will not alter performance

Comparing dd/flat (5e3a3bf) with master (958b430)

Summary

✅ 47 untouched benchmarks
🆕 1 new benchmarks

Benchmarks breakdown

Benchmark BASE HEAD Change
🆕 new_is_valid N/A 5.2 µs N/A

@Stranger6667 Stranger6667 mentioned this pull request Jun 12, 2025
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
@lucab
Copy link
Contributor

lucab commented Jun 14, 2025

I was looking a bit more into the memory usage of this library, and I have an additional sidenote that I'll leave here.
Other than the well-known lazy-statics, it looks like there are some larger memory leak coming from the $ref validator here:

fn lazy_compile(&self) -> &SchemaNode {
self.inner.get_or_init(|| {
let resolver = self
.registry
.resolver_from_raw_parts(self.base_uri.clone(), self.scopes.clone());
let ctx = compiler::Context::new(
Arc::clone(&self.config),
Arc::clone(&self.registry),
Rc::new(resolver),
self.vocabularies.clone(),
self.draft,
self.location.clone(),
);
// INVARIANT: This schema was already used during compilation before detecting a
// reference cycle that lead to building this validator.
compiler::compile(&ctx, self.resource.as_ref()).expect("Invalid schema")
})
}

I think that the current recursive-validators manage to build a cyclic family of Arc (or Rc?) via the intertwined Context, Registry, Resolver and the validators.
I naively tried to break this by demoting LazyRefValidator.registry to a Weak, but that didn't work at runtime.

@Stranger6667
Copy link
Owner Author

@lucab

Thank you for taking a deep look into this! The presence of registry, ctx, etc, is surely an awful hack. In this PR, the static references are resolved at the compilation step, so there is no need to keep those structures.

I am not sure about $dynamicRef 100% yet, but it should also be possible to avoid this "compilation at runtime" approach too. I.e., store all possible jump locations and decide on the exact one at runtime.

Right now, static refs work in the new compiler, so adding $dynamicRef is the next step on my TODO list.

As a side note, another factor contributing to memory usage could be the presence of many separate allocations (every node is boxed). I believe that, depending on the used allocator, memory fragmentation may also impact the final result.

Gradually, I want to pack things more (e.g. intern String for properties, etc), and I think it will also improve the situation with runtime performance & memory usage.

Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Signed-off-by: Dmitry Dygalo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants