-
Notifications
You must be signed in to change notification settings - Fork 596
Prevent worst-case exponential complexity in dependency evaluation #10523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this with the given config snippet and verified that this indeed fixes the exponential CPU-time use by the master-branch code. It's still not perfect, I now get ~2% usage vs. ~0.5% with about the same amount of objects (dependencies, hosts and services) in a flat hierarchy, but maybe that can't be avoided anyway, I don't know...
What exactly do you mean by flat? I'd expect the most comparable result from a long chain of dependencies (i.e. 0 -> 1 -> 2 -> 3 -> ...) with the same number of checkables (i.e. length being twice the depth of the graph in my example), though that will probably still be a bit cheaper as you only have half the |
Same number of objects total, but each service depends on a single host (i.e. the one host).
That's probably it. |
This allows using ref-counted pointers to const objects. Adds a second typedef so that T::ConstPtr can be used similar to how T::Ptr currently is.
05c92c8
to
0f6185a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before:
Bildschirmaufnahme.2025-07-31.um.10.15.07.mov
After:
Bildschirmaufnahme.2025-07-31.um.10.23.12.mov
Checkable::IsReachable() and DependencyGroup::GetState() call each other recursively. Moving them to a common helper class allows adding caching to them in a later commit without having to pass a cache between the functions (through a public interface) or resorting to thread_local variables.
So far, calling Checkable::IsReachable() traversed all possible paths to it's parents. In case a parent is reachable via multiple paths, all it's parents were evaluated multiple times, result in a worst-case exponential complexity. With this commit, the implementation keeps track of which checkables were already visited and uses the already-computed reachability instead of repeating the computation, ensuring a worst-case linear runtime within the graph size.
0f6185a
to
9601468
Compare
So far, calling Checkable::IsReachable() traversed all possible paths to it's parents. In case a parent is reachable via multiple paths, all it's parents were evaluated multiple times, result in a worst-case exponential complexity.
With this PR, the implementation keeps track of which checkables were already visited and uses the already-computed reachability instead of repeating the computation, ensuring a worst-case linear runtime within the graph size.
For implementing this, there are two additional commits with preparations for that change:
T::ConstPtr
as a typedef forintrusive_ptr<const T>
combined with the necessary changes to support this (changing some functions to accept aconst Object*
instead of anObject*
as well as marking the reference counter attribute asmutable
). This was done becauseCheckable::IsReachable()
isconst
, i.e.this
is of typeconst Checkable*
, so aCheckable::Ptr
can't be constructed from it, only the newCheckable::ConstPtr
.Checkable::IsReachable()
andDependencyGroup::GetState()
is moved to a new helper class (the original methods are kept and now transparently call the ones in the helper class), so that they can easily access common storage (i.e. a cache needed for implementing this) without having to explicitly pass the cache through a public interface or having to resort tothread_local
.Test
I've prepared a config file that creates the following dependency graph that is pretty much the worst case for the old implementation. Note that this diagram is reduced to 4 levels, the actual config generates 25 levels, making the issue much more obvious when running the config.
bobby-little-dependencies.conf
This can easily be started in a container:
docker run --rm -it -v $(pwd)/bobby-little-dependencies.conf:/icinga2.conf:ro icinga/icinga2:dev icinga2 daemon -c /icinga2.conf
Before
Heavy CPU usage (fluctuating somewhere between 200% and 800% on my machine), checks aren't executed every 10s as configured.
After
Almost no CPU usage (mostly idle, peaks of around 2% on my machine), regular check execution.
ref/IP/59145
ref/IP/59671