Fix possibly excessive memory use when computer test result overview #6850

Martchus · 2025-11-21T15:21:14Z

When accessing the test results overview page (especially without parameters, e.g. http://localhost:9526/tests/overview) a huge set of jobs is considered. This leads to excessive memory usage, e.g. with current OSD data the RSS rises above 7 GiB and the request takes very long. There is already a limit of 2000 jobs, which does take effect, but it is unfortunately not passed to `complex_query` and only applied later on. I tracked the memory usage while running the code with debug printing, and it was very clear that `complex_query` is the culprit.

This change uses the limit from the start. This means additional filtering done in Perl code also needs to happen as part of the now limited initial query. So this change replaces these Perl loops/`grep` calls by passing the filter parameters directly to the database. This slightly changes the order of things but should lead to the same end result.
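To illustrate why applying the limit up front bounds memory, here is a minimal Perl sketch. It assumes a hypothetical cursor (`make_cursor`) standing in for the database and a hypothetical helper `fetch_limited`; neither is openQA code. One common way to still know whether the limit was exceeded (an assumption here, not necessarily what this PR does) is to fetch one extra row and drop it. In DBIx::Class terms, the comparable approach would be passing a `rows` attribute to the query instead of slicing a fully loaded result afterwards.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database cursor (not openQA code): each call
# returns the next job, newest first, or undef once exhausted. Rows are only
# materialized when requested, like rows streamed from a query.
sub make_cursor {
    my ($total) = @_;
    my $next_id = $total;
    return sub {
        return undef if $next_id <= 0;
        return {id => $next_id--};
    };
}

# Hypothetical helper: read at most $limit + 1 jobs. The extra row only tells
# whether the limit was exceeded (to show "narrow down your search
# parameters"); it is dropped before returning.
sub fetch_limited {
    my ($cursor, $limit) = @_;
    my @jobs;
    while (@jobs <= $limit) {
        my $job = $cursor->() // last;
        push @jobs, $job;
    }
    my $limit_exceeded = @jobs > $limit;
    pop @jobs if $limit_exceeded;
    return (\@jobs, $limit_exceeded);
}

# Even with a million matching jobs, only 2001 job hashes are ever created.
my ($jobs, $limit_exceeded) = fetch_limited(make_cursor(1_000_000), 2000);
printf "%d jobs, limit exceeded: %s\n", scalar @$jobs, $limit_exceeded ? 'yes' : 'no';
```

Loading everything and slicing afterwards would produce the same page but would first allocate every matching row, which is the behavior this PR avoids.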

This is still a draft because I also have to take care of other places where we use `complex_query` in a similar way. I probably also need to change some details depending on test results. Once the other places are covered, I can probably also remove code that is then no longer required, e.g. `latest_jobs`.

Related ticket: https://progress.opensuse.org/issues/192448

Martchus · 2025-11-24T11:43:14Z

It looks like the current version of the test results overview doesn't actually return results across all builds via /tests/overview, despite the huge memory consumption in some cases. At least with my local database I only get two jobs of a single build via /tests/overview, which looks rather random. With this PR I get jobs of many different builds, which is much closer to what one would expect from such an unconstrained query. This behavior is also visible in unit tests, where more results now show up on test result overview pages.

I'll review whether the new behavior makes sense in the different test cases (manual and unit tests). So far the new behavior makes sense in my manual tests, which would mean the current code is cutting corners somewhere in a way that no longer seems necessary.

EDIT: I've also just tested this with OSD data. The version of this PR keeps the memory usage limited (the final RSS of the process is 178.7 MiB) and is notably faster than the current version (which ends up with an RSS of 7.4 GiB). With OSD data the difference in the jobs shown on /tests/overview is much smaller; they are almost identical. Considering that results are limited to 2000 jobs with the remark "narrow down your search parameters", this is completely acceptable.

I also get meaningful results for e.g. /tests/overview?result=failed much faster. The set of jobs is again not identical, but it does make sense. The typical overview pages I tested (with the usual constraints, so the limit of 2000 isn't reached) look identical. Less constrained overview pages I tested (e.g. only the BUILD is constrained but the limit still isn't reached) also look identical. All tested with OSD data.

The only regression I've seen so far is a missing @64bit.

EDIT: The missing @64bit is not really a regression. With this PR only the jobs that are actually displayed are considered when computing the "preferred" machine (which is then omitted), and that actually makes more sense. (The current code doesn't consider all jobs to compute the preferred machine either, but it also doesn't consider only the jobs actually displayed. So the current behavior is a complicated middle ground: some filters (e.g. the job group) are considered and some are not (e.g. the state, the result and the "filter" to show only the latest jobs). We probably don't care much about this, so I'll ignore it.)

Martchus · 2025-11-24T17:05:27Z

It looks like the current code determines the latest build automatically when no build is specified. Since that seems to be a wanted feature, I made my change behave similarly. With this there is almost no change in behavior anymore. Some tests still fail, which I'll have to work on tomorrow.

I also added the following to the commit message because this is one of the remaining differences and I haven't found a good way to avoid it:

With this change the "preferred" machine (basically the most frequent machine per architecture) is only determined based on the jobs that are actually displayed. That means jobs will no longer show as e.g. `kde@uefi` when filtering for `machine=uefi` because with this filter the machine `uefi` becomes the preferred (most frequent) machine (as now all displayed jobs have that machine). I think this change is acceptable. It means that some tests had to be adjusted accordingly.
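The rule described in this commit message can be sketched in Perl. This is a hypothetical re-implementation for illustration only, not openQA's `_calculate_preferred_machines`: it assumes jobs are plain hashes with `TEST`, `ARCH` and `MACHINE` keys and breaks ties alphabetically, which may differ from the real code. The labeling condition follows the one from `Test.pm` quoted in the review (`$preferred_machines->{$job->ARCH} ne $job->MACHINE`).

```perl
use strict;
use warnings;

# Hypothetical sketch: map each ARCH to its most frequent MACHINE among the
# given (displayed) jobs. Ties are broken alphabetically to stay deterministic.
sub preferred_machines {
    my ($jobs) = @_;
    my %count;
    $count{$_->{ARCH}}{$_->{MACHINE}}++ for @$jobs;
    my %preferred;
    for my $arch (keys %count) {
        my $machines = $count{$arch};
        my ($top) = sort { $machines->{$b} <=> $machines->{$a} || $a cmp $b } keys %$machines;
        $preferred{$arch} = $top;
    }
    return \%preferred;
}

# Append "@MACHINE" only when the job's machine is not the preferred one.
sub job_label {
    my ($job, $preferred) = @_;
    my $pref = $preferred->{$job->{ARCH}};
    return $pref && $pref ne $job->{MACHINE} ? "$job->{TEST}\@$job->{MACHINE}" : $job->{TEST};
}

my @all = (
    {TEST => 'kde',     ARCH => 'x86_64', MACHINE => 'uefi'},
    {TEST => 'gnome',   ARCH => 'x86_64', MACHINE => '64bit'},
    {TEST => 'minimal', ARCH => 'x86_64', MACHINE => '64bit'},
);
# Unfiltered: 64bit is preferred, so kde is shown as kde@uefi.
my $pref_all = preferred_machines(\@all);
print join(', ', map { job_label($_, $pref_all) } @all), "\n";
# Filtered for machine=uefi: uefi becomes preferred, so kde is shown as kde.
my @uefi = grep { $_->{MACHINE} eq 'uefi' } @all;
my $pref_uefi = preferred_machines(\@uefi);
print join(', ', map { job_label($_, $pref_uefi) } @uefi), "\n";
```

Because the preferred machine is computed only from the displayed jobs, a filter on the machine makes that machine the preferred one, so the suffix disappears.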

Martchus · 2025-11-25T12:25:49Z

I have fixed almost all mistakes, but the test "filtering does not reveal old jobs" reveals a bigger problem with my current approach. I guess it actually made an important difference that the current code applies these filters only relatively late (but unfortunately also before the limit is applied). I need to find a way to do the filtering (e.g. on the result) after the "latest" de-duplication/grouping but still before the limit is applied.
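The ordering problem can be illustrated with a small in-memory Perl sketch. The helpers and data are hypothetical (this is neither openQA's `latest_jobs` nor the database query of the final version); a job is a hash with `id`, `scenario` and `result`, and a higher `id` means a newer job. Filtering on the result before de-duplicating lets an old failed job surface for a scenario whose latest job passed, whereas de-duplicating first, then filtering, then limiting does not.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the latest (highest id) job per scenario,
# returned newest first.
sub latest_per_scenario {
    my ($jobs) = @_;
    my %latest;
    for my $job (@$jobs) {
        my $current = $latest{$job->{scenario}};
        $latest{$job->{scenario}} = $job if !$current || $job->{id} > $current->{id};
    }
    return [sort { $b->{id} <=> $a->{id} } values %latest];
}

# Desired order: de-duplicate, then filter on the result, then apply the limit.
sub dedup_filter_limit {
    my ($jobs, $result, $limit) = @_;
    my @selected = grep { $_->{result} eq $result } @{latest_per_scenario($jobs)};
    splice @selected, $limit if @selected > $limit;
    return \@selected;
}

# Problematic order for comparison: filtering first hides newer jobs with a
# different result, so an older job of the same scenario becomes "latest".
sub filter_dedup_limit {
    my ($jobs, $result, $limit) = @_;
    my @filtered = grep { $_->{result} eq $result } @$jobs;
    my $selected = latest_per_scenario(\@filtered);
    splice @$selected, $limit if @$selected > $limit;
    return $selected;
}

my @jobs = (
    {id => 1, scenario => 'kde',   result => 'failed'},
    {id => 2, scenario => 'kde',   result => 'passed'},    # latest kde job
    {id => 3, scenario => 'gnome', result => 'failed'},
);
print join(' ', map { $_->{id} } @{dedup_filter_limit(\@jobs, 'failed', 2000)}), "\n";    # 3
print join(' ', map { $_->{id} } @{filter_dedup_limit(\@jobs, 'failed', 2000)}), "\n";    # 3 1
```

In the database, the same ordering could presumably be achieved by selecting the latest job per scenario in a subquery and applying the result filter and the limit in the outer query; the final commit message mentions using a sub query for this.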

Martchus · 2025-11-25T16:15:17Z

The case tested via the subtest "filtering does not reveal old jobs" now also works with my latest push.

I have now also covered the API route that is the equivalent of the overview page. With this, the only remaining unaltered usage of `complex_query` is in the "list" API route, but that one is already limited from the start. It still uses Perl code to filter the latest jobs, hence I wasn't able to remove the `latest_jobs` function.

With all these changes to the PR, the memory usage of /tests/overview is still significantly reduced (still around 200 MiB instead of over 7 GiB).

codecov · 2025-11-25T16:34:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.26%. Comparing base (e3b9e48) to head (c018f21).
⚠️ Report is 5 commits behind head on master.


@@           Coverage Diff           @@
##           master    #6850   +/-   ##
=======================================
  Coverage   99.26%   99.26%           
=======================================
  Files         402      402           
  Lines       41493    41522   +29     
=======================================
+ Hits        41187    41218   +31     
+ Misses        306      304    -2



d3flex · 2025-11-26T14:02:27Z

lib/OpenQA/WebAPI/Controller/Test.pm

-    @jobs = @jobs[0 .. ($limit - 1)] if $limit_exceeded;
-    my @jobids = map { $_->id } @jobs;
+    my @jobs = $jobs->all;
+    my $preferred_machines = _calculate_preferred_machines(\@jobs);


Just a note: the `$jobs` are used here only to calculate `$preferred_machines`, which is used later for this condition:

```
&& $preferred_machines->{$job->ARCH} && $preferred_machines->{$job->ARCH} ne $job->MACHINE)
```

So `_prepare_job_results` needs only this and not the whole ResultSet. If you called `_calculate_preferred_machines(\@jobs);` before https://github.com/Martchus/openQA/blob/79f52e8c934da96eaa20e95fca56a8891dfc361a/lib/OpenQA/WebAPI/Controller/Test.pm#L848C5-L848C94 and just passed the result, I think it would simplify the dependencies between the invocations and make the functions clearer. But let's not change anything now.

The $jobs are used here only for calculate …

The "only" in this sentence makes it wrong. Just search for @jobs in the subsequent code of this function; you'll find 4 occurrences.

d3flex

I finished the review. I would approve, but I'd like to ask you to fix a tiny typo in the 3rd commit. It says "That means a jobs will no longer show as"; it's meant to be without the "a", right? Other than that I'm ready to approve it.

When accessing the test results overview page (especially without parameters, e.g. http://localhost:9526/tests/overview) a huge set of jobs is considered. This leads to excessive memory usage, e.g. with current OSD data the RSS rises above 7 GiB and the request takes very long. There is already a limit of 2000 jobs, which does take effect, but it is unfortunately not passed to `complex_query` and only applied later on. I tracked the memory usage while running the code with debug printing, and it was very clear that `complex_query` is the culprit.

This change uses the limit from the start. This means additional filtering done in Perl code also needs to happen as part of the now limited initial query (using a subquery, see the added comment). So this change replaces these Perl loops/`grep` calls by passing the filter parameters directly to the database. This slightly changes the order of things but should lead to the same end result.

With this change the "preferred" machine (basically the most frequent machine per architecture) is only determined based on the jobs that are actually displayed. That means jobs will no longer show as e.g. `kde@uefi` when filtering for `machine=uefi` because with this filter the machine `uefi` becomes the preferred (most frequent) machine (as now all displayed jobs have that machine). I think this change is acceptable. It means that some tests had to be adjusted accordingly.

Related ticket: https://progress.opensuse.org/issues/192448

Martchus · 2025-11-26T14:36:11Z

I fixed the typo in the 3rd commit.

Martchus force-pushed the overview-limit branch from c33d9f5 to a8e88bc November 24, 2025 17:01

Martchus force-pushed the overview-limit branch 2 times, most recently from e78ad16 to 8bcf758 November 25, 2025 16:14

Martchus marked this pull request as ready for review November 25, 2025 16:15

Martchus changed the title ~~WIP: Fix possibly excessive memory use when computer test result overview~~ Fix possibly excessive memory use when computer test result overview Nov 25, 2025

Martchus force-pushed the overview-limit branch from 8bcf758 to 572f5a4 November 25, 2025 16:36

okurz requested changes Nov 25, 2025

lib/OpenQA/Schema/ResultSet/Jobs.pm (outdated, resolved)

t/10-tests_overview.t (resolved)

Martchus added 2 commits November 26, 2025 10:36

Fix indentation in overview.html.ep (d718225)

Fix typo in _prepare_complex_query_search_args (696fbf5)

Martchus force-pushed the overview-limit branch from 572f5a4 to ef8c95b November 26, 2025 09:36

okurz approved these changes Nov 26, 2025

d3flex reviewed Nov 26, 2025

lib/OpenQA/Schema/ResultSet/Jobs.pm (outdated, resolved)

Martchus force-pushed the overview-limit branch from ef8c95b to 79f52e8 November 26, 2025 13:20

d3flex reviewed Nov 26, 2025

lib/OpenQA/Schema/ResultSet/Jobs.pm (outdated, resolved)

d3flex reviewed Nov 26, 2025

Martchus added 2 commits November 26, 2025 15:35

Avoid repeating MAIN_SETTINGS in various places (c018f21)

Martchus force-pushed the overview-limit branch from 591da33 to c018f21 November 26, 2025 14:36

d3flex approved these changes Nov 26, 2025

mergify bot merged commit 131df4d into os-autoinst:master Nov 26, 2025
51 checks passed

Martchus deleted the overview-limit branch November 26, 2025 15:11
