🐛 Ensure proper Redis client shutdown in Celery #8237

giancarloromeo · 2025-08-20T08:52:39Z

What do these changes do?

This PR fixes a critical issue with Celery's Redis client lifecycle management where Redis connections were not properly cleaned up, leading to resource leaks. The fix ensures proper initialization and shutdown of Redis clients throughout the application.

Centralizes Redis client lifecycle management for Celery operations
Refactors worker initialization to remove redundant celery_settings parameter
Updates method names for consistency (lifespan → start_and_hold)
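
A minimal sketch of the idea behind the first bullet, assuming a single owner object creates the Redis client on startup and closes it on shutdown. `FakeRedisClient` and `WorkerAppSketch` are hypothetical stand-ins and are not the classes touched by this PR.

```python
import asyncio
from typing import Optional


class FakeRedisClient:
    """Hypothetical stand-in for a real Redis client (assumption, not the PR's class)."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        self.connected = True

    async def close(self) -> None:
        self.connected = False


class WorkerAppSketch:
    """Owns the client, so creation and cleanup live in exactly one place."""

    def __init__(self) -> None:
        self.redis_client: Optional[FakeRedisClient] = None

    async def startup(self) -> None:
        self.redis_client = FakeRedisClient()
        await self.redis_client.connect()

    async def shutdown(self) -> None:
        # Closing here is what prevents the connection from leaking
        if self.redis_client is not None:
            await self.redis_client.close()


async def run_worker_once() -> FakeRedisClient:
    app = WorkerAppSketch()
    await app.startup()
    try:
        client = app.redis_client
        assert client is not None and client.connected
        return client
    finally:
        # Runs even if the work above raises
        await app.shutdown()


client_after_run = asyncio.run(run_worker_once())
```

Code that needs the client would read it from the owner instead of creating its own, so nothing is left open after shutdown.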

BONUS:

This PR also fixes an issue that caused tests to hang. They now always complete, in about 3 minutes.

Related issue/s

fixes Celery worker not calling shutdown on Redis instance #8159

How to test

cd packages/celery-library
make tests

Dev-ops

codecov · 2025-08-20T08:54:48Z

Codecov Report

❌ Patch coverage is 30.76923% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.70%. Comparing base (35e7048) to head (8125b5d).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8237      +/-   ##
==========================================
- Coverage   88.03%   86.70%   -1.34%     
==========================================
  Files        1919     1449     -470     
  Lines       74341    59854   -14487     
  Branches     1305      682     -623     
==========================================
- Hits        65449    51894   -13555     
+ Misses       8499     7733     -766     
+ Partials      393      227     -166

Flag	Coverage Δ
integrationtests	`59.60% <ø> (-4.63%)`	⬇️
unittests	`86.29% <30.76%> (-0.39%)`	⬇️

Components	Coverage Δ
pkg_aws_library	`∅ <ø> (∅)`
pkg_celery_library	`85.20% <100.00%> (-2.18%)`	⬇️
pkg_dask_task_models_library	`∅ <ø> (∅)`
pkg_models_library	`∅ <ø> (∅)`
pkg_notifications_library	`∅ <ø> (∅)`
pkg_postgres_database	`∅ <ø> (∅)`
pkg_service_integration	`∅ <ø> (∅)`
pkg_service_library	`72.37% <0.00%> (+0.02%)`	⬆️
pkg_settings_library	`∅ <ø> (∅)`
pkg_simcore_sdk	`65.29% <ø> (-19.75%)`	⬇️
agent	`93.53% <ø> (ø)`
api_server	`92.84% <ø> (ø)`
autoscaling	`95.89% <ø> (ø)`
catalog	`92.34% <ø> (ø)`
clusters_keeper	`99.13% <ø> (ø)`
dask_sidecar	`92.37% <ø> (+0.56%)`	⬆️
datcore_adapter	`97.94% <ø> (ø)`
director	`75.81% <ø> (ø)`
director_v2	`85.40% <ø> (-5.52%)`	⬇️
dynamic_scheduler	`96.27% <ø> (ø)`
dynamic_sidecar	`89.19% <ø> (-0.91%)`	⬇️
efs_guardian	`89.62% <ø> (ø)`
invitations	`91.44% <ø> (ø)`
payments	`92.61% <ø> (ø)`
resource_usage_tracker	`92.13% <ø> (+0.21%)`	⬆️
storage	`∅ <ø> (∅)`
webclient	`∅ <ø> (∅)`
webserver	`88.10% <ø> (-0.04%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

mergify · 2025-08-20T09:42:15Z

🧪 CI Insights

Here's what we observed from your CI run for 8125b5d.

❌ Failed Jobs

Pipeline	Job	Retries	🔍 CI Insights	📄 Logs
`CI`	`integration-tests`	`0`	View	View
	`system-tests`	`0`	View	View
	`unit-tests`	`0`	View	View

Copilot

Pull Request Overview

This PR fixes a critical issue with Celery's Redis client lifecycle management where Redis connections were not properly cleaned up, leading to resource leaks. The fix ensures proper initialization and shutdown of Redis clients throughout the application.

Centralizes Redis client lifecycle management for Celery operations
Refactors worker initialization to remove redundant celery_settings parameter
Updates method names for consistency (lifespan → start_and_hold)

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

File	Description
`services/storage/src/simcore_service_storage/modules/celery/__init__.py`	Complete refactor to manage Redis client lifecycle with proper setup/shutdown
`services/storage/src/simcore_service_storage/core/application.py`	Updated to use new setup_celery function name and reorganized conditional logic
`packages/service-library/src/servicelib/fastapi/celery/app_server.py`	Added task_manager property and renamed lifespan method to start_and_hold
`packages/service-library/src/servicelib/celery/app_server.py`	Made task_manager abstract property and renamed lifespan method
`packages/celery-library/src/celery_library/signals.py`	Simplified worker initialization by removing redundant celery_settings parameter
`packages/celery-library/src/celery_library/common.py`	Removed create_task_manager function that was causing lifecycle issues
`services/storage/tests/conftest.py`	Updated test fixture to match simplified worker initialization
`packages/celery-library/tests/conftest.py`	Updated test fixtures with proper Redis client lifecycle management
`packages/service-library/src/servicelib/celery/models.py`	Fixed parameter name from task_context to task_filter

services/storage/src/simcore_service_storage/modules/celery/__init__.py

services/storage/src/simcore_service_storage/core/application.py

matusdrobuliak66

Thanks

GitHK

Just some minor things

services/storage/src/simcore_service_storage/core/application.py

services/storage/src/simcore_service_storage/modules/celery/worker_main.py

services/storage/tests/conftest.py

sanderegg

Not sure I completely understand where the issue was. Was it in the celery package tests?
Why not just use the real thing in the celery package tests instead of that complicated fake?

packages/celery-library/src/celery_library/signals.py

packages/celery-library/tests/unit/test_async_jobs.py

packages/celery-library/tests/conftest.py

packages/service-library/src/servicelib/fastapi/celery/app_server.py

sanderegg · 2025-08-22T11:45:55Z

packages/service-library/src/servicelib/fastapi/celery/app_server.py

+        task_manager: TaskManager = self.app.state.task_manager
+        return task_manager
+
+    async def start_and_hold(self, startup_completed_event: threading.Event) -> None:


It is a lifespan, and the one problem I see here is that the return type is wrong. It should be AsyncIterator[None], which I think would remove the confusion.

We don't yield anything here. This is the place in which the initialized FastAPI instance stays parked waiting for the shutdown event.
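
The "parked until shutdown" behaviour described in this reply can be sketched as follows. This is an illustration under assumptions, not the method from the PR: `AppServerSketch`, `request_shutdown` and the `events` list are hypothetical names, and only the `start_and_hold(startup_completed_event: threading.Event)` signature comes from the diff above.

```python
import asyncio
import threading
from typing import Optional


class AppServerSketch:
    """Hypothetical stand-in for the app server discussed in this thread."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self._loop: Optional[asyncio.AbstractEventLoop] = None
        self._shutdown_event: Optional[asyncio.Event] = None

    async def start_and_hold(self, startup_completed_event: threading.Event) -> None:
        self._loop = asyncio.get_running_loop()
        self._shutdown_event = asyncio.Event()
        self.events.append("startup")      # setup part
        startup_completed_event.set()      # tell the calling thread we are ready
        await self._shutdown_event.wait()  # parked here; nothing is yielded
        self.events.append("shutdown")     # teardown part

    def request_shutdown(self) -> None:
        # asyncio.Event is not thread-safe, so set it from the loop's own thread
        assert self._loop is not None and self._shutdown_event is not None
        self._loop.call_soon_threadsafe(self._shutdown_event.set)


server = AppServerSketch()
started = threading.Event()
thread = threading.Thread(target=lambda: asyncio.run(server.start_and_hold(started)))
thread.start()
started.wait(timeout=5)    # block until startup has completed
server.request_shutdown()  # release the parked coroutine
thread.join(timeout=5)
```

Because the coroutine runs setup, waits, and then runs teardown itself, it returns `None` instead of yielding control back to a caller the way a lifespan generator would.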

pcrespov · 2025-08-25T09:55:09Z

packages/service-library/src/servicelib/fastapi/celery/app_server.py

+        task_manager: TaskManager = self.app.state.task_manager
+        return task_manager
+
+    async def start_and_hold(self, startup_completed_event: threading.Event) -> None:


Since this is the primary entrypoint for this service, I would call it run_until_shutdown, which emphasizes the lifecycle clearly (and recalls the naming used in the asyncio library).

Regarding @sanderegg's comment:

In other parts of the code, our approach is to provide a context-manager-like function that includes the setup and tear-down parts in one place (see https://github.com/ITISFoundation/osparc-simcore/blob/master/packages/service-library/src/servicelib/fastapi/postgres_lifespan.py#L31C11-L31C37).

The approach here is different, since this member function encapsulates the setup and tear-down parts AND runs them. That reduces flexibility, but I guess you do not need it here.

I understand this function can also only be called once. Therefore I would add a guard against repeated calls.
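
To make the two suggestions concrete, here is a small sketch that combines a context-manager-style lifespan with a call-once guard. All names (`resource_lifespan`, `Runner`, `calls`) are hypothetical, and `run_until_shutdown` is only the name proposed in this comment, not necessarily the one that was merged.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager

calls: list[str] = []


@asynccontextmanager
async def resource_lifespan() -> AsyncIterator[None]:
    # Setup and tear-down live together; the caller decides what runs in between
    calls.append("setup")
    try:
        yield
    finally:
        calls.append("teardown")


class Runner:
    def __init__(self) -> None:
        self._started = False

    async def run_until_shutdown(self, shutdown_event: asyncio.Event) -> None:
        # Guard: a second call would set up the resources twice
        if self._started:
            raise RuntimeError("run_until_shutdown() can only be called once")
        self._started = True
        async with resource_lifespan():
            await shutdown_event.wait()


async def demo() -> str:
    runner = Runner()
    shutdown_event = asyncio.Event()
    shutdown_event.set()  # shut down immediately, for the demo only
    await runner.run_until_shutdown(shutdown_event)
    try:
        await runner.run_until_shutdown(shutdown_event)
    except RuntimeError as err:
        return str(err)
    return "no error"


second_call_result = asyncio.run(demo())
```

The lifespan function stays reusable on its own, while the member function fixes how it is run; the guard turns an accidental second call into an explicit error.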

pcrespov · 2025-08-25T09:55:51Z

packages/service-library/src/servicelib/celery/app_server.py


    @abstractmethod
-    async def lifespan(
+    async def start_and_hold(


Check my other comment about renaming this.

For interfaces, please add some documentation about what is expected, especially when the name does not reveal all the details.
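
One way such interface documentation could look, as a hedged sketch: the contract in the docstring is an assumption pieced together from this review thread, and `BaseAppServerSketch` is a hypothetical class name.

```python
import threading
from abc import ABC, abstractmethod


class BaseAppServerSketch(ABC):
    """Hypothetical interface; the documented contract below is an assumption."""

    @abstractmethod
    async def start_and_hold(self, startup_completed_event: threading.Event) -> None:
        """Start the application and block until shutdown is requested.

        Implementations are expected to:
        1. run their startup logic,
        2. set ``startup_completed_event`` once startup has finished,
        3. wait until shutdown is requested,
        4. run their shutdown logic before returning.
        """


# An abstract class cannot be instantiated until the method is implemented
try:
    BaseAppServerSketch()
    instantiation_blocked = False
except TypeError:
    instantiation_blocked = True
```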

pcrespov · 2025-08-25T09:57:54Z

packages/service-library/src/servicelib/fastapi/celery/app_server.py

+        task_manager: TaskManager = self.app.state.task_manager
+        return task_manager
+
+    async def start_and_hold(self, startup_completed_event: threading.Event) -> None:


TIP: use log_context(INFO,...) instead of _logger.info
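
For readers unfamiliar with the pattern, the sketch below shows what a logging context manager of this kind does. `log_context_sketch` is a hypothetical stand-in written for this example; the real `log_context` helper may have a different signature and message format.

```python
import logging
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def log_context_sketch(logger: logging.Logger, level: int, msg: str) -> Iterator[None]:
    # Stand-in only: logs a start and a finish line around the wrapped block
    logger.log(level, "Starting %s ...", msg)
    try:
        yield
    finally:
        logger.log(level, "Finished %s", msg)


class ListHandler(logging.Handler):
    """Collects formatted messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


_logger = logging.getLogger("log_context_sketch_demo")
_logger.setLevel(logging.INFO)
handler = ListHandler()
_logger.addHandler(handler)

with log_context_sketch(_logger, logging.INFO, "app startup"):
    pass  # the startup work would go here
```

Compared with two separate `_logger.info` calls, the finish message cannot be forgotten and is emitted even if the wrapped block raises.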

packages/service-library/src/servicelib/fastapi/celery/app_server.py

pcrespov · 2025-08-25T10:03:53Z

packages/service-library/src/servicelib/fastapi/celery/app_server.py

-    async def lifespan(self, startup_completed_event: threading.Event) -> None:
+    @property
+    def task_manager(self) -> TaskManager:
+        task_manager = self.app.state.task_manager


The order in which the app state is set up is very important, and here I do not see how this is guaranteed. Can you please show me offline how the workflow works?
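
This does not show how the PR guarantees the ordering. It is only a sketch of one defensive option, assuming the property can check that startup has already stored the object, so that an ordering mistake fails loudly. `AppServerStateSketch` and the `SimpleNamespace` app are hypothetical stand-ins for the FastAPI app and its `state`.

```python
from types import SimpleNamespace


class AppServerStateSketch:
    """Hypothetical stand-in; `app.state` mimics the FastAPI app state object."""

    def __init__(self) -> None:
        self.app = SimpleNamespace(state=SimpleNamespace())

    @property
    def task_manager(self) -> object:
        task_manager = getattr(self.app.state, "task_manager", None)
        if task_manager is None:
            # Accessed too early: startup has not populated app.state yet
            raise RuntimeError("task_manager is not set up yet; was startup run?")
        return task_manager


server = AppServerStateSketch()
try:
    server.task_manager
    early_access_error = ""
except RuntimeError as err:
    early_access_error = str(err)

# Simulate the startup step that stores the task manager in the app state
server.app.state.task_manager = "fake-task-manager"
late_access_value = server.task_manager
```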

sonarqubecloud · 2025-08-26T11:04:05Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

feat: move redis client lifecycle to app server's one

3174045

giancarloromeo self-assigned this Aug 20, 2025

giancarloromeo added the a:celery-library and a:storage (issue related to storage service) labels Aug 20, 2025

giancarloromeo added this to the Voyager milestone Aug 20, 2025

fix: remove unused param

ab99e13

giancarloromeo added 9 commits August 20, 2025 13:43

fix: fake server Redis client's lifecycle

0d26e0c

fix: typecheck

f2b84eb

fix: remove unuseful setup

064f8ac

fix: celery task manager fixture

f8290e8

fix: task manager property

8999602

fix: typecheck

dd0a684

fix: absolute import

a81aafc

tests: use in-memory Redis

d50dbb4

fix: rename

86bbe39

giancarloromeo requested a review from Copilot August 20, 2025 19:54

Copilot AI reviewed Aug 20, 2025

services/storage/src/simcore_service_storage/modules/celery/__init__.py (resolved)

giancarloromeo added 5 commits August 21, 2025 10:31

fix: shutdown

70bf868

fix: worker shutdown

1f626a2

fix: use threads

246d695

fix: explicity stop worker

41446dc

fix: shutdown

ee52317

giancarloromeo requested a review from bisgaard-itis August 21, 2025 14:19

giancarloromeo added 5 commits August 21, 2025 20:43

fix: raise timeout

97ded5f

fix: force worker stop

abeab74

fix: remove rabbit

8d2c423

fix: rabbit

1bb6dad

fix: use separate Celery app

9327579

giancarloromeo added 3 commits August 22, 2025 12:30

fix: add wait after submit

5198a37

fix: test indent

f223f79

fix: loglevel

f347698

giancarloromeo commented Aug 22, 2025

services/storage/src/simcore_service_storage/core/application.py (resolved)

fix: wait

585dc9f

giancarloromeo changed the title ~~🐛 Fix Celery's Redis client lifecycle~~ 🐛 Ensure Proper Redis Client Shutdown in Celery Aug 22, 2025

giancarloromeo changed the title ~~🐛 Ensure Proper Redis Client Shutdown in Celery~~ 🐛 Ensure proper Redis client shutdown in Celery Aug 22, 2025

giancarloromeo marked this pull request as ready for review August 22, 2025 11:17

giancarloromeo requested review from sanderegg and pcrespov as code owners August 22, 2025 11:17

Merge branch 'master' into is8159/fix-redis-client-lifecycle

ac35b82

giancarloromeo requested review from GitHK and matusdrobuliak66 August 22, 2025 11:17

matusdrobuliak66 approved these changes Aug 22, 2025

GitHK approved these changes Aug 22, 2025

services/storage/src/simcore_service_storage/core/application.py (resolved)

services/storage/src/simcore_service_storage/modules/celery/worker_main.py (outdated, resolved)

services/storage/tests/conftest.py (outdated, resolved)

giancarloromeo added 2 commits August 22, 2025 13:37

fix: remove partials

ceae3c3

fix: remove unused asserts

81161e9

sanderegg reviewed Aug 22, 2025

giancarloromeo added 2 commits August 22, 2025 14:02

fix: assert

1e02fe7

fix: remove partial

ad4cc8c

pcrespov reviewed Aug 25, 2025

giancarloromeo added 5 commits August 26, 2025 09:10

remove unused

46f5034

rename

fec6e98

Merge branch 'master' into is8159/fix-redis-client-lifecycle

8d4cf9b

Merge remote-tracking branch 'upstream/master' into is8159/fix-redis-…

8270bb1

…client-lifecycle

fix: remove unused

8125b5d
