Feat: Make Celery periodic task scheduling configurable#3666

Open
aayush-patidar wants to merge 5 commits into chaoss:main from aayush-patidar:#issuefix#3406

@aayush-patidar

Description

This PR removes hardcoded periodic Celery task intervals and makes task scheduling fully configuration-driven using AugurConfig. Users can now customize scheduling frequencies or disable specific periodic tasks via augur.config.json without modifying source code.
Default configuration values preserve the existing scheduling behavior when no custom configuration is provided, ensuring full backward compatibility.
Fixes #3406

Problem

Several periodic Celery tasks in augur/tasks/init/celery_app.py were scheduled using hardcoded time intervals. This made it difficult for users to adjust task frequency or disable tasks without changing source code.

Solution

All previously hardcoded Celery task schedules have been refactored to read their values from configuration, following the same pattern already used for materialized view scheduling.

Changes

  • Refactored setup_periodic_tasks to retrieve scheduling intervals from AugurConfig instead of hardcoded constants.

  • Added new configuration keys under the [Tasks] section with defaults matching previous behavior:

    • non_repo_domain_tasks_interval_in_days
    • retry_errored_repos_cron_hour
    • retry_errored_repos_cron_minute
    • process_contributors_interval_in_seconds
    • create_collection_status_records_interval_in_seconds
  • Implemented logic to disable interval-based tasks when the configured value is <= 0.

  • Updated configuration documentation to describe new keys, units, defaults, and disable behavior.

  • Added automated tests to validate scheduling logic, default fallbacks, and disabled tasks.
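The interval handling described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the plain dict stands in for values read through AugurConfig, the helper names `resolve_interval_seconds` and `build_schedule` are hypothetical, and the default numbers are placeholders, since the actual defaults are not listed in this conversation.

```python
from typing import Any, Dict, Optional

SECONDS_PER_DAY = 86400

# Placeholder defaults for illustration only (assumption, not the PR's values).
ILLUSTRATIVE_DEFAULTS: Dict[str, int] = {
    "non_repo_domain_tasks_interval_in_days": 30,
    "process_contributors_interval_in_seconds": 3600,
    "create_collection_status_records_interval_in_seconds": 86400,
}


def resolve_interval_seconds(tasks_config: Dict[str, Any], key: str) -> Optional[int]:
    """Return the interval in seconds, or None when the task is disabled."""
    raw = tasks_config.get(key)
    if raw is None:
        # Key missing from the config: fall back to the default.
        raw = ILLUSTRATIVE_DEFAULTS[key]
    value = int(raw)
    if value <= 0:
        # A non-positive value disables an interval-based task.
        return None
    if key.endswith("_in_days"):
        return value * SECONDS_PER_DAY
    return value


def build_schedule(tasks_config: Dict[str, Any]) -> Dict[str, int]:
    """Map each enabled interval key to its period in seconds."""
    schedule: Dict[str, int] = {}
    for key in ILLUSTRATIVE_DEFAULTS:
        seconds = resolve_interval_seconds(tasks_config, key)
        if seconds is not None:
            schedule[key] = seconds
    return schedule
```

With this sketch, `build_schedule({})` returns every task at its default period, while passing `{"process_contributors_interval_in_seconds": 0}` leaves that task out of the schedule entirely.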

Backward Compatibility

All new configuration options include defaults that match the original hardcoded intervals. Existing installations without these keys defined will continue to operate with the same task schedules as before.

Testing

  • Added tests/tasks/test_celery_scheduler.py
  • Run tests with:
    • pytest tests/tasks/test_celery_scheduler.py

Signed-off-by: Aayush Patidar <patidaraayush053@gmail.com>
@aayush-patidar
Author

Hi @sgoggins
This PR makes all periodic Celery task schedules configurable via AugurConfig, following the same pattern already used for materialized views.
Defaults preserve existing behavior, tasks can be disabled via config, and tests + docs are included.
Happy to adjust anything if needed. Thanks for the review!

Contributor

@MoralCode left a comment

Overall this seems like a helpful PR. I'm a little concerned this may be going a little too far with adding new configuration options, but overall I think it's a helpful change, especially with the unit tests for task setup.

Comment on lines +259 to +261
retry_repos_hour_val = config.get_value('Tasks', 'retry_errored_repos_cron_hour')
retry_repos_minute_val = config.get_value('Tasks', 'retry_errored_repos_cron_minute')
retry_repos_hour = int(retry_repos_hour_val) if retry_repos_hour_val is not None else 0

I'm not sure we should be defining cron parameters as individual config items.
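The comment does not say which alternative the reviewer has in mind. One possible reading, sketched here purely as an assumption, is a single config value holding both cron fields (for example a hypothetical "minute hour" string) that is parsed once, instead of separate hour and minute keys. The function name `parse_daily_cron` and the string format are illustrative and do not come from the PR.

```python
from typing import Tuple


def parse_daily_cron(spec: str) -> Tuple[int, int]:
    """Parse a 'minute hour' string such as '30 2' into (hour, minute)."""
    parts = spec.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'minute hour', got {spec!r}")
    minute, hour = int(parts[0]), int(parts[1])
    # Reject values outside the ranges a daily cron entry accepts.
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range: {minute}")
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return hour, minute
```

The trade-off in this sketch is one key to document and validate instead of two, at the cost of a small parsing step.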

Comment on lines +6 to +16
Task Scheduling Configuration
-----------------------------

The following keys in the ``[Tasks]`` section control the scheduling of periodic Celery tasks. You can adjust these intervals or disable tasks by setting the value to <= 0 (for interval-based tasks).

- ``collection_interval``: Interval in seconds for the main collection monitor. Default: 30.
- ``core_collection_interval_days``: Interval in days for core collection. Default: 15.
- ``secondary_collection_interval_days``: Interval in days for secondary collection. Default: 10.
- ``facade_collection_interval_days``: Interval in days for facade collection. Default: 10.
- ``ml_collection_interval_days``: Interval in days for ML collection. Default: 40.

We already have docs in ReadTheDocs for many of these config items. Can you put this documentation alongside those? (I know they may not be in the best place, but I'd prefer that to introducing more duplicate docs.)

pyproject.toml Outdated
[tool.pytest.ini_options]
addopts = "-ra -s"
testpaths = [
"tests",

This is going to run all the tests, which will likely cause some of the older, not-yet-fixed ones to fail.

If your test is the only one within the tasks folder, can you change this to tests/tasks? (If not, also add the filename and the .py extension.)

Development

Successfully merging this pull request may close these issues.

make remaining periodic tasks configurable
