Skip to content

Refactor background process metrics to be homeserver-scoped #18670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Jul 7, 2025

Refactor background process metrics to be homeserver-scoped

Part of #18592

Separated out of #18656 because it's a bigger, unique piece of the refactor

Testing strategy

  1. Add the metrics listener in your homeserver.yaml
    listeners:
      # This is just showing how to configure metrics either way
      #
      # `http` `metrics` resource
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
      # `metrics` listener
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9322/_synapse/metrics and/or http://localhost:9323/metrics
  4. Observe response includes the background processs metrics (synapse_background_process_start_count, synapse_background_process_db_txn_count_total, etc) with the server_name label

Dev notes

synapse/metrics/background_process_metrics.py

SYNAPSE_POSTGRES=1 SYNAPSE_POSTGRES_USER=postgres SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.util.test_task_scheduler.TestTaskSchedulerWithBackgroundWorker.test_schedule_task

Log context docs: https://github.com/element-hq/synapse/blob/56f5097d1c8de968b845eda9e2abd5207ee76825/docs/log_contexts.md

Todo

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

```
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.test_server.OptionsResourceTests.test_known_request
The "poetry.dev-dependencies" section is deprecated and will be removed in a future version. Use "poetry.group.dev.dependencies" instead.
tests.test_server
  OptionsResourceTests
    test_known_request ...                                              [ERROR]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 131, in new
    return code(orig, *args, **kwargs)
  File "/home/eric/Documents/github/element/synapse/tests/unittest.py", line 225, in setUp
    return orig()
  File "/home/eric/Documents/github/element/synapse/tests/test_server.py", line 219, in setUp
    self.homeserver = setup_test_homeserver(
  File "/home/eric/Documents/github/element/synapse/tests/server.py", line 1147, in setup_test_homeserver
    hs.setup()
  File "synapse/server.py", line 361, in setup
    self.datastores = Databases(self.DATASTORE_CLASS, self)
  File "synapse/storage/databases/__init__.py", line 104, in __init__
    main = main_store_class(database, db_conn, hs)
  File "synapse/storage/databases/main/__init__.py", line 174, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/events_bg_updates.py", line 122, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/experimental_features.py", line 44, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/devices.py", line 1776, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/devices.py", line 95, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/roommember.py", line 2030, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/room.py", line 2449, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/room.py", line 2023, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/room.py", line 152, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/registration.py", line 2519, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/registration.py", line 2407, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/profile.py", line 53, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/presence.py", line 80, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/presence.py", line 61, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/transactions.py", line 81, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/state.py", line 974, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/state.py", line 118, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/state.py", line 765, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/event_federation.py", line 2094, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/media_repository.py", line 226, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/media_repository.py", line 119, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/filtering.py", line 49, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/event_push_actions.py", line 1894, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/metrics.py", line 75, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/event_push_actions.py", line 263, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/end_to_end_keys.py", line 1551, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/e2e_room_keys.py", line 69, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/search.py", line 436, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/search.py", line 145, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/account_data.py", line 68, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/push_rule.py", line 138, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/appservice.py", line 103, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/pusher.py", line 82, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/roommember.py", line 102, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/pusher.py", line 579, in __init__
    super().__init__(database, db_conn, hs)
  File "synapse/storage/databases/main/receipts.py", line 142, in __init__
    server_name=self.server_name,
builtins.AttributeError: 'DataStore' object has no attribute 'server_name'

tests.test_server.OptionsResourceTests.test_known_request
-------------------------------------------------------------------------------
Ran 1 tests in 0.094s

FAILED (errors=1)
```
…in `synapse/storage/databases/main/lock.py`
@@ -543,16 +544,24 @@ async def start(hs: "HomeServer") -> None:
# Set up the SIGHUP machinery.
if hasattr(signal, "SIGHUP"):

@wrap_as_background_process("sighup")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because @wrap_as_background_process was updated to take self.server_name, we've had to refactor a few of the calls that don't use self.

Should be equivalent to before.

def wrap_as_background_process_inner(
func: Callable[P, Awaitable[Optional[R]]],
def wrapper(
func: Callable[Concatenate[HasServerName, P], Awaitable[Optional[R]]],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This HasServerName pattern matches what we do for @measure_func

We want to use self.server_name for the functions that people are decorating @wrap_as_background_process with.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable precedence.

…bscriptable` in with old deps

```
  File "/opt/hostedtoolcache/Python/3.9.23/x64/lib/python3.9/site-packages/synapse/util/caches/lrucache.py", line 128, in <module>
    ) -> defer.Deferred[None]:
TypeError: 'type' object is not subscriptable
```
@@ -119,13 +124,15 @@ def __init__(self, *args: Any, **kwargs: Any):
# capture the sentinel context as its containing context and won't prevent
# GC of / unintentionally reactivate what would be the current context.
self._logging_context = BackgroundProcessLoggingContext(
"replication_command_handler"
name="replication_command_handler", server_name=self.server_name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error:

  File "/home/runner/work/synapse/synapse/synapse/replication/tcp/redis.py", line 127, in __init__
    name="replication_command_handler", server_name=self.server_name
builtins.AttributeError: 'RedisSubscriber' object has no attribute 'server_name'

self.server_name is not set at this point since we set this after it's created, see

def buildProtocol(self, addr: IAddress) -> RedisSubscriber:
p = super().buildProtocol(addr)
p = cast(RedisSubscriber, p)
# We do this here rather than add to the constructor of `RedisSubcriber`
# as to do so would involve overriding `buildProtocol` entirely, however
# the base method does some other things than just instantiating the
# protocol.
p.server_name = self.server_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fixed this in f0ee141 but I'm not totally confident in this. Please double check.

For reference, we do have some logging context docs.

…trics-counter-server-name-label

Conflicts:
	synapse/handlers/device.py
	synapse/storage/databases/main/end_to_end_keys.py
See #18670 (comment)

Error:

```
  File "/home/runner/work/synapse/synapse/synapse/replication/tcp/redis.py", line 127, in __init__
    name="replication_command_handler", server_name=self.server_name
builtins.AttributeError: 'RedisSubscriber' object has no attribute 'server_name'
```

`self.server_name` is not set in the constructor since we set this after it's created, see
https://github.com/element-hq/synapse/blob/06108100c848ab3f6b9e541b7769bbc63adc2bf6/synapse/replication/tcp/redis.py#L383-L391
Comment on lines +130 to +136
def _get_logging_context(self) -> BackgroundProcessLoggingContext:
"""
We lazily create the logging context so that `self.server_name` is set and
available. See `RedisDirectTcpReplicationClientFactory.buildProtocol` for more
details on why we set `self.server_name` after the fact instead of in the
constructor.
"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check all of this logic. See #18670 (comment)

@MadLittleMods MadLittleMods marked this pull request as ready for review July 19, 2025 02:29
@MadLittleMods MadLittleMods requested a review from a team as a code owner July 19, 2025 02:29
@MadLittleMods MadLittleMods requested review from devonh and removed request for a team July 22, 2025 15:10
Copy link
Member

@devonh devonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small comment about the logging context changes.
Otherwise all these changes look appropriate to me.

Thank you for also cleaning up some of the unrelated doc strings along the way.

Comment on lines +76 to +78
self.server_name = (
hs.hostname
) # nb must be called this for @wrap_as_background_process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment refer to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, is this saying that the self.server_name variable must have that particular name in order to be picked up by the @wrap_as_background_process decorator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct 👍

(this is the same language we use for @cached as well)

def wrap_as_background_process_inner(
func: Callable[P, Awaitable[Optional[R]]],
def wrapper(
func: Callable[Concatenate[HasServerName, P], Awaitable[Optional[R]]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable precedence.

@MadLittleMods MadLittleMods requested a review from devonh July 22, 2025 23:55
@MadLittleMods MadLittleMods merged commit b7e7f53 into develop Jul 23, 2025
74 of 78 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/18592-background-process-metrics-counter-server-name-label branch July 23, 2025 18:28
@MadLittleMods
Copy link
Contributor Author

Thanks for the review @devonh 🐇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants