Merged
changelog.d/19068.misc (1 addition, 0 deletions)
@@ -0,0 +1 @@
Be mindful of other logging context filters in 3rd-party code and avoid overwriting log record fields unless we know the log record is relevant to Synapse.
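To make "overwriting log record fields" concrete, here is a minimal sketch (not Synapse code) that contrasts a filter which always writes `server_name` with one that only fills in a missing value. `ClobberingFilter`, `MindfulFilter` and `server_name_after` are hypothetical names. The 3rd-party value arrives through the standard `extra=` argument, which sets attributes on the `LogRecord` before any filter sees it.

```python
import logging
from typing import Optional

DEFAULT = "unknown_server_from_no_logcontext"


class ClobberingFilter(logging.Filter):
    """Hypothetical 'before' shape: always overwrite `server_name`."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.server_name = DEFAULT
        return True


class MindfulFilter(logging.Filter):
    """Hypothetical 'after' shape: only fill in a missing `server_name`."""

    def filter(self, record: logging.LogRecord) -> bool:
        if not hasattr(record, "server_name"):
            record.server_name = DEFAULT
        return True


def server_name_after(filter_: logging.Filter, extra: Optional[dict]) -> str:
    # Build a record the way a logging call with `extra=` would; each key in
    # `extra` becomes an attribute on the record.
    logger = logging.getLogger("third_party")
    record = logger.makeRecord(
        logger.name, logging.INFO, "example.py", 1, "hello", None, None, extra=extra
    )
    filter_.filter(record)
    return record.server_name


third_party = {"server_name": "third-party.example"}
print(server_name_after(ClobberingFilter(), third_party))  # unknown_server_from_no_logcontext
print(server_name_after(MindfulFilter(), third_party))     # third-party.example
print(server_name_after(MindfulFilter(), None))            # unknown_server_from_no_logcontext
```

Only the clobbering variant loses the value the 3rd-party code set; the mindful one still provides a default when nobody else did.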
synapse/logging/context.py (36 additions, 4 deletions)
@@ -604,25 +604,57 @@ def __init__(
         self._default_request = request

     def filter(self, record: logging.LogRecord) -> Literal[True]:
-        """Add each fields from the logging contexts to the record.
         """
+        Add each field from the logging context to the record.
+
+        Please be mindful of 3rd-party code outside of Synapse (like in the case of
+        Synapse Pro for small hosts) as this is running as a global log record filter.
+        Other code may have set their own attributes on the record and the log record
+        may not be relevant to Synapse at all so we should not mangle it.
+
+        We can have some defaults but we should avoid overwriting existing attributes on
+        any log record unless we actually have a Synapse logcontext (not just the
+        default sentinel logcontext).
+
+        Returns:
+            True to include the record in the log output.
+        """
         context = current_context()
         record.request = self._default_request
-        record.server_name = "unknown_server_from_no_context"

+        # Avoid overwriting an existing `server_name` on the record. This is running in
+        # the context of a global log record filter so there may be 3rd-party code that
+        # adds their own `server_name` and we don't want to interfere with that
+        # (clobber).
+        if not hasattr(record, "server_name"):
+            record.server_name = "unknown_server_from_no_logcontext"
+
         # context should never be None, but if it somehow ends up being, then
         # we end up in a death spiral of infinite loops, so let's check, for
         # robustness' sake.
         if context is not None:
-            record.server_name = context.server_name
+
+            def safe_set(attr: str, value: Any) -> None:
+                """
+                Only write the attribute if it hasn't already been set or we actually have
+                a Synapse logcontext (indicating that this log record is relevant to
+                Synapse).
+                """
+                if context is not SENTINEL_CONTEXT or not hasattr(record, attr):
+                    setattr(record, attr, value)
Comment on lines +637 to +644
Contributor Author

Is the abstraction worth it? Should I just inline the usage?

Originally, I thought I would have to use it more for all of the request attributes below, but it turns out we can do a little optimization to avoid it.

Member

Why not move setting the server_name under the request attribute? Then you'd only have one usage of safe_set, and then it'd be even more clear that it's not necessary.

Contributor Author

I'm not following 🙇

Member

Sorry, I meant structuring the code to be something like the following:

        context = current_context()
        record.request = self._default_request

        # Avoid overwriting an existing `server_name` on the record. This is running in
        # the context of a global log record filter so there may be 3rd-party code that
        # adds their own `server_name` and we don't want to interfere with that.
        if not hasattr(record, "server_name"):
            record.server_name = "unknown_server_from_no_logcontext"

        # context should never be None, but if it somehow ends up being, then
        # we end up in a death spiral of infinite loops, so let's check, for
        # robustness' sake.
        if context is not None:

            # Add some data from the HTTP request.
            request = context.request
            # The sentinel logcontext has no request so if we get past this point, we
            # know we have some actual Synapse logcontext and don't need to worry about
            # using `safe_set`. We'll consider this an optimization since this is a
            # pretty hot-path.
            if request is None:
                return True

            def safe_set(attr: str, value: Any) -> None:
                """
                Only write the attribute if it hasn't already been set.
                """
                if not hasattr(record, attr):
                    setattr(record, attr, value)

            safe_set("server_name", context.server_name)

            # Logging is interested in the request ID. Note that for backwards
            # compatibility this is stored as the "request" on the record.
            safe_set("request", str(context))

            record.ip_address = request.ip_address
            record.site_tag = request.site_tag
            record.requester = request.requester
            record.authenticated_entity = request.authenticated_entity
            record.method = request.method
            record.url = request.url
            record.protocol = request.protocol
            record.user_agent = request.user_agent

        return True

where you get the check for SENTINEL out of the way early, and then run code that assumes a non-sentinel logcontext.

Though I now realise safe_set was checking both that this wasn't the SENTINEL_CONTEXT and that the attribute wasn't already set. So the above doesn't completely eliminate the need for safe_set.

Though if safe_set is now only checking that the attribute is not already set, then it becomes even less necessary, and can probably be removed.
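The condition inside `safe_set` combines the two checks discussed here. A minimal sketch, using hypothetical `SENTINEL`/`REAL` objects and a hypothetical `should_write` helper in place of Synapse's `SENTINEL_CONTEXT` and `hasattr(record, attr)`, spells out the four cases:

```python
from typing import Any

# Hypothetical stand-ins: SENTINEL plays the role of Synapse's SENTINEL_CONTEXT,
# REAL plays the role of any other (non-sentinel) logcontext.
SENTINEL = object()
REAL = object()


def should_write(context: Any, attr_already_set: bool) -> bool:
    """Same shape as the PR's `safe_set` condition:
    `context is not SENTINEL_CONTEXT or not hasattr(record, attr)`."""
    return context is not SENTINEL or not attr_already_set


for label, ctx in (("real", REAL), ("sentinel", SENTINEL)):
    for already_set in (False, True):
        print(f"{label:<8} already_set={already_set!s:<5} -> write={should_write(ctx, already_set)}")
```

Only one case skips the write: the sentinel logcontext with an attribute that is already present, which is the case that protects attributes set by 3rd-party code.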

Member

nit: I'd also refactor this block like so to save on indentation:

        # context should never be None, but if it somehow ends up being, then
        # we end up in a death spiral of infinite loops, so let's check, for
        # robustness' sake.
        if context is None:
            return True

        # Add some data from the HTTP request.
        request = context.request
        ...

        return True

Contributor Author

> Sorry, I meant structuring the code to be something like the following: [...] where you get the check for SENTINEL out of the way early, and then run code that assumes a non-sentinel logcontext.

This won't work because not everything is logged with a context.request (like start-up messages, background jobs, etc.), and those records still need the logcontext's server_name and request ID.
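The objection can be seen in a minimal sketch. `FakeContext` and both functions are hypothetical simplifications, not Synapse code: the first mirrors the suggested structure (return early when `request is None`, before `server_name` is taken from the context), the second mirrors the merged order. For a background job, which has a real logcontext but no request, only the second stamps the context's `server_name`.

```python
import logging
from dataclasses import dataclass
from typing import Optional

DEFAULT = "unknown_server_from_no_logcontext"


@dataclass
class FakeContext:
    """Hypothetical stand-in for a non-sentinel Synapse logcontext."""

    server_name: str
    request: Optional[object] = None  # None for start-up messages, background jobs, ...


def return_early_without_request(record: logging.LogRecord, ctx: FakeContext) -> None:
    # Suggested shape: exit before `server_name` is copied from the context
    # whenever there is no request.
    if not hasattr(record, "server_name"):
        record.server_name = DEFAULT
    if ctx.request is None:
        return
    record.server_name = ctx.server_name


def set_then_return_early(record: logging.LogRecord, ctx: FakeContext) -> None:
    # Merged shape: default first, then a non-sentinel context always sets
    # `server_name` (which `safe_set` allows); the early return only skips
    # the request-specific fields.
    if not hasattr(record, "server_name"):
        record.server_name = DEFAULT
    record.server_name = ctx.server_name
    if ctx.request is None:
        return
    # ... request-specific fields (ip_address, method, url, ...) would be set here.


background_job = FakeContext(server_name="hs1.example")  # no HTTP request
for fn in (return_early_without_request, set_then_return_early):
    record = logging.makeLogRecord({"msg": "background job finished"})
    fn(record, background_job)
    print(f"{fn.__name__}: {record.server_name}")
```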

Contributor Author

Merged, but if we come up with a better pattern, I can make a follow-up PR ⏩

Member

Ah, I was wondering if there was something like that. Thanks for explaining, looks fine to me then.


+            safe_set("server_name", context.server_name)

             # Logging is interested in the request ID. Note that for backwards
             # compatibility this is stored as the "request" on the record.
-            record.request = str(context)
+            safe_set("request", str(context))

             # Add some data from the HTTP request.
             request = context.request
+            # The sentinel logcontext has no request so if we get past this point, we
+            # know we have some actual Synapse logcontext and don't need to worry about
+            # using `safe_set`. We'll consider this an optimization since this is a
+            # pretty hot-path.
             if request is None:
                 return True

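Putting the merged hunk together, here is a minimal, self-contained sketch of the filter wired into a standard `logging` handler. It is not Synapse's implementation: `FakeLogContext`, the `SENTINEL` object, `SketchContextFilter`, and the `ContextVar` standing in for `current_context()` are hypothetical, and the request-specific fields are omitted. It shows the two cases the docstring cares about: a 3rd-party record logged with no logcontext keeps its own `server_name`, while a record logged inside a real logcontext gets the context's `server_name` and request ID.

```python
import io
import logging
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeLogContext:
    """Hypothetical stand-in for a Synapse logcontext (not Synapse's class)."""

    name: str
    server_name: str

    def __str__(self) -> str:
        return self.name


# Hypothetical sentinel, plus a ContextVar standing in for `current_context()`.
SENTINEL = FakeLogContext(name="sentinel", server_name="unknown_server_from_sentinel")
_current = ContextVar("current", default=SENTINEL)


class SketchContextFilter(logging.Filter):
    def __init__(self, default_request: str = "") -> None:
        super().__init__()
        self._default_request = default_request

    def filter(self, record: logging.LogRecord) -> bool:
        context = _current.get()  # never None here, so no None check is needed
        record.request = self._default_request
        if not hasattr(record, "server_name"):
            record.server_name = "unknown_server_from_no_logcontext"

        def safe_set(attr: str, value: Any) -> None:
            # Overwrite only for a real logcontext, or when the attribute is missing.
            if context is not SENTINEL or not hasattr(record, attr):
                setattr(record, attr, value)

        safe_set("server_name", context.server_name)
        safe_set("request", str(context))
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(SketchContextFilter())
handler.setFormatter(logging.Formatter("%(server_name)s %(request)s %(message)s"))
logger = logging.getLogger("sketch")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# 3rd-party code with no logcontext keeps its own server_name.
logger.info("from a library", extra={"server_name": "other.example"})

# Code running inside a (hypothetical) real logcontext gets the context's values.
token = _current.set(FakeLogContext(name="GET-42", server_name="hs1.example"))
try:
    logger.info("handling a request")
finally:
    _current.reset(token)

print(stream.getvalue())
```

In this sketch, while the sentinel is active `safe_set` never writes, because both attributes already hold a value by the time it runs.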