Contributor

@outergod outergod commented Jul 30, 2025

Description

Change the way the FastAPI instrumentor deals with the FastAPI middleware stack so that exception handling code doesn't get executed twice but still has a valid OTEL context available. At the same time, make sure failures in instrumentor hooks cannot crash the service itself.
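To illustrate the second point, here is a minimal sketch of a failsafe hook wrapper. It is not the code from this PR: the names `failsafe` and `broken_hook` are hypothetical, it uses only the standard library, and it merely logs the failure, whereas the actual change also records hook exceptions on the active span.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def failsafe(hook: Optional[Callable[..., Any]]) -> Callable[..., None]:
    """Wrap a user-supplied hook so its exceptions are logged, not propagated.

    Hypothetical helper for illustration only.
    """
    if hook is None:
        # Check for None once, at wrap time, instead of on every call.
        return lambda *args, **kwargs: None

    def wrapper(*args: Any, **kwargs: Any) -> None:
        try:
            hook(*args, **kwargs)
        except Exception:  # broad on purpose: a hook must not crash the request
            logger.exception("instrumentation hook failed")

    return wrapper


def broken_hook(span, scope):
    # Stand-in for a buggy user hook.
    raise ValueError("boom")


calls = []
safe_broken = failsafe(broken_hook)
safe_ok = failsafe(lambda span, scope: calls.append(scope["path"]))
safe_none = failsafe(None)

safe_broken(None, {"path": "/"})   # logs the ValueError, does not raise
safe_ok(None, {"path": "/items"})  # runs normally
safe_none(None, {})                # no hook configured: no-op
```

The wrapper deliberately returns nothing, so a hook's return value cannot influence request handling either.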

Fixes #3642
Fixes #3637

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Tested using the MRE in the linked issue, plus newly added unit tests.

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated (not needed)

@outergod outergod changed the title Fix/gh 3642 fastapi exceptions Rewrite FastAPI instrumentor middleware stack to be failsafe Jul 30, 2025
@outergod outergod requested a review from a team as a code owner July 30, 2025 11:00
Contributor

@alexmojaki alexmojaki left a comment

Looks solid, thank you so much!

@outergod
Contributor Author

Thank you for the thorough review and the persistence @alexmojaki!
Do you have any suggestions on how to find a maintainer to sponsor the PR? Should I ask on the CNCF Slack?

@alexmojaki
Contributor

@xrmx @emdneto @codefromthecrypt @lzchen please review? This relates to #3012 which you reviewed previously. Also cc @Kludex @adriangb

Member

@emdneto emdneto left a comment

Right. I tried running the repro, and now I can see the recorded exception. Overall, it looks good.

In case it helps others review, before we had:
ServerErrorMiddleware (outermost) -> OpenTelemetryMiddleware -> ServerErrorMiddleware (innermost)

now we have:
ServerErrorMiddleware (outer -- same as before) -> OpenTelemetryMiddleware -> ServerErrorMiddleware (with original handler/debug) -> ExceptionHandlerMiddleware (with access to the span context)
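The new ordering can be sketched with plain-Python stand-ins. This is a toy model, not Starlette's or the instrumentor's classes: `make_layer`, `record_and_reraise`, and `to_500` are hypothetical helpers, the "span" is just a list, and the error layers swallow the exception for brevity. It only shows that the innermost handler sees the exception once, while the span is still open.

```python
import asyncio

order = []       # layers in the order they see the request
recorded = []    # exceptions recorded while a span was open
span_stack = []  # stand-in for the active OTel context


def make_layer(name, inner, on_error=None, opens_span=False):
    """Build one toy middleware layer around `inner` (hypothetical helper)."""

    async def layer(scope):
        order.append(name)
        if opens_span:
            span_stack.append("span")
        try:
            return await inner(scope)
        except Exception as exc:
            if on_error is None:
                raise
            return on_error(exc)
        finally:
            if opens_span:
                span_stack.pop()

    return layer


async def endpoint(scope):
    raise ValueError("boom")


def record_and_reraise(exc):
    # Runs inside the tracing layer, so the span is still available here.
    if span_stack:
        recorded.append(type(exc).__name__)
    raise exc


def to_500(exc):
    # Simplification: turn the error into a status code and stop propagation.
    return 500


# Build from the inside out, mirroring the "now we have" ordering above.
app = make_layer("ExceptionHandlerMiddleware", endpoint, on_error=record_and_reraise)
app = make_layer("ServerErrorMiddleware (inner)", app, on_error=to_500)
app = make_layer("OpenTelemetryMiddleware", app, opens_span=True)
app = make_layer("ServerErrorMiddleware (outer)", app, on_error=to_500)

status = asyncio.run(app({"type": "http", "path": "/"}))
```

In this model the exception is recorded exactly once, by the innermost layer, and the outer error layer never has to handle it.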

But I'm afraid the current structure of the FastAPI tests may be hiding some issues. I noticed that while reviewing #3701.

@xrmx xrmx enabled auto-merge (squash) August 28, 2025 10:35
@xrmx xrmx merged commit c32b738 into open-telemetry:main Aug 28, 2025
632 checks passed
@github-project-automation github-project-automation bot moved this from Ready for review to Done in @xrmx's Python PR digest Aug 28, 2025
zhirafovod pushed a commit to zhirafovod/opentelemetry-python-contrib that referenced this pull request Sep 15, 2025
…lemetry#3664)

* rewrite FastAPIInstrumentor:build_middleware_stack to become failsafe

* add test cases for FastAPI failsafe handling

* add CHANGELOG entry

* remove unused import

* [lint] don't return from failsafe wrapper

* [lint] allow broad exceptions

* [lint] more allowing

* record FastAPI hook exceptions in active span

* remove comment

* properly deal with hooks not being set

* add custom FastAPI exception recording

* move failsafe hook handling down to OpenTelemetryMiddleware

* shut up pylint

* optimize failsafe to check for `None` only once

* remove confusing comment and simplify wrapper logic

* add clarifying comment

* test proper exception / status code recording

* add HTTP status code check

* test HTTP status on the exception recording span

* improve test by removing TypeError

* rectify comment/explanation on inner middleware for exception handling

* minor typo

* move ExceptionHandlingMiddleware as the outermost inner middleware

Also improve code documentation and add another test.

* use distinct status code in test

* improve comment

Co-authored-by: Alex Hall <[email protected]>

* narrow down exception handling

Co-authored-by: Alex Hall <[email protected]>

* narrow down FastAPI exception tests to relevant spans

* collapse tests, more narrow exceptions

* move failsafe hook tests to ASGI test suite

* update CHANGELOG

* satisfy linter

* don't record exception if span is not recording

* add test for unhappy instrumentation codepath

* make inject fixtures private

* give up and shut up pylint

* improve instrumentation failure error message and add test

---------

Co-authored-by: Alex Hall <[email protected]>
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

FastAPI instrumentation: errors in hooks not handled properly
FastAPI instrumentor stops recording exception event starting v0.55b0
4 participants