Skip to content

[WIP] Fix tornado server (request) duration metric calculation #3489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

devmonkey22
Copy link
Contributor

Description

Fix the tornado instrumenter to track a request's elapsed time in an async-safe way so concurrent requests calculate their own elapsed time for the HTTP_SERVER_DURATION metric properly.

Fixes #3486

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

I just used the existing automated tests to ensure no regressions occurred. I can explore further if we want to try to add concurrent request testing.

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@devmonkey22 devmonkey22 requested a review from a team as a code owner May 9, 2025 16:34
Copy link

linux-foundation-easycla bot commented May 9, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot requested a review from shalevr May 9, 2025 16:34
@devmonkey22 devmonkey22 force-pushed the 3486-tornado-server-duration branch from 91568a1 to 047bf3d Compare May 9, 2025 16:40
@devmonkey22 devmonkey22 marked this pull request as draft May 9, 2025 16:42
@devmonkey22
Copy link
Contributor Author

Question for review - with this change, do we think it needs a specific test, where we start 2 concurrent requests, staggered, then measure expected duration similar to opentelemetry-instrumentation-asgi test_asgi_middleware?

@devmonkey22 devmonkey22 marked this pull request as ready for review May 28, 2025 14:48
xrmx and others added 20 commits August 5, 2025 16:41
…unked json in bedrock converse_stream (open-telemetry#3544)

* opentelemetry-instrumentation-botocore: fix handling of tool input chunked json in converse_stream

We need to accumulate all the tool input as string before decoding it
otherwise we may end up with invalid json.

* Add changelog

* Remove leftover print

* No need for too-many-statements disable
…ntation (open-telemetry#3460)

* respect supress_instrumentation

* update CHANGELOG

* fix link

* update CHANGELOG

---------

Co-authored-by: Emídio Neto <[email protected]>
…rumentation (open-telemetry#3477)

* respect supress_instrumentation

* update changelog

* Update CHANGELOG.md

Fix changelog after release

---------

Co-authored-by: Emídio Neto <[email protected]>
Co-authored-by: Riccardo Magliocchetti <[email protected]>
…iddleware (open-telemetry#3529)

* opentelemetry-instrumentation-starlette: fix memory leak and double middleware

* Changelog

* Update CHANGELOG.md

Co-authored-by: Joe McGinley <[email protected]>

---------

Co-authored-by: Joe McGinley <[email protected]>
…etry#3257)

* feat: support batch (getmany) in aiokafka instrumentation

* test: fix unclosed resources and typing

* test: add test_wrap_getmany

* fix: get unique topic list in batch

* fix: update typing, run pyupgrade

* fix: remove json.dumps from SERVER_ADDRESS attribute

* fix pylint

* fix: sync span_kind with spec

* add CHANGELOG entry

* remove changes not from this issue

* move types under TYPE_CHECKING

* move CHANGELOG entry to unreleased

* enable pyright for aiokafka, fix key type

* Update CHANGELOG.md

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
Co-authored-by: Emídio Neto <[email protected]>
…etry#3513)

* Add support for metrics

* Updated changelog

* Update package.py

* generate and lint

* Update

* Update

---------

Co-authored-by: Tammy Baylis <[email protected]>
Co-authored-by: Riccardo Magliocchetti <[email protected]>
…emetry#3429)

* Fixes container detector for systemd & cgroupv1 with Docker

* Update CHANGELOG

---------

Co-authored-by: Emídio Neto <[email protected]>
Co-authored-by: Riccardo Magliocchetti <[email protected]>
…telemetry#3540)

* refactor(grpc): replace SpanAttributes with semconv attributes

* refactor(flask): replace SpanAttributes with semconv attributes (fix ruff linting)

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
…detector-containerid` (open-telemetry#3536)

* resource-dector-container: package rename

Signed-off-by: emdneto <[email protected]>

* fix ruff

Signed-off-by: emdneto <[email protected]>

* change entrypoint

Signed-off-by: emdneto <[email protected]>

* fix

Signed-off-by: emdneto <[email protected]>

* fix test

Signed-off-by: emdneto <[email protected]>

* fix for entrypoint name

Signed-off-by: emdneto <[email protected]>

* add changelog

Signed-off-by: emdneto <[email protected]>

---------

Signed-off-by: emdneto <[email protected]>
…emetry#3545)

* Allow reraising the root exception if instrumentation fails

I would rather completely fail startup in my services if instrumentation fails for whatever reason instead of just logging an exception and continuing.

Use case:

from opentelemetry import autoinstrumentation

autoinstrumentation.initialize(swallow_exceptions=False)

* Fix lint

* Type hinting, re-raise original exception

* One more type hint to indicate None return

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
… debug (open-telemetry#3579)

* resource-detector-containerid: demote failure to read cgroup files to debug

Make the detection more quiet so we can load it on application not
running on linux machines without spamming logs.

* Add changelog
…n-telemetry#3582)

* refactor: fix import paths

* fix imports

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
smoke and others added 23 commits August 5, 2025 16:41
…-telemetry#3601)

* feat(pymongo): aggregate and getMore capture statements

* chore: update changelog

* fix: tests

* fix: proper MockCommand and expectations

* chore: ruff-format
* ref(mysql): remove SpanAttributes

* fix imports
open-telemetry#3584)

* pika: added instrumentation for pika.connection.Connection and pika.channel.Channel, thus added instrumentation support to all SelectConnection adapters.

* updated changelog.

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
…3498)

* Add tornado WebSocketHandler instrumentation support. (open-telemetry#2761)

* Linting

* Update CHANGELOG.md

* Apply refactor changes from open-telemetry#3582

---------

Co-authored-by: Riccardo Magliocchetti <[email protected]>
* Fix system metrics unit tests on mac

Don't test for system.network.connections on mac

* round
* Added base files for the project

* Added fix for generate and pre-commit

* Added package.py

* tox generate changes to auto-generated files

* tox ruff reformatted file

* removed python 3.8 from classifiers

* addressed comments

* addressed aabmass's comments
* Fix git pull error in core contrib test

Core contrib is intermittently failing when
pulling the core repo because tox does not
retry on failure.

Add gh actions/checkout for the core repo
before running tox to mitigate this.

* generate-workflows

Signed-off-by: emdneto <[email protected]>

* Update CHANGELOG.md

---------

Signed-off-by: emdneto <[email protected]>
Co-authored-by: emdneto <[email protected]>
Co-authored-by: Leighton Chen <[email protected]>
* Initial structure

* Update scripts/generate_instrumentation_bootstrap.py

Co-authored-by: Pablo Collins <[email protected]>

* missing generated library

* Version update to offset traceloop implementation

* uv sync after rebase

* Missing release entries

---------

Co-authored-by: Pablo Collins <[email protected]>
@devmonkey22
Copy link
Contributor Author

devmonkey22 commented Aug 5, 2025

Apologies - my merge from main turned into a rebase and munged this PR up. I'll close this and open a new PR to keep things clean.

Moved to #3679

@devmonkey22 devmonkey22 closed this Aug 5, 2025
@devmonkey22 devmonkey22 deleted the 3486-tornado-server-duration branch August 5, 2025 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tornado instrumentation HTTP_SERVER_DURATION metric is inaccurate and not async-safe