[Bug]: Data race in GraphCycles during concurrent gRPC thread initialization (ThreadSanitizer) #2038

@jnillius

Describe the issue

ThreadSanitizer reports a data race in libabsl_graphcycles_internal.so.2401.0.0 during OpenTelemetry OTLP gRPC metrics exporter initialization, when multiple gRPC worker threads are started concurrently.

The observed stack summary points to memmove / std::copy_n on int* buffers inside Abseil's graphcycles internals:

WARNING: ThreadSanitizer: data race
  Read of size 8 ... #0 memmove (libtsan.so.2)
                    ... #2 std::__copy_move_a2<...> (libabsl_graphcycles_internal.so.2401.0.0)
  Previous write of size 8 ... #0 memmove (libtsan.so.2)
                             ... #2 std::__copy_move_a2<...> (libabsl_graphcycles_internal.so.2401.0.0)

  Thread creation:
    grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(...) (libgpr.so.39)

This is reproducible under load and currently requires a TSan suppression on our side to keep CI signal clean.

Steps to reproduce the problem

  1. Build a C++ binary with ThreadSanitizer enabled (for example -fsanitize=thread -fno-omit-frame-pointer -g) and with OpenTelemetry OTLP gRPC metrics exporter enabled.
  2. At startup, initialize telemetry and create/start the OTLP gRPC metrics exporter so gRPC spins up multiple internal threads.
  3. Run repeatedly (or in CI parallel load) until TSan reports a race in libabsl_graphcycles_internal.so with memmove / std::__copy_move_a2 frames.
  4. Optional temporary workaround used downstream (an entry in a TSan suppressions file):
    race:libabsl_graphcycles_internal.so
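
For step 3, a start-gate harness along these lines may make the "run repeatedly under load" part easier to drive. It is only a sketch using the standard library: `RunConcurrentStartup` is a hypothetical helper, not part of Abseil, gRPC, or OpenTelemetry, and the `work` callback is a placeholder for whatever per-thread startup code is under test.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: runs `work(thread_index)` on `num_threads` threads that
// are all released at the same moment, and repeats that for `rounds` rounds.
// Returns the total number of completed calls to `work`.
int RunConcurrentStartup(int num_threads, int rounds,
                         const std::function<void(int)>& work) {
  assert(num_threads > 0 && rounds >= 0);
  std::atomic<int> calls{0};
  for (int r = 0; r < rounds; ++r) {
    std::atomic<bool> go{false};  // start gate for this round
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
      threads.emplace_back([&, t] {
        // Spin until every thread has been created, so the bodies overlap.
        while (!go.load(std::memory_order_acquire)) {
          std::this_thread::yield();
        }
        work(t);
        calls.fetch_add(1, std::memory_order_relaxed);
      });
    }
    go.store(true, std::memory_order_release);  // release all threads at once
    for (auto& th : threads) th.join();
  }
  return calls.load();
}
```

Built with the same -fsanitize=thread flags as step 1, the callback would wrap the initialization path being investigated. The start gate only raises the chance that thread bodies overlap; it does not guarantee that a given race is observed on any particular run.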
    

What version of Abseil are you using?

2401.0.0 (2024 Q1 LTS). We also inspected absl/synchronization/internal/graphcycles.cc on main at commit b85d1690 and did not find relevant synchronization changes.

What operating system and version are you using?

RHEL 9 (x86_64)

What compiler and version are you using?

GCC 12.2 (gcc-toolset-12), with ThreadSanitizer (-fsanitize=thread)

What build system are you using?

Maven (top-level orchestration) with CMake-based native C++ modules.

Additional context

  • Based on stack signatures and source inspection, this appears related to graphcycles growth paths (Vec<T>::Grow() / NodeSet::Grow()) that use std::copy_n/memmove.
  • We understand graphcycles is used by Abseil mutex deadlock detection and may be best-effort behavior; however, this still produces actionable TSan race reports for downstream users.
  • If useful, we can provide the full raw TSan log from a failing CI run (including all thread creation stacks) and a reduced reproducer from our telemetry startup path.
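
To illustrate the kind of growth path we mean in the first bullet, here is a minimal sketch of a growable int buffer whose growth step copies with std::copy_n. `GrowableInts` and `FillConcurrently` are hypothetical names written for this report; this is not Abseil's Vec<T> or NodeSet, and it makes no claim about where, or whether, synchronization is missing in graphcycles. It only shows that a copy during growth is the access TSan would attribute to memmove, and that the accesses are ordered when every caller holds the same lock.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a growable int buffer (NOT Abseil's Vec<T>).
class GrowableInts {
 public:
  void PushBack(int v) {
    if (size_ == capacity_) Grow(size_ + 1);
    data_[size_++] = v;
  }
  std::size_t size() const { return size_; }
  int operator[](std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }

 private:
  // Reallocates to at least `n` slots and copies the old contents over.
  // For int elements, std::copy_n typically lowers to memmove.
  void Grow(std::size_t n) {
    std::size_t new_cap = capacity_ == 0 ? 8 : capacity_;
    while (new_cap < n) new_cap *= 2;
    std::unique_ptr<int[]> bigger(new int[new_cap]);
    if (size_ > 0) std::copy_n(data_.get(), size_, bigger.get());
    data_ = std::move(bigger);
    capacity_ = new_cap;
  }

  std::unique_ptr<int[]> data_;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};

// Appends `per_thread` values from each of `num_threads` threads, holding one
// shared mutex around every access. Returns the final element count.
std::size_t FillConcurrently(int num_threads, int per_thread) {
  GrowableInts ints;
  std::mutex mu;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&, t] {
      for (int i = 0; i < per_thread; ++i) {
        std::lock_guard<std::mutex> lock(mu);  // orders PushBack and Grow
        ints.PushBack(t * per_thread + i);
      }
    });
  }
  for (auto& th : threads) th.join();
  return ints.size();
}
```

Removing the std::lock_guard line would turn this into a genuine data race on the buffer, which is the pattern the reported memmove frames resemble from the outside.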
