Implement Failed Test Replay #9214

Open · wants to merge 22 commits into master from daniel.mohedano/failed-test-replay

Conversation

Contributor

@daniel-mohedano daniel-mohedano commented Jul 22, 2025

What Does This Do

Implements Test Optimization's Failed Test Replay on top of Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic creates a probe for the exception thrown (for a test this is usually an assertion error, but it is not limited to that). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If a snapshot is captured, it is sent as a log to Datadog. The following modifications were made to Exception Replay's original implementation:

  • Exception Replay is enabled if Failed Test Replay is enabled by the user.
    • If the build system (Maven or Gradle) is instrumented, the property is propagated to the child process
    • If running in headless mode, without build system instrumentation, Failed Test Replay is marked as active through Config. Exception Replay is now enabled when either its own property is marked as enabled or Failed Test Replay is marked as active (a minimal sketch of this check follows this list). This works because CiVisibility's system is initialized before Live Debugger's.
  • DefaultExceptionDebugger was modified to support Failed Test Replay. When operating in Failed Test Replay mode it will:
    • Instrument Errors, which were previously ignored.
    • Ignore the maximum number of exceptions per second limit.
    • Ignore the exception capturing cooldown.
    • Apply the instrumentation synchronously. Failed test retries can be performed in rapid succession, and the asynchronous approach meant that most of the time the instrumentation was not applied before the next test failure. This has also been added as a separate configuration so it can be used in regular Exception Replay.
  • Adds a product field to snapshots, populated with test_optimization if Failed Test Replay was marked as active. This gives us the option of not billing customers for logs generated by the product.
  • Removed Live Debugger's dependency on Remote Config being enabled for its configuration to be initialized.
  • Exception Replay now supports Agentless mode when Failed Test Replay is enabled. If DD_CIVISIBILITY_AGENTLESS_ENABLED is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.
  • If Failed Test Replay is enabled, a TestListener is registered to flush DebuggerSink on test suite end, avoiding unsent snapshots.
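
A minimal, self-contained sketch of the enablement check and the synchronous probe installation described above. All names below are illustrative placeholders, not the actual identifiers used in dd-trace-java or introduced by this PR:

// Illustrative sketch only: names and structure are placeholders for the real classes.
final class FailedTestReplaySketch {

  interface ReplayConfig {
    boolean exceptionReplayEnabled();   // Exception Replay property set by the user
    boolean failedTestReplayActive();   // marked active by CiVisibility in headless mode
    boolean applyConfigAsync();         // Exception Replay's pre-existing asynchronous behaviour
  }

  // Exception Replay runs either because it was enabled explicitly or because
  // Failed Test Replay was marked as active.
  static boolean exceptionReplayShouldRun(ReplayConfig cfg) {
    return cfg.exceptionReplayEnabled() || cfg.failedTestReplayActive();
  }

  // Retries can happen in rapid succession, so in Failed Test Replay mode the probe
  // configuration is applied synchronously to be sure it is installed before the retry.
  static void applyProbeConfiguration(ReplayConfig cfg, Runnable applyExceptionConfiguration) {
    if (cfg.failedTestReplayActive() || !cfg.applyConfigAsync()) {
      applyExceptionConfiguration.run();
    } else {
      // Regular Exception Replay keeps applying the configuration asynchronously.
      java.util.concurrent.CompletableFuture.runAsync(applyExceptionConfiguration);
    }
  }
}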

Additional changes:

  • Refactored BackendApiFactory.Intake into a standalone Intake, since it is useful for computing agentless-mode URLs.
  • Updated libraries capabilities to add failed_test_replay in test frameworks that support Auto Test Retries.
  • Other changes related to adding di_enabled to the Settings response and telemetry.

Validation:

  • MavenSmokeTest now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.
  • Implemented JUnitConsoleSmokeTest to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.
  • Both smoke tests also validate:
    • Tests that do not have an Auto Test Retries execution strategy will not have probes installed.
    • Snapshot data is captured for all test retries and not limited to the first one.

Motivation

Test Optimization wants to improve support for Failed Test Replay by implementing it in additional languages beyond JS.

Contributor Checklist

Jira ticket: SDTEST-2242

@daniel-mohedano daniel-mohedano added type: enhancement Enhancements and improvements tag: do not merge Do not merge changes comp: ci visibility Continuous Integration Visibility comp: debugger Dynamic Instrumentation labels Jul 22, 2025

pr-commenter bot commented Jul 23, 2025

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1755783970 1755784316
end_time 2025-08-21T13:47:31 2025-08-21T13:53:17
git_branch master daniel.mohedano/failed-test-replay
git_commit_sha 9aad755 49eaebf
start_time 2025-08-21T13:46:11 2025-08-21T13:51:57
See matching parameters
Baseline Candidate
ci_job_id 1091974999 1091974999
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1755783414 1755783414

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

Unchanged results:
  • scenario:noprobe
    Δ mean agg_http_req_duration_min: unstable [-29.120µs; +18.595µs] or [-10.303%; +6.579%]
    Δ mean agg_http_req_duration_p50: unstable [-40.439µs; +25.286µs] or [-12.485%; +7.807%]
    Δ mean agg_http_req_duration_p75: unstable [-53.570µs; +34.431µs] or [-15.787%; +10.147%]
    Δ mean agg_http_req_duration_p99: unstable [-18.786µs; +222.563µs] or [-1.935%; +22.930%]
    Δ mean throughput: same
  • scenario:basic
    Δ mean agg_http_req_duration_min: same
    Δ mean agg_http_req_duration_p50: same
    Δ mean agg_http_req_duration_p75: same
    Δ mean agg_http_req_duration_p99: unstable [+38.458µs; +248.830µs] or [+5.270%; +34.098%]
    Δ mean throughput: unstable [-179.662op/s; +179.662op/s] or [-6.827%; +6.827%]
  • scenario:loop
    Δ mean agg_http_req_duration_min: unsure [-12.194µs; -5.690µs] or [-0.137%; -0.064%]
    Δ mean agg_http_req_duration_p50: unsure [-16.975µs; -7.170µs] or [-0.189%; -0.080%]
    Δ mean agg_http_req_duration_p75: unsure [-19.183µs; -8.669µs] or [-0.213%; -0.096%]
    Δ mean agg_http_req_duration_p99: same
    Δ mean throughput: same
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (323.895 µs) : 286, 362
.   : milestone, 324,
basic (279.943 µs) : 273, 287
.   : milestone, 280,
loop (8.971 ms) : 8966, 8976
.   : milestone, 8971,
section candidate
noprobe (316.318 µs) : 293, 339
.   : milestone, 316,
basic (278.618 µs) : 272, 285
.   : milestone, 279,
loop (8.959 ms) : 8955, 8963
.   : milestone, 8959,
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 323.895 µs [286.025 µs, 361.765 µs]
basic 279.943 µs [273.38 µs, 286.507 µs]
loop 8.971 ms [8.966 ms, 8.976 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 316.318 µs [293.442 µs, 339.194 µs]
basic 278.618 µs [272.482 µs, 284.753 µs]
loop 8.959 ms [8.955 ms, 8.963 ms]


pr-commenter bot commented Jul 23, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1755785130 1755785130
ci_job_id 1091974992 1091974992
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-jn9tj0x5 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-jn9tj0x5 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 13 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053200
Total [baseline] (8.641 s) : 0, 8640587
Agent [candidate] (1.045 s) : 0, 1044926
Total [candidate] (8.6 s) : 0, 8600380
section iast
Agent [baseline] (1.181 s) : 0, 1180830
Total [baseline] (9.349 s) : 0, 9348833
Agent [candidate] (1.177 s) : 0, 1176598
Total [candidate] (9.338 s) : 0, 9337754
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.053 s -
Agent iast 1.181 s 127.63 ms (12.1%)
Total tracing 8.641 s -
Total iast 9.349 s 708.246 ms (8.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.045 s -
Agent iast 1.177 s 131.673 ms (12.6%)
Total tracing 8.6 s -
Total iast 9.338 s 737.374 ms (8.6%)
gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.465 ms) : 0, 1465
BytebuddyAgent [baseline] (737.181 ms) : 0, 737181
BytebuddyAgent [candidate] (732.091 ms) : 0, 732091
GlobalTracer [baseline] (244.604 ms) : 0, 244604
GlobalTracer [candidate] (242.152 ms) : 0, 242152
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.183 ms) : 0, 30183
Debugger [baseline] (6.096 ms) : 0, 6096
Debugger [candidate] (6.045 ms) : 0, 6045
Remote Config [baseline] (687.976 µs) : 0, 688
Remote Config [candidate] (674.735 µs) : 0, 675
Telemetry [baseline] (11.608 ms) : 0, 11608
Telemetry [candidate] (11.326 ms) : 0, 11326
section iast
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.447 ms) : 0, 1447
BytebuddyAgent [baseline] (851.971 ms) : 0, 851971
BytebuddyAgent [candidate] (848.66 ms) : 0, 848660
GlobalTracer [baseline] (234.592 ms) : 0, 234592
GlobalTracer [candidate] (232.644 ms) : 0, 232644
AppSec [baseline] (26.908 ms) : 0, 26908
AppSec [candidate] (26.038 ms) : 0, 26038
Debugger [baseline] (6.564 ms) : 0, 6564
Debugger [candidate] (7.588 ms) : 0, 7588
Remote Config [baseline] (607.247 µs) : 0, 607
Remote Config [candidate] (612.559 µs) : 0, 613
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.358 ms) : 0, 8358
IAST [baseline] (29.358 ms) : 0, 29358
IAST [candidate] (30.306 ms) : 0, 30306
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1049624
Total [baseline] (10.85 s) : 0, 10849923
Agent [candidate] (1.05 s) : 0, 1049653
Total [candidate] (10.733 s) : 0, 10733366
section appsec
Agent [baseline] (1.231 s) : 0, 1231332
Total [baseline] (10.889 s) : 0, 10889016
Agent [candidate] (1.233 s) : 0, 1233172
Total [candidate] (10.806 s) : 0, 10806078
section iast
Agent [baseline] (1.18 s) : 0, 1180497
Total [baseline] (10.886 s) : 0, 10885645
Agent [candidate] (1.181 s) : 0, 1181286
Total [candidate] (10.972 s) : 0, 10971634
section profiling
Agent [baseline] (1.196 s) : 0, 1195745
Total [baseline] (10.903 s) : 0, 10903246
Agent [candidate] (1.195 s) : 0, 1194558
Total [candidate] (10.904 s) : 0, 10904434
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent appsec 1.231 s 181.709 ms (17.3%)
Agent iast 1.18 s 130.873 ms (12.5%)
Agent profiling 1.196 s 146.122 ms (13.9%)
Total tracing 10.85 s -
Total appsec 10.889 s 39.093 ms (0.4%)
Total iast 10.886 s 35.722 ms (0.3%)
Total profiling 10.903 s 53.323 ms (0.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent appsec 1.233 s 183.518 ms (17.5%)
Agent iast 1.181 s 131.632 ms (12.5%)
Agent profiling 1.195 s 144.905 ms (13.8%)
Total tracing 10.733 s -
Total appsec 10.806 s 72.711 ms (0.7%)
Total iast 10.972 s 238.268 ms (2.2%)
Total profiling 10.904 s 171.067 ms (1.6%)
gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (733.983 ms) : 0, 733983
BytebuddyAgent [candidate] (733.588 ms) : 0, 733588
GlobalTracer [baseline] (244.242 ms) : 0, 244242
GlobalTracer [candidate] (242.936 ms) : 0, 242936
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.098 ms) : 0, 30098
Debugger [baseline] (6.1 ms) : 0, 6100
Debugger [candidate] (6.053 ms) : 0, 6053
Remote Config [baseline] (677.511 µs) : 0, 678
Remote Config [candidate] (672.866 µs) : 0, 673
Telemetry [baseline] (11.587 ms) : 0, 11587
Telemetry [candidate] (13.695 ms) : 0, 13695
section appsec
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.48 ms) : 0, 1480
BytebuddyAgent [baseline] (760.282 ms) : 0, 760282
BytebuddyAgent [candidate] (761.561 ms) : 0, 761561
GlobalTracer [baseline] (237.004 ms) : 0, 237004
GlobalTracer [candidate] (237.799 ms) : 0, 237799
AppSec [baseline] (170.252 ms) : 0, 170252
AppSec [candidate] (169.948 ms) : 0, 169948
Debugger [baseline] (7.443 ms) : 0, 7443
Debugger [candidate] (6.576 ms) : 0, 6576
Remote Config [baseline] (652.821 µs) : 0, 653
Remote Config [candidate] (645.144 µs) : 0, 645
Telemetry [baseline] (9.247 ms) : 0, 9247
Telemetry [candidate] (10.11 ms) : 0, 10110
IAST [baseline] (23.754 ms) : 0, 23754
IAST [candidate] (23.762 ms) : 0, 23762
section iast
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.461 ms) : 0, 1461
BytebuddyAgent [baseline] (851.843 ms) : 0, 851843
BytebuddyAgent [candidate] (852.233 ms) : 0, 852233
GlobalTracer [baseline] (234.423 ms) : 0, 234423
GlobalTracer [candidate] (233.675 ms) : 0, 233675
AppSec [baseline] (26.777 ms) : 0, 26777
AppSec [candidate] (26.923 ms) : 0, 26923
Debugger [baseline] (6.621 ms) : 0, 6621
Debugger [candidate] (7.487 ms) : 0, 7487
Remote Config [baseline] (598.757 µs) : 0, 599
Remote Config [candidate] (594.26 µs) : 0, 594
Telemetry [baseline] (8.24 ms) : 0, 8240
Telemetry [candidate] (8.403 ms) : 0, 8403
IAST [baseline] (29.338 ms) : 0, 29338
IAST [candidate] (29.347 ms) : 0, 29347
section profiling
crashtracking [baseline] (1.414 ms) : 0, 1414
crashtracking [candidate] (1.42 ms) : 0, 1420
BytebuddyAgent [baseline] (761.342 ms) : 0, 761342
BytebuddyAgent [candidate] (760.042 ms) : 0, 760042
GlobalTracer [baseline] (222.388 ms) : 0, 222388
GlobalTracer [candidate] (221.786 ms) : 0, 221786
AppSec [baseline] (29.973 ms) : 0, 29973
AppSec [candidate] (29.945 ms) : 0, 29945
Debugger [baseline] (7.07 ms) : 0, 7070
Debugger [candidate] (6.297 ms) : 0, 6297
Remote Config [baseline] (710.31 µs) : 0, 710
Remote Config [candidate] (709.172 µs) : 0, 709
Telemetry [baseline] (15.518 ms) : 0, 15518
Telemetry [candidate] (16.325 ms) : 0, 16325
ProfilingAgent [baseline] (107.7 ms) : 0, 107700
ProfilingAgent [candidate] (108.284 ms) : 0, 108284
Profiling [baseline] (108.348 ms) : 0, 108348
Profiling [candidate] (108.922 ms) : 0, 108922

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1755784886 1755784886
ci_job_id 1091974993 1091974993
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-hlwa0p8r 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-hlwa0p8r 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 3 performance regressions! Performance is the same for 8 metrics, 12 unstable metrics.

  • scenario:load:insecure-bank:iast_GLOBAL:high_load
    Δ mean http_req_duration: worse, [+299.577µs; +705.308µs] or [+2.927%; +6.892%]
    Δ mean throughput: unstable, [-92.711op/s; +24.291op/s] or [-20.424%; +5.351%]
    candidate mean: 10.736ms http_req_duration, 419.727op/s throughput
    baseline mean: 10.234ms http_req_duration, 453.938op/s throughput
  • scenario:load:insecure-bank:tracing:high_load
    Δ mean http_req_duration: worse, [+266.214µs; +504.126µs] or [+3.580%; +6.780%]
    Δ mean throughput: unstable, [-108.537op/s; +47.787op/s] or [-17.433%; +7.675%]
    candidate mean: 7.821ms http_req_duration, 592.219op/s throughput
    baseline mean: 7.435ms http_req_duration, 622.594op/s throughput
  • scenario:load:petclinic:code_origins:high_load
    Δ mean http_req_duration: better, [-2.188ms; -1.375ms] or [-4.854%; -3.050%]
    Δ mean throughput: unstable, [-4.395op/s; +10.242op/s] or [-4.182%; +9.745%]
    candidate mean: 43.291ms http_req_duration, 108.025op/s throughput
    baseline mean: 45.072ms http_req_duration, 105.101op/s throughput
  • scenario:load:petclinic:tracing:high_load
    Δ mean http_req_duration: worse, [+1.199ms; +2.037ms] or [+2.695%; +4.577%]
    Δ mean throughput: unstable, [-10.416op/s; +5.588op/s] or [-10.028%; +5.380%]
    candidate mean: 46.116ms http_req_duration, 101.450op/s throughput
    baseline mean: 44.498ms http_req_duration, 103.864op/s throughput
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.295 ms) : 4247, 4343
.   : milestone, 4295,
iast (9.122 ms) : 8974, 9269
.   : milestone, 9122,
iast_FULL (13.791 ms) : 13515, 14068
.   : milestone, 13791,
iast_GLOBAL (10.234 ms) : 10055, 10413
.   : milestone, 10234,
profiling (9.177 ms) : 8992, 9362
.   : milestone, 9177,
tracing (7.435 ms) : 7328, 7543
.   : milestone, 7435,
section candidate
no_agent (4.322 ms) : 4274, 4371
.   : milestone, 4322,
iast (9.191 ms) : 9042, 9339
.   : milestone, 9191,
iast_FULL (14.065 ms) : 13790, 14340
.   : milestone, 14065,
iast_GLOBAL (10.736 ms) : 10538, 10934
.   : milestone, 10736,
profiling (9.289 ms) : 9132, 9446
.   : milestone, 9289,
tracing (7.821 ms) : 7707, 7934
.   : milestone, 7821,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.295 ms [4.247 ms, 4.343 ms] -
iast 9.122 ms [8.974 ms, 9.269 ms] 4.827 ms (112.4%)
iast_FULL 13.791 ms [13.515 ms, 14.068 ms] 9.496 ms (221.1%)
iast_GLOBAL 10.234 ms [10.055 ms, 10.413 ms] 5.939 ms (138.3%)
profiling 9.177 ms [8.992 ms, 9.362 ms] 4.882 ms (113.7%)
tracing 7.435 ms [7.328 ms, 7.543 ms] 3.14 ms (73.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.322 ms [4.274 ms, 4.371 ms] -
iast 9.191 ms [9.042 ms, 9.339 ms] 4.868 ms (112.6%)
iast_FULL 14.065 ms [13.79 ms, 14.34 ms] 9.742 ms (225.4%)
iast_GLOBAL 10.736 ms [10.538 ms, 10.934 ms] 6.414 ms (148.4%)
profiling 9.289 ms [9.132 ms, 9.446 ms] 4.967 ms (114.9%)
tracing 7.821 ms [7.707 ms, 7.934 ms] 3.498 ms (80.9%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.27 ms) : 36965, 37575
.   : milestone, 37270,
appsec (47.289 ms) : 46852, 47725
.   : milestone, 47289,
code_origins (45.072 ms) : 44687, 45457
.   : milestone, 45072,
iast (45.508 ms) : 45115, 45902
.   : milestone, 45508,
profiling (48.595 ms) : 48160, 49030
.   : milestone, 48595,
tracing (44.498 ms) : 44122, 44873
.   : milestone, 44498,
section candidate
no_agent (36.277 ms) : 35982, 36571
.   : milestone, 36277,
appsec (47.65 ms) : 47236, 48063
.   : milestone, 47650,
code_origins (43.291 ms) : 42920, 43662
.   : milestone, 43291,
iast (45.965 ms) : 45576, 46354
.   : milestone, 45965,
profiling (47.437 ms) : 46954, 47921
.   : milestone, 47437,
tracing (46.116 ms) : 45713, 46518
.   : milestone, 46116,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.27 ms [36.965 ms, 37.575 ms] -
appsec 47.289 ms [46.852 ms, 47.725 ms] 10.019 ms (26.9%)
code_origins 45.072 ms [44.687 ms, 45.457 ms] 7.802 ms (20.9%)
iast 45.508 ms [45.115 ms, 45.902 ms] 8.238 ms (22.1%)
profiling 48.595 ms [48.16 ms, 49.03 ms] 11.325 ms (30.4%)
tracing 44.498 ms [44.122 ms, 44.873 ms] 7.228 ms (19.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.277 ms [35.982 ms, 36.571 ms] -
appsec 47.65 ms [47.236 ms, 48.063 ms] 11.373 ms (31.4%)
code_origins 43.291 ms [42.92 ms, 43.662 ms] 7.014 ms (19.3%)
iast 45.965 ms [45.576 ms, 46.354 ms] 9.688 ms (26.7%)
profiling 47.437 ms [46.954 ms, 47.921 ms] 11.161 ms (30.8%)
tracing 46.116 ms [45.713 ms, 46.518 ms] 9.839 ms (27.1%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1755785392 1755785392
ci_job_id 1091974994 1091974994
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-wvbqpjz8 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-wvbqpjz8 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.483 ms) : 1471, 1495
.   : milestone, 1483,
appsec (3.636 ms) : 3423, 3849
.   : milestone, 3636,
iast (2.214 ms) : 2151, 2277
.   : milestone, 2214,
iast_GLOBAL (2.254 ms) : 2191, 2318
.   : milestone, 2254,
profiling (2.057 ms) : 2007, 2108
.   : milestone, 2057,
tracing (2.027 ms) : 1978, 2076
.   : milestone, 2027,
section candidate
no_agent (1.483 ms) : 1472, 1495
.   : milestone, 1483,
appsec (3.61 ms) : 3398, 3822
.   : milestone, 3610,
iast (2.216 ms) : 2153, 2279
.   : milestone, 2216,
iast_GLOBAL (2.255 ms) : 2192, 2318
.   : milestone, 2255,
profiling (2.051 ms) : 2001, 2102
.   : milestone, 2051,
tracing (2.019 ms) : 1970, 2067
.   : milestone, 2019,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.483 ms [1.471 ms, 1.495 ms] -
appsec 3.636 ms [3.423 ms, 3.849 ms] 2.153 ms (145.2%)
iast 2.214 ms [2.151 ms, 2.277 ms] 731.222 µs (49.3%)
iast_GLOBAL 2.254 ms [2.191 ms, 2.318 ms] 771.529 µs (52.0%)
profiling 2.057 ms [2.007 ms, 2.108 ms] 574.192 µs (38.7%)
tracing 2.027 ms [1.978 ms, 2.076 ms] 544.048 µs (36.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.483 ms [1.472 ms, 1.495 ms] -
appsec 3.61 ms [3.398 ms, 3.822 ms] 2.127 ms (143.4%)
iast 2.216 ms [2.153 ms, 2.279 ms] 733.034 µs (49.4%)
iast_GLOBAL 2.255 ms [2.192 ms, 2.318 ms] 772.1 µs (52.1%)
profiling 2.051 ms [2.001 ms, 2.102 ms] 568.299 µs (38.3%)
tracing 2.019 ms [1.97 ms, 2.067 ms] 535.482 µs (36.1%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.534 s) : 15534000, 15534000
.   : milestone, 15534000,
appsec (14.807 s) : 14807000, 14807000
.   : milestone, 14807000,
iast (18.845 s) : 18845000, 18845000
.   : milestone, 18845000,
iast_GLOBAL (18.14 s) : 18140000, 18140000
.   : milestone, 18140000,
profiling (15.246 s) : 15246000, 15246000
.   : milestone, 15246000,
tracing (15.045 s) : 15045000, 15045000
.   : milestone, 15045000,
section candidate
no_agent (15.4 s) : 15400000, 15400000
.   : milestone, 15400000,
appsec (14.843 s) : 14843000, 14843000
.   : milestone, 14843000,
iast (18.945 s) : 18945000, 18945000
.   : milestone, 18945000,
iast_GLOBAL (18.033 s) : 18033000, 18033000
.   : milestone, 18033000,
profiling (15.263 s) : 15263000, 15263000
.   : milestone, 15263000,
tracing (14.955 s) : 14955000, 14955000
.   : milestone, 14955000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.534 s [15.534 s, 15.534 s] -
appsec 14.807 s [14.807 s, 14.807 s] -727.0 ms (-4.7%)
iast 18.845 s [18.845 s, 18.845 s] 3.311 s (21.3%)
iast_GLOBAL 18.14 s [18.14 s, 18.14 s] 2.606 s (16.8%)
profiling 15.246 s [15.246 s, 15.246 s] -288.0 ms (-1.9%)
tracing 15.045 s [15.045 s, 15.045 s] -489.0 ms (-3.1%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.4 s [15.4 s, 15.4 s] -
appsec 14.843 s [14.843 s, 14.843 s] -557.0 ms (-3.6%)
iast 18.945 s [18.945 s, 18.945 s] 3.545 s (23.0%)
iast_GLOBAL 18.033 s [18.033 s, 18.033 s] 2.633 s (17.1%)
profiling 15.263 s [15.263 s, 15.263 s] -137.0 ms (-0.9%)
tracing 14.955 s [14.955 s, 14.955 s] -445.0 ms (-2.9%)


datadog-official bot commented Aug 11, 2025

🎯 Code Coverage
Patch Coverage: 45.59%
Total Coverage: 57.32% (-0.03%)

View detailed report


@daniel-mohedano daniel-mohedano changed the title Failed Test Replay Implement Failed Test Replay Aug 12, 2025
@daniel-mohedano daniel-mohedano removed the tag: do not merge Do not merge changes label Aug 13, 2025
@daniel-mohedano daniel-mohedano marked this pull request as ready for review August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from a team as code owners August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from shatzi, Mariovido, bric3 and PerfectSlayer and removed request for a team August 20, 2025 10:07
return getFinalDebuggerBaseUrl() + "/debugger/v1/input";
if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) {
return dynamicInstrumentationSnapshotUrl;
} else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) {
Contributor

I wonder if this condition should be isCiVisibilityAgentlessEnabled alone: if we're running in agentless mode chances are there's no agent to connect to. I understand that debugger wasn't working in agentless before these changes, but why not fix it while we're at it

@@ -1044,6 +1051,7 @@ public static String getHostName() {
private final boolean DBMTracePreparedStatements;

private final boolean dynamicInstrumentationEnabled;
private final String dynamicInstrumentationSnapshotUrl;
Contributor

Is this needed for our testing or for customer set ups where a custom URL (a proxy?) needs to be used to communicate with DD?

Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.

@@ -1029,6 +1034,8 @@ public static String getHostName() {
private final String gitPullRequestBaseBranch;
private final String gitPullRequestBaseBranchSha;
private final String gitCommitHeadSha;
private final boolean ciVisibilityFailedTestReplayEnabled;
private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI
Contributor

It needs to be volatile if we choose to keep it :)

Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.

I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig.

An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us
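
For reference, keeping the field would mean declaring it volatile so the write done by CiVisibility is guaranteed to be visible to the debugger's threads; a minimal sketch of that change (the field itself is the one in the diff above):

// volatile added so the flag set by CiVisibility is visible across threads (sketch of the suggestion)
private volatile boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI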

Contributor

I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team

Contributor Author

From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍

Contributor

As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former

if (t instanceof Error) {
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Skip handling error: {}", t.toString());
if (isFailedTestReplayActive) {
Contributor

I wonder if we can get away without propagating this flag, and just check the test strategy

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
Contributor

I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?

Contributor Author

In my opinion I think it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries regarding the overhead FTR could introduce, but given that right now we only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with JS' implementation

Contributor

As discussed offline, let's add a dedicated method to TestExecutionPolicy that'll determine whether FTR is enabled for a given test
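
A rough illustration of the dedicated method suggested above (hypothetical name and documentation, not the agreed-upon API):

public interface TestExecutionPolicy {
  // ... existing methods ...

  /**
   * Hypothetical addition: whether Failed Test Replay should instrument exceptions for the
   * current execution, for example only when the retry is driven by Auto Test Retries.
   */
  boolean failedTestReplayApplicable();
}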

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString());
Contributor

You can use datadog.trace.civisibility.domain.TestImpl#context to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well

Contributor

Also, can we call testExecutionHistory.currentExecutionRetryReason() and just store the result of that? Doing instanceof is breaking encapsulation.

Retry reason will be null for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?

Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:

  1. First run fails, probe is created and exception instrumented
  2. Test is retried, fails again, the context information from the probe is captured and sent.

Maybe a testExecutionHistory.retryReason() could have been a better approach to avoid the breaking of encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy I agree it is a much cleaner approach.

@@ -632,6 +632,7 @@ public void execute() {
}

maybeStartAppSec(scoClass, sco);
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode
Contributor

If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞

@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess(
Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED),
Boolean.toString(executionSettings.getTestManagementSettings().isEnabled()));

propagatedSystemProperties.put(
Strings.propertyNameToSystemPropertyName(
Contributor

Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED

Contributor Author

I think this could cause problems because we use the ..._ENABLED settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false and CI_VISIBILITY_ENABLED==true would mean that FTR wouldn't be enabled (even if it was marked as enabled by the backend)

Contributor

As discussed offline let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled
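
Expressed as code, the derived behaviour would look roughly like this (accessor names are illustrative, not necessarily the real Config getters):

// Sketch: no dedicated Failed Test Replay setting; the Test Optimization specific debugger
// behaviour is derived from the two existing switches.
static boolean testOptimizationDebuggerBehaviour(Config config) {
  return config.isExceptionReplayEnabled() && config.isCiVisibilityEnabled();
}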

@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) {
exceptionProbeManager.createProbesForException(
throwable.getStackTrace(), chainedExceptionIdx);
if (creationResult.probesCreated > 0) {
AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint));
if (isFailedTestReplayActive || !applyConfigAsync) {
Contributor

Can we get rid of applyConfigAsync and the corresponding config field/methods?

Contributor Author

The applyConfigAsync config was also added from #9186, to let Exception Replay apply the instrumentation synchronously even without Failed Test Replay

@Override
public void beforeSuiteEnd() {
LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink");
sink.lowRateFlush(sink);
Contributor

I think debugger is already doing this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop that is called from the shutdown hook.
