Implement Failed Test Replay #9214

Open · wants to merge 22 commits into master from daniel.mohedano/failed-test-replay

Conversation

Contributor

@daniel-mohedano daniel-mohedano commented Jul 22, 2025

What Does This Do

Implements Test Optimization's Failed Test Replay on top of Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic creates a probe for the exception thrown (for a test this is usually an assertion error, but it is not limited to that). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If a snapshot is captured, it is sent as a log to Datadog. The following modifications were made to Exception Replay's original implementation:

  • Exception Replay is enabled if Failed Test Replay is enabled by the user.
    • If the build system (Maven or Gradle) is instrumented, the property is propagated to the child process
    • If running in headless mode, without build system instrumentation, Failed Test Replay is marked as active through Config. Exception Replay is now enabled when either its own property is marked as enabled or Failed Test Replay is marked as active (a minimal sketch of this check follows this list). This works because CiVisibility's system is initialized before Live Debugger's.
  • DefaultExceptionDebugger was modified to support Failed Test Replay. When operating in Failed Test Replay mode it will:
    • Instrument Errors, which were previously ignored.
    • Ignore the maximum number of exceptions per second limit.
    • Ignore the exception capturing cooldown.
    • Apply the instrumentation synchronously. Failed test retries can be performed in rapid succession, and the asynchronous approach meant that most of the time the instrumentation was not applied before the next test failure. This has also been added as a separate configuration so it can be used in regular Exception Replay.
  • Adds a product field to snapshots, populated with test_optimization if Failed Test Replay was marked as active. This gives us the option of not billing customers for logs generated by the product.
  • Removed Live Debugger's dependency on Remote Config being enabled for its configuration to be initialized.
  • Exception Replay now supports Agentless mode when Failed Test Replay is enabled. If DD_CIVISIBILITY_AGENTLESS_ENABLED is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.
  • If Failed Test Replay is enabled, a TestListener is registered to flush DebuggerSink on test suite end, avoiding unsent snapshots.
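
A minimal, self-contained sketch of the enablement check and the synchronous probe installation described above. All names below are illustrative placeholders, not the actual identifiers used in dd-trace-java or introduced by this PR:

// Illustrative sketch only: names and structure are placeholders for the real classes.
final class FailedTestReplaySketch {

  interface ReplayConfig {
    boolean exceptionReplayEnabled();   // Exception Replay property set by the user
    boolean failedTestReplayActive();   // marked active by CiVisibility in headless mode
    boolean applyConfigAsync();         // Exception Replay's pre-existing asynchronous behaviour
  }

  // Exception Replay runs either because it was enabled explicitly or because
  // Failed Test Replay was marked as active.
  static boolean exceptionReplayShouldRun(ReplayConfig cfg) {
    return cfg.exceptionReplayEnabled() || cfg.failedTestReplayActive();
  }

  // Retries can happen in rapid succession, so in Failed Test Replay mode the probe
  // configuration is applied synchronously to be sure it is installed before the retry.
  static void applyProbeConfiguration(ReplayConfig cfg, Runnable applyExceptionConfiguration) {
    if (cfg.failedTestReplayActive() || !cfg.applyConfigAsync()) {
      applyExceptionConfiguration.run();
    } else {
      // Regular Exception Replay keeps applying the configuration asynchronously.
      java.util.concurrent.CompletableFuture.runAsync(applyExceptionConfiguration);
    }
  }
}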

Additional changes:

  • Refactored BackendApiFactory.Intake into a standalone Intake, since it is useful for computing agentless-mode URLs.
  • Updated libraries capabilities to add failed_test_replay in test frameworks that support Auto Test Retries.
  • Other changes related to adding di_enabled to the Settings response and telemetry.

Validation:

  • MavenSmokeTest now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.
  • Implemented JUnitConsoleSmokeTest to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.
  • Both smoke tests also validate:
    • Tests that do not have an Auto Test Retries execution strategy will not have probes installed.
    • Snapshot data is captured for all test retries and not limited to the first one.

Motivation

Test Optimization wants to improve support for Failed Test Replay by implementing it in additional languages beyond JS.

Contributor Checklist

Jira ticket: SDTEST-2242

@daniel-mohedano daniel-mohedano added type: enhancement Enhancements and improvements tag: do not merge Do not merge changes comp: ci visibility Continuous Integration Visibility comp: debugger Dynamic Instrumentation labels Jul 22, 2025

pr-commenter bot commented Jul 23, 2025

Debugger benchmarks

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
ci_job_date 1755783970 1755784316
end_time 2025-08-21T13:47:31 2025-08-21T13:53:17
git_branch master daniel.mohedano/failed-test-replay
git_commit_sha 9aad755 49eaebf
start_time 2025-08-21T13:46:11 2025-08-21T13:51:57
See matching parameters
Baseline Candidate
ci_job_id 1091974999 1091974999
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
git_commit_date 1755783414 1755783414

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics.

Unchanged results:
  • scenario:noprobe
    Δ mean agg_http_req_duration_min: unstable [-29.120µs; +18.595µs] or [-10.303%; +6.579%]
    Δ mean agg_http_req_duration_p50: unstable [-40.439µs; +25.286µs] or [-12.485%; +7.807%]
    Δ mean agg_http_req_duration_p75: unstable [-53.570µs; +34.431µs] or [-15.787%; +10.147%]
    Δ mean agg_http_req_duration_p99: unstable [-18.786µs; +222.563µs] or [-1.935%; +22.930%]
    Δ mean throughput: same
  • scenario:basic
    Δ mean agg_http_req_duration_min: same
    Δ mean agg_http_req_duration_p50: same
    Δ mean agg_http_req_duration_p75: same
    Δ mean agg_http_req_duration_p99: unstable [+38.458µs; +248.830µs] or [+5.270%; +34.098%]
    Δ mean throughput: unstable [-179.662op/s; +179.662op/s] or [-6.827%; +6.827%]
  • scenario:loop
    Δ mean agg_http_req_duration_min: unsure [-12.194µs; -5.690µs] or [-0.137%; -0.064%]
    Δ mean agg_http_req_duration_p50: unsure [-16.975µs; -7.170µs] or [-0.189%; -0.080%]
    Δ mean agg_http_req_duration_p75: unsure [-19.183µs; -8.669µs] or [-0.213%; -0.096%]
    Δ mean agg_http_req_duration_p99: same
    Δ mean throughput: same
Request duration reports for reports
gantt
    title reports - request duration [CI 0.99] : candidate=None, baseline=None
    dateFormat X
    axisFormat %s
section baseline
noprobe (323.895 µs) : 286, 362
.   : milestone, 324,
basic (279.943 µs) : 273, 287
.   : milestone, 280,
loop (8.971 ms) : 8966, 8976
.   : milestone, 8971,
section candidate
noprobe (316.318 µs) : 293, 339
.   : milestone, 316,
basic (278.618 µs) : 272, 285
.   : milestone, 279,
loop (8.959 ms) : 8955, 8963
.   : milestone, 8959,
  • baseline results
Scenario Request median duration [CI 0.99]
noprobe 323.895 µs [286.025 µs, 361.765 µs]
basic 279.943 µs [273.38 µs, 286.507 µs]
loop 8.971 ms [8.966 ms, 8.976 ms]
  • candidate results
Scenario Request median duration [CI 0.99]
noprobe 316.318 µs [293.442 µs, 339.194 µs]
basic 278.618 µs [272.482 µs, 284.753 µs]
loop 8.959 ms [8.955 ms, 8.963 ms]


pr-commenter bot commented Jul 23, 2025

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1755785130 1755785130
ci_job_id 1091974992 1091974992
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-jn9tj0x5 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-jn9tj0x5 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 13 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053200
Total [baseline] (8.641 s) : 0, 8640587
Agent [candidate] (1.045 s) : 0, 1044926
Total [candidate] (8.6 s) : 0, 8600380
section iast
Agent [baseline] (1.181 s) : 0, 1180830
Total [baseline] (9.349 s) : 0, 9348833
Agent [candidate] (1.177 s) : 0, 1176598
Total [candidate] (9.338 s) : 0, 9337754
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.053 s -
Agent iast 1.181 s 127.63 ms (12.1%)
Total tracing 8.641 s -
Total iast 9.349 s 708.246 ms (8.2%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.045 s -
Agent iast 1.177 s 131.673 ms (12.6%)
Total tracing 8.6 s -
Total iast 9.338 s 737.374 ms (8.6%)
gantt
    title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.465 ms) : 0, 1465
BytebuddyAgent [baseline] (737.181 ms) : 0, 737181
BytebuddyAgent [candidate] (732.091 ms) : 0, 732091
GlobalTracer [baseline] (244.604 ms) : 0, 244604
GlobalTracer [candidate] (242.152 ms) : 0, 242152
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.183 ms) : 0, 30183
Debugger [baseline] (6.096 ms) : 0, 6096
Debugger [candidate] (6.045 ms) : 0, 6045
Remote Config [baseline] (687.976 µs) : 0, 688
Remote Config [candidate] (674.735 µs) : 0, 675
Telemetry [baseline] (11.608 ms) : 0, 11608
Telemetry [candidate] (11.326 ms) : 0, 11326
section iast
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.447 ms) : 0, 1447
BytebuddyAgent [baseline] (851.971 ms) : 0, 851971
BytebuddyAgent [candidate] (848.66 ms) : 0, 848660
GlobalTracer [baseline] (234.592 ms) : 0, 234592
GlobalTracer [candidate] (232.644 ms) : 0, 232644
AppSec [baseline] (26.908 ms) : 0, 26908
AppSec [candidate] (26.038 ms) : 0, 26038
Debugger [baseline] (6.564 ms) : 0, 6564
Debugger [candidate] (7.588 ms) : 0, 7588
Remote Config [baseline] (607.247 µs) : 0, 607
Remote Config [candidate] (612.559 µs) : 0, 613
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.358 ms) : 0, 8358
IAST [baseline] (29.358 ms) : 0, 29358
IAST [candidate] (30.306 ms) : 0, 30306
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1049624
Total [baseline] (10.85 s) : 0, 10849923
Agent [candidate] (1.05 s) : 0, 1049653
Total [candidate] (10.733 s) : 0, 10733366
section appsec
Agent [baseline] (1.231 s) : 0, 1231332
Total [baseline] (10.889 s) : 0, 10889016
Agent [candidate] (1.233 s) : 0, 1233172
Total [candidate] (10.806 s) : 0, 10806078
section iast
Agent [baseline] (1.18 s) : 0, 1180497
Total [baseline] (10.886 s) : 0, 10885645
Agent [candidate] (1.181 s) : 0, 1181286
Total [candidate] (10.972 s) : 0, 10971634
section profiling
Agent [baseline] (1.196 s) : 0, 1195745
Total [baseline] (10.903 s) : 0, 10903246
Agent [candidate] (1.195 s) : 0, 1194558
Total [candidate] (10.904 s) : 0, 10904434
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent appsec 1.231 s 181.709 ms (17.3%)
Agent iast 1.18 s 130.873 ms (12.5%)
Agent profiling 1.196 s 146.122 ms (13.9%)
Total tracing 10.85 s -
Total appsec 10.889 s 39.093 ms (0.4%)
Total iast 10.886 s 35.722 ms (0.3%)
Total profiling 10.903 s 53.323 ms (0.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.05 s -
Agent appsec 1.233 s 183.518 ms (17.5%)
Agent iast 1.181 s 131.632 ms (12.5%)
Agent profiling 1.195 s 144.905 ms (13.8%)
Total tracing 10.733 s -
Total appsec 10.806 s 72.711 ms (0.7%)
Total iast 10.972 s 238.268 ms (2.2%)
Total profiling 10.904 s 171.067 ms (1.6%)
gantt
    title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (733.983 ms) : 0, 733983
BytebuddyAgent [candidate] (733.588 ms) : 0, 733588
GlobalTracer [baseline] (244.242 ms) : 0, 244242
GlobalTracer [candidate] (242.936 ms) : 0, 242936
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.098 ms) : 0, 30098
Debugger [baseline] (6.1 ms) : 0, 6100
Debugger [candidate] (6.053 ms) : 0, 6053
Remote Config [baseline] (677.511 µs) : 0, 678
Remote Config [candidate] (672.866 µs) : 0, 673
Telemetry [baseline] (11.587 ms) : 0, 11587
Telemetry [candidate] (13.695 ms) : 0, 13695
section appsec
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.48 ms) : 0, 1480
BytebuddyAgent [baseline] (760.282 ms) : 0, 760282
BytebuddyAgent [candidate] (761.561 ms) : 0, 761561
GlobalTracer [baseline] (237.004 ms) : 0, 237004
GlobalTracer [candidate] (237.799 ms) : 0, 237799
AppSec [baseline] (170.252 ms) : 0, 170252
AppSec [candidate] (169.948 ms) : 0, 169948
Debugger [baseline] (7.443 ms) : 0, 7443
Debugger [candidate] (6.576 ms) : 0, 6576
Remote Config [baseline] (652.821 µs) : 0, 653
Remote Config [candidate] (645.144 µs) : 0, 645
Telemetry [baseline] (9.247 ms) : 0, 9247
Telemetry [candidate] (10.11 ms) : 0, 10110
IAST [baseline] (23.754 ms) : 0, 23754
IAST [candidate] (23.762 ms) : 0, 23762
section iast
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.461 ms) : 0, 1461
BytebuddyAgent [baseline] (851.843 ms) : 0, 851843
BytebuddyAgent [candidate] (852.233 ms) : 0, 852233
GlobalTracer [baseline] (234.423 ms) : 0, 234423
GlobalTracer [candidate] (233.675 ms) : 0, 233675
AppSec [baseline] (26.777 ms) : 0, 26777
AppSec [candidate] (26.923 ms) : 0, 26923
Debugger [baseline] (6.621 ms) : 0, 6621
Debugger [candidate] (7.487 ms) : 0, 7487
Remote Config [baseline] (598.757 µs) : 0, 599
Remote Config [candidate] (594.26 µs) : 0, 594
Telemetry [baseline] (8.24 ms) : 0, 8240
Telemetry [candidate] (8.403 ms) : 0, 8403
IAST [baseline] (29.338 ms) : 0, 29338
IAST [candidate] (29.347 ms) : 0, 29347
section profiling
crashtracking [baseline] (1.414 ms) : 0, 1414
crashtracking [candidate] (1.42 ms) : 0, 1420
BytebuddyAgent [baseline] (761.342 ms) : 0, 761342
BytebuddyAgent [candidate] (760.042 ms) : 0, 760042
GlobalTracer [baseline] (222.388 ms) : 0, 222388
GlobalTracer [candidate] (221.786 ms) : 0, 221786
AppSec [baseline] (29.973 ms) : 0, 29973
AppSec [candidate] (29.945 ms) : 0, 29945
Debugger [baseline] (7.07 ms) : 0, 7070
Debugger [candidate] (6.297 ms) : 0, 6297
Remote Config [baseline] (710.31 µs) : 0, 710
Remote Config [candidate] (709.172 µs) : 0, 709
Telemetry [baseline] (15.518 ms) : 0, 15518
Telemetry [candidate] (16.325 ms) : 0, 16325
ProfilingAgent [baseline] (107.7 ms) : 0, 107700
ProfilingAgent [candidate] (108.284 ms) : 0, 108284
Profiling [baseline] (108.348 ms) : 0, 108348
Profiling [candidate] (108.922 ms) : 0, 108922

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1755784886 1755784886
ci_job_id 1091974993 1091974993
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-hlwa0p8r 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-hlwa0p8r 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 3 performance regressions! Performance is the same for 8 metrics, 12 unstable metrics.

  • scenario:load:insecure-bank:iast_GLOBAL:high_load
    Δ mean http_req_duration: worse, [+299.577µs; +705.308µs] or [+2.927%; +6.892%]
    Δ mean throughput: unstable, [-92.711op/s; +24.291op/s] or [-20.424%; +5.351%]
    candidate mean: 10.736ms http_req_duration, 419.727op/s throughput
    baseline mean: 10.234ms http_req_duration, 453.938op/s throughput
  • scenario:load:insecure-bank:tracing:high_load
    Δ mean http_req_duration: worse, [+266.214µs; +504.126µs] or [+3.580%; +6.780%]
    Δ mean throughput: unstable, [-108.537op/s; +47.787op/s] or [-17.433%; +7.675%]
    candidate mean: 7.821ms http_req_duration, 592.219op/s throughput
    baseline mean: 7.435ms http_req_duration, 622.594op/s throughput
  • scenario:load:petclinic:code_origins:high_load
    Δ mean http_req_duration: better, [-2.188ms; -1.375ms] or [-4.854%; -3.050%]
    Δ mean throughput: unstable, [-4.395op/s; +10.242op/s] or [-4.182%; +9.745%]
    candidate mean: 43.291ms http_req_duration, 108.025op/s throughput
    baseline mean: 45.072ms http_req_duration, 105.101op/s throughput
  • scenario:load:petclinic:tracing:high_load
    Δ mean http_req_duration: worse, [+1.199ms; +2.037ms] or [+2.695%; +4.577%]
    Δ mean throughput: unstable, [-10.416op/s; +5.588op/s] or [-10.028%; +5.380%]
    candidate mean: 46.116ms http_req_duration, 101.450op/s throughput
    baseline mean: 44.498ms http_req_duration, 103.864op/s throughput
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.295 ms) : 4247, 4343
.   : milestone, 4295,
iast (9.122 ms) : 8974, 9269
.   : milestone, 9122,
iast_FULL (13.791 ms) : 13515, 14068
.   : milestone, 13791,
iast_GLOBAL (10.234 ms) : 10055, 10413
.   : milestone, 10234,
profiling (9.177 ms) : 8992, 9362
.   : milestone, 9177,
tracing (7.435 ms) : 7328, 7543
.   : milestone, 7435,
section candidate
no_agent (4.322 ms) : 4274, 4371
.   : milestone, 4322,
iast (9.191 ms) : 9042, 9339
.   : milestone, 9191,
iast_FULL (14.065 ms) : 13790, 14340
.   : milestone, 14065,
iast_GLOBAL (10.736 ms) : 10538, 10934
.   : milestone, 10736,
profiling (9.289 ms) : 9132, 9446
.   : milestone, 9289,
tracing (7.821 ms) : 7707, 7934
.   : milestone, 7821,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.295 ms [4.247 ms, 4.343 ms] -
iast 9.122 ms [8.974 ms, 9.269 ms] 4.827 ms (112.4%)
iast_FULL 13.791 ms [13.515 ms, 14.068 ms] 9.496 ms (221.1%)
iast_GLOBAL 10.234 ms [10.055 ms, 10.413 ms] 5.939 ms (138.3%)
profiling 9.177 ms [8.992 ms, 9.362 ms] 4.882 ms (113.7%)
tracing 7.435 ms [7.328 ms, 7.543 ms] 3.14 ms (73.1%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 4.322 ms [4.274 ms, 4.371 ms] -
iast 9.191 ms [9.042 ms, 9.339 ms] 4.868 ms (112.6%)
iast_FULL 14.065 ms [13.79 ms, 14.34 ms] 9.742 ms (225.4%)
iast_GLOBAL 10.736 ms [10.538 ms, 10.934 ms] 6.414 ms (148.4%)
profiling 9.289 ms [9.132 ms, 9.446 ms] 4.967 ms (114.9%)
tracing 7.821 ms [7.707 ms, 7.934 ms] 3.498 ms (80.9%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.27 ms) : 36965, 37575
.   : milestone, 37270,
appsec (47.289 ms) : 46852, 47725
.   : milestone, 47289,
code_origins (45.072 ms) : 44687, 45457
.   : milestone, 45072,
iast (45.508 ms) : 45115, 45902
.   : milestone, 45508,
profiling (48.595 ms) : 48160, 49030
.   : milestone, 48595,
tracing (44.498 ms) : 44122, 44873
.   : milestone, 44498,
section candidate
no_agent (36.277 ms) : 35982, 36571
.   : milestone, 36277,
appsec (47.65 ms) : 47236, 48063
.   : milestone, 47650,
code_origins (43.291 ms) : 42920, 43662
.   : milestone, 43291,
iast (45.965 ms) : 45576, 46354
.   : milestone, 45965,
profiling (47.437 ms) : 46954, 47921
.   : milestone, 47437,
tracing (46.116 ms) : 45713, 46518
.   : milestone, 46116,
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 37.27 ms [36.965 ms, 37.575 ms] -
appsec 47.289 ms [46.852 ms, 47.725 ms] 10.019 ms (26.9%)
code_origins 45.072 ms [44.687 ms, 45.457 ms] 7.802 ms (20.9%)
iast 45.508 ms [45.115 ms, 45.902 ms] 8.238 ms (22.1%)
profiling 48.595 ms [48.16 ms, 49.03 ms] 11.325 ms (30.4%)
tracing 44.498 ms [44.122 ms, 44.873 ms] 7.228 ms (19.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 36.277 ms [35.982 ms, 36.571 ms] -
appsec 47.65 ms [47.236 ms, 48.063 ms] 11.373 ms (31.4%)
code_origins 43.291 ms [42.92 ms, 43.662 ms] 7.014 ms (19.3%)
iast 45.965 ms [45.576 ms, 46.354 ms] 9.688 ms (26.7%)
profiling 47.437 ms [46.954 ms, 47.921 ms] 11.161 ms (30.8%)
tracing 46.116 ms [45.713 ms, 46.518 ms] 9.839 ms (27.1%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master daniel.mohedano/failed-test-replay
git_commit_date 1755780320 1755783414
git_commit_sha 9aad755 49eaebf
release_version 1.53.0-SNAPSHOT~9aad75597f 1.51.0-SNAPSHOT~49eaebf239
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1755785392 1755785392
ci_job_id 1091974994 1091974994
ci_pipeline_id 74366557 74366557
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-wvbqpjz8 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-wvbqpjz8 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.483 ms) : 1471, 1495
.   : milestone, 1483,
appsec (3.636 ms) : 3423, 3849
.   : milestone, 3636,
iast (2.214 ms) : 2151, 2277
.   : milestone, 2214,
iast_GLOBAL (2.254 ms) : 2191, 2318
.   : milestone, 2254,
profiling (2.057 ms) : 2007, 2108
.   : milestone, 2057,
tracing (2.027 ms) : 1978, 2076
.   : milestone, 2027,
section candidate
no_agent (1.483 ms) : 1472, 1495
.   : milestone, 1483,
appsec (3.61 ms) : 3398, 3822
.   : milestone, 3610,
iast (2.216 ms) : 2153, 2279
.   : milestone, 2216,
iast_GLOBAL (2.255 ms) : 2192, 2318
.   : milestone, 2255,
profiling (2.051 ms) : 2001, 2102
.   : milestone, 2051,
tracing (2.019 ms) : 1970, 2067
.   : milestone, 2019,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.483 ms [1.471 ms, 1.495 ms] -
appsec 3.636 ms [3.423 ms, 3.849 ms] 2.153 ms (145.2%)
iast 2.214 ms [2.151 ms, 2.277 ms] 731.222 µs (49.3%)
iast_GLOBAL 2.254 ms [2.191 ms, 2.318 ms] 771.529 µs (52.0%)
profiling 2.057 ms [2.007 ms, 2.108 ms] 574.192 µs (38.7%)
tracing 2.027 ms [1.978 ms, 2.076 ms] 544.048 µs (36.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.483 ms [1.472 ms, 1.495 ms] -
appsec 3.61 ms [3.398 ms, 3.822 ms] 2.127 ms (143.4%)
iast 2.216 ms [2.153 ms, 2.279 ms] 733.034 µs (49.4%)
iast_GLOBAL 2.255 ms [2.192 ms, 2.318 ms] 772.1 µs (52.1%)
profiling 2.051 ms [2.001 ms, 2.102 ms] 568.299 µs (38.3%)
tracing 2.019 ms [1.97 ms, 2.067 ms] 535.482 µs (36.1%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.534 s) : 15534000, 15534000
.   : milestone, 15534000,
appsec (14.807 s) : 14807000, 14807000
.   : milestone, 14807000,
iast (18.845 s) : 18845000, 18845000
.   : milestone, 18845000,
iast_GLOBAL (18.14 s) : 18140000, 18140000
.   : milestone, 18140000,
profiling (15.246 s) : 15246000, 15246000
.   : milestone, 15246000,
tracing (15.045 s) : 15045000, 15045000
.   : milestone, 15045000,
section candidate
no_agent (15.4 s) : 15400000, 15400000
.   : milestone, 15400000,
appsec (14.843 s) : 14843000, 14843000
.   : milestone, 14843000,
iast (18.945 s) : 18945000, 18945000
.   : milestone, 18945000,
iast_GLOBAL (18.033 s) : 18033000, 18033000
.   : milestone, 18033000,
profiling (15.263 s) : 15263000, 15263000
.   : milestone, 15263000,
tracing (14.955 s) : 14955000, 14955000
.   : milestone, 14955000,
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.534 s [15.534 s, 15.534 s] -
appsec 14.807 s [14.807 s, 14.807 s] -727.0 ms (-4.7%)
iast 18.845 s [18.845 s, 18.845 s] 3.311 s (21.3%)
iast_GLOBAL 18.14 s [18.14 s, 18.14 s] 2.606 s (16.8%)
profiling 15.246 s [15.246 s, 15.246 s] -288.0 ms (-1.9%)
tracing 15.045 s [15.045 s, 15.045 s] -489.0 ms (-3.1%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.4 s [15.4 s, 15.4 s] -
appsec 14.843 s [14.843 s, 14.843 s] -557.0 ms (-3.6%)
iast 18.945 s [18.945 s, 18.945 s] 3.545 s (23.0%)
iast_GLOBAL 18.033 s [18.033 s, 18.033 s] 2.633 s (17.1%)
profiling 15.263 s [15.263 s, 15.263 s] -137.0 ms (-0.9%)
tracing 14.955 s [14.955 s, 14.955 s] -445.0 ms (-2.9%)


datadog-official bot commented Aug 11, 2025

🎯 Code Coverage
Patch Coverage: 45.59%
Total Coverage: 57.32% (-0.03%)

View detailed report


@daniel-mohedano daniel-mohedano changed the title Failed Test Replay Implement Failed Test Replay Aug 12, 2025
@daniel-mohedano daniel-mohedano removed the tag: do not merge Do not merge changes label Aug 13, 2025
@daniel-mohedano daniel-mohedano marked this pull request as ready for review August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from a team as code owners August 20, 2025 10:07
@daniel-mohedano daniel-mohedano requested review from shatzi, Mariovido, bric3 and PerfectSlayer and removed request for a team August 20, 2025 10:07
return getFinalDebuggerBaseUrl() + "/debugger/v1/input";
if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) {
return dynamicInstrumentationSnapshotUrl;
} else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) {
Contributor

I wonder if this condition should be isCiVisibilityAgentlessEnabled alone: if we're running in agentless mode chances are there's no agent to connect to. I understand that debugger wasn't working in agentless before these changes, but why not fix it while we're at it

@@ -1044,6 +1051,7 @@ public static String getHostName() {
private final boolean DBMTracePreparedStatements;

private final boolean dynamicInstrumentationEnabled;
private final String dynamicInstrumentationSnapshotUrl;
Contributor

Is this needed for our testing or for customer set ups where a custom URL (a proxy?) needs to be used to communicate with DD?

Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.

@@ -1029,6 +1034,8 @@ public static String getHostName() {
private final String gitPullRequestBaseBranch;
private final String gitPullRequestBaseBranchSha;
private final String gitCommitHeadSha;
private final boolean ciVisibilityFailedTestReplayEnabled;
private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI
Contributor

It needs to be volatile if we choose to keep it :)

Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.

I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig.

An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us
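
For reference, keeping the field would mean declaring it volatile so the write done by CiVisibility is guaranteed to be visible to the debugger's threads; a minimal sketch of that change (the field itself is the one in the diff above):

// volatile added so the flag set by CiVisibility is visible across threads (sketch of the suggestion)
private volatile boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI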

Contributor

I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team

Contributor Author

From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍

Contributor

As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former

if (t instanceof Error) {
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Skip handling error: {}", t.toString());
if (isFailedTestReplayActive) {
Contributor

I wonder if we can get away without propagating this flag, and just check the test strategy

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
Contributor

I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?

Contributor Author

In my opinion I think it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries regarding the overhead FTR could introduce, but given that right now we only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with JS' implementation

Contributor

As discussed offline, let's add a dedicated method to TestExecutionPolicy that'll determine whether FTR is enabled for a given test
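
A rough illustration of the dedicated method suggested above (hypothetical name and documentation, not the agreed-upon API):

public interface TestExecutionPolicy {
  // ... existing methods ...

  /**
   * Hypothetical addition: whether Failed Test Replay should instrument exceptions for the
   * current execution, for example only when the retry is driven by Auto Test Retries.
   */
  boolean failedTestReplayApplicable();
}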

@@ -181,6 +183,10 @@ public void onTestStart(
}

if (testExecutionHistory != null) {
if (testExecutionHistory instanceof RetryUntilSuccessful) {
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries
test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString());
Contributor

You can use datadog.trace.civisibility.domain.TestImpl#context to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well

Contributor

Also, can we call testExecutionHistory.currentExecutionRetryReason() and just store the result of that? Doing instanceof is breaking encapsulation.

Retry reason will be null for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?

Contributor Author

@daniel-mohedano daniel-mohedano Aug 22, 2025

In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:

  1. First run fails, probe is created and exception instrumented
  2. Test is retried, fails again, the context information from the probe is captured and sent.

Maybe a testExecutionHistory.retryReason() could have been a better approach to avoid the breaking of encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy I agree it is a much cleaner approach.

@@ -632,6 +632,7 @@ public void execute() {
}

maybeStartAppSec(scoClass, sco);
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode
Contributor

If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞

@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess(
Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED),
Boolean.toString(executionSettings.getTestManagementSettings().isEnabled()));

propagatedSystemProperties.put(
Strings.propertyNameToSystemPropertyName(
Contributor

Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED

Contributor Author

I think this could cause problems because we use the ..._ENABLED settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false and CI_VISIBILITY_ENABLED==true would mean that FTR wouldn't be enabled (even if it was marked as enabled by the backend)

Contributor

As discussed offline let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled
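
Expressed as code, the derived behaviour would look roughly like this (accessor names are illustrative, not necessarily the real Config getters):

// Sketch: no dedicated Failed Test Replay setting; the Test Optimization specific debugger
// behaviour is derived from the two existing switches.
static boolean testOptimizationDebuggerBehaviour(Config config) {
  return config.isExceptionReplayEnabled() && config.isCiVisibilityEnabled();
}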

@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) {
exceptionProbeManager.createProbesForException(
throwable.getStackTrace(), chainedExceptionIdx);
if (creationResult.probesCreated > 0) {
AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint));
if (isFailedTestReplayActive || !applyConfigAsync) {
Contributor

Can we get rid of applyConfigAsync and the corresponding config field/methods?

Contributor Author

The applyConfigAsync config was also added from #9186, to let Exception Replay apply the instrumentation synchronously even without Failed Test Replay

@Override
public void beforeSuiteEnd() {
LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink");
sink.lowRateFlush(sink);
Contributor

I think debugger is already doing this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop that is called from the shutdown hook.
