-
Notifications
You must be signed in to change notification settings - Fork 312
Implement Failed Test Replay #9214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Debugger benchmarksParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 9 metrics, 6 unstable metrics. See unchanged results
Request duration reports for reportsgantt
title reports - request duration [CI 0.99] : candidate=None, baseline=None
dateFormat X
axisFormat %s
section baseline
noprobe (323.895 µs) : 286, 362
. : milestone, 324,
basic (279.943 µs) : 273, 287
. : milestone, 280,
loop (8.971 ms) : 8966, 8976
. : milestone, 8971,
section candidate
noprobe (316.318 µs) : 293, 339
. : milestone, 316,
basic (278.618 µs) : 272, 285
. : milestone, 279,
loop (8.959 ms) : 8955, 8963
. : milestone, 8959,
|
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 46 metrics, 13 unstable metrics. Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053200
Total [baseline] (8.641 s) : 0, 8640587
Agent [candidate] (1.045 s) : 0, 1044926
Total [candidate] (8.6 s) : 0, 8600380
section iast
Agent [baseline] (1.181 s) : 0, 1180830
Total [baseline] (9.349 s) : 0, 9348833
Agent [candidate] (1.177 s) : 0, 1176598
Total [candidate] (9.338 s) : 0, 9337754
gantt
title insecure-bank - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.465 ms) : 0, 1465
BytebuddyAgent [baseline] (737.181 ms) : 0, 737181
BytebuddyAgent [candidate] (732.091 ms) : 0, 732091
GlobalTracer [baseline] (244.604 ms) : 0, 244604
GlobalTracer [candidate] (242.152 ms) : 0, 242152
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.183 ms) : 0, 30183
Debugger [baseline] (6.096 ms) : 0, 6096
Debugger [candidate] (6.045 ms) : 0, 6045
Remote Config [baseline] (687.976 µs) : 0, 688
Remote Config [candidate] (674.735 µs) : 0, 675
Telemetry [baseline] (11.608 ms) : 0, 11608
Telemetry [candidate] (11.326 ms) : 0, 11326
section iast
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.447 ms) : 0, 1447
BytebuddyAgent [baseline] (851.971 ms) : 0, 851971
BytebuddyAgent [candidate] (848.66 ms) : 0, 848660
GlobalTracer [baseline] (234.592 ms) : 0, 234592
GlobalTracer [candidate] (232.644 ms) : 0, 232644
AppSec [baseline] (26.908 ms) : 0, 26908
AppSec [candidate] (26.038 ms) : 0, 26038
Debugger [baseline] (6.564 ms) : 0, 6564
Debugger [candidate] (7.588 ms) : 0, 7588
Remote Config [baseline] (607.247 µs) : 0, 607
Remote Config [candidate] (612.559 µs) : 0, 613
Telemetry [baseline] (8.389 ms) : 0, 8389
Telemetry [candidate] (8.358 ms) : 0, 8358
IAST [baseline] (29.358 ms) : 0, 29358
IAST [candidate] (30.306 ms) : 0, 30306
Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.05 s) : 0, 1049624
Total [baseline] (10.85 s) : 0, 10849923
Agent [candidate] (1.05 s) : 0, 1049653
Total [candidate] (10.733 s) : 0, 10733366
section appsec
Agent [baseline] (1.231 s) : 0, 1231332
Total [baseline] (10.889 s) : 0, 10889016
Agent [candidate] (1.233 s) : 0, 1233172
Total [candidate] (10.806 s) : 0, 10806078
section iast
Agent [baseline] (1.18 s) : 0, 1180497
Total [baseline] (10.886 s) : 0, 10885645
Agent [candidate] (1.181 s) : 0, 1181286
Total [candidate] (10.972 s) : 0, 10971634
section profiling
Agent [baseline] (1.196 s) : 0, 1195745
Total [baseline] (10.903 s) : 0, 10903246
Agent [candidate] (1.195 s) : 0, 1194558
Total [candidate] (10.904 s) : 0, 10904434
gantt
title petclinic - break down per module: candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (733.983 ms) : 0, 733983
BytebuddyAgent [candidate] (733.588 ms) : 0, 733588
GlobalTracer [baseline] (244.242 ms) : 0, 244242
GlobalTracer [candidate] (242.936 ms) : 0, 242936
AppSec [baseline] (30.47 ms) : 0, 30470
AppSec [candidate] (30.098 ms) : 0, 30098
Debugger [baseline] (6.1 ms) : 0, 6100
Debugger [candidate] (6.053 ms) : 0, 6053
Remote Config [baseline] (677.511 µs) : 0, 678
Remote Config [candidate] (672.866 µs) : 0, 673
Telemetry [baseline] (11.587 ms) : 0, 11587
Telemetry [candidate] (13.695 ms) : 0, 13695
section appsec
crashtracking [baseline] (1.457 ms) : 0, 1457
crashtracking [candidate] (1.48 ms) : 0, 1480
BytebuddyAgent [baseline] (760.282 ms) : 0, 760282
BytebuddyAgent [candidate] (761.561 ms) : 0, 761561
GlobalTracer [baseline] (237.004 ms) : 0, 237004
GlobalTracer [candidate] (237.799 ms) : 0, 237799
AppSec [baseline] (170.252 ms) : 0, 170252
AppSec [candidate] (169.948 ms) : 0, 169948
Debugger [baseline] (7.443 ms) : 0, 7443
Debugger [candidate] (6.576 ms) : 0, 6576
Remote Config [baseline] (652.821 µs) : 0, 653
Remote Config [candidate] (645.144 µs) : 0, 645
Telemetry [baseline] (9.247 ms) : 0, 9247
Telemetry [candidate] (10.11 ms) : 0, 10110
IAST [baseline] (23.754 ms) : 0, 23754
IAST [candidate] (23.762 ms) : 0, 23762
section iast
crashtracking [baseline] (1.455 ms) : 0, 1455
crashtracking [candidate] (1.461 ms) : 0, 1461
BytebuddyAgent [baseline] (851.843 ms) : 0, 851843
BytebuddyAgent [candidate] (852.233 ms) : 0, 852233
GlobalTracer [baseline] (234.423 ms) : 0, 234423
GlobalTracer [candidate] (233.675 ms) : 0, 233675
AppSec [baseline] (26.777 ms) : 0, 26777
AppSec [candidate] (26.923 ms) : 0, 26923
Debugger [baseline] (6.621 ms) : 0, 6621
Debugger [candidate] (7.487 ms) : 0, 7487
Remote Config [baseline] (598.757 µs) : 0, 599
Remote Config [candidate] (594.26 µs) : 0, 594
Telemetry [baseline] (8.24 ms) : 0, 8240
Telemetry [candidate] (8.403 ms) : 0, 8403
IAST [baseline] (29.338 ms) : 0, 29338
IAST [candidate] (29.347 ms) : 0, 29347
section profiling
crashtracking [baseline] (1.414 ms) : 0, 1414
crashtracking [candidate] (1.42 ms) : 0, 1420
BytebuddyAgent [baseline] (761.342 ms) : 0, 761342
BytebuddyAgent [candidate] (760.042 ms) : 0, 760042
GlobalTracer [baseline] (222.388 ms) : 0, 222388
GlobalTracer [candidate] (221.786 ms) : 0, 221786
AppSec [baseline] (29.973 ms) : 0, 29973
AppSec [candidate] (29.945 ms) : 0, 29945
Debugger [baseline] (7.07 ms) : 0, 7070
Debugger [candidate] (6.297 ms) : 0, 6297
Remote Config [baseline] (710.31 µs) : 0, 710
Remote Config [candidate] (709.172 µs) : 0, 709
Telemetry [baseline] (15.518 ms) : 0, 15518
Telemetry [candidate] (16.325 ms) : 0, 16325
ProfilingAgent [baseline] (107.7 ms) : 0, 107700
ProfilingAgent [candidate] (108.284 ms) : 0, 108284
Profiling [baseline] (108.348 ms) : 0, 108348
Profiling [candidate] (108.922 ms) : 0, 108922
LoadParameters
See matching parameters
SummaryFound 1 performance improvements and 3 performance regressions! Performance is the same for 8 metrics, 12 unstable metrics.
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section baseline
no_agent (4.295 ms) : 4247, 4343
. : milestone, 4295,
iast (9.122 ms) : 8974, 9269
. : milestone, 9122,
iast_FULL (13.791 ms) : 13515, 14068
. : milestone, 13791,
iast_GLOBAL (10.234 ms) : 10055, 10413
. : milestone, 10234,
profiling (9.177 ms) : 8992, 9362
. : milestone, 9177,
tracing (7.435 ms) : 7328, 7543
. : milestone, 7435,
section candidate
no_agent (4.322 ms) : 4274, 4371
. : milestone, 4322,
iast (9.191 ms) : 9042, 9339
. : milestone, 9191,
iast_FULL (14.065 ms) : 13790, 14340
. : milestone, 14065,
iast_GLOBAL (10.736 ms) : 10538, 10934
. : milestone, 10736,
profiling (9.289 ms) : 9132, 9446
. : milestone, 9289,
tracing (7.821 ms) : 7707, 7934
. : milestone, 7821,
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section baseline
no_agent (37.27 ms) : 36965, 37575
. : milestone, 37270,
appsec (47.289 ms) : 46852, 47725
. : milestone, 47289,
code_origins (45.072 ms) : 44687, 45457
. : milestone, 45072,
iast (45.508 ms) : 45115, 45902
. : milestone, 45508,
profiling (48.595 ms) : 48160, 49030
. : milestone, 48595,
tracing (44.498 ms) : 44122, 44873
. : milestone, 44498,
section candidate
no_agent (36.277 ms) : 35982, 36571
. : milestone, 36277,
appsec (47.65 ms) : 47236, 48063
. : milestone, 47650,
code_origins (43.291 ms) : 42920, 43662
. : milestone, 43291,
iast (45.965 ms) : 45576, 46354
. : milestone, 45965,
profiling (47.437 ms) : 46954, 47921
. : milestone, 47437,
tracing (46.116 ms) : 45713, 46518
. : milestone, 46116,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section baseline
no_agent (1.483 ms) : 1471, 1495
. : milestone, 1483,
appsec (3.636 ms) : 3423, 3849
. : milestone, 3636,
iast (2.214 ms) : 2151, 2277
. : milestone, 2214,
iast_GLOBAL (2.254 ms) : 2191, 2318
. : milestone, 2254,
profiling (2.057 ms) : 2007, 2108
. : milestone, 2057,
tracing (2.027 ms) : 1978, 2076
. : milestone, 2027,
section candidate
no_agent (1.483 ms) : 1472, 1495
. : milestone, 1483,
appsec (3.61 ms) : 3398, 3822
. : milestone, 3610,
iast (2.216 ms) : 2153, 2279
. : milestone, 2216,
iast_GLOBAL (2.255 ms) : 2192, 2318
. : milestone, 2255,
profiling (2.051 ms) : 2001, 2102
. : milestone, 2051,
tracing (2.019 ms) : 1970, 2067
. : milestone, 2019,
Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.51.0-SNAPSHOT~49eaebf239, baseline=1.53.0-SNAPSHOT~9aad75597f
dateFormat X
axisFormat %s
section baseline
no_agent (15.534 s) : 15534000, 15534000
. : milestone, 15534000,
appsec (14.807 s) : 14807000, 14807000
. : milestone, 14807000,
iast (18.845 s) : 18845000, 18845000
. : milestone, 18845000,
iast_GLOBAL (18.14 s) : 18140000, 18140000
. : milestone, 18140000,
profiling (15.246 s) : 15246000, 15246000
. : milestone, 15246000,
tracing (15.045 s) : 15045000, 15045000
. : milestone, 15045000,
section candidate
no_agent (15.4 s) : 15400000, 15400000
. : milestone, 15400000,
appsec (14.843 s) : 14843000, 14843000
. : milestone, 14843000,
iast (18.945 s) : 18945000, 18945000
. : milestone, 18945000,
iast_GLOBAL (18.033 s) : 18033000, 18033000
. : milestone, 18033000,
profiling (15.263 s) : 15263000, 15263000
. : milestone, 15263000,
tracing (14.955 s) : 14955000, 14955000
. : milestone, 14955000,
|
🎯 Code Coverage 🔗 Commit SHA: 49eaebf | Docs | Was this helpful? Give us feedback! |
return getFinalDebuggerBaseUrl() + "/debugger/v1/input"; | ||
if (Strings.isNotBlank(dynamicInstrumentationSnapshotUrl)) { | ||
return dynamicInstrumentationSnapshotUrl; | ||
} else if (isCiVisibilityFailedTestReplayActive() && isCiVisibilityAgentlessEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this condition should be isCiVisibilityAgentlessEnabled
alone: if we're running in agentless mode chances are there's no agent to connect to. I understand that debugger wasn't working in agentless before these changes, but why not fix it while we're at it
@@ -1044,6 +1051,7 @@ public static String getHostName() { | |||
private final boolean DBMTracePreparedStatements; | |||
|
|||
private final boolean dynamicInstrumentationEnabled; | |||
private final String dynamicInstrumentationSnapshotUrl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed for our testing or for customer set ups where a custom URL (a proxy?) needs to be used to communicate with DD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was taken from #9186. We don't need it for our own testing given that we use CIVISIBILITY_AGENTLESS_URL
, but it will be useful to let customers use a custom URL for the Exception Replay feature in general.
@@ -1029,6 +1034,8 @@ public static String getHostName() { | |||
private final String gitPullRequestBaseBranch; | |||
private final String gitPullRequestBaseBranchSha; | |||
private final String gitCommitHeadSha; | |||
private final boolean ciVisibilityFailedTestReplayEnabled; | |||
private boolean ciVisibilityFailedTestReplayActive = false; // propagates setting to DI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be volatile
if we choose to keep it :)
Having a mutable field in config doesn't quite fit how it is currently used: all the other fields are immutable (with the exception of the two that are just lazily initialized).
Also, having "test replay enabled" and "test replay active" next to each other is quite confusing.
I wonder if instead of setting this field we could harness datadog.trace.bootstrap.debugger.DebuggerContext#updateConfig
.
An even better way would be to not call the debugger API directly, but try to invoke the remote config mechanism. This should be more robust: we avoid coupling Test Optimization to Debugger, and we make use of the centralized config updates logic that should (hopefully) take care of all the pitfalls for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find a way to call datadog.remoteconfig.state.ProductState#apply
programmatically (seems like it's only being called by the remote config poller when we receive the config from the backend), but I don't think adding it is impossible. We can discuss this with the core tracer team
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my conversations with Live Debugger team, they wanted to move away from having their products only available when remote config is enabled, which is why originally I didn't take it into account. But we could technically limit FTR to be used only when remote config is enabled (and actually it is only needed in headless mode). Let's discuss the approach 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's see if we can separate "dynamic config" from "remote config" and add some means of programmatically controlling the former
if (t instanceof Error) { | ||
if (LOGGER.isDebugEnabled()) { | ||
LOGGER.debug("Skip handling error: {}", t.toString()); | ||
if (isFailedTestReplayActive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can get away without propagating this flag, and just check the test strategy
@@ -181,6 +183,10 @@ public void onTestStart( | |||
} | |||
|
|||
if (testExecutionHistory != null) { | |||
if (testExecutionHistory instanceof RetryUntilSuccessful) { | |||
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how correct it is to be checking this on a per-test basis: if ATR is enabled in the backend, then every test is subject to auto-retry with the exception of attempt-to-fixes and new tests (as respective execution policies have higher priority than ATR). But do we really not want to enable exception replay for these two as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion I think it might be beneficial for all retry mechanisms. Attempt to fix could be a bit more dangerous with its 20 retries regarding the overhead FTR could introduce, but given that right now we only support the manual flow, it shouldn't be too big of an issue. I made the changes to limit it to ATR after the Guild meeting in order to align it with JS' implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, let's add a dedicated method to TestExecutionPolicy
that'll determine whether FTR is enabled for a given test
@@ -181,6 +183,10 @@ public void onTestStart( | |||
} | |||
|
|||
if (testExecutionHistory != null) { | |||
if (testExecutionHistory instanceof RetryUntilSuccessful) { | |||
// Used by FailedTestReplay to limit the instrumentation to AutoTestRetries | |||
test.setTag(DDTags.TEST_STRATEGY, RetryReason.atr.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use datadog.trace.civisibility.domain.TestImpl#context
to store data that you don't want to send to the backend. It is more idiomatic than adding/removing tags. As a nice side effect, the context is propagated to children spans, so if a test makes an HTTP request and the exception happens inside the child HTTP span, the context will be there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we call testExecutionHistory.currentExecutionRetryReason()
and just store the result of that? Doing instanceof
is breaking encapsulation.
Retry reason will be null
for the initial test run, but IIUC we don't apply exception replay to the first run anyway, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we also need the information during the first execution in order to create the Exception Replay probe. The flow is:
- First run fails, probe is created and exception instrumented
- Test is retried, fails again, the context information from the probe is captured and sent.
Maybe a testExecutionHistory.retryReason()
could have been a better approach to avoid the breaking of encapsulation. But if we're able to propagate this information either through the test context or by accessing the test strategy I agree it is a much cleaner approach.
@@ -632,6 +632,7 @@ public void execute() { | |||
} | |||
|
|||
maybeStartAppSec(scoClass, sco); | |||
// start civisibility before debugger to enable Failed Test Replay correctly in headless mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we manage to plug into remote config as described in the other comment, there should be no ordering dependency 🤞
@@ -180,6 +181,18 @@ private Map<String, String> getPropertiesPropagatedToChildProcess( | |||
Strings.propertyNameToSystemPropertyName(CiVisibilityConfig.TEST_MANAGEMENT_ENABLED), | |||
Boolean.toString(executionSettings.getTestManagementSettings().isEnabled())); | |||
|
|||
propagatedSystemProperties.put( | |||
Strings.propertyNameToSystemPropertyName( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a dedicated setting for this? I wonder if it can be derived from DebuggerConfig.EXCEPTION_REPLAY_ENABLED && CI_VISIBILITY_ENABLED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could cause problems because we use the ..._ENABLED
settings as kill switches in datadog.trace.civisibility.config.ExecutionSettingsFactoryImpl#doCreate
. So, although it would work when propagating the settings to the child process, in the parent process EXCEPTION_REPLAY_ENABLED==false
and CI_VISIBILITY_ENABLED==true
would mean that FTR wouldn't be enabled (even if it was marked as enabled by the backend)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline let's enable TO-specific debugger behaviour whenever exception replay is enabled and test optimization is enabled
@@ -108,7 +139,11 @@ public void handleException(Throwable t, AgentSpan span) { | |||
exceptionProbeManager.createProbesForException( | |||
throwable.getStackTrace(), chainedExceptionIdx); | |||
if (creationResult.probesCreated > 0) { | |||
AgentTaskScheduler.INSTANCE.execute(() -> applyExceptionConfiguration(fingerprint)); | |||
if (isFailedTestReplayActive || !applyConfigAsync) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of applyConfigAsync
and the corresponding config field/methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The applyConfigAsync
config was also added from #9186, to let Exception Replay apply the instrumentation synchronously even without Failed Test Replay
@Override | ||
public void beforeSuiteEnd() { | ||
LOGGER.debug("CiVisibility BeforeSuiteEnd fired, flushing sink"); | ||
sink.lowRateFlush(sink); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think debugger is already doing this asynchronously at a scheduled interval.
If I'm not mistaken, we just need to make sure whatever's left in the sink is flushed in com.datadog.debugger.sink.DebuggerSink#stop
that is called from the shutdown hook.
What Does This Do
Implements Test Optimization's Failed Test Replay using Live Debugger's Exception Replay. When the feature is enabled and a test is retried due to Auto Test Retries, Exception Replay's logic will create a probe for the exception thrown (in the case of the test probably an assertion error, but not limited to it). When the test is retried, the probe captures debugging information if the exception is encountered again, creating a snapshot of the variables. If the snapshot is captured, it is send as a log to Datadog. The following modifications were made to Exception Replay's original implementation:
Config
. The enabling of Exception Replay now checks for either the property to be marked as enabled or Failed Test Replay being marked as active. This works due to CiVisibility's system being initialized before Live Debugger's.DefaultExceptionDebugger
was modified to support Failed Test Replay. If working on Failed Test Replay mode it will:Error
s, which were previously ignored.product
field to snapshots, populated withtest_optimization
if Failed Test Replay was marked as active. This allows us to have the option of not billing customers for logs generated by the product.DD_CIVISIBILITY_AGENTLESS_ENABLED
is set, Live Debugger's logic for Exception Replay will use the logs API instead of the agent's.DebuggerSink
on test suite end, avoiding unsent snapshots.Additional changes:
BackendApiFactory.Intake
to a standaloneIntake
, given that it is useful in order to compute agentless mode URLs.failed_test_replay
in test frameworks that support Auto Test Retries.di_enabled
to the Settings response and telemetry.Validation:
MavenSmokeTest
now has an additional test for Failed Test Replay, validating the feature when build system instrumentation is present.JUnitConsoleSmokeTest
to validate the feature in headless mode. This test should ensure that the ordering dependency between CiVisibility's system and Live Debugger's is always accounted for.Motivation
Test Optimization wants to improve the support for Failed Test Replay, implementing it in additional languages apart from JS.
Contributor Checklist
type:
and (comp:
orinst:
) labels in addition to any usefull labelsclose
,fix
or any linking keywords when referencing an issue.Use
solves
instead, and assign the PR milestone to the issueJira ticket: SDTEST-2242