Attempt to debug CI flaky timeout. DO NOT MERGE! #9249

Closed
wants to merge 41 commits into from
41 commits
e858d8e
Attempt to debug CI flaky timeout. DO NOT MERGE!
AlexeyKuznetsov-DD Jul 25, 2025
2ec7817
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 25, 2025
2a53ed6
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 26, 2025
99bf38c
Attempt to collect thread dump.
AlexeyKuznetsov-DD Jul 28, 2025
1e09bdf
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 28, 2025
057d106
Attempt to collect thread dump.
AlexeyKuznetsov-DD Jul 28, 2025
d338595
Attempt to collect thread dump.
AlexeyKuznetsov-DD Jul 28, 2025
cb5002a
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 29, 2025
119f529
Another attempt.
AlexeyKuznetsov-DD Jul 29, 2025
6f55eed
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 29, 2025
bbe67ef
Another attempt.
AlexeyKuznetsov-DD Jul 29, 2025
3b1a643
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 30, 2025
9e8a13e
Attempt to specify full path
AlexeyKuznetsov-DD Jul 30, 2025
88dcfa3
Attempt to specify full path
AlexeyKuznetsov-DD Jul 30, 2025
9d9f161
Cleanup
AlexeyKuznetsov-DD Jul 31, 2025
5296e75
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 31, 2025
8429062
One more try
AlexeyKuznetsov-DD Jul 31, 2025
12703ea
Merge remote-tracking branch 'origin/alexeyk/debug-ci-timeout' into a…
AlexeyKuznetsov-DD Jul 31, 2025
1a8a272
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Jul 31, 2025
cd0ad30
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 1, 2025
d58d75b
Added test info to dump.
AlexeyKuznetsov-DD Aug 1, 2025
9deb033
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 3, 2025
1f22349
One more try
AlexeyKuznetsov-DD Aug 3, 2025
8d42824
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 4, 2025
8539c89
Try to fix test freeze by updating logback version.
AlexeyKuznetsov-DD Aug 4, 2025
b0fa34b
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 4, 2025
e186248
Try to fix test freeze by updating logback version to 1.2.9.
AlexeyKuznetsov-DD Aug 4, 2025
b44cf56
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 5, 2025
ffa412c
Added heap dump
AlexeyKuznetsov-DD Aug 5, 2025
8ddf5dc
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 5, 2025
48e34d7
Fixed report script
AlexeyKuznetsov-DD Aug 5, 2025
c3bdd3e
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 5, 2025
892be19
Another attempt.
AlexeyKuznetsov-DD Aug 5, 2025
2844c00
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 5, 2025
7518114
Another attempt.
AlexeyKuznetsov-DD Aug 5, 2025
ab90b23
Another attempt.
AlexeyKuznetsov-DD Aug 5, 2025
7bda4d4
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 6, 2025
f0cd3d4
Another attempt 3.
AlexeyKuznetsov-DD Aug 6, 2025
901b3a3
Spock 2.4
AlexeyKuznetsov-DD Aug 6, 2025
d0916a9
Merge branch 'master' into alexeyk/debug-ci-timeout
AlexeyKuznetsov-DD Aug 6, 2025
9bf1b6b
Update Spock only for one module.
AlexeyKuznetsov-DD Aug 6, 2025
1 change: 1 addition & 0 deletions .gitlab/collect_reports.sh
@@ -62,6 +62,7 @@ function process_reports () {
     cp -r workspace/$project_to_save/build/reports/* $report_path/ 2>/dev/null || true
     cp workspace/$project_to_save/build/hs_err_pid*.log $report_path/ 2>/dev/null || true
     cp workspace/$project_to_save/build/javacore*.txt $report_path/ 2>/dev/null || true
+    cp workspace/$project_to_save/build/*.* $report_path/ 2>/dev/null || true
   fi
 }

@@ -46,6 +46,7 @@ tasks.named("check").configure {

 dependencies {
   testImplementation project(':dd-java-agent:instrumentation:trace-annotation')
+  testImplementation libs.bundles.spock24
 }

 // Set all compile tasks to use JDK21 but let instrumentation code targets 1.8 compatibility
@@ -1,14 +1,46 @@
+import com.sun.management.HotSpotDiagnosticMXBean
 import datadog.trace.agent.test.AgentTestRunner
 import datadog.trace.api.Trace

+import javax.management.MBeanServer
+import java.lang.management.ManagementFactory
 import java.util.concurrent.Callable
+import java.util.concurrent.Executors
+import java.util.concurrent.ScheduledExecutorService
+import java.util.concurrent.ScheduledFuture
 import java.util.concurrent.StructuredTaskScope
+import java.util.concurrent.TimeUnit

 import static datadog.trace.agent.test.utils.TraceUtils.runUnderTrace
 import static datadog.trace.agent.test.utils.TraceUtils.runnableUnderTrace
 import static java.time.Instant.now

 class StructuredConcurrencyTest extends AgentTestRunner {
+  ThreadDumpLogger threadDumpLogger
+
+  def setup() {
+    File reportDir = new File("build")
+    String fullPath = reportDir.absolutePath.replace("dd-trace-java/dd-java-agent",
+      "dd-trace-java/workspace/dd-java-agent")
+
+    reportDir = new File(fullPath)
+    if (!reportDir.exists()) {
+      println("Folder not found: " + fullPath)
+      reportDir.mkdirs()
+    } else println("Folder found: " + fullPath)
+
+    // Use the current feature name as the test name
+    String testName = "${specificationContext?.currentSpec?.name ?: "unknown-spec"} : ${specificationContext?.currentFeature?.name ?: "unknown-test"}"
+
+    threadDumpLogger = new ThreadDumpLogger(testName, reportDir)
+    threadDumpLogger.start()
+  }
+
+  def cleanup() {
+    threadDumpLogger.stop()
+  }
+
+
   /**
    * Tests the structured task scope with a single task.
    */
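
Review note: the `setup()` added in this hunk rewrites the `build` directory path with a literal `String.replace`. The sketch below restates that remapping in Java, assuming a hypothetical `ReportDirs` helper that is not part of this PR and using made-up example paths. It illustrates that a path without the expected segment comes back unchanged, in which case the directory checked by `setup()` would be the original, unmapped one.

```java
import java.io.File;

/** Hypothetical helper, not part of this PR: mirrors the path rewrite done in setup(). */
final class ReportDirs {
  private ReportDirs() {}

  /** Literal (non-regex) replacement; returns the input unchanged when the segment is absent. */
  static String remapToWorkspace(String absolutePath) {
    return absolutePath.replace(
        "dd-trace-java/dd-java-agent", "dd-trace-java/workspace/dd-java-agent");
  }

  /** Creates the directory if needed and reports whether it is usable afterwards. */
  static boolean ensureDirectory(File dir) {
    return dir.isDirectory() || dir.mkdirs();
  }

  public static void main(String[] args) {
    // Example paths are invented for illustration only.
    System.out.println(remapToWorkspace("/ci/dd-trace-java/dd-java-agent/module/build"));
    System.out.println(remapToWorkspace("/home/dev/other-checkout/module/build"));
  }
}
```

Checking the return value of `mkdirs()` (as `ensureDirectory` does) would make a failed directory creation visible instead of surfacing later as a missing dump file.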
@@ -18,14 +50,15 @@ class StructuredConcurrencyTest extends AgentTestRunner {
     def result = false

     when:
+    Thread.sleep(100)
     runUnderTrace("parent") {
       def task = taskScope.fork(new Callable<Boolean>() {
-        @Trace(operationName = "child")
-        @Override
-        Boolean call() throws Exception {
-          return true
-        }
-      })
+        @Trace(operationName = "child")
+        @Override
+        Boolean call() throws Exception {
+          return true
+        }
+      })
       taskScope.joinUntil(now() + 10) // Wait for 10 seconds at maximum
       result = task.get()
     }
@@ -164,4 +197,48 @@ class StructuredConcurrencyTest extends AgentTestRunner {
       }
     }
   }
+  // 🔒 Private helper class for thread dump logging
+  private static class ThreadDumpLogger {
+    private final String testName
+    private final File outputDir
+    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor()
+    private ScheduledFuture<?> task
+
+    ThreadDumpLogger(String testName, File outputDir) {
+      this.testName = testName
+      this.outputDir = outputDir
+    }
+
+    void start() {
+      // new File(outputDir, "${System.currentTimeMillis()}-start-mark.txt") << testName
+
+      task = scheduler.scheduleAtFixedRate({
+        heapDump("test")
+
+        def reportFile = new File(outputDir, "${System.currentTimeMillis()}-thread-dump.log")
+        try (def writer = new FileWriter(reportFile)) {
+          writer.write("=== Test: ${testName} ===\n")
+          writer.write("=== Thread Dump Triggered at ${new Date()} ===\n")
+          Thread.getAllStackTraces().each { thread, stack ->
+            writer.write("Thread: ${thread.name}, daemon: ${thread.daemon}\n")
+            stack.each { writer.write("\tat ${it}\n") }
+          }
+          writer.write("==============================================\n")
+        }
+      }, 10003, 60000, TimeUnit.MILLISECONDS)
+    }
+
+    void heapDump(String kind) {
+      def heapDumpFile = new File(outputDir, "${System.currentTimeMillis()}-heap-dump-${kind}.hprof").absolutePath
+      MBeanServer server = ManagementFactory.getPlatformMBeanServer()
+      HotSpotDiagnosticMXBean mxBean = ManagementFactory.newPlatformMXBeanProxy(
+        server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class)
+      mxBean.dumpHeap(heapDumpFile, true)
+    }
+
+    void stop() {
+      task?.cancel(false)
+      scheduler.shutdownNow()
+    }
+  }
 }
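
Review note: `ThreadDumpLogger.start()` schedules its work with `scheduleAtFixedRate`. Per the `ScheduledExecutorService` contract, an exception escaping one run suppresses all later runs, so a single failed `heapDump` or file write would silently end the periodic dumps. The Java sketch below is one possible guarded variant, not the code in this PR. The class name `PeriodicThreadDumper`, the daemon thread and the delay values in `main` are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of this PR: periodic thread dumps that survive failures. */
final class PeriodicThreadDumper implements AutoCloseable {
  private final String testName;
  private final Path outputDir;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "thread-dump-logger");
        t.setDaemon(true); // assumption: the dumper should never keep the test JVM alive
        return t;
      });

  PeriodicThreadDumper(String testName, Path outputDir) {
    this.testName = testName;
    this.outputDir = outputDir;
  }

  /** Renders all live threads and their stack frames as text. */
  static String renderThreadDump(String testName) {
    StringBuilder sb = new StringBuilder("=== Test: " + testName + " ===\n");
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append("Thread: ").append(t.getName())
          .append(", daemon: ").append(t.isDaemon()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("\tat ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /** Writes one timestamped dump file and returns its path. */
  Path dumpOnce() {
    Path file = outputDir.resolve(System.currentTimeMillis() + "-thread-dump.log");
    try {
      return Files.write(file, renderThreadDump(testName).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void start(long initialDelayMs, long periodMs) {
    scheduler.scheduleAtFixedRate(() -> {
      try {
        dumpOnce();
      } catch (RuntimeException e) {
        // Swallow and report: an exception escaping here would cancel all later runs.
        System.err.println("Thread dump failed: " + e);
      }
    }, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
    try (PeriodicThreadDumper dumper = new PeriodicThreadDumper("demo-spec : demo-feature", dir)) {
      dumper.start(100, 1000);
      Thread.sleep(300); // long enough for the first dump to be written
    }
  }
}
```

Implementing `AutoCloseable` is a design choice for the sketch only. In a Spock spec the equivalent of `close()` would still be called from `cleanup()`, as the PR does with `stop()`.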
@@ -3,11 +3,19 @@ import com.lambdaworks.redis.RedisClient
 import com.lambdaworks.redis.api.StatefulConnection
 import com.lambdaworks.redis.api.async.RedisAsyncCommands
 import com.lambdaworks.redis.api.sync.RedisCommands
+import com.sun.management.HotSpotDiagnosticMXBean
 import datadog.trace.agent.test.naming.VersionedNamingTestBase
 import datadog.trace.agent.test.utils.PortUtils
 import redis.embedded.RedisServer
 import spock.lang.Shared

+import javax.management.MBeanServer
+import java.lang.management.ManagementFactory
+import java.util.concurrent.Executors
+import java.util.concurrent.ScheduledExecutorService
+import java.util.concurrent.ScheduledFuture
+import java.util.concurrent.TimeUnit
+
 import static datadog.trace.agent.test.utils.TraceUtils.runUnderTrace

 abstract class Lettuce4ClientTestBase extends VersionedNamingTestBase {
@@ -32,6 +40,8 @@ abstract class Lettuce4ClientTestBase extends VersionedNamingTestBase {
   @Shared
   RedisServer redisServer

+  ThreadDumpLogger threadDumpLogger
+
   @Shared
   Map<String, String> testHashMap = [
     firstname: "John",
@@ -53,14 +63,30 @@ abstract class Lettuce4ClientTestBase extends VersionedNamingTestBase {
     embeddedDbUri = "redis://" + dbAddr

     redisServer = RedisServer.newRedisServer()
-      // bind to localhost to avoid firewall popup
-      .setting("bind " + HOST)
-      // set max memory to avoid problems in CI
-      .setting("maxmemory 128M")
-      .port(port).build()
+      // bind to localhost to avoid firewall popup
+      .setting("bind " + HOST)
+      // set max memory to avoid problems in CI
+      .setting("maxmemory 128M")
+      .port(port).build()
   }

   def setup() {
+    File reportDir = new File("build")
+    String fullPath = reportDir.absolutePath.replace("dd-trace-java/dd-java-agent",
+      "dd-trace-java/workspace/dd-java-agent")
+
+    reportDir = new File(fullPath)
+    if (!reportDir.exists()) {
+      println("Folder not found: " + fullPath)
+      reportDir.mkdirs()
+    } else println("Folder found: " + fullPath)
+
+    // Use the current feature name as the test name
+    String testName = "${specificationContext?.currentSpec?.name ?: "unknown-spec"} : ${specificationContext?.currentFeature?.name ?: "unknown-test"}"
+
+    threadDumpLogger = new ThreadDumpLogger(testName, reportDir)
+    threadDumpLogger.start()
+
     redisServer.start()

     redisClient = RedisClient.create(embeddedDbUri)
@@ -79,8 +105,55 @@ abstract class Lettuce4ClientTestBase extends VersionedNamingTestBase {
   }

   def cleanup() {
+    threadDumpLogger.stop()
+
     connection.close()
     redisClient.shutdown()
     redisServer.stop()
   }
+
+  // 🔒 Private helper class for thread dump logging
+  private static class ThreadDumpLogger {
+    private final String testName
+    private final File outputDir
+    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor()
+    private ScheduledFuture<?> task
+
+    ThreadDumpLogger(String testName, File outputDir) {
+      this.testName = testName
+      this.outputDir = outputDir
+    }
+
+    void start() {
+      // new File(outputDir, "${System.currentTimeMillis()}-start-mark.txt") << testName
+
+      task = scheduler.scheduleAtFixedRate({
+        heapDump("test")
+
+        def reportFile = new File(outputDir, "${System.currentTimeMillis()}-thread-dump.log")
+        try (def writer = new FileWriter(reportFile)) {
+          writer.write("=== Test: ${testName} ===\n")
+          writer.write("=== Thread Dump Triggered at ${new Date()} ===\n")
+          Thread.getAllStackTraces().each { thread, stack ->
+            writer.write("Thread: ${thread.name}, daemon: ${thread.daemon}\n")
+            stack.each { writer.write("\tat ${it}\n") }
+          }
+          writer.write("==============================================\n")
+        }
+      }, 10003, 60000, TimeUnit.MILLISECONDS)
+    }
+
+    void heapDump(String kind) {
+      def heapDumpFile = new File(outputDir, "${System.currentTimeMillis()}-heap-dump-${kind}.hprof").absolutePath
+      MBeanServer server = ManagementFactory.getPlatformMBeanServer()
+      HotSpotDiagnosticMXBean mxBean = ManagementFactory.newPlatformMXBeanProxy(
+        server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class)
+      mxBean.dumpHeap(heapDumpFile, true)
+    }
+
+    void stop() {
+      task?.cancel(false)
+      scheduler.shutdownNow()
+    }
+  }
 }
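
Review note: both copies of `ThreadDumpLogger.heapDump` obtain `HotSpotDiagnosticMXBean` through an MBean proxy and call `dumpHeap(file, true)`. The Java sketch below shows the same call through `ManagementFactory.getPlatformMXBean`, which should be equivalent on HotSpot JVMs. The `HeapDumps` class and its method names are hypothetical and not part of this PR.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of this PR: on-demand heap dumps on HotSpot JVMs. */
final class HeapDumps {
  private HeapDumps() {}

  /** Builds a timestamped target name; dumpHeap fails if the file already exists. */
  static Path targetFile(Path outputDir, String kind) {
    return outputDir.resolve(System.currentTimeMillis() + "-heap-dump-" + kind + ".hprof");
  }

  /** Dumps the heap into outputDir (which must exist) and returns the file written. */
  static Path dump(Path outputDir, String kind, boolean liveOnly) {
    Path target = targetFile(outputDir, kind).toAbsolutePath();
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    try {
      // liveOnly = true restricts the dump to reachable objects.
      bean.dumpHeap(target.toString(), liveOnly);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return target;
  }

  public static void main(String[] args) {
    Path out = dump(Paths.get(System.getProperty("java.io.tmpdir")), "demo", true);
    System.out.println("Heap dump written to " + out);
  }
}
```

A live-only dump typically triggers a collection first, and the PR takes one on every scheduled tick (every 60 seconds after the initial delay) for each running test, so the dumps themselves may add noticeable time and disk usage to the CI job being debugged.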
4 changes: 3 additions & 1 deletion gradle/libs.versions.toml
@@ -68,7 +68,8 @@ objenesis = { module = "org.objenesis:objenesis", version = "3.3" } # Used by Sp

 spock24-core = { module = "org.spockframework:spock-core", version.ref = "spock24" }
 spock24-junit4 = { module = "org.spockframework:spock-junit4", version.ref = "spock24" }
-spock24-spring = { module = "org.spockframework:spock-spring", version = "spock24" }
+spock24-spring = { module = "org.spockframework:spock-spring", version.ref = "spock24" }
+objenesis34 = { module = "org.objenesis:objenesis", version = "3.4" }

 groovy = { module = "org.codehaus.groovy:groovy-all", version.ref = "groovy" }
 groovy-yaml = { module = "org.codehaus.groovy:groovy-yaml", version.ref = "groovy" }
@@ -107,6 +108,7 @@ asm = ["asm", "asmcommons"]
 cafe-crypto = ["cafe-crypto-curve25519", "cafe-crypto-ed25519"]
 # Testing
 spock = ["spock-core", "spock-junit4", "objenesis"]
+spock24 = ["spock24-core", "spock24-junit4", "objenesis34"]
 spock24-spring = ["spock24-core", "spock24-junit4", "spock24-spring"]
 junit5 = ["junit-jupiter", "junit-jupiter-params"]
 mockito = ["mokito-core", "mokito-junit-jupiter", "byte-buddy", "byte-buddy-agent"]