Skip to content

Commit 6b483d7

Browse files
authored
Use P/Invoke to call lstat instead of shelling out to stat to retrieve the inode (#7453)
## Summary of changes Adds a P/Invoke path for retrieving the inode on linux and only later falls back to using `stat -c %i <path>` ## Reason for change Calling `stat` will fail if `stat` isn't available (as in chiseled containers, for example). It also results in additional spans being created when we're running in a version-mismatch scenario(because the `Process` invocation is traced) ## Implementation details Based my initial PoC on P/Invoking into `System.Native`, which is a shim shipped with recent .NET that handles a bunch of marshalling issues etc and returns [this type](https://github.com/dotnet/runtime/blob/main/src/libraries/Common/src/Interop/Unix/System.Native/Interop.Stat.cs). That's [how recent .NET versions](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/FileStatus.Unix.cs#L498) handle various bits like this. However, System.Native only gives the correct inode value in .NET 7+ (from my local testing) and isn't available in early .NET Core. I then tried the "raw" route, trying to P/Invoke directly into `libc`. I also investigated making the syscall directly, but libc papers over differences in old kernels etc, so it seemed the easiest route. However, older versions of `libc` don't export the `lstat` or `fstatat` symbols as public symbols, and instead they're just macros. Finally settled on a similar approach to System.Native, i.e. we P/Invoke into Datadog.Trace.Native which then makes the `lstat` call. ## Test coverage Should _mostly_ be covered by existing tests (and smoke tests). I added an additional unit test to make sure the `stat -c %i` call and the P/Invoke give the same value, otherwise we should be covered. I also rebased on top of [this branch](#7446) which is where the error appeared - and did a [full-installer run](https://dev.azure.com/datadoghq/dd-trace-dotnet/_build/results?buildId=186579&view=results), and it fixed all the issues, so we should be good 🤞 And we still have the existing fallback if not Note that I explicitly disabled the process spans in the version-conflict tests, as whether we get the spans or not is dependent on the underlying host system, which is a bit gross. ## Other details ~~AI assisted in various places, including generating the structs.~~ None of that worked in the end. Great job AI. I made the failure an error as it _shouldn't_ fail in normal execution, and we'd like to report if it does, however there are some cases where the P/Invoke calls aren't re-written so we know this won't work. In case of failure, we explicitly check for that, and log the error as debug instead. To get everything to compile easily I split the `ContainerMetadata` implementation for .NET FX as it's not supported in that scenario anyway. Also, to be able to unit test the P/Invoke work, I copied the native libs into the test directories. Finally, it's worth us looking into [this method](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.nativelibrary.setdllimportresolver?view=net-9.0) to allow P/Invokes to happen in the native runner, given we know where the native libs are, they're just not re-written. Something to think about in a separate PR I think
1 parent b44de97 commit 6b483d7

File tree

8 files changed

+180
-9
lines changed

8 files changed

+180
-9
lines changed

tracer/build/_build/Build.Steps.cs

Lines changed: 15 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -680,18 +680,27 @@ async Task DownloadWafVersion(string libddwafVersion = null, string uncompressFo
680680

681681
var testBinFolder = testDir / "bin" / BuildConfiguration;
682682

683-
var (ext, source) = Platform switch
683+
var (ext, source, libdatadog) = Platform switch
684684
{
685-
PlatformFamily.Windows => ("dll", MonitoringHomeDirectory / $"win-{TargetPlatform}" / "datadog_profiling_ffi.dll"),
686-
PlatformFamily.Linux => ("so", MonitoringHomeDirectory / GetUnixArchitectureAndExtension().Arch / "libdatadog_profiling.so"),
687-
PlatformFamily.OSX => ("dylib", MonitoringHomeDirectory / "osx" / $"libdatadog_profiling.dylib"),
685+
PlatformFamily.Windows => ("dll", MonitoringHomeDirectory / $"win-{TargetPlatform}", "datadog_profiling_ffi.dll"),
686+
PlatformFamily.Linux => ("so", MonitoringHomeDirectory / GetUnixArchitectureAndExtension().Arch, "libdatadog_profiling.so"),
687+
PlatformFamily.OSX => ("dylib", MonitoringHomeDirectory / "osx", "libdatadog_profiling.dylib"),
688688
_ => throw new NotSupportedException($"Unsupported platform: {Platform}")
689689
};
690690

691+
var libs = new[]
692+
{
693+
(libdatadog, $"LibDatadog.{ext}"),
694+
($"Datadog.Tracer.Native.{ext}", $"Datadog.Tracer.Native.{ext}"),
695+
};
696+
691697
foreach (var framework in frameworks)
692698
{
693-
var dest = testBinFolder / framework / $"LibDatadog.{ext}";
694-
CopyFile(source, dest, FileExistsPolicy.Overwrite);
699+
foreach (var lib in libs)
700+
{
701+
var dest = testBinFolder / framework / lib.Item2;
702+
CopyFile(source / lib.Item1, dest, FileExistsPolicy.Overwrite);
703+
}
695704
}
696705
}
697706
});

tracer/src/Datadog.Trace/ClrProfiler/NativeMethods.cs

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -243,6 +243,23 @@ public static int GetUserStrings(int arrSize, [In, Out] UserStringInterop[] arr)
243243
}
244244
}
245245

246+
public static bool TryGetInodeForPath(string path, out long result)
247+
{
248+
if (IsWindows)
249+
{
250+
result = -1;
251+
return false;
252+
}
253+
254+
result = NonWindows.GetInodeForPath(path);
255+
if (result < 0)
256+
{
257+
return false;
258+
}
259+
260+
return true;
261+
}
262+
246263
// the "dll" extension is required on .NET Framework
247264
// and optional on .NET Core
248265
// The DllImport methods are re-written by cor_profiler to have the correct vales
@@ -330,6 +347,9 @@ private static class NonWindows
330347

331348
[DllImport("Datadog.Tracer.Native", CharSet = CharSet.Unicode)]
332349
public static extern int GetUserStrings(int arrSize, [In, Out] UserStringInterop[] arr);
350+
351+
[DllImport("Datadog.Tracer.Native")]
352+
public static extern long GetInodeForPath([MarshalAs(UnmanagedType.LPWStr)]string path);
333353
}
334354
}
335355
}
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
// <copyright file="ContainerMetadata.NetFramework.cs" company="Datadog">
2+
// Unless explicitly stated otherwise all files in this repository are licensed under the Apache 2 License.
3+
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
4+
// </copyright>
5+
6+
#nullable enable
7+
8+
#if NETFRAMEWORK
9+
10+
namespace Datadog.Trace.PlatformHelpers;
11+
12+
/// <summary>
13+
/// Utility class with methods to interact with container hosts.
14+
/// </summary>
15+
internal static class ContainerMetadata
16+
{
17+
/// <summary>
18+
/// Gets the id of the container executing the code.
19+
/// Return <c>null</c> if code is not executing inside a supported container.
20+
/// </summary>
21+
/// <returns>The container id or <c>null</c>.</returns>
22+
public static string? GetContainerId() => null;
23+
24+
/// <summary>
25+
/// Gets the unique identifier of the container executing the code.
26+
/// Return values may be:
27+
/// <list type="bullet">
28+
/// <item>"ci-&lt;containerID&gt;" if the container id is available.</item>
29+
/// <item>"in-&lt;inode&gt;" if the cgroup node controller's inode is available.
30+
/// We use the memory controller on cgroupv1 and the root cgroup on cgroupv2.</item>
31+
/// <item><c>null</c> if neither are available.</item>
32+
/// </list>
33+
/// </summary>
34+
/// <returns>The entity id or <c>null</c>.</returns>
35+
public static string? GetEntityId() => null;
36+
}
37+
#endif

tracer/src/Datadog.Trace/PlatformHelpers/ContainerMetadata.cs

Lines changed: 52 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,12 +3,17 @@
33
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
44
// </copyright>
55

6+
#if !NETFRAMEWORK
7+
68
using System;
79
using System.Collections.Generic;
10+
using System.Diagnostics.CodeAnalysis;
811
using System.IO;
912
using System.Linq;
13+
using System.Runtime.InteropServices;
1014
using System.Text.RegularExpressions;
1115
using System.Threading;
16+
using Datadog.Trace.ClrProfiler;
1217
using Datadog.Trace.Logging;
1318
using Datadog.Trace.Util;
1419

@@ -117,8 +122,8 @@ public static string ExtractInodeFromCgroupLines(string controlGroupsMountPath,
117122
{
118123
var tuple = ParseControllerAndPathFromCgroupLine(line);
119124
if (tuple is not null
120-
&& !string.IsNullOrEmpty(tuple.Item2)
121-
&& (tuple.Item1 == string.Empty || string.Equals(tuple.Item1, "memory", StringComparison.OrdinalIgnoreCase)))
125+
&& !string.IsNullOrEmpty(tuple.Item2)
126+
&& (tuple.Item1 == string.Empty || string.Equals(tuple.Item1, "memory", StringComparison.OrdinalIgnoreCase)))
122127
{
123128
string controller = tuple.Item1;
124129
string cgroupNodePath = tuple.Item2;
@@ -149,6 +154,49 @@ public static Tuple<string, string> ParseControllerAndPathFromCgroupLine(string
149154
}
150155

151156
internal static bool TryGetInode(string path, out long result)
157+
=> TryGetInodeUsingPInvoke(path, out result)
158+
|| TryGetInodeUsingStat(path, out result);
159+
160+
// Internal for testing
161+
internal static bool TryGetInodeUsingPInvoke(string path, out long result)
162+
{
163+
result = 0;
164+
165+
try
166+
{
167+
if (!NativeMethods.TryGetInodeForPath(path, out result))
168+
{
169+
LogError(null, "Error obtaining inode using PInvoke, returned " + result);
170+
return false;
171+
}
172+
173+
return true;
174+
}
175+
catch (Exception ex)
176+
{
177+
LogError(ex, "Error obtaining inode using PInvoke");
178+
return false;
179+
}
180+
181+
static void LogError(Exception ex, string message)
182+
{
183+
#pragma warning disable DDLOG004 // Must use constant strings - disabled as it's an integer only, and only called twice in the app lifetime
184+
if (EnvironmentHelpersNoLogging.IsClrProfilerAttachedSafe())
185+
{
186+
// if it's attached, then this really is an error
187+
Log.Error(ex, message);
188+
}
189+
else
190+
{
191+
// The profiler isn't there, so this is "normal"
192+
Log.Debug(ex, message);
193+
}
194+
}
195+
#pragma warning restore DDLOG004
196+
}
197+
198+
// Internal for testing
199+
internal static bool TryGetInodeUsingStat(string path, out long result)
152200
{
153201
result = 0;
154202

@@ -159,7 +207,7 @@ internal static bool TryGetInode(string path, out long result)
159207
}
160208
catch (Exception ex)
161209
{
162-
Log.Warning(ex, "Error obtaining inode.");
210+
Log.Warning(ex, "Error obtaining inode using stat");
163211
return false;
164212
}
165213
}
@@ -223,3 +271,4 @@ private static bool IsHostCgroupNamespaceInternal()
223271
}
224272
}
225273
}
274+
#endif

tracer/src/Datadog.Tracer.Native/Datadog.Tracer.Native.def

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,5 +24,6 @@ EXPORTS
2424
ShouldHeal
2525
ReportSuccessfulInstrumentation
2626
GetUserStrings
27+
GetInodeForPath
2728
InitEmbeddedCallSiteDefinitions
2829
InitEmbeddedCallTargetDefinitions

tracer/src/Datadog.Tracer.Native/interop.cpp

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -268,7 +268,37 @@ EXTERN_C BOOL STDAPICALLTYPE ShouldHeal(ModuleID moduleId, int methodToken, WCHA
268268
return trace::profiler->ShouldHeal(moduleId, methodToken, instrumentationId, products);
269269
}
270270

271+
EXTERN_C u_long STDAPICALLTYPE GetInodeForPath(const WCHAR* path)
272+
{
273+
#ifdef _WIN32
274+
// We don't need to support this on Windows
275+
return -1;
276+
#else
277+
if (!path)
278+
{
279+
return -1;
280+
}
281+
282+
const std::string str = ToString(path);
283+
if (str.empty())
284+
{
285+
return -1;
286+
}
287+
288+
// We use lstat, because that matches the behaviour of the 'stat' command
289+
// i.e. don't follow symlinks
290+
struct stat st {};
291+
if (lstat(str.c_str(), &st) != 0)
292+
{
293+
return -errno;
294+
}
295+
296+
return st.st_ino;
297+
#endif
298+
}
299+
271300
#ifndef _WIN32
301+
272302
EXTERN_C void *dddlopen (const char *__file, int __mode)
273303
{
274304
return dlopen(__file, __mode);

tracer/test/Datadog.Trace.ClrProfiler.IntegrationTests/VersionConflict/VersionConflict2xTests.cs

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,11 @@ public class VersionConflict2xTests : TestHelper
1818
public VersionConflict2xTests(ITestOutputHelper output)
1919
: base("VersionConflict.2x", output)
2020
{
21+
// We disable the process integration to prevent the `stat` process spans that
22+
// _may_ be present depending on the underlying host system (cgroup v1/v2).
23+
// The "do not trace" helper that normally blocks tracing these spans doesn't
24+
// work in version conflict scenarios.
25+
SetEnvironmentVariable("DD_TRACE_Process_ENABLED", "0");
2126
}
2227

2328
[SkippableFact]

tracer/test/Datadog.Trace.Tests/PlatformHelpers/ContainerMetadataTests.cs

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,8 @@
33
// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
44
// </copyright>
55

6+
#if !NETFRAMEWORK
7+
68
using System;
79
using System.Collections.Generic;
810
using System.Diagnostics;
@@ -255,5 +257,23 @@ public void Parse_TryGetInode()
255257
success.Should().BeTrue();
256258
inode.Should().BeGreaterThan(0);
257259
}
260+
261+
[SkippableFact]
262+
public void Parse_TryGetInode_ShouldGetSameValueFromPinvokeAndProcess()
263+
{
264+
SkipOn.Platform(SkipOn.PlatformValue.Windows);
265+
SkipOn.Platform(SkipOn.PlatformValue.MacOs);
266+
267+
string currentDirectory = Environment.CurrentDirectory;
268+
bool success1 = ContainerMetadata.TryGetInodeUsingStat(currentDirectory, out long inode1);
269+
bool success2 = ContainerMetadata.TryGetInodeUsingPInvoke(currentDirectory, out long inode2);
270+
271+
success1.Should().BeTrue();
272+
success2.Should().BeTrue();
273+
inode1.Should().BeGreaterThan(0);
274+
inode2.Should().Be(inode1);
275+
}
258276
}
259277
}
278+
279+
#endif

0 commit comments

Comments
 (0)