Skip to content

Conversation

@simonswine
Copy link
Contributor

@simonswine simonswine commented Nov 11, 2025

This PR adds support for .pyroscope.yaml configuration files for custom source code mappings and adds logic to use those mappings to find Go and Java code

  • Added .pyroscope.yaml configuration support for custom source code location mappings
  • Supports both local paths and GitHub repository mappings for Go and Java
  • Implements prefix-based matching with longest-prefix-wins strategy
  • Enhanced Go standard library resolution with version detection and configurable source
  • Added function_name parameter to file lookups, so java profiles (that don't contain file paths), can be looked up.
  • Improved error tracing with OpenTracing tags in GitHub client

Configuration Example (also in the PR):

This will map a special path prefix called $GOROOT to a particular go version. This signals to the backend the go version used to compile this should be kept update to date with the version of Pyroscope used to build.

  source_code:
    mappings:
      - path:
          - prefix: $GOROOT/src
        language: go
        source:
          github:
            owner: golang
            repo: go
            ref: go1.24.8
            path: src

@simonswine simonswine changed the title 20251111 experiment pyroscope yaml WIP: Experiment with .pyroscope.yaml Nov 11, 2025
@simonswine simonswine force-pushed the 20251111_experiment-pyroscope-yaml branch from 1a8e75f to fa7f045 Compare November 13, 2025 11:57
type: string
title: rootPath
description: the root path where the project lives inside the repository
functionName:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like currently logic is: use localPath if go, use functionName if java. May lead to confusion down the road if we start seeing java profiles with the file paths properly set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I guess the profiling data you send needs to match with the .pyroscope.yaml you set. I do think this makes sense and is both in customer control.

@@ -0,0 +1,36 @@
source_code:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This config file is great. Wondering if there's also an opportunity to auto-detect Java files w/o a config file too and support automatic 3rd party path resolution like we do with go? I imagine at Uber scale it would be a large lift to make a config file that captures everything they'd like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only opportunity we have is setting it on the client site (as a parameter to GetFile).

Then we could try to match a path in the repo with the package path. Still fairly tricky to detect the source code prefix (like src/main or <package>/src/main

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting here that we discussed offline a tool for programmatically generating this config that I'm working on. Has the added benefit of moving the expensive logic of translating function names to git repos outside of the GetFile call path.

@simonswine simonswine force-pushed the 20251111_experiment-pyroscope-yaml branch from d5ab32d to 85d78f1 Compare November 14, 2025 17:22
@simonswine simonswine force-pushed the 20251111_experiment-pyroscope-yaml branch from e34c558 to 7d5ab4e Compare November 21, 2025 09:45
@simonswine simonswine changed the title WIP: Experiment with .pyroscope.yaml feat: Add .pyroscope.yaml with source code mapping support Nov 21, 2025
@simonswine simonswine marked this pull request as ready for review November 21, 2025 11:43
@simonswine simonswine requested review from a team, bryanhuhta and marcsanmi as code owners November 21, 2025 11:43
Copy link
Contributor

@bryanhuhta bryanhuhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts for the future, but I like where this is at right now.

Comment on lines +28 to +30
type PyroscopeConfig struct {
SourceCode SourceCodeConfig `yaml:"source_code"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also want a version tag here so we can continue to support older versions easily.

Comment on lines +159 to +170
func (c *PyroscopeConfig) FindMapping(file FileSpec) *MappingConfig {
// Find the longest matching prefix
var bestMatch *MappingConfig
var bestMatchLen = -1
for _, m := range c.SourceCode.Mappings {
if result := m.Match(file); result > bestMatchLen {
bestMatch = &m
bestMatchLen = result
}
}
return bestMatch
}
Copy link
Contributor

@bryanhuhta bryanhuhta Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be interesting to see how expensive this linear search is time-wise for very large projects. I suspect there could easily be many thousands of these mappings.

It's out of scope of this PR, but later we may need to improve this search heuristic.

}

// loadConfig attempts to load .pyroscope.yaml from the repository root
func (ff *FileFinder) loadConfig(ctx context.Context) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again a future improvement, but it would be nice to cache the config + git commit (at least in memory, maybe even in object storage) so we only need to make this call once.

Copy link
Contributor Author

@simonswine simonswine Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs the token or userid, some users might not have access to the particular repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there might be some more details to iron out. I think caching just the config should be fine because eventually we will need to hit the GitHub API to fetch an actual file, which permissions will apply then.

Copy link
Contributor

@jake-kramer jake-kramer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -0,0 +1,36 @@
source_code:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting here that we discussed offline a tool for programmatically generating this config that I'm working on. Has the added benefit of moving the expensive logic of translating function names to git repos outside of the GetFile call path.

}

// Validate checks if a source configuration is valid
func (m *Source) Validate() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be helpful error message to surface if the supplied source is not supported (i.e. not github or local)


func (ff FileFinder) findFallback(ctx context.Context) (*vcsv1.GetFileResponse, error) {
switch filepath.Ext(ff.file.Path) {
case ExtGo, ExtAsm:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a comment about the ExtAsm case being updated when other compiled languages are supported

return
}
sp.SetTag("config.url", file.URL)
sp.SetTag("config", file.Content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this too large to tag in span?

sp.SetTag("config.url", file.URL)
sp.SetTag("config", file.Content)

cfg, err := config.ParsePyroscopeConfig([]byte(file.Content))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I still think it would be useful to load up this config when the frontend loads up and WARN any errors just as a useful sanity check. OK to instead rely on config tooling I'm working on.

// fetchRepoFile fetches the file content from the configured repository.
func (arg FileFinder) fetchRepoFile(ctx context.Context, path, ref string) (*vcsv1.GetFileResponse, error) {
if arg.rootPath != "" {
path = filepath.Join(arg.rootPath, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: There's sort of a rough edge here where this root path is defined in the UI at repo level and then the mapping config can contain local paths that are function-level root paths. I can see as a user being confused playing with these two configs.

Maybe at some point a helper tooltip in the UI can clarify how these configs interact.


javaPath := convertJavaFunctionNameToPath(ff.file.FunctionName)
for _, m := range mappings {
return ff.fetchMappingFile(ctx, m, javaPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work as expected for multiple mappings. Not sure I see the benefit of passing multiple mappings to these functions, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants