feat: parse GitHub.copilot-chat/transcripts/*.jsonl event-stream format #70
Open
hora7ce wants to merge 3 commits into
…SH / devcontainer

findVsCodeDirs() only scanned desktop installation paths (~/.config/Code, AppData, ~/Library/...) and missed the VS Code Server path used by WSL2, Remote SSH, and Dev Containers:

  ~/.vscode-server/data/User/workspaceStorage
  ~/.vscode-server-insiders/data/User/workspaceStorage

Add both server editions to the scan on non-Windows platforms.

Also extend harnessFromPath() with .vscode-server-insiders and .vscode-server checks (ordered most-specific first to avoid the Insiders path matching the plain .vscode-server substring) so sessions discovered via these paths are labelled 'Local Agent (Server)' or 'Local Agent (Server Insiders)' rather than the fallback 'Local Agent'.

Fixes microsoft#62
- README.extension.md: add Local Agent (Server) and Local Agent (Server Insiders) rows to Supported Harnesses table
- docs/content/_index.md: add server harness row to Multi-Harness Support table
- docs/content/getting-started/supported-tools.md: note Remote-WSL/SSH/devcontainer log paths under Local Agent section
- parser-vscode.ts: tighten harnessFromPath ordering comment (substring collision) and findVsCodeDirs platform guard comment per reviewer suggestions
- parser-vscode.test.ts: add findVsCodeDirs test covering server workspaceStorage path inclusion via temporary home directory
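The substring collision mentioned in the two commit messages above is easy to reproduce. The following is a minimal sketch only, not the actual harnessFromPath() from parser-vscode.ts: the function name `harnessLabelForPath`, the `HarnessLabel` type, and the use of plain substring checks are assumptions made for illustration.

```typescript
// Hypothetical sketch: derive a harness label from a workspaceStorage path.
// The real harnessFromPath() may use different checks and return types.
type HarnessLabel =
  | "Local Agent (Server Insiders)"
  | "Local Agent (Server)"
  | "Local Agent";

function harnessLabelForPath(storagePath: string): HarnessLabel {
  // Most specific check first: the string ".vscode-server-insiders"
  // also contains ".vscode-server", so the order of these tests matters.
  if (storagePath.includes(".vscode-server-insiders")) {
    return "Local Agent (Server Insiders)";
  }
  if (storagePath.includes(".vscode-server")) {
    return "Local Agent (Server)";
  }
  // Fallback for desktop installation paths.
  return "Local Agent";
}
```

If the two checks were swapped, every Insiders path would be labelled 'Local Agent (Server)', which is the mislabelling the ordering is meant to prevent.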
Fixes microsoft#64.

VS Code stores Copilot Chat sessions in two locations inside each workspace's workspaceStorage entry:

1. chatSessions/*.{json,jsonl}: existing format (already parsed)
2. GitHub.copilot-chat/transcripts/*.jsonl: newer event-stream format (silently ignored until now)

This commit adds support for the second format.

## New helpers (parser-vscode.ts)

- listTranscriptFiles(dir): lists *.jsonl files in a transcripts/ dir
- parseTranscriptLines(raw): parses JSONL text into TranscriptEvent[]
- buildToolNameIndex(events): pre-indexes toolCallId → toolName
- collectToolsFromToolRequests(...): extracts tool names from assistant message toolRequests arrays with fallback to the pre-built index
- buildRequestsFromTranscriptEvents(events, toolNames): groups events into per-turn SessionRequest[] (one request per user.message)
- parseTranscriptFile(filePath, wsId, wsName, harness, customInstrBytes): public API; reads a transcript file and returns a Session or null

## Integration

processWorkspaceEntry / processWorkspaceEntryAsync now scan the transcript directory alongside chatSessions and wire discovered sessions into the same sessions[] / sessionSourceIndex pipeline so the dashboard picks them up transparently. The async path tracks transcript files in the same progress-reporting budget as chat files (totalUnits includes both).

## Tests (parser-vscode.test.ts)

Five new cases in the parseTranscriptFile describe block:

- full flow: session.start → user.message → assistant.message with tool calls → tool.execution_start/complete → final assistant.message
- multi-turn: two user/assistant pairs produce two requests
- empty session: no user messages → null
- malformed file: all-corrupt lines → null (events.length === 0)
- deduplication: same tool appearing in both toolRequests and tool.execution_start is deduplicated to a single entry
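To make the JSONL parsing step concrete, here is a rough sketch of what a helper like parseTranscriptLines(raw) could look like. It is not the code from this PR: the name `parseTranscriptLinesSketch`, the `RawTranscriptEvent` shape, and the choice to silently skip corrupt lines are assumptions, the last one inferred from the "all-corrupt lines → null (events.length === 0)" test case.

```typescript
// Assumed minimal event shape; the real TranscriptEvent type may differ.
interface RawTranscriptEvent {
  type: string;
  [key: string]: unknown;
}

// Hypothetical sketch: parse JSONL text, one JSON event per line.
function parseTranscriptLinesSketch(raw: string): RawTranscriptEvent[] {
  const events: RawTranscriptEvent[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    try {
      const parsed: unknown = JSON.parse(trimmed);
      // Keep only objects that carry a string "type" discriminator.
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        typeof (parsed as { type?: unknown }).type === "string"
      ) {
        events.push(parsed as RawTranscriptEvent);
      }
    } catch {
      // Assumption: a corrupt line is skipped rather than aborting the file.
    }
  }
  return events;
}
```

Under this assumption, a file made up entirely of corrupt lines yields an empty array, which a caller can map to a null session.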
Contributor
@san360 LGTM. Can you check that this PR works and does not introduce any duplication? Thanks!

Summary
Closes #64.
VS Code stores Copilot Chat sessions in two distinct locations inside each workspace's `workspaceStorage` entry:

1. `chatSessions/*.{json,jsonl}`
2. `GitHub.copilot-chat/transcripts/*.jsonl`

Sessions recorded only in format 2 never appeared in the dashboard. This PR makes the extension parse both formats transparently.
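As an illustration of the two locations, the sketch below classifies a path relative to a workspace's `workspaceStorage` entry. It is a hypothetical helper, not part of this PR: `classifySessionFile` and `SessionFileKind` are names invented here, and the patterns simply restate the two globs listed above.

```typescript
// Hypothetical labels for the two storage formats described above.
type SessionFileKind = "chatSession" | "transcript";

// Classify a path relative to a workspaceStorage entry, or return null
// when it matches neither of the two known session locations.
function classifySessionFile(relativePath: string): SessionFileKind | null {
  // Normalise Windows separators so one pattern covers all platforms.
  const p = relativePath.replace(/\\/g, "/");
  if (/^chatSessions\/[^/]+\.(json|jsonl)$/.test(p)) {
    return "chatSession"; // format 1: chatSessions/*.{json,jsonl}
  }
  if (/^GitHub\.copilot-chat\/transcripts\/[^/]+\.jsonl$/.test(p)) {
    return "transcript"; // format 2: GitHub.copilot-chat/transcripts/*.jsonl
  }
  return null;
}
```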
The transcript event-stream format
Each `.jsonl` file represents one session. Each line is a typed JSON event:

- `session.start`: `sessionId` and metadata
- `user.message`: `content` text
- `assistant.message`: `toolRequests[]` array
- `tool.execution_start`: `toolCallId` + `toolName`
- `tool.execution_complete`

Changes
src/core/parser-vscode.ts

New private helpers, each kept small to stay within the ESLint complexity limit:

- `listTranscriptFiles(dir)`: lists `*.jsonl` files under a `transcripts/` directory; returns `[]` when the dir does not exist
- `parseTranscriptLines(raw)`: parses JSONL text into `TranscriptEvent[]`
- `buildToolNameIndex(events)`: pre-indexes `toolCallId → toolName` from `tool.execution_start` events
- `collectToolsFromToolRequests(...)`: extracts tool names from `assistant.message.toolRequests[]`, with `toolCallId` fallback to the pre-built index
- `buildRequestsFromTranscriptEvents(events, toolNames)`: groups events into per-turn `SessionRequest[]`; each `user.message` starts a new turn that is flushed when the next `user.message` (or end-of-stream) arrives

New exported function:
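- `parseTranscriptFile(filePath, wsId, wsName, harness, customInstructionsBytes?)`: reads a transcript file, builds the `Session` object (or returns `null` for empty / unreadable files), and is the single public surface consumed by the integration points below.

Integration in `processWorkspaceEntry` and `processWorkspaceEntryAsync`:

- Alongside `chatSessions/`, both functions now also scan `GitHub.copilot-chat/transcripts/` and add discovered sessions to the same `sessions[]` / `sessionSourceIndex` pipeline, fully transparent to the rest of the analyzer.
- Transcript files are included in `totalUnits` so progress reporting stays accurate.

src/core/parser-vscode.test.ts

Five new tests in a `parseTranscriptFile` describe block:

- Full flow: `session.start` → `user.message` → `assistant.message` with tool calls → `tool.execution_start`/complete → final `assistant.message`; inspects `SessionData.requests[]` and validates `sessionId`, `workspaceId`, `harness`, `messageText`, `responseText`, `toolsUsed`, `agentMode`, `timestamp`
- Multi-turn: two user/assistant pairs produce two requests
- Empty session: no user messages; returns `null` gracefully
- Malformed file: all-corrupt lines; returns `null` gracefully (guards the `events.length === 0` path)
- Deduplication: the same tool appearing in both `toolRequests` and `tool.execution_start` collapses to a single entry

Dependency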
This branch is based on PR #63 (`fix/vscode-server-log-discovery`), which adds `~/.vscode-server` path discovery. The two PRs are independent at the code level (they touch disjoint functions), but merging #63 first is recommended so that transcript sessions are attributed the correct `Local Agent (Server)` harness label for VS Code Server users.

Checklist
- `npm run check` passes (typecheck + lint + spellcheck + knip + tests)
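For reviewers who want a mental model of the turn grouping and tool deduplication described under Changes, the following is a simplified sketch. It does not reproduce the PR's code: all type and function names (`EventSketch`, `TurnSketch`, `indexToolNames`, `groupIntoTurns`) are invented here, and the `name` field on a tool request and the `content` field on `assistant.message` are assumptions about the event shape.

```typescript
// Assumed, simplified shapes; the real transcript events may differ.
interface ToolRequestSketch {
  toolCallId: string;
  name?: string; // assumed optional, hence the index fallback below
}
interface EventSketch {
  type: string;
  content?: string;
  toolRequests?: ToolRequestSketch[];
  toolCallId?: string;
  toolName?: string;
}
interface TurnSketch {
  messageText: string;
  responseText: string;
  toolsUsed: string[];
}

// Pre-index toolCallId → toolName from tool.execution_start events.
function indexToolNames(events: EventSketch[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const e of events) {
    if (e.type === "tool.execution_start" && e.toolCallId && e.toolName) {
      index.set(e.toolCallId, e.toolName);
    }
  }
  return index;
}

// Group events into turns: each user.message opens a new turn, and
// everything up to the next user.message (or end-of-stream) belongs to it.
function groupIntoTurns(events: EventSketch[]): TurnSketch[] {
  const toolNames = indexToolNames(events);
  const open: { messageText: string; responses: string[]; tools: Set<string> }[] = [];
  for (const e of events) {
    if (e.type === "user.message") {
      open.push({ messageText: e.content ?? "", responses: [], tools: new Set<string>() });
      continue;
    }
    const current = open[open.length - 1];
    if (current === undefined) continue; // events before the first user.message
    if (e.type === "assistant.message") {
      if (e.content) current.responses.push(e.content);
      for (const req of e.toolRequests ?? []) {
        const name = req.name ?? toolNames.get(req.toolCallId);
        if (name) current.tools.add(name); // the Set deduplicates tool names
      }
    } else if (e.type === "tool.execution_start" && e.toolName) {
      current.tools.add(e.toolName);
    }
  }
  return open.map((t) => ({
    messageText: t.messageText,
    responseText: t.responses.join("\n"),
    toolsUsed: Array.from(t.tools),
  }));
}
```

Collecting tool names in a `Set` per turn is one simple way to get the "single entry" behaviour the deduplication test checks; a stream with no `user.message` produces no turns, which corresponds to the empty-session case.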