feat: extract call recording transcripts from Apple Notes CRDT #35

Open
ephraimm wants to merge 3 commits into antoniorodr:main from litescale-ai:main

Conversation

@ephraimm

ephraimm commented Mar 25, 2026

Reverse-engineers Apple's CRArchive protobuf format (NSKeyedArchiver over protobuf) to extract call recording transcripts from the ZMERGEABLEDATA1 column in NoteStore.sqlite.
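
For readers unfamiliar with the protobuf wire format mentioned above, here is a minimal sketch of walking the top-level fields of a protobuf message without a compiled schema. It is not the code from this PR: the helper names `read_varint`, `iter_fields`, and `fixed64_as_double` are made up for illustration, and the sample bytes are synthetic rather than taken from a real CRArchive blob.

```python
import struct
from typing import Iterator, Tuple, Union


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError if the buffer is truncated
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, raw_value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed64: 8 raw little-endian bytes
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed32: 4 raw little-endian bytes
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


def fixed64_as_double(raw: bytes) -> float:
    """Interpret a fixed64 payload as a little-endian IEEE-754 double."""
    return struct.unpack("<d", raw)[0]


# Synthetic message: field 1 = "hello" (length-delimited), field 2 = 12.5 (fixed64).
sample = b"\x0a\x05hello" + b"\x11" + struct.pack("<d", 12.5)
for field_number, wire_type, value in iter_fields(sample):
    print(field_number, wire_type, value)
```

Nested messages arrive as length-delimited payloads, so a schema-driven parser would call `iter_fields` again on those bytes for the field numbers it knows to be sub-messages.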

Key changes

CRArchive decoder (recording_utils.py)

  • Schema-driven protobuf parser for CRArchive structure
  • Walks ICTTTranscriptSegment objects with speaker/text/timestamp resolved through registerLatest → NSString/NSNumber chains
  • Timestamp-based sorting (fixed64 doubleValue) for correct reading order
  • Speaker labels from root ICTTAudioRecording (callLocalSpeakerHandle / callRemoteSpeakerHandle)
  • Contact name resolution via macOS Contacts AppleScript (both +27 and 0 formats, cached)
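
As a rough illustration of the last bullet, the sketch below generates both the international (`+27…`) and national (`0…`) spellings of a number and caches lookups. It is an assumption about the general approach, not the PR's code: `phone_variants`, `resolve_name`, and the in-memory `FAKE_CONTACTS` dictionary are hypothetical, and the dictionary stands in for the AppleScript query against macOS Contacts. The numbers are fictional.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical stand-in for the macOS Contacts lookup done via AppleScript.
FAKE_CONTACTS = {"0825550100": "Alice Example"}


def phone_variants(number: str, country_code: str = "27") -> Tuple[str, str]:
    """Return (international, national) spellings of the same number."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if number.strip().startswith("+") and digits.startswith(country_code):
        subscriber = digits[len(country_code):]  # drop the country code
    elif digits.startswith("0"):
        subscriber = digits[1:]  # drop the national trunk prefix
    else:
        subscriber = digits
    return f"+{country_code}{subscriber}", f"0{subscriber}"


@lru_cache(maxsize=None)
def resolve_name(number: str) -> str:
    """Try each spelling against the contact store; fall back to the raw number."""
    for variant in phone_variants(number):
        name = FAKE_CONTACTS.get(variant)
        if name:
            return name
    return number
```

Caching matters more in the real setting than here, since each AppleScript call spawns a process; with `lru_cache`, a speaker handle that appears in many transcript segments is resolved once.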

Regression tests (test_transcript_parsing.py)

9 structural tests using a local-only CRDT blob fixture (gitignored) validating:

  • Speaker turn alternation and formatting
  • Multi-word sentence merging
  • No binary garbage or float artifacts
  • Contact name substitution
  • Tests skip gracefully in CI when fixture is absent
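
One way to get the skip-when-absent behaviour described in the last bullet is the standard library's `unittest.skipUnless` decorator, which pytest also honours. This is a sketch under assumptions: the fixture path and the test name are hypothetical, and the PR may well use pytest's own skip mechanisms instead.

```python
import unittest
from pathlib import Path

# Hypothetical location of the local-only, gitignored CRDT blob fixture.
FIXTURE = Path("tests/fixtures/call_recording_crdt.bin")


class TranscriptFixtureTests(unittest.TestCase):
    @unittest.skipUnless(FIXTURE.exists(), "CRDT blob fixture not present")
    def test_fixture_is_not_empty(self):
        # Runs only on machines that have the fixture; reported as skipped elsewhere.
        blob = FIXTURE.read_bytes()
        self.assertGreater(len(blob), 0)
```

When the fixture is missing, the test is reported as skipped rather than failed, so CI stays green without the private blob being committed.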

Checklist

  • I have read CONTRIBUTING.md and this PR follows the guidelines
  • A human has reviewed the entire diff of this PR, every line of code
  • A human understands the changes and can explain why this approach is correct
  • This PR doesn't have AI-generated boilerplate or co-author lines
  • This PR was authored and submitted by an AI agent without human review

- Add 'memo recordings' command to list, view, extract, and search call recordings
- get_recordings.py: dual-strategy detection (smart folder + name-pattern fallback)
- recording_utils.py: transcript retrieval, attachment listing, audio extraction
- search_memo.py: refactor fzf into reusable _run_fzf(); add fuzzy_recordings()
- 12 new tests, all 35 tests pass

Replace speculative proposal with actual implementation docs.
Also commit uv.lock for reproducible builds.

Reverse-engineer Apple's CRArchive protobuf format (NSKeyedArchiver
over protobuf) to extract call recording transcripts from the
ZMERGEABLEDATA1 column in NoteStore.sqlite.

Key implementation:
- Schema-driven protobuf parser for CRArchive structure
- Walk ICTTTranscriptSegment objects with speaker/text/timestamp
  resolved through registerLatest→NSString/NSNumber chains
- Timestamp-based sorting for correct reading order
- Speaker labels from root ICTTAudioRecording (callLocalSpeakerHandle
  / callRemoteSpeakerHandle)
- Contact name resolution via macOS Contacts AppleScript

Includes 8 regression tests with a real CRDT blob fixture validating
ground-truth transcript ordering and speaker assignment.

@antoniorodr

Owner

Hello @ephraimm!

I am not sure whether this feature is available in Europe (I have never tried it). Could you briefly explain how this works?

Thanks!

@ephraimm

ephraimm commented Mar 26, 2026 via email

Author

@antoniorodr

Owner

Hi @ephraimm

I understand. Unfortunately, I don't have a note with a call transcript to test this PR against.

But I can try to find (or create) one. I am still not sure whether this feature is available in Europe.

Thanks!

@ephraimm

ephraimm commented Mar 27, 2026 via email

Author

@antoniorodr

antoniorodr commented Mar 27, 2026 via email

Owner

@antoniorodr

Owner

Hello @ephraimm

I hope you are doing well. Do you have the test note we spoke about, so I can test this PR and see whether it works as expected?

I am looking forward to hearing from you.

@ephraimm

ephraimm commented Apr 15, 2026 via email

Author

What format should I send it as?

@antoniorodr

Owner

I think Markdown should be fine, as long as it keeps the information.
