RFC: Better CFI Format #558

@Swatinem

Status Quo

We have this thing called symbolic_minidump::cfi::AsciiCfiWriter, which outputs Breakpad ASCII STACK records for all the debug formats that we support. We use it as part of symbolicator, and afaik Mozilla uses it as part of their dump_cfi utility.

Problem Statement

This ASCII format has a couple of shortcomings:

  • It needs to be parsed, either ahead-of-time, or lazily.
  • Parsing it is super slow, and needs to happen every damn time.
  • It might even end up being larger than the actual unwind info.
  • It is low fidelity and a bad common denominator; for example it does not support some DWARF operations like "set return addr to 0; aka end of stack".
  • Did I mention it's a text format that needs to be parsed?

Proposed Solution

So I was thinking for quite some time about an "indexed" format that I don’t need to parse from beginning to end every single time, but can quickly look up unwind info based on instruction offset.

Also, while working on #549 I realized that converting the unwind instructions into this intermediate text format is a bad fit; it would be a lot nicer to just execute the unwind operations directly.

Long story short, how about we have a serialized, mmap-able format, similar to SymCache, that looks something like the following, in pseudocode:

struct CfiCache {
  ranges: BTreeMap<usize, UnwindInfo> // instruction addr => unwind info
}

enum UnwindInfo {
  Breakpad(String), // same as now, just tiny indexed snippets so you don’t need to parse the whole file ahead of time
  WindowsX64(goblin::pe::exception::UnwindInfo), // well, a reference to a binary representation of the raw info
  Dwarf(gimli::read::CallFrameInstructionIter), // again, binary representation of the raw DWARF unwind info
  Compact(symbolic_debuginfo::macho::compact::CompactCfiOpIter), // again, same for Apple's compact unwind format
  // etc, whatever other formats there are
}
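
To make the "quickly look up unwind info based on instruction offset" part concrete, here is a minimal in-memory sketch of the range lookup. It is not the proposed on-disk layout: the `(end, info)` tuple, the `lookup` method and the simplified `UnwindInfo` variants are assumptions made up for illustration, and a real mmap-able format would more likely use a sorted array with binary search than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the per-format unwind info in the pseudocode;
/// the real variants would reference raw DWARF / PE / compact unwind bytes.
#[derive(Debug, Clone, PartialEq)]
enum UnwindInfo {
    Breakpad(String),
    Raw(Vec<u8>),
}

/// Assumed layout: range start address => (exclusive range end, unwind info).
struct CfiCache {
    ranges: BTreeMap<u64, (u64, UnwindInfo)>,
}

impl CfiCache {
    /// Finds the unwind info covering `addr` without scanning all entries.
    fn lookup(&self, addr: u64) -> Option<&UnwindInfo> {
        // The last range starting at or before `addr` is the only candidate.
        let (_, (end, info)) = self.ranges.range(..=addr).next_back()?;
        if addr < *end {
            Some(info)
        } else {
            None // `addr` falls into a gap between ranges
        }
    }
}
```

The point of the sketch is only that a lookup touches one entry instead of parsing every record up front.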

The proposed CfiCache / UnwindInfo would implement minidump_processor::symbols::SymbolProvider (or at least walk_frame) to just execute the provided unwind info directly, without needing to go through that horrible intermediate format.
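
As a rough idea of what "just execute the unwind info" could look like, here is a toy sketch. Everything in it is hypothetical: `UnwindRule`, `Frame` and `unwind` are invented names, the single rule shown is far simpler than real DWARF, PE or compact unwind info, and it does not reproduce the actual `SymbolProvider` trait signature. It does show one thing the ASCII format cannot express, an explicit end-of-stack rule.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical, heavily simplified unwind rule (not a real encoding).
#[derive(Debug, Clone, Copy)]
enum UnwindRule {
    /// CFA = SP + cfa_offset; the return address is stored at CFA - 8.
    SpOffset { cfa_offset: u64 },
    /// End of stack: there is no further frame to recover.
    EndOfStack,
}

/// Minimal register state for one frame (assumed 64-bit target).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Frame {
    ip: u64,
    sp: u64,
}

/// Reads a little-endian u64 from a stack snapshot that starts at `stack_base`.
fn read_u64(stack: &[u8], stack_base: u64, addr: u64) -> Option<u64> {
    let start = usize::try_from(addr.checked_sub(stack_base)?).ok()?;
    let bytes = stack.get(start..start.checked_add(8)?)?;
    Some(u64::from_le_bytes(bytes.try_into().ok()?))
}

/// Applies `rule` to `frame` and returns the next frame up the stack, if any.
fn unwind(rule: UnwindRule, frame: Frame, stack: &[u8], stack_base: u64) -> Option<Frame> {
    match rule {
        UnwindRule::EndOfStack => None,
        UnwindRule::SpOffset { cfa_offset } => {
            let cfa = frame.sp.checked_add(cfa_offset)?;
            let ip = read_u64(stack, stack_base, cfa.checked_sub(8)?)?;
            Some(Frame { ip, sp: cfa })
        }
    }
}
```

A real implementation would dispatch on the `UnwindInfo` variant and run the corresponding format's own instructions in place of the single `SpOffset` rule.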

Open Questions

Since we are not the only users of this code, I have some questions, especially for external users: (hello @Gankra, @gabrielesvelto, etc)

  • Please give feedback on the proposal ;-)
  • Would an "unwind info -> breakpad ASCII" converter still be useful in that scenario? Or can we just remove it completely?
  • Would you expect to create such an unwinder directly from an object file, without needing to go through an intermediate format/struct?
  • How transparent / opaque should this format be? Is it sufficient to have "object file -> (opaque intermediate format) -> .unwind(caller frame) -> Option<callee frame>", with unwind being its only public API? Or would you expect to have access to the underlying raw unwind info (raw DWARF bytes, or whatever the format provides)?
