feat: Support diffing text "binary" snapshots #708

PookieBuns · 2025-01-09T11:19:47Z

Thanks to #610 we have covered a large step in being able to use insta for schema snapshotting as per #475 . However, currently binary files cannot be diffed when running cargo test or cargo insta review. This makes it difficult to compare and review changes when examining schema changes. This PR allows insta to try to utf-8 decode binary files that are text based to allow this subset of files to be diffed in the workflow

max-sixty · 2025-01-13T23:03:16Z

There's definitely something useful here; I can see some good cases for using the binary snapshot format for text files; for example when we want a raw file without a header.

I'm a bit concerned that this could lead to some unexpected behavior though:

If the binary snapshots are multiple megabytes, how does this behave?
- We could add a limit, though then could be confusing why some snapshots are presented for a review and some aren't
What are the chances of some bytes getting through the String::from_utf8(new.to_vec()).ok())) check and the screen being filled with meaningless characters?

PookieBuns · 2025-01-14T01:24:59Z

I think we can draw some inspiration to how diff behaves as part of GNU. Section 1.7 of this manual states

diff determines whether a file is text or binary by checking the first few bytes in the
file; the exact number of bytes is system dependent, but it is typically several thousand. If
every byte in that part of the file is non-null, diff considers the file to be text; otherwise
it considers the file to be binary.

You could also force the diff to compare files as text using the --text (-a) option. I think this would be a good start for this feature, as it would only be opt-in.

I could add an option in the Tool Config File that could enable diffing binary files as text with a behavior.text_diff option that by default is turned off. We could also add this as an option in cargo insta when reviewing files.

In regards to addressing your concerns in the case we make it opt-in

It appears that if you force a text diff on large files (I tested with 2 video files), It will still try to diff them and produce erroneous output (at least this is the behavior on MacOS although its not technically GNU diff).
I think if you do opt in for text_diff, you probably either
1. Know for sure your files are text
2. Are prepared to see some erroneous output

max-sixty · 2025-01-27T21:07:45Z

Yes I think that could be viable!

That said, I'm a bit concerned that we're adding more options to our interface when there's a quite reasonable alternative — accepting the snapshot and running a git diff on the files.

If there's a way of building this so it's either:

really well-behaved by default (i.e. doesn't show a mess of characters), or
it's a single option in the config file and it doesn't have any really bad behavior (i.e. doesn't freeze on a multi-megabyte file)

...then I would support this

Also ofc open to others' views!

PookieBuns · 2025-02-15T00:48:10Z

I updated the PR to accept a behavior.force_text or INSTA_FORCE_TEXT for clients to be able to opt in to text diffs on binary files. Also cargo-insta can take a --text argument to achieve the same effect. Note these are all opt in features which don't modify current behavior.

In regards to the better alternative in accepting then git diff, under the use case of a CI job, this would not be applicable and a pretty formatted textual diff would be a very nice-to-have when your test CI job fails

Some follow up

Do you have resources where I can look at in regards to testing the SnapshotPrinter for these changes? I couldn't find any in the tests in the repo.
As per really well behaved by default, gnu diff if you do choose to opt in and you end up comparing non-text files, it will also show a mess of characters, which is what I basically tried mimicking for the entire MR.

Thanks for the feedback and am as always super open on iterating more on this to achieve the desired behavior!

max-sixty · 2025-02-17T21:44:21Z

Thanks @PookieBuns

I'm still -0.2 about this — it seems to add settings to get to a config that a) is only marginally useful relative to alternatives, even taking into account CI b) is a setting that's better per-snapshot than per-project, because some binary snapshots are going to be too large or show mangled characters.

I'm very open to other thoughts — does anyone else have a view here?

I wonder whether there's some snapshot type that's like plain which uses the same architecture as binary snapshots, but it's text, and it's expected to be diff-ed. That would be useful for projects that want to commit generated text artifacts and test they match the output. (but not sure, possibly there are similar ideas that will develop)

feat: Support diffing text "binary" files

b242de3

PookieBuns and others added 2 commits February 14, 2025 01:47

Merge branch 'mitsuhiko:master' into master

3814f2f

feat: Add configuration for forcing text diff on binary files

796ca51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat: Support diffing text "binary" snapshots #708

feat: Support diffing text "binary" snapshots #708

PookieBuns commented Jan 9, 2025

Uh oh!

max-sixty commented Jan 13, 2025

Uh oh!

PookieBuns commented Jan 14, 2025 •

edited

Loading

Uh oh!

max-sixty commented Jan 27, 2025 •

edited

Loading

Uh oh!

PookieBuns commented Feb 15, 2025 •

edited

Loading

Uh oh!

max-sixty commented Feb 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

feat: Support diffing text "binary" snapshots #708

Are you sure you want to change the base?

feat: Support diffing text "binary" snapshots #708

Conversation

PookieBuns commented Jan 9, 2025

Uh oh!

max-sixty commented Jan 13, 2025

Uh oh!

PookieBuns commented Jan 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

max-sixty commented Jan 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PookieBuns commented Feb 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

max-sixty commented Feb 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

PookieBuns commented Jan 14, 2025 •

edited

Loading

max-sixty commented Jan 27, 2025 •

edited

Loading

PookieBuns commented Feb 15, 2025 •

edited

Loading