
Optimize image blobs #10

Open
willwade wants to merge 3 commits into asterics:main from willwade:optimize-image-blobs


@willwade
Contributor

Optimize Embedded Image Blobs with ARASAAC/OpenSymbols URLs

Summary

Reduces repository size by about 50% by replacing embedded image blobs with external ARASAAC/OpenSymbols URLs and applying lossless compression.

Problem

  • 1,274 embedded image blobs across 39 communicator boards
  • ~375 MB of embedded blobs (51.7% of repository)
  • Many blobs are duplicates of publicly available symbols
  • Increases clone time and storage requirements

Solution

Implemented four layers:

  1. Exact Matching (SHA256): 444 blobs → ARASAAC/OpenSymbols URLs
  2. Fuzzy Label Matching: 4,547 symbols matched (88-92% confidence) (<- this is the sketchiest part!)
  3. Perceptual Hashing: 555 additional blobs migrated (<- second sketchiest part)
  4. Lossless Compression: 20% reduction on remaining blobs (easy win)
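
A minimal sketch of the exact-matching layer, assuming each embedded blob is a base64 `data:` URL and that a SHA256-to-URL index of the symbol libraries has been built beforehand. The function names, the index shape, and the example URL are hypothetical; the actual migration script is not shown in this PR.

```python
import base64
import hashlib
from typing import Dict, Optional


def blob_sha256(data_url: str) -> str:
    """Return the SHA256 hex digest of the bytes inside a base64 data URL."""
    # Everything after the first comma is the base64 payload.
    _, _, payload = data_url.partition(",")
    return hashlib.sha256(base64.b64decode(payload)).hexdigest()


def match_exact(data_url: str, index: Dict[str, str]) -> Optional[str]:
    """Look the blob up in a {sha256: symbol_url} index; None if absent."""
    return index.get(blob_sha256(data_url))


# Illustrative data only: a fake "image" and a placeholder URL.
raw = b"not-a-real-image"
blob = "data:image/png;base64," + base64.b64encode(raw).decode("ascii")
index = {hashlib.sha256(raw).hexdigest(): "https://example.org/symbol.png"}
print(match_exact(blob, index))
```

Exact hashing only catches byte-identical files, which is presumably why the later, fuzzier layers were needed for re-encoded or resized copies.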

Results

  • 444 blobs migrated (Layer 1)
  • 555 additional blobs migrated (Layer 3)
  • 830 custom artwork images preserved as blobs
  • 193.99 MB saved (51.7% reduction). Note, however, that a lot of this comes simply from converting OBZ to JSON, which strips out the embedded images.
  • 67 files modified across 39 boards

Files Changed

  • 67 communicator board files (.grd.json)
  • Metadata files (live_metadata.json, live_predefined_*.json)
  • 25 primary communicator boards

Testing

I've tried to manually test all changes by loading the pages. BUT I will be honest: I can't be 10000% sure we haven't messed something up. If we could A/B test this, or get someone else to test it, that would be fab.

Notes

  • Separated from the vocal flair changes (fix-remaining-openboards PR)
  • Should be merged AFTER fix-remaining-openboards is done

I should do a blog post on this, as I learnt a ton about hashing. Ideally OpenSymbols and ARASAAC would expose image hashes through their APIs; that would be handy.

- Replaced 444 image blobs with external ARASAAC/OpenSymbols URLs
- Implemented fuzzy perceptual hashing with Hamming distance (threshold=16)
- Dual-index matching: ARASAAC (13,623) + OpenSymbols (7,453) symbols
- 830 custom artwork images remain as blobs (not in symbol libraries)
- Updated all metadata files (live_metadata.json, live_predefined_*.json)
- Zero errors during migration
- 67 files modified across 39 communicator boards
- Recompressed 830 image blobs across 24 communicator files
- JPEG optimization: 295 blobs, 1.47 MB saved (17.8%)
- PNG optimization: 331 blobs, 1.24 MB saved (12.1%)
- SVG optimization: 189 blobs, 12.2 KB saved (7.0%)
- Total savings: 2.7 MB (14.3% reduction)
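
The perceptual-hash comparison mentioned above can be sketched as follows, assuming 64-bit hashes stored as hex strings. The threshold of 16 comes from the commit message; the hash values and helper names are made up for illustration, and computing the hash itself (for example with an image-hashing library) is outside this sketch.

```python
from typing import Dict, Optional

THRESHOLD = 16  # maximum Hamming distance, as stated in the commit message


def hamming(hash_a: str, hash_b: str) -> int:
    """Count differing bits between two equal-length hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def nearest_symbol(blob_hash: str, index: Dict[str, str]) -> Optional[str]:
    """Return the URL of the closest hash in the index, or None if too far."""
    best = min(index, key=lambda h: hamming(blob_hash, h), default=None)
    if best is None or hamming(blob_hash, best) > THRESHOLD:
        return None
    return index[best]


# Hypothetical index of {perceptual_hash: symbol_url}.
index = {
    "ffff0000ffff0000": "https://example.org/a.png",
    "0123456789abcdef": "https://example.org/b.png",
}
print(nearest_symbol("ffff0000ffff0001", index))  # one bit away from the first entry
```

A linear scan like this is fine for an index of roughly 21,000 symbols; the threshold is the main knob, since a looser value trades missed matches for wrong ones.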

Compression techniques:
- JPEG: quality=85 with optimize flag (visually lossless)
- PNG: optimize flag preserving transparency
- SVG: removed metadata, comments, and collapsed whitespace
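
A rough sketch of the SVG step, assuming the blobs are SVG text and using only regular expressions. This illustrates the three operations listed above and is not the script used in the PR; a regex pass is not a full XML parser and could mishandle unusual files (for example, whitespace-sensitive `<text>` content).

```python
import re


def minify_svg(svg: str) -> str:
    """Strip comments and <metadata> blocks, then collapse whitespace."""
    svg = re.sub(r"<!--.*?-->", "", svg, flags=re.DOTALL)
    svg = re.sub(r"<metadata\b.*?</metadata>", "", svg, flags=re.DOTALL)
    svg = re.sub(r">\s+<", "><", svg)  # whitespace between tags
    svg = re.sub(r"\s+", " ", svg)     # remaining runs of whitespace
    return svg.strip()


sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <!-- exported by an editor -->
  <metadata>author info</metadata>
  <rect  width="10"
         height="10"/>
</svg>"""
print(minify_svg(sample))
```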

Files optimized:
- ABA programmes (689 KB saved)
- Vocabuléo-by-LAdapeila (689 KB saved)
- ARASAAC Global Grid Core Communicator (255 KB saved)
- Communication in hospital (224 KB saved)
- And 20 other communicator files

No visual quality loss. All blobs remain embedded in .grd.json files.
Reduced embedded blobs from 830 to 275 (66.9% reduction)

Migration strategy:
- Exact SHA256 matches with symbol libraries
- Perceptual hash matching (phash distance <= 20)
- Label-based matching (exact and partial keyword matching)
- Skipped large blobs (>500KB) likely to be custom images
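
A minimal sketch of the label-matching and size-filter steps, assuming each blob comes with a text label and that the symbol libraries are available as a keyword-to-URL mapping. The 500 KB cutoff is from the list above; the tokenisation rule, function names, and sample data are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

MAX_BLOB_BYTES = 500 * 1024  # blobs above this are treated as custom images


def match_label(label: str, size: int,
                symbols: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (method, url) for a label match, or None to keep the blob."""
    if size > MAX_BLOB_BYTES:
        return None  # likely custom artwork: leave it embedded
    key = label.strip().lower()
    if key in symbols:
        return ("exact", symbols[key])
    # Partial match: first word of the label that is itself a known keyword.
    for word in re.findall(r"\w+", key):
        if word in symbols:
            return ("partial", symbols[word])
    return None


# Hypothetical keyword index with placeholder URLs.
symbols = {
    "dog": "https://example.org/dog.png",
    "to eat": "https://example.org/eat.png",
}
print(match_label("Dog", 20_000, symbols))
print(match_label("big dog", 20_000, symbols))
print(match_label("dog", 900_000, symbols))
```

Partial matches are the risky ones, since a shared keyword does not guarantee the same picture, so recording the method alongside each match makes it easier to review those by hand.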

Results by matching method:
- Label exact matches: 60 blobs
- Label partial matches: 192 blobs
- Perceptual hash matches: 3 blobs
- Total migrated: 555 blobs

Files with most migrations:
- ABA programmes: 68 blobs migrated (95 -> 27)
- Vocabuléo-by-LAdapeila: 129 blobs migrated (156 -> 27)
- Global-Core Communicator variants: 16-20 blobs each

Remaining 275 blobs are:
- Brand/product images (Quick_Say20)
- Custom artwork not in symbol libraries
- Unique illustrations

This reduces repository size and improves maintainability by using
external symbol library URLs instead of embedded data.

@klues
Contributor

klues commented Oct 22, 2025

Awesome work!
As said, I won't merge this because it changes all default gridsets and I don't want to change gridsets from cooperation partners without confirmation from them that the changes are fine and don't mess up something.

However for these sets I think we could merge your improvements:

  • all from the openboardformat page: https://www.openboardformat.org/examples
    • including the vocal flair 84 - which you didn't include in the other PR (and maybe others from openboardformat which we do not have right now)
  • AsTeRICS Grid default: because I know it by heart and will be able to check quickly if everything is still working

Question:

  • Which format do the losslessly compressed images use? Still the original one, or some new/fancy format (which maybe isn't supported by all browsers)?

If you create a new PR changing only the sets I've mentioned above, please make the first PR run only npm run generate-beta, which creates just the metadata files for the beta environment at grid.asterics.eu/latest. There I can check the changes before merging those generated with npm run generate, which affects the files for the prod version.

@willwade
Contributor Author

yeah - look at the previous PR - that should be good for JUST the vocal flair stuff #9 - I've removed all the grids in ->this<- pr here..

re: Formats. No - we could have swapped to WebP but I didn't want to do that. Basically just recompressed with guetzli for png jpeg ..

so to confirm: Look at the previous PR first - check that ok - then if good we can review this together a bit more..

@klues
Contributor

klues commented Oct 23, 2025

I've already merged #9 - but it only contained the vocal flair 112 not the 84 anymore. Vocal flair 112 is already available in prod - with 3.5MB instead of 60MB - which is great!

Basically just recompressed with guetzli for png jpeg ..

ok, great!
