Conversation
- Replaced 444 image blobs with external ARASAAC/OpenSymbols URLs
- Implemented fuzzy perceptual hashing with Hamming distance (threshold=16)
- Dual-index matching: ARASAAC (13,623) + OpenSymbols (7,453) symbols
- 830 custom artwork images remain as blobs (not in symbol libraries)
- Updated all metadata files (live_metadata.json, live_predefined_*.json)
- Zero errors during migration
- 67 files modified across 39 communicator boards
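For reviewers who want to see the matching idea in isolation, here is a minimal sketch of a Hamming-distance lookup across two symbol indexes. It is not the script used in this PR: the function names, the dict-shaped indexes, and the symbol IDs are hypothetical, the hashes are assumed to be precomputed integers of equal bit width, and the threshold is assumed to be inclusive.

```python
from typing import Dict, Optional, Tuple

def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two equal-width hashes."""
    return bin(a ^ b).count("1")

def best_match(
    blob_hash: int,
    indexes: Dict[str, Dict[str, int]],
    threshold: int = 16,
) -> Optional[Tuple[str, str, int]]:
    """Return (library, symbol_id, distance) for the closest symbol
    within the threshold, or None if nothing is close enough.

    `indexes` maps a library name to {symbol_id: perceptual_hash};
    this layout is an assumption made for the sketch.
    """
    best = None
    for library, index in indexes.items():
        for symbol_id, symbol_hash in index.items():
            distance = hamming_distance(blob_hash, symbol_hash)
            # Assumption: a distance equal to the threshold still counts.
            if distance <= threshold and (best is None or distance < best[2]):
                best = (library, symbol_id, distance)
    return best

# Toy 8-bit hashes with hypothetical symbol IDs.
indexes = {
    "arasaac": {"a1": 0b11110000},
    "opensymbols": {"o1": 0b11111111},
}
match = best_match(0b11110001, indexes)
```

A linear scan like this is fine for roughly 21,000 symbols per blob; a larger library would probably want a BK-tree or similar structure instead.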
- Recompressed 830 image blobs across 24 communicator files
- JPEG optimization: 295 blobs, 1.47 MB saved (17.8%)
- PNG optimization: 331 blobs, 1.24 MB saved (12.1%)
- SVG optimization: 189 blobs, 12.2 KB saved (7.0%)
- Total savings: 2.7 MB (14.3% reduction)

Compression techniques:
- JPEG: quality=85 with optimize flag (visually lossless)
- PNG: optimize flag preserving transparency
- SVG: removed metadata, comments, and collapsed whitespace

Files optimized:
- ABA programmes (689 KB saved)
- Vocabuléo-by-LAdapeila (689 KB saved)
- ARASAAC Global Grid Core Communicator (255 KB saved)
- Communication in hospital (224 KB saved)
- And 20 other communicator files

No visual quality loss. All blobs remain embedded in .grd.json files.
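The SVG step (metadata, comments, whitespace) can be sketched with the standard library alone. This is an illustration under assumptions, not the code from this commit: `minify_svg` is a hypothetical name, and the regex approach assumes simple SVGs with no whitespace-sensitive `<text>` content and no self-closing `<metadata/>` element.

```python
import re

def minify_svg(svg: str) -> str:
    """Strip comments and <metadata> blocks, then collapse whitespace."""
    # Remove XML comments, including multi-line ones.
    svg = re.sub(r"<!--.*?-->", "", svg, flags=re.DOTALL)
    # Remove <metadata>...</metadata> blocks (editor and RDF info).
    svg = re.sub(
        r"<metadata\b.*?</metadata>", "", svg,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Drop whitespace between tags, then collapse remaining runs.
    svg = re.sub(r">\s+<", "><", svg)
    svg = re.sub(r"\s+", " ", svg)
    return svg.strip()

sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">\n'
    "  <!-- made with an editor -->\n"
    "  <metadata>\n    <rdf>x</rdf>\n  </metadata>\n"
    '  <rect   width="10"\n   height="10"/>\n'
    "</svg>"
)
minified = minify_svg(sample)
```

A real XML parser would be safer for SVGs that contain text or CDATA sections; the regex version is only meant to show what each of the three passes removes.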
Reduced embedded blobs from 830 to 275 (66.9% reduction)

Migration strategy:
- Exact SHA256 matches with symbol libraries
- Perceptual hash matching (phash distance <= 20)
- Label-based matching (exact and partial keyword matching)
- Skipped large blobs (>500KB) likely to be custom images

Results by matching method:
- Label exact matches: 60 blobs
- Label partial matches: 192 blobs
- Perceptual hash matches: 3 blobs
- Total migrated: 555 blobs

Files with most migrations:
- ABA programmes: 68 blobs migrated (95 -> 27)
- Vocabuléo-by-LAdapeila: 129 blobs migrated (156 -> 27)
- Global-Core Communicator variants: 16-20 blobs each

Remaining 275 blobs are:
- Brand/product images (Quick_Say20)
- Custom artwork not in symbol libraries
- Unique illustrations

This reduces repository size and improves maintainability by using external symbol library URLs instead of embedded data.
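The migration strategy reads as a cascade of progressively looser checks. Here is a rough sketch of that shape, assuming hypothetical names (`match_blob`, `sha_index`, `label_index`), placeholder example.org URLs, ">500KB" meaning 500 * 1024 bytes, and "partial" meaning that any single word of the label matches a library keyword. The perceptual-hash step is left out to keep the sketch standard-library only.

```python
import hashlib
from typing import Dict, Optional, Tuple

MAX_BLOB_BYTES = 500 * 1024  # assumption: how ">500KB" is measured

def match_blob(
    data: bytes,
    label: str,
    sha_index: Dict[str, str],
    label_index: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (method, url) for the first strategy that matches, else None."""
    # Large blobs are likely custom images, so skip them outright.
    if len(data) > MAX_BLOB_BYTES:
        return None
    # 1. Exact match on the SHA256 digest of the image bytes.
    digest = hashlib.sha256(data).hexdigest()
    if digest in sha_index:
        return ("sha256", sha_index[digest])
    # (A perceptual-hash comparison would sit here in the full cascade.)
    # 2. Exact label match, case-insensitive.
    key = label.strip().lower()
    if key in label_index:
        return ("label_exact", label_index[key])
    # 3. Partial label match: any single word of the label is a keyword.
    for word in key.split():
        if word in label_index:
            return ("label_partial", label_index[word])
    return None

# Hypothetical indexes with placeholder URLs.
sha_index = {hashlib.sha256(b"abc").hexdigest(): "https://example.org/a.png"}
label_index = {"dog": "https://example.org/dog.png"}
result = match_blob(b"zzz", "big dog", sha_index, label_index)
```

Partial label matches are the loosest step and account for the largest share of matches in the results above, so they are the ones most worth spot-checking by eye.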
Awesome work! However for these sets I think we could merge your improvements:
Question:
If you create a new PR changing only the sets I've mentioned above, please do the first PR with only running
Yeah, look at the previous PR first: that should be good for JUST the vocal flair stuff (#9). I've removed all the grids in ->this<- PR. Re: formats: no, we could have swapped to WebP but I didn't want to do that. Basically just recompressed the PNG and JPEG blobs with guetzli. So to confirm: look at the previous PR first, check that it's OK, then if it's good we can review this one together a bit more.
I've already merged #9 - but it only contained the vocal flair
ok, great!
Optimize Embedded Image Blobs with ARASAAC/OpenSymbols URLs
Summary
Reduces repository size by about 50% by replacing embedded image blobs with external ARASAAC/OpenSymbols URLs and applying visually lossless compression to the blobs that remain.
Problem
Solution
Implemented 4 things!
Results
Files Changed
Testing
I've tried to manually test all changes by loading the pages up, BUT I will be honest: I can't be 10000% sure we haven't messed something up. If we could A/B test this, or get someone to test it out, that would be fab.
Notes
I should do a blog post on this, as I learnt a ton about hashing. Ideally OpenSymbols and ARASAAC would expose image hashes through their APIs; that would be handy.