Add convenience wrapper for quick similarity computation#1
Add convenience wrapper for quick similarity computation#1miracvbasaran wants to merge 2 commits intovitali-fedulov:masterfrom
Conversation
|
Hello Miraç, I am very excited you have found the package useful! And I have looked at your pull request. But I am confused, because the addition is contradictory to the purpose of imagehash package: (1) The main purpose of imagehash package is to avoid each-to-each image comparison, which happens, when using images4.Similar or any (!) Similar function. Hashes in this package are not meant for each-to-each image comparison. func Similar implies each-to-each comparison. You propose a func Similar, which merges image4.Similar with imagehash similarity. But it is faster to use image4.Similar directly. That is imagehash is not needed at all for this scenario. And it should not impact precision of comparison. (2) You propose the Image struct, which is a very large object to keep it in memory for every image. The purpose of hashes and icons was to avoid keeping images in memory. Additional purpose of icon is generalization for comparison, because pixel-by-pixel comparison of full images does not generalize well. Actually hashes are an additional memory-saving step to avoid keeping icons in memory. RECOMMENDATION Try to use images4 directly, unless you work with billions images. If you do work with billions of images, then imagehash(es) are used to compare by those hashes only - not any Similar func. One huge hash table is necessary - not the one like in the example on the package README. It is probably the confusion of terms, because I use "hash" as precise term related to an address in hash table, unlike many people use it for, say, Hamming distance calculations. In imagehash, no distances are calculated between hashes. But for icons - yes, distances are calculated. Am I missing something? Let me know what you think. Thank you for the contribution! It helps clarify package descriptions. |
|
Hello Vitali, Thanks for getting back to me! IIUC, you have two concerns: 1. PerformancePerhaps my PR got a bit confusing with all the images and similarities: My PR doesn't actually run return foundSimilarImage && images4.Similar(img.Icon, img2.Icon)When foundSimilarImage is false,i.e. when imagehash comparison doesn't yield a match or in other words, when images are dissimilar, images4.Similar() is not called because the first part of the expression ( This PR indeed improves performance as opposed to using I ran some informal experiments: 2. MemoryWhen I wrote this snippet for my own project before creating this PR, I didn't have type Image struct {
Icon images4.IconT
CentralHash uint64
HashSet []uint64
}I added I didn't consider the memory impact. You are right, when working with billions of images that would be prohibitive. I removed it now. Just as an example, after this change, in my experiments an average I hope that makes sense! Cheers, |
|
Miraç, got it, thanks! At this point I am thinking:
Hopefully your benchmarks reflect purely the comparison step with all icons/hash structs ready, without including image reads time. I am going slowly on this to make sure we are getting an elegant solution. Best, |
Thanks for working on this, it works like a charm!
As I was using your libraries, I thought it was a low-hanging fruit to add a simple convenience wrapper around imagehash & images4, to simplify user experience. It seems to me like new
Image.Similar()is the only thing users need from the imagehash library, so we can hide away the complexity of centralHash and hashSet.Additionally, users could directly use the
imagehash.Imagestruct to create a hashtable for quick lookup, e.g.:LMK if you think this is a good idea and what tests you'd like me to add for them.