SimSearch is a deep learning project focused on self-supervised learning for image representation and similarity-based retrieval. The goal is to learn meaningful feature embeddings without explicit labels, enabling clustering and efficient search across visual data.
Traditional supervised learning relies heavily on labeled datasets. In contrast, SimSearch leverages self-supervised learning to extract patterns and structure directly from raw images.
The model learns to:
- Understand visual similarity
- Separate different object categories
- Form meaningful clusters in embedding space
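To make the training objective concrete, here is a minimal sketch of a SimCLR-style contrastive (NT-Xent) loss, one common way to learn visual similarity without labels. The function name, temperature value, and embedding sizes are illustrative assumptions, not necessarily what SimSearch uses.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss (illustrative sketch).

    z1, z2: (batch, dim) embeddings of two augmented views of the
    same batch of images; matching rows are the positive pairs.
    """
    batch = z1.size(0)
    # L2-normalize so the dot product is cosine similarity
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2B, d)
    sim = z @ z.t() / temperature                           # (2B, 2B)
    # exclude self-similarity from the softmax denominator
    sim.fill_diagonal_(float("-inf"))
    # the positive for row i is row i+B (and i-B for the second half)
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# toy usage: two random "views" of a batch of 8 images
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```

Pulling matching views together and pushing all other images apart is what gradually shapes the embedding space described above.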
The dataset consists of five subcategories:
- 👜 Bags
- 🚗 Cars
- 🐶 Dogs
- 📱 Phones
- 👟 Shoes
Even without labels during training, the model gradually learns to distinguish between these categories.
Key features:
- Self-supervised learning approach (contrastive / representation learning)
- Feature embedding generation
- Dimensionality reduction for visualization
- Clustering in latent space
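The last two steps, dimensionality reduction and latent-space clustering, can be sketched with scikit-learn. The synthetic embeddings, PCA choice, and k=5 below are illustrative assumptions (t-SNE or UMAP are common alternatives for visualization; k=5 simply mirrors the five subcategories).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# stand-in for learned embeddings: 5 synthetic groups in 64-D space
embeddings = np.concatenate(
    [rng.normal(loc=c * 10.0, scale=1.0, size=(40, 64)) for c in range(5)]
)

# reduce to 2-D for visualization
coords_2d = PCA(n_components=2).fit_transform(embeddings)

# cluster in the latent space; k=5 matches the five subcategories
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)
```

The 2-D coordinates can then be scatter-plotted (e.g. with Matplotlib) to inspect whether the categories separate.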
After training, the model separates data points into clusters based on semantic similarity, and the resulting embeddings can be used for similarity-based retrieval.
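Once embeddings are computed, similarity-based retrieval reduces to a nearest-neighbor search over them. A minimal cosine-similarity sketch, with a hypothetical `cosine_search` helper and random stand-in embeddings:

```python
import numpy as np

def cosine_search(query, index, k=5):
    """Return indices of the k index embeddings most similar to `query`
    (hypothetical helper; cosine similarity via normalized dot products)."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sims = index_n @ query_n            # one similarity per indexed item
    return np.argsort(-sims)[:k]        # highest similarity first

rng = np.random.default_rng(1)
index = rng.normal(size=(100, 32))              # stand-in embedding index
query = index[7] + 0.01 * rng.normal(size=32)   # near-duplicate of item 7
top = cosine_search(query, index, k=5)
```

For large collections, a brute-force scan like this would typically be replaced by an approximate nearest-neighbor index.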
Tech stack:
- PyTorch
- NumPy, Pandas, Matplotlib
- Scikit-learn
Author: Abhinandan