A sophisticated movie recommendation engine powered by collaborative filtering algorithms, featuring an intuitive Streamlit web interface with real-time movie poster integration.
- User-Based Collaborative Filtering: Get personalized movie recommendations based on users with similar taste preferences
- Item-Based Collaborative Filtering: Discover movies similar to ones you already love
- Matrix Factorization (SVD & NMF): Advanced dimensionality reduction techniques for improved recommendation accuracy
- Hybrid Recommendation System: Combines multiple algorithms with customizable weights for superior performance
- Interactive Web Interface: Clean, responsive Streamlit UI with tabbed navigation and real-time interactions
- Database Integration: Persistent storage with SQLite for user data, ratings, and recommendation caching
- A/B Testing Framework: Statistical testing and comparison of different recommendation algorithms
- Real-Time Movie Posters: Automatic poster fetching via OMDB API integration
- User Analytics: Comprehensive user profiling and behavior tracking
- Performance Metrics: Advanced analytics including diversity scores, RMSE, and engagement rates
- Scalable Architecture: Modular codebase with separated concerns for easy maintenance and extension
- Caching System: Smart recommendation caching for improved performance
The application provides two main recommendation approaches:
- User-Based Recommendations:
  - Select a user ID from the dropdown
  - Adjust the number of recommendations (1-20)
  - Get personalized movie suggestions based on similar users' preferences
- Item-Based Recommendations:
  - Choose a movie you enjoyed
  - Set the number of similar movies to discover
  - Find movies with similar characteristics and rating patterns
- Backend: Python 3.8+
- Web Framework: Streamlit
- Data Processing: Pandas, NumPy
- Machine Learning: scikit-learn (Cosine Similarity)
- API Integration: OMDB API for movie posters
- Data: MovieLens dataset (ratings.csv, movies.csv)
movie_recommender/
├── app.py                       # Original Streamlit application
├── enhanced_app.py              # Enhanced app with all new features
├── demo_enhanced_features.py    # Demo script for testing features
├── requirements.txt             # Python dependencies
├── README.md                    # Project documentation
├── notebook.ipynb               # Jupyter notebook for analysis
├── data/                        # Dataset directory
│   ├── movies.csv               # Movie metadata
│   ├── ratings.csv              # User ratings data
│   └── movie_recommender.db     # SQLite database (auto-generated)
└── src/                         # Source code modules
    ├── __init__.py              # Package initialization
    ├── preprocess.py            # Data loading and preprocessing
    ├── similarity.py            # Similarity computation algorithms
    ├── recommend.py             # Basic recommendation logic
    ├── posters.py               # Movie poster fetching utilities
    ├── matrix_factorization.py  # SVD & NMF implementations
    ├── database.py              # Database management system
    ├── hybrid_recommender.py    # Hybrid recommendation engine
    └── ab_testing.py            # A/B testing framework
- Python 3.8 or higher
- pip package manager
- Clone the repository

      git clone <repository-url>
      cd movie_recommender

- Create a virtual environment (recommended)

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies

      pip install -r requirements.txt

- Prepare the data
  - Ensure movies.csv and ratings.csv are in the data/ directory
  - The project expects MovieLens dataset format
Basic Application:

    streamlit run app.py

Enhanced Application (with all new features):

    streamlit run enhanced_app.py

Demo Script (test all features):

    python demo_enhanced_features.py

The enhanced application includes:
- Matrix Factorization (SVD & NMF)
- Hybrid Recommendation System
- Database Integration
- A/B Testing Framework
- Advanced Analytics & Metrics
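As a rough illustration of how a hybrid system can combine several algorithms with customizable weights, the sketch below blends per-algorithm scores into one ranking. It is not taken from src/hybrid_recommender.py: the function name `blend_scores`, the algorithm labels, and the scores are all hypothetical.

```python
from typing import Dict


def blend_scores(
    score_sets: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted average of per-algorithm scores (hypothetical helper).

    score_sets maps an algorithm name to {movie: score}.
    weights maps an algorithm name to a non-negative weight.
    A movie missing from an algorithm's output contributes 0 for that algorithm.
    """
    total = sum(weights.get(name, 0.0) for name in score_sets)
    if total <= 0:
        raise ValueError("At least one algorithm needs a positive weight")

    blended: Dict[str, float] = {}
    for name, scores in score_sets.items():
        share = weights.get(name, 0.0) / total  # normalize so shares sum to 1
        for movie, score in scores.items():
            blended[movie] = blended.get(movie, 0.0) + share * score
    return blended


# Made-up scores from two algorithms, weighted 3:1.
scores = {
    "user_cf": {"A": 4.0, "B": 2.0},
    "svd": {"A": 2.0, "C": 5.0},
}
blended = blend_scores(scores, {"user_cf": 3.0, "svd": 1.0})
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Normalizing the weights keeps the blended score on the same scale as the inputs, so changing one weight only shifts the relative influence of each algorithm.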
- User-Item Matrix Creation: Builds a sparse matrix with users as rows and movies as columns
- Similarity Computation: Calculates cosine similarity between user rating vectors
- Recommendation Generation:
  - Finds users most similar to the target user
  - Weights their ratings by similarity scores
  - Recommends unrated movies with highest weighted scores
- Item Similarity Matrix: Computes cosine similarity between movie rating patterns
- Similar Item Discovery: Identifies movies with similar user rating distributions
- Recommendation Ranking: Returns top-N most similar movies to the selected title
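The user-based and item-based steps above can be sketched in a few lines of pandas and scikit-learn. This is a minimal illustration on a made-up 3x4 rating matrix, not the code in src/recommend.py; the function names `recommend_for_user` and `similar_movies` are hypothetical, and treating 0 as "not rated" is an assumption.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy user-item matrix: rows are users, columns are movies, 0 means "not rated".
ratings = pd.DataFrame(
    [[5.0, 4.0, 0.0, 1.0],
     [4.0, 5.0, 3.0, 0.0],
     [1.0, 0.0, 5.0, 4.0]],
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)


def recommend_for_user(matrix: pd.DataFrame, user_id, top_n: int = 5) -> pd.Series:
    """User-based CF: score unrated movies by similarity-weighted ratings."""
    sim = pd.DataFrame(
        cosine_similarity(matrix), index=matrix.index, columns=matrix.index
    )
    weights = sim.loc[user_id].drop(user_id)   # similarity to every other user
    total = weights.sum()
    if total == 0:
        return pd.Series(dtype=float)          # no similar users to learn from
    others = matrix.drop(index=user_id)
    scores = others.mul(weights, axis=0).sum(axis=0) / total
    unrated = matrix.loc[user_id] == 0         # only suggest unseen movies
    return scores[unrated].sort_values(ascending=False).head(top_n)


def similar_movies(matrix: pd.DataFrame, title, top_n: int = 5) -> pd.Series:
    """Item-based CF: rank movies by cosine similarity of their rating columns."""
    sim = pd.DataFrame(
        cosine_similarity(matrix.T), index=matrix.columns, columns=matrix.columns
    )
    return sim[title].drop(title).sort_values(ascending=False).head(top_n)


print(recommend_for_user(ratings, 1))
print(similar_movies(ratings, "A", top_n=2))
```

Both functions recompute the similarity matrix on each call to keep the sketch short; an application would more likely compute it once and reuse it.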
similarity(A, B) = cos(θ) = (A · B) / (||A|| × ||B||)
Where A and B are rating vectors for users or items.
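To make the formula concrete, here is a small NumPy sketch that evaluates it for two made-up rating vectors. The helper name `cosine_sim` is hypothetical, and returning 0.0 for an all-zero vector is a choice made for this example rather than something the project specifies.

```python
import numpy as np


def cosine_sim(a, b) -> float:
    """Cosine similarity (A . B) / (||A|| * ||B||); 0.0 if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(a @ b / denom)


# A . B = 4*5 + 0*0 + 5*4 = 40, and ||A|| = ||B|| = sqrt(41),
# so the similarity is 40 / 41, roughly 0.976.
print(cosine_sim([4, 0, 5], [5, 0, 4]))

# Vectors with no overlapping ratings are orthogonal: similarity 0.
print(cosine_sim([1, 0], [0, 1]))
```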
- Update OMDB_API_KEY in src/posters.py with your own OMDB API key
- Get a free key at: http://www.omdbapi.com/apikey.aspx
The system expects CSV files with the following structure:
movies.csv:
movieId,title,genres
1,"Toy Story (1995)",Adventure|Animation|Children|Comedy|Fantasy
ratings.csv:
userId,movieId,rating,timestamp
1,1,4.0,964982703
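Given files in this format, a user-item matrix can be built with a pandas pivot. The sketch below reads inline strings instead of data/movies.csv and data/ratings.csv so it runs on its own; apart from the two rows shown above, the sample rows are illustrative rather than actual dataset values, and `build_user_item_matrix` is a hypothetical name, not necessarily what src/preprocess.py uses.

```python
import io

import pandas as pd

# Inline stand-ins for the two CSV files (illustrative rows only).
MOVIES_CSV = """movieId,title,genres
1,"Toy Story (1995)",Adventure|Animation|Children|Comedy|Fantasy
2,"Jumanji (1995)",Adventure|Children|Fantasy
"""

RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,2,3.0,964982931
2,1,5.0,964982224
"""


def build_user_item_matrix(ratings: pd.DataFrame) -> pd.DataFrame:
    """Rows are users, columns are movie IDs, unrated cells are filled with 0."""
    return ratings.pivot_table(
        index="userId", columns="movieId", values="rating", fill_value=0.0
    )


movies = pd.read_csv(io.StringIO(MOVIES_CSV))
ratings = pd.read_csv(io.StringIO(RATINGS_CSV))
user_item = build_user_item_matrix(ratings)
print(user_item)
```

To use the real files, replace the `io.StringIO(...)` arguments with the paths to the CSVs in the data/ directory.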
    python -m pytest tests/

- preprocess.py: Data loading and user-item matrix creation
- similarity.py: Cosine similarity computation for users and items
- recommend.py: Core recommendation algorithms
- posters.py: OMDB API integration for movie poster retrieval
- app.py: Streamlit web application interface
- Memory Usage: Large datasets may require sparse matrix implementations
- Computation Time: Similarity matrices are computed once at startup
- API Rate Limits: OMDB API has usage limits; consider caching poster URLs
- Scalability: For production use, consider implementing incremental similarity updates
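One way to act on the poster caching suggestion is a small in-memory cache in front of the OMDB lookup. This is a sketch under assumptions, not the code in src/posters.py: `fetch_poster_url` and `PosterCache` are hypothetical names, and the lookup relies on OMDB's `t` (title) and `apikey` query parameters and on the `Poster` field of its JSON response.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

OMDB_URL = "http://www.omdbapi.com/"


def fetch_poster_url(title: str, api_key: str) -> Optional[str]:
    """Look up a poster URL by title; returns None when OMDB has no poster."""
    query = urllib.parse.urlencode({"t": title, "apikey": api_key})
    with urllib.request.urlopen(f"{OMDB_URL}?{query}", timeout=10) as resp:
        data = json.load(resp)
    poster = data.get("Poster")
    return poster if poster and poster != "N/A" else None


class PosterCache:
    """Remembers poster lookups so each title hits the API at most once."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}
        self.misses = 0  # number of lookups that went to the underlying fetch

    def get(self, title: str) -> Optional[str]:
        if title not in self._cache:
            self.misses += 1
            self._cache[title] = self._fetch(title)
        return self._cache[title]


# Usage with a real key (not executed here):
# cache = PosterCache(lambda title: fetch_poster_url(title, "your-api-key"))
# url = cache.get("Toy Story")
```

The cache takes the fetch function as a parameter so it can be exercised without network access. Persisting the dictionary, for example in the SQLite database, would let the cache survive restarts.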
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Matrix Factorization techniques (SVD, NMF) - Advanced dimensionality reduction for improved recommendations
- Database integration for persistent storage - SQLite database with comprehensive data management
- Hybrid recommendation systems - Combines multiple algorithms for superior accuracy
- A/B testing framework for recommendation quality - Statistical testing and performance comparison
- Real-time user feedback integration - Interactive rating and feedback collection
- Performance metrics and analytics - Comprehensive system monitoring and reporting
- Deep Learning approaches (Neural Collaborative Filtering)
- User authentication and personalized profiles
- Advanced filtering options (genre, year, rating thresholds)
- Real-time recommendation updates
- Social features and collaborative playlists
- Mobile app development
- Cloud deployment and scaling
- Machine Learning pipeline automation
This project is licensed under the MIT License - see the LICENSE file for details.
- MovieLens Dataset by GroupLens Research
- OMDB API for movie poster data
- Streamlit for the amazing web framework
- scikit-learn for machine learning utilities
For questions, suggestions, or collaboration opportunities, please reach out:
- Project Maintainer: [Rutik Tetare]
- Email: [[email protected]]
- LinkedIn: [https://www.linkedin.com/in/rutik-tetare-3154b3281/]
If you found this project helpful, please consider giving it a star!