This microservice is responsible for all machine learning capabilities in the platform, including resume analysis, job recommendations, and real-time message moderation.
It is built using FastAPI and deployed as an independent service to ensure scalability and modularity.
The ML service processes user inputs such as resumes and chat messages, applies trained machine learning models, and returns structured insights used across the application.
Key features:
- Resume parsing and analysis
- Job recommendation based on dataset similarity
- Skill extraction and gap detection
- Toxic message detection
- Spam message classification
Resume analysis pipeline:
- TF-IDF-based vectorization of the resume and job description
- Cosine similarity scoring for compatibility
- Extraction of relevant skills from the resume text
- Identification of skills missing relative to the job description
- Generation of structured analysis output
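The vectorization, scoring, and gap-detection steps above can be sketched with the standard library alone (the service presumably uses scikit-learn's `TfidfVectorizer` in practice; every name below is illustrative):

```python
# Dependency-free sketch of TF-IDF + cosine similarity + skill-gap detection.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document (smoothed IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    return [
        {t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
         for t, c in Counter(doc).items()}
        for doc in tokenized
    ]

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

resume = "python developer with fastapi and machine learning experience"
job = "seeking python engineer with fastapi and machine learning skills"
v_resume, v_job = tfidf_vectors([resume, job])
score = cosine(v_resume, v_job)  # compatibility score in [0, 1]

# Skill-gap detection as a set difference against the job's required skills.
required = {"python", "fastapi", "docker"}
missing = required - set(resume.split())
```

The real pipeline would tokenize and clean text more carefully, but the scoring idea is exactly this: higher cosine similarity means a closer resume/job match, and the set difference yields the skills to flag as gaps.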
Job recommendation:
- Uses a preprocessed job dataset
- Matches resumes with similar job roles
- Provides relevant job suggestions
- Integrates external job APIs (Adzuna) for live listings
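For the live-listings integration, a query URL for Adzuna's search API might be built like this. The endpoint path follows Adzuna's documented pattern (country code, page number), but the exact parameters shown are illustrative; credentials come from the environment variables described later:

```python
# Hypothetical helper for building an Adzuna job-search request URL.
import os
from urllib.parse import urlencode

def adzuna_search_url(query: str, country: str = "gb", page: int = 1) -> str:
    """Build a search URL; app_id/app_key are read from the environment."""
    params = {
        "app_id": os.environ.get("ADZUNA_APP_ID", ""),
        "app_key": os.environ.get("ADZUNA_APP_KEY", ""),
        "what": query,
        "results_per_page": 10,
    }
    return f"https://api.adzuna.com/v1/api/jobs/{country}/search/{page}?{urlencode(params)}"

url = adzuna_search_url("python developer")
```

The service would then fetch this URL (e.g. with `requests`) and merge the returned listings with the dataset-based suggestions.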
Spam detection:
- Model: Naive Bayes
- Vectorization: TF-IDF
- Purpose: Detect unwanted or promotional messages
Toxicity detection:
- Model: Logistic Regression
- Purpose: Identify harmful, abusive, or inappropriate content
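A minimal training sketch for both classifiers, matching the model choices above (TF-IDF features with Multinomial Naive Bayes for spam, Logistic Regression for toxicity). The tiny inline datasets are purely illustrative, not the service's training data:

```python
# Illustrative training of the two moderation models with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Spam model: TF-IDF + Multinomial Naive Bayes.
spam_texts = [
    "win a free prize now, click here",
    "limited offer, buy cheap followers",
    "are we still meeting tomorrow?",
    "thanks for sending the report",
]
spam_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam
spam_vec = TfidfVectorizer()
spam_model = MultinomialNB().fit(spam_vec.fit_transform(spam_texts), spam_labels)
spam_pred = spam_model.predict(spam_vec.transform(["free prize, click here now"]))[0]

# Toxicity model: TF-IDF + Logistic Regression, trained the same way.
toxic_texts = [
    "you are an idiot",
    "what a stupid take",
    "great work today",
    "see you at standup",
]
toxic_labels = [1, 1, 0, 0]  # 1 = toxic, 0 = clean
toxic_vec = TfidfVectorizer()
toxic_model = LogisticRegression().fit(toxic_vec.fit_transform(toxic_texts), toxic_labels)
```

After training, each fitted model and its vectorizer are pickled into the `.pkl` files listed below.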
Model files:
- job_dataset.pkl → Preprocessed dataset for job matching
- toxic_model.pkl → Toxicity classification model
- spam_model.pkl → Spam classification model
- tfidf_vectorizer copy.pkl → Vectorizer for resume analysis
- spam_vectorizer.pkl → Vectorizer for spam detection
Due to GitHub file size limitations, trained models are not stored in the repository.
Instead, they are hosted on the Hugging Face Hub and downloaded at runtime.
This approach:
- Keeps the repository lightweight
- Avoids large file issues (>100MB)
- Enables easy model updates without redeploying backend
- Supports scalable deployment
Models are fetched at runtime using a helper function:
```
download_file(HF_URL, "filename.pkl")
```

Ensure the correct Hugging Face URL format:
- Use `/resolve/main/` in the download URL; do NOT use `/blob/`
- Encode spaces in filenames, e.g. `tfidf_vectorizer%20copy.pkl`
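A standard-library sketch of such a download helper is below. The repository path is a placeholder; the points being illustrated are the `/resolve/main/` URL pattern and the `%20` space encoding:

```python
# Hypothetical runtime model-download helper (stdlib only).
import os
import urllib.request
from urllib.parse import quote

HF_REPO = "https://huggingface.co/<user>/<repo>/resolve/main/"  # /resolve/, not /blob/

def hf_url(filename: str) -> str:
    """Percent-encode the filename (spaces become %20) and build the URL."""
    return HF_REPO + quote(filename)

def download_file(url: str, dest: str) -> None:
    """Fetch a model file once; skip the download if it already exists on disk."""
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)

url = hf_url("tfidf_vectorizer copy.pkl")
```

Checking for an existing file before downloading keeps restarts fast once the models are cached on disk.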
Example request:

```
POST /moderate
{
  "text": "User message"
}
```

The endpoint runs both checks on the message:
- Spam classification
- Toxicity classification
Render (Cloud Deployment)
- Root Directory: `ml_services`
- Build command: `pip install -r requirements.txt`
- Start command: `uvicorn app:app --host 0.0.0.0 --port 10000`

Run the service locally:

```
uvicorn app:app --reload --port 8000
```

Sensitive keys should not be hardcoded.
Use environment variables:
ADZUNA_APP_ID
ADZUNA_APP_KEY
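Reading those credentials could look like the sketch below; the `require_env` helper is illustrative, not part of the service:

```python
# Read credentials from the environment instead of hardcoding them.
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a required credential is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

os.environ.setdefault("ADZUNA_APP_ID", "demo-id")  # demo value for this sketch only
app_id = require_env("ADZUNA_APP_ID")
```

Failing fast at startup surfaces a missing key immediately, rather than as a confusing error on the first Adzuna request.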
ML workflow:
- Data collection and preprocessing
- Text cleaning and normalization
- Feature extraction using TF-IDF
- Model training and evaluation
- Model serialization using pickle (.pkl)
- Deployment via FastAPI
- Runtime inference through API endpoints
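The serialization step at the end of that workflow is a plain pickle round trip: trained artifacts are pickled to `.pkl` files, then unpickled by the API at startup. A dict stands in for a fitted estimator in this sketch:

```python
# Pickle round trip: what model training writes and what the API loads.
import pickle

model = {"kind": "spam_model", "classes": [0, 1]}  # stand-in for a fitted estimator

blob = pickle.dumps(model)     # bytes written to e.g. spam_model.pkl
restored = pickle.loads(blob)  # what the FastAPI app loads at startup
```

One caveat worth noting: unpickling executes arbitrary code, so the service should only load `.pkl` files from its own trusted Hugging Face repository.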
The ML logic is separated from the main backend to:
- Improve scalability
- Allow independent deployment
- Reduce backend complexity
External model hosting additionally provides:
- Faster deployments
- Centralized model management
- Easy version control for models
Current limitations:
- Initial request may be slower due to model download
- No caching implemented yet
- Basic models (can be improved with deep learning)
Planned improvements:
- Lazy loading for faster startup
- Redis caching for predictions
- Docker containerization
- Model versioning system
- Rate limiting and authentication
- Advanced NLP models (transformers)
Current status:
- Production-ready
- Fully deployed
- Integrated with frontend and backend
- Actively maintained
