This project showcases content-based (item-item) movie recommendation. First, the IMDb Top 250 movies are crawled with the BeautifulSoup package and stored as JSON. Then a cosine-similarity-based recommendation system is built with pandas, NumPy, and scikit-learn.
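As a rough illustration of the item-item idea, the sketch below scores every movie by its summed cosine similarity to the movies a user supplied, then returns the top k that were not in the input. It assumes each movie is already encoded as a numeric feature vector. The toy titles and genre-style features are made up, and summing similarities is one plausible aggregation, not necessarily the one this repository uses. It uses NumPy only; scikit-learn's sklearn.metrics.pairwise.cosine_similarity computes the same pairwise similarities as cosine_similarity_matrix here.

```python
import numpy as np


def cosine_similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 instead of NaN
    unit = features / norms
    return unit @ unit.T


def recommend(titles, features, liked, k=3):
    """Rank movies by summed similarity to the liked movies (assumed aggregation)."""
    sim = cosine_similarity_matrix(np.asarray(features, dtype=float))
    index = {title: i for i, title in enumerate(titles)}
    liked_idx = [index[t] for t in liked if t in index]
    scores = sim[liked_idx].sum(axis=0)
    scores[liked_idx] = -np.inf  # never recommend the input movies themselves
    order = np.argsort(-scores, kind="stable")
    return [titles[i] for i in order[:k] if np.isfinite(scores[i])]


# Hypothetical toy data; columns stand for crime, drama, thriller, fantasy.
TITLES = ["Film A", "Film B", "Film C", "Film D", "Film E"]
FEATURES = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]

print(recommend(TITLES, FEATURES, ["Film A", "Film E"], k=2))  # ['Film C', 'Film D']
```

Passing several liked titles mirrors the movie_list request body used by the score endpoint later in this README.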
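COMMON_STEP:
- Create an AWS access key and secret key.
- Create an S3 bucket.
- In docker-compose-recsys.yml and docker-compose-aws.yml, change the environment variables (ACCESS_KEY, SECRET_ACCESS_KEY, and AWS_BUCKET_NAME) from their None placeholder values to the appropriate values.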
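Because the docker-compose files ship with None placeholders, a quick check before starting the service can catch a forgotten value. This is a hypothetical helper, not part of the repository; it assumes the service reads ACCESS_KEY, SECRET_ACCESS_KEY, and AWS_BUCKET_NAME (the names used in the EC2 and ECS steps) from its environment, where an unreplaced placeholder shows up as the string "None".

```python
import os
from typing import List, Mapping

# Variable names taken from the deployment steps in this README.
REQUIRED_VARS = ("ACCESS_KEY", "SECRET_ACCESS_KEY", "AWS_BUCKET_NAME")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset, empty, or still "None"."""
    return [
        name
        for name in REQUIRED_VARS
        if env.get(name, "").strip() in ("", "None")
    ]
```

For example, a startup script could call missing_settings() and exit with an error listing the returned names whenever the list is non-empty.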
- To run locally, clone the repository:
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Follow COMMON_STEP.
- Use docker-compose to build and start the service:
docker-compose -f docker-compose-recsys.yml build
docker-compose -f docker-compose-recsys.yml up
- For crawling:
curl -H 'Content-Type: application/json' -X POST -d '{}' http://localhost:6006/api/v1/schemas/crawl_imdb
- For training:
curl -H 'Content-Type: application/json' -X POST -d '{}' http://localhost:6006/api/v1/schemas/train
- For recommendation:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://localhost:6006/api/v1/schemas/score
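The crawl_imdb step fetches the IMDb Top 250 and stores the movies as JSON. The repository does this with BeautifulSoup; this sketch swaps in the standard library's html.parser so that it runs without extra packages. It parses an inline sample, and the markup it expects (each title as a link inside a td element with class titleColumn) is an assumption, not a description of IMDb's current page.

```python
import json
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect link text inside <td class="titleColumn"> cells (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "titleColumn" in classes:
            self.in_title_cell = True
        elif tag == "a" and self.in_title_cell:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "td":
            self.in_title_cell = False

    def handle_data(self, data):
        if self.in_link and data.strip():
            self.titles.append(data.strip())


# Hypothetical sample page; real pages would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">1. <a href="/title/x1/">The Green Mile</a></td></tr>
  <tr><td class="titleColumn">2. <a href="/title/x2/">Witness for the Prosecution</a></td></tr>
</table>
"""


def parse_titles(html: str) -> list:
    parser = TitleParser()
    parser.feed(html)
    return parser.titles


movies = [{"title": title} for title in parse_titles(SAMPLE_HTML)]
print(json.dumps(movies, indent=2))  # json.dump(movies, f) would persist it
```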
- To deploy on EC2 with nginx and gunicorn, choose an EC2 instance running Ubuntu 16.04 LTS - Xenial (HVM).
- Configure the security group: SSH with a custom source, HTTP from anywhere.
- Launch the instance using the key pair tuhin-aws.
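- SSH into the EC2 machine (replace the host name here and in later commands with your own instance's public DNS name) and install pip and nginx:
ssh -i "tuhin-aws.pem" ubuntu@ec2-54-234-224-219.compute-1.amazonaws.com
sudo apt update --fix-missing
sudo apt install -y python3-pip
sudo apt install -y nginx
- Open the nginx.conf file with sudo vi /etc/nginx/nginx.conf and change the user directive to:
user ubuntu;
Then add the following line in the http block (the long EC2 host name needs a larger bucket size):
server_names_hash_bucket_size 128;
- Open the virtual.conf file with sudo vi /etc/nginx/conf.d/virtual.conf and add: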
server {
    listen 80;
    server_name ec2-54-234-224-219.compute-1.amazonaws.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
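nginx then forwards HTTP requests on port 80 to the gunicorn server on 127.0.0.1:8000.
- Start nginx and clone the repository:
sudo systemctl start nginx
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Follow COMMON_STEP.
- Install the requirements, copy the app, restart nginx, and start gunicorn in the background:
sudo pip3 install -r requirements.txt
cp ./rec_platform/deployment/app.py ./app.py
sudo systemctl restart nginx
gunicorn --pythonpath / -b localhost:8000 -k gevent -t 900 app:app -w 5 &
Here -k gevent selects gevent workers, -t 900 sets a 900-second worker timeout, and -w 5 starts five workers.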
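gunicorn serves the WSGI callable named app from app.py (app:app), which here is a copy of rec_platform/deployment/app.py. That file is not shown, so the following is only a hypothetical, standard-library sketch of the score endpoint's contract as used by the curl examples: a POST with {"movie_list": [...]} that returns {"movies": [...]}. recommend_for is a placeholder for the trained model, and call_locally is a hypothetical helper for exercising app() without a server.

```python
import io
import json


def recommend_for(movie_list):
    """Hypothetical placeholder: plug the trained recommender in here."""
    return []


def app(environ, start_response):
    """Minimal WSGI callable that gunicorn could serve as app:app (a sketch)."""
    if (environ.get("PATH_INFO") != "/api/v1/schemas/score"
            or environ.get("REQUEST_METHOD") != "POST"):
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length) if length else b"{}"
    try:
        movie_list = json.loads(body).get("movie_list", [])
    except (ValueError, AttributeError):  # bad JSON or a non-object body
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [b'{"error": "invalid JSON"}']
    result = json.dumps({"movies": recommend_for(movie_list)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(result)))])
    return [result]


def call_locally(payload):
    """Invoke app() in-process with a JSON payload; returns (status, parsed body)."""
    body = json.dumps(payload).encode("utf-8")
    environ = {
        "PATH_INFO": "/api/v1/schemas/score",
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))
```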
- On your local system:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should look like this:
{
  "movies": [
    "L.A. Confidential",
    "Salinui chueok",
    "Les diaboliques",
    "12 Angry Men",
    "Double Indemnity",
    "Chinatown",
    "On the Waterfront",
    "A Wednesday",
    "Se7en",
    "The Usual Suspects"
  ]
}
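The curl request above can also be scripted, for example as a post-deployment smoke test. This is a minimal standard-library sketch; build_score_request and fetch_recommendations are hypothetical helper names, and the host is the one used in this README, so substitute your own.

```python
import json
import urllib.request

# Host taken from this README; replace it with your instance's public DNS name.
BASE_URL = "http://ec2-54-234-224-219.compute-1.amazonaws.com"


def build_score_request(base_url, movie_list):
    """Build the same POST request as the curl command above."""
    data = json.dumps({"movie_list": movie_list}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/schemas/score",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_recommendations(base_url, movie_list, timeout=30):
    """Send the request and return the "movies" list from the JSON response."""
    request = build_score_request(base_url, movie_list)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["movies"]


# Requires the service to be running:
# print(fetch_recommendations(BASE_URL, ["The Green Mile", "Witness for the Prosecution"]))
```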
- On the remote system, stop the services if they are no longer needed:
pkill gunicorn
sudo systemctl stop nginx
- To deploy on EC2 with Docker, choose an EC2 instance running Ubuntu 16.04 LTS - Xenial (HVM).
- Configure the security group: SSH with a custom source, HTTP from anywhere.
- Launch the instance using the key pair tuhin-aws.
- SSH into the EC2 machine, install Docker and docker-compose, and clone the repository:
ssh -i "tuhin-aws.pem" ubuntu@ec2-54-234-224-219.compute-1.amazonaws.com
sudo apt update --fix-missing
sudo apt install -y docker.io
sudo apt install -y docker-compose
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
- Update docker-compose-recsys.yml with suitable values for ACCESS_KEY, SECRET_ACCESS_KEY, and AWS_BUCKET_NAME. The port mapping should be "80:6006", so that port 80 on the instance reaches the service listening on port 6006 in the container.
- Build and start the service:
sudo docker-compose -f docker-compose-recsys.yml build
sudo docker-compose -f docker-compose-recsys.yml up
- On your local system:
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should look like this:
{
  "movies": [
    "L.A. Confidential",
    "Salinui chueok",
    "Les diaboliques",
    "12 Angry Men",
    "Double Indemnity",
    "Chinatown",
    "On the Waterfront",
    "A Wednesday",
    "Se7en",
    "The Usual Suspects"
  ]
}
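- To deploy on Amazon ECS, configure the AWS CLI (aws configure) with your ACCESS_KEY and SECRET_ACCESS_KEY, clone the repository, and push the Docker image to Amazon ECR (replace 184213940252 with your own AWS account ID):
git clone https://github.com/tuhinsharma/imdb-rec-sys.git
cd imdb-rec-sys
aws ecr create-repository --repository-name recsys-ubuntu
$(aws ecr get-login --no-include-email --region us-east-1)
docker build -t recsys-ubuntu -f Dockerfile.ubuntu .
docker tag recsys-ubuntu:latest 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu:latest
docker push 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu:latest
aws ecr get-login is available only in AWS CLI v1. With AWS CLI v2, log in with this command instead:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 184213940252.dkr.ecr.us-east-1.amazonaws.com
- Update docker-compose-aws.yml with suitable values for ACCESS_KEY, SECRET_ACCESS_KEY, and AWS_BUCKET_NAME. The port mapping should be "80:6006", and image should be 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu.
- Configure ecs-cli, bring up a one-instance cluster, and start the service:
ecs-cli configure --region us-east-1 --cluster fastfilmz-analytics-cluster
ecs-cli up --keypair tuhin-aws --capability-iam --size 1 --instance-type t2.micro --force --cluster fastfilmz-analytics-cluster --region us-east-1
ecs-cli compose --project-name imdb-recsys --file docker-compose-aws.yml up
- If the ECS agent is outdated, update it: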
aws ecs update-container-agent --cluster fastfilmz-analytics-cluster --container-instance bc7e2a68-1be6-48d2-85a6-7f08232f298b
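The image name used in the docker tag and docker push commands and in docker-compose-aws.yml follows the ECR registry pattern <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>. The hypothetical helper below only composes that string; when no tag is given, as in the image line for docker-compose-aws.yml, Docker falls back to latest.

```python
import re


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Compose <account-id>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


print(ecr_image_uri("184213940252", "us-east-1", "recsys-ubuntu"))
# 184213940252.dkr.ecr.us-east-1.amazonaws.com/recsys-ubuntu:latest
```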
- On your local system (use the public DNS name of the ECS container instance):
curl -H 'Content-Type: application/json' -X POST -d '{"movie_list": ["The Green Mile","Witness for the Prosecution"]}' http://ec2-54-234-224-219.compute-1.amazonaws.com/api/v1/schemas/score
The output should look like this:
{
  "movies": [
    "L.A. Confidential",
    "Salinui chueok",
    "Les diaboliques",
    "12 Angry Men",
    "Double Indemnity",
    "Chinatown",
    "On the Waterfront",
    "A Wednesday",
    "Se7en",
    "The Usual Suspects"
  ]
}
- When you are done with the service, tear down the resources created by ecs-cli up:
ecs-cli down