Commit 6c81be0

Merge pull request #161 from DrifterKaru/master
Add tor proxy and onion basic spider configurations
2 parents bddecde + 81bef55 commit 6c81be0

22 files changed: +13084 -19961 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ celerybeat.pid
 *.sage.py
 
 # Environments
-.env
+*.env
 .venv
 env/
 venv/
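
The pattern change above widens what Git ignores from the single name `.env` to any name ending in `.env`. A minimal sketch of that difference, assuming Python's `fnmatchcase` is a close enough stand-in for how Git matches a slash-free pattern against a file's basename (the candidate file names are hypothetical, not taken from the repository):

```python
from fnmatch import fnmatchcase

OLD_PATTERN = ".env"   # pattern removed by this commit
NEW_PATTERN = "*.env"  # pattern added by this commit


def is_ignored(filename: str, pattern: str) -> bool:
    """Approximate a slash-free .gitignore pattern matched against a basename."""
    return fnmatchcase(filename, pattern)


# Hypothetical file names, used only to illustrate the two patterns.
candidates = [".env", "prod.env", "local.env", ".venv", "env.example"]

# Names that the new pattern ignores but the old one did not.
newly_ignored = [
    name
    for name in candidates
    if is_ignored(name, NEW_PATTERN) and not is_ignored(name, OLD_PATTERN)
]
print(newly_ignored)
```

Ignore rules only apply to untracked files, so a file that is already committed (as `crawlerx_app/.env` is in this commit) keeps being tracked regardless of the new pattern.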

README.md

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+
 # CrawlerX - Develop Extensible, Distributed, Scalable Crawler System
 
 The CrawlerX is a platform which we can use for crawl web URLs in different kind of protocols in a distributed way. Web crawling often called web scraping is a method of programmatically going over a collection of web pages and extracting data which useful for data analysis with web-based data. With a web scraper, you can mine data about a set of products, get a large corpus of text or quantitative data to play around with, get data from a site without an official API, or just satisfy your own personal curiosity.

crawlerx_app/.env

Lines changed: 11 additions & 8 deletions
@@ -1,8 +1,11 @@
-VUE_APP_FIREBASE_API_KEY = "<your-api-key>"
-VUE_APP_FIREBASE_AUTH_DOMAIN = "<your-auth-domain>"
-VUE_APP_FIREBASE_DB_DOMAIN= "<your-db-domain>"
-VUE_APP_FIREBASE_PROJECT_ID = "<your-project-id>"
-VUE_APP_FIREBASE_STORAGE_BUCKET = "<your-storage-bucket>"
-VUE_APP_FIREBASE_MESSAGING_SENDER_ID= "<your-messaging-sender-id>"
-VUE_APP_FIREBASE_APP_ID = "<your-app-id>"
-VUE_APP_FIREBASE_MEASURMENT_ID = "<your-measurementId>"
+VUE_APP_FIREBASE_API_KEY = "AIzaSyBz5zJU8nWCwpB4N60b1pyGyW88g5CdBpY"
+VUE_APP_FIREBASE_AUTH_DOMAIN = "crawlerx-e4a5d.firebaseapp.com"
+VUE_APP_FIREBASE_DB_DOMAIN= "https://crawlerx-e4a5d.firebaseapp.com"
+VUE_APP_FIREBASE_PROJECT_ID = "crawlerx-e4a5d"
+VUE_APP_FIREBASE_STORAGE_BUCKET = "crawlerx-e4a5d.appspot.com"
+VUE_APP_FIREBASE_MESSAGING_SENDER_ID= "352593421105"
+VUE_APP_FIREBASE_APP_ID = "1:352593421105:web:5b82330e1c74538a418610"
+VUE_APP_FIREBASE_MEASURMENT_ID = ""
+VUE_APP_DJANGO_PROTOCOL = "http"
+VUE_APP_DJANGO_HOSTNAME = "django"
+VUE_APP_DJANGO_PORT = "8000"
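
The three new `VUE_APP_DJANGO_*` entries split the backend address into protocol, hostname and port. How the front end combines them is not shown in this part of the diff, so the following is only a sketch under that assumption: it parses lines in the `KEY = "value"` shape used above and joins the three values into a base URL. The helper names `parse_env` and `django_base_url` are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY = "value" lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Spacing around '=' is inconsistent in the file, so strip both sides.
        values[key.strip()] = value.strip().strip('"')
    return values


def django_base_url(env: dict[str, str]) -> str:
    """Join protocol, hostname and port into one base URL (assumed usage)."""
    return (
        f'{env["VUE_APP_DJANGO_PROTOCOL"]}://'
        f'{env["VUE_APP_DJANGO_HOSTNAME"]}:{env["VUE_APP_DJANGO_PORT"]}'
    )


# The three Django entries exactly as they appear in the diff.
SAMPLE = '''VUE_APP_DJANGO_PROTOCOL = "http"
VUE_APP_DJANGO_HOSTNAME = "django"
VUE_APP_DJANGO_PORT = "8000"
'''

base_url = django_base_url(parse_env(SAMPLE))
print(base_url)
```

The hostname `django` only resolves where something provides that name, for example a container network with a service called `django`; that setup is an assumption here, not something this excerpt confirms.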

crawlerx_app/Dockerfile

Lines changed: 2 additions & 7 deletions
@@ -1,8 +1,6 @@
+# Choose the Image which has Node installed already
 FROM node:lts-alpine
 
-# install simple http server for serving static content
-RUN npm install -g http-server
-
 # make the 'app' folder the current working directory
 WORKDIR /app
 
@@ -15,8 +13,5 @@ RUN npm install
 # copy project files and folders to the current working directory (i.e. 'app' folder)
 COPY . .
 
-# build app for production with minification
-RUN npm run build
-
 EXPOSE 8080
-CMD [ "http-server", "dist" ]
+CMD [ "npm", "run", "serve" ]

crawlerx_app/nginx/nginx.conf

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+server {
+    listen 8080;
+    server_name _;
+    server_tokens off;
+    client_max_body_size 20M;
+
+    location / {
+        root /usr/share/nginx/html;
+        index index.html index.htm;
+        try_files $uri $uri/ /index.html;
+    }
+
+    location /api {
+        try_files $uri @proxy_api;
+    }
+
+
+    location @proxy_api {
+        proxy_set_header X-Forwarded-Proto https;
+        proxy_set_header X-Url-Scheme $scheme;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header Host $http_host;
+        proxy_redirect off;
+        proxy_pass http://backend:8000;
+    }
+
+
+}
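
The server block above makes two routing decisions: requests under `/api` fall through to the `@proxy_api` upstream when no matching file exists, and every other request is served from the static root with `/index.html` as the final fallback. The sketch below models that decision in Python as a rough aid to reading the config. It is a simplification, not nginx's actual algorithm: the `route` function and the sample file set are hypothetical, the `$uri/` step is reduced to a lookup of `index.html` inside the directory, and both locations are treated as sharing one static root.

```python
def route(path: str, static_files: set[str]) -> str:
    """Return where a request path would be served from in this model."""
    if path.startswith("/api"):
        # location /api { try_files $uri @proxy_api; }
        # Prefix match, so "/apiary" is handled here as well.
        if path in static_files:
            return f"static:{path}"
        # proxy_pass has no URI part, so the request path is forwarded unchanged.
        return f"proxy:http://backend:8000{path}"

    # location / { try_files $uri $uri/ /index.html; }
    if path in static_files:
        return f"static:{path}"
    directory_index = path.rstrip("/") + "/index.html"
    if directory_index in static_files:
        return f"static:{directory_index}"
    # Last try_files argument: let the single-page app handle the route.
    return "static:/index.html"


# Hypothetical contents of /usr/share/nginx/html.
files = {"/index.html", "/js/app.js", "/docs/index.html"}

for request_path in ["/", "/js/app.js", "/dashboard", "/docs/", "/api/projects"]:
    print(request_path, "->", route(request_path, files))
```

The fallback to `/index.html` is what lets client-side routes such as `/dashboard` load directly, while the named location keeps API calls on the same origin as the static files.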
