A modern web application for scraping and browsing creepypasta stories from the Creepypasta Wiki.
- 🕷️ Web Scraping: Scrapes stories from the Creepypasta Wiki's All Pages index
- 🎨 Modern UI: Black/white/red color scheme with responsive design
- 📊 Real-time Progress: Live progress tracking during scraping
- 🏷️ Genre Classification: Automatic genre detection and filtering
- 📱 Responsive: Works on desktop and mobile devices
- 💾 Data Export: JSONL output with optional MongoDB storage
cd "/Users/ahnaflabib/Documents/Projects/mime."
source .venv/bin/activate
pip install -r requirements.txt # If you create one
python app.py
Navigate to: http://localhost:5000
- Click the "Start Scraping" button
- Watch real-time progress
- Browse stories by genre
- Click stories to read full content
# Scrape 50 stories
scrapy crawl creepypasta -s CLOSESPIDER_ITEMCOUNT=50 -s ROBOTSTXT_OBEY=False
# Scrape with MongoDB
scrapy crawl creepypasta -s MONGODB_ENABLED=True -s MONGODB_URI="mongodb://localhost:27017"
mime./
├── app.py # Flask web application
├── mime_scraper/ # Scrapy project
│ ├── items.py # Story data schema
│ ├── pipelines.py # JSON & MongoDB pipelines
│ ├── middlewares.py # User-agent rotation
│ ├── settings.py # Scrapy configuration
│ └── spiders/
│     └── creepypasta_spider.py # Main scraping logic
├── templates/
│ └── index.html # Main web page
├── static/
│ ├── css/style.css # Styling
│ └── js/app.js # Frontend JavaScript
├── outputs/ # Scraped data (JSONL files)
└── scrapy.cfg # Scrapy configuration
- GET /: Main web interface
- POST /api/start-scraping: Start scraping process
- GET /api/status: Get scraping status
- GET /api/stories: Get all scraped stories
- GET /api/stories/<genre>: Get stories by genre
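The endpoints above can be called from a script as well as from the browser. The following is a minimal client sketch, not part of the project. It assumes the app is running at http://localhost:5000, that responses are JSON, and that the genre is passed in the URL by its display name (for example "Urban Legend"); none of these details are confirmed by this README, and the helper names `endpoint`, `stories_url`, and `call` are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote, urljoin
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # address given in the run instructions


def endpoint(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an API path such as '/api/status'."""
    return urljoin(base_url + "/", path.lstrip("/"))


def stories_url(genre: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """URL for all stories, or for a single genre (percent-encoded)."""
    if genre is None:
        return endpoint("/api/stories", base_url)
    # Assumption: the route accepts the genre's display name.
    return endpoint("/api/stories/" + quote(genre, safe=""), base_url)


def call(url: str, method: str = "GET", timeout: float = 10.0):
    """Send a request and decode the body as JSON (assumed response format)."""
    request = Request(url, method=method)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage, with the Flask app running:
#   call(endpoint("/api/start-scraping"), method="POST")
#   status = call(endpoint("/api/status"))
#   stories = call(stories_url("Urban Legend"))
```

Only the URL helpers are independent of the server; `call` needs the app to be up and will raise `urllib.error.URLError` otherwise.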
- Rate Limiting: 2-second delays between requests
- User Agent Rotation: Multiple browser user agents
- AutoThrottle: Enabled for respectful scraping
- Output: JSONL files in the outputs/ directory
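For orientation, the rate limiting and user-agent rotation listed above could be expressed roughly as follows. This is a sketch under assumptions, not the contents of the project's settings.py or middlewares.py: `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` are standard Scrapy setting names, while the `USER_AGENTS` pool and the `RotateUserAgentMiddleware` class are hypothetical names chosen for illustration.

```python
import random

# Standard Scrapy settings mirroring the bullets above.
DOWNLOAD_DELAY = 2           # seconds to wait between requests
AUTOTHROTTLE_ENABLED = True  # let Scrapy adapt the delay to server latency

# Hypothetical pool of browser user agents (shortened for readability).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


class RotateUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

In a real Scrapy project, a middleware like this only takes effect once it is registered under `DOWNLOADER_MIDDLEWARES` in the settings module.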
Set in mime_scraper/settings.py:
MONGODB_ENABLED = True
MONGODB_URI = "mongodb://localhost:27017"
MONGODB_DATABASE = "mime"
MONGODB_COLLECTION = "creepypasta"
Stories are automatically classified into genres based on tags:
- Supernatural: Ghosts, Demons, Monsters
- Psychological: Mental Illness, Dreams, Nightmares
- Creature: Vampires, Werewolves, Zombies
- Sci-Fi: Aliens, Technology, Space
- Digital: Internet, Video Games, Computers
- Crime: Serial Killers, Murder
- Urban Legend: Folklore, Mythology
- And more...
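A tag-based classifier along these lines can be sketched in a few lines of Python. The mapping below copies only the genres and tags listed above; the function name `classify`, the case-insensitive matching, the choice to return every matching genre, and the "Other" fallback are assumptions for illustration and may differ from what the spider actually does.

```python
# Genre -> tags, taken from the list above (lowercased for matching).
GENRE_TAGS = {
    "Supernatural": {"ghosts", "demons", "monsters"},
    "Psychological": {"mental illness", "dreams", "nightmares"},
    "Creature": {"vampires", "werewolves", "zombies"},
    "Sci-Fi": {"aliens", "technology", "space"},
    "Digital": {"internet", "video games", "computers"},
    "Crime": {"serial killers", "murder"},
    "Urban Legend": {"folklore", "mythology"},
}


def classify(tags, default="Other"):
    """Return every genre whose tag set overlaps the story's tags."""
    normalized = {tag.strip().lower() for tag in tags}
    matches = [
        genre for genre, keywords in GENRE_TAGS.items()
        if normalized & keywords  # non-empty intersection means a match
    ]
    # Assumed fallback for stories whose tags match no listed genre.
    return matches or [default]


print(classify(["Ghosts", "Murder"]))  # ['Supernatural', 'Crime']
print(classify(["Poetry"]))            # ['Other']
```

Genres are returned in the order they appear in `GENRE_TAGS`, so a story tagged with both "Ghosts" and "Murder" is filed under both Supernatural and Crime.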
- Backend: Python, Flask, Scrapy
- Frontend: HTML5, CSS3, JavaScript (ES6+)
- Database: MongoDB (optional)
- Styling: Custom CSS with Inter font
- Icons: Unicode emojis
This project is for educational purposes. Please respect the Creepypasta Wiki's terms of service and robots.txt when scraping.