mime. - Creepypasta Scraper

A modern web application for scraping and browsing creepypasta stories from the Creepypasta Wiki.

Features

🕷️ Web Scraping: Scrapes stories from the Creepypasta Wiki's "All Pages" index
🎨 Modern UI: Black/white/red color scheme with responsive design
📊 Real-time Progress: Live progress tracking during scraping
🏷️ Genre Classification: Automatic genre detection and filtering
📱 Responsive: Works on desktop and mobile devices
💾 Data Export: JSONL output with optional MongoDB storage

Quick Start

1. Install Dependencies

cd "/Users/ahnaflabib/Documents/Projects/mime."  # Adjust to the path of your local copy
source .venv/bin/activate
pip install -r requirements.txt

2. Run the Web Application

python app.py

3. Open in Browser

Navigate to: http://localhost:5000

4. Start Scraping

Click the "Start Scraping" button
Watch real-time progress
Browse stories by genre
Click stories to read full content

Manual Scraping (Command Line)

# Scrape 50 stories (this command also turns off robots.txt checks)
scrapy crawl creepypasta -s CLOSESPIDER_ITEMCOUNT=50 -s ROBOTSTXT_OBEY=False

# Scrape with MongoDB
scrapy crawl creepypasta -s MONGODB_ENABLED=True -s MONGODB_URI="mongodb://localhost:27017"
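
Once a crawl has finished, the JSONL output can be inspected from Python. The following is a minimal sketch, assuming the files in outputs/ use the .jsonl extension and hold one JSON object per line; the function name load_stories is hypothetical and not part of the project.

```python
import json
from pathlib import Path


def load_stories(output_dir="outputs"):
    """Read every *.jsonl file in output_dir and return the parsed records."""
    stories = []
    # ASSUMPTION: output files end in .jsonl; adjust the pattern if they do not.
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines between records
                    stories.append(json.loads(line))
    return stories
```

Calling load_stories() from the project root returns a list of dictionaries, one per scraped story, with whatever fields the spider emitted.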

Project Structure

mime./
├── app.py                          # Flask web application
├── mime_scraper/                   # Scrapy project
│   ├── items.py                    # Story data schema
│   ├── pipelines.py                # JSON & MongoDB pipelines
│   ├── middlewares.py              # User-agent rotation
│   ├── settings.py                 # Scrapy configuration
│   └── spiders/
│       └── creepypasta_spider.py  # Main scraping logic
├── templates/
│   └── index.html                  # Main web page
├── static/
│   ├── css/style.css              # Styling
│   └── js/app.js                  # Frontend JavaScript
├── outputs/                        # Scraped data (JSONL files)
└── scrapy.cfg                      # Scrapy project/deploy configuration

API Endpoints

GET / - Main web interface
POST /api/start-scraping - Start scraping process
GET /api/status - Get scraping status
GET /api/stories - Get all scraped stories
GET /api/stories/<genre> - Get stories by genre
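
The endpoints above can also be driven from a script instead of the browser. The sketch below uses only the standard library and rests on several assumptions that the README does not confirm: that every endpoint returns JSON, and that the status payload contains a boolean "running" field. The helper names call_api, wait_until_done, and run_demo are hypothetical.

```python
import json
import time
from urllib import request

BASE_URL = "http://localhost:5000"  # address given in the Quick Start section


def call_api(path, method="GET", base_url=BASE_URL):
    """Send a request to one endpoint and decode its body as JSON."""
    # ASSUMPTION: the endpoint responds with a JSON document.
    req = request.Request(base_url + path, method=method)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_done(fetch_status, is_done, interval=1.0, max_polls=60):
    """Call fetch_status() repeatedly until is_done(status) or polls run out."""
    status = None
    for _ in range(max_polls):
        status = fetch_status()
        if is_done(status):
            break
        time.sleep(interval)
    return status


def run_demo():
    """Start a scrape, wait for it to finish, then list the stories."""
    call_api("/api/start-scraping", method="POST")
    # ASSUMPTION: the status payload has a boolean "running" field.
    final = wait_until_done(
        lambda: call_api("/api/status"),
        lambda status: not status.get("running"),
    )
    print(final)
    print(call_api("/api/stories"))
```

With the Flask app running, run_demo() performs the same sequence as the "Start Scraping" button; the polling helper takes plain callables so the completion check can be swapped once the real status format is known.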

Configuration

Scrapy Settings

Rate Limiting: 2-second delays between requests
User Agent Rotation: Multiple browser user agents
AutoThrottle: Enabled for respectful scraping
Output: JSONL files in outputs/ directory
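
As an illustration of how the settings above might look in code, here is a minimal sketch. DOWNLOAD_DELAY and AUTOTHROTTLE_ENABLED are standard Scrapy setting names; the user-agent strings and the class name RotateUserAgentMiddleware are hypothetical placeholders, since the contents of middlewares.py are not shown here.

```python
import random

# Standard Scrapy settings matching the values described above.
DOWNLOAD_DELAY = 2            # seconds to wait between requests
AUTOTHROTTLE_ENABLED = True   # let Scrapy adapt the delay to server load

# HYPOTHETICAL placeholder values; the project's real list is not shown.
USER_AGENTS = [
    "ExampleBrowser/1.0 (sketch agent A)",
    "ExampleBrowser/2.0 (sketch agent B)",
]


class RotateUserAgentMiddleware:
    """Downloader middleware that sets a random User-Agent on each request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Overwrite the header before the request is downloaded.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request
```

A middleware like this would be enabled through the DOWNLOADER_MIDDLEWARES setting in mime_scraper/settings.py, keyed by its import path.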

MongoDB (Optional)

Set in mime_scraper/settings.py:

MONGODB_ENABLED = True
MONGODB_URI = "mongodb://localhost:27017"
MONGODB_DATABASE = "mime"
MONGODB_COLLECTION = "creepypasta"
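
A pipeline that consumes these four settings could look roughly like the sketch below. It is not the project's actual pipelines.py; the class name MongoStoragePipeline is hypothetical, and pymongo is imported lazily so that nothing is required when MongoDB is disabled.

```python
class MongoStoragePipeline:
    """Item pipeline that stores each item in MongoDB when enabled."""

    def __init__(self, uri, database, collection, enabled=False):
        self.uri = uri
        self.database = database
        self.collection_name = collection
        self.enabled = enabled
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read the settings shown above; getbool also parses "True" from -s flags.
        settings = crawler.settings
        return cls(
            uri=settings.get("MONGODB_URI"),
            database=settings.get("MONGODB_DATABASE"),
            collection=settings.get("MONGODB_COLLECTION"),
            enabled=settings.getbool("MONGODB_ENABLED"),
        )

    def open_spider(self, spider):
        if not self.enabled:
            return
        import pymongo  # imported only when MongoDB storage is switched on

        self.client = pymongo.MongoClient(self.uri)
        self.collection = self.client[self.database][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        if self.collection is not None:
            self.collection.insert_one(dict(item))
        return item  # always pass the item on to later pipelines
```

Because the pipeline returns the item unchanged, the JSONL output keeps working whether or not MongoDB is enabled.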

Genre Classification

Stories are automatically classified into genres based on tags:

Supernatural: Ghosts, Demons, Monsters
Psychological: Mental Illness, Dreams, Nightmares
Creature: Vampires, Werewolves, Zombies
Sci-Fi: Aliens, Technology, Space
Digital: Internet, Video Games, Computers
Crime: Serial Killers, Murder
Urban Legend: Folklore, Mythology
And more...
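
The tag-to-genre mapping above can be sketched as a simple lookup. This is one possible approach, not necessarily the spider's actual logic: it picks the genre with the most matching tags, and both the "Uncategorized" fallback and the function name classify are assumptions. Only the genres listed above are included.

```python
# Tag table copied from the list above (lowercased for matching).
GENRE_TAGS = {
    "Supernatural": {"ghosts", "demons", "monsters"},
    "Psychological": {"mental illness", "dreams", "nightmares"},
    "Creature": {"vampires", "werewolves", "zombies"},
    "Sci-Fi": {"aliens", "technology", "space"},
    "Digital": {"internet", "video games", "computers"},
    "Crime": {"serial killers", "murder"},
    "Urban Legend": {"folklore", "mythology"},
}


def classify(tags, default="Uncategorized"):
    """Return the genre whose tag set overlaps most with the story's tags."""
    normalized = {tag.strip().lower() for tag in tags}
    best_genre, best_hits = default, 0
    for genre, genre_tags in GENRE_TAGS.items():
        hits = len(normalized & genre_tags)
        # Strict comparison: on a tie, the genre listed first wins.
        if hits > best_hits:
            best_genre, best_hits = genre, hits
    return best_genre
```

For example, classify(["Ghosts", "Demons"]) returns "Supernatural", while a story with no recognized tags falls back to the default.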

Technology Stack

Backend: Python, Flask, Scrapy
Frontend: HTML5, CSS3, JavaScript (ES6+)
Database: MongoDB (optional)
Styling: Custom CSS with Inter font
Icons: Unicode emojis

License

This project is for educational purposes. Please respect the Creepypasta Wiki's terms of service and robots.txt when scraping.
