LLM Chat Scraper – AI Share URL Extractor

Identify what people are querying AI for


A Python tool that scrapes chat content from live share URLs of ChatGPT, Claude, and Grok.

It first pulls share URLs from the Wayback Machine CDX API, then uses Playwright to open each live page, handle JavaScript-rendered content, strip out UI clutter, and save only the clean chat messages to a text file.
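
For context, the CDX fetch step might look roughly like this (a minimal sketch using the requests library; the script's actual query parameters and parsing may differ):

import requests

def fetch_share_urls(pattern="chatgpt.com/share/*"):
    # Query the Wayback Machine CDX API for archived share URLs.
    resp = requests.get(
        "https://web.archive.org/cdx/search/cdx",
        params={
            "url": pattern,        # wildcard match on the share-URL path
            "output": "json",      # rows as JSON arrays; first row is the header
            "fl": "original",      # return only the original URL field
            "collapse": "urlkey",  # deduplicate repeated captures of the same URL
        },
        timeout=30,
    )
    resp.raise_for_status()
    rows = resp.json()
    return [row[0] for row in rows[1:]]  # skip the header row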

✨ Built for speed, simplicity, and fun – and of course, vibe coded using AI 🤖
⭐ If you found this useful, don’t forget to star the repo!


Features

🔎 Fetches share URLs from the Wayback Machine CDX API
📂 Scrapes ChatGPT, Claude, and Grok share pages
🧹 Filters out UI/boilerplate text, saving only clean chat content
🎛️ Interactive CLI: scrape All, a Range, or a Number of URLs
🕵️ Random User-Agents + delays to avoid detection
⚡ Uses Playwright for robust JavaScript rendering (see the sketch below)

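A minimal sketch of the rendering step, assuming Playwright's sync API; the user-agent list, delay bounds, and extraction logic here are illustrative, not the script's actual values:

import random
import time
from playwright.sync_api import sync_playwright

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def scrape_chat(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(user_agent=random.choice(USER_AGENTS))
        page.goto(url, wait_until="networkidle")  # let JS-rendered content settle
        text = page.inner_text("body")  # the real script filters UI noise from this
        browser.close()
    time.sleep(random.uniform(1, 3))  # random delay between requests
    return text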

Installation

  1. Clone the repo:
    git clone git@github.com:andreasc1/LLM-Chat-Scraper-AI-Share-URL-Extractor.git
    cd LLM-Chat-Scraper-AI-Share-URL-Extractor
  2. Install dependencies:
    pip install -r requirements.txt
  3. Install Playwright browsers (first-time setup only):
    playwright install

Usage

Run the script:

Interactive mode:

python scraper.py

Non-interactive mode (great for Docker):

# Scrape all sources, first 5 URLs each with 10 parallel workers
python scraper.py --source 0 --mode number --count 5 --parallel 10

# Scrape only ChatGPT, first 50 URLs with 20 parallel workers
python scraper.py --source 1 --mode number --count 50 --parallel 20

# Scrape Claude, URLs 50-100 with 5 parallel workers (be gentle)
python scraper.py --source 2 --mode range --range "50-100" --parallel 5

# Scrape all Grok URLs with 15 parallel workers
python scraper.py --source 3 --mode all --parallel 15

Script Arguments

--source (Source Selection)

Selects which chatbot platform to scrape:

  • 0 - All sources (ChatGPT, Claude, and Grok)
  • 1 - ChatGPT only
  • 2 - Claude only
  • 3 - Grok only

Example: --source 1 scrapes only ChatGPT share URLs

--mode (URL Selection Mode)

Determines how many URLs to scrape:

  • all - Scrape all found URLs (can be thousands)
  • range - Scrape a specific range of URLs (requires --range)
  • number - Scrape the first N URLs (requires --count)

Example: --mode number limits scraping to a specific count

--range (URL Range)

When using --mode range, specifies which URLs to scrape by position. Format: "start-end" (1-indexed)

Example: --range "50-100" scrapes URLs 50 through 100
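
In Python terms, a 1-indexed "start-end" range maps onto 0-indexed slicing roughly like this (illustrative; the script's parsing may differ):

urls = [f"https://example.com/share/{i}" for i in range(1, 201)]  # stand-in list
start, end = map(int, "50-100".split("-"))
selected = urls[start - 1 : end]  # URLs 50 through 100, inclusive
print(len(selected))  # 51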

--count (URL Count)

When using --mode number, specifies how many URLs to scrape from the beginning.

Example: --count 20 scrapes the first 20 URLs found

--parallel (Parallel Workers)

Number of concurrent browser instances for faster scraping.

  • Default: 10
  • Higher values = faster but more resource intensive
  • Lower values = slower but gentler on target sites

Example: --parallel 5 uses 5 concurrent workers
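
One plausible way to bound this concurrency, sketched with asyncio (hypothetical; scrape_one below is a stand-in for the actual Playwright scrape):

import asyncio

async def scrape_one(url):
    await asyncio.sleep(0.1)  # stand-in for the actual Playwright scrape
    print(f"scraped {url}")

async def scrape_all(urls, parallel=10):
    sem = asyncio.Semaphore(parallel)  # at most `parallel` scrapes in flight

    async def worker(url):
        async with sem:
            await scrape_one(url)

    await asyncio.gather(*(worker(u) for u in urls))

asyncio.run(scrape_all([f"url-{i}" for i in range(25)], parallel=5))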

--proxy (Proxy Server)

Optional proxy server for requests (useful for privacy or bypassing restrictions). Supports SOCKS5 and HTTP proxies.

Example: --proxy socks5://127.0.0.1:9050 routes traffic through Tor
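
Under the hood this plausibly maps to Playwright's proxy launch option; a rough sketch (the Tor address and target URL are examples only):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={"server": "socks5://127.0.0.1:9050"},  # e.g. a local Tor SOCKS port
    )
    page = browser.new_page()
    page.goto("https://chatgpt.com/share/example")  # traffic exits via the proxy
    browser.close()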

In interactive mode, you'll be prompted to select a source (ChatGPT, Claude, or Grok) and to choose whether to scrape All, a Range, or a specific Number of URLs. In both modes, the script fetches, scrapes, and saves the results to a text file (e.g. scraped_content.txt).
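
Taken together, the flags could be declared along these lines (a hypothetical argparse sketch; the script's actual parser may differ in defaults and help text):

import argparse

parser = argparse.ArgumentParser(description="LLM Chat Scraper")
parser.add_argument("--source", type=int, choices=[0, 1, 2, 3],
                    help="0=all, 1=ChatGPT, 2=Claude, 3=Grok")
parser.add_argument("--mode", choices=["all", "range", "number"])
parser.add_argument("--range", help='"start-end", 1-indexed (with --mode range)')
parser.add_argument("--count", type=int, help="first N URLs (with --mode number)")
parser.add_argument("--parallel", type=int, default=10,
                    help="concurrent browser instances")
parser.add_argument("--proxy", help="e.g. socks5://127.0.0.1:9050")
args = parser.parse_args()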


Docker Usage

Interactive Mode (with Docker Compose)

For interactive usage where you want to select options via prompts:

# Build and run interactively
docker-compose up --build

# Or run without rebuilding
docker-compose up

This will start the container with an interactive terminal where you can select sources and scraping options.

Unattended Mode (with Docker Compose)

For automated/unattended usage with predefined parameters:

# Run with specific parameters (no interaction required)
docker-compose run --rm scraper python scraper.py --source 1 --mode number --count 20 --parallel 10

# Examples for different scenarios:
# Scrape all ChatGPT URLs with 15 parallel workers
docker-compose run --rm scraper python scraper.py --source 1 --mode all --parallel 15

# Scrape first 50 Claude URLs with 5 parallel workers
docker-compose run --rm scraper python scraper.py --source 2 --mode number --count 50 --parallel 5

# Scrape Grok URLs 100-200 with 8 parallel workers
docker-compose run --rm scraper python scraper.py --source 3 --mode range --range "100-200" --parallel 8

Manual Docker Build

If you prefer building and running manually:

# Build the image
docker build -t chat-scraper .

# Run unattended with custom parameters
docker run --rm -v $(pwd)/output:/app/output chat-scraper --source 0 --mode number --count 20 --parallel 10

# Run interactively
docker run --rm -it -v $(pwd)/output:/app/output chat-scraper

The output will be saved to the ./output/ directory on your host machine.


Demo


Example output:

Fetching share URLs for ChatGPT...
✅ Found 103347 URLs for ChatGPT.
Scrape (A)ll, (R)ange, or (N)umber? R
Enter range (1-103347): 888-891

🔹 Scraping: https://chatgpt.com/share/714ea0c0-04b4-40e4-8c02-2e0059b4d854
✅ Scraped successfully.

🔹 Scraping: https://chatgpt.com/share/675489e9-36e8-800e-a8b8-0d4d296a0a6b
✅ Scraped successfully.

Results:

The cleaned results are saved in:

scraped_content.txt
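
The cleanup step could be as simple as dropping known UI strings before appending to the file; an illustrative sketch (the noise list here is invented, the real filters live in the script):

UI_NOISE = {"Copy code", "Share", "Log in", "Sign up", "Regenerate"}

def clean(raw):
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(l for l in lines if l and l not in UI_NOISE)

page_text = "How do I sort a list?\nCopy code\nUse sorted(my_list).\nShare"
with open("scraped_content.txt", "a", encoding="utf-8") as f:
    f.write(clean(page_text) + "\n\n")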

⭐ If you found this useful, don’t forget to star the repo!
