A universal tool for scraping and analyzing any Grokipedia or Wikipedia article for SEO optimization. Designed for use by humans and AI agents (like GitHub Copilot or Grok) to help articles rank on the first page of search results.
This repository contains tools for scraping and analyzing Grokipedia and Wikipedia pages, and helps optimize them for search engine ranking through comprehensive SEO analysis and crawler testing.
- Multi-method Scraping: Automatically tries direct access, Wayback Machine archives, and screenshot fallback
- Firewall/Paywall Workaround: Works around access restrictions using archived data
- SEO Analysis: Comprehensive analysis of page structure and optimization
- Universal: Works with any Grokipedia or Wikipedia article
- AI Agent Ready: Designed for automated use by AI tools
```bash
npm install
```

Edit `config.json` to set your target article:

```json
{
  "url": "https://grokipedia.com/page/YOUR_ARTICLE_NAME",
  "articleName": "Your Article Name",
  "outputDir": "seo_reports",
  "scrapeOutput": "scrape.html"
}
```

Or use command-line arguments (see the Usage section below).
```bash
# Use config.json
npm run scrape

# Or specify URL directly
node scrape.js https://grokipedia.com/page/YOUR_ARTICLE_NAME
```

```bash
# Use config.json
npm run analyze

# Or specify URL directly
./seo_analyzer.sh https://grokipedia.com/page/YOUR_ARTICLE_NAME
```

This will:
- Analyze page structure and SEO elements
- Test crawler compatibility (Googlebot, Bingbot)
- Generate a comprehensive SEO report
- Provide optimization recommendations
The tool automatically handles access restrictions with a three-tier approach:
1. Direct Access: Tries to fetch the page normally
2. Wayback Machine: Falls back to the most recent archived snapshot from archive.org
3. Screenshot Fallback: Captures a visual screenshot if all else fails
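A minimal sketch of the first two tiers using `curl` and the archive.org availability API (the real implementation in `scrape.js` uses Puppeteer and differs in detail; this function is illustrative only):

```bash
# Illustrative three-tier fallback; the actual logic lives in scrape.js.
scrape_with_fallback() {
  url="$1"; out="$2"
  # 1. Direct access
  curl -fsSL "$url" -o "$out" && return 0
  # 2. Most recent Wayback Machine snapshot, found via the availability API
  snapshot=$(curl -fsSL "https://archive.org/wayback/available?url=$url" \
    | grep -oE 'https?://web\.archive\.org/web/[^"]+' | head -1)
  [ -n "$snapshot" ] && curl -fsSL "$snapshot" -o "$out" && return 0
  # 3. Screenshot capture needs a headless browser, so defer to scrape.js
  echo "Text-based methods failed; falling back to the Puppeteer screenshot" >&2
  return 1
}
```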
Use the dedicated CLI tool to fetch archived content:

```bash
./wayback_scrape.sh https://grokipedia.com/page/ARTICLE_NAME
./wayback_scrape.sh https://example.com/paywalled-article output.html
```

This is useful for:
- Getting around corporate firewalls
- Accessing paywalled content (archived versions)
- Retrieving deleted or modified content
- Ensuring consistent historical data
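If you need a specific historical version rather than the latest snapshot, the standard Wayback URL scheme accepts a timestamp, and archive.org redirects to the nearest capture (a direct `curl` sketch; `wayback_scrape.sh` may resolve snapshots differently):

```bash
# Fetch the capture closest to 1 Jan 2024 (timestamp format: YYYYMMDDhhmmss);
# -L follows archive.org's redirect to the nearest available snapshot.
curl -sL "https://web.archive.org/web/20240101000000/https://grokipedia.com/page/ARTICLE_NAME" \
  -o snapshot.html
```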
1. Edit `config.json`:

   ```json
   {
     "url": "https://grokipedia.com/page/Albert_Einstein",
     "articleName": "Albert Einstein",
     "outputDir": "seo_reports",
     "scrapeOutput": "scrape.html"
   }
   ```

2. Run the tools:
   ```bash
   npm run scrape
   npm run analyze
   ```

Scrape any article:
```bash
node scrape.js https://grokipedia.com/page/Albert_Einstein
node scrape.js https://grokipedia.com/page/World_War_II
node scrape.js https://en.wikipedia.org/wiki/Quantum_mechanics
```

Analyze any URL:
```bash
./seo_analyzer.sh https://grokipedia.com/page/Albert_Einstein
./seo_analyzer.sh https://grokipedia.com/page/World_War_II
```

The `examples/` directory contains pre-configured settings for different articles:
```bash
# Use an example configuration
cp examples/albert_einstein_config.json config.json
npm run scrape
npm run analyze

# Or use them directly
node scrape.js --config=examples/world_war_ii_config.json
./seo_analyzer.sh --config=examples/quantum_mechanics_config.json
```

See `examples/README.md` for more details.
Quick examples using direct URLs:
```bash
# Scrape and analyze a science article
node scrape.js https://grokipedia.com/page/Quantum_mechanics
./seo_analyzer.sh https://grokipedia.com/page/Quantum_mechanics

# Scrape and analyze a historical event
node scrape.js https://grokipedia.com/page/Moon_landing
./seo_analyzer.sh https://grokipedia.com/page/Moon_landing

# Scrape and analyze a biography
node scrape.js https://grokipedia.com/page/Marie_Curie
./seo_analyzer.sh https://grokipedia.com/page/Marie_Curie
```

See `SEO_CRAWLER_GUIDE.md` for comprehensive documentation on:
- Search engine crawler testing (Googlebot, Bingbot, spider crawl)
- SEO element validation and optimization
- Command-line tools for improving search rankings
- Best practices for first-page ranking alongside Wikipedia
Replace `YOUR_ARTICLE_URL` with your target article:

```bash
# Test as Googlebot
curl -A "Mozilla/5.0 (compatible; Googlebot/2.1)" \
  YOUR_ARTICLE_URL > googlebot_test.html
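
# Test as Bingbot too (this UA string follows Bing's published crawler
# identification; adjust if Bing updates it)
curl -A "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" \
  YOUR_ARTICLE_URL > bingbot_test.html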

# Analyze SEO elements
curl -s YOUR_ARTICLE_URL | \
  grep -E "<title>|<meta name=\"description\"|<h1" | head -10

# Check page speed
time curl -o /dev/null -s YOUR_ARTICLE_URL
```

```bash
# Replace YOUR_ARTICLE_NAME with your target article
curl https://grokipedia.com/page/YOUR_ARTICLE_NAME > scrape.html
```

Or extract specific elements:

```bash
curl https://grokipedia.com/page/YOUR_ARTICLE_NAME 2>/dev/null | \
  grep -oP '<p[^>]*>\K.*?(?=</p>)'
```

```bash
# Use config.json
node scrape.js
# Or specify URL directly
node scrape.js https://grokipedia.com/page/YOUR_ARTICLE_NAME
```

The `config.json` file controls the default behavior of the tool:
```json
{
  "url": "https://grokipedia.com/page/2012_Aurora_theater_shooting",
  "articleName": "2012 Aurora Theater Shooting",
  "outputDir": "seo_reports",
  "scrapeOutput": "scrape.html"
}
```

- `url`: The Grokipedia or Wikipedia article URL to analyze
- `articleName`: Human-readable name for the article (used in reports)
- `outputDir`: Directory where SEO analysis reports are saved (default: `seo_reports`)
- `scrapeOutput`: Filename for scraped HTML content (default: `scrape.html`)
The tool determines which URL to use in the following priority order:
1. Command-line argument (highest priority)
2. Config file specified with `--config=path/to/config.json`
3. Default `config.json` in the root directory
4. Hardcoded default URL (lowest priority)
This allows maximum flexibility for different use cases.
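As an illustration, the resolution order behaves roughly like this sketch (function and variable names are hypothetical; the real logic lives in `scrape.js`):

```bash
# Hypothetical sketch of the URL resolution order used by the tool.
resolve_url() {
  cli_url="$1"       # URL given on the command line
  config_flag="$2"   # path given via --config=...
  [ -n "$cli_url" ] && { echo "$cli_url"; return; }                      # 1. CLI argument
  [ -n "$config_flag" ] && [ -f "$config_flag" ] && {
    grep -o '"url": *"[^"]*"' "$config_flag" | cut -d'"' -f4; return; }  # 2. --config file
  [ -f config.json ] && {
    grep -o '"url": *"[^"]*"' config.json | cut -d'"' -f4; return; }     # 3. default config.json
  echo "https://grokipedia.com/page/2012_Aurora_theater_shooting"        # 4. hardcoded default
}
```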
- `scrape.js` - Puppeteer script with automatic firewall/paywall workarounds
- `scrape.html` - Template/output file for scraped content
- `config.json` - Configuration file for article URL and settings
- `package.json` - Node.js dependencies
- `SEO_CRAWLER_GUIDE.md` - Comprehensive SEO and crawler optimization guide
- `seo_analyzer.sh` - Automated SEO analysis script (works with any URL)
- `wayback_scrape.sh` - CLI tool for manual Wayback Machine scraping
- `seo_reports/` - Generated SEO analysis reports (gitignored)
- `test_tool.sh` - Test suite for verifying tool functionality
- `.gitignore` - Excludes node_modules, reports, and temporary files
- `README.md` - This file
The SEO analyzer checks:
- ✅ HTTP status and accessibility
- ✅ Title tag optimization (50-60 characters)
- ✅ Meta description (150-160 characters)
- ✅ Heading structure (H1, H2, H3 hierarchy)
- ✅ Content length (1500+ words recommended)
- ✅ Page load speed (<3 seconds)
- ✅ Internal linking
- ✅ Image alt text optimization
- ✅ Mobile-friendliness (viewport meta tag)
- ✅ Structured data (JSON-LD)
- ✅ HTTPS security
- ✅ Crawler compatibility (Googlebot, Bingbot)
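For example, the title-length check can be reproduced by hand (a standalone sketch; `seo_analyzer.sh` performs a fuller version of these checks):

```bash
# Standalone check of the title tag against the 50-60 character target.
url="https://grokipedia.com/page/Albert_Einstein"
title=$(curl -s "$url" | grep -oE '<title>[^<]*</title>' | sed 's/<[^>]*>//g')
len=${#title}
echo "Title ($len chars): $title"
if [ "$len" -ge 50 ] && [ "$len" -le 60 ]; then
  echo "Title length OK"
else
  echo "Consider a 50-60 character title"
fi
```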
The tool is designed to work with any Grokipedia or Wikipedia article. Simply configure the URL in `config.json` or pass it as a command-line argument. The repository name references a specific article, but the tool itself is universal.
The tool comes pre-configured with the 2012 Aurora theater shooting article as a default example:
- URL: `https://grokipedia.com/page/2012_Aurora_theater_shooting`
- This is just a default; you can analyze any article by changing `config.json` or using command-line arguments
To help any Grokipedia article rank on the first page:
- Content Quality: Ensure 1500+ words of unique, well-researched content
- Technical SEO: Fast load times, mobile-responsive, HTTPS enabled
- On-Page Optimization: Proper title, meta description, heading hierarchy
- Structured Data: Implement Schema.org Article markup (see the example below)
- Regular Updates: Keep content fresh with recent information
- Backlinks: Build quality backlinks from reputable sources
Run `./seo_analyzer.sh YOUR_ARTICLE_URL` regularly to monitor the SEO score and identify improvements.
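For the structured-data item above, a minimal Schema.org Article block might look like this (all values are placeholders, not taken from any real page); embed it in the page inside a `<script type="application/ld+json">` tag:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Albert Einstein",
  "description": "Biography and scientific work of Albert Einstein.",
  "datePublished": "2025-01-01",
  "dateModified": "2025-06-01",
  "author": { "@type": "Organization", "name": "Grokipedia" }
}
```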
This tool is designed to be used by AI agents like GitHub Copilot or Grok:
1. Configure: Set the target article in `config.json`
2. Analyze: Run `npm run analyze` to get SEO insights
3. Optimize: Use the recommendations to improve content and structure
4. Monitor: Re-run analysis after changes to track improvements
AI agents can use this tool to:
- Automatically analyze multiple articles
- Generate SEO reports for comparison
- Identify optimization opportunities
- Track ranking improvements over time
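For instance, a batch run over several articles could look like this (article names are examples; `seo_analyzer.sh` is invoked exactly as documented above):

```bash
# Analyze several articles in one pass and collect the reports.
for article in Albert_Einstein Marie_Curie Moon_landing; do
  ./seo_analyzer.sh "https://grokipedia.com/page/$article"
done
ls seo_reports/   # reports land in the configured outputDir
```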
- Google Search Console - Submit sitemap and monitor performance
- Bing Webmaster Tools - Submit to Bing index
- Schema.org - Structured data guidelines
- PageSpeed Insights - Analyze page performance