This repository contains Python scripts for collecting and analyzing sentiment in Reddit posts and comments from Autism and ADHD-focused communities.
This project analyzes sentiment patterns in discussions about Autism and ADHD on Reddit, using VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment analysis. The analysis covers posts and comments from eight subreddits:
Autism-focused:
- r/autism
- r/aspergers
- r/aspergirls
- r/AutisticAdults
ADHD-focused:
- r/ADHD
- r/ADHDmemes
- r/adhdwomen
- r/adhd_anxiety
Data collection began on 2026-04-09 via Tor (with automatic exit-node rotation). The weekly GitHub Actions workflow continues to grow this dataset over time.
| Metric | Value |
|---|---|
| Total posts | 6,055 |
| Unique redditors (posts) | 4,421 (counted by hashed author ID) |
| Autism-community posts | 5,262 (r/autism, r/aspergers, r/aspergirls, r/AutisticAdults) |
| ADHD-community posts | 793 (r/ADHD, r/ADHDmemes, r/adhdwomen, r/adhd_anxiety) |
| Date range | 2015-03-29 → 2026-04-14 |
| Sentiment | Count | % |
|---|---|---|
| Positive (score ≥ 0.05) | 3,247 | 53.6% |
| Negative (score ≤ −0.05) | 2,277 | 37.6% |
| Neutral | 531 | 8.8% |
Average compound score: 0.138 (mildly positive overall)
| Community | Avg. Sentiment |
|---|---|
| Autism | 0.142 |
| ADHD | 0.107 |
| Subreddit | Avg. Sentiment |
|---|---|
| r/adhdwomen | +0.243 (most positive) |
| r/aspergirls | +0.207 |
| r/AutisticAdults | +0.156 |
| r/autism | +0.136 |
| r/ADHD | +0.134 |
| r/aspergers | +0.087 |
| r/ADHDmemes | +0.064 |
| r/adhd_anxiety | -0.015 (only subreddit with net-negative avg.) |
Most positive post: "Does autism make it harder to get over someone?..." (r/aspergirls, sentiment 0.9999)
Most negative post: "anxiety, depression, IBS, ADHD, but no proper relief from pills?" (r/adhd_anxiety, sentiment −0.9997)
Note: This dataset is automatically updated weekly via GitHub Actions. Statistics shown reflect the most recent analysis run.
After running `analyze_sentiment.py`, the following visualization files are generated:
- `sentiment_analysis_overview.png` - comprehensive dashboard with pie charts of sentiment distribution, time series of sentiment trends, and category comparisons
- `sentiment_by_category.png` - detailed comparison of Autism vs ADHD communities over time
- `sentiment_by_subreddit.png` - individual subreddit sentiment comparisons
Reddit data is collected via the official Reddit JSON API using collect_reddit_data.py.
The script paginates through the new and top listings for each subreddit and fetches
top-level comments for each post. A GitHub Actions workflow runs the collection automatically
every Sunday so the dataset stays up-to-date.
Author usernames are SHA-256 hashed (16-char prefix, stored as author_hash) so that
no raw Reddit usernames are committed to the repository; uniqueness is preserved for counting
contributors.
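The hashing scheme described above can be sketched as follows. This is a minimal illustration of a 16-character SHA-256 hex prefix; any normalization or salting applied inside `collect_reddit_data.py` is not shown here:

```python
import hashlib

def hash_author(username: str) -> str:
    """Return the 16-char prefix of the SHA-256 hex digest of a username.

    16 hex chars = 64 bits, which keeps hashes short while making
    collisions vanishingly unlikely at this dataset's scale.
    """
    return hashlib.sha256(username.encode("utf-8")).hexdigest()[:16]
```

The same username always maps to the same `author_hash`, so unique-contributor counts stay accurate without committing raw usernames.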
The data collection targets:
- Recent posts (submissions) via Reddit's `new` listing (~100–1000 per subreddit)
- Additional high-ranking posts surfaced by Reddit's `top?t=all` listing (often older)
- Top-level comments on the collected posts
This provides a broad sample of recent and historically notable discussions.
Note that the new and top listings are each capped at ~1000 items by Reddit, so
this does not guarantee complete historical coverage — the dataset grows incrementally
with each weekly run.
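The pagination through a listing can be sketched against Reddit's public JSON endpoints. The helper names, page limit, and User-Agent string below are illustrative, not taken from `collect_reddit_data.py`:

```python
import requests

def split_listing_page(payload: dict):
    """Pull the posts and the next-page cursor out of one listing response."""
    data = payload["data"]
    posts = [child["data"] for child in data["children"]]
    return posts, data.get("after")

def fetch_listing(subreddit: str, listing: str = "new", max_pages: int = 10):
    """Page through a subreddit listing via the public JSON API (sketch)."""
    session = requests.Session()
    session.headers["User-Agent"] = "reddit_AuDHD-sketch/0.1"  # illustrative UA
    posts, after = [], None
    for _ in range(max_pages):
        params = {"limit": 100}
        if after:
            params["after"] = after  # cursor returned by the previous page
        resp = session.get(
            f"https://www.reddit.com/r/{subreddit}/{listing}.json",
            params=params,
            timeout=30,
        )
        resp.raise_for_status()
        page, after = split_listing_page(resp.json())
        posts.extend(page)
        if after is None:  # Reddit caps each listing at ~1000 items
            break
    return posts
```

Each response carries an `after` cursor; passing it back retrieves the next page until Reddit's ~1000-item cap is reached.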
The project uses vaderSentiment, a lexicon and rule-based sentiment analysis tool specifically attuned to social media text. VADER provides:
- Compound scores ranging from -1 (most negative) to +1 (most positive)
- Classification into positive, negative, or neutral categories
- Good performance on short social media texts
Sentiment Categories:
- Positive: Compound score ≥ 0.05
- Negative: Compound score ≤ -0.05
- Neutral: Compound score strictly between -0.05 and 0.05
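Under these thresholds, classification is a simple mapping from the compound score that vaderSentiment's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` returns; the function name here is illustrative:

```python
def classify_sentiment(compound: float) -> str:
    """Map a VADER compound score onto the three categories above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```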
The data/ directory contains historical Reddit data from Academic Torrents, providing a comprehensive baseline of posts and comments from the target subreddits.
Source: Reddit r/datasets NDJSON.zst dumps
| Subreddit | Submissions | Size (MB) | Comments | Size (MB) |
|---|---|---|---|---|
| r/ADHD | 1,050,855 | 355.5 | 9,475,358 | 1,698.4 |
| r/ADHDmemes | 16,336 | 7.6 | 195,510 | 24.8 |
| r/AutisticAdults | 56,449 | 32.3 | 804,993 | 166.7 |
| r/adhd_anxiety | 19,577 | 8.8 | 130,263 | 24.9 |
| r/adhdwomen | 253,927 | 129.4 | 4,199,060 | 793.3 |
| r/aspergers | 222,958 | 85.1 | 3,392,339 | 588.6 |
| r/aspergirls | 58,464 | 24.1 | 765,775 | 153.9 |
| r/autism | 496,232 | 229.8 | 7,056,844 | 1,138.5 |
| Total | 2,174,798 | 864.6 | 26,020,142 | 4,589.1 |
These archives provide historical context spanning multiple years of community discussions. New data collected in 2026 is stored separately in reddit_submissions_2026.csv and reddit_comments_2026.csv to clearly distinguish from the historical seed data.
reddit_AuDHD/
├── collect_reddit_data.py # Script to collect Reddit data via the official JSON API
├── import_seed_data.py # Import historical data from Academic Torrents
├── analyze_sentiment.py # Perform sentiment analysis and generate visualizations
├── update_readme_stats.py # Update README with latest statistics
├── requirements.txt # Python dependencies
├── data/ # Historical seed data: Zstandard-compressed (.zst) NDJSON archives (LFS-tracked)
│ ├── ADHD_submissions.zst # Submissions from r/ADHD (historical)
│ ├── ADHD_comments.zst # Comments from r/ADHD (historical)
│ ├── autism_submissions.zst # Submissions from r/autism (historical)
│ ├── autism_comments.zst # Comments from r/autism (historical)
│ └── ... # Other subreddit archives
├── reddit_submissions_2026.csv # 2026 submissions collected via API (NEW DATA)
├── reddit_comments_2026.csv # 2026 comments collected via API (NEW DATA)
├── reddit_submissions_with_sentiment_2026.csv # Submissions with sentiment scores (for analysis)
├── reddit_comments_with_sentiment_2026.csv # Comments with sentiment scores (for analysis)
├── sentiment_analysis_overview.png # Main visualization (generated by analyze_sentiment.py)
├── sentiment_by_category.png # Category comparison visualization (generated by analyze_sentiment.py)
├── sentiment_by_subreddit.png # Subreddit comparison visualization (generated by analyze_sentiment.py)
└── README.md # This file
Note: Historical seed data is stored as zstandard-compressed NDJSON archives in data/ and tracked with Git LFS. New data collected in 2026 is stored in separate CSV files to clearly distinguish from the historical archives. Analysis scripts read from both sources and combine them.
- Clone this repository:

```bash
git clone https://github.com/neon-ninja/reddit_AuDHD.git
cd reddit_AuDHD
```

- Install required dependencies:

```bash
pip install -r requirements.txt
```

```bash
# Full collection (~1000 posts per subreddit + comments)
# Saves new 2026 data to reddit_submissions_2026.csv and reddit_comments_2026.csv
python3 collect_reddit_data.py

# Seed collection: 1 page per subreddit (~100–200 posts each), no comments
python3 collect_reddit_data.py --seed
```

This fetches posts and comments from the target subreddits via the official Reddit JSON API
and saves new data to reddit_submissions_2026.csv and reddit_comments_2026.csv. The
historical seed data in data/*.zst archives remains unchanged. The script automatically
deduplicates by ID, ensuring efficient incremental updates.
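The deduplication step can be sketched with pandas; the helper name is illustrative, and the actual script may key on different columns:

```python
import pandas as pd

def append_deduplicated(existing: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Append newly collected rows, keeping one row per Reddit ID.

    keep="first" preserves the previously stored row when a post is
    re-collected, so weekly runs only ever add genuinely new items.
    """
    combined = pd.concat([existing, fresh], ignore_index=True)
    return combined.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
```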
Tor routing: Reddit blocks datacenter IPs (including GitHub-hosted runners). The script automatically detects and uses a local Tor SOCKS5 proxy (`127.0.0.1:9050`) or a `TOR_PROXY` environment variable. If Reddit returns 429 or 403, the script automatically rotates the Tor exit node (restarts the Tor daemon) and retries, so collection continues uninterrupted.

To use Tor locally:

```bash
sudo apt-get install -y tor torsocks
sudo systemctl start tor
TOR_PROXY=socks5h://127.0.0.1:9050 python3 collect_reddit_data.py --seed
# or
torsocks python3 collect_reddit_data.py
```
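Wiring the proxy into `requests` can be sketched as below. The `socks5h` scheme makes DNS resolution happen inside Tor as well (this needs the `requests[socks]`/PySocks extra installed); the session helper and User-Agent string are illustrative:

```python
import os

import requests

def tor_session() -> requests.Session:
    """Build a requests session routed through Tor's SOCKS5 proxy (sketch)."""
    # Fall back to the default local Tor port if TOR_PROXY is not set
    proxy = os.environ.get("TOR_PROXY", "socks5h://127.0.0.1:9050")
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    session.headers["User-Agent"] = "reddit_AuDHD-sketch/0.1"  # illustrative UA
    return session
```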
```bash
python3 analyze_sentiment.py
```

This will:
- Load historical data from zst archives in `data/`
- Load 2026 data from CSV files (if they exist)
- Combine and deduplicate the data by ID
- Analyze sentiment using VADER
- Generate visualizations
- Save results to CSV files with sentiment scores (for further analysis)
- Print summary statistics
A GitHub Actions workflow (.github/workflows/collect_data.yml) runs every Sunday at
midnight UTC. It installs Tor, waits for the circuit to bootstrap, then collects fresh
data via torsocks to bypass Reddit's datacenter IP block. The workflow then runs
sentiment analysis and commits the updated files back to the repository automatically.
The workflow can also be triggered manually from the Actions tab.
- `requests` - HTTP library for API calls
- `pandas` - Data manipulation and analysis
- `vaderSentiment` - Sentiment analysis
- `matplotlib` - Plotting and visualization
- `seaborn` - Statistical data visualization
- `numpy` - Numerical computing
- `tqdm` - Progress bars
Insights will be updated as real Reddit data accumulates. Based on the literature and community observations, we expect to find:
These communities likely provide valuable emotional support and encouragement to members discussing their neurodevelopmental conditions, reflected in higher positive sentiment in comments vs. original posts.
The presence of negative sentiment in posts suggests users feel comfortable sharing struggles and challenges — essential for genuine peer support.
Comments are typically more positive than original posts, as community members actively provide support to those seeking help or sharing difficulties.
Both Autism and ADHD communities are expected to show broadly similar sentiment patterns, reflecting common themes of support, struggle, and community building across neurodivergent spaces.
- VADER Limitations: May not capture nuanced expressions specific to neurodivergent communication
- Context: Sentiment analysis can't fully understand context, sarcasm, or complex emotions
- Selection Bias: Only analyzes public Reddit posts from specific subreddits
- Tor Reliability: Tor exit nodes may be occasionally slow or temporarily unavailable; the workflow emits a warning rather than failing in those cases
- Implement topic modeling to identify key discussion themes
- Analyze sentiment changes around specific events or awareness campaigns
- Compare sentiment patterns during different times of day/week/year
- Investigate correlation between post engagement (score, comments) and sentiment
- Expand to additional neurodiversity-focused communities
This project is licensed under the MIT License - see the LICENSE file for details.
- VADER Sentiment Analysis tool by C.J. Hutto
- Reddit communities for creating supportive spaces for neurodivergent individuals
Contributions are welcome! Please feel free to submit a Pull Request.
For questions or feedback, please open an issue on GitHub.
This analysis is for research and educational purposes. All data is from public Reddit posts.


