Turn a research brief about a brand or a broad topic into a scored, client-ready consumer research report.
Given a brand or topic, category, geography, and business questions, the pipeline collects evidence from online sources, normalizes it into a shared schema, filters for relevance, analyzes sentiment and themes, scores confidence from data, and generates a versioned DOCX deliverable.
This repo does not bundle commercial data, funded API credentials, or a free turnkey demo run. Running it requires your own API keys and your own data.
There are two ways to evaluate it: read the finished studies, or run it yourself.
To read the finished studies, go to examples/outcomes/. Three complete studies are there:
| Study | Type | Items | Insights | Net sentiment (NSS) |
|---|---|---|---|---|
| Weight Loss Medication in India | Topic | 2,484 | 12 | +16.8% |
| Make in India | Brand | 3,516 | 12 | +24.0% |
| India Hair Colour | Topic | 8,969 | 16 | +22.3% |
Each folder has the research brief, a study README, the final DOCX report, and a manifest of what the pipeline produced.
Open any DOCX to see the full deliverable: executive summary, theme landscape, per-insight deep dives with radar charts, brand health score, and methodology disclosure.
To run it yourself, see What you need below, then Quick Start.
- Accepts a brief: brand, category, geography, competitors, business questions
- Collects from Reddit, YouTube, NewsData.io, OpenAlex, Google Trends, Serper - or ingests any pre-collected JSON
- Normalizes all sources into a common schema with deterministic SHA-256 item IDs (see the sketch after this list)
- Filters irrelevant content using LLM classification (multilingual, handles Hindi/Hinglish)
- Extracts themes inductively from the corpus using a two-pass LLM approach - no predefined keyword lists
- Synthesizes one structured insight per theme (Observation / Insight / Implication / Recommendation)
- Scores each insight with data-driven confidence and signal metrics - no LLM judgment in scoring
- Generates charts and a versioned DOCX report
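Because the IDs are content-derived, the same item collected twice resolves to one record. A minimal sketch of the idea - the exact fields hashed here are illustrative, not necessarily the ones the normalizer uses:

```python
import hashlib
import json

def item_id(source: str, url: str, text: str) -> str:
    # Hash a canonical JSON encoding of the identifying fields so the
    # same item always maps to the same ID across collection runs.
    payload = json.dumps({"source": source, "url": url, "text": text},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Collected twice, stored once: identical inputs give an identical ID.
assert item_id("reddit", "https://reddit.com/r/x/1", "great product") == \
       item_id("reddit", "https://reddit.com/r/x/1", "great product")
```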
Themes emerge from the data. If consumers are discussing something the brief never anticipated - a cultural reference, a misinformation narrative, an untracked quality perception - it surfaces.
Required:
| Requirement | What it's for |
|---|---|
| Python 3.11+ | Runtime |
| Data | Your own pre-collected JSON, or API keys for the built-in collectors |
How analysis runs - two workflows:
The codebase supports two workflows for Stages 3-5 (filter, analyze, synthesize). Both produce the same output.
| Workflow | Where the LLM work happens | Cost | When to use |
|---|---|---|---|
| In-context (Claude Code session) | The Claude Code session itself reads items and writes classifications directly to disk | Included in your Claude Code / Claude Max subscription | Primary workflow. All three sample studies were produced this way. |
| Automated pipeline (consumer-research run) | External API calls via ANTHROPIC_API_KEY through utils/llm_client.py | ~$1-2 per run | Unattended batch runs without a Claude Code session open. |
Stages 0-2 (brief, collect, normalize) and 6-7 (score, report) are code-based and free in both workflows.
Optional collector API keys (all fail gracefully if absent):
| Key | Source | Free tier |
|---|---|---|
| YOUTUBE_API_KEY | YouTube comments | 10K quota/day |
| NEWSDATA_API_KEY | News articles | 200 credits/day |
| SERPER_API_KEY | Web search results | 2,500 queries total |
Reddit, OpenAlex, and Google Trends are free with no key.
Bring your own data: If you have pre-collected data in any JSON format (Twitter exports, Instagram scrapes, vendor feeds), the pipeline ingests it directly via consumer-research ingest. See docs/running_with_your_own_data.md.
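For orientation, here is a hypothetical pre-collected file being written from Python - every field name below is illustrative, since the ingest step maps whatever your source provides into the common schema:

```python
import json
from pathlib import Path

# Hypothetical records in the shape of a typical social-media export.
records = [
    {
        "text": "Tried the new formula, way better than last year's.",
        "author": "user123",
        "created_at": "2024-05-01T10:32:00Z",
        "url": "https://example.com/post/1",
        "likes": 41,
    },
]

Path("data").mkdir(exist_ok=True)
Path("data/my_collected_data.json").write_text(
    json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
```

The resulting file is what the --data flag in Option 1 below points at.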
Quick Start:

```
git clone https://github.com/joleneann/sentiment-research-report.git
cd sentiment-research-report
pip install -r requirements.txt
```

Option 1: Bring your own data (no API keys needed)

```
pip install -e .

# Ingest your JSON into a run directory (Stages 0-2)
consumer-research ingest \
  --data data/my_collected_data.json \
  --brand "Brand Name" \
  --category "Product Category"

# Stages 3-5 run in your Claude Code session (in-context analysis)

# Stages 6-7: score and generate the DOCX report
consumer-research score-report [run_id]
```

Option 2: Run the full automated pipeline (requires ANTHROPIC_API_KEY)

```
consumer-research run \
  --brand "Brand Name" \
  --category "Product Category" \
  --geo IN \
  --objectives "What do consumers think about quality?"
```

Regenerate a report from an existing run:

```
consumer-research regenerate [run_id]
```

Eight stages, each writing artifacts to runs/<run_id>/. If the pipeline fails, resume from the last completed stage.
- Brief (Stage 0) - structure the research question
- Collect (Stage 1) - pull from 6+ sources at max limits (or ingest your own JSON)
- Normalize (Stage 2) - deduplicate, engagement filter, common schema
- Filter (Stage 3) - LLM relevance classification (multilingual)
- Analyze (Stage 4) - sentiment, Plutchik emotion, ABSA, two-pass theme extraction, narrative review
- Synthesize (Stage 5) - one structured insight per theme (via a dedicated skill with a fresh context window)
- Score (Stage 6) - confidence (5 factors) + signal strength (4 factors) + brand health
- Report (Stage 7) - charts + versioned DOCX
Every insight receives two independent scores, both computed entirely from data:
Confidence (how sure we are this is real; a code sketch follows this list):
- Sample size (log-scaled relative to corpus)
- Source diversity (Herfindahl index with balance penalty)
- Temporal consistency (evenness across time quartiles)
- Internal agreement (sentiment consensus)
- Data recency (exponential decay)
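A minimal sketch of how these five factors can combine - the equal-weight mean, the half-life constant, the simplified diversity term (no explicit balance penalty), and the item field names are all assumptions, not the shipped scorer:

```python
import math
from collections import Counter

def confidence(theme_items: list[dict], corpus_size: int,
               half_life_days: float = 90.0) -> float:
    """Combine the five confidence factors into a 0-1 score (equal weights assumed)."""
    n = len(theme_items)

    # 1. Sample size, log-scaled relative to the corpus.
    sample = math.log1p(n) / math.log1p(corpus_size)

    # 2. Source diversity: 1 - Herfindahl index of source shares
    #    (0 for a single-source theme, high for an even spread).
    shares = [c / n for c in Counter(i["source"] for i in theme_items).values()]
    diversity = 1.0 - sum(s * s for s in shares)

    # 3. Temporal consistency: evenness across the four time quartiles
    #    (1 for a perfectly even spread, 0 if any quartile is empty).
    quartiles = Counter(i["time_quartile"] for i in theme_items)
    evenness = min(quartiles.get(q, 0) for q in range(4)) * 4 / n

    # 4. Internal agreement: share of items carrying the majority sentiment.
    agreement = Counter(i["sentiment"] for i in theme_items).most_common(1)[0][1] / n

    # 5. Recency: exponential decay on mean item age in days.
    mean_age = sum(i["age_days"] for i in theme_items) / n
    recency = math.exp(-math.log(2) * mean_age / half_life_days)

    return (sample + diversity + evenness + agreement + recency) / 5
```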
Signal Strength (how loud this is in the data; sketched after this list):
- Prevalence (% of corpus)
- Engagement level (percentile-ranked across insights)
- Sentiment intensity
- Conversation depth
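And the four signal factors, under the same caveats (the equal weights and per-theme field names are assumptions):

```python
def signal_strength(theme: dict, corpus_size: int,
                    all_theme_engagements: list[float]) -> float:
    """Combine the four signal factors into a 0-1 score (equal weights assumed)."""
    # Prevalence: share of the whole corpus carrying this theme.
    prevalence = theme["n_items"] / corpus_size

    # Engagement: percentile rank of this theme among all insights.
    below = sum(e < theme["engagement"] for e in all_theme_engagements)
    engagement = below / max(len(all_theme_engagements) - 1, 1)

    # Sentiment intensity: mean absolute polarity, assumed already in [0, 1].
    intensity = theme["mean_abs_sentiment"]

    # Conversation depth: replies per item, squashed into [0, 1].
    depth = theme["replies_per_item"] / (1 + theme["replies_per_item"])

    return (prevalence + engagement + intensity + depth) / 4
```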
Insights are ranked by confidence (primary) and signal strength (tiebreaker); both scores are shown as percentages in the report.
Brand Health Score (0-100): Sentiment (30%) + Engagement (25%) + Advocacy (20%) + Resilience (15%) + Conversation (10%).
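The weights are the ones stated above; the assumption in this sketch is that each component arrives already normalized to a 0-100 scale:

```python
# Component weights as disclosed in the methodology line above.
WEIGHTS = {"sentiment": 0.30, "engagement": 0.25, "advocacy": 0.20,
           "resilience": 0.15, "conversation": 0.10}

def brand_health(components: dict[str, float]) -> float:
    """Weighted sum of the five component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

print(brand_health({"sentiment": 60, "engagement": 50, "advocacy": 50,
                    "resilience": 60, "conversation": 50}))  # 54.5
```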
There are three layers. Each has a different purpose.
Product interface - the packaged CLI:
```
consumer-research ingest        Ingest external JSON data (Stages 0-2)
consumer-research run           Full automated pipeline (Stages 0-7, requires API key)
consumer-research score-report  Re-score + regenerate report (no API)
consumer-research regenerate    Regenerate DOCX only (no API)
```
This is the canonical way to use the system. Installed via pip install -e ..
Analysis skills - Claude Code skills for LLM-intensive stages:
```
skills/
  narrative-review.md  Stage 4c: discover narrative patterns keywords miss
  synthesize.md        Stage 5: generate decision-grade insights per theme
```
These run as sub-agents with fresh context windows during in-context analysis. Each takes a run_id and reads config dynamically - works for any study. The synthesis skill produces measurably better insights than in-context synthesis (validated in a controlled experiment comparing output quality across 12 themes).
Operator tools - recovery and maintenance scripts:
```
scripts/resume_stage3.py         Resume pipeline from Stage 3
scripts/resume_stage4.py         Resume pipeline from Stage 4
scripts/rescore.py               Re-score insights from a specific run
scripts/resynthesize.py          Re-run synthesis + scoring
scripts/fix_quotes.py            Fix representative quotes per theme
scripts/add_narrative_themes.py  Add narrative themes missed by keyword pass
scripts/export_to_excel.py       Export run data to Excel
```
These are for operators recovering from failures or re-running specific stages on existing data. They accept run IDs as arguments and load config from the run's own config.json.
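The shared pattern, sketched - the run layout and config filename are as described above; everything else is illustrative:

```python
import json
import sys
from pathlib import Path

# Operator scripts take a run ID on the command line and load that run's
# own config.json, so they never depend on global settings.
run_id = sys.argv[1]
run_dir = Path("runs") / run_id
config = json.loads((run_dir / "config.json").read_text(encoding="utf-8"))

print(f"Re-processing {config.get('brand', run_id)} from {run_dir}")
```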
Example study workflows - demonstrations of completed research:
```
examples/outcomes/             Three complete studies with DOCX reports
examples/briefs/               Research brief JSONs
examples/studies/weight_loss/  Study-specific analysis scripts
```
The study scripts (run_weight_loss.py, stage4_analysis.py, stage5_synthesis.py) are intentionally custom - they show how a specific study was conducted, not how to build a generic pipeline. Evaluate the product through the CLI and the outcomes, not through the study scripts.
Source code:
```
consumer_research/  Core package: collectors, pipeline, report generation
tests/              Unit + smoke + integration tests (126 tests, no API calls)
docs/               Methodology, product docs, running guide
```
- CLAUDE.md - Architecture, procedures, failure modes (developer reference)
- docs/methodology.docx - Research methodology (client-facing)
- docs/product_documentation.docx - Product guide
- docs/running_with_your_own_data.md - Bring-your-own-data guide
Social listening methodology has structural limitations that cannot be fully eliminated:
| # | Limitation | Severity | Fixable? | Status |
|---|---|---|---|---|
| 1 | Query framing bias - You find what you search for. No adversarial queries. | CRITICAL | Partially | Not implemented |
| 2 | Platform demographic bias - No weighting for platform skew. | MEDIUM | Partially | Not implemented |
| 3 | No sampling frame - Prevalence is within-corpus only, never projectable. | HIGH | No | Inherent limitation |
| 4 | Engagement filter excludes silent majority - Hard thresholds removed. Collectors still fetch visibility-ranked content from platforms. | MEDIUM | Partially | MITIGATED |
| 5 | No bot/astroturf detection - Zero coordinated campaign detection. | MEDIUM | Yes | Not implemented |
| 6 | Sarcasm/irony misclassification - Keyword mode reads sarcasm as literal. | MEDIUM | Partially | Not implemented |
| 7 | Influencer vs authentic voice conflated - No sponsored content detection. | MEDIUM | Yes | Not implemented |
| 8 | Near-duplicate inflation - Paraphrases not caught by SHA-256 dedup. | LOW | Yes | Not implemented |
| 9 | Language classification accuracy unvalidated - LLM processes any language but accuracy only validated on English/Hindi. | LOW | Yes | Not implemented |
| 10 | No temporal weighting in theme extraction - Old viral threads count equally. | LOW | Yes | Not implemented |
| 11 | Cross-theme interactions not surfaced - Multi-coded items split, never analyzed jointly. | MEDIUM | Yes | Not implemented |
| 12 | No reliable demographic data - No verified age, gender, or location from any platform. | CRITICAL | No | Inherent limitation |
| 13 | LLM non-determinism in theme extraction - Same corpus can produce slightly different themes across runs due to model inference variance. | MEDIUM | Partially | MITIGATED |
| 14 | Geographic relevance not verified - No verified geolocation; 2-3% of corpus from non-target markets. | LOW | Partially | Not implemented |
Full details with "How to Fix" column in CLAUDE.md.
License: MIT