
Brand or Topic Sentiment Report Generator

Turn a research brief about a brand or a broad topic into a scored, client-ready consumer research report.

Given a brand or topic, category, geography, and business questions, the pipeline collects evidence from online sources, normalizes it into a shared schema, filters for relevance, analyzes sentiment and themes, scores confidence from data, and generates a versioned DOCX deliverable.

Evaluation

This repo does not bundle commercial data, funded API credentials, or a free turnkey demo run. Running it requires your own API keys and your own data.

There are two ways to evaluate it:

Path A: Inspect the outcomes (no setup needed)

Go to examples/outcomes/, which contains three complete studies:

| Study | Type | Items | Insights | NSS |
| --- | --- | --- | --- | --- |
| Weight Loss Medication in India | Topic | 2,484 | 12 | +16.8% |
| Make in India | Brand | 3,516 | 12 | +24.0% |
| India Hair Colour | Topic | 8,969 | 16 | +22.3% |

Each folder has the research brief, a study README, the final DOCX report, and a manifest of what the pipeline produced.

Open any DOCX to see the full deliverable: executive summary, theme landscape, per-insight deep dives with radar charts, brand health score, and methodology disclosure.

Path B: Run it with your own data and credentials

See What you need below, then Quick Start.


What This Does

  • Accepts a brief: brand, category, geography, competitors, business questions
  • Collects from Reddit, YouTube, NewsData.io, OpenAlex, Google Trends, Serper - or ingests any pre-collected JSON
  • Normalizes all sources into a common schema with deterministic SHA-256 item IDs (see the sketch after this list)
  • Filters irrelevant content using LLM classification (multilingual, handles Hindi/Hinglish)
  • Extracts themes inductively from the corpus using a two-pass LLM approach - no predefined keyword lists
  • Synthesizes one structured insight per theme (Observation / Insight / Implication / Recommendation)
  • Scores each insight with data-driven confidence and signal metrics - no LLM judgment in scoring
  • Generates charts and a versioned DOCX report
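For example, the deterministic item ID from the normalize step can be derived by hashing stable fields of the normalized record, so re-collection never produces duplicate IDs. A minimal sketch, assuming source, platform-native ID, and text identify an item (the repo's exact field choice may differ):

```python
import hashlib
import json

def item_id(record: dict) -> str:
    """Derive a stable SHA-256 item ID from identifying fields.

    Assumption: source, platform-native ID, and text uniquely identify
    an item; the repo's actual field choice may differ.
    """
    key = json.dumps(
        {"source": record["source"], "native_id": record["native_id"],
         "text": record["text"]},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

print(item_id({"source": "reddit", "native_id": "t3_abc123", "text": "example"}))
```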

Themes emerge from the data. If consumers are discussing something the brief never anticipated - a cultural reference, a misinformation narrative, an untracked quality perception - it surfaces.


What You Need to Run This

Required:

| Requirement | What it's for |
| --- | --- |
| Python 3.11+ | Runtime |
| Data | Your own pre-collected JSON, or API keys for the built-in collectors |

How analysis runs - two workflows:

The codebase supports two workflows for Stages 3-5 (filter, analyze, synthesize). Both produce the same output.

| Workflow | Who does the LLM work | Cost | When to use |
| --- | --- | --- | --- |
| In-context (Claude Code session) | The Claude Code session itself reads items and writes classifications directly to disk | Included in your Claude Code / Claude Max subscription | Primary workflow. All three sample studies were produced this way. |
| Automated pipeline (consumer-research run) | External API calls via ANTHROPIC_API_KEY through utils/llm_client.py | ~$1-2 per run | Unattended batch runs without a Claude Code session open. |

Stages 0-2 (brief, collect, normalize) and 6-7 (score, report) are code-based and free in both workflows.
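For reference, the automated workflow's external calls reduce to standard Anthropic Messages API requests. A minimal sketch using the public anthropic SDK (the prompt and model choice are assumptions; the repo's utils/llm_client.py may wrap this differently with retries and batching):

```python
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def classify_relevance(item_text: str, brand: str) -> str:
    """Hypothetical relevance check; the repo's real prompts differ."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any current model works
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": f"Is this text about {brand}? Answer yes or no.\n\n{item_text}",
        }],
    )
    return response.content[0].text.strip().lower()
```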

Optional collector API keys (all fail gracefully if absent):

| Key | Source | Free tier |
| --- | --- | --- |
| YOUTUBE_API_KEY | YouTube comments | 10K quota units/day |
| NEWSDATA_API_KEY | News articles | 200 credits/day |
| SERPER_API_KEY | Web search results | 2,500 queries total |

Reddit, OpenAlex, and Google Trends are free with no key.

Bring your own data: If you have pre-collected data in any JSON format (Twitter exports, Instagram scrapes, vendor feeds), the pipeline ingests it directly via consumer-research ingest. See docs/running_with_your_own_data.md.


Quick Start

```bash
git clone https://github.com/joleneann/sentiment-research-report.git
cd sentiment-research-report
pip install -r requirements.txt
```

Option 1: Bring your own data (no API keys needed)

```bash
pip install -e .

# Ingest your JSON into a run directory (Stages 0-2)
consumer-research ingest \
  --data data/my_collected_data.json \
  --brand "Brand Name" \
  --category "Product Category"

# Stages 3-5 run in your Claude Code session (in-context analysis)
# Stages 6-7: score and generate the DOCX report
consumer-research score-report [run_id]
```

Option 2: Run the full automated pipeline (requires ANTHROPIC_API_KEY)

```bash
consumer-research run \
  --brand "Brand Name" \
  --category "Product Category" \
  --geo IN \
  --objectives "What do consumers think about quality?"
```

Regenerate a report from an existing run

```bash
consumer-research regenerate [run_id]
```

Pipeline

Eight stages, each writing artifacts to runs/<run_id>/. If the pipeline fails, you can resume from the last completed stage (see the sketch after this list).

  1. Brief - structure the research question
  2. Collect - pull from 6+ sources at max limits (or ingest your own JSON)
  3. Normalize - deduplicate, engagement filter, common schema
  4. Filter - LLM relevance classification (multilingual)
  5. Analyze - sentiment, Plutchik emotion, ABSA, two-pass theme extraction, narrative review
  6. Synthesize - one structured insight per theme (via dedicated skill with fresh context window)
  7. Score - confidence (5 factors) + signal strength (4 factors) + brand health
  8. Report - charts + versioned DOCX
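A resume check can be as simple as testing which stage artifacts already exist in the run directory. A minimal sketch, assuming one marker artifact per stage (the file names here are hypothetical, not the repo's actual artifact names):

```python
from pathlib import Path

# Hypothetical one-marker-artifact-per-stage mapping; real names may differ.
STAGE_ARTIFACTS = [
    "brief.json",       # 1. Brief
    "collected.json",   # 2. Collect
    "normalized.json",  # 3. Normalize
    "filtered.json",    # 4. Filter
    "analysis.json",    # 5. Analyze
    "insights.json",    # 6. Synthesize
    "scores.json",      # 7. Score
    "report.docx",      # 8. Report
]

def first_incomplete_stage(run_id: str, runs_dir: str = "runs") -> int:
    """Return the 1-based number of the first stage whose artifact is missing."""
    run_dir = Path(runs_dir) / run_id
    for i, artifact in enumerate(STAGE_ARTIFACTS, start=1):
        if not (run_dir / artifact).exists():
            return i
    return len(STAGE_ARTIFACTS) + 1  # everything is done
```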

Scoring

Every insight receives two independent scores, both computed entirely from data:

Confidence (how sure we are this is real):

  • Sample size (log-scaled relative to corpus)
  • Source diversity (Herfindahl index with balance penalty; see the sketch after this list)
  • Temporal consistency (evenness across time quartiles)
  • Internal agreement (sentiment consensus)
  • Data recency (exponential decay)
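To make the source-diversity factor concrete, here is a minimal Herfindahl-based sketch (the balance penalty and final scaling are omitted; the repo's implementation may differ):

```python
from collections import Counter

def source_diversity(sources: list[str]) -> float:
    """0 when one source dominates, approaching 1 as sources even out.

    Herfindahl index H = sum of squared source shares: 1.0 for a single
    source, 1/n for n perfectly balanced sources.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    shares = (c / total for c in counts.values())
    herfindahl = sum(s * s for s in shares)
    return 1.0 - herfindahl

print(source_diversity(["reddit"] * 80 + ["youtube"] * 10 + ["news"] * 10))  # ~0.34
```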

Signal Strength (how loud this is in the data):

  • Prevalence (% of corpus)
  • Engagement level (percentile-ranked across insights)
  • Sentiment intensity
  • Conversation depth

Insights are ranked by confidence (primary) and signal strength (tiebreaker). Both scores are shown as percentages in the report.
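In code, that ordering reduces to a single composite sort key, sketched here with hypothetical insight records:

```python
insights = [
    {"theme": "price sensitivity", "confidence": 0.81, "signal_strength": 0.40},
    {"theme": "ingredient safety", "confidence": 0.81, "signal_strength": 0.62},
    {"theme": "availability", "confidence": 0.74, "signal_strength": 0.90},
]

# Confidence is the primary key; signal strength only breaks ties.
ranked = sorted(
    insights,
    key=lambda i: (i["confidence"], i["signal_strength"]),
    reverse=True,
)
# -> ingredient safety, price sensitivity, availability
```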

Brand Health Score (0-100): Sentiment (30%) + Engagement (25%) + Advocacy (20%) + Resilience (15%) + Conversation (10%).
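The composite is then a plain weighted sum of the five components. A sketch with hypothetical component scores, each assumed to be pre-scaled to 0-100:

```python
WEIGHTS = {
    "sentiment": 0.30,
    "engagement": 0.25,
    "advocacy": 0.20,
    "resilience": 0.15,
    "conversation": 0.10,
}

def brand_health(components: dict[str, float]) -> float:
    """Weighted sum; each component is assumed pre-scaled to 0-100."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

# Hypothetical component scores:
print(brand_health({
    "sentiment": 62, "engagement": 55, "advocacy": 48,
    "resilience": 70, "conversation": 40,
}))  # 56.45
```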


Project Structure

The repository is organized into several layers, each with a different purpose.

Product interface - the packaged CLI:

```
consumer-research ingest        Ingest external JSON data (Stages 0-2)
consumer-research run           Full automated pipeline (Stages 0-7, requires API key)
consumer-research score-report  Re-score + regenerate report (no API)
consumer-research regenerate    Regenerate DOCX only (no API)
```

This is the canonical way to use the system; install it with pip install -e .

Analysis skills - Claude Code skills for LLM-intensive stages:

```
skills/
  narrative-review.md             Stage 4c: discover narrative patterns keywords miss
  synthesize.md                   Stage 5: generate decision-grade insights per theme
```

These run as sub-agents with fresh context windows during in-context analysis. Each takes a run_id and reads config dynamically - works for any study. The synthesis skill produces measurably better insights than in-context synthesis (validated in a controlled experiment comparing output quality across 12 themes).

Operator tools - recovery and maintenance scripts:

```
scripts/resume_stage3.py        Resume pipeline from Stage 3
scripts/resume_stage4.py        Resume pipeline from Stage 4
scripts/rescore.py              Re-score insights from a specific run
scripts/resynthesize.py         Re-run synthesis + scoring
scripts/fix_quotes.py           Fix representative quotes per theme
scripts/add_narrative_themes.py Add narrative themes missed by keyword pass
scripts/export_to_excel.py      Export run data to Excel
```

These are for operators recovering from failures or re-running specific stages on existing data. They accept run IDs as arguments and load config from the run's own config.json.
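The shared setup pattern behind these scripts looks roughly like this (a sketch; the config field names are assumptions):

```python
import json
import sys
from pathlib import Path

def load_run_config(run_id: str, runs_dir: str = "runs") -> dict:
    """Read the run's own config.json so re-runs keep the original settings."""
    config_path = Path(runs_dir) / run_id / "config.json"
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)

if __name__ == "__main__":
    config = load_run_config(sys.argv[1])
    print(config.get("brand"), config.get("category"))  # field names are assumptions
```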

Example study workflows - demonstrations of completed research:

```
examples/outcomes/              Three complete studies with DOCX reports
examples/briefs/                Research brief JSONs
examples/studies/weight_loss/   Study-specific analysis scripts
```

The study scripts (run_weight_loss.py, stage4_analysis.py, stage5_synthesis.py) are intentionally custom - they show how a specific study was conducted, not how to build a generic pipeline. Evaluate the product through the CLI and the outcomes, not through the study scripts.

Source code:

```
consumer_research/              Core package: collectors, pipeline, report generation
tests/                          Unit + smoke + integration tests (126 tests, no API calls)
docs/                           Methodology, product docs, running guide
```


Known Limitations

Social listening methodology has structural limitations that cannot be fully eliminated:

| # | Limitation | Severity | Fixable? | Status |
| --- | --- | --- | --- | --- |
| 1 | Query framing bias: you find what you search for; no adversarial queries | CRITICAL | Partially | Not implemented |
| 2 | Platform demographic bias: no weighting for platform skew | MEDIUM | Partially | Not implemented |
| 3 | No sampling frame: prevalence is within-corpus only, never projectable | HIGH | No | Inherent limitation |
| 4 | Engagement filter excludes silent majority: hard thresholds removed, but collectors still fetch visibility-ranked content from platforms | MEDIUM | Partially | Mitigated |
| 5 | No bot/astroturf detection: zero coordinated campaign detection | MEDIUM | Yes | Not implemented |
| 6 | Sarcasm/irony misclassification: keyword mode reads sarcasm as literal | MEDIUM | Partially | Not implemented |
| 7 | Influencer vs authentic voice conflated: no sponsored content detection | MEDIUM | Yes | Not implemented |
| 8 | Near-duplicate inflation: paraphrases not caught by SHA-256 dedup | LOW | Yes | Not implemented |
| 9 | Language classification accuracy unvalidated: LLM processes any language but accuracy only validated on English/Hindi | LOW | Yes | Not implemented |
| 10 | No temporal weighting in theme extraction: old viral threads count equally | LOW | Yes | Not implemented |
| 11 | Cross-theme interactions not surfaced: multi-coded items split, never analysed jointly | MEDIUM | Yes | Not implemented |
| 12 | No reliable demographic data: no verified age, gender, or location from any platform | CRITICAL | No | Inherent limitation |
| 13 | LLM non-determinism in theme extraction: same corpus can produce slightly different themes across runs due to model inference variance | MEDIUM | Partially | Mitigated |
| 14 | Geographic relevance not verified: no verified geolocation; 2-3% of corpus from non-target markets | LOW | Partially | Not implemented |

Full details, including a "How to Fix" column, are in CLAUDE.md.


License

MIT
