ToneWeave: GenZ Persona Simulation

What if you could ask: “How would different types of Gen‑Z react to this?”
This prototype lets you try that in seconds. Type a message, pick a persona, and get two perspectives:

Public Reaction — how they’d likely respond out loud
Internal Thought — what they might actually feel inside

It’s simple, fast, and fun—built from real blog posts, emotion & sentiment models, light clustering, and a tiny retrieval + LLM layer.

✨ TL;DR — How it feels to use

Open the app, choose a persona (e.g., Love, Surprise, Fear, Neutral vibes).
Paste any statement (news headline, tweet, your scenario).
Click Find Reaction.
See the persona’s Public vs Private reactions side‑by‑side...grounded by similar past posts we found.

We don’t send matched text to the LLM...only attributes (emotions/sentiment/masking) once similarity is high—so the reaction stays persona‑aligned without copying anyone’s text.

🎨 The Frontend (single, friendly dashboard)

We keep the UI minimal so anyone can play with the idea quickly.

Live code (public):
https://github.com/tarashagarwal/genz-persona-simulation/blob/main/src/app/auth/(dashboard)/dashboard/page.tsx

What’s on the page

A persona picker (4 clusters discovered from data)
A text box for your input
A Find Reaction button to call the backend agent
Two result cards:
- Public Reaction (emotions + sentiment + masking applied)
- Private Reaction (Internal Thought) (emotions only)
Optional metadata: similarity score, top emotions, and whether masking seems likely

The dashboard is intentionally one page..no maze of routes. Paste text, pick persona, get reactions.

🧠 How it works (descriptive version)

We start with the Blog Authorship Corpus (Blogger.com): age, job, zodiac, post date, and text.
From date + age, we infer if the author would be Gen‑Z at the time of posting.
We enrich each post with emotions via GoEmotions (RoBERTa) and sentiment via a Reddit‑trained XLNet model.
If emotion and sentiment disagree, we flag it as masking (a realistic “I look fine, I feel awful” pattern).
We then discover 4 personas by clustering primarily on emotional tendencies.
Finally, we build four FAISS indexes...one per persona. When you type something, we:
1. search the chosen persona’s index,
2. if similarity is high,
3. pull its attributes (emotions/sentiment/masking), and
4. ask an LLM to write both the Public and Internal reactions in that style.

That’s it...no massive training jobs, just smart reuse of signals already in the data.

🧪 Gen‑Z Persona Simulation ... Design Notes

Data Sources & Rationale

We initially aimed for PANDORA Talks (demographics + personality): https://arxiv.org/pdf/2004.04460
Access required approval; we’ve requested it and may update later.
We used the Blog Authorship Corpus instead: https://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
Helpful fields: post date, age (to infer Gen‑Z), job, and zodiac.
To enrich the text we used:
- GoEmotions (RoBERTa‑base): https://huggingface.co/SamLowe/roberta-base-go_emotions
- XLNet Reddit Sentiment: https://huggingface.co/minh21/XLNet-Reddit-Sentiment-Analysis

This gave us a workable dataset for clustering and persona design.

Modeling Choice & Justification

We didn’t train a new model from scratch (limited time & data). Instead, we:

Clustered posts into personas using emotion‑forward features,
Embedded comments and stored them per‑persona in FAISS,
Matched new inputs by vector similarity (with a cutoff),
Generated persona‑aligned reactions using an LLM (e.g., gpt‑4.1‑mini) with only the attributes.

Why this route?

Fine‑tuning wasn’t feasible quickly.
Vector similarity worked well in early trials.
It keeps infra light while still feeling persona‑aware.

Persona Design Assumptions

Personas largely reflect emotional tendencies (Love, Surprise, Fear, Neutral).
Zodiac/job/writing style didn’t produce strong clusters here.
Emotion vs Sentiment can diverge; we treat that as masking.

Ethical & Bias Considerations

LLM tone may soften strong language...true Gen‑Z speech can be spicier.
Zero‑shot models may miss slang/cultural nuance...fine‑tuning would help.
Sampling bias: Blogger.com ≠ a perfect Gen‑Z mirror.

Scaling Path to a Full Product

Data & Training

Persona sizes: 3,395 • 17,429 • 58 (sparse) • 75,757
Needs: bigger, fresher Gen‑Z text; balanced clusters; regular re‑indexing.

Infrastructure

Cloud GPUs for faster indexing/refresh.
Automated pipelines for fetch → enrich → cluster → index.
Cost per request tracking; start serverless, scale to GPU nodes if needed.

UX & Testing

Define a test set + confidence threshold (≥85%).
A/B test whether reactions feel relevant, authentic, and engaging.

Summary Vector search + LLMs can simulate persona reactions today with modest resources.
To ship it as a product, add: larger balanced datasets, Gen‑Z‑aware fine‑tuning, solid retraining pipelines, and ethics & trust checks.

📊 Snapshot of the Data & Personas

Rows processed: 96,639
Discovered personas: 4
Vibe summary (informal):
- P0 — Love‑forward: affectionate, upbeat, warm
- P1 — Amusement/Gratitude: playful, appreciative
- P2 — Fear/Nervous: anxious, sensitive (very small cluster)
- P3 — Neutral/Info‑sharing: mostly matter‑of‑fact posts

Full technical breakdown (silhouette, feature importances, per‑persona stats) lives in the generated artifacts: persona_cards.json, persona_assignments.csv, persona_pca_scatter.png.

🛠️ Setup (3 minutes)

Prerequisites

Node.js v20.19.1 (LTS) + npm
yarn v1.22.22
Python 3.10.9 + pip 25.1.1
OpenAI API key in genz-persona-simulation/agentic_logic/.env
```
OPENAI_API_KEY=sk-...
```

1) Start the Frontend (Next.js)

yarn
yarn dev
# http://localhost:3000

2) Build Indexes

cd agentic_logic
python build_index.py

3) Start the Backend (Flask + Agent Logic)

cd agentic_logic
pip install -r requirements.txt
python -m agent.HRAgent
# http://localhost:5000

4) Log in (dev)

Any valid email/password format works. Example:

Email:    [email protected]
Password: test12345678

Components & Ports

Component	Command	Port
Next.js Frontend	`yarn dev`	3000
Flask HR Agent	`python -m agent.HRAgent`	5000

🔗 Key Links

Blog Authorship Corpus — https://u.cs.biu.ac.il/~koppel/BlogCorpus.htm
GoEmotions blog — https://research.google/blog/goemotions-a-dataset-for-fine-grained-emotion-classification/
GoEmotions model (HF) — https://huggingface.co/SamLowe/roberta-base-go_emotions
Reddit Sentiment model (HF) — https://huggingface.co/minh21/XLNet-Reddit-Sentiment-Analysis
Processed dataset (HF) — https://huggingface.co/datasets/tarashagarwal/genz-persona-simulation
Frontend page (public) — https://github.com/tarashagarwal/genz-persona-simulation/blob/main/src/app/auth/(dashboard)/dashboard/page.tsx

🗂️ Repo Structure (high‑level)

genz-persona-simulation/
├─ agentic_logic/
│  ├─ agent/
│  │  └─ HRAgent.py            # Flask app entry (python -m agent.HRAgent)
│  └─ .env                     # OPENAI_API_KEY=...
├─ data_processing_code/
│  ├─ BuildBlogsData.py
│  ├─ BuildBlogDataWithSentiments.py
│  └─ get_personas.py
├─ persona_assignments.csv      # (generated)
├─ persona_cards.json           # (generated)
├─ persona_pca_scatter.png      # (generated)
└─ README.md

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
agentic_logic		agentic_logic
data_processing_code		data_processing_code
hr		hr
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
components.json		components.json
next-env.d.ts		next-env.d.ts
next.config.mjs		next.config.mjs
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
requirements.txt		requirements.txt
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json
yarn.lock		yarn.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ToneWeave: GenZ Persona Simulation

✨ TL;DR — How it feels to use

🎨 The Frontend (single, friendly dashboard)

🧠 How it works (descriptive version)

🧪 Gen‑Z Persona Simulation ... Design Notes

Data Sources & Rationale

Modeling Choice & Justification

Persona Design Assumptions

Ethical & Bias Considerations

Scaling Path to a Full Product

📊 Snapshot of the Data & Personas

🛠️ Setup (3 minutes)

Prerequisites

1) Start the Frontend (Next.js)

2) Build Indexes

3) Start the Backend (Flask + Agent Logic)

4) Log in (dev)

Components & Ports

🔗 Key Links

🗂️ Repo Structure (high‑level)

About

Uh oh!

Releases

Packages

Languages

License

tarashagarwal/genz-persona-simulation

Folders and files

Latest commit

History

Repository files navigation

ToneWeave: GenZ Persona Simulation

✨ TL;DR — How it feels to use

🎨 The Frontend (single, friendly dashboard)

🧠 How it works (descriptive version)

🧪 Gen‑Z Persona Simulation ... Design Notes

Data Sources & Rationale

Modeling Choice & Justification

Persona Design Assumptions

Ethical & Bias Considerations

Scaling Path to a Full Product

📊 Snapshot of the Data & Personas

🛠️ Setup (3 minutes)

Prerequisites

1) Start the Frontend (Next.js)

2) Build Indexes

3) Start the Backend (Flask + Agent Logic)

4) Log in (dev)

Components & Ports

🔗 Key Links

🗂️ Repo Structure (high‑level)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages