
hernandez-jc/TellMeAbout_Yourself_DigitalTech


TellMeAbout_Yourself_DigitalTech.csv

Natural Language Processing (NLP) Perspective

Public Dataset Demo: "Tell Me About Yourself" in Digital Tech Hiring

Author's Technical Showcase — Computational Linguistics & Semantic Analysis

This synthetic dataset demonstrates advanced NLP practices applied to hiring language. It highlights how word choice and framing create clear, separable semantic clusters that influence machine learning models.

Dataset Summary

  • File: TellMeAbout_Yourself_DigitalTech.csv
  • Content: Hundreds of "Tell me about yourself" responses from tech roles
  • Features: 17 binary linguistic indicators + fit_score + reasoning columns
  • Roles Covered: Software Engineering, Cybersecurity, Data Science, QA, Project Management, etc.
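
A minimal sketch of how the file might be loaded and inspected with Python's standard library. The column names `role`, `response`, `fit_score`, and `reasoning` are assumptions for illustration, and the two sample rows are made up for the example rather than taken from the dataset; the real header may differ.

```python
import csv
import io

# Hypothetical sample standing in for TellMeAbout_Yourself_DigitalTech.csv.
# Column names and rows here are assumed, not copied from the real file.
SAMPLE_CSV = """role,response,fit_score,reasoning
Software Engineering,"I build inclusive, community-driven tools.",1,inclusive framing
QA,"I believe in merit-based teams with a clear hierarchy.",0,merit framing
"""

def load_rows(handle):
    """Read CSV rows as dicts and convert fit_score to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["fit_score"] = int(row["fit_score"])
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())
print(columns)
print(len(rows), "rows loaded")
```

To read the real file, pass an open handle instead of the in-memory sample, for example `open("TellMeAbout_Yourself_DigitalTech.csv", newline="", encoding="utf-8")`, and adjust the column names to match its header.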

🎯 Linguistic Patterns Discovered

| Emoji | Category | Trigger Examples | fit_score | Insight |
|---|---|---|---|---|
| 🌍 | DEI/Inclusive Framing | inclusive, underserved, equity, psychological safety, community-driven | 1 | Strong positive signal in many HR-trained models |
| ⚖️ | Merit/Traditional Framing | merit-based, same rules, clear hierarchy, no special treatment | 0 | Often interpreted as lower collaboration |
| 🙏 | Moral/Religious Framing | God-given order, traditional family morality | 0 | Can activate ideology-sensitive filters |
| 🔍 | Neutral Technical | Pure skills focus (rare) | Variable | Highest job-relevant signal quality |
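
The binary linguistic indicators could be derived with simple phrase matching. The sketch below uses the trigger examples from the table; the category keys (`dei_inclusive` and so on) are hypothetical names, and the dataset's actual 17 indicators may have been built differently.

```python
# Trigger phrases copied from the table; the category keys are
# hypothetical names, not the dataset's real column names.
TRIGGERS = {
    "dei_inclusive": ["inclusive", "underserved", "equity",
                      "psychological safety", "community-driven"],
    "merit_traditional": ["merit-based", "same rules", "clear hierarchy",
                          "no special treatment"],
    "moral_religious": ["god-given order", "traditional family morality"],
}

def framing_indicators(response):
    """Return a 0/1 flag per category using case-insensitive substring matches."""
    text = response.lower()
    return {
        category: int(any(phrase in text for phrase in phrases))
        for category, phrases in TRIGGERS.items()
    }

example = "I value psychological safety and building tools for underserved users."
print(framing_indicators(example))
# {'dei_inclusive': 1, 'merit_traditional': 0, 'moral_religious': 0}
```

Substring matching is deliberately crude: "equity" would also fire on "private equity", so a real pipeline would likely need tokenization or context checks.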

Key Technical Takeaways

  • Perfect separation with zero mixed rows → ideal for supervised classification
  • Role-neutral patterns across all digital tech positions
  • Balanced dataset (~50/50) suitable for training robust models
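
The separation and balance claims above can be checked mechanically. This is a sketch under stated assumptions: the rows are made up, and `dei` and `merit` are hypothetical indicator names standing in for the dataset's real columns.

```python
from collections import Counter

# Made-up rows for illustration; "dei" and "merit" are hypothetical indicator names.
rows = [
    {"dei": 1, "merit": 0, "fit_score": 1},
    {"dei": 0, "merit": 1, "fit_score": 0},
    {"dei": 1, "merit": 0, "fit_score": 1},
    {"dei": 0, "merit": 1, "fit_score": 0},
]

def mixed_rows(rows, group_a, group_b):
    """Rows where indicators from both framing groups fire at once."""
    return [r for r in rows
            if any(r[k] for k in group_a) and any(r[k] for k in group_b)]

def is_perfectly_separated(rows, feature, label="fit_score"):
    """True when each value of `feature` maps to exactly one label value."""
    seen = {}
    for r in rows:
        if seen.setdefault(r[feature], r[label]) != r[label]:
            return False
    return True

def label_balance(rows, label="fit_score"):
    """Share of rows per label value."""
    counts = Counter(r[label] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}

print(len(mixed_rows(rows, ["dei"], ["merit"])))  # number of mixed rows
print(is_perfectly_separated(rows, "dei"))        # does one flag determine the label?
print(label_balance(rows))                        # share of each fit_score value
```

When a single indicator determines `fit_score`, a classifier can fit the labels from framing alone, which is what makes a dataset like this useful as a probe for phrasing sensitivity.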

Implications for HR & AI Hiring Tools

  • Signal quality: Focus on technical merit over stylistic adaptation
  • Bias auditing: Detect when models reward phrasing instead of competence
  • Economic value: Reduce false negatives and mis-hiring costs
  • Responsible practice: Transparent, auditable, and economically grounded NLP
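
One way to approach the bias-auditing point is a counterfactual phrasing test: change only the framing words in a response, keep the skills content fixed, and see whether the model's score moves. The sketch below is illustrative; `toy_scorer` is a hypothetical stand-in for a real hiring model, and the phrase pairings in `SWAPS` are assumptions chosen for this example.

```python
# Phrase swaps drawn from the trigger examples; the pairing itself is an
# assumption made for this sketch.
SWAPS = [
    ("merit-based", "inclusive"),
    ("clear hierarchy", "psychological safety"),
]

def reframe(text, swaps=SWAPS):
    """Replace each source phrase with its counterpart, leaving other text untouched."""
    for old, new in swaps:
        text = text.replace(old, new)
    return text

def phrasing_gap(score, text):
    """Score change caused by reframing alone; non-zero suggests phrasing sensitivity."""
    return score(reframe(text)) - score(text)

def toy_scorer(text):
    """Hypothetical stand-in for a hiring model that rewards one framing."""
    return 1.0 if "inclusive" in text else 0.0

answer = "I have five years of Python experience and prefer merit-based teams."
print(reframe(answer))
print(phrasing_gap(toy_scorer, answer))  # 1.0: the score changed with phrasing only
```

A gap near zero across many responses would suggest the model is scoring content rather than framing; a consistently large gap points to the kind of phrasing reward the audit is meant to detect.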

This is a technical demonstration of semantic modeling, feature engineering, and fairness-oriented analysis in NLP — built for performance and efficiency.

About

Synthetic dataset for evaluating bias and fairness in AI‑based hiring interviews, focused on “Tell me about yourself” responses in Digital‑Technology roles.
