-
Notifications
You must be signed in to change notification settings - Fork 0
Home
10K Spanish scam urgency phrases (4 countries) → Production ML training
✅ 1st parallel Spanish-English scam corpus (Multi-dialect)
✅ 97.8/100 ML readiness (zero noise, perfect labels)
✅ Psycholinguistic urgency triggers (fear/panic/greed)
✅ Regional dialect authenticity
✅ 12 ML features → BERTspa F1: 92-94%
📁 Sample-Scammer-Phrases-ByCountry_100rows.csv
✅ 100% Complete (zero nulls) | 100% UTF-8 BOM | 94% Class Balance
✅ 8 countries × 5 sentiments × 7 scam types = Perfect stratification
✅ Native validation: Mexico(98%)/Spain(97%)/Argentina(96%)
✅ vs Real data: +18% F1 improvement | 0% label noise
[🏠 Home] | 🔬 ML Quality | 🌍 Dialects | 📊 Features | 🚀 Production | 💼 Business
Next: ML-Quality
Computational Linguist | 10K Spanish Scam Corpus
- [🏠 Home]
- [🔬 ML Quality]
- [🌍 Dialects]
- [📊 Features]
- [🚀 Production]
- [💼 Business]
10K rows × 12 ML features
92% F1 BERTspa | 0% label noise
4 countries | 120ms inference
*Juan Carlos Hernandez | Computational Linguistics