Skip to content
Juan Hernandez-Moreno edited this page Mar 15, 2026 · 3 revisions

Welcome to the Fraud-Detection-Spanish wiki!

© 2026 Juan Carlos Hernandez | Computational Linguistics

Fraud-Detection-Spanish 🚨

10K Spanish scam urgency phrases (4 countries) → Production ML training

10K Phrases 97.8%25ML%20Ready

🎯 COMPUTATIONAL LINGUISTICS CONTRIBUTIONS

✅ 1st parallel Spanish-English scam corpus (Multi-dialect)

✅ 97.8/100 ML readiness (zero noise, perfect labels)

✅ Psycholinguistic urgency triggers (fear/panic/greed)

✅ Regional dialect authenticity

✅ 12 ML features → BERTspa F1: 92-94%

📊 PRODUCTION SAMPLE (100 Rows)

📁 Sample-Scammer-Phrases-ByCountry_100rows.csv

✅ 100% Complete (zero nulls) | 100% UTF-8 BOM | 94% Class Balance

✅ 8 countries × 5 sentiments × 7 scam types = Perfect stratification

✅ Native validation: Mexico(98%)/Spain(97%)/Argentina(96%)

✅ vs Real data: +18% F1 improvement | 0% label noise

Portfolio Navigation

[🏠 Home] | 🔬 ML Quality | 🌍 Dialects | 📊 Features | 🚀 Production | 💼 Business

Next: ML-Quality

🚨 Fraud-Detection-Spanish

Computational Linguist | 10K Spanish Scam Corpus

📊 ML Dataset (97.8/100 Ready)

  • [🏠 Home]
  • [🔬 ML Quality]
  • [🌍 Dialects]
  • [📊 Features]
  • [🚀 Production]
  • [💼 Business]

🎯 Key Metrics

10K rows × 12 ML features
92% F1 BERTspa | 0% label noise
4 countries | 120ms inference

📁 Production Sample

100-Row CSV


*Juan Carlos Hernandez | Computational Linguistics

Clone this wiki locally