Tre-1 Model: Train a POS Tagger with CRFSuite
Overview
Train a CRF-based Vietnamese part-of-speech (POS) tagger with CRFSuite, using Universal POS tags (UPOS) as the tagset and the Universal Dependencies Dataset (UDD-v0.1) as training data.
Model
- HuggingFace: https://huggingface.co/undertheseanlp/tre-1
- Architecture: CRF (Conditional Random Fields) via python-crfsuite
- Tagset: Universal POS tags (UPOS)
- Training Data: undertheseanlp/UDD-v0.1
- License: Apache 2.0
Performance
| Metric | Score |
|---|---|
| Accuracy | ~94% |
| F1 (macro) | ~90% |
| F1 (weighted) | ~94% |
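The issue does not say how these scores were computed. The sketch below assumes they are token-level scores over flattened tag sequences, the same convention as scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`. The function name `tag_metrics` is hypothetical. It shows how the three numbers relate: accuracy counts correct tokens, macro F1 averages per-tag F1 equally, and weighted F1 weights each tag's F1 by its gold support.

```python
from collections import Counter
from typing import Dict, List, Sequence


def tag_metrics(gold: Sequence[List[str]], pred: Sequence[List[str]]) -> Dict[str, float]:
    """Token-level accuracy, macro F1 and support-weighted F1 (assumed convention)."""
    g = [t for seq in gold for t in seq]
    p = [t for seq in pred for t in seq]
    if not g or len(g) != len(p):
        raise ValueError("gold and predicted tags must be non-empty and aligned")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gt, pt in zip(g, p):
        if gt == pt:
            tp[gt] += 1
        else:
            fp[pt] += 1  # predicted tag was wrong
            fn[gt] += 1  # gold tag was missed

    support = Counter(g)
    labels = sorted(set(g) | set(p))
    # Per-tag F1 = 2TP / (2TP + FP + FN); 0.0 when the tag never occurs.
    f1 = {}
    for lab in labels:
        denom = 2 * tp[lab] + fp[lab] + fn[lab]
        f1[lab] = 2 * tp[lab] / denom if denom else 0.0

    return {
        "accuracy": sum(tp.values()) / len(g),
        "f1_macro": sum(f1.values()) / len(labels),
        "f1_weighted": sum(f1[lab] * support[lab] for lab in labels) / len(g),
    }
```

Because macro F1 gives every tag the same weight, weak performance on rare tags lowers it more than the weighted score; that is one plausible reading of the gap between ~90% and ~94% in the table.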
Feature Templates
- Token features: word form, lowercase, prefix/suffix (2-3 chars), character type checks
- Context features: previous and next 1-2 tokens
- Bigram features: adjacent token combinations
- Dictionary features: in-vocabulary checks
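As a rough illustration, the templates above can be written as a per-token feature function that returns one dict per token, the list-of-dicts form python-crfsuite accepts as an item sequence. The feature names, the `BOS`/`EOS` padding, the specific character-type checks, and the `vocab` set are hypothetical choices for this sketch, not the actual tre-1 templates.

```python
import string
from typing import Dict, List, Set, Union

# String values act as categorical features; bools/floats act as numeric ones.
FeatureDict = Dict[str, Union[str, bool, float]]


def token_features(tokens: List[str], i: int, vocab: Set[str]) -> FeatureDict:
    """Features for tokens[i]; all template names here are hypothetical."""
    word = tokens[i]
    lower = word.lower()
    feats: FeatureDict = {
        "bias": 1.0,
        # Token features: form, lowercase, 2-3 character prefixes/suffixes.
        "w": word,
        "w.lower": lower,
        "w.prefix2": word[:2],
        "w.prefix3": word[:3],
        "w.suffix2": word[-2:],
        "w.suffix3": word[-3:],
        # Character-type checks (an assumed selection).
        "w.istitle": word.istitle(),
        "w.isupper": word.isupper(),
        "w.isdigit": word.isdigit(),
        "w.ispunct": bool(word) and all(ch in string.punctuation for ch in word),
        # Dictionary feature: is the lowercased token in a known vocabulary?
        "w.in_vocab": lower in vocab,
    }
    # Context features: previous/next 1-2 tokens, padded with BOS/EOS markers.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(tokens):
            ctx = tokens[j].lower()
        else:
            ctx = "BOS" if j < 0 else "EOS"
        feats[f"w[{offset:+d}]"] = ctx
    # Bigram features: current token joined with its left/right neighbour.
    feats["w[-1]|w"] = f"{feats['w[-1]']}|{lower}"
    feats["w|w[+1]"] = f"{lower}|{feats['w[+1]']}"
    return feats


def sent2features(tokens: List[str], vocab: Set[str]) -> List[FeatureDict]:
    """One feature dict per token of a word-segmented sentence."""
    return [token_features(tokens, i, vocab) for i in range(len(tokens))]
```

For example, `sent2features(["Tôi", "yêu", "Hà_Nội", "."], {"tôi", "yêu"})` yields four dicts; the first has `"w[-1]": "BOS"` and the second has the bigram `"w[-1]|w": "tôi|yêu"`.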
Training Configuration
- L1 regularization (c1): 1.0
- L2 regularization (c2): 1e-3
- Max iterations: 100
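A minimal training sketch with python-crfsuite using the configuration above, assuming `X` is a list of sentences (each a list of per-token feature dicts) and `y` holds the matching UPOS tag lists. The helper names `TRAIN_PARAMS`, `train_crf`, and `tag` are hypothetical; `Trainer.select`, `append`, `set_params`, `train` and `Tagger.open`/`tag`/`close` are python-crfsuite calls. The issue does not name the training algorithm, so L-BFGS is assumed here because it is the CRFsuite trainer that supports both `c1` and `c2`.

```python
from typing import Dict, List, Sequence, Union

FeatureDict = Dict[str, Union[str, bool, float]]

# Training configuration from the issue, using python-crfsuite's L-BFGS
# parameter names (c1 = L1 coefficient, c2 = L2 coefficient).
TRAIN_PARAMS: Dict[str, Union[float, int]] = {
    "c1": 1.0,
    "c2": 1e-3,
    "max_iterations": 100,
}


def train_crf(
    X: Sequence[List[FeatureDict]],
    y: Sequence[List[str]],
    model_path: str,
    params: Dict[str, Union[float, int]] = TRAIN_PARAMS,
) -> None:
    """Train a linear-chain CRF and write the model file to model_path."""
    import pycrfsuite  # requires `pip install python-crfsuite`

    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of sentences")
    trainer = pycrfsuite.Trainer(verbose=False)
    trainer.select("lbfgs")  # assumed algorithm; supports both c1 and c2
    for xseq, yseq in zip(X, y):
        trainer.append(xseq, yseq)  # one sentence: features + gold UPOS tags
    trainer.set_params(dict(params))
    trainer.train(model_path)


def tag(model_path: str, xseq: List[FeatureDict]) -> List[str]:
    """Load a trained model and return one predicted UPOS tag per token."""
    import pycrfsuite

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    try:
        return tagger.tag(xseq)
    finally:
        tagger.close()
```

Importing `pycrfsuite` inside the functions keeps the configuration importable on machines without the library; a real training script would normally import it at module level.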
Status
- Train model
- Publish to HuggingFace
- Integrate into underthesea library