Tre-1 Model: Train POS Tag with CRFSuite #894

@rain1024

Overview

Train a CRF-based Vietnamese POS tagger with CRFSuite. The model predicts Universal POS (UPOS) tags and is trained on the Universal Dependencies Dataset (UDD-v0.1).

Model

  • HuggingFace: https://huggingface.co/undertheseanlp/tre-1
  • Architecture: CRF (Conditional Random Fields) via python-crfsuite
  • Tagset: Universal POS tags (UPOS)
  • Training Data: undertheseanlp/UDD-v0.1
  • License: Apache 2.0

Performance

Metric          Score
Accuracy        ~94%
F1 (macro)      ~90%
F1 (weighted)   ~94%
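
For reference, the three metrics above can be computed from flat lists of gold and predicted tags roughly as follows. This is a minimal sketch, not the project's evaluation script: the function name tagging_scores is hypothetical, and it assumes token-level scoring with macro F1 averaged over every tag seen in either list.

```python
from collections import Counter


def tagging_scores(gold, pred):
    """Token-level accuracy, macro F1 and weighted F1 for two tag lists."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong
            fn[g] += 1  # gold tag was missed

    support = Counter(gold)              # number of gold tokens per tag
    labels = set(gold) | set(pred)       # assumption: average over all seen tags

    f1 = {}
    for tag in labels:
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        f1[tag] = 2 * tp[tag] / denom if denom else 0.0

    n = len(gold)
    return {
        "accuracy": sum(tp.values()) / n,
        "f1_macro": sum(f1.values()) / len(f1),
        "f1_weighted": sum(f1[t] * support[t] for t in labels) / n,
    }
```

Macro F1 gives every tag equal weight, so rare tags pull it down more than they pull down accuracy or weighted F1. That is consistent with the macro score being the lowest of the three reported here.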

Feature Templates

  • Token features: word form, lowercase, prefix/suffix (2-3 chars), character type checks
  • Context features: previous and next 1-2 tokens
  • Bigram features: adjacent token combinations
  • Dictionary features: in-vocabulary checks
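
A feature function covering the four template groups above might look like the sketch below. The issue does not list the exact templates, so every feature name, the context window handling, and the vocab argument are assumptions for illustration only.

```python
def word2features(sent, i, vocab=frozenset()):
    """Build a feature dict for token i of a tokenized sentence (list of str).

    `vocab` is a hypothetical set of lowercased known words used for the
    dictionary (in-vocabulary) feature.
    """
    word = sent[i]
    lower = word.lower()

    # Token features: word form, lowercase, affixes, character type checks
    feats = {
        "bias": 1.0,
        "word": word,
        "lower": lower,
        "prefix2": word[:2],
        "prefix3": word[:3],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_upper": word.isupper(),
        "is_digit": word.isdigit(),
        # Dictionary feature: in-vocabulary check
        "in_vocab": lower in vocab,
    }

    # Context features: previous and next 1-2 tokens, when they exist
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(sent):
            feats[f"{offset:+d}:lower"] = sent[j].lower()

    # Bigram features: current token joined with each adjacent token
    if i > 0:
        feats["-1,0:bigram"] = sent[i - 1].lower() + "|" + lower
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["0,+1:bigram"] = lower + "|" + sent[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence

    return feats


def sent2features(sent, vocab=frozenset()):
    """Feature dicts for every token in a sentence."""
    return [word2features(sent, i, vocab) for i in range(len(sent))]
```

Calling sent2features(["Tôi", "yêu", "Hà Nội"], vocab={"hà nội"}) returns one dict per token. Multi-syllable tokens such as "Hà Nội" are treated as a single unit, which assumes the input is already word-segmented.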

Training Configuration

  • L1 regularization (c1): 1.0
  • L2 regularization (c2): 1e-3
  • Max iterations: 100
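
With python-crfsuite, these settings could be passed to the trainer roughly as shown below. This is a sketch under assumptions: the issue does not name the training algorithm (the library's default, L-BFGS, is used here), and the model file name tre-1.crfsuite and the helper names are hypothetical.

```python
# Hyperparameters from the configuration above
TRAINER_PARAMS = {
    "c1": 1.0,              # L1 regularization coefficient
    "c2": 1e-3,             # L2 regularization coefficient
    "max_iterations": 100,  # stop after at most 100 iterations
}


def train_crf(X_train, y_train, model_path="tre-1.crfsuite"):
    """Train a CRF model and write it to model_path (hypothetical file name).

    X_train: list of sentences, each a list of per-token feature dicts.
    y_train: list of sentences, each a list of UPOS tag strings.
    """
    import pycrfsuite  # requires: pip install python-crfsuite

    trainer = pycrfsuite.Trainer(verbose=False)  # default algorithm: L-BFGS
    for xseq, yseq in zip(X_train, y_train):
        trainer.append(xseq, yseq)
    trainer.set_params(TRAINER_PARAMS)
    trainer.train(model_path)
    return model_path


def tag_sentence(model_path, xseq):
    """Load a trained model and predict one tag per token of xseq."""
    import pycrfsuite

    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    return tagger.tag(xseq)
```

A relatively large c1 combined with a small c2 favors sparse models, since L1 regularization drives many feature weights to exactly zero.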

Status

  • Train model
  • Publish to HuggingFace
  • Integrate into underthesea library
