This project aims to provide a parser for the CoNLL-U format of the Universal Dependencies project: https://universaldependencies.org/format.html.
Parse a file in CoNLL-U format and iterate over the sentences it contains.
use rs_conllu::parse_file;
use std::fs::File;
let file = File::open("tests/example.conllu")?;
let parsed = parse_file(file)?;
// Iterate over the contained sentences.
for sentence in parsed {
    // We can also iterate over the tokens in the sentence.
    for token in sentence {
        // Process token, e.g. access individual fields.
        println!("{}", token.form);
    }
}

- Tested on version 2.11 UD treebanks
- Handles different types of token ids (single, range, empty)
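The three kinds of token id can be illustrated with a small standalone sketch. The `TokenId` enum and the `parse_token_id` function below are hypothetical names chosen for this example and are not necessarily the types this crate exposes. The sketch only assumes the ID column syntax from the CoNLL-U specification: `3` for a single word, `3-4` for a multiword token range, and `3.1` for an empty node.

```rust
/// Hypothetical model of the CoNLL-U ID column (illustration only,
/// not necessarily the type used by this crate).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TokenId {
    /// An ordinary word, e.g. `3`.
    Single(usize),
    /// A multiword token covering a range of words, e.g. `3-4`.
    Range(usize, usize),
    /// An empty node, e.g. `3.1`.
    Empty(usize, usize),
}

/// Parse the raw ID field; returns `None` if it matches none of the three forms.
fn parse_token_id(field: &str) -> Option<TokenId> {
    if let Some((start, end)) = field.split_once('-') {
        // Range id: both sides must be integers.
        Some(TokenId::Range(start.parse().ok()?, end.parse().ok()?))
    } else if let Some((major, minor)) = field.split_once('.') {
        // Empty node id: integer, dot, integer.
        Some(TokenId::Empty(major.parse().ok()?, minor.parse().ok()?))
    } else {
        // Plain integer id.
        field.parse::<usize>().ok().map(TokenId::Single)
    }
}
```

Under these assumptions, calling `parse_token_id("3-4")` yields a `Range` value, while a malformed field such as `"x"` yields `None`.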
Parsing happens in a "flat" manner: each sentence is returned as a plain sequence of tokens, and the relations between tokens are not resolved.
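If the relations are needed, they can be reconstructed from the flat token sequence in user code. The following is a minimal sketch, assuming a hypothetical `FlatToken` struct that holds only the ID and HEAD columns (with `0` marking the root, as in CoNLL-U). Neither `FlatToken` nor `children_by_head` is part of this crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat token with only the ID and HEAD columns
/// (illustration only, not this crate's token type).
struct FlatToken {
    id: usize,
    /// Id of the syntactic head; 0 marks the root, as in CoNLL-U.
    head: usize,
}

/// Group token ids by the id of their head, preserving sentence order
/// within each group.
fn children_by_head(tokens: &[FlatToken]) -> BTreeMap<usize, Vec<usize>> {
    let mut children: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
    for token in tokens {
        children.entry(token.head).or_default().push(token.id);
    }
    children
}
```

Looking up key `0` in the resulting map gives the root token or tokens, and looking up any other id gives that token's direct dependents.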