ZemberekDotNet is the C#/.NET port of Zemberek-NLP (Natural Language Processing tools for Turkish).
This library will be kept in sync with Zemberek-NLP, and the same module structure will be maintained on the .NET platform as NuGet packages built from separate projects.

| Module | Package Name | Description | Status |
|---|---|---|---|
| All | ZemberekDotNet.All | Wrapper Package that includes all the modules. | |
| Core | ZemberekDotNet.Core | Special Collections, Hash functions and helpers. | |
| Morphology | ZemberekDotNet.Morphology | Turkish morphological analysis, disambiguation and word generation. | |
| Tokenization | ZemberekDotNet.Tokenization | Turkish Tokenization and sentence boundary detection. | |
| Normalization | ZemberekDotNet.Normalization | Basic spell checker, word suggestion. Noisy text normalization. | |
| NER | ZemberekDotNet.NER | Turkish Named Entity Recognition. | |
| Classification | ZemberekDotNet.Classification | Text classification based on Java port of fastText project. | |
| Language Identification | ZemberekDotNet.LangID | Fast identification of text language. | |
| Language Modeling | ZemberekDotNet.LM | Provides a language model compression algorithm. | |
| Applications | ZemberekDotNet.Apps | Console applications. | Pending |
| gRPC Server | ZemberekDotNet.GRPC | gRPC server for access from other languages. | Pending |
| Examples | ZemberekDotNet.Examples | Usage examples. | Pending |

The packages target .NET Standard 2.1 so that they can be used within .NET Core and .NET Framework projects. The examples and console applications will also be built with .NET Core so that the whole library can be used cross-platform.
The repository is configured to continuously trigger a build, test, and release cycle using Azure DevOps. After a successful release, it automatically publishes the artifacts to NuGet.org.
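
As a rough illustration of consuming the published packages, a project file could reference the wrapper package listed in the table. This is only a sketch under assumptions: the package ID comes from the table, while the consumer project, its `netcoreapp3.1` target, and the floating version `*` are placeholders, since no version number is stated here.

```xml
<!-- Hypothetical consumer project; only the package ID is taken from the module table. -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Assumed target: .NET Core 3.x implements .NET Standard 2.1. -->
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- "*" is a placeholder floating version; pin a real version in practice. -->
    <PackageReference Include="ZemberekDotNet.All" Version="*" />
  </ItemGroup>
</Project>
```

Referencing a single module package (for example `ZemberekDotNet.Morphology`) instead of the wrapper should work the same way if only part of the library is needed.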