
rhythmus/MetaLang


MetaLang: The Cross-Linguistic Terminology Alignment Layer

MetaLang is a multilingual, ontology-based framework for linguistic metalanguage — the vocabulary used to describe language itself. It serves as a canonical pivot layer between academic, pedagogical, and computational grammar systems (e.g., Universal Dependencies, EAGLES, national school grammars).

🌍 The Problem

Linguistic terminology is fragmented. The same concept (e.g., an "article") might be labeled ART, DET, lidwoord, or άρθρο depending on the tradition, standard, or language. MetaLang resolves this by mapping these disparate labels to stable, globally unique identifiers (GUIDs).
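As a rough illustration of this many-to-one mapping, the sketch below indexes (system, label) pairs against a single concept ID and also supports the reverse lookup. It is a hypothetical in-memory model, not MetaLang's actual data layer: the system IDs "tagset-a" and "tagset-b" and the helpers conceptFor and labelsFor are invented for the example, while ML_POS_ARTICLE, nl-generic, and el-generic are borrowed from the API examples further down.

```typescript
// Hypothetical model: many (system, label) pairs point to one concept ID.
type SystemId = string;
type ConceptId = string;

interface LabelEntry {
  system: SystemId;
  label: string;
  conceptId: ConceptId;
}

// Illustrative entries only; "tagset-a" and "tagset-b" are invented system IDs.
const entries: LabelEntry[] = [
  { system: "tagset-a", label: "ART", conceptId: "ML_POS_ARTICLE" },
  { system: "tagset-b", label: "DET", conceptId: "ML_POS_ARTICLE" },
  { system: "nl-generic", label: "lidwoord", conceptId: "ML_POS_ARTICLE" },
  { system: "el-generic", label: "άρθρο", conceptId: "ML_POS_ARTICLE" },
];

// Composite key; the NUL separator keeps "a"+"bc" distinct from "ab"+"c".
function key(system: SystemId, label: string): string {
  return `${system}\u0000${label}`;
}

// Forward index: (system, label) -> concept ID
const toConcept = new Map<string, ConceptId>();
// Reverse index: concept ID -> every entry that names it
const toLabels = new Map<ConceptId, LabelEntry[]>();

for (const e of entries) {
  toConcept.set(key(e.system, e.label), e.conceptId);
  const list = toLabels.get(e.conceptId) ?? [];
  list.push(e);
  toLabels.set(e.conceptId, list);
}

function conceptFor(system: SystemId, label: string): ConceptId | undefined {
  return toConcept.get(key(system, label));
}

function labelsFor(conceptId: ConceptId, system: SystemId): string[] {
  return (toLabels.get(conceptId) ?? [])
    .filter((e) => e.system === system)
    .map((e) => e.label);
}

console.log(conceptFor("nl-generic", "lidwoord")); // logs ML_POS_ARTICLE
console.log(labelsFor("ML_POS_ARTICLE", "el-generic").join(", ")); // logs άρθρο
```

Keeping both directions in memory makes "label to concept" and "concept to labels in system X" constant-time lookups, at the cost of storing each entry twice.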

🚀 The Solution

MetaLang provides:

  • Canonical Ontology: A stable hierarchy of linguistic concepts across domains (POS, Morphology, Syntax, etc.).
  • Multilingual Labels: Localized terms and abbreviations for end users and software (EN, NL, EL, DE, FR, PT, ES, IT, RU).
  • Plugin Architecture: A system for mapping any external tagset (e.g., UD, CELEX, PTB) to the MetaLang core.
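The plugin idea can be sketched as a small contract: each plugin declares a system ID and a list of tag-to-concept mappings, and a registry looks tags up per system. The TagsetPlugin interface and PluginRegistry class below are hypothetical and may not match the interfaces in packages/schema or packages/core; the UD tags NOUN and PROPN are real UPOS tags, and the concept IDs are borrowed from the examples further down.

```typescript
// Hypothetical plugin contract; the real @metalang interfaces may differ.
interface TagMapping {
  tag: string;
  conceptId: string;
}

interface TagsetPlugin {
  systemId: string;
  mappings: TagMapping[];
}

class PluginRegistry {
  private plugins = new Map<string, TagsetPlugin>();

  // Each external tagset registers exactly once under its system ID.
  register(plugin: TagsetPlugin): void {
    if (this.plugins.has(plugin.systemId)) {
      throw new Error(`System already registered: ${plugin.systemId}`);
    }
    this.plugins.set(plugin.systemId, plugin);
  }

  // Returns every concept ID a tag maps to within one system (tags can be ambiguous).
  resolve(tag: string, systemId: string): string[] {
    const plugin = this.plugins.get(systemId);
    if (!plugin) return [];
    return plugin.mappings.filter((m) => m.tag === tag).map((m) => m.conceptId);
  }
}

// A toy plugin for two UD universal POS tags.
const udPlugin: TagsetPlugin = {
  systemId: "ud",
  mappings: [
    { tag: "NOUN", conceptId: "ML_POS_NOUN" },
    { tag: "PROPN", conceptId: "ML_POS_PROPER-NOUN" },
  ],
};

const registry = new PluginRegistry();
registry.register(udPlugin);
console.log(registry.resolve("PROPN", "ud").join()); // logs ML_POS_PROPER-NOUN
```

Returning an array rather than a single ID leaves room for tags that are ambiguous within their own tagset.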

🎓 Academic & Professional Context

MetaLang is situated within Language Resource Infrastructure and Standardization, an emerging subdiscipline. It addresses the following "backbone infrastructure" problems:

  • Linguistic Data Infrastructure (LDI): Organizing and disclosing heterogeneous linguistic data.
  • Interoperability: Bridging the gap between legacy institutional data (e.g., Greek-specific INTERA) and modern standards like Universal Dependencies (UD) and CLDF.
  • Lexicography & Morphology: Providing a machine-readable path for historical lexicographic terms to modern NLP pipelines.

Related Standards

MetaLang complements and integrates with:

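  • UD (Universal Dependencies): A framework for cross-linguistically consistent annotation of parts of speech, morphological features, and syntactic dependencies.
  • BabelNet: A large multilingual encyclopedic dictionary and semantic network.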
  • LexInfo: An ontology for linguistic annotations in the LLOD (Linguistic Linked Open Data) cloud.

Recent Additions

  • New Plugins: @metalang/plugin-babelnet (64 tags) and @metalang/plugin-lexinfo (119 tags).
  • Ontology Alignment: An expanded PoS taxonomy with granular concepts for pronouns, particles, and specialized verb types, matching LexInfo's fine-grained classification.
  • Schema: Added a retrievedAt field to BibliographicSource to support citations of web-based resources.

📂 Project Structure

This is a monorepo containing the following components:

  • packages/schema: Core TypeScript interfaces and JSON schemas.
  • packages/core: The central ontology engine and registry.
  • packages/plugin-ud: Universal Dependencies (UD) tag mapping provider.
  • packages/plugin-BabelNet: BabelNet Universal POS tag mapping provider.
  • packages/plugin-LexInfo: LexInfo PartOfSpeech tag mapping provider.
  • docs/: Comprehensive specifications and concept notes.
  • ontology/: The single source of truth for all definitions, describing the entire "world" of MetaLang: its domains, its concepts, and their hierarchical relationships.

📖 Key Documentation

  • Concept Note: Philosophical and architectural introduction.
  • Core Specification: Functional and technical requirements for the engine.
  • GUI Specification: Requirements for the MetaLang authoring and governance tool.

🛠️ Getting Started

Prerequisites

Node.js (which provides npx) and pnpm.

Installation

pnpm install

💻 API Usage Examples

MetaLang provides a programmatic interface to resolve, translate, and explore linguistic metalanguage data.

1. Unified Search & Resolution

Quickly find concepts or resolve tags from specific systems.

import { defaultRegistry as registry } from '@metalang/core';

// Search across all plugins for any tag or term
const results = registry.search("znw"); 
// [ { systemId: "nl-generic", tag: "znw.", conceptId: "ML_POS_NOUN", matchType: "partial" }, ... ]

// Resolve a tag in a specific context
const concepts = registry.resolve("v", "nl-taalunie"); 
// Returns full Concept objects for ML_MORPH-VALUE_GENDER_FEMININE
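The search example reports a "partial" match for the query "znw" against the tag "znw.". One plausible way to obtain that behavior is sketched below, assuming an exact match means identical strings and a partial match means a case-insensitive prefix match once trailing periods are ignored. The actual matching rules of registry.search are not documented here, and ML_POS_ADJECTIVE is a hypothetical ID.

```typescript
type MatchType = "exact" | "partial";

interface IndexedTag {
  systemId: string;
  tag: string;
  conceptId: string;
}

interface SearchHit extends IndexedTag {
  matchType: MatchType;
}

// Assumed normalization: trim, lowercase, drop trailing periods ("znw." -> "znw").
function normalize(s: string): string {
  return s.trim().toLowerCase().replace(/\.+$/, "");
}

function search(index: IndexedTag[], query: string): SearchHit[] {
  const q = normalize(query);
  const hits: SearchHit[] = [];
  for (const entry of index) {
    if (entry.tag === query) {
      hits.push({ ...entry, matchType: "exact" });
    } else if (q.length > 0 && normalize(entry.tag).startsWith(q)) {
      hits.push({ ...entry, matchType: "partial" });
    }
  }
  // Stable sort: exact hits first, partial hits after, original order otherwise kept.
  return hits.sort((a, b) =>
    a.matchType === b.matchType ? 0 : a.matchType === "exact" ? -1 : 1
  );
}

// Hypothetical index; ML_POS_ADJECTIVE is an invented ID.
const index: IndexedTag[] = [
  { systemId: "nl-generic", tag: "znw.", conceptId: "ML_POS_NOUN" },
  { systemId: "nl-generic", tag: "bnw.", conceptId: "ML_POS_ADJECTIVE" },
];

console.log(search(index, "znw")[0].matchType); // logs partial
```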

2. Cross-System Translation (Conversion)

Map terminology directly from one tradition to another.

// Translate a Dutch school grammar tag to its English pedagogical equivalent
const enTags = registry.translateTag("znw", "nl-generic", "en-generic");
// Returns: ["noun", "n.", "noun phrase"]

// Get all tags for a concept in a specific system
const elTags = registry.translateConcept("ML_POS_NOUN", "el-generic");
// Returns: ["ουσιαστικό", "ουσ.", ...]
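Conceptually, translateTag can be read as a composition of two lookups: resolve the source tag to concept IDs, then collect each concept's tags in the target system. The standalone functions below sketch that composition over hypothetical in-memory tables; they are not the registry's implementation, and the table contents are illustrative only.

```typescript
// Hypothetical tables: system -> key -> list of values.
type Table = Record<string, Record<string, string[]>>;

// Source side: system -> tag -> concept IDs
const tagToConcepts: Table = {
  "nl-generic": { znw: ["ML_POS_NOUN"], "znw.": ["ML_POS_NOUN"] },
};

// Target side: system -> concept ID -> tags
const conceptToTags: Table = {
  "en-generic": { ML_POS_NOUN: ["noun", "n."] },
  "el-generic": { ML_POS_NOUN: ["ουσιαστικό", "ουσ."] },
};

function translateConcept(conceptId: string, systemId: string): string[] {
  return conceptToTags[systemId]?.[conceptId] ?? [];
}

function translateTag(tag: string, fromSystem: string, toSystem: string): string[] {
  const concepts = tagToConcepts[fromSystem]?.[tag] ?? [];
  // A source tag may map to several concepts; merge their target tags without duplicates.
  const out = new Set<string>();
  for (const id of concepts) {
    for (const t of translateConcept(id, toSystem)) out.add(t);
  }
  return [...out];
}

console.log(translateTag("znw", "nl-generic", "en-generic").join(", ")); // logs noun, n.
```

Routing every translation through the concept ID means N systems need N mapping tables rather than one table per pair of systems.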

3. Linguistic Forms & Fallbacks

Retrieve localized singular, plural, and abbreviated forms with a robust fallback chain.

// Get forms for 'article' in a specific system, with automatic fallbacks
const forms = registry.getForms("ML_POS_ARTICLE", "nl-taalunie");

console.log(forms.singular);       // "lidwoord"
console.log(forms.abbreviations);  // ["lw."]
console.log(forms.sourceSystemId); // "nl-generic" (resolved via language fallback)
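The fallback shown above (nl-taalunie falling back to nl-generic) can be modeled as an ordered chain of systems tried in turn. The sketch assumes a chain of the requested system, then "<language>-generic", then "en-generic" as a last resort. That final English step, the formsStore table, and the plural and English forms are assumptions made for illustration, not documented MetaLang behavior.

```typescript
interface Forms {
  singular: string;
  plural?: string;
  abbreviations: string[];
}

interface ResolvedForms extends Forms {
  sourceSystemId: string; // which system in the chain actually supplied the forms
}

// Hypothetical store: system -> concept -> forms. "nl-taalunie" has no entry here.
const formsStore: Record<string, Record<string, Forms>> = {
  "nl-generic": {
    ML_POS_ARTICLE: { singular: "lidwoord", plural: "lidwoorden", abbreviations: ["lw."] },
  },
  "en-generic": {
    ML_POS_ARTICLE: { singular: "article", plural: "articles", abbreviations: ["art."] },
  },
};

// Assumed chain: requested system, then "<lang>-generic", then "en-generic".
function fallbackChain(systemId: string): string[] {
  const lang = systemId.split("-")[0];
  const chain = [systemId, `${lang}-generic`, "en-generic"];
  return chain.filter((id, i) => chain.indexOf(id) === i); // drop duplicates, keep order
}

function getForms(conceptId: string, systemId: string): ResolvedForms | undefined {
  for (const candidate of fallbackChain(systemId)) {
    const forms = formsStore[candidate]?.[conceptId];
    if (forms) return { ...forms, sourceSystemId: candidate };
  }
  return undefined;
}

console.log(getForms("ML_POS_ARTICLE", "nl-taalunie")?.sourceSystemId); // logs nl-generic
```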

4. Ontology Navigation & Metadata

Traverse the concept hierarchy and link to global knowledge bases.

// Navigate the ontology
const children = registry.getChildren("ML_POS_NOUN"); 
// [Concept(ML_POS_NOUN-COMMON), Concept(ML_POS_PROPER-NOUN), ...]

// External Links
const wikidata = registry.getWikiDataId("ML_POS_NOUN"); // "Q1401131"
const wikiUrl = registry.getWikipediaUrl("ML_POS_NOUN", "nl"); 
// "https://nl.wikipedia.org/wiki/zelfstandig_naamwoord"
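Hierarchy navigation and Wikipedia links can both be derived from per-concept records. In the hypothetical model below, each concept lists its parents (so a concept may have several), getChildren inverts those links, and getWikipediaUrl builds a URL from a per-language article title by replacing spaces with underscores and percent-encoding the rest. The wikipediaTitles field and the stored titles are assumptions, not MetaLang's schema.

```typescript
interface ConceptNode {
  id: string;
  parents: string[]; // several parents are allowed
  wikipediaTitles?: Record<string, string>; // hypothetical: language code -> article title
}

const concepts: ConceptNode[] = [
  { id: "ML_POS_NOUN", parents: [], wikipediaTitles: { nl: "Zelfstandig naamwoord", en: "Noun" } },
  { id: "ML_POS_NOUN-COMMON", parents: ["ML_POS_NOUN"] },
  { id: "ML_POS_PROPER-NOUN", parents: ["ML_POS_NOUN"] },
];

// Children are every concept that lists the given ID among its parents.
function getChildren(parentId: string): string[] {
  return concepts.filter((c) => c.parents.includes(parentId)).map((c) => c.id);
}

function getWikipediaUrl(conceptId: string, lang: string): string | undefined {
  const title = concepts.find((c) => c.id === conceptId)?.wikipediaTitles?.[lang];
  if (!title) return undefined;
  // Article paths use underscores for spaces; other special characters are percent-encoded.
  return `https://${lang}.wikipedia.org/wiki/${encodeURIComponent(title.replace(/ /g, "_"))}`;
}

console.log(getChildren("ML_POS_NOUN").join(", "));
console.log(getWikipediaUrl("ML_POS_NOUN", "nl"));
```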

5. Structured Data Parsing (UI/Dropdowns)

Turn a raw array of database strings directly into a multilingual, nested taxonomy tree ready for UI rendering.

// Pass a flat array of database labels (mixed with MetaLang IDs)
const rawLabels = ["bnw.", "zin", "ML_POS_NOUN"];

// MetaLang automatically resolves variants, drops duplicates, constructs the domain taxonomy,
// and maps any known multi-parent relationships down to a fully structured nested object.
const dataset = registry.processDataset(rawLabels, "nl", {
  format: "tree",
  languages: ["nl", "en", "el"],
  includeAbbreviations: true
});

console.log(dataset.nodes); // Array of domain root nodes, with the mapped concepts nested beneath them
console.log(dataset.unmapped); // Array of strings that could not be recognized
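The steps named in the comments above (resolving labels, dropping duplicates, building the domain taxonomy, reporting unrecognized input) can be sketched as follows. This is a simplified, hypothetical model, not processDataset itself: it ignores the language and options arguments, uses invented IDs such as ML_DOMAIN_POS and ML_SYNTAX_SENTENCE, assumes an acyclic hierarchy, and pulls in every ancestor of a recognized concept so that the tree always starts at a domain root. A concept with several parents appears under each of them.

```typescript
interface TreeNode {
  id: string;
  children: TreeNode[];
}

interface Dataset {
  nodes: TreeNode[];
  unmapped: string[];
}

// Hypothetical hierarchy (concept -> parents); domain roots have no parents.
const parentsOf = new Map<string, string[]>([
  ["ML_DOMAIN_POS", []],
  ["ML_DOMAIN_SYNTAX", []],
  ["ML_POS_NOUN", ["ML_DOMAIN_POS"]],
  ["ML_POS_ADJECTIVE", ["ML_DOMAIN_POS"]],
  ["ML_SYNTAX_SENTENCE", ["ML_DOMAIN_SYNTAX"]],
]);

// Hypothetical label index for raw database strings.
const labelIndex = new Map<string, string>([
  ["bnw.", "ML_POS_ADJECTIVE"],
  ["zin", "ML_SYNTAX_SENTENCE"],
]);

function processDataset(rawLabels: string[]): Dataset {
  const included = new Set<string>(); // a Set drops duplicate concepts
  const unmapped: string[] = [];

  const addWithAncestors = (id: string): void => {
    if (included.has(id)) return;
    included.add(id);
    for (const p of parentsOf.get(id) ?? []) addWithAncestors(p);
  };

  for (const raw of rawLabels) {
    // Accept raw concept IDs as-is; otherwise look the label up.
    const id = parentsOf.has(raw) ? raw : labelIndex.get(raw);
    if (id !== undefined) addWithAncestors(id);
    else unmapped.push(raw);
  }

  // Recursively attach every included concept under each included parent.
  const build = (id: string): TreeNode => ({
    id,
    children: [...included]
      .filter((c) => (parentsOf.get(c) ?? []).includes(id))
      .map(build),
  });

  const roots = [...included].filter((id) => (parentsOf.get(id) ?? []).length === 0);
  return { nodes: roots.map(build), unmapped };
}

const result = processDataset(["bnw.", "zin", "ML_POS_NOUN", "bnw.", "foo"]);
console.log(JSON.stringify(result.nodes, null, 2));
console.log(result.unmapped);
```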

🧪 Verification

Run the comprehensive API stress test:

npx tsx scripts/verify_api.ts

📝 Citation

If you use MetaLang in your research or project, please cite it as follows:

@software{Soudan_MetaLang_2026,
  author = {Soudan, Wouter},
  affiliation = {Independent Scholar; PhD from KU Leuven; former postdoc in Computational Linguistics, Universiteit Antwerpen, Belgium},
  license = {ISC},
  month = {2},
  title = {{MetaLang: Cross-Linguistic Terminology Alignment Layer}},
  url = {https://github.com/rhythmus/MetaLang},
  version = {1.0.0},
  year = {2026}
}

Alternatively, you can use the CITATION.cff file in this repository for other formats.

About

Translate, localize, and internationalize linguistic and lexicographic terms, tags, etc.
