Add Islamic & manuscript content extensions #1

Open
SherlockianAsh wants to merge 1 commit into abualif120:main from SherlockianAsh:feat/islamic-malaysian-extensions

PR: Add Islamic & Manuscript Content Extensions

Summary

Adds Malaysian Islamic / Sufi / kitab-translation register support to Manusiawi. The current skill defaults to "natural Malaysian writing" (casual / professional register), which produces false-positive AI flags when applied to religious content where formal Arabic-derived vocabulary and conventional kitab structures ARE the natural register.

This PR extends Manusiawi with three new reference files plus SKILL.md adjustments. The defaults remain unchanged — the new behaviours activate when religious-content signals are detected (≥3 Islamic terms per 500 words) or via explicit register flag.

Motivation

Built and tested against real Malaysian Islamic content production:

  • Classical Arabic kitab → BM translation (Or_7066 by Sheikh Sulayman Zuhdi, Kifayah al-Ghulam by Sheikh Ismail al-Minangkabawi, Al-Bahjah As-Sunniyyah, Al-Hadaiq ul-Wardiyya).
  • Naqshbandiyya-Khalidiyya tariqat documentation (in-school suluk programs, ratib publications).
  • Modern dakwah and pondok education materials.

Without the extension, running Manusiawi on these documents flags legitimate religious vocabulary (tasawuf, hakikat, tazkiyah) as Pattern 6 (AI vocabulary), legitimate hadith narration formulas (telah berkata Imam Bukhari) as Pattern 11 (excessive telah di- passive), and conventional kitab closings (semoga Allah merahmati) as Pattern 24 (generic positive conclusions). The false-positive rate makes the skill unusable for this register.

The extension preserves Manusiawi's anti-AI rigor while making it correctly recognize religious content conventions.

What's Added

references/islamic-terminology.md (new)

Whitelist of Islamic / Sufi technical vocabulary that must NOT be flagged as Pattern 6 (AI vocabulary) when religious register is detected. Categorized by domain (aqidah, fiqh, tasawuf, akhlak, devotional forms, Sufi orders, honorifics, manuscript genre). Includes consistency rules for prophetic honorifics (SAW / ﷺ / s.a.w.) within a document.
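The honorific consistency rule could be checked roughly as follows. This is a minimal sketch that assumes "consistent" means only one of the three forms appears in a document; the function names and regular expressions are hypothetical and are not taken from the skill.

```python
import re

# The three honorific forms named in the PR description.
# Patterns are illustrative assumptions, not the skill's own rules.
HONORIFIC_PATTERNS = {
    "SAW": re.compile(r"(?<![\w.])SAW(?![\w.])"),
    "s.a.w.": re.compile(r"(?<![\w.])s\.a\.w\.?(?!\w)", re.IGNORECASE),
    "ﷺ": re.compile("\ufdfa"),  # U+FDFA, the single-glyph honorific
}


def honorific_forms_used(text):
    """Return the set of honorific forms that occur in the text."""
    found = set()
    for name, pattern in HONORIFIC_PATTERNS.items():
        if pattern.search(text):
            found.add(name)
    return found


def is_honorific_consistent(text):
    """Assumed rule: a document should use at most one honorific form."""
    return len(honorific_forms_used(text)) <= 1
```

A document mixing "SAW" and "s.a.w." would be reported as inconsistent, while a document with no honorific at all passes.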

Edge case rule: even in religious register, holistik / komprehensif / ekosistem STILL flag — those are AI fluff regardless of domain. The extension is precise: it whitelists technical religious terms, not all heavy vocabulary.
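One way to picture the whitelist together with its edge case is the sketch below. It assumes Pattern 6 produces a list of candidate words; the set contents are small samples taken from this description, and `filter_pattern6_flags` is a hypothetical helper, not part of the skill.

```python
# Sample entries only; the real lists would live in references/islamic-terminology.md.
RELIGIOUS_WHITELIST = {"tasawuf", "hakikat", "tazkiyah"}
ALWAYS_FLAG = {"holistik", "komprehensif", "ekosistem"}


def filter_pattern6_flags(candidates, religious_register):
    """Drop whitelisted religious terms from Pattern 6 candidates.

    `candidates` is the list of words the default Pattern 6 check would flag.
    """
    kept = []
    for word in candidates:
        key = word.lower()
        if key in ALWAYS_FLAG:
            kept.append(word)  # AI fluff is flagged in every register
        elif religious_register and key in RELIGIOUS_WHITELIST:
            continue  # technical religious term, not AI vocabulary
        else:
            kept.append(word)
    return kept
```

With the religious register off, the function returns the candidates unchanged, which matches the claim that default behaviour does not change.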

references/islamic-transliteration.md (new)

Standard Malaysian Islamic transliteration table per DBP (Dewan Bahasa dan Pustaka) Pedoman Transliterasi Huruf Arab ke Huruf Rumi. Covers daily prayer terms, Hijri month names, prophetic and other honorifics, Allah-related forms, and kitab title conventions.

Addresses recurring drift cases (sholat → solat, jum'at → jumaat, dzikir → zikir, ramadhan → ramadan) and consistency rules within documents.
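As an illustration, the four drift cases above could be detected with a small lookup. This is a sketch only: the dictionary holds just the pairs quoted here, `find_drift` is a hypothetical name, and matching is limited to whole words, so affixed forms such as "berdzikir" are deliberately not caught.

```python
import re

# Drift pairs quoted in the PR description; the full table is in the reference file.
DRIFT_TO_STANDARD = {
    "sholat": "solat",
    "jum'at": "jumaat",
    "dzikir": "zikir",
    "ramadhan": "ramadan",
}

# Match any drift form as a whole word, ignoring case.
_DRIFT_RE = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in DRIFT_TO_STANDARD) + r")(?!\w)",
    re.IGNORECASE,
)


def find_drift(text):
    """Return (found_form, standard_form) pairs in order of appearance."""
    pairs = []
    for match in _DRIFT_RE.finditer(text):
        found = match.group(1)
        pairs.append((found, DRIFT_TO_STANDARD[found.lower()]))
    return pairs
```

The function reports suggestions rather than rewriting the text, so capitalisation decisions (for example at the start of a sentence) stay with the editor.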

references/indonesian-words-islamic-additions.md (new)

Extends indonesian-words.md with religious-context Indonesian intrusions:

  • Religious institutions: pesantren → pondok, kyai → tok guru, santri → pelajar pondok
  • Ritual terms: sholat → solat, dzikir → zikir, kurban → korban
  • Honorifics: pak ustadz → ustaz, dakwah jargon (antum, akhi/ukhti) → BM equivalents
  • Mosque/community: mushalla → surau, tausiah → tazkirah/nasihat
  • Sufi lexicon drift: thoriqoh → tariqat, sholawat → selawat
  • Common Indonesian dakwah idioms with Malaysian replacements

This file is structured for either standalone reference loading or merging directly into indonesian-words.md; the choice is left to the maintainer.

SKILL.md (additions)

Two new sections proposed: "Mode: Religious & Manuscript Content" and "Submode: Kitab Translation". They document:

  • Detection signals for auto-activating religious mode (3+ Islamic terms per 500 words OR explicit user request)
  • Adjusted pattern application for religious register (which patterns to flag, which to skip, which to apply with modifications)
  • Conventional kitab register patterns to preserve (passive narration, stock phrases, Adapun/Maka/Tiada conventions)
  • Explicit "do NOT humanize" list (Quranic verses, hadith proper, du'a, wirid/hizib, direct Arabic transliterations). The surrounding commentary may be humanized, but never the religious source texts.

Defaults unchanged: skill behaves identically for non-religious content. Religious adjustments apply only when register is detected or flagged.
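The activation rule can be sketched as below. The PR text does not say how "3+ Islamic terms per 500 words" is measured, so this sketch assumes non-overlapping 500-word windows and counts every occurrence of a listed term; the term set is a small sample and all names are hypothetical.

```python
import re

# Sample terms only; the real list would come from references/islamic-terminology.md.
ISLAMIC_TERMS = {"tasawuf", "hakikat", "tazkiyah", "zikir", "solat", "tariqat"}

WINDOW_WORDS = 500  # window size from the PR description
MIN_TERM_HITS = 3   # threshold from the PR description


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def is_religious_register(text, explicit_flag=False):
    """Return True if religious mode should activate for this text."""
    if explicit_flag:
        return True  # an explicit user request always wins
    words = tokenize(text)
    # Assumption: check consecutive, non-overlapping 500-word windows.
    for start in range(0, len(words), WINDOW_WORDS):
        window = words[start:start + WINDOW_WORDS]
        hits = sum(1 for word in window if word in ISLAMIC_TERMS)
        if hits >= MIN_TERM_HITS:
            return True
    return False
```

Text with fewer than three hits in every window falls through to the default register, so non-religious content is handled exactly as before.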

Test Cases (informal)

Tested manually against samples from the four kitab translation projects. Without extension: false-positive rate ~40% (legitimate Sufi terms flagged). With extension: false-positive rate ~3% (only genuinely AI-flavoured passages flagged).

Indonesian intrusion detection caught additional terms not in the original indonesian-words.md (pesantren, dzikir, tausiah, pak ustadz) when applied to dakwah-style Indonesian-leaked AI output.

Risk / Backward Compatibility

  • No existing pattern is removed — only register-conditional adjustments to flag intensity.
  • Default register behaviour unchanged — non-religious content sees zero behaviour change.
  • New whitelist is additive — adds terms that should never be flagged, doesn't remove or weaken anti-AI flagging.
  • New transliteration file is reference-only — informational, doesn't change skill logic unless explicitly invoked.

Possible Reviewer Concerns

  1. Scope creep — This extension is specific to Malaysian Islamic content. Would Indonesian Islamic users want a parallel extension? Possibly, but that is out of scope here. The new files are structured so that a future Indonesian Islamic extension could mirror this one.
  2. Whitelist drift — As Islamic vocabulary evolves and AI vocabulary mutates, the boundary between "technical religious term" and "AI fluff" may shift. The PR proposes the boundary as of 2026; future updates should track DBP and AI-vocabulary drift.
  3. Detection heuristic strictness — "3 Islamic terms per 500 words" is a starting heuristic. Field-tuning may be needed.

Out of Scope

  • Indonesian Islamic register extension (would mirror this PR for Indonesian audiences).
  • Christian / Hindu / Buddhist Malaysian religious content (each has its own register conventions).
  • Arabic-only manuscript humanization (skill is BM/EN, not Arabic).

If this PR is too broad, the maintainer can cherry-pick: islamic-terminology.md alone is the minimum-viable improvement and resolves the worst of the false positives.

Adds Malaysian Islamic / Sufi / kitab-translation register support so the
skill stops false-flagging legitimate religious vocabulary as AI-isms.

New reference files:
- references/islamic-terminology.md: whitelist of aqidah/fiqh/tasawuf/akhlak
  technical terms across 10 categories with prophetic-honorific consistency rules
- references/islamic-transliteration.md: DBP standard transliteration table for
  prayers, Hijri months, honorifics, Allah-related forms, and kitab titles

Modified:
- references/indonesian-words.md: +6 sections covering religious institutions,
  ritual terms, honorifics, mosque/community vocabulary, Sufi/tariqat lexicon
  drift, and common Indonesian dakwah idioms with Malaysian replacements
- SKILL.md: +3 sections — Mode (Religious & Manuscript Content),
  Submode (Kitab Translation), and When NOT to invoke Manusiawi

Default behaviour unchanged. Religious-mode adjustments activate when 3+
Islamic terms are detected per 500 words, or via explicit user request.
Tested against four classical kitab translation projects (Or_7066, Kifayah
al-Ghulam, Al-Bahjah As-Sunniyyah, Al-Hadaiq ul-Wardiyya); false-positive
rate dropped from ~40% to ~3% on those samples.