Add Islamic & manuscript content extensions #1
Open
SherlockianAsh wants to merge 1 commit into abualif120:main from
Adds Malaysian Islamic / Sufi / kitab-translation register support so the skill stops false-flagging legitimate religious vocabulary as AI-isms.

New reference files:
- references/islamic-terminology.md: whitelist of aqidah/fiqh/tasawuf/akhlak technical terms across 10 categories with prophetic-honorific consistency rules
- references/islamic-transliteration.md: DBP standard transliteration table for prayers, Hijri months, honorifics, Allah-related forms, and kitab titles

Modified:
- references/indonesian-words.md: +6 sections covering religious institutions, ritual terms, honorifics, mosque/community vocabulary, Sufi/tariqat lexicon drift, and common Indonesian dakwah idioms with Malaysian replacements
- SKILL.md: +3 sections: Mode (Religious & Manuscript Content), Submode (Kitab Translation), and When NOT to invoke Manusiawi

Default behaviour unchanged. Religious-mode adjustments activate when 3+ Islamic terms are detected per 500 words, or via explicit user request.

Tested against four classical kitab translation projects (Or_7066, Kifayah al-Ghulam, Al-Bahjah As-Sunniyyah, Al-Hadaiq ul-Wardiyya); false-positive rate dropped from ~40% to ~3% on those samples.
PR: Add Islamic & Manuscript Content Extensions
Summary
Adds Malaysian Islamic / Sufi / kitab-translation register support to Manusiawi. The current skill defaults to "natural Malaysian writing" (casual / professional register), which produces false-positive AI flags when applied to religious content where formal Arabic-derived vocabulary and conventional kitab structures ARE the natural register.
This PR extends Manusiawi with three new reference files plus SKILL.md adjustments. The defaults remain unchanged — the new behaviours activate when religious-content signals are detected (≥3 Islamic terms per 500 words) or via explicit register flag.
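The activation rule above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: the function name `is_religious_register`, the `ISLAMIC_TERMS` sample set, and the choice to read "per 500 words" as non-overlapping 500-word windows are all assumptions made for the example.

```python
import re

# Hypothetical sample of trigger terms (taken from words used in this PR);
# a real list would presumably be loaded from references/islamic-terminology.md.
ISLAMIC_TERMS = {
    "tasawuf", "hakikat", "tazkiyah", "aqidah", "fiqh",
    "akhlak", "zikir", "solat", "selawat",
}


def is_religious_register(text, explicit_flag=False, window=500, threshold=3):
    """Return True if the religious register should activate.

    Assumption: the text is split into non-overlapping windows of `window`
    words, and any window with at least `threshold` known terms activates
    the mode. An explicit flag always activates it.
    """
    if explicit_flag:
        return True
    # Unicode letters only; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        hits = sum(1 for word in chunk if word in ISLAMIC_TERMS)
        if hits >= threshold:
            return True
    return False
```

Counting hits per window rather than as a rate keeps a very short text with a single religious word from triggering the mode, which seems closest to the stated intent that defaults stay unchanged for ordinary content.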
Motivation
Built and tested against real Malaysian Islamic content production:
Without the extension, running Manusiawi on these documents flags legitimate religious vocabulary (`tasawuf`, `hakikat`, `tazkiyah`) as Pattern 6 (AI vocabulary), legitimate hadith narration formulas (`telah berkata Imam Bukhari`) as Pattern 11 (excessive `telah` di-passive), and conventional kitab closings (`semoga Allah merahmati`) as Pattern 24 (generic positive conclusions). The false-positive rate makes the skill unusable for this register.

The extension preserves Manusiawi's anti-AI rigor while making it correctly recognize religious content conventions.
What's Added
`references/islamic-terminology.md` (new)

Whitelist of Islamic / Sufi technical vocabulary that must NOT be flagged as Pattern 6 (AI vocabulary) when religious register is detected. Categorized by domain (aqidah, fiqh, tasawuf, akhlak, devotional forms, Sufi orders, honorifics, manuscript genre). Includes consistency rules for prophetic honorifics (SAW / ﷺ / s.a.w.) within a document.
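One way to picture the honorific consistency rule is a check that a document uses only one of the three forms. The sketch below is illustrative only; the names `HONORIFIC_FORMS`, `honorific_styles`, and `is_honorific_consistent` are hypothetical, and the actual rules in the reference file may be more detailed.

```python
import re

# The three forms named in this PR; the regexes are assumptions for the sketch.
HONORIFIC_FORMS = {
    "SAW": re.compile(r"\bSAW\b"),
    "s.a.w.": re.compile(r"\bs\.a\.w\.", re.IGNORECASE),
    "ﷺ": re.compile("ﷺ"),
}


def honorific_styles(text):
    """Count how often each honorific form appears, omitting unused forms."""
    counts = {}
    for label, pattern in HONORIFIC_FORMS.items():
        found = len(pattern.findall(text))
        if found:
            counts[label] = found
    return counts


def is_honorific_consistent(text):
    """A document is consistent if it uses at most one honorific form."""
    return len(honorific_styles(text)) <= 1
```

For example, a passage that writes `Nabi Muhammad SAW` in one sentence and `Rasulullah s.a.w.` in the next would be reported as mixing two forms.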
Edge case rule: even in religious register, `holistik` / `komprehensif` / `ekosistem` STILL flag. Those are AI fluff regardless of domain. The extension is precise: it whitelists technical religious terms, not all heavy vocabulary.

`references/islamic-transliteration.md` (new)

Standard Malaysian Islamic transliteration table per DBP (Dewan Bahasa dan Pustaka) Pedoman Transliterasi Huruf Arab ke Huruf Rumi. Covers daily prayer terms, Hijri month names, prophetic and other honorifics, Allah-related forms, and kitab title conventions.
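A transliteration table of this kind can be treated as a lookup from drifted spellings to the standard form. The sketch below uses only the four drift spellings this PR names; `DRIFT_TO_STANDARD` and `find_drift` are hypothetical names, and the real table in the reference file is presumably much larger.

```python
import re

# Drifted spelling -> standard Malaysian spelling (the four pairs named in this PR).
DRIFT_TO_STANDARD = {
    "sholat": "solat",
    "jum'at": "jumaat",
    "dzikir": "zikir",
    "ramadhan": "ramadan",
}

# Match whole words only, so "berzikir" or similar derived forms are not touched.
_DRIFT_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in DRIFT_TO_STANDARD) + r")(?!\w)",
    re.IGNORECASE,
)


def find_drift(text):
    """Return (found_spelling, suggested_spelling) pairs in order of appearance."""
    return [
        (match.group(0), DRIFT_TO_STANDARD[match.group(0).lower()])
        for match in _DRIFT_PATTERN.finditer(text)
    ]
```

The function only reports suggestions rather than rewriting the text, since capitalization and kitab-title conventions would need the fuller rules from the reference file.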
Addresses recurring drift cases (`sholat` → `solat`, `jum'at` → `jumaat`, `dzikir` → `zikir`, `ramadhan` → `ramadan`) and consistency rules within documents.

`references/indonesian-words-islamic-additions.md` (new)

Extends `indonesian-words.md` with religious-context Indonesian intrusions:
- `pesantren` → `pondok`, `kyai` → `tok guru`, `santri` → `pelajar pondok`
- `sholat` → `solat`, `dzikir` → `zikir`, `kurban` → `korban`
- `pak ustadz` → `ustaz`, dakwah jargon (`antum`, `akhi` / `ukhti`) → BM equivalents
- `mushalla` → `surau`, `tausiah` → `tazkirah` / `nasihat`
- `thoriqoh` → `tariqat`, `sholawat` → `selawat`

This file is structured for either standalone reference loading OR merging into `indonesian-words.md` directly. Maintainer's preference.

`SKILL.md` (additions)

Two new sections proposed: "Mode: Religious & Manuscript Content" and "Submode: Kitab Translation". They document:
Defaults unchanged: skill behaves identically for non-religious content. Religious adjustments apply only when register is detected or flagged.
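How the whitelist and the edge-case rule interact with the unchanged defaults can be sketched as a filter over Pattern 6 candidates. Everything here is an assumption for illustration: `pattern6_flags` is a hypothetical name, `candidates` stands in for whatever the existing Pattern 6 detector already produces, and the two sets hold only the example words mentioned in this PR.

```python
# Example technical terms from this PR that the whitelist is meant to protect.
RELIGIOUS_WHITELIST = {"tasawuf", "hakikat", "tazkiyah"}

# Example AI-fluff words that stay flagged in every register.
ALWAYS_FLAG = {"holistik", "komprehensif", "ekosistem"}


def pattern6_flags(candidates, religious_register):
    """Filter Pattern 6 candidates according to the active register.

    Default register: every candidate stays flagged (behaviour unchanged).
    Religious register: whitelisted technical terms are dropped, but
    AI-fluff words are kept regardless.
    """
    flagged = []
    for term in candidates:
        key = term.lower()
        if key in ALWAYS_FLAG:
            flagged.append(term)
        elif religious_register and key in RELIGIOUS_WHITELIST:
            continue  # legitimate technical vocabulary in this register
        else:
            flagged.append(term)
    return flagged
```

With `religious_register=False` the function returns its input unchanged, which mirrors the claim that non-religious content is handled exactly as before.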
Test Cases (informal)
Tested manually against samples from the four kitab translation projects. Without extension: false-positive rate ~40% (legitimate Sufi terms flagged). With extension: false-positive rate ~3% (only genuinely AI-flavoured passages flagged).
Indonesian intrusion detection caught additional terms not in the original `indonesian-words.md` (`pesantren`, `dzikir`, `tausiah`, `pak ustadz`) when applied to dakwah-style Indonesian-leaked AI output.

Risk / Backward Compatibility
Possible Reviewer Concerns
Out of Scope
If this PR is too broad, the maintainer can cherry-pick:
`islamic-terminology.md` alone is the minimum-viable improvement and resolves the worst of the false positives.