tuyentran-md/cite_checker

CiteChecker

Verify academic citations against four scholarly databases, the largest of which indexes 240M+ works.

CiteChecker detects potentially hallucinated or fabricated references in academic manuscripts, a growing concern with AI-assisted writing.

What it does

  • Extracts references from text, .docx, .pdf, or .md files
  • Checks each citation against CrossRef, PubMed, Semantic Scholar, and OpenAlex
  • Reports which references are verified (with DOI links) and which are not found (potentially fabricated)

Quick Start

pip install cite-checker

Command Line

# Check a manuscript file
citecheck check manuscript.docx

# Check from text
citecheck check --text "Smith J, et al. Fake Paper Title. Nature. 2024;123:45-67."

# Set similarity threshold
citecheck check manuscript.pdf --threshold 0.75

Python API

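from citecheck import check_references, extract_references

# Raw text containing a reference list (here, a single reference)
text = "Smith J, et al. Fake Paper Title. Nature. 2024;123:45-67."

# Extract references from text
refs = extract_references(text)

# Check each reference against the databases
results = check_references(refs)

for r in results:
    print(f"{r['status']}: {r['reference']}")
    # Print the DOI only when a match supplied one
    if r.get('doi'):
        print(f"  DOI: {r['doi']}")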
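Because references that are not found still need a human look, it can help to collect them in one list. The following is a minimal sketch, assuming each result is a dict with the `reference` and `doi` keys used above and that `doi` is empty or missing when no match was found. The helper `needs_manual_review`, the `sample` data, and the status strings `verified` and `not_found` are hypothetical and are not taken from the package.

```python
def needs_manual_review(results):
    """Return the reference strings that carry no DOI (assumed to mean 'no match')."""
    return [r["reference"] for r in results if not r.get("doi")]


# Hypothetical results, shaped like the dicts printed in the loop above.
# The status values and the placeholder DOI are illustrative assumptions.
sample = [
    {"status": "verified", "reference": "Doe A. Example Title. Example Journal. 2020.",
     "doi": "10.0000/placeholder"},
    {"status": "not_found",
     "reference": "Smith J, et al. Fake Paper Title. Nature. 2024;123:45-67.",
     "doi": None},
]

for ref in needs_manual_review(sample):
    print(f"Check by hand: {ref}")
```

With real output, pass the list returned by `check_references` instead of `sample`.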

How it works

  1. Extract: parses reference lists from manuscripts (supports APA, Vancouver, and numbered styles)
  2. Query: searches for each reference across 4 academic databases using fuzzy matching
  3. Score: calculates the similarity between the cited reference and each database result
  4. Report: flags references whose best match scores below the similarity threshold as potentially fabricated
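Steps 3 and 4 can be illustrated with a minimal sketch. It is not CiteChecker's actual scoring code: it assumes a title-only comparison using `difflib.SequenceMatcher` from the Python standard library. The names `normalize`, `similarity`, and `best_match` are hypothetical, and the default threshold of 0.75 is borrowed from the command-line example, not from the package.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase the title and replace punctuation with spaces, then collapse whitespace."""
    kept = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(kept.split())


def similarity(cited: str, candidate: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized titles are identical."""
    return SequenceMatcher(None, normalize(cited), normalize(candidate)).ratio()


def best_match(cited: str, candidates: list[str], threshold: float = 0.75) -> dict:
    """Pick the highest-scoring candidate and flag the reference if it scores below the threshold."""
    scored = [(similarity(cited, c), c) for c in candidates]
    # With no candidates at all, the score is 0.0 and the reference is flagged.
    score, title = max(scored, default=(0.0, None))
    return {"score": score, "match": title, "flagged": score < threshold}
```

Here `candidates` stands for the titles returned by the database queries in step 2. Raising `threshold` flags more references, including some genuine ones. Lowering it flags fewer and may let a fabricated reference through.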

Databases

Database          Coverage
CrossRef          150M+ works, DOI resolution
PubMed            36M+ biomedical citations
Semantic Scholar  200M+ papers, all fields
OpenAlex          240M+ works, open metadata
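To show what a lookup against one of these databases can look like, here is a minimal sketch that queries the public CrossRef REST API through its `query.bibliographic` parameter. Whether CiteChecker uses this endpoint or these parameters is not stated here. The function names `build_crossref_query` and `fetch_candidates` are hypothetical and written only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_query(reference: str, rows: int = 3) -> str:
    """Build a CrossRef search URL for a free-text reference string."""
    params = {"query.bibliographic": reference, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def fetch_candidates(reference: str, rows: int = 3) -> list[dict]:
    """Fetch the top candidate records (title and DOI) for a reference. Needs network access."""
    with urlopen(build_crossref_query(reference, rows), timeout=10) as resp:
        items = json.load(resp)["message"]["items"]
    # CrossRef returns "title" as a list; some records have no title at all.
    return [{"title": (it.get("title") or [""])[0], "doi": it.get("DOI")} for it in items]


if __name__ == "__main__":
    print(build_crossref_query("Smith J, et al. Fake Paper Title. Nature. 2024;123:45-67."))
```

The candidate titles returned by `fetch_candidates` would then be compared with the cited title in the scoring step.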

Limitations

  • Books and book chapters may not be indexed in journal databases — manual verification recommended
  • Very recent papers (< 1 week old) may not yet be indexed
  • Non-English titles may have lower match rates due to transliteration
  • The similarity threshold is a trade-off: lowering it reduces false positives (genuine references flagged as not found) but may let fabricated references pass

Web App

A free web interface is available at researchcheck.streamlit.app — part of the Research Integrity Checker suite.

License

MIT — see LICENSE

Contributing

Issues and PRs welcome. Please include test cases for new features.

Citation

If you use CiteChecker in your research:

@software{citechecker2026,
  title={CiteChecker: Automated Citation Verification},
  author={Tran, Tuyen},
  year={2026},
  url={https://github.com/tuyentran-md/cite_checker}
}
