Are births influenced by the lunar cycle? This project extracts every human birth date from Wikidata, maps each date to its position in the lunar cycle, and visualises the results.

    main.py                                  # Orchestrator: data check → extraction → analysis → figures
    pyproject.toml                           # Project metadata and dependencies
    README.md
    data/
        wikidata_birthdates_parts/           # Parquet partitions of extracted birth records
    nb/
        analysis.ipynb                       # Interactive exploration notebook
    output/                                  # Generated SVG figures
    src/
        birthdates/
            extract_birthdates_from_dump.py  # Multiprocess pipeline that extracts
                                             # human birth dates from a Wikidata JSON dump
            extract_birthdates_from_dump_v2.py
            wikidata_birthdates.csv
        moonphase/
            moonphase.py                     # Computes lunar cycle progress and illumination
                                             # for a given date using pyephem

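
As a rough illustration of what "lunar cycle progress" and "illumination" mean here, the sketch below approximates both from the mean synodic month, counted from a reference new moon (2000-01-06 18:14 UTC). It is not the project's `moonphase.py`, which relies on pyephem for precise ephemerides. The function names and the constant-period assumption are illustrative only, and real lunations can deviate from the mean by several hours.

```python
import math
from datetime import datetime, timedelta, timezone

# Assumptions for this sketch (not taken from the project code):
SYNODIC_MONTH_DAYS = 29.530588853  # mean length of one lunar cycle
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_cycle_progress(moment: datetime) -> float:
    """Approximate position in the lunar cycle: 0.0 = new moon, 0.5 = full moon."""
    if moment.tzinfo is None:
        # Treat naive datetimes as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    elapsed_days = (moment - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    # Python's % keeps the result non-negative, so dates before 2000 work too.
    return (elapsed_days / SYNODIC_MONTH_DAYS) % 1.0


def illuminated_fraction(progress: float) -> float:
    """Approximate illuminated fraction of the lunar disc for a cycle position."""
    return (1.0 - math.cos(2.0 * math.pi * progress)) / 2.0


if __name__ == "__main__":
    # Half a mean cycle after the reference new moon is roughly a full moon.
    approx_full_moon = REFERENCE_NEW_MOON + timedelta(days=SYNODIC_MONTH_DAYS / 2)
    progress = lunar_cycle_progress(approx_full_moon)
    print(round(progress, 3), round(illuminated_fraction(progress), 3))
```

A fixed-period approximation like this is convenient for quick sanity checks, but an ephemeris library is the safer choice when the question is whether births cluster at particular phases.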
- Python ≥ 3.13
- uv (recommended) or pip
    # Clone the repository
    git clone <repo-url> && cd moonbirths

    # Create a virtual environment and install dependencies
    uv sync   # or: pip install .

If the `data/wikidata_birthdates_parts/` folder is empty, the extraction step
requires the compressed Wikidata entity dump:

    # ~70 GB download; place it at data/latest-all.json.bz2
    wget -P data/ https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2

    python main.py

`main.py` performs three steps:
- Data check: looks for `.parquet` files in `data/wikidata_birthdates_parts/`.
- Extraction (only if no data): runs the multiprocess Wikidata dump parser
  (`src/birthdates/extract_birthdates_from_dump.py`). This processes the full
  `latest-all.json.bz2` file and writes partitioned Parquet files.
- Analysis: loads the Parquet data, computes the moon phase for every
  birth date, and generates the following figures in `output/`:

| Figure | Description |
|---|---|
| `births_by_year.svg` | Bar chart of record counts per birth year |
| `births_lunarcycle_distributions.svg` | Histogram of births across the lunar cycle |
| `births_lunarcycle_distributions_split.svg` | Same histogram split by half-century (1800–1999) |
| `fft.svg` | FFT power-spectrum analysis of lunar-cycle birth distribution |

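
To give a feel for the extraction step, here is a minimal single-process sketch of how one line of the Wikidata JSON dump could be turned into a birth record. It assumes the standard dump layout (one entity per line inside a JSON array), uses the stdlib `json` module rather than `orjson`, and keeps only humans (`P31` = `Q5`) with a day-precise date of birth (`P569`). The helper names are hypothetical and the project's actual parser may filter differently. Calendar-model differences (Julian vs. Gregorian) are ignored here.

```python
import bz2
import json
from datetime import date

HUMAN_QID = "Q5"        # Wikidata item for "human"
INSTANCE_OF = "P31"     # property: instance of
DATE_OF_BIRTH = "P569"  # property: date of birth
DAY_PRECISION = 11      # Wikidata time precision code for "day"


def _claim_value(claim: dict):
    """Return the main value of a claim, or None for novalue/somevalue snaks."""
    return claim.get("mainsnak", {}).get("datavalue", {}).get("value")


def parse_dump_line(line: str):
    """Return (entity_id, birth_date) for a human with a day-precise birth date, else None."""
    text = line.strip().rstrip(",")
    if text in ("", "[", "]"):
        return None  # array brackets and blank lines carry no entity
    entity = json.loads(text)
    claims = entity.get("claims", {})
    types = [_claim_value(c) for c in claims.get(INSTANCE_OF, [])]
    if not any(isinstance(v, dict) and v.get("id") == HUMAN_QID for v in types):
        return None
    for claim in claims.get(DATE_OF_BIRTH, []):
        value = _claim_value(claim)
        if not isinstance(value, dict) or value.get("precision", 0) < DAY_PRECISION:
            continue
        time = value.get("time", "")  # e.g. "+1952-03-11T00:00:00Z"
        if not time.startswith("+"):
            continue  # skip BCE dates in this sketch
        try:
            year, month, day = (int(p) for p in time[1:].split("T")[0].split("-"))
            return entity["id"], date(year, month, day)
        except ValueError:
            continue  # malformed or out-of-range date
    return None


def iter_birthdates(dump_path: str):
    """Stream (entity_id, birth_date) pairs from a compressed dump file."""
    with bz2.open(dump_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            record = parse_dump_line(line)
            if record is not None:
                yield record
```

Streaming line by line keeps memory use flat regardless of dump size, which is what makes it practical to fan the same per-line function out across worker processes.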
A summary table of dominant FFT frequencies is printed to the console.
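
The idea behind the spectral analysis can be shown with a dependency-free sketch: bin births by lunar-cycle position, then look for frequencies whose power stands out. The code below uses a naive discrete Fourier transform in pure Python instead of the `scipy` FFT the project depends on. The function names and the 30-bin synthetic histogram are assumptions made for illustration, not project data.

```python
import cmath
import math


def power_spectrum(counts):
    """Power at each frequency k (cycles per lunar cycle), for k = 0 .. n // 2."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # remove the constant offset
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def dominant_frequency(counts):
    """Index of the non-zero frequency with the highest power."""
    spectrum = power_spectrum(counts)
    return max(range(1, len(spectrum)), key=spectrum.__getitem__)


if __name__ == "__main__":
    # Synthetic histogram: flat baseline plus a ripple with 2 cycles per lunar cycle.
    bins = 30
    counts = [100 + 10 * math.cos(2 * math.pi * 2 * t / bins) for t in range(bins)]
    print(dominant_frequency(counts))
```

A genuine lunar effect would show up as excess power at low integer frequencies. A flat spectrum, by contrast, is what uniformly distributed births would produce.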
The nb/analysis.ipynb notebook contains the same analysis with inline output
for interactive exploration. Open it in VS Code or JupyterLab.

| Package | Purpose |
|---|---|
| `polars` | Fast DataFrame operations on Parquet data |
| `plotly` | Interactive and publication-quality plots |
| `ephem` | High-precision astronomical computations |
| `scipy` | FFT and signal processing |
| `orjson` | Fast JSON parsing for the Wikidata dump |
| `tqdm` | Progress bars for long-running extraction |