andreariba/moonbirths

MoonBirths

Are births influenced by the lunar cycle? This project extracts every human birth date from Wikidata, maps each date to its position in the lunar cycle, and visualises the results.

Repository structure

main.py                       # Orchestrator: data check → extraction → analysis → figures
pyproject.toml                # Project metadata and dependencies
README.md
data/
  wikidata_birthdates_parts/  # Parquet partitions of extracted birth records
nb/
  analysis.ipynb              # Interactive exploration notebook
output/                       # Generated SVG figures
src/
  birthdates/
    extract_birthdates_from_dump.py   # Multiprocess pipeline that extracts
                                      # human birth dates from a Wikidata JSON dump
    extract_birthdates_from_dump_v2.py
    wikidata_birthdates.csv
  moonphase/
    moonphase.py              # Computes lunar cycle progress and illumination
                              # for a given date using ephem (PyEphem)
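
To make "lunar cycle progress" and "illumination" concrete, here is a minimal sketch. It is not the project's moonphase.py and does not use ephem: it swaps in a mean-synodic-month approximation that needs only the standard library, so its result can differ from the true phase by several hours. The function names, the reference new moon (2000-01-06 18:14 UTC), and the convention 0 = new moon, 0.5 = full moon are assumptions made for illustration.

```python
import math
from datetime import datetime, timezone

# Mean length of one lunation (new moon to new moon), in days.
SYNODIC_MONTH_DAYS = 29.530588853
# Assumed reference epoch: the new moon of 6 January 2000, 18:14 UTC.
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_cycle_progress(when: datetime) -> float:
    """Approximate position in the lunar cycle: 0 = new moon, 0.5 = full moon."""
    if when.tzinfo is None:
        # Naive datetimes are treated as UTC (an assumption of this sketch).
        when = when.replace(tzinfo=timezone.utc)
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return (days / SYNODIC_MONTH_DAYS) % 1.0


def illumination(progress: float) -> float:
    """Approximate illuminated fraction of the disc for a cycle position."""
    return (1.0 - math.cos(2.0 * math.pi * progress)) / 2.0
```

Calling lunar_cycle_progress(datetime(1969, 7, 20)) gives the approximate cycle position for that date. An ephemeris-based computation, as the project does with ephem, accounts for the variation in lunation length that this approximation ignores.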

Prerequisites

  • Python ≥ 3.13
  • uv (recommended) or pip

Setup

# Clone the repository
git clone <repo-url> && cd moonbirths

# Create a virtual environment and install dependencies
uv sync          # or: pip install .

Wikidata dump (only needed if no data exists yet)

If the data/wikidata_birthdates_parts/ folder is empty, the extraction step requires the compressed Wikidata entity dump:

# ~70 GB download — place it at data/latest-all.json.bz2
wget -P data/ https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2
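
For orientation, the sketch below shows one way a birth date could be pulled out of a single line of the dump. It is not the project's extraction script: the function names are hypothetical, it uses the standard-library json module instead of orjson, and it runs in a single process. It relies on the dump being one JSON array with one entity per line, and on the Wikidata identifiers P31 (instance of), Q5 (human), and P569 (date of birth). As a simplifying assumption it keeps only day-precision dates with a positive four-digit year.

```python
import bz2
import json
from typing import Iterator, Optional, Tuple

HUMAN = "Q5"            # Wikidata item "human"
INSTANCE_OF = "P31"     # Wikidata property "instance of"
DATE_OF_BIRTH = "P569"  # Wikidata property "date of birth"
DAY_PRECISION = 11      # Wikidata time precision code meaning "day"


def _values(entity: dict, prop: str) -> Iterator[dict]:
    """Yield the value payload of every value-bearing claim for a property."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            yield snak["datavalue"]["value"]


def parse_dump_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (entity id, 'YYYY-MM-DD') for a human with a day-precise birth date."""
    line = line.strip().rstrip(",")
    if line in ("", "[", "]"):
        return None  # array brackets and blank lines carry no entity
    entity = json.loads(line)
    if not any(v.get("id") == HUMAN for v in _values(entity, INSTANCE_OF)):
        return None
    for value in _values(entity, DATE_OF_BIRTH):
        time = value.get("time", "")
        # Wikidata times look like "+1952-03-11T00:00:00Z".
        if value.get("precision") == DAY_PRECISION and time.startswith("+") and time[5:6] == "-":
            return entity["id"], time[1:11]
    return None


def iter_birthdates(path: str) -> Iterator[Tuple[str, str]]:
    """Stream the compressed dump and yield one record per matching entity."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            record = parse_dump_line(line)
            if record is not None:
                yield record
```

Streaming the file line by line keeps memory use flat regardless of dump size, which is what makes splitting the work across several processes practical.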

How to run

python main.py

main.py performs three steps:

  1. Data check — looks for .parquet files in data/wikidata_birthdates_parts/.
  2. Extraction (only if no data) — runs the multiprocess Wikidata dump parser (src/birthdates/extract_birthdates_from_dump.py). This processes the full latest-all.json.bz2 file and writes partitioned Parquet files.
  3. Analysis — loads the parquet data, computes the moon phase for every birth date, and generates the following figures in output/:
| Figure | Description |
| --- | --- |
| births_by_year.svg | Bar chart of record counts per birth year |
| births_lunarcycle_distributions.svg | Histogram of births across the lunar cycle |
| births_lunarcycle_distributions_split.svg | Same histogram split by half-century (1800–1999) |
| fft.svg | FFT power-spectrum analysis of lunar-cycle birth distribution |

A summary table of dominant FFT frequencies is printed to the console.
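
As an illustration of what "dominant frequencies" means here, the sketch below ranks the strongest periodic components of a sequence of counts. It is an assumption-laden stand-in, not the project's analysis: the project uses scipy, whereas this uses a naive discrete Fourier transform from the standard library, and both the function names and the choice of input (a plain list of equally spaced counts) are hypothetical.

```python
import cmath
from typing import List, Sequence, Tuple


def power_spectrum(counts: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (frequency index k, power) for k = 1 .. n // 2."""
    n = len(counts)
    mean = sum(counts) / n
    # Remove the mean so the constant offset does not dominate the spectrum.
    centered = [c - mean for c in counts]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((k, abs(coeff) ** 2))
    return spectrum


def dominant_frequencies(
    counts: Sequence[float], top: int = 3
) -> List[Tuple[int, float, float]]:
    """Return the strongest components as (k, period in samples, power)."""
    n = len(counts)
    ranked = sorted(power_spectrum(counts), key=lambda kp: kp[1], reverse=True)
    return [(k, n / k, power) for k, power in ranked[:top]]
```

The naive transform costs O(n²) and is only reasonable for short inputs. An FFT computes the same coefficients in O(n log n), which matters for long series.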

Interactive notebook

The nb/analysis.ipynb notebook contains the same analysis with inline output for interactive exploration. Open it in VS Code or JupyterLab.

Dependencies

| Package | Purpose |
| --- | --- |
| polars | Fast DataFrame operations on parquet data |
| plotly | Interactive and publication-quality plots |
| ephem | High-precision astronomical computations |
| scipy | FFT and signal processing |
| orjson | Fast JSON parsing for Wikidata dump |
| tqdm | Progress bars for long-running extraction |

About

An analysis of whether the lunar phase influences birth dates.
