
Python API

Reference for the two Python entry points. For a guided tour with full workloads, see Python examples. For a 60-second introduction, see the Quickstart.

  1. codegreen.Session — manual span-based measurement, imported and used directly in your code.
  2. CLI auto-instrumenter — runs codegreen measure ... over a script, injects checkpoints automatically.

Both share the same NEMB C++ backend, the same JSON output envelope, and the same libcodegreen-nemb.so ABI (v2+). They can coexist in one process.

Manual API: codegreen.Session

For end-to-end examples, see Python examples → Manual measurement with codegreen.Session.

import codegreen

with codegreen.Session("training-run") as s:
    with s.task("data_load"):
        load_data()
    with s.task("train"):
        train_model()

By default, results are written to codegreen_<pid>.json in the working directory. CSV is opt-in (pass output_file="x.csv" or output_format="csv"). Pass save_to_file=False to suppress file output.

Three usage forms are supported — context manager, explicit start_task / stop_task, and @codegreen.task decorator. Full code for each is in Python examples.

Constructor parameters

| Param | Default | Notes |
| --- | --- | --- |
| `name` | `"default"` | Session name written to output |
| `output_file` | `codegreen_<pid>.json` | Output path; CSV chosen automatically when the path ends in `.csv` |
| `output_format` | `"auto"` | `"auto"` \| `"json"` \| `"csv"` \| `"none"`; `"auto"` sniffs from the extension, defaults to JSON |
| `save_to_file` | `True` | Set `False` to suppress file writes entirely |
| `warn_on_concurrent` | `True` | Warn at construction if another codegreen process is active on the same host (RAPL is system-wide) |
| `record_time_series` | `False` | Capture sampled (timestamp, power, energy, per-domain) tuples for each task |
| `buffer_samples` | `None` | Power-user override of the C++ ring-buffer size; usually unnecessary because the Python drain is adaptive |
| `sample_interval_ms` | `None` (uses `config.json`) | Per-session override of the sampler's measurement interval; routes to the existing `coordinator.measurement_interval_ms` field via `nemb_set_measurement_interval_ms` (no parallel state) |
| `sampling_mode` | `"fixed"` | `"adaptive"` is reserved for a future runtime-rate-control mode; today only `"fixed"` is implemented |

Output schema (v0.4.7+)

Top-level keys: meta, tasks (list of task dicts), totals. Every numeric field carries an explicit unit suffix (_j, _s, _w, _ns). Field names are identical between the Session API and codegreen run CLI output.

{
  "meta": {
    "schema_version": "1",
    "codegreen_version": "0.4.8",
    "run_id": "b7856b409d72",
    "session_name": "training-run",
    "started_at":       "2026-05-10T18:16:56.209074+00:00",
    "ended_at":         "2026-05-10T18:17:01.345702+00:00",
    "started_at_local": "2026-05-10T11:16:56.209074-07:00",
    "ended_at_local":   "2026-05-10T11:17:01.345702-07:00",
    "host_timezone":    "PDT",
    "duration_total_s": 5.137,
    "hostname": "amd-epyc-9554p",
    "pid": 12345,
    "platform": "linux",
    "python_version": "3.13.0",
    "cpu_model": "AMD EPYC 9554P 64-Core Processor",
    "kernel": "Linux-5.15.0-...",
    "cwd": "/home/user/work",
    "argv": ["script.py"],
    "codegreen_env": {"CODEGREEN_LIB_PATH": "..."},
    "measurement_quality": "ok",
    "domain_support": "full",
    "outlier_method": "iqr_1.5",
    "iso_timestamp_format": "rfc3339_utc",
    "nemb_abi_version": 3,
    "domain_topology": {
      "package-0": {"top_level": true, "kind": "cpu_package", "includes": ["core"]},
      "core":      {"top_level": false, "kind": "nested", "includes": []},
      "gpu0":      {"top_level": true, "kind": "gpu", "includes": []}
    },
    "timeseries": {"enabled": true, "schema_version": "1",
                   "sample_keys": ["t_ns", "energy_j", "power_w", "domains"],
                   "t_ns_clock": "clock_monotonic",
                   "inclusive_of_children": true}
  },
  "tasks": [
    {"name": "data_load", "depth": 0, "parent": null,
     "energy_j": 12.4, "avg_power_w": 4.0, "duration_s": 3.1,
     "started_at": 1714155600.123, "ended_at": 1714155603.234,
     "started_at_mono_ns": 20364878312447553, "ended_at_mono_ns": 20364881412447553,
     "domains":         {"package-0": 10.2,  "core": 0.8,  "gpu0": 1.4},
     "domains_power_w": {"package-0": 3.29,  "core": 0.26, "gpu0": 0.45},
     "timeseries": [/* {t_ns, energy_j, power_w, domains}, ... */]}
  ],
  "totals": {
    "energy_j": 857.4,
    "duration_s":         123.1,
    "wall_duration_s":    125.5,
    "task_duration_s":    123.1,
    "gap_duration_s":       2.4,
    "concurrent_overlap_s": 0.0,
    "n_tasks": 2,
    "n_top_level_tasks": 2,
    "domains":         {"package-0": 705.1, "core": 56.2, "gpu0": 96.1},
    "domains_power_w": {"package-0": 5.73,  "core": 0.46, "gpu0": 0.78},
    "sample_interval_ms": 10,
    "worst_within_task_power_cv_percent": 7.25,
    "noise_warnings": []
  }
}
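Because every numeric field carries a unit suffix and the schema is stable, downstream tooling can consume the envelope with the standard library alone. A minimal sketch (the `summarize` helper is illustrative, not part of codegreen):

```python
import json

def summarize(report: dict) -> dict:
    """Reduce a codegreen report to name -> (energy_j, avg_power_w)."""
    return {t["name"]: (t["energy_j"], t["avg_power_w"])
            for t in report["tasks"]}

# A trimmed-down report in the documented shape.
report = json.loads("""
{"meta": {"schema_version": "1"},
 "tasks": [{"name": "data_load", "energy_j": 12.4,
            "avg_power_w": 4.0, "duration_s": 3.1}],
 "totals": {"energy_j": 857.4}}
""")
print(summarize(report))  # {'data_load': (12.4, 4.0)}
```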

meta — run identity & environment (every output, including failure paths)

| Field | Meaning |
| --- | --- |
| `schema_version` | Output-schema version. A bump indicates a breaking field rename or removal |
| `codegreen_version` | Installed library version |
| `run_id` | 12-hex-char UUID4 prefix; unique per process invocation, for log correlation |
| `session_name` | The `Session(name=…)` argument; `null` for CLI runs |
| `started_at` / `ended_at` | RFC 3339 UTC timestamps with `+00:00` offset, microsecond precision. The canonical correlation key; use these for joins, sorts, and cross-machine comparisons |
| `started_at_local` / `ended_at_local` | (v0.4.8+) The same instants rendered in the host's local timezone with its offset (e.g. `-07:00`). Display-only companions; never use for joins. UTC and local always describe the same instant within microseconds |
| `host_timezone` | (v0.4.8+) Local timezone label at measurement time (e.g. `PDT`, `ADT`, `+05:30` for non-DST regions) |
| `duration_total_s` | Monotonic-clock delta from session start to report build (NTP-immune) |
| `hostname`, `pid`, `platform`, `python_version` | Process & host identity |
| `cpu_model`, `kernel` | Hardware/OS reproducibility metadata |
| `cwd`, `argv` | Working directory & argv at measurement time |
| `codegreen_env` | Snapshot of all `CODEGREEN_*` environment variables |
| `measurement_quality` | `ok` \| `no_tasks` \| `no_backend` \| `energy_zero` \| `failed` \| `checkpoints_only` |
| `domain_support` | `full` (per-domain breakdown) \| `scalar_only` (overall energy only) \| `none` (no backend) |
| `outlier_method` | Which outlier filter was applied to multi-run statistics (default `"iqr_1.5"`) |
| `iso_timestamp_format` | Format contract for `started_at`/`ended_at`; pin it in case future versions change it |
| `nemb_abi_version` | C++ NEMB backend ABI version actually loaded |
| `domain_topology` | Machine-readable domain nesting, so consumers know which keys are top-level vs. nested |
| `timeseries` | Block describing whether a time series was recorded, plus its sample schema |

totals — aggregate metrics

task_duration_s is the sum of depth-0 task durations (the same window energy_j covers). wall_duration_s is the monotonic-clock delta from s.start() to s.stop(). gap_duration_s = wall − union(task intervals): uninstrumented time between tasks. concurrent_overlap_s is positive when tasks ran in parallel threads. domains_power_w[d] is the energy-weighted average power, Σ energy_d / Σ duration over the tasks where domain d was reported, so a domain present on only some tasks is not diluted.
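Those definitions can be reproduced from the task list in a few lines. A sketch of the interval-union behind gap_duration_s and the energy-weighted average behind domains_power_w, using made-up numbers (the helper functions are illustrative, not part of codegreen):

```python
def union_length(intervals):
    """Total length covered by a set of (start, end) intervals."""
    covered = 0.0
    prev_end = float("-inf")
    for start, end in sorted(intervals):
        start = max(start, prev_end)  # skip already-covered overlap
        if end > start:
            covered += end - start
            prev_end = end
    return covered

def domain_avg_power(tasks, d):
    """Energy-weighted average: sum of energy_d over the summed
    duration of only those tasks that reported domain d."""
    e = sum(t["domains"][d] for t in tasks if d in t["domains"])
    s = sum(t["duration_s"] for t in tasks if d in t["domains"])
    return e / s if s else 0.0

tasks = [
    {"domains": {"gpu0": 9.0}, "duration_s": 3.0},  # GPU-active task
    {"domains": {}, "duration_s": 7.0},             # no gpu0 reported
]
wall = 12.0
gap = wall - union_length([(0.0, 3.0), (4.0, 11.0)])
print(gap)                              # 2.0 s uninstrumented
print(domain_avg_power(tasks, "gpu0"))  # 3.0 W, undiluted by the 7 s task
```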

Per-task fields

avg_power_w = energy_j / duration_s. domains_power_w[d] = domains[d] / duration_s per task. started_at_mono_ns/ended_at_mono_ns (added v0.4.7) let consumers align task windows with timeseries[].t_ns exactly. The parent field is the immediately-enclosing task name when nested.

  • domains — per-domain RAPL/NVML energy (J) for the task, computed atomically with the session stop (ABI v2 — race-free under concurrent threads).
  • domains_power_w — per-domain average power (W), computed as domains[d] / duration_s. Same time-base as avg_power_w so the two are directly comparable.
  • Domain nesting caveat: domain energies are NOT disjoint. On Intel, package already includes pp0/core and pp1 (uncore/igpu); dram measures a physically-separate counter (Intel SDM Vol 4 §14.9 — MSR 0x611 vs MSR 0x619); gpu* (NVML) is fully independent. On AMD EPYC, only package-0 is exposed. So sum(domains.values()) ≠ energy_j by design — energy_j aggregates package + dram + gpu and excludes the pp0/pp1/core/uncore subsets. Use meta.domain_topology to programmatically distinguish top-level from nested domains.
  • DRAM is always included (v0.4.6+): Linux exposes DRAM at intel-rapl:0/intel-rapl:0:0/name=dram on Skylake-SP+ Xeons (sub-zone) and at intel-rapl:1/name=dram-0 on older Xeons (zone-level). v0.4.6 promotes both layouts equivalently into the energy_j total — earlier versions undercounted by 10-15% on memory-bound workloads on Skylake-SP+ chips.
  • timeseries — present only when record_time_series=True (ABI v3+). Each sample is self-describing:
| Key | Type | Unit | Meaning |
| --- | --- | --- | --- |
| `t_ns` | int | nanoseconds | `CLOCK_MONOTONIC` timestamp at sample (Linux); `mach_continuous_time` on macOS; `QueryPerformanceCounter` on Windows; all converted to ns |
| `energy_j` | float | joules | System-wide cumulative energy from session start (sum across all providers) |
| `power_w` | float | watts | System-wide instantaneous power at this sample (sum across all domains) |
| `domain_j` | Dict[str, float] | joules | Per-domain cumulative energy from session start (e.g. `package-0`, `core`, `dram`, `gpu0`) |
| `domain_w` | Dict[str, float] | watts | Per-domain average power since the previous sample. Domains whose provider does not expose per-domain power (Darwin IOReport, Windows EMI, AMD RAPL) are absent rather than reported as 0, so callers can distinguish "0 W" from "not measured" |

To extract only the GPU watts from a captured time series ts: [s["domain_w"].get("gpu0", 0.0) for s in ts].
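Likewise, meta.domain_topology is what lets a consumer sum domain energies without double-counting nested subsets. A sketch using the topology from the example output above (the helper function is illustrative):

```python
def top_level_energy(topology, domains):
    """Sum only the domains flagged top_level, skipping nested
    subsets like 'core' that are already counted in their parent."""
    return sum(domains.get(name, 0.0)
               for name, info in topology.items()
               if info["top_level"])

# Topology and per-task domain energies from the example report.
topology = {
    "package-0": {"top_level": True,  "kind": "cpu_package", "includes": ["core"]},
    "core":      {"top_level": False, "kind": "nested",      "includes": []},
    "gpu0":      {"top_level": True,  "kind": "gpu",         "includes": []},
}
domains = {"package-0": 10.2, "core": 0.8, "gpu0": 1.4}
print(top_level_energy(topology, domains))  # ~11.6 J; 'core' excluded
```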

TaskResult fields

| Field | Type | Meaning |
| --- | --- | --- |
| `name` | str | Task name passed to `start_task` / `task()` |
| `energy_j` | float | Total joules during the task (atomic via `nemb_stop_session_v2`) |
| `avg_power_w` | float | Average watts over the task window (= `energy_j / duration_s`) |
| `duration_s` | float | Task wall-clock seconds (monotonic-derived) |
| `started_at`, `ended_at` | float | Wall-clock POSIX seconds (display only) |
| `started_at_mono_ns`, `ended_at_mono_ns` | int | Monotonic-clock stamps for aligning with `timeseries[].t_ns` (v0.4.7+) |
| `depth`, `parent` | int, Optional[str] | Nesting info; `parent` is the immediately-enclosing task name |
| `domains` | Dict[str, float] | Per-RAPL/NVML-domain energy (J) for the task |
| `domains_power_w` | Dict[str, float] | Per-domain average power (W) = `domains[d] / duration_s`; same time-base as `avg_power_w` |
| `timeseries` | Optional[List[Dict]] | Sorted, deduplicated samples within [`started_at_mono_ns`, `ended_at_mono_ns`]. `None` when `record_time_series=False`; empty list when enabled but the task was shorter than one sample interval. Inclusive of children (a parent's timeseries contains its children's samples; see `meta.timeseries.inclusive_of_children`) |
| `noise` | Optional[Dict] | Quality summary computed from the timeseries |

See the timeseries-sample schema table above for sample keys (t_ns, energy_j, power_w, domain_j, domain_w).
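The monotonic stamps make per-task slicing of a session-level time series exact. A small illustrative helper (not part of the API):

```python
def samples_in_task(ts, start_ns, end_ns):
    """Select samples whose monotonic stamp falls inside the task window."""
    return [s for s in ts if start_ns <= s["t_ns"] <= end_ns]

# Toy trace: four samples at 100 ns spacing.
ts = [{"t_ns": n, "power_w": 1.0} for n in (100, 200, 300, 400)]
print([s["t_ns"] for s in samples_in_task(ts, 150, 350)])  # [200, 300]
```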

Noise / quality reporting

When record_time_series=True, every task carries a noise dict and totals carry a roll-up:

"noise": {
  "samples_captured":         2847,
  "samples_expected":         3000,
  "samples_expected_method":  "observed_median",
  "drop_ratio":               0.0510,
  "power_mean_w":             102.3,
  "power_std_w":                7.4,
  "power_cv_percent":           7.25,
  "sample_interval_ms":            1,
  "quality":                  "moderate"
},
"totals": {
  ...,
  "worst_within_task_power_cv_percent": 7.25,
  "noise_warnings": [
    {"task": "data_load", "depth": 0,
     "within_task_power_cv_percent": 17.8, "drop_ratio": 0.003,
     "quality": "high-noise",
     "reasons": ["within_task_power_cv_above_10pct"]}
  ]
}

samples_expected_method is "observed_median" (interval inferred from captured samples; default when n ≥ 3) or "configured" (falls back to sample_interval_ms). quality is bucketed by power_cv_percent: excellent <2 %, good <5 %, moderate <10 %, high-noise ≥10 %. A RuntimeWarning is emitted (and the task is appended as a structured record to totals.noise_warnings) when CV ≥10 % or drop_ratio ≥20 %. All thresholds live in config.json under measurement.report.noise_warning so they can be overridden without code changes. Computation runs once at stop() time and adds ~0.05 % bias vs record_time_series=False.
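The bucketing and warning rules reduce to a few comparisons. A sketch of the documented logic, with thresholds hard-coded here (the real values come from config.json); note the drop-ratio reason string is a guess by analogy, since only the CV reason appears in the example above:

```python
def classify_quality(power_cv_percent: float) -> str:
    """Bucket within-task power CV per the documented thresholds."""
    if power_cv_percent < 2.0:
        return "excellent"
    if power_cv_percent < 5.0:
        return "good"
    if power_cv_percent < 10.0:
        return "moderate"
    return "high-noise"

def warning_reasons(power_cv_percent, drop_ratio):
    """Collect the conditions that trigger a RuntimeWarning."""
    reasons = []
    if power_cv_percent >= 10.0:
        reasons.append("within_task_power_cv_above_10pct")
    if drop_ratio >= 0.20:
        # Hypothetical reason string; the documented example only
        # shows the CV-based reason.
        reasons.append("drop_ratio_above_20pct")
    return reasons

print(classify_quality(7.25))        # moderate
print(warning_reasons(17.8, 0.003))  # ['within_task_power_cv_above_10pct']
```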

Note — slight overhead when record_time_series=True. The drain thread that pulls samples out of the C++ ring buffer is cheap but not free. On reproducibility benchmarks (3 fresh subprocesses each, identical workload):

  • The mean energy/duration is unchanged: record_time_series=True vs =False agreed to ≤ 0.3 % (within run-to-run jitter).
  • The run-to-run spread is slightly wider with sampling on (CV of total energy ~5 % vs ~1 % with sampling off) because the drain wakes up at irregular intervals and competes briefly with the workload for CPU.

So enabling time-series gives you per-sample power, plot export and the noise/quality summary, at the cost of a marginally noisier individual total. Best-of-both-worlds: use it during development to inspect power traces and pick the right code regions, then turn it off for production benchmark runs where you want the tightest possible run-to-run CV.

Power-vs-time plotting

record_time_series=True collects samples at the coordinator's configured rate (config.json's coordinator.measurement_interval_ms, default 1 ms on this build). The Session.export_plot(path) helper renders a power-vs-time chart per task; area under the curve equals the task's energy.

with codegreen.Session("training", record_time_series=True) as s:
    with s.task("epoch1"): train_one_epoch()
    with s.task("epoch2"): train_one_epoch()
    s.export_plot("training.html")    # Plotly (interactive)
    s.export_plot("training.png")     # Matplotlib (static image)

Numerically, integrating w(t) over a task's window with the trapezoidal rule recovers the NEMB-reported energy_j to within ~0.2% (verified on a 5 s task with ~4,800 samples).
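That cross-check is easy to reproduce from any captured time series: integrate power_w over t_ns with the trapezoidal rule and compare against the reported energy_j. A self-contained sketch with a synthetic constant-power trace:

```python
def trapezoid_energy_j(samples):
    """Integrate power (W) over monotonic time (ns) -> joules."""
    total = 0.0
    for a, b in zip(samples, samples[1:]):
        dt_s = (b["t_ns"] - a["t_ns"]) / 1e9
        total += 0.5 * (a["power_w"] + b["power_w"]) * dt_s
    return total

# 5 s of samples at 1 ms spacing, steady 100 W: expect ~500 J.
ts = [{"t_ns": i * 1_000_000, "power_w": 100.0} for i in range(5001)]
print(trapezoid_energy_j(ts))  # ~500.0
```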

Time-series correctness for long tasks

The C++ sampling ring buffer is fixed-size (default 1000 samples — at the default 1 ms interval that's a ~1 s window; with sample_interval_ms=10 it's a ~10 s window, etc.). To prevent silent loss on long tasks, the Session runs a Python drain thread that pulls samples out faster than the buffer rotates. Drain is adaptive:

  • starts at 0.5 s,
  • halves to a 50 ms floor when buffer >50% saturated on a single drain pass,
  • doubles to a 2 s ceiling when <10% for three consecutive drains,
  • emits a warning at >90% saturation suggesting buffer_samples override.

Verified on a 30-second task with defaults only: 28,460 samples, full span, zero gaps >50 ms.
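The adaptation policy above can be modelled as a small state machine. This is a sketch of the documented rules, not the actual drain-thread implementation:

```python
FLOOR_S, CEILING_S = 0.05, 2.0

class DrainInterval:
    """Adaptive drain interval per the documented policy."""
    def __init__(self):
        self.interval_s = 0.5   # starting interval
        self.calm_passes = 0    # consecutive <10%-saturation drains

    def update(self, saturation: float) -> float:
        if saturation > 0.9:
            print("warning: ring buffer >90% full; consider buffer_samples")
        if saturation > 0.5:    # over half full on one pass: speed up
            self.interval_s = max(FLOOR_S, self.interval_s / 2)
            self.calm_passes = 0
        elif saturation < 0.1:  # quiet: slow down after three calm passes
            self.calm_passes += 1
            if self.calm_passes >= 3:
                self.interval_s = min(CEILING_S, self.interval_s * 2)
                self.calm_passes = 0
        else:
            self.calm_passes = 0
        return self.interval_s

d = DrainInterval()
first = d.update(0.6)
print(first)                # 0.25: halved after a >50% saturated pass
for _ in range(3):
    rate = d.update(0.05)
print(rate)                 # 0.5: doubled after three calm drains
```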

Sampling rate

Pre-existing: config.json's coordinator.measurement_interval_ms is the startup default (loaded by nemb::ConfigLoader::load_config()).

Per-session override: pass sample_interval_ms=N to Session(...) — it calls nemb_set_measurement_interval_ms which writes the same config_.measurement_interval field the sample loop reads. No parallel sampling-rate state, no duplicate config parsing.

Behavior rules

  • Single session per process. Constructing a second Session while one is active raises RuntimeError.
  • Mismatched stops raise RuntimeError with the actual innermost task name.
  • Forgotten .stop() is recovered by an atexit hook — the file is still written, the JSON envelope still emitted.
  • Concurrent threads can each maintain their own task stack (per-thread). nemb_stop_session_v2 makes domain breakdown race-free.
  • Forked children become no-ops automatically; only the parent process reports.
  • No NEMB lib loaded (CodeGreen built without the C++ backend) → Session degrades to a warning plus zero-energy results; your program still runs.

Multi-process / RAPL caveat

RAPL counters are system-wide, not per-process. If two CodeGreen sessions overlap in wall time on the same socket, both readings include the other's energy (double-counting). The Session constructor warns when it detects another live CodeGreen pid via $XDG_RUNTIME_DIR/codegreen-<uid>.pids. For benchmarks, run sequentially or accept "system energy during this window" semantics.

Runtime module (auto-instrumenter)

codegreen/instrumentation/language_runtimes/python/codegreen_runtime.py

This module is injected into instrumented code automatically. It uses ctypes to call libcodegreen-nemb.so.

checkpoint()

def checkpoint(checkpoint_id: str, name: str, checkpoint_type: str):
    """Mark a checkpoint in the energy measurement stream."""

Called by instrumented code at function boundaries:

from codegreen_runtime import checkpoint

checkpoint(checkpoint_id="1", name="my_function", checkpoint_type="enter")
# ... function body ...
checkpoint(checkpoint_id="2", name="my_function", checkpoint_type="exit")

Each call records a timestamped marker at a cost of roughly 100 ns. The NEMB backend tracks invocations automatically (#inv_N suffix).

measure_checkpoint()

def measure_checkpoint(checkpoint_id: str, checkpoint_type: str,
                       name: str, line_number: int, context: str):
    """Record a checkpoint marker with full metadata."""

Lower-level function with additional context. checkpoint() delegates to this.

Auto-instrumenter output format

At process exit (atexit), the runtime prints checkpoint data to stdout:

--- CODEGREEN_RESULT_START ---
{"measurements": [
  {"checkpoint_id": "enter:main:1#inv_1_t...", "timestamp": 13973..., "joules": 6.80, "watts": 0.76},
  {"checkpoint_id": "exit:main:2#inv_1_t...", "timestamp": 13973..., "joules": 8.91, "watts": 71.94}
]}
--- CODEGREEN_RESULT_END ---

The CLI parses this output to extract measurement results.
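Extracting the payload is a matter of slicing between the two markers and decoding the JSON. An illustrative sketch of what such a parser might look like (the CLI's actual parser is not shown here):

```python
import json

START = "--- CODEGREEN_RESULT_START ---"
END = "--- CODEGREEN_RESULT_END ---"

def extract_measurements(stdout_text: str):
    """Pull the JSON payload printed between the result markers."""
    start = stdout_text.index(START) + len(START)
    end = stdout_text.index(END, start)
    return json.loads(stdout_text[start:end])["measurements"]

captured = """some program output
--- CODEGREEN_RESULT_START ---
{"measurements": [{"checkpoint_id": "enter:main:1", "joules": 6.8}]}
--- CODEGREEN_RESULT_END ---
"""
print(extract_measurements(captured)[0]["joules"])  # 6.8
```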

CLI usage

These commands drive the auto-instrumenter; the Quickstart and CLI reference cover them in full:

codegreen measure python script.py                              # basic
codegreen measure python script.py -g fine --export-plot energy.html
codegreen measure python script.py --json
codegreen analyze python script.py --save-instrumented --output-dir ./out

Package structure

codegreen/
  cli/cli.py                              # Typer CLI
  instrumentation/
    engine.py                             # MeasurementEngine
    language_engine.py                    # Tree-sitter parsing + query matching
    ast_processor.py                      # Checkpoint injection
    configs/*.json                        # Language-specific instrumentation configs
    language_runtimes/
      python/codegreen_runtime.py         # Python ctypes bridge to NEMB + Session
      java/CodeGreenRuntime.java          # Java JNI bridge to NEMB
  analyzer/plot.py                        # Plotly / matplotlib visualization
  measurement/src/nemb/
    codegreen_energy.cpp                  # C API + EnergyMeter implementation