Reference for the two Python entry points. For a guided tour with full workloads, see Python examples. For a 60-second introduction, see the Quickstart.
- `codegreen.Session` — manual span-based measurement, imported and used directly in your code.
- CLI auto-instrumenter — runs `codegreen measure ...` over a script, injects checkpoints automatically.
Both share the same NEMB C++ backend, the same JSON output envelope, and the same libcodegreen-nemb.so ABI (v2+). They can coexist in one process.
For end-to-end examples, see Python examples → Manual measurement with codegreen.Session.
```python
import codegreen

with codegreen.Session("training-run") as s:
    with s.task("data_load"):
        load_data()
    with s.task("train"):
        train_model()
```

By default, results are written to `codegreen_<pid>.json` in the working directory. CSV is opt-in (pass `output_file="x.csv"` or `output_format="csv"`). Pass `save_to_file=False` to suppress file output.
Three usage forms are supported — context manager, explicit start_task / stop_task, and @codegreen.task decorator. Full code for each is in Python examples.
| Param | Default | Notes |
|---|---|---|
| `name` | `"default"` | Session name written to output |
| `output_file` | `codegreen_<pid>.json` | Output path; CSV chosen automatically when the path ends in `.csv` |
| `output_format` | `"auto"` | `"auto"` \| `"json"` \| `"csv"` \| `"none"`; `"auto"` sniffs from the extension and defaults to JSON |
| `save_to_file` | `True` | Set `False` to suppress file writes entirely |
| `warn_on_concurrent` | `True` | Warn at construction if another codegreen process is active on the same host (RAPL is system-wide) |
| `record_time_series` | `False` | Capture sampled (timestamp, power, energy, per-domain) tuples for each task |
| `buffer_samples` | `None` | Power-user override of the C++ ring-buffer size; usually unnecessary because the Python drain is adaptive |
| `sample_interval_ms` | `None` (uses config.json) | Per-session override of the sampler's measurement interval; routes to the existing `coordinator.measurement_interval_ms` field via `nemb_set_measurement_interval_ms` — no parallel state |
| `sampling_mode` | `"fixed"` | `"adaptive"` is reserved for a future runtime-rate-control mode; today only `"fixed"` is implemented |
Top-level keys: `meta`, `tasks` (a list of task dicts), and `totals`. Every numeric field carries an explicit unit suffix (`_j`, `_s`, `_w`, `_ns`). Field names are identical between the Session API and `codegreen run` CLI output.
```json
{
  "meta": {
    "schema_version": "1",
    "codegreen_version": "0.4.8",
    "run_id": "b7856b409d72",
    "session_name": "training-run",
    "started_at": "2026-05-10T18:16:56.209074+00:00",
    "ended_at": "2026-05-10T18:17:01.345702+00:00",
    "started_at_local": "2026-05-10T11:16:56.209074-07:00",
    "ended_at_local": "2026-05-10T11:17:01.345702-07:00",
    "host_timezone": "PDT",
    "duration_total_s": 5.137,
    "hostname": "amd-epyc-9554p",
    "pid": 12345,
    "platform": "linux",
    "python_version": "3.13.0",
    "cpu_model": "AMD EPYC 9554P 64-Core Processor",
    "kernel": "Linux-5.15.0-...",
    "cwd": "/home/user/work",
    "argv": ["script.py"],
    "codegreen_env": {"CODEGREEN_LIB_PATH": "..."},
    "measurement_quality": "ok",
    "domain_support": "full",
    "outlier_method": "iqr_1.5",
    "iso_timestamp_format": "rfc3339_utc",
    "nemb_abi_version": 3,
    "domain_topology": {
      "package-0": {"top_level": true, "kind": "cpu_package", "includes": ["core"]},
      "core": {"top_level": false, "kind": "nested", "includes": []},
      "gpu0": {"top_level": true, "kind": "gpu", "includes": []}
    },
    "timeseries": {"enabled": true, "schema_version": "1",
                   "sample_keys": ["t_ns", "energy_j", "power_w", "domains"],
                   "t_ns_clock": "clock_monotonic",
                   "inclusive_of_children": true}
  },
  "tasks": [
    {"name": "data_load", "depth": 0, "parent": null,
     "energy_j": 12.4, "avg_power_w": 4.0, "duration_s": 3.1,
     "started_at": 1714155600.123, "ended_at": 1714155603.234,
     "started_at_mono_ns": 20364878312447553, "ended_at_mono_ns": 20364881412447553,
     "domains": {"package-0": 10.2, "core": 0.8, "gpu0": 1.4},
     "domains_power_w": {"package-0": 3.29, "core": 0.26, "gpu0": 0.45},
     "timeseries": [/* {t_ns, energy_j, power_w, domains}, ... */]}
  ],
  "totals": {
    "energy_j": 857.4,
    "duration_s": 123.1,
    "wall_duration_s": 125.5,
    "task_duration_s": 123.1,
    "gap_duration_s": 2.4,
    "concurrent_overlap_s": 0.0,
    "n_tasks": 2,
    "n_top_level_tasks": 2,
    "domains": {"package-0": 705.1, "core": 56.2, "gpu0": 96.1},
    "domains_power_w": {"package-0": 5.73, "core": 0.46, "gpu0": 0.78},
    "sample_interval_ms": 10,
    "worst_within_task_power_cv_percent": 7.25,
    "noise_warnings": []
  }
}
```

| Field | Meaning |
|---|---|
| `schema_version` | output-schema version; a bump indicates a breaking field rename or removal |
| `codegreen_version` | installed library version |
| `run_id` | 12-hex-char UUID4 prefix; unique per process invocation, for log correlation |
| `session_name` | the `Session(name=…)` argument; `null` for CLI runs |
| `started_at` / `ended_at` | RFC 3339 UTC timestamps with `+00:00` offset, microsecond precision. The canonical correlation key — use these for joins, sorts, and cross-machine comparisons |
| `started_at_local` / `ended_at_local` | (v0.4.8+) same instant rendered in the host's local timezone with its offset (e.g. `-07:00`). Display-only companion; never use for joins. UTC and local always describe the same instant within microseconds |
| `host_timezone` | (v0.4.8+) local timezone label at measurement time (e.g. `PDT`, `ADT`, `+05:30` for non-DST regions) |
| `duration_total_s` | monotonic-clock delta from session start to report build (NTP-immune) |
| `hostname`, `pid`, `platform`, `python_version` | process and host identity |
| `cpu_model`, `kernel` | hardware/OS reproducibility metadata |
| `cwd`, `argv` | working directory and argv at measurement time |
| `codegreen_env` | snapshot of all `CODEGREEN_*` environment variables |
| `measurement_quality` | `ok` \| `no_tasks` \| `no_backend` \| `energy_zero` \| `failed` \| `checkpoints_only` |
| `domain_support` | `full` (per-domain breakdown) \| `scalar_only` (overall energy only) \| `none` (no backend) |
| `outlier_method` | which outlier filter was applied to multi-run statistics (default `"iqr_1.5"`) |
| `iso_timestamp_format` | format contract for `started_at`/`ended_at`; pin in case future versions change it |
| `nemb_abi_version` | C++ NEMB backend ABI version actually loaded |
| `domain_topology` | machine-readable domain nesting (so consumers know which keys are top-level vs. nested) |
| `timeseries` | block describing whether a time series was recorded, plus its sample schema |
`task_duration_s` is the sum of depth-0 task durations (matching `energy_j`'s window); `wall_duration_s` is `s.start()` → `s.stop()` on the monotonic clock; `gap_duration_s` = wall − union(task intervals), i.e. uninstrumented work between tasks; `concurrent_overlap_s` is positive when tasks ran in parallel threads. `domains_power_w[d]` is the energy-weighted average power: Σ energy_d / Σ duration over the tasks where `d` was reported (so a domain present on only some tasks is not diluted).
Per task, `avg_power_w = energy_j / duration_s` and `domains_power_w[d] = domains[d] / duration_s`. `started_at_mono_ns`/`ended_at_mono_ns` (added v0.4.7) let consumers align task windows exactly with `timeseries[].t_ns`. The `parent` field is the immediately-enclosing task name when nested.
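These identities can be checked offline against an envelope's task list. A minimal sketch, operating on dicts shaped like the `tasks` entries above (the helper name is illustrative, not part of the codegreen API):

```python
# Sketch of the totals bookkeeping described above, applied to task dicts
# shaped like the "tasks" entries in the output envelope.

def totals_from_tasks(tasks, wall_duration_s):
    top = [t for t in tasks if t["depth"] == 0]
    task_duration_s = sum(t["duration_s"] for t in top)

    # Union of top-level task intervals (merge overlaps), on the monotonic clock.
    intervals = sorted((t["started_at_mono_ns"], t["ended_at_mono_ns"]) for t in top)
    union_ns, cur_start, cur_end = 0, None, None
    for s, e in intervals:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                union_ns += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        union_ns += cur_end - cur_start
    union_s = union_ns / 1e9

    gap_duration_s = wall_duration_s - union_s        # uninstrumented time
    concurrent_overlap_s = task_duration_s - union_s  # > 0 when tasks ran in parallel

    # Energy-weighted per-domain average power: only tasks that reported the
    # domain contribute to its denominator, so partial domains are not diluted.
    domains_power_w = {}
    all_domains = {d for t in top for d in t["domains"]}
    for d in all_domains:
        e = sum(t["domains"][d] for t in top if d in t["domains"])
        dur = sum(t["duration_s"] for t in top if d in t["domains"])
        domains_power_w[d] = e / dur

    return {
        "task_duration_s": task_duration_s,
        "gap_duration_s": gap_duration_s,
        "concurrent_overlap_s": concurrent_overlap_s,
        "domains_power_w": domains_power_w,
    }
```

For two sequential tasks of 2 s and 3 s inside a 7 s session, this yields `task_duration_s = 5.0` and `gap_duration_s = 2.0`, and a GPU domain reported on only the first task is averaged over 2 s, not 5 s.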
- `domains` — per-domain RAPL/NVML energy (J) for the task, computed atomically with the session stop (ABI v2 — race-free under concurrent threads).
- `domains_power_w` — per-domain average power (W), computed as `domains[d] / duration_s`. Same time-base as `avg_power_w`, so the two are directly comparable.
- Domain nesting caveat: domain energies are NOT disjoint. On Intel, `package` already includes `pp0`/`core` and `pp1` (uncore/igpu); `dram` measures a physically separate counter (Intel SDM Vol 4 §14.9 — MSR 0x611 vs MSR 0x619); `gpu*` (NVML) is fully independent. On AMD EPYC, only `package-0` is exposed. So `sum(domains.values()) ≠ energy_j` by design — `energy_j` aggregates package + dram + gpu and excludes the `pp0`/`pp1`/`core`/`uncore` subsets. Use `meta.domain_topology` to programmatically distinguish top-level from nested domains.
- DRAM is always included (v0.4.6+): Linux exposes DRAM at `intel-rapl:0/intel-rapl:0:0/name=dram` on Skylake-SP+ Xeons (sub-zone) and at `intel-rapl:1/name=dram-0` on older Xeons (zone-level). v0.4.6 promotes both layouts equivalently into the `energy_j` total — earlier versions undercounted by 10-15% on memory-bound workloads on Skylake-SP+ chips.
- `timeseries` — present only when `record_time_series=True` (ABI v3+). Each sample is self-describing:
| Key | Type | Unit | Meaning |
|---|---|---|---|
| `t_ns` | int | nanoseconds | `CLOCK_MONOTONIC` timestamp at sample (Linux); `mach_continuous_time` on macOS; `QueryPerformanceCounter` on Windows — all converted to ns |
| `energy_j` | float | joules | system-wide cumulative energy from session start (sum across all providers) |
| `power_w` | float | watts | system-wide instantaneous power at this sample (sum across all domains) |
| `domain_j` | Dict[str, float] | joules | per-domain cumulative energy from session start (e.g. `package-0`, `core`, `dram`, `gpu0`) |
| `domain_w` | Dict[str, float] | watts | per-domain average power since the previous sample. Domains whose provider does not expose per-domain power (Darwin IOReport, Windows EMI, AMD RAPL) are absent rather than reported as 0, so callers can distinguish "0 W" from "not measured" |
So to get only GPU watts directly: `[s["domain_w"].get("gpu0", 0.0) for s in ts]`.
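Given the domain-nesting caveat above, `meta.domain_topology` lets a consumer sum only the disjoint (top-level) domains without hard-coding the Intel/AMD layout. A minimal sketch (the helper name is illustrative, not part of the codegreen API):

```python
# Sketch: use meta.domain_topology to sum only top-level (disjoint) domains,
# so nested subsets like "core" are not double-counted.

def top_level_energy(task_domains, domain_topology):
    return sum(
        joules
        for name, joules in task_domains.items()
        if domain_topology.get(name, {}).get("top_level", False)
    )

topology = {
    "package-0": {"top_level": True, "kind": "cpu_package", "includes": ["core"]},
    "core":      {"top_level": False, "kind": "nested", "includes": []},
    "gpu0":      {"top_level": True, "kind": "gpu", "includes": []},
}
domains = {"package-0": 10.2, "core": 0.8, "gpu0": 1.4}
top_level_energy(domains, topology)  # ≈ 11.6 — "core" excluded, already inside package-0
```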
| Field | Type | Meaning |
|---|---|---|
| `name` | str | task name passed to `start_task` / `task()` |
| `energy_j` | float | total joules during the task (atomic via `nemb_stop_session_v2`) |
| `avg_power_w` | float | average watts over the task window (= `energy_j / duration_s`) |
| `duration_s` | float | task wall-clock seconds (monotonic-derived) |
| `started_at`, `ended_at` | float | wall-clock POSIX seconds (display only) |
| `started_at_mono_ns`, `ended_at_mono_ns` | int | monotonic-clock stamps for aligning with `timeseries[].t_ns` (v0.4.7+) |
| `depth`, `parent` | int, Optional[str] | nesting info; `parent` is the immediately-enclosing task name |
| `domains` | Dict[str, float] | per-RAPL/NVML-domain energy (J) for the task |
| `domains_power_w` | Dict[str, float] | per-domain average power (W) = `domains[d] / duration_s`; same time-base as `avg_power_w` |
| `timeseries` | Optional[List[Dict]] | sorted, deduplicated samples within [`started_at_mono_ns`, `ended_at_mono_ns`]. `None` when `record_time_series=False`; empty list when enabled but the task was shorter than one sample interval. Inclusive of children (a parent's timeseries contains its children's samples — see `meta.timeseries.inclusive_of_children`) |
| `noise` | Optional[Dict] | quality summary computed from the timeseries |
See the timeseries-sample schema table above for sample keys (t_ns, energy_j, power_w, domain_j, domain_w).
When record_time_series=True, every task carries a noise dict and totals carry a roll-up:
```json
"noise": {
  "samples_captured": 2847,
  "samples_expected": 3000,
  "samples_expected_method": "observed_median",
  "drop_ratio": 0.0510,
  "power_mean_w": 102.3,
  "power_std_w": 7.4,
  "power_cv_percent": 7.25,
  "sample_interval_ms": 1,
  "quality": "moderate"
},
"totals": {
  ...,
  "worst_within_task_power_cv_percent": 7.25,
  "noise_warnings": [
    {"task": "data_load", "depth": 0,
     "within_task_power_cv_percent": 17.8, "drop_ratio": 0.003,
     "quality": "high-noise",
     "reasons": ["within_task_power_cv_above_10pct"]}
  ]
}
```

`samples_expected_method` is `"observed_median"` (interval inferred from captured samples; the default when n ≥ 3) or `"configured"` (falls back to `sample_interval_ms`). `quality` is bucketed by `power_cv_percent`: excellent < 2 %, good < 5 %, moderate < 10 %, high-noise ≥ 10 %. A `RuntimeWarning` is emitted (and the task is appended as a structured record to `totals.noise_warnings`) when CV ≥ 10 % or `drop_ratio` ≥ 20 %. All thresholds live in config.json under `measurement.report.noise_warning` so they can be overridden without code changes. Computation runs once at `stop()` time and adds ~0.05 % bias vs `record_time_series=False`.
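The bucketing and warning rules can be sketched as two small functions. The CV thresholds are the documented ones; the `drop_ratio_above_20pct` reason string is an assumption for illustration (only the CV reason appears in the example output):

```python
# Sketch of the quality bucketing and warning logic described above.
# Real thresholds live in config.json under measurement.report.noise_warning.

def quality_bucket(power_cv_percent):
    if power_cv_percent < 2:
        return "excellent"
    if power_cv_percent < 5:
        return "good"
    if power_cv_percent < 10:
        return "moderate"
    return "high-noise"

def warning_reasons(power_cv_percent, drop_ratio):
    reasons = []
    if power_cv_percent >= 10:
        reasons.append("within_task_power_cv_above_10pct")
    if drop_ratio >= 0.20:
        reasons.append("drop_ratio_above_20pct")  # assumed reason string
    return reasons
```

With the example above, `quality_bucket(7.25)` gives `"moderate"` and `quality_bucket(17.8)` gives `"high-noise"` with the CV reason attached.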
Note — slight overhead when record_time_series=True.
The drain thread that pulls samples out of the C++ ring buffer is cheap but not free. On reproducibility benchmarks (3 fresh subprocesses each, identical workload):
- The mean energy/duration is unchanged: `record_time_series=True` vs `=False` agreed to ≤ 0.3 % (within run-to-run jitter).
- The run-to-run spread is slightly wider with sampling on (CV of total energy ~5 % vs ~1 % off) because the drain wakes up at irregular intervals and competes briefly with the workload for CPU.
So enabling time-series gives you per-sample power, plot export and the noise/quality summary, at the cost of a marginally noisier individual total. Best-of-both-worlds: use it during development to inspect power traces and pick the right code regions, then turn it off for production benchmark runs where you want the tightest possible run-to-run CV.
record_time_series=True collects samples at the coordinator's configured rate (config.json's coordinator.measurement_interval_ms, default 1 ms on this build). The Session.export_plot(path) helper renders a power-vs-time chart per task; area under the curve equals the task's energy.
```python
with codegreen.Session("training", record_time_series=True) as s:
    with s.task("epoch1"): train_one_epoch()
    with s.task("epoch2"): train_one_epoch()
s.export_plot("training.html")  # Plotly (interactive)
s.export_plot("training.png")   # Matplotlib (static image)
```

Numerically, integrating w(t) over a task's window with the trapezoidal rule recovers the NEMB-reported `energy_j` to within ~0.2% (verified on a 5 s task with ~4,800 samples).
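That trapezoid check can be reproduced offline from a task's timeseries. A minimal sketch (the helper name is illustrative, not part of the codegreen API):

```python
# Sketch: recover a task's energy by trapezoidal integration of its
# (t_ns, power_w) samples, as pulled from the task's timeseries.

def energy_from_trace(samples):
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0) / 1e9  # ns → s
    return total

# A constant 4 W held for 3 s, sampled every 100 ms, integrates to ~12 J:
trace = [(i * 100_000_000, 4.0) for i in range(31)]
energy_from_trace(trace)  # ≈ 12.0 J (4 W × 3 s)
```

The result should track the atomically measured `energy_j`; a large gap suggests dropped samples or a task shorter than a couple of sample intervals.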
The C++ sampling ring buffer is fixed-size (default 1000 samples — at the default 1 ms interval that's a ~1 s window; with sample_interval_ms=10 it's a ~10 s window, etc.). To prevent silent loss on long tasks, the Session runs a Python drain thread that pulls samples out faster than the buffer rotates. Drain is adaptive:
- starts at 0.5 s,
- halves to a 50 ms floor when buffer >50% saturated on a single drain pass,
- doubles to a 2 s ceiling when <10% for three consecutive drains,
- emits a warning at >90% saturation suggesting a `buffer_samples` override.
Verified on a 30-second task with defaults only: 28,460 samples, full span, zero gaps >50 ms.
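The adaptation rules above can be sketched as a tiny controller (names and structure are illustrative; the real logic lives inside the Session's drain thread):

```python
# Sketch of the adaptive drain schedule: halve toward a 50 ms floor under
# pressure, double toward a 2 s ceiling after three calm passes.

FLOOR_S, CEIL_S, START_S = 0.05, 2.0, 0.5

def next_drain_interval(interval_s, saturation, calm_streak):
    """saturation: ring-buffer fill ratio seen on this drain pass.
    calm_streak: consecutive passes so far with saturation < 10%."""
    if saturation > 0.90:
        print("warning: ring buffer >90% saturated; consider a buffer_samples override")
    if saturation > 0.50:
        return max(FLOOR_S, interval_s / 2), 0   # back off fast
    calm_streak = calm_streak + 1 if saturation < 0.10 else 0
    if calm_streak >= 3:
        return min(CEIL_S, interval_s * 2), 0    # relax slowly
    return interval_s, calm_streak
```

Starting from the 0.5 s default, one pass at 60% saturation drops the interval to 0.25 s, while three consecutive passes under 10% double it to 1 s.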
Pre-existing: config.json's `coordinator.measurement_interval_ms` is the startup default (loaded by `nemb::ConfigLoader::load_config()`).
Per-session override: pass `sample_interval_ms=N` to `Session(...)` — it calls `nemb_set_measurement_interval_ms`, which writes the same `config_.measurement_interval` field the sample loop reads. No parallel sampling-rate state, no duplicate config parsing.
- Single session per process. Constructing a second `Session` while one is active raises `RuntimeError`.
- Mismatched stops raise `RuntimeError` with the actual innermost task name.
- A forgotten `.stop()` is recovered by an `atexit` hook — the file is still written, the JSON envelope still emitted.
- Concurrent threads can each maintain their own task stack (per-thread). `nemb_stop_session_v2` makes the domain breakdown race-free.
- Forked children become no-ops automatically; only the parent process reports.
- No NEMB lib loaded (CodeGreen built without the C++ backend) → the Session degrades to a warning + zero-energy results; your program still runs.
RAPL counters are system-wide, not per-process. If two CodeGreen sessions overlap in wall time on the same socket, both readings include the other's energy (double-counting). The Session constructor warns when it detects another live CodeGreen pid via $XDG_RUNTIME_DIR/codegreen-<uid>.pids. For benchmarks, run sequentially or accept "system energy during this window" semantics.
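A consumer-side sketch of that liveness check, assuming one pid per line in the pids file (the actual on-disk format is an implementation detail of codegreen):

```python
# Sketch: detect other live CodeGreen processes from a shared pids file,
# skipping our own pid and pruning stale entries.
import os

def other_live_codegreen_pids(pids_path):
    try:
        with open(pids_path) as f:
            pids = [int(line) for line in f if line.strip()]
    except FileNotFoundError:
        return []
    alive = []
    for pid in pids:
        if pid == os.getpid():
            continue
        try:
            os.kill(pid, 0)        # signal 0: existence check, delivers nothing
            alive.append(pid)
        except ProcessLookupError:
            pass                   # stale entry, process is gone
        except PermissionError:
            alive.append(pid)      # exists but owned by another user
    return alive
```

A non-empty result is exactly the situation where two overlapping sessions would double-count RAPL energy.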
codegreen/instrumentation/language_runtimes/python/codegreen_runtime.py
This module is injected into instrumented code automatically. It uses ctypes to call libcodegreen-nemb.so.
```python
def checkpoint(checkpoint_id: str, name: str, checkpoint_type: str):
    """Mark a checkpoint in the energy measurement stream."""
```

Called by instrumented code at function boundaries:

```python
from codegreen_runtime import checkpoint

checkpoint(checkpoint_id="1", name="my_function", checkpoint_type="enter")
# ... function body ...
checkpoint(checkpoint_id="2", name="my_function", checkpoint_type="exit")
```

Each call records a ~100 ns timestamp signal. The NEMB backend tracks invocations automatically (`#inv_N` suffix).
```python
def measure_checkpoint(checkpoint_id: str, checkpoint_type: str,
                       name: str, line_number: int, context: str):
    """Record a checkpoint marker with full metadata."""
```

Lower-level function with additional context; `checkpoint()` delegates to this.
At process exit (atexit), the runtime prints checkpoint data to stdout:
```
--- CODEGREEN_RESULT_START ---
{"measurements": [
  {"checkpoint_id": "enter:main:1#inv_1_t...", "timestamp": 13973..., "joules": 6.80, "watts": 0.76},
  {"checkpoint_id": "exit:main:2#inv_1_t...", "timestamp": 13973..., "joules": 8.91, "watts": 71.94}
]}
--- CODEGREEN_RESULT_END ---
```
The CLI parses this output to extract measurement results.
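A minimal consumer-side sketch of that parse, relying only on the sentinel lines shown above:

```python
# Sketch: extract the measurements JSON from a program's captured stdout
# by slicing between the CODEGREEN_RESULT sentinel lines.
import json

START = "--- CODEGREEN_RESULT_START ---"
END = "--- CODEGREEN_RESULT_END ---"

def parse_result(stdout_text):
    start = stdout_text.index(START) + len(START)
    end = stdout_text.index(END, start)
    return json.loads(stdout_text[start:end])

out = """normal program output...
--- CODEGREEN_RESULT_START ---
{"measurements": [{"checkpoint_id": "enter:main:1#inv_1", "joules": 6.8}]}
--- CODEGREEN_RESULT_END ---
"""
parse_result(out)["measurements"][0]["joules"]  # → 6.8
```

Slicing between sentinels keeps the parse robust to any other text the workload prints on stdout.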
These commands drive the auto-instrumenter; the Quickstart and CLI reference cover them in full:
```shell
codegreen measure python script.py                                  # basic
codegreen measure python script.py -g fine --export-plot energy.html
codegreen measure python script.py --json
codegreen analyze python script.py --save-instrumented --output-dir ./out
```

```
codegreen/
  cli/cli.py                           # Typer CLI
  instrumentation/
    engine.py                          # MeasurementEngine
    language_engine.py                 # Tree-sitter parsing + query matching
    ast_processor.py                   # Checkpoint injection
    configs/*.json                     # Language-specific instrumentation configs
    language_runtimes/
      python/codegreen_runtime.py      # Python ctypes bridge to NEMB + Session
      java/CodeGreenRuntime.java       # Java JNI bridge to NEMB
  analyzer/plot.py                     # Plotly / matplotlib visualization
measurement/src/nemb/
  codegreen_energy.cpp                 # C API + EnergyMeter implementation
```