Status: Draft v1.0 (Reviewed 2025-11-04)
Owners: Pricing Platform Product & Engineering
Consumers: Data Engineering, Pricing API, Analytics, Ops
Change control: ADR + PR review
Cross-References:
- DOC-master-catalog-prd-v1.0.md: Master system catalog and dependency map
- STD-data-architecture-prd-v1.0.md: Data ingestion lifecycle and storage patterns
- STD-parser-contracts-prd-v2.0.md: Parser core contracts (ParseResult, versioning)
- STD-parser-contracts-impl-v2.0.md: Parser implementation templates
- STD-qa-testing-prd-v1.0.md: Testing requirements for MPFS ingestion
- REF-nearest-zip-resolver-prd-v1.0.md: ZIP resolver for geography mapping
- REF-cms-pricing-source-map-prd-v1.0.md: Source inventory & work-backwards checklist
- REF-pricing-calculator-prd-v1.0.md
Every MPFS ingester or schema change must trace back to REF-cms-pricing-source-map-prd-v1.0.md. Confirm the source table entry, authoritative layout, and checklist completion before authoring code or submitting review.
- Classification: Public CMS release (Internal derived metrics and aggregations)
- License & Attribution: CMS MPFS files (public domain); include CMS citation in manifests and curated docs
- Data Owner / Steward: Pricing Platform Product (product owner), Data Engineering (technical steward)
- Downstream Visibility: Curated tables are Internal; external pricing surfaces must pass compliance review and mask non-public enrichments
- Sources & Cadence: Quarterly RVU A/B/C/D files, annual Locality & GPCI tables, annual Conversion Factor notices, CMS abstracts/policy files
- Schema Contracts: `cms_pricing/ingestion/contracts/cms_pprrvu_v1.0.json`, `cms_gpci_v1.0.json`, `cms_localitycounty_v1.0.json`, `cms_anescf_v1.0.json`; versions pinned in manifests
- Landing Layout: `/raw/mpfs/{release_id}/files/*` with `manifest.json` recording `source_url`, `fetched_at`, `release_id`, `sha256`, `license`, `notes_url`
- Natural Keys & Partitioning: RVU tables keyed by `(hcpcs, modifier, quarter_vintage)`; locality/GPCI keyed by `(carrier_id, locality_code, valuation_year)`; curated snapshots partitioned by `vintage_date`
- Validations & Gates: Structural (required files), schema contract enforcement, HCPCS format and effective-dating checks, indicator completeness, locality/GPCI join coverage ≥99.5%, quarter-over-quarter diff thresholds (±1% row drift)
- Quarantine Policy: Failed records land in `/stage/mpfs/{release_id}/reject/` with rule code + payload; publish blocks on any critical errors or missing quarters
- Enrichment & Crosswalks: Join to ZIP→Locality resolver for analytics keys; compute effective windows using valuation quarter and CMS effective_from metadata
- Outputs: `/curated/mpfs/{vintage}/mpfs_rvu.parquet`, `mpfs_indicators_all.parquet`, `mpfs_locality.parquet`, `mpfs_gpci.parquet`, `mpfs_cf_vintage.parquet`, plus latest-effective views for API usage
- SLAs: Land + publish ≤7 business days from CMS posting; manifest digests recorded; backfills re-run through identical validations
- Deviations: None; any exceptions require ADR and update to this summary
- Discovery Manifest & Governance: MPFS ingestor uses snapshot-based discovery (reuses RVU/GPCI snapshots via `DatasetSnapshotService`) and `ConversionFactorFetcher` for CF artifacts. No dedicated MPFS scraper; discovery generates manifests recording snapshot reuse vs download entries. CI runs `tools/verify_source_map.py` so `REF-cms-pricing-source-map-prd-v1.0.md` stays synchronized with discovered artifacts.
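
The join-coverage and row-drift gates listed under Validations & Gates could be checked along the following lines. This is a minimal sketch rather than the pipeline's implementation: the function names, the tuple representation of keys, and the use of plain Python sets are assumptions made for illustration. Only the 99.5% and ±1% thresholds come from this summary.

```python
from typing import Iterable, Set, Tuple

# (carrier_id, locality_code, valuation_year), as described in the summary.
LocalityKey = Tuple[str, str, int]

MIN_JOIN_COVERAGE = 0.995  # locality/GPCI join coverage gate (>= 99.5%)
MAX_ROW_DRIFT = 0.01       # quarter-over-quarter row drift gate (+/- 1%)


def join_coverage(locality_keys: Iterable[LocalityKey],
                  gpci_keys: Iterable[LocalityKey]) -> float:
    """Fraction of distinct locality keys that have a matching GPCI key."""
    localities: Set[LocalityKey] = set(locality_keys)
    if not localities:
        return 0.0
    matched = localities & set(gpci_keys)
    return len(matched) / len(localities)


def row_drift(prior_rows: int, current_rows: int) -> float:
    """Signed relative change in row count versus the prior quarter."""
    if prior_rows <= 0:
        raise ValueError("prior_rows must be positive to compute drift")
    return (current_rows - prior_rows) / prior_rows


def passes_gates(coverage: float, drift: float) -> bool:
    """True only when both gates are within their thresholds."""
    return coverage >= MIN_JOIN_COVERAGE and abs(drift) <= MAX_ROW_DRIFT
```

A publish step would call `passes_gates(join_coverage(...), row_drift(...))` and block when it returns `False`; how a breach is classified (critical vs warning) is left to the gate configuration.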
CMS publishes four planned RVU refreshes per calendar year plus ad-hoc correction notices. We align ingest scope and release identifiers to CMS suffixes so downstream lineage is deterministic.
| CMS Release | Expected Publish | DIS Release ID Examples | Notes |
|---|---|---|---|
| RVU25A | Early January | `rvu_2025_A`, `gpci_2025_A` | Final Rule baseline. |
| RVU25B | April 1 | `rvu_2025_B` | Mid-year update; overrides B quarter. |
| RVU25C | July 1 | `rvu_2025_C` | Mid-year update; overrides C quarter. |
| RVU25D | October 1 | `rvu_2025_D` | Mid-year update; overrides D quarter. |
| RVU25AR/BR/CR/DR | As needed | `rvu_2025_AR`, etc. | Correction notices; suffix keeps CMS naming. |
Implications for the ingestor:
- `MPFSIngestor.ingest(year, quarter)` normalizes `quarter`/`suffix`/`rvu_YYYY_SUFFIX` so operators can target the correct CMS release; defaults to the latest registered snapshot when no quarter is supplied.
- Snapshot discovery must prefer the requested release ID; if absent, the run should fail fast with instructions to ingest RVU first.
- RVU publish stage registers curated outputs with dataset-specific release IDs (`rvu_YYYY_S`, `gpci_YYYY_S`, `anescf_YYYY_S`, `locality_YYYY_S`, `oppscap_YYYY_S`). MPFS discovery therefore requests the matching prefix per dataset; shared release IDs are no longer accepted.
- Snapshot metadata persists manifest URLs for provenance, but `DatasetSnapshotService` resolves the actual parquet path before MPFS loads a dataset. Local manifest-only entries are acceptable as long as the referenced parquet exists on disk.
- If operators request a quarter (`Q2`, `B`, etc.) that does not exist in `dataset_snapshots`, the ingestor must fail fast with a `ValueError` instructing them to run the missing RVU ingestion first. MPFS is prohibited from silently falling back to “latest” when an explicit quarter is supplied.
- Metadata emitted by the MPFS run (batch manifest, observability record, `mpfs_cf_vintage`) includes `target_release_suffix`, `requested_release_param`, and the resolved RVU/GPCI release IDs to keep lineage auditable across quarters.
- Curated Views: `mpfs_rvu_latest`, `mpfs_gpci_latest`, and `mpfs_cf_current` provide Latest-Effective semantics for pricing services
- Digest Pinning: APIs must accept `X-Dataset-Digest` / `?digest` matching curated manifest digests
- Access Controls: Internal APIs behind token-based auth per STD-api-security-and-auth-prd-v1.0.md; pricing outputs include attribution if surfaced externally
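
The release-targeting rules listed under "Implications for the ingestor" (normalize the operator's input, default to the latest snapshot only when no quarter is given, and otherwise fail fast) can be sketched as below. This is an illustration under stated assumptions, not the actual `MPFSIngestor` code: the helper names `normalize_release_id` and `resolve_release`, the `Q1`..`Q4` to `A`..`D` alias table, and the representation of `dataset_snapshots` as a list ordered oldest to newest are all hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumed alias table: calendar-quarter labels mapped to CMS release suffixes.
QUARTER_ALIASES = {"Q1": "A", "Q2": "B", "Q3": "C", "Q4": "D"}
VALID_SUFFIXES = {"A", "B", "C", "D", "AR", "BR", "CR", "DR"}


def normalize_release_id(year: int, quarter: str, prefix: str = "rvu") -> str:
    """Turn 'Q2', 'b', or 'rvu_2025_B' into a dataset-specific release ID."""
    token = quarter.strip().upper()
    full = re.fullmatch(r"[A-Z]+_(\d{4})_([A-Z]+)", token)
    if full:
        if int(full.group(1)) != year:
            raise ValueError(f"release {quarter!r} does not match year {year}")
        token = full.group(2)
    suffix = QUARTER_ALIASES.get(token, token)
    if suffix not in VALID_SUFFIXES:
        raise ValueError(f"unrecognized quarter or suffix: {quarter!r}")
    return f"{prefix}_{year}_{suffix}"


def resolve_release(year: int, quarter: Optional[str],
                    registered: Sequence[str], prefix: str = "rvu") -> str:
    """Pick the release to load; `registered` is assumed ordered oldest to newest."""
    if quarter is None:
        # No explicit quarter: defaulting to the latest snapshot is allowed.
        candidates = [r for r in registered if r.startswith(f"{prefix}_{year}_")]
        if not candidates:
            raise ValueError(
                f"no {prefix} snapshot registered for {year}; run RVU ingestion first"
            )
        return candidates[-1]
    # Explicit quarter: never fall back to "latest"; fail fast instead.
    release_id = normalize_release_id(year, quarter, prefix)
    if release_id not in registered:
        raise ValueError(
            f"{release_id} is not registered; run the RVU ingestion for that release first"
        )
    return release_id
```

Calling the resolver once per dataset with the matching `prefix` (`rvu`, `gpci`, and so on) mirrors the requirement that shared release IDs are no longer accepted.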
Persist the full Medicare Physician Fee Schedule (MPFS) inputs — RVUs, policy/status indicators, Localities & GPCIs, and annual Conversion Factors — as authoritative datasets following the DIS pipeline.
The ingestor performs validation, normalization, and provenance tracking but does not compute payment amounts.
Payment calculations are executed on-request by the Pricing API using these curated inputs.
This design minimizes recomputation and decouples pricing logic from ingestion cadence.
- Included
  - RVU quarterly files (A / B / C / D)
  - National / abstract / payment files (when published)
  - Locality & GPCI files
  - Annual Conversion Factors (Physician + Anesthesia)
  - All published policy / status / indicator columns
  - Curated DIS input tables (`mpfs_rvu`, `mpfs_gpci`, `mpfs_cf_vintage`, etc.) for use by real-time calculators
- Excluded (in v1)
  - Sequestration adjustments
  - Site-neutral transforms / overrides
  - Pre-computed payment outputs (facility / non-facility amounts now computed dynamically)
- Adjacent / references
  - PFS Look-Up Tool (used for API parity testing)
  - OPPS Addendum B (future site-neutral analysis)
- PFS Relative Value Files (e.g. RVU25A/B/C/D) — quarterly refreshes
- PFS documentation / abstracts
- Localities & GPCI (yearly sets + occasional updates)
- Conversion Factors (annually)
- Update cadence & vintage
  - Ingest CY 2023–2025 now; retain 6 years
  - Monthly scraper to detect updates
  - Freeze historical quarterlies; diff reports between versions
- CMS Physician Fee Schedule Relative Value Files — authoritative quarterly RVU bundles (A/B/C/D) and supporting documentation
- CMS Physician Fee Schedule Look-Up Tool — parity benchmark for sampled HCPCS pricing checks
- CMS Locality Key & GPCI Documentation — locality metadata and GPCI definitions used in geography joins
- CY 2025 Physician Fee Schedule Final Rule / Conversion Factor Release — current conversion factor guidance
- How to Use the Medicare Physician Fee Schedule (CMS Booklet) — official payment calculation methodology reference
Raw tables:
- `rvu_raw_YYYYQ`: all columns from the corresponding RVU file
- `pfs_abstract_raw_YYYYQ`: abstracts / national payment snippets
- `locality_raw_YYYY`: locality metadata (carrier, locality, name, state, etc.)
- `gpci_raw_YYYY`: GPCI indices (Work, PE, MP) by locality
- `cf_raw_YYYY`: Physician & Anesthesia conversion factors

Curated tables:
- `mpfs_rvu`: core RVUs + indicators keyed by `(hcpcs, modifier, quarter_vintage)`
- `mpfs_indicators_all`: exploded table of policy/indicator flags for analytics
- `mpfs_locality`: locality dimension (id, code, name)
- `mpfs_gpci`: GPCI indices per locality and vintage
- `mpfs_cf_vintage`: CF values per year
- `mpfs_link_keys`: minimal key set for downstream joins `(hcpcs, modifier, quarter, locality_id)`
Note: MPFS ingestion no longer produces `mpfs_payment_curated`. All payment computations are performed dynamically at request time.
- Primary key: `(hcpcs, modifier, quarter_vintage)`
- Localities keyed by `(carrier_id, locality_code, year)`
- ZIP linkage: use external ZIP → Locality resolver to derive pricing views (compute price downstream)
- Schema presence checks (no dropped columns)
- Idempotency (checksum, content type, consistent downloads)
- Vintage lock (old versions immutable)
- Diff reports: compare each new quarterly version with prior (additions, deletions, indicator changes)
- Dashboards track the DIS five pillars (freshness, volume, schema, quality, lineage) with release-by-release diff summaries
- Non-blocking parity review compares sampled HCPCS rows against the CMS PFS Look-Up Tool to confirm facility vs non-facility splits
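
The quarterly diff report described above (additions, deletions, indicator changes) could be produced along these lines. The sketch assumes each quarter has already been loaded into a dictionary keyed by `(hcpcs, modifier)` whose values map indicator column names to their verbatim values; the function name `diff_quarters` and that in-memory shape are hypothetical and chosen only to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]        # (hcpcs, modifier)
Indicators = Dict[str, str]  # indicator column -> verbatim value


def diff_quarters(prior: Dict[Key, Indicators],
                  current: Dict[Key, Indicators]) -> Dict[str, List]:
    """Compare two quarterly versions and list what changed."""
    # Keys present only in the new quarter, and only in the old one.
    added = sorted(current.keys() - prior.keys())
    deleted = sorted(prior.keys() - current.keys())

    # For keys in both quarters, record every indicator whose value differs.
    changed = []
    for key in sorted(prior.keys() & current.keys()):
        old, new = prior[key], current[key]
        for column in sorted(old.keys() | new.keys()):
            if old.get(column) != new.get(column):
                changed.append((key, column, old.get(column), new.get(column)))

    return {"added": added, "deleted": deleted, "indicator_changes": changed}
```

Sorting the keys keeps the report deterministic, so two runs over the same pair of vintages yield byte-identical output.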
- Retain all status / indicator columns verbatim (global surgery, PC/TC, bilateral, multiple proc, etc.)
- Document meaning of key indicators in annex (e.g. “Status Indicator A means …”)
- Computation Timing: The MPFS ingestor no longer computes payment amounts during ingestion. Instead, it validates and stores authoritative input data (RVU, GPCI, Conversion Factor) for on-request calculation.
- Runtime Calculator Responsibility: Payment formulas are applied by the Pricing API when requests are made, ensuring values remain current with CF or GPCI changes.
- Formula Reference: `Payment = CF × [(Work_RVU × GPCI_Work) + (PE_RVU × GPCI_PE) + (MP_RVU × GPCI_MP)]`
- Caching: The Pricing API may cache computed prices per `(hcpcs, modifier, zip, year)` for performance.
- DIS Role: The ingestor focuses purely on validation, lineage, and schema enforcement; no facility/non-facility calculations are materialized.
- Conversion Factor Scope: v1 retains the Physician CF; anesthesia or mid-year factors may be added later via YAML or CLI overrides.
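
To make the Formula Reference concrete, the following sketch applies it to one set of inputs. The class and function names are hypothetical, the numbers in the usage note are made-up round values rather than CMS data, and rounding is deliberately left to the caller because this document does not specify a rounding policy. The caller is assumed to pass whichever PE RVU (facility or non-facility) matches the request context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rvu:
    work: float
    pe: float  # facility or non-facility PE RVU, chosen by the caller
    mp: float


@dataclass(frozen=True)
class Gpci:
    work: float
    pe: float
    mp: float


def mpfs_payment(rvu: Rvu, gpci: Gpci, conversion_factor: float) -> float:
    """Payment = CF * [(Work_RVU * GPCI_Work) + (PE_RVU * GPCI_PE) + (MP_RVU * GPCI_MP)]."""
    geographically_adjusted = (
        rvu.work * gpci.work
        + rvu.pe * gpci.pe
        + rvu.mp * gpci.mp
    )
    return conversion_factor * geographically_adjusted
```

With illustrative inputs `Rvu(1.0, 2.0, 0.5)`, `Gpci(1.0, 0.5, 2.0)`, and a conversion factor of `10.0`, the bracketed sum is 1.0 + 1.0 + 1.0 = 3.0 and the payment is 30.0.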
- The Pricing API performs MPFS payment calculations on-request using curated input datasets.
- Facility vs. non-facility pricing is determined at runtime based on request context.
- The calculator reads the latest curated inputs:
  - `mpfs_rvu` (RVUs + indicators)
  - `mpfs_gpci` (geography indices)
  - `mpfs_cf_vintage` (conversion factors)
- Results are cached for repeat queries to improve latency.
- Parity with CMS Look-Up Tool is validated through API-level QA tests, not ingestion-time checks.
- Default behaviour derives the physician conversion factor directly from the registered RVU snapshot; CMS download/override is used only when a release publishes out-of-band CF guidance (e.g., correction notice).
- When CMS publishes a CF artefact, the fetcher downloads and caches under `data/ingestion/mpfs/raw/{year}`; metadata records the source URL or override path.
- Manual overrides are managed via the YAML Config Service (preferred) or CLI flags (emergency fallback). Overrides are keyed by release suffix (e.g., `mpfs_2025_B.yaml`) or year-level files with per-release sections.
- CF governance supports both calculator and batch audit use cases; ingestion records metadata, not computed values.
YAML Config Service (cms_pricing/ingestion/services/mpfs_config_service.py):
- Per-release configuration files: `cf_overrides/{release_id}.yaml`. Year-level files (e.g., `cf_overrides/mpfs_2025.yaml`) may contain a `releases:` map with entries such as `A`, `B`, `2025_Q2`, etc.; the service merges matching keys (suffix + quarter alias) so checksum/path overrides can be specified independently.
- YAML schema (top-level or under `releases` entries):
  - `manual_override_path: "/path/to/cf_2025.xlsx"`
  - `expected_checksum: "abc123def456..."`
- Config service checks YAML first (if available), falls back to CLI flags if missing or invalid
- Config is cached in-memory for process lifetime (restart required for updates)
- Error handling: Missing YAML → WARN + fallback; Malformed YAML → Error with file path/line number + fallback
- When no override is supplied, ingestion logs `conversion_factor_strategy=derive_from_rvu` and records the RVU release ID used to compute CF rows; overrides switch the strategy to `download`.
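
The lookup order described above (YAML first, CLI flags as fallback, otherwise derive from the RVU snapshot) can be sketched as follows. This is not the code in `mpfs_config_service.py`: the function names, the returned dictionary shape, and the assumption that the YAML file has already been parsed into a plain mapping are illustrative. The strategy strings `derive_from_rvu` and `download` and the two YAML keys are the ones named in this section.

```python
from typing import Mapping, Optional


def merge_release_config(year_config: Mapping, suffix: str) -> dict:
    """Overlay a `releases:` entry on top of the file's top-level keys."""
    merged = {k: v for k, v in year_config.items() if k != "releases"}
    merged.update((year_config.get("releases") or {}).get(suffix, {}))
    return merged


def resolve_cf_override(yaml_config: Optional[Mapping],
                        cli_path: Optional[str] = None,
                        cli_checksum: Optional[str] = None) -> dict:
    """YAML first; fall back to CLI flags; otherwise derive CF from the RVU snapshot."""
    if yaml_config and yaml_config.get("manual_override_path"):
        return {
            "strategy": "download",
            "source": "yaml",
            "path": yaml_config["manual_override_path"],
            "checksum": yaml_config.get("expected_checksum"),
        }
    if cli_path:
        return {
            "strategy": "download",
            "source": "cli",
            "path": cli_path,
            "checksum": cli_checksum,
        }
    # No override anywhere: the default strategy applies.
    return {"strategy": "derive_from_rvu", "source": None, "path": None, "checksum": None}
```

Because per-release entries are merged over the top-level keys, a year-level file can set a shared `expected_checksum` once and override only `manual_override_path` for a single release, or the reverse.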
CLI Flags (Fallback):
- IMPORTANT: CLI flags remain the primary mechanism until the YAML service is live and production-ready
- Pass `--cf-override-path` and `--cf-expected-checksum` when invoking the ingestion script
- Once the YAML service is stable, CLI flags become override/emergency only
Migration Path:
- Current State: CLI flags are primary method
- After YAML Service Lands: YAML config becomes primary, CLI flags become override/emergency only
- Recommended: Store override files under `data/ingestion/mpfs/manual_overrides/` with restricted permissions
Provenance:
- Snapshot registry must capture override provenance by storing `manifest_url` or `curated_path` pointing to the CF artefact used
- Override metadata (path + checksum) persisted on `mpfs_cf_vintage` snapshot metadata

- Sequestration or other adjustments are applied after base price
- Geographic floors / overrides handled as metadata flags (not transforms)
- CMS publishes these as open program data; include attribution in manifests
- Record source URL, last-modified, robots/ToS metadata
- Follow `prds/RUN-mpfs-ingestion-v1.0.md` for quarter-end validation, including HCPCS spot checks against CMS RVU documentation and locality/GPCI sampling across MACs
- Confirm conversion factors (physician and anesthesia) are pinned in `mpfs_cf_vintage` for the current calendar year before enabling downstream consumers
- Capture release notes and supporting PDFs alongside manifests so operators can reference CMS change summaries during incident triage
- Incorporate sequestration reductions downstream
- Add flags / override logic for geographic floors (e.g. PE floor)
- Site-neutral analytics via joining into OPPS data
- Support retroactive updates or corrections (CMS errata)