METHODOLOGY
How Fonteum is built.
Three principles govern every record: federal-only sourcing, row-level provenance, and daily reconciliation against the upstream federal release.
v1.3 · Updated 2026-05-25 · 17+ federal source families
Principle 1
Federal source data only.
Every record in Fonteum originates from CMS or HHS-OIG. No claims aggregators. No commercial data vendors. No scraped third parties.
CMS publishes the primary source for facility quality, provider identity, staffing, and financial performance. HHS-OIG publishes the federal exclusion list. These two agencies represent the authoritative, legally mandated reporting layer for U.S. healthcare providers.
When a commercial vendor says “we have more data,” they mean they have added unverifiable enrichment on top of the same federal baseline. Fonteum does not do that. What you see is what the federal record says.
Source agencies
CMS
Centers for Medicare & Medicaid Services. 12 datasets covering facility quality, provider identity, staffing, cost reports, and payment programs.
HHS-OIG
U.S. Department of Health & Human Services Office of Inspector General. 1 dataset: the List of Excluded Individuals and Entities (LEIE).
Principle 2
Row-level provenance.
Every record carries source URL, ingestion timestamp, methodology version, and confidence tier. These four fields survive export — they travel with the data.
This is the provenance contract. A buyer who receives a Fonteum export can trace any field back to the exact federal release that produced it. No opaque pipeline.
The FHIR R4 API surfaces this as structured meta.tag arrays. The JSON export surfaces it as flat provenance cells per field.
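To illustrate how the two shapes relate, the sketch below flattens a FHIR `meta.tag` array into a flat provenance mapping. The helper name `flatten_provenance` and the rule of keying each value by the last path segment of `system` are assumptions made for this example, not Fonteum's documented export logic. The tag values mirror the sample resource on this page.

```python
from typing import Any


def flatten_provenance(resource: dict[str, Any]) -> dict[str, str]:
    """Map each meta.tag entry to {last path segment of system: code}.

    Hypothetical helper: the keying rule is an assumption for illustration.
    """
    tags = resource.get("meta", {}).get("tag", [])
    flat: dict[str, str] = {}
    for tag in tags:
        key = tag["system"].rstrip("/").rsplit("/", 1)[-1]
        flat[key] = tag["code"]
    return flat


sample_resource = {
    "meta": {
        "tag": [
            {"system": "https://fonteum.com/fhir/provenance", "code": "cms-pecos"},
            {"system": "https://fonteum.com/fhir/snapshot-date", "code": "2026-05-25"},
            {"system": "https://fonteum.com/fhir/methodology", "code": "us-core-practitioner/v1"},
            {"system": "https://fonteum.com/fhir/confidence", "code": "high"},
        ]
    }
}

provenance = flatten_provenance(sample_resource)
```

A resource with no `meta` block yields an empty mapping rather than an error, which keeps the helper safe to run over mixed input.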
Provenance fields, by FHIR resource:
{
  "meta": {
    "tag": [
      {
        "system": "https://fonteum.com/fhir/provenance",
        "code": "cms-pecos",
        "display": "CMS PECOS Ordering & Referring"
      },
      {
        "system": "https://fonteum.com/fhir/snapshot-date",
        "code": "2026-05-25",
        "display": "Snapshot date"
      },
      {
        "system": "https://fonteum.com/fhir/methodology",
        "code": "us-core-practitioner/v1",
        "display": "Methodology version"
      },
      {
        "system": "https://fonteum.com/fhir/confidence",
        "code": "high",
        "display": "Confidence tier"
      }
    ]
  }
}
Principle 3
Daily reconciliation.
Every dataset is diffed against its federal source after each new CMS or HHS-OIG release: within 24 hours for daily and monthly datasets, and within 48 hours for quarterly and annual datasets. Drift is logged. The snapshot date on each record is the federal release date, not our ingestion date.
Reconciliation catches fields that changed between releases, records added to or removed from the federal file, and methodology version bumps that affect field definitions.
Changes that affect published figures result in a public corrections-log entry. Methodology version bumps result in a new entry in the changelog below.
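A minimal sketch of the record-level part of that diff, assuming each release has already been loaded into a dictionary keyed by a stable record identifier. The function name `reconcile`, the record IDs, and the field values are made up for illustration. Methodology version bumps are outside the scope of this sketch.

```python
def reconcile(previous: dict[str, dict], current: dict[str, dict]) -> dict:
    """Diff two releases keyed by a stable record identifier."""
    prev_ids, curr_ids = set(previous), set(current)
    changed: dict[str, list[str]] = {}
    for record_id in prev_ids & curr_ids:
        old, new = previous[record_id], current[record_id]
        # A field counts as changed if its value differs or it exists on one side only.
        fields = sorted(f for f in old.keys() | new.keys() if old.get(f) != new.get(f))
        if fields:
            changed[record_id] = fields
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        "changed": changed,
    }


# Made-up sample releases.
previous_release = {
    "A": {"name": "Facility A", "beds": 100},
    "B": {"name": "Facility B", "beds": 80},
}
current_release = {
    "A": {"name": "Facility A", "beds": 120},
    "C": {"name": "Facility C", "beds": 60},
}

drift = reconcile(previous_release, current_release)
```

Here `drift` reports record `C` as added, record `B` as removed, and the `beds` field of record `A` as changed.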
Daily datasets: 2 · within 24h of CMS release
Monthly datasets: 3 · within 24h of monthly release
Quarterly datasets: 7 · within 48h of quarterly release
Annual datasets: 3 · within 48h of annual release
Technical methodology
How each layer works.
Source-pack ingestion
How a public dataset becomes a wired source.
Five-stage pipeline: (1) versioned manifest — authority, license, field list, restricted-source check; (2) bounded dry-run pilot and match-rate validation; (3) confidence calibration and low-confidence exclusion; (4) registry write to the sources registry; (5) public surfacing via SourceChip and ProvenanceCard.
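The first two stages can be pictured as simple gates. The sketch below is hypothetical: the `SourceManifest` field names follow the manifest contents listed above, but the class itself, the helper names, and the 0.9 match-rate threshold are assumptions, since the page does not state the actual threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceManifest:
    authority: str
    license: str
    fields: tuple[str, ...]
    restricted: bool  # outcome of the restricted-source check


def manifest_problems(manifest: SourceManifest) -> list[str]:
    """Stage 1 gate: list every reason the manifest cannot proceed."""
    problems = []
    if not manifest.authority:
        problems.append("missing authority")
    if not manifest.license:
        problems.append("missing license")
    if not manifest.fields:
        problems.append("empty field list")
    if manifest.restricted:
        problems.append("restricted source")
    return problems


def pilot_passes(matched: int, sampled: int, min_match_rate: float = 0.9) -> bool:
    """Stage 2 gate: match-rate validation on a bounded pilot sample.

    The 0.9 default is an assumed value for illustration only.
    """
    if sampled <= 0:
        return False
    return matched / sampled >= min_match_rate


# Illustrative manifests with placeholder values.
good = SourceManifest("CMS", "example-license", ("npi", "name"), False)
bad = SourceManifest("", "example-license", (), True)
```

Returning the full list of problems, rather than failing on the first one, lets a reviewer fix a manifest in a single pass.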
Entity matching
Connecting profiles to source records.
Deterministic match on stable identifier (NPI, CMS Certification Number) with probabilistic fallback on name, address, and taxonomy code. Confidence tier assigned at match time. Low-confidence matches stay internal and are never displayed publicly.
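As a rough sketch of the two-step lookup, the code below tries an exact NPI match first and falls back to a similarity score over name and address within the same taxonomy code. Plain string similarity from Python's `difflib` stands in for a probabilistic matcher here; the function names, the 0.85 threshold, and the sample records are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.lower().split())


def _similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def match_profile(profile: dict, source_records: list[dict], min_similarity: float = 0.85):
    """Return (record, method), or (None, None) when nothing clears the bar."""
    # Step 1: deterministic match on the stable identifier.
    npi = profile.get("npi")
    if npi:
        for record in source_records:
            if record.get("npi") == npi:
                return record, "deterministic"

    # Step 2: fallback on name and address, restricted to the same taxonomy code.
    best, best_score = None, 0.0
    for record in source_records:
        if record.get("taxonomy") != profile.get("taxonomy"):
            continue
        score = (
            _similarity(profile["name"], record["name"])
            + _similarity(profile["address"], record["address"])
        ) / 2
        if score > best_score:
            best, best_score = record, score
    if best is not None and best_score >= min_similarity:
        return best, "probabilistic"
    return None, None


# Made-up records; the NPIs are not real.
records = [
    {"npi": "1000000001", "name": "Jane Q. Doe MD",
     "address": "12 Main St, Springfield", "taxonomy": "207N00000X"},
    {"npi": "1000000002", "name": "John Roe DC",
     "address": "40 Oak Ave, Springfield", "taxonomy": "111N00000X"},
]
```

The returned method label is what a later step would use to assign the confidence tier at match time.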
Provenance architecture
The four-piece contract.
Every public field carries source name, last-checked ISO date, confidence tier, and display permission. Aligned with W3C PROV-DM Entity / Activity / Agent decomposition and the FAIR data principles — Findable, Accessible, Interoperable, Reusable. Wire shape at /data-provenance; UI shape at SourceChip.
Confidence scoring
Tiering today and on the roadmap.
Tier high requires stable-identifier match plus address agreement. Tier medium requires multi-field agreement without a stable identifier. Tier low covers residual ambiguity and is not displayed publicly. Threshold is deterministic and manually reviewed; a learned-model approach is planned once labeled gold sets are in place.
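The tier rules above can be written as a small deterministic function. This is a sketch under two stated assumptions: "multi-field agreement" is read as two or more agreeing fields, and a stable-identifier match without address agreement is treated as residual ambiguity (tier low). Neither reading is confirmed by the text, and the function names are hypothetical.

```python
def assign_tier(stable_id_match: bool, address_agrees: bool, agreeing_fields: int) -> str:
    """Assign a confidence tier from match evidence."""
    if stable_id_match and address_agrees:
        return "high"
    # Assumption: "multi-field" means at least two agreeing fields.
    if not stable_id_match and agreeing_fields >= 2:
        return "medium"
    # Everything else is treated as residual ambiguity.
    return "low"


def is_public(tier: str) -> bool:
    """Tier low is never displayed publicly."""
    return tier != "low"
```

For example, `assign_tier(True, True, 3)` yields `"high"`, while `assign_tier(False, False, 1)` yields `"low"` and would stay internal.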
Historical change detection
Versioned at the field level.
Fonteum versions the value, not the surrounding metadata. Change record: provider_id, field, previous_value, new_value, source, observed_at, snapshot_id. Cadence per dataset: HHS-OIG LEIE monthly, CMS PECOS monthly, CMS Care Compare quarterly, CMS Provider of Services quarterly, CMS PBJ daily.
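A sketch of what field-level versioning could look like in code. The `ChangeRecord` attributes are the seven fields named above; the class, the `field_changes` helper, and the sample values are illustrative assumptions rather than Fonteum's actual schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ChangeRecord:
    provider_id: str
    field: str
    previous_value: Any
    new_value: Any
    source: str
    observed_at: str  # ISO date on which the change was observed
    snapshot_id: str


def field_changes(provider_id: str, previous: dict, current: dict, *,
                  source: str, observed_at: str, snapshot_id: str) -> list[ChangeRecord]:
    """Emit one change record per field whose value differs between snapshots."""
    changes = []
    for name in sorted(previous.keys() | current.keys()):
        old, new = previous.get(name), current.get(name)
        if old != new:
            changes.append(
                ChangeRecord(provider_id, name, old, new, source, observed_at, snapshot_id)
            )
    return changes


# Made-up provider values for illustration.
changes = field_changes(
    "prov-001",
    {"status": "active", "taxonomy": "207N00000X"},
    {"status": "inactive", "taxonomy": "207N00000X"},
    source="cms-pecos",
    observed_at="2026-05-25",
    snapshot_id="snap-0001",
)
```

Only `status` produces a record here; the unchanged `taxonomy` value generates nothing, which is the point of versioning the value rather than the surrounding metadata.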
Reproducibility
How to re-run any published figure.
Every federal dataset Fonteum uses is downloadable from its source agency at no cost. CMS publishes its files at data.cms.gov; HHS-OIG publishes the exclusion list at oig.hhs.gov/exclusions. A motivated researcher can pull the same source files and reconcile against any figure Fonteum publishes. The methodology version on each record tells you which pipeline version produced it.
Citation guidance
How to cite Fonteum data.
Recommended form: Fonteum, "[Dataset Name]," [Month YYYY]. https://fonteum.com/data/[slug]. Source: CMS [dataset] via data.cms.gov. For research reports: Fonteum Research, "[Report Title]," [Month YYYY]. https://fonteum.com/research/[slug]. AP-style short and BibTeX variants available on each report page.
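For anyone generating citations programmatically, the recommended dataset form can be filled in with a small formatter. The function name and the argument values in the example are placeholders, not real dataset names or slugs.

```python
def cite_dataset(dataset_name: str, month_year: str, slug: str, cms_dataset: str) -> str:
    """Fill in the recommended dataset citation form."""
    return (
        f'Fonteum, "{dataset_name}," {month_year}. '
        f"https://fonteum.com/data/{slug}. "
        f"Source: CMS {cms_dataset} via data.cms.gov."
    )


# Placeholder arguments for illustration only.
citation = cite_dataset("Example Dataset", "May 2026", "example-slug", "Example Source")
```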
What Fonteum does NOT claim
Explicit disclaimer.
Fonteum does not independently verify provider credentials. Fonteum does not claim a provider is qualified, practicing, or in good standing. Fonteum publishes what CMS and HHS-OIG publish. Fonteum does not enrich federal records with commercial data, scoring models, or clinical interpretations. Any count on this site describes the Fonteum dataset, not a representative sample of the U.S. healthcare market.
Open data. Open methods.
Every field traces to its federal source.
Every federal dataset we use is downloadable from its source agency. CMS publishes its files at data.cms.gov. HHS-OIG publishes the exclusion list at oig.hhs.gov/exclusions. A motivated researcher can pull the same source files and reconcile against any figure we publish.
For procurement teams that need the full audit package: methodology document, per-field provenance map, reproducibility statement, SOC 2 status, and BAA template are all in the audit pack.
Methodology changelog
What changed, by release.
2026-05-03
UX / surface
Data Graph Visual v2 — moat as a one-second picture.
Homepage and /data-provenance now render a unified `DataGraphV2` schematic — three columns (Sources → Pipeline → Surfaces) at desktop, stacked vertically on mobile. Source nodes are real `<Link>` elements pointing at `/sources/[slug]`, color-coded by status (live / research-only / pending). Counters are derived live from `getNetworkStats()` + `getAllStudies()` + `SOURCES.length` — no hardcoded literals.
2026-05-03
Doctrine
Sources Library v2 — status field + restricted-sources doctrine.
Source-registry types extended with a `status` enum (live / research-only / pending-records-request / deferred) and two new tier values (`pending-manual`, `first-party-research`). Each source page now renders a status chip + ToS-and-usage-notes section + a worked sample-provenance line. /sources adds a 'Sources we do not use' rail with the four restricted/no-go datasets (NMLS, state bars, ABMS / CertiFacts, Google Places / GBP backfill) and the honest reason per item.
2026-05-03
UX / surface
/press rebuilt as a journalist + data-user landing page.
Press kit now carries a 'What we do not claim' doctrine block, a featured-datasets list (6 studies with source / snapshot / row count / Limitations deep-link), a copy-paste citation template, 5 story angles tied to specific studies, and a /data-platform cross-link tile. Counters derive live from `getNetworkStats()` — never hardcoded on the page. Doctrine: no fake press-mention logo strip, no fake customer / partner claims, no headshot placeholder.
2026-05-03
UX / surface
/data-platform — B2B / data-product surface.
New /data-platform page surfaces every dataset in a live catalog (sourced from the research registry) plus four explicitly-labeled 'concept' B2B export scopes. Includes a 'What we do not provide' guardrail block (no pre-screened provider lists, no restricted-source resale, no patient/customer data, no Google Places backfill, no paid API). Replaces the legacy /data press kit; /datasets aliases redirect here.
2026-05-03
UX / surface
Brand chart palette + StatTable typography polish.
All eight Sprint-1 + CMS Care Compare research-chart SVGs were regenerated with a unified brand palette — bars in `#0F766E` (brand teal), bottom-rated emphasis in `#b91c1c` (warn). StatTable headers carry an explicit `ui-monospace` font-family + tightened weight, and Highest/Lowest emphasis pills now use brand teal + paper tones. No data changed; only chart color tokens + table typography.
2026-05-03
UX / surface
/directories Coverage Atlas — explicit 4-status taxonomy per cell.
Coverage Atlas grid now resolves every (vertical × source-family) cell into one of four statuses (`live`, `research-snapshot`, `pending-pack`, `not-applicable`) instead of a binary check/no-check. Status legend visible above the grid; each cell carries a status badge + tooltip + (where applicable) a vertical/source link.
2026-05-03
UX / surface
Research study template reskinned to brand tokens.
Study pages (`/research/[slug]`) now render in the brand-token palette (Fraunces hero, paper/cream cards, mist borders, brand-teal accents on charts and CTAs). Citation aside, methodology accordion, FAQs, related-studies block, and chart figures all share a unified visual register. StatTable wrapper gained a horizontal-scroll affordance for narrow viewports.
2026-05-03
UX / surface
/directories repositioned as Coverage / Network Map.
The /directories page is no longer a flat list of Fonteum-operated directories. Verticals are now grouped by source family (Healthcare graph, Trades graph, Care/Research graph, Indexed coverage). Each card carries a status chip — `live` when source-pack writes are active, `pending` when only the manifest is registered. The grouping is derived live from the sources registry.
2026-05-03
Schema
Single canonical data-snapshot date across the brand hub.
BrandNav, /press factsheet, /data-provenance, footer, and research aggregates now all read from a single `DATA_SNAPSHOT_DATE_ISO` constant. Previously the date drifted across four different literals (April 24, April 25, May 1, May 3), which made the brand hub feel un-versioned. Refresh procedure: bump the constant.
2026-05-03
Research rule
Research pages carry an explicit AI-citation summary + Limitations panel.
Every research study page now renders a `What this dataset covers / does NOT cover` block above the methodology section, plus a Limitations section before the methodology. Dataset JSON-LD only emits when downloadable data exists, with `temporalCoverage` and `spatialCoverage` populated. Doctrine fallback applies when an individual study hasn't authored explicit limitations yet.
2026-05-03
Doctrine
Public source library at /sources.
Every public-record source Fonteum cites now has a stable detail page at /sources/[slug]. Each source page documents tier (Tier-1 research-only vs Tier-2 profile-enrichment), refresh cadence, fields used, write-locked fields, and the doctrine sentence the source family carries. The launch gate fails if any entry ships without explicit limitations + a doctrine line.
2026-05-03
Display rule
Profile provenance reveal cards on listing detail pages.
Source-backed fields on individual provider profiles now render through a shared `ProvenanceCard` component carrying a SourceChip + `What this means` + `What this does not mean` panels per source family. The non-endorsement sentence renders inline next to every cited value, never tucked away in a footer.
2026-05-03
UX / surface
/data-provenance upgraded to the public Data Graph page.
/data-provenance now documents the 7-stage pipeline (Sources → Source pack → Ingestion → Entity match → Field provenance → Display → Research / Verticals), the 4 source-family clusters, and per-field display rules. Carries the source-counters that match the homepage so the numbers can't drift.
2026-05-03
UX / surface
Homepage repositioned as the source-provenanced provider graph.
The Fonteum homepage now leads with `Local provider data, traceable to its source.` instead of a flat directory pitch. Real-data counters (active businesses, registered sources, provenance field rows) replace any estimated network metrics. Every fact Fonteum displays cites a source, last-checked date, and limitations sentence — the visible moat replaces the link-farm framing.
2026-05-03
Data snapshot
CMS Care Compare research bundle (home health + hospice).
Two more Tier-1 research snapshots published from the CMS Care Compare cluster: home-health quality by state and hospice provider availability by state. Source/date/limitations triplet on every cited field; Fonteum does not independently rate, inspect, verify, endorse, or guarantee any agency or hospice.
2026-05-03
Data snapshot
Dialysis facility research snapshot published.
First state-level snapshot of CMS Care Compare dialysis facility quality data. Tier-1 research-only — no facility profile writes; data appears in /research aggregates only.
2026-05-03
Data snapshot
Nursing-home research snapshot published.
First state-level snapshot of CMS Care Compare Nursing Home Provider Information master dataset. Special Focus Facility status reported at state-aggregate level only — never on individual facility profiles. CMS ratings appear as CMS ratings.
2026-05-03
Research rule
CMS Care Compare display rules — research vs profile separation.
Care Compare ratings are cited as CMS-published ratings, with the source URL + last-checked date + limitations sentence. State-level aggregates (mean rating, share-of-stars) NEVER attach to individual facility profiles — they render on /research only. Special Focus Facility status, fines, and abuse-icon flags are write-locked: captured to provenance, never surfaced.
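A sketch of how the two state-level aggregates might be computed, reading "share-of-stars" as the share of facilities at each star level from 1 to 5. That reading, the function name, and the ratings below are assumptions for illustration. Per the rule above, output like this would render on /research only.

```python
from collections import Counter, defaultdict


def state_aggregates(facilities: list[dict]) -> dict[str, dict]:
    """Compute mean rating and per-star share for each state."""
    by_state: dict[str, list[int]] = defaultdict(list)
    for facility in facilities:
        by_state[facility["state"]].append(facility["rating"])

    result = {}
    for state, ratings in by_state.items():
        counts = Counter(ratings)
        result[state] = {
            "mean_rating": sum(ratings) / len(ratings),
            "share_of_stars": {star: counts[star] / len(ratings) for star in range(1, 6)},
        }
    return result


# Made-up ratings, not CMS data.
facilities = [
    {"state": "FL", "rating": 5},
    {"state": "FL", "rating": 3},
    {"state": "FL", "rating": 4},
    {"state": "FL", "rating": 4},
    {"state": "GA", "rating": 2},
]
aggregates = state_aggregates(facilities)
```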
2026-05-03
Display rule
Florida state-board contractor-license display rules (superseded).
Florida state-board contractor-license fields were rendered with classification, status, and expiration, alongside a `confirm with the state board` qualifier. Bond, workers-compensation, insurance, and disciplinary-history fields were captured to provenance but write-locked pending operator copy review. This display rule is no longer active: the contractor-licensing source family is registered but no longer ingested or rendered under the current healthcare-only scope.
2026-05-03
Display rule
CMS PECOS Medicare-enrollment indicator display rules.
PECOS-derived `Medicare-billing-active` indicator renders only on medical-archetype profiles. Display copy frames absence-from-PECOS as a non-negative — providers can be high-quality and not enrolled in Medicare. PAC ID and Enrollment ID are captured to provenance for audit but never rendered.
2026-05-03
Display rule
NPPES NPI display rules + non-endorsement doctrine.
Source-backed NPI, taxonomy code, and taxonomy description render on dermatology + chiropractic profiles only when match confidence ≥ 0.75. Each value carries a `Source: CMS NPPES · Last checked YYYY-MM-DD` chip. Provider credential strings from NPPES are captured to provenance but write-locked. The non-endorsement sentence (`Fonteum does not independently rate, inspect, verify, endorse, or guarantee any provider`) renders inline next to every cited value.
2026-05-03
Schema
Source-provenance schema codified.
Per-(business, source, field) provenance rows are written to the warehouse with `display_allowed`, `last_checked`, `confidence`, and `source URL`. The display layer reads only through a `provider_field_displayable` view that filters confidence ≥ 0.75, freshness ≤ 180 days, `display_allowed = true`, and source `is_active = true`. Anything failing any of those four filters is captured but not rendered.
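The four filters can be expressed as a single predicate. The sketch below models the rule in Python rather than as the warehouse view itself; the `ProvenanceRow` class, its attribute names, and the sample row are assumptions, while the 0.75 and 180-day thresholds come from the entry above.

```python
from dataclasses import dataclass, replace
from datetime import date

MIN_CONFIDENCE = 0.75
MAX_AGE_DAYS = 180


@dataclass(frozen=True)
class ProvenanceRow:
    business_id: str
    source: str
    field: str
    source_url: str
    confidence: float
    last_checked: date
    display_allowed: bool
    source_is_active: bool


def is_displayable(row: ProvenanceRow, today: date) -> bool:
    """A row renders only if it passes all four filters."""
    return (
        row.confidence >= MIN_CONFIDENCE
        and (today - row.last_checked).days <= MAX_AGE_DAYS
        and row.display_allowed
        and row.source_is_active
    )


# Illustrative row with a placeholder URL.
today = date(2026, 5, 25)
fresh = ProvenanceRow(
    "biz-1", "cms-nppes", "npi", "https://example.org/source",
    0.9, date(2026, 5, 3), True, True,
)
stale = replace(fresh, last_checked=date(2025, 11, 1))
```

With these values `fresh` is displayable and `stale` is not, because its last check is more than 180 days before `today`. A row that fails any one filter is still kept in the warehouse; it is only hidden from display.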
Every number on this site must come from a real source. That’s the whole product.