Atlas methodology
How Atlas computes the numbers it ships, where the data comes from, and what the confidence grades mean.
What Atlas is
Atlas is a question-answering engine for U.S. political geography. Ask a question in English, get a durable, citable answer composed of typed sections (overview, historical trend, demographic lens, flip analysis, peer list, comparison table, shape map). Every answer has a permalink and an auditable plan.
Atlas is not a forecast model. It does not predict future elections. It is not a polling aggregator. It does not collect or synthesize survey data. It is not partisan commentary. It reports results as they were certified.
Transparency matters because Atlas is an analytical tool used by journalists, academic researchers, campaign staff, and policy teams. The numbers drive decisions. This page documents every assumption the numbers rest on.
Data substrate
Every Atlas answer pulls from four substrates: precinct-level presidential results, block-level disaggregation, block-and-block-group demographics, and census geography.
Precinct-level presidential results
| Cycle | Source | Coverage |
|---|---|---|
| 2016 | VEST | 51 / 51 jurisdictions, 174K precincts |
| 2020 | VEST | 51 / 51 jurisdictions (46 data + 5 geometry-only: FL, MD, NJ, TX, AK), 163K precincts |
| 2024 | NYT precinct map | 50 / 51 jurisdictions (AK unavailable), 163,926 precincts, 481,541 D/R/OTH results |
Block disaggregation
Precinct tallies are pushed down to Census blocks using a population-weighted centroid-in-polygon assignment with Hare largest-remainder rounding. Every block inherits the vote shares of the precinct it falls inside, scaled by its share of the precinct’s 18+ population.
| Cycle | Coverage | Scale |
|---|---|---|
| 2020 | 50 / 51 jurisdictions (AK excluded) | 8.0M blocks, 17M results, 154M votes, 0.12% orphan-block skip rate |
| 2016 | In progress (50 / 51 jurisdictions planned, AK excluded) | Same method as 2020 |
The 0.12% orphan-block rate refers to blocks whose centroid falls outside any VEST precinct polygon (typically offshore islands, tribal enclaves, and block-group slivers). These blocks are excluded from disaggregation rather than back-filled.
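The push-down described above can be sketched in a few lines of Python. This is a minimal illustration, not the Atlas implementation: the function names and dict-based inputs are hypothetical, rounding each party's tally independently is an assumption, and the centroid-in-polygon step is taken as already done (every block passed in is assumed to sit inside the precinct).

```python
def largest_remainder(total: int, weights: list[int]) -> list[int]:
    """Split an integer total across integer weights (Hare quota, largest remainder)."""
    weight_sum = sum(weights)
    if total == 0 or weight_sum == 0:
        return [0] * len(weights)
    # divmod keeps the arithmetic exact: (floor share, remainder numerator).
    shares = [divmod(total * w, weight_sum) for w in weights]
    alloc = [quotient for quotient, _ in shares]
    leftover = total - sum(alloc)
    # Leftover votes go to the largest remainders; ties go to the lower index.
    order = sorted(range(len(weights)), key=lambda i: (-shares[i][1], i))
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


def disaggregate_precinct(precinct_votes: dict[str, int],
                          block_vap: dict[str, int]) -> dict[str, dict[str, int]]:
    """Push one precinct's tallies down to its blocks, weighted by 18+ population."""
    block_ids = sorted(block_vap)
    weights = [block_vap[b] for b in block_ids]
    out: dict[str, dict[str, int]] = {b: {} for b in block_ids}
    for party, votes in precinct_votes.items():
        for block_id, n in zip(block_ids, largest_remainder(votes, weights)):
            out[block_id][party] = n
    return out


# Made-up precinct: 10 D votes, 7 R votes, three blocks with 50/30/20 adults.
blocks = disaggregate_precinct({"dem": 10, "rep": 7}, {"b1": 50, "b2": 30, "b3": 20})
print(blocks)
```

Largest-remainder rounding guarantees that block tallies sum back to the precinct tally for each party, which plain per-block rounding does not.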
Demographics
| Field class | Source | Vintage |
|---|---|---|
| Population, race, ethnicity (block-grain) | Census PL 94-171 | 2020 |
| Income, poverty, education, age, tenure, language (county + CD + place) | ACS 5-year | 2020–2024 release |
Geography
| Layer | Source | Vintage |
|---|---|---|
| States, counties | Census Cartographic Boundary | CB 2020, 500k resolution |
| 118th + 120th Congressional Districts, State Legislative Districts | Census Cartographic Boundary | CB 2023 |
| Census blocks and block groups | Census TIGER | CB 2020, 239,502 block-group polygons at 100% ST_IsValid |
Margin convention
Every margin on Akashic Edge, Atlas included, uses a single formula:
margin = (dem_votes - rep_votes) / total_votes * 100
Positive values mean a Democratic lead; negative values mean a Republican lead. The denominator is total votes cast, not the two-party total. Third-party and write-in votes count in the denominator.
Example: a county with 60 Democratic votes, 40 Republican votes, and 100 other votes has 200 total votes. The margin is (60 − 40) / 200 × 100 = D+10, not D+20. Two-party normalization would inflate the figure to 20, obscuring the 50% third-party share.
Two-party normalization is rejected on two grounds. First, it hides third-party strength in cycles where it mattered (1912, 1968, 1992, 1996, 2016, 2024). Second, it diverges from the way Secretaries of State and wire services such as the AP report results.
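A short Python sketch of the convention, using the worked example from this section. The function names and the one-decimal label format are illustrative assumptions; only the formula itself comes from the text.

```python
def margin(dem_votes: int, rep_votes: int, total_votes: int) -> float:
    """Signed margin in percentage points; positive means a Democratic lead."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return (dem_votes - rep_votes) / total_votes * 100


def two_party_margin(dem_votes: int, rep_votes: int) -> float:
    """Two-party normalization, shown only for contrast; Atlas does not use it."""
    return margin(dem_votes, rep_votes, dem_votes + rep_votes)


def label(value: float) -> str:
    """Render a margin as D+N or R+N (the one-decimal format is an assumption)."""
    if round(value, 1) == 0:
        return "EVEN"
    return f"{'D' if value > 0 else 'R'}+{abs(value):.1f}"


# 60 D, 40 R, 100 other: 200 total votes.
print(label(margin(60, 40, 200)))       # D+10.0
print(label(two_party_margin(60, 40)))  # D+20.0
```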
Reaggregation methods
Atlas resolves every shape to one of five execution paths. The planner picks the path based on the shape’s type and whether it maps to canonical geo_ids.
| Method | When it runs | Accuracy bound |
|---|---|---|
| direct_mv_lookup | Named geo with canonical geo_id (state, county, CD, SLD, place) | ≤ 0.01 pp vs certified totals |
| block_group_rollup | Derived shape composed of known block-group members (precomputed membership) | ≤ 0.5 pp vs block-level truth |
| block_group_spatial | Arbitrary drawn shape or isochrone; ST_Intersects against block-group centroids | ≤ 0.5 pp vs block-level truth |
| block_precise | Reserved for v2; not active in v1 | Target: ≤ 0.1 pp |
| state_fallback | Alaska only; borough-level rollup because block disaggregation is unavailable | Certified state totals; sub-state precision unavailable |
The 0.01 pp bound on direct lookups was verified against Allegheny County, PA in 2024: Atlas returned D+20.3, the materialized view stored 20.31, and the certified county total matched both. The 0.5 pp bound on block-group methods was verified by comparing spatial rollups against the corresponding direct lookups for every state.
One hazard the planner guards against: when a resolver returns a single canonical geo_id but fails to stamp source_geo_ids on the resolved shape, downstream tools fall into the spatial path and pick up cross-border block groups. On Allegheny this produced a systematic 4-point error (D+16.3 observed vs D+20.3 true) before the fix landed. Every resolver now populates source_geo_ids when the shape is canonical.
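The path selection and the source_geo_ids guard can be sketched as follows. This is a hedged illustration under stated assumptions: the ResolvedShape class, its field names other than geo_id and source_geo_ids, the order of the checks, and the choice to raise on an unstamped canonical shape are all hypothetical, not the planner's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResolvedShape:
    # Hypothetical stand-in for the resolver output.
    state: str
    geo_id: Optional[str] = None
    source_geo_ids: list = field(default_factory=list)
    block_group_ids: list = field(default_factory=list)
    drawn_geometry: Optional[dict] = None


def pick_method(shape: ResolvedShape) -> str:
    """Choose one of the active v1 execution paths for a resolved shape."""
    if shape.state == "AK":
        return "state_fallback"
    if shape.geo_id is not None:
        # Guard: a canonical shape without source_geo_ids must not drift
        # into the spatial path, where it would pick up cross-border block groups.
        if not shape.source_geo_ids:
            raise ValueError(f"canonical geo_id {shape.geo_id} missing source_geo_ids")
        return "direct_mv_lookup"
    if shape.block_group_ids:
        return "block_group_rollup"
    if shape.drawn_geometry is not None:
        return "block_group_spatial"
    raise ValueError("shape resolved to nothing")


allegheny = ResolvedShape(state="PA", geo_id="42003", source_geo_ids=["42003"])
print(pick_method(allegheny))  # direct_mv_lookup
```

Failing loudly on the unstamped case is one design option; a planner could equally stamp source_geo_ids itself before routing.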
Alaska
Alaska has no block-level disaggregation for 2016 or 2020. VEST does not publish precinct shapefiles for Alaska, and the state’s House Districts-as-precincts structure does not translate cleanly to Census blocks.
Atlas handles Alaska in three ways. Shape-bounded questions that depend on block math exclude Alaska and say so in the answer. Statewide Alaska questions fall back to certified borough-level totals aggregated from the Alaska Division of Elections precinct file (2024 presidential, 30 boroughs, 120 results). All Alaska answers carry a note on the reaggregation.notes field that the UI surfaces inline.
Confidence grades
Every Atlas answer carries a confidence grade on each section. The grade maps to the reaggregation method and the resolver path.
| Grade | Conditions | What it means |
|---|---|---|
| high | direct_mv_lookup on a canonical geo_id; no spatial math; data vintage matches question year | Trust the headline number. Suitable for publication. |
| medium | block_group_rollup on a composed shape with known block-group membership | Trust the direction and magnitude; cite with the method note. |
| low | block_group_spatial on arbitrary or isochrone shapes, or state_fallback on Alaska | Directionally correct; sanity-check against the containing geo before citing a specific margin. |
A low-confidence answer is not a wrong answer. It is an answer whose precision is bounded by spatial intersection or certified-total fallback rather than exact geo_id matching. Power users who need a specific number should prefer shapes that resolve to canonical geographies.
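The grade table reduces to a small lookup. The sketch below is an assumption-laden illustration: the function name and the vintage_matches flag are hypothetical, and the table does not say what happens when a direct lookup has a vintage mismatch, so the downgrade to medium is a guess made for the example.

```python
def confidence_grade(method: str, vintage_matches: bool = True) -> str:
    """Map a reaggregation method to the confidence grade from the table above."""
    if method == "direct_mv_lookup":
        # "high" requires a matching data vintage; the fallback to "medium"
        # on a mismatch is an assumption of this sketch, not documented behavior.
        return "high" if vintage_matches else "medium"
    if method == "block_group_rollup":
        return "medium"
    if method in ("block_group_spatial", "state_fallback"):
        return "low"
    # block_precise is reserved for v2, so no grade is defined for it here.
    raise ValueError(f"no grade defined for method {method!r}")


for m in ("direct_mv_lookup", "block_group_rollup", "block_group_spatial", "state_fallback"):
    print(m, confidence_grade(m))
```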
AI narrator
Atlas uses Claude Opus 4.7 with adaptive-high thinking and 1M context to plan the analysis. The planner reads the user’s question, picks section types, and selects the tools and parameters that feed each section. The resulting AnalysisPlan is JSON — every tool call, every geo filter, every threshold is auditable.
The narrator is a separate model per section, prompted with a cached ~102K-token system prompt that carries the full tool catalog, the Akashic semantic layer, the 133-template Historian corpus, and few-shot plan examples. Prompt caching keeps per-call cost and latency bounded.
The narrator reads pre-computed numbers and writes sentences about them. It cannot query the database. It cannot change a margin, a population count, or a cycle year. The numbers come from SQL and materialized-view reads; the prose is a function of those numbers plus the section’s narrative angle.
Voice is enforced by 41 forbidden-phrase regexes plus one automatic retry on violation. If the retry still fails, the narrator returns the raw section data without prose rather than shipping a voice-violating sentence. The full voice guide lives at plans/atlas/atlas-voice.md.
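The check-retry-fallback loop can be sketched like this. The two patterns are made-up placeholders for the 41 production regexes, and the generate callback, the return shape, and the function names are hypothetical.

```python
import re
from typing import Callable

# Placeholder patterns; the real forbidden-phrase list is not reproduced here.
FORBIDDEN = [
    re.compile(r"\blandslide\b", re.IGNORECASE),
    re.compile(r"\bshock(?:ing|ed)?\b", re.IGNORECASE),
]


def voice_flags(prose: str) -> list:
    """Return the patterns that the prose violates (empty list means clean)."""
    return [p.pattern for p in FORBIDDEN if p.search(prose)]


def narrate(section_data: dict, generate: Callable[[dict, int], str]) -> dict:
    """Try once, retry once on a voice violation, then fall back to raw data."""
    for attempt in range(2):  # first try plus one automatic retry
        prose = generate(section_data, attempt)
        if not voice_flags(prose):
            return {"data": section_data, "prose": prose, "attempts": attempt + 1}
    # Both attempts violated the voice rules: ship the numbers without prose.
    return {"data": section_data, "prose": None, "attempts": 2}


def fake_model(data: dict, attempt: int) -> str:
    # Stand-in for the narrator model: violates on the first try only.
    return "A landslide win." if attempt == 0 else "The margin was D+20.3."


print(narrate({"margin": 20.3}, fake_model))
```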
What the model can still fail on: mis-nesting a plan field under a placeholder key, over-routing compound questions to the Historian tool, or classifying an ambiguous similarity question as a compound query. The Plan Inspector on every answer surfaces voice_flags and structural warnings so reviewers can spot these cases.
How to verify any Atlas answer
Every Atlas answer is auditable four ways.
- Permalink. Each answer has an immutable 12-character answer_id at /atlas/a/[answer_id]. The plan, the data, and the prose are frozen at write time.
- Plan Inspector. "How Atlas thought about this" opens the full AnalysisPlan JSON: archetype, sections, tools, parameters, narrative angle. Users can trace every number back to the tool call that produced it.
- Containing-geo comparisons. Every answer lists the state, county, and 120th-Congress district that contain the shape. Users can sanity-check an arbitrary-shape margin against its parent geographies without leaving the page.
- Exports. JSON, CSV, and GeoJSON export endpoints let users re-run the analysis in their own tools. Shipping in Phase 4 Batch B-2.
Coalition Compare methodology
Coalition Compare is the Atlas v1.1 archetype that answers questions of the form "compare these candidates' coalitions." Subjects are candidate-cycle pairs (e.g. Gore 2000, Kerry 2004, Clinton 2016, Harris 2024); geography is a slicing dimension, not the focus.
A coalition, in Atlas terms, is the geographic and demographic profile of a candidate’s vote — "in counties X", not "among voters X." Atlas joins on county-level certified results, not on voter files or exit-poll panels. The read is a population-by-place comparison.
Vintage-matched controls
Each cycle uses era-appropriate demographics. A Gore 2000 coalition is read against 2000 demographics; a Harris 2024 coalition is read against 2020-vintage demographics. The quintile bins are recomputed per subject, so "top-college quintile" means the top quintile of the 2000 distribution for Gore and the top quintile of the 2020 distribution for Harris: different counties, but each the high-college tier of its day. This is the published-research standard (Pew Research, Catalist) for cross-cycle coalition comparison.
| Cycle | Socioeconomic | Race / Hispanic | Ancestry | Religion |
|---|---|---|---|---|
| 2000 | Census 2000 SF3 | PL 94-171 2000 + SF1 | Census 2000 SF3 P033 | ARDA RCMS 2000 |
| 2004 | ACS 5yr 2005-2009 | PL 94-171 2000 | ACS B04006 5yr 2005-2009 | ARDA RCMS 2000 |
| 2016 | ACS 5yr 2012-2016 | PL 94-171 2010 (deferred) | ACS B04006 5yr 2012-2016 | ARDA RCMS 2010 |
| 2024 | ACS 5yr 2020-2024 | PL 94-171 2020 | ACS B04006 5yr 2020-2024 | ARDA RCMS 2020 |
Vintage routing is data-driven. The demographic_control_assignments table maps each (election year, control family) pair to the right census_vintage tag. Coalition tools never hardcode a vintage. When a vintage is not yet loaded — PL 94-171 2010 is currently deferred — affected cells are skipped with a note rather than blocking the whole answer.
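A minimal sketch of table-driven routing with the skip-and-note behavior. The in-memory dict stands in for the demographic_control_assignments table; the family names and vintage tag strings are invented for illustration, and None is an assumed way to mark a vintage that is not yet loaded.

```python
# (election year, control family) -> census_vintage tag.
# Tags and family names are hypothetical; None marks a vintage not yet loaded.
ASSIGNMENTS = {
    (2016, "socioeconomic"): "acs5_2012_2016",
    (2016, "race_hispanic"): None,  # PL 94-171 2010, currently deferred
    (2016, "ancestry"): "acs5_2012_2016_b04006",
    (2016, "religion"): "rcms_2010",
    (2024, "socioeconomic"): "acs5_2020_2024",
    (2024, "race_hispanic"): "pl94171_2020",
    (2024, "ancestry"): "acs5_2020_2024_b04006",
    (2024, "religion"): "rcms_2020",
}


def route_controls(year: int, families: list) -> tuple:
    """Look up the vintage for each family; skip unloaded vintages with a note."""
    routed, notes = {}, []
    for family in families:
        key = (year, family)
        if key not in ASSIGNMENTS:
            raise KeyError(f"no assignment for {key}")
        tag = ASSIGNMENTS[key]
        if tag is None:
            notes.append(f"{family} skipped for {year}: vintage not loaded")
        else:
            routed[family] = tag
    return routed, notes


routed, notes = route_controls(2016, ["socioeconomic", "race_hispanic", "religion"])
print(routed)
print(notes)
```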
Per-era quintile binning
Each subject’s quintiles are computed against its own era’s county distribution. The window function is NTILE(5) partitioned by (subject, dimension), so the same county can fall in different quintiles for different cycles when its relative position changes. A county that was median-college in 2000 may be a top-quintile college county in 2024 if its college share grew faster than that of its peers; both reads are correct within their own era’s frame.
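The per-era binning can be illustrated with a pure-Python equivalent of SQL's NTILE. The county ids and college shares below are made up, and the tie-break on county id is an assumption added to keep the ordering deterministic.

```python
def ntile(values: dict, buckets: int = 5) -> dict:
    """Assign each key a bucket 1..buckets, matching SQL NTILE over ascending values."""
    ordered = sorted(values, key=lambda k: (values[k], k))
    base, extra = divmod(len(ordered), buckets)
    out, start = {}, 0
    for bucket in range(1, buckets + 1):
        # As in SQL, the first `extra` buckets each get one additional row.
        size = base + (1 if bucket <= extra else 0)
        for key in ordered[start:start + size]:
            out[key] = bucket
        start += size
    return out


# One partition per (subject, dimension): each era is binned on its own.
college_2000 = {"A": 10.0, "B": 15.0, "C": 20.0, "D": 25.0, "E": 30.0}
college_2020 = {"A": 30.0, "B": 32.0, "C": 45.0, "D": 35.0, "E": 40.0}

quintiles_2000 = ntile(college_2000)
quintiles_2020 = ntile(college_2020)
print(quintiles_2000["C"], quintiles_2020["C"])  # 3 5
```

County C sits in the middle quintile of the 2000 distribution and the top quintile of the 2020 distribution because its rank among the five counties changed, even though every county's share rose.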
Margins
Coalition margins use the same convention as the rest of Atlas: (dem_votes - rep_votes) / total_votes * 100. Two-party normalization is rejected for the same reasons outlined in "Margin convention" above. Third-party votes count in the denominator — important for subjects like Gore 2000 (Nader 2.7%) and Clinton 2016 (Johnson 3.3%).
Geographic-coalition framing
Coalition Compare describes geographic coalitions. "Harris ran 12 points stronger than Gore in top-college-quintile counties" is a county-grain statement, not a voter-grain statement. Voter-grain claims ("college-degree voters shifted X points") require voter-file or exit-poll joins that Atlas does not have. Section narrators are explicitly prompted to use county-grain framing. Every coalition_compare plan also surfaces this caveat in its ambiguity_notes.
County FIPS reconciliation
Four county boundary changes between 2000 and 2024 are handled via the geo_id_alias table. Coalition tools join through this alias map so a single county-ish region carries one stable identity across the four vintages.
- Colorado — Broomfield County (08014) created in 2001 from parts of Adams, Boulder, Jefferson, and Weld. 2000-vintage data joins to legacy parents.
- Virginia — Bedford City (51515) merged into Bedford County (51019) in 2013. 2000-vintage data is split-recombined.
- Alaska — Wade Hampton / Kusilvak (02270/02158) renamed in 2015. Atlas treats them as the same census-area identity.
- Alaska — Petersburg (02195/02275) created from parts of Hoonah-Angoon and Wrangell. Aliased back to the legacy parents for 2000/2010 vintages.
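The alias join can be sketched for the two simplest cases in the list, the Bedford merge and the Kusilvak rename. The dict below is a hypothetical slice of geo_id_alias, the row layout is invented, and the split cases (Broomfield, Petersburg) are left out because they need area or population weights that this sketch does not model.

```python
from collections import defaultdict

# Hypothetical slice of geo_id_alias: historical FIPS -> stable identity.
GEO_ID_ALIAS = {
    "51515": "51019",  # Bedford City merged into Bedford County
    "02270": "02158",  # Wade Hampton renamed Kusilvak
}


def stable_id(fips: str) -> str:
    """Return the stable identity for a county FIPS (itself if no alias exists)."""
    return GEO_ID_ALIAS.get(fips, fips)


def rollup(rows: list) -> dict:
    """Sum (fips, dem, rep, total) rows by stable identity."""
    totals = defaultdict(lambda: [0, 0, 0])
    for fips, dem, rep, total in rows:
        bucket = totals[stable_id(fips)]
        bucket[0] += dem
        bucket[1] += rep
        bucket[2] += total
    return {geo_id: tuple(v) for geo_id, v in totals.items()}


# Made-up vote counts for illustration only.
rows = [("51515", 10, 20, 32), ("51019", 100, 200, 310), ("02270", 5, 3, 9)]
print(rollup(rows))
```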
K-Means county clusters
The coalition_county_cluster section decomposes each subject’s map across 12 K-Means clusters from county_embeddings (32-dim, HNSW-indexed). The cluster labels — "dense urban core", "Sunbelt suburbs", "Plains farm belt", etc. — are derived from current demographics, not historical demographics. The framing is "places that today look like X", not "places that always looked like X." A 2000 county that has since gentrified may show up in the modern "dense urban core" cluster even though it wasn’t one in 2000; the cluster definition is the lens, the historical results inside it are the data.
Subjects: 2 to 8
Coalition Compare requires at least 2 subjects and caps at 8. The 2-subject lower bound is the natural pair compare; the 8-subject upper bound keeps the demographic grid readable and the parallel queries bounded. Adversarial questions like "every cycle since 1960" are limited or clarified by the planner.
Coalition Compare data sources
- U.S. Census Bureau. 2000 Decennial — Summary File 1 (SF1) and Summary File 3 (SF3). Census.gov. Public domain.
- U.S. Census Bureau. American Community Survey 5-Year Estimates — 2005-2009, 2012-2016, 2020-2024 releases. Census.gov. Public domain.
- Association of Religion Data Archives. Religion Census / Religious Congregations and Membership Study (RCMS), 2000 / 2010 / 2020. thearda.com.
- Bureau of Economic Analysis (BEA). Eight-region map used in the coalition_region_table section: New England, Mideast, Great Lakes, Plains, Southeast, Southwest, Rocky Mountain, Far West (plus DC). bea.gov. Public domain.
Known limitations (v1)
- No forecasts. Atlas is explanatory, not predictive. For forecasts, see the forecasts product.
- No exit-poll or survey data. Atlas reports precinct returns and demographics, not voter attitudes.
- No international data.
- Block-level precision covers 2008–2024. Pre-2008 questions resolve at state or county grain only.
- Drawing requires a pointer device. Mapbox Draw does not support keyboard-only drawing. Touch is supported on tablet-class viewports.
- Alaska precinct data gap. See the Alaska section above.
- pct_non_hispanic_white is approximated at block grain. PL 94-171 publishes race and Hispanic-origin cross-tabulations at block-group grain but not at block grain; Atlas derives block-level NHW shares from the block-group rate applied to block totals.
- Choropleth fills on compound block-group shapes are deferred to a future release. The underlying helper is built; the companion polygon fetch is not wired.
Source citations
- VEST — Voting and Election Science Team. Precinct-level results and shapefiles, 2016 and 2020. Harvard Dataverse. CC-BY.
- The New York Times. Presidential precinct map 2024. GitHub.
- Alaska Division of Elections. Certified 2024 precinct results. Official results.
- U.S. Census Bureau. PL 94-171 Redistricting Data Summary File, 2020 decennial census. Census.gov. Public domain.
- U.S. Census Bureau. American Community Survey 5-Year Estimates, 2020–2024 release. Census.gov. Public domain.
- U.S. Census Bureau. Cartographic Boundary shapefiles (CB 2020, CB 2023) and TIGER/Line block and block-group geometry. Census.gov. Public domain.
- MIT Election Data and Science Lab. County-level presidential and U.S. House returns, 1976–2024. MEDSL. CC-BY.
- Carl Klarner. State legislative election returns, 1967–2023. Used with permission via the State Legislative Elections Database. Academic access.
- Algara, Carlos; Amlani, Sharif. Replication data for U.S. county-level presidential, Senate, and gubernatorial returns, 1868–2020. Harvard Dataverse. CC-BY.
- Pettigrew, Stephen; Miller, Michael. U.S. House primary returns, 1956–2018. Harvard Dataverse. Academic access.
Last updated: 2026-04-26 · v1.1 (Coalition Compare)