Pennsylvania's 2026 State House Field: A Crowded and Varied Research Landscape

In the last three cycles, Pennsylvania's State House races have drawn some of the largest candidate fields in the nation, with hundreds of Democrats and Republicans filing to run across 203 districts. The 2026 cycle continues that pattern: OppIntell tracks 736 candidates across seven race categories in the state, split among 266 Republicans, 450 Democrats, and 20 third-party or independent candidates. Of those, 642 (about 87%) have at least one source-backed claim, so the vast majority of candidates have some public-record footprint. But the depth of that footprint varies enormously. The average candidate in Pennsylvania carries 102.48 source-backed claims, a figure driven by heavily researched U.S. House incumbents like Brian Fitzpatrick, Scott Perry, and Mary Gay Scanlon, each of whom has thousands of claims. For a first-time candidate in a crowded primary, the research depth often sits far below that average. Andrew Harbaugh, a Democrat running in the 63rd State House district, currently holds one source-backed claim, placing him at rank 212 of 736 within the state and rank 96 of 517 within his race. OppIntell tags that single-claim profile as top-quartile research depth, although rank 212 of 736 works out to roughly the top 29% of the state field. Either way, the position is counterintuitive for a one-claim file, and it reflects how thin much of the field's record is: 94 of the 736 candidates have no claims at all.
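The rank figures above can be converted into shares of the field with simple division. The following is a minimal sketch, not OppIntell's actual ranking code; the function name `top_share` is hypothetical, and it assumes rank 1 is the most deeply researched candidate and ignores how ties are broken.

```python
def top_share(rank: int, field_size: int) -> float:
    """Return the fraction of the field at or above the given rank (1 = best)."""
    if not 1 <= rank <= field_size:
        raise ValueError("rank must be between 1 and field_size")
    return rank / field_size


# Figures quoted in the article for Andrew Harbaugh.
state_share = top_share(212, 736)  # rank within Pennsylvania
race_share = top_share(96, 517)    # rank within his race universe

print(f"state: top {state_share:.1%}")
print(f"race: top {race_share:.1%}")
```

Under this simple reading, the state rank lands a little outside a strict 25% cutoff while the race rank falls well inside it, so the top-quartile label presumably depends on how the platform handles ties or which field it measures against.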

Andrew Harbaugh's Public Record: One Source, One Valid Citation

Andrew Harbaugh's OppIntell profile shows exactly one source-backed claim, which is also auto-publishable, meaning it meets the platform's quality and verifiability standards. That claim originates from a state-level source, likely the Pennsylvania Department of State's candidate filing database. In the broader research universe of 24,983 tracked candidates across 54 states, 4,010 are classified as thinly sourced with zero claims, while 4,061 are well-sourced with five or more claims. Harbaugh sits in the developing tier, a category that includes candidates whose public record is limited but not absent. His cohort tags (state-sos-only, thinly-sourced, crowded-field, top-quartile-research-depth) describe a candidate who has filed with the state but has not yet generated the cross-platform footprint that researchers would expect from a well-known contender. No FEC committee has been found for him, which is expected: the FEC registers candidates for federal office only, and State House candidates file with the Pennsylvania Department of State instead. No cross-platform IDs exist; there is no Wikidata entry, no Ballotpedia page, and no other verified digital presence linked to his candidacy. These are honestly acknowledged research gaps, not failures of the system, and they mark the starting point for any researcher or opposition team looking to build a fuller picture.

What Researchers Would Examine Next: Building a Profile from One Claim

When a candidate profile holds only one source-backed claim, the research process typically begins with that single document and fans outward. For Andrew Harbaugh, the state filing is the anchor. From that document, researchers would extract the candidate's address, filing date, party affiliation, and the specific office sought. The next step would be to search for local news coverage mentioning Harbaugh in connection with the 63rd district, as well as any previous runs for office, civic involvement, or professional background. Because Pennsylvania's State House districts are relatively small (roughly 64,000 residents per district under the 2020 census), local newspapers, community blogs, and municipal meeting minutes often contain references that do not appear in national databases. Researchers would also check county-level voter registration records, property records, and business filings under Harbaugh's name. The absence of a Ballotpedia page or Wikidata entry is not unusual for a first-time candidate; those platforms typically require a certain threshold of media coverage or electoral activity before a page is created. In the 2026 cycle, 19,184 of the 24,983 tracked candidates (about 77%) are state-SoS-only, meaning they have no federal registration and no cross-platform verification. Harbaugh is part of that majority, and his profile reflects the typical starting state for a grassroots challenger.

Comparing Harbaugh to the Pennsylvania Democratic Field and the National Context

Across Pennsylvania's 450 Democratic candidates, the average source-backed claim count is heavily skewed by incumbents and high-profile challengers. For a candidate like Harbaugh, who is not an incumbent and, as a state legislative candidate, falls outside the FEC's registration system, the expectation is a thin file. Nationally, only 1,626 candidates across all 54 states are cross-platform-verified (FEC + Wikidata + Ballotpedia), leaving the vast majority with state-only or no records. Harbaugh's single claim places him ahead of the 4,010 candidates who have zero claims, but far behind the 4,061 well-sourced candidates. In the crowded 63rd district primary, which sits within a 517-candidate race universe, his research depth rank of 96 puts him in roughly the top 19%, so about 81% of the candidates in that universe rank below him. That is a function of the race's size and the fact that many candidates have not yet generated any public record. For opposition researchers, a thin profile can be both a challenge and an opportunity: it means less material to work with, but also that any new finding (a past donation, a social media post, a local controversy) could carry disproportionate weight. The developing tier is where many campaigns begin, and the pace at which a candidate fills that tier often signals how seriously they are taking the race.
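To put the national cohort counts in proportion, the sketch below divides each count quoted in the article by the 24,983-candidate universe. The dictionary keys are hypothetical labels chosen for illustration, and the developing-tier figure is an inference: it assumes every tracked candidate falls into exactly one of the three depth tiers (zero claims, developing, well-sourced), which the article implies but does not state outright.

```python
UNIVERSE = 24_983  # tracked candidates, per the article

# Cohort counts quoted in the article; key names are illustrative only.
cohorts = {
    "zero_claims": 4_010,
    "well_sourced": 4_061,
    "cross_platform_verified": 1_626,
}

# Share of the national universe for each quoted cohort.
shares = {name: count / UNIVERSE for name, count in cohorts.items()}

# Assumption: the three depth tiers partition the universe, so the
# developing tier is whatever remains after the other two are removed.
developing_estimate = UNIVERSE - cohorts["zero_claims"] - cohorts["well_sourced"]

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"developing (estimated): {developing_estimate} ({developing_estimate / UNIVERSE:.1%})")
```

If the partition assumption holds, the developing tier would be by far the largest group, which is consistent with the article's description of it as the place where many campaigns begin.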

Source-Readiness and the Competitive Research Advantage

OppIntell's source-readiness framework evaluates how prepared a candidate's public record is for scrutiny in a competitive race. A candidate with one source-backed claim is in the earliest stage of that readiness. For a campaign team, understanding what opponents or outside groups could find in the public record is the first step in building a defensive research file. If Harbaugh's team were to conduct a self-audit, they would start by verifying that the single state filing is accurate and complete, then proactively surface any other documents that might be discovered later — such as past voter registrations, property tax records, or business licenses. The absence of cross-platform IDs means that no automated system has yet connected Harbaugh's name across multiple databases, but that does not mean those connections do not exist. A manual search by a skilled researcher could uncover links that automated scraping missed. For journalists and voters, the developing profile means that any public statements Harbaugh makes — in debates, on social media, or in interviews — become the primary source of biographical information. The race's outcome may hinge less on what the public record currently shows and more on how quickly the candidate fills the gaps.

Methodology: How OppIntell Tracks Source-Backed Claims and Research Depth

OppIntell's research methodology begins with automated ingestion of public records from state election offices, the Federal Election Commission, Wikidata, Ballotpedia, and other structured databases. Each claim is validated against the source document and assigned a quality score. The candidate research signature — which includes source-backed claim count, within-state rank, within-race rank, cross-platform IDs, and cohort tags — is computed from this validated dataset. For Andrew Harbaugh, the single claim came from a state-SoS source, and the system identified no additional matches across FEC, Wikidata, or Ballotpedia. The honestly acknowledged research gaps are a feature of the methodology, not a bug: they tell users exactly what is missing and where further investigation is needed. In a universe of 24,983 candidates, the system prioritizes transparency about what it knows and what it does not. The developing tier tag signals that the profile is likely to grow as more sources are ingested or as the candidate generates new public records. For campaigns, this methodology provides a baseline that can be compared across districts, parties, and states, enabling a systematic view of the competitive research landscape.
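The research signature described above can be pictured as a small record type. This is an illustrative sketch only: the class name, field names, and the `research_gaps` helper are hypothetical and are not drawn from OppIntell's actual schema. The values filled in for Harbaugh are the ones quoted in this article.

```python
from dataclasses import dataclass, field

# Platforms the article names as cross-platform sources.
PLATFORMS = ("FEC", "Wikidata", "Ballotpedia")


@dataclass
class ResearchSignature:
    """Hypothetical container for the signature fields listed in the methodology."""
    name: str
    source_backed_claims: int
    state_rank: int
    state_field_size: int
    race_rank: int
    race_field_size: int
    cross_platform_ids: dict[str, str] = field(default_factory=dict)
    cohort_tags: list[str] = field(default_factory=list)

    def research_gaps(self) -> list[str]:
        """Return the platforms for which no ID has been matched."""
        return [p for p in PLATFORMS if p not in self.cross_platform_ids]


harbaugh = ResearchSignature(
    name="Andrew Harbaugh",
    source_backed_claims=1,
    state_rank=212,
    state_field_size=736,
    race_rank=96,
    race_field_size=517,
    cohort_tags=[
        "state-sos-only",
        "thinly-sourced",
        "crowded-field",
        "top-quartile-research-depth",
    ],
)

print(harbaugh.research_gaps())
```

Keeping the gaps as a computed list rather than a stored field mirrors the article's point that missing matches are reported explicitly instead of being silently omitted.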

Questions Campaigns Ask

What is a source-backed claim in OppIntell's system?

A source-backed claim is a verified piece of information extracted from a public record — such as a candidate filing, campaign finance report, or official biography — that has been validated against the original source document. OppIntell assigns a quality score to each claim and only counts it as source-backed if it meets the platform's verifiability standards. For Andrew Harbaugh, the single source-backed claim comes from a Pennsylvania state election filing.

Why does Andrew Harbaugh have only one source-backed claim?

Andrew Harbaugh is a first-time candidate in a crowded State House primary, and his public record is still developing. He has filed with the Pennsylvania Department of State; as a state legislative candidate he does not register with the FEC, and he has no Wikidata entry, Ballotpedia page, or other cross-platform digital presence. Many candidates in similar positions have zero claims; Harbaugh's single claim ranks him 212th of 736 Pennsylvania candidates, roughly the top 29%, and earns him the platform's top-quartile research depth tag.

How can researchers find more information about Andrew Harbaugh?

Researchers would start with the state filing and then search local news archives, county voter records, property databases, and business filings. Because the 63rd district is relatively small, community sources like municipal meeting minutes and local blogs may contain references not captured in national databases. Manual searches by experienced researchers often uncover connections that automated systems miss.

What does the 'developing' research depth tier mean?

The 'developing' tier indicates that a candidate's public record is limited but not absent — typically one to four source-backed claims. It suggests that the profile is likely to grow as more sources are ingested or as the candidate generates new public records through campaign activity, media coverage, or additional filings. For Andrew Harbaugh, the tier reflects a starting point that could expand rapidly as the 2026 race progresses.
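The tier boundaries mentioned in this article (zero claims, one to four claims, five or more claims) can be expressed as a short classifier. This is a sketch under stated assumptions: the function name and returned labels are hypothetical, and because the article says the developing tier is "typically" one to four claims, the real cutoffs may include other factors.

```python
def research_tier(claim_count: int) -> str:
    """Map a source-backed claim count to the depth tier described in the article."""
    if claim_count < 0:
        raise ValueError("claim_count cannot be negative")
    if claim_count == 0:
        return "thinly-sourced"   # zero claims
    if claim_count < 5:
        return "developing"       # one to four claims
    return "well-sourced"         # five or more claims


# Harbaugh's single claim falls in the developing tier.
print(research_tier(1))
```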