AI Historiographer & Evaluator.
I study what happens to institutional memory when AI touches it — with a focus on the civic record: the documents, histories, and public narratives that define how communities understand themselves.
A historian who ended up inside AI systems — and started asking historian questions.
Katryna Peart is a historian who ended up inside AI systems — and started asking historian questions. Whose voice is this? What narrative is being pushed? Where does this flatten the truth? Through hands-on evaluation work and original research on how language models handle civic and historic documents, she developed a practice around the gap between what AI claims and what it actually does. Her work sits at the intersection of AI historiography and institutional memory — studying how generative systems construct, compress, and sometimes erase the records that communities depend on to understand themselves.
Her research focuses on the civic record: municipal documents, public histories, commemorative narratives, and institutional archives. She is the developer of Civic Pair Prompting (CPP), a replicable framework for evaluating municipal AI, and the Civic AI Evaluation Standard (CAES), a governance suite built for local government. Her work has appeared in Governing Magazine, Route Fifty, and PM Magazine, with seven working papers on SSRN. She holds an MA in Medieval and Modern History from Royal Holloway, University of London, and a BA in History from NYU.
Two practices. One question: what is AI actually doing to the record?
Municipal AI Governance
Local governments are deploying AI without the tools to evaluate what it's actually doing. I help small and mid-sized cities and towns close that gap — through original research, replicable frameworks, and hands-on testing that doesn't require an enterprise budget or a data science team.
My work is built on Civic Pair Prompting (CPP), an evaluation methodology developed through direct testing of live municipal AI systems, and the Civic AI Evaluation Standard (CAES), a 15-document governance suite designed specifically for local government. If your city is considering AI — or already using it — I can help you understand what you have.
AI Document Faithfulness Testing
When organizations use AI to summarize, process, or retrieve from institutional documents, they assume the output reflects what the document actually says. Often it doesn't.
AI systems routinely harden aspirational language into apparent commitments, construct comparative frameworks the document never made, smooth over conflict and accountability language, and import external knowledge without flagging it — all while producing outputs that read as authoritative. Whether you're a council preparing for reorganisation or a corporation about to hand your policy library to an AI summarization tool, the failure modes are the same.
This service tests what AI actually does to your documents before you find out the hard way.
A structured empirical evaluation of how AI systems handle your specific institutional documents — reports, policies, historical records, reorganisation plans, DEI disclosures, or any document your organization depends on for accountability and decision-making.
This is not an audit of your documents for accuracy or compliance. Your documents aren't the variable. The AI is.
Using an established prompt protocol developed through original research across civic and enterprise document corpora, I test your documents against one or more AI systems — including tools already deployed in your organization. I document verbatim outputs, annotate exactly where failure modes occur, and deliver a written findings report with specific recommendations.
A test report showing the prompts used, verbatim AI outputs, annotated findings mapped to failure mode categories (aspirational hardening, comparative fabrication, boundary violation, tension construction), and prioritized recommendations — including how AI is likely to restructure, reorganize, or summarize your documents at scale.
Local authorities, councils, and public sector organizations preparing for reorganisation. Legal, compliance, and records teams evaluating AI summarization tools. HR and communications functions considering AI for policy and institutional document processing. Enterprise clients evaluating how AI will restructure, summarize, or reorganize institutional documents before committing to a deployment.
One document tested against one AI system. Delivers an annotated output log showing exactly where aspirational hardening, boundary violations, or unflagged external language appeared. Ideal entry point for small councils and municipalities.
One core document tested across three AI systems, including tools already on your staff's desktops. Delivers a comparative risk matrix and an executive briefing report with specific procurement guardrails. This is the full protocol used in original research on UK and US civic documents.
A comprehensive multi-document corpus tested across three AI systems. Delivers an enterprise-wide governance blueprint, full risk documentation, and a live 60-minute diagnostic presentation to your leadership or procurement team. Scope and pricing depend on corpus volume.
All engagements are one-time unless retesting is requested following system changes or new deployments. Contact me to discuss scope.
Three talks on what AI actually does.
Available for keynotes, concurrent sessions, workshops, and panels.
RAG testing that holds up: evaluating LLMs for faithfulness, boundaries, and trust
A practical session on testing RAG systems for faithfulness, boundary violations, and the failure modes that traditional QA misses — with a framework attendees can apply immediately.
When AI sounds right but isn't: practical failure mode detection for non-technical teams
A hands-on framework for identifying, documenting, and escalating AI errors — designed for department heads, compliance leads, and anyone responsible for AI outputs who doesn't have a data science background.
Your Documents, Their Narrative: When AI Rewrites History
When organizations use LLMs to summarize reports, policies, and institutional records, they're not just compressing text — they're making decisions about whose voice matters, what conflict gets smoothed over, and what gets lost permanently. Drawing on original research into how language models handle civic and historic documents, this session shows what narrative flattening actually looks like, why it's a governance and compliance risk, and what to do instead.
AI historiography & institutional memory.
My research sits at the intersection of AI historiography and institutional memory — examining how generative systems handle the documents, narratives, and civic records that define how communities understand themselves. Current work spans municipal AI evaluation, K-12 AI literacy, and the ways language models reproduce, flatten, and sometimes fabricate historical and institutional content.
A Stress Test Using Newark's 350th Anniversary
Original stress-testing research examining how AI systems handle a real municipal commemorative corpus — documenting narrative reproduction, reconciliation softening, and aspirational fabrication as structurally predictable failure modes in civic documents. Published on SSRN.
Track record & recognition.
Selected writing.
- Governing Magazine: Picked up by ASPA
- Route Fifty
- NJ League of Municipalities Magazine: May 2026 cover feature, "From Newark 350 to America 250: What Municipal Leaders Can Learn"
- PM Magazine / ICMA
- PM Magazine / ICMA: Incorporated into materials in Bolton, CT
- PM Magazine / ICMA: Syndicated by CPSM
Book a talk or an evaluation.
Available for keynotes, concurrent sessions, workshops, panels, and consulting engagements.