01 — Practice

AI Historiographer & Evaluator.

I study what happens to institutional memory when AI touches it — with a focus on the civic record: the documents, histories, and public narratives that define how communities understand themselves.

02 — About

A historian who ended up inside AI systems — and started asking historian questions.

Katryna Peart is a historian who brings a historian's questions to AI systems: Whose voice is this? What narrative is being pushed? Where does this flatten the truth? Through hands-on evaluation work and original research on how language models handle civic and historic documents, she developed a practice around the gap between what AI claims and what it actually does. Her work sits at the intersection of AI historiography and institutional memory, studying how generative systems construct, compress, and sometimes erase the records that communities depend on to understand themselves.

Her research focuses on the civic record: municipal documents, public histories, commemorative narratives, and institutional archives. She is the developer of Civic Pair Prompting (CPP), a replicable framework for evaluating municipal AI, and the Civic AI Evaluation Standard (CAES), a governance suite built for local government. Her work has appeared in Governing Magazine, Route Fifty, and PM Magazine, with seven working papers on SSRN. She holds an MA in Medieval and Modern History from Royal Holloway, University of London, and a BA in History from NYU.

03 — Services

Two practices. One question: what is AI actually doing to the record?

Practice 01

Municipal AI Governance

Local governments are deploying AI without the tools to evaluate what it's actually doing. I help small and mid-sized cities and towns close that gap — through original research, replicable frameworks, and hands-on testing that doesn't require an enterprise budget or a data science team.

My work is built on Civic Pair Prompting (CPP), an evaluation methodology developed through direct testing of live municipal AI systems, and the Civic AI Evaluation Standard (CAES), a 15-document governance suite designed specifically for local government. If your city is considering AI — or already using it — I can help you understand what you have.

Practice 02

AI Document Faithfulness Testing

When organizations use AI to summarize, process, or retrieve from institutional documents, they assume the output reflects what the document actually says. Often it doesn't.

AI systems routinely harden aspirational language into apparent commitments, construct comparative frameworks the document never made, smooth over conflict and accountability language, and import external knowledge without flagging it — all while producing outputs that read as authoritative. Whether you're a council preparing for reorganisation or a corporation about to hand your policy library to an AI summarization tool, the failure modes are the same.

This service tests what AI actually does to your documents before you find out the hard way.

What it is

A structured empirical evaluation of how AI systems handle your specific institutional documents — reports, policies, historical records, reorganisation plans, DEI disclosures, or any document your organization depends on for accountability and decision-making.

What it is not

An audit of your documents for accuracy or compliance. Your documents aren't the variable. The AI is.

How it works

Using an established prompt protocol developed through original research across civic and enterprise document corpora, I test your documents against one or more AI systems — including tools already deployed in your organization. I document verbatim outputs, annotate exactly where failure modes occur, and deliver a written findings report with specific recommendations.

What you get

A test report showing the prompts used, verbatim AI outputs, annotated findings mapped to failure mode categories (aspirational hardening, comparative fabrication, boundary violation, tension construction), and prioritized recommendations — including how AI is likely to restructure, reorganize, or summarize your documents at scale.
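As a rough illustration only, the annotated findings described above could be recorded in a structure like the one below. This is a hypothetical Python sketch, not the protocol itself: the class names, field names, the `tally_by_mode` helper, and the sample finding are all assumptions invented for this example. Only the four failure mode categories come from the report description.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # The four categories named in the report description.
    ASPIRATIONAL_HARDENING = "aspirational hardening"
    COMPARATIVE_FABRICATION = "comparative fabrication"
    BOUNDARY_VIOLATION = "boundary violation"
    TENSION_CONSTRUCTION = "tension construction"


@dataclass(frozen=True)
class Finding:
    """One annotated finding (hypothetical layout)."""
    system: str           # label for the AI system tested
    prompt: str           # the prompt exactly as issued
    verbatim_output: str  # the AI output, recorded without edits
    mode: FailureMode     # category the annotation maps to
    note: str             # where and how the output departs from the source


def tally_by_mode(findings):
    """Count findings per category, including categories with zero hits."""
    counts = Counter(f.mode for f in findings)
    return {mode.value: counts.get(mode, 0) for mode in FailureMode}


# Invented sample data, for illustration only.
findings = [
    Finding(
        system="System A",
        prompt="Summarize the attached plan.",
        verbatim_output="The council will cut emissions by 2030.",
        mode=FailureMode.ASPIRATIONAL_HARDENING,
        note="Source says the council 'aims to' cut emissions.",
    ),
]
summary = tally_by_mode(findings)
```

Keeping the verbatim output next to the annotation means each finding can be checked against the source document without rerunning the prompt.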

Who this is for

Local authorities, councils, and public sector organizations preparing for reorganisation. Legal, compliance, and records teams evaluating AI summarization tools. HR and communications functions considering AI for policy and institutional document processing. Enterprise clients evaluating how AI will restructure, summarize, or reorganize institutional documents before committing to a deployment.

Engagement tiers

Single-Document Diagnostic Baseline
$2,750 (£2,200)

One document tested against one AI system. Delivers an annotated output log showing exactly where aspirational hardening, boundary violations, or unflagged external language appeared. Ideal entry point for small councils and municipalities.

The Comparative Stress-Test
$4,850 (£3,900)

One core document tested across three AI systems, including tools already on your staff's desktops. Delivers a comparative risk matrix and an executive briefing report with specific procurement guardrails. This is the full protocol used in original research on UK and US civic documents.

Enterprise Corpus & Governance Audit
$8,500–$12,000 (£6,800–£9,600)

A comprehensive multi-document corpus tested across three AI systems. Delivers an enterprise-wide governance blueprint, full risk documentation, and a live 60-minute diagnostic presentation to your leadership or procurement team. Scope and pricing depend on corpus volume.

All engagements are one-time unless retesting is requested following system changes or new deployments. Contact me to discuss scope.

04 — Talks

Three talks on what AI actually does.

Available for keynotes, concurrent sessions, workshops, and panels.

01 — Talk

RAG testing that holds up: evaluating LLMs for faithfulness, boundaries, and trust

As seen at STARWest 2026

A practical session on testing RAG systems for faithfulness, boundary violations, and the failure modes that traditional QA misses — with a framework attendees can apply immediately.

02 — Talk

When AI sounds right but isn't: practical failure mode detection for non-technical teams

A hands-on framework for identifying, documenting, and escalating AI errors — designed for department heads, compliance leads, and anyone responsible for AI outputs who doesn't have a data science background.

03 — Talk

Your Documents, Their Narrative: When AI Rewrites History

When organizations use LLMs to summarize reports, policies, and institutional records, they're not just compressing text — they're making decisions about whose voice matters, what conflict gets smoothed over, and what gets lost permanently. Drawing on original research into how language models handle civic and historic documents, this session shows what narrative flattening actually looks like, why it's a governance and compliance risk, and what to do instead.

05 — Research

AI historiography & institutional memory.

My research sits at the intersection of AI historiography and institutional memory — examining how generative systems handle the documents, narratives, and civic records that define how communities understand themselves. Current work spans municipal AI evaluation, K-12 AI literacy, and the ways language models reproduce, flatten, and sometimes fabricate historical and institutional content.

When AI Systems Tell Civic Histories

A Stress Test Using Newark's 350th Anniversary

Original stress-testing research examining how AI systems handle a real municipal commemorative corpus — documenting narrative reproduction, reconciliation softening, and aspirational fabrication as structurally predictable failure modes in civic documents. Published on SSRN.

06 — Credentials

Track record & recognition.

Conference: STARWest 2026, Anaheim
Publications: Governing, Route Fifty, PM Magazine
Research: 7 SSRN working papers
Evaluation work: Google, Uber, enterprise clients
Award: PMJA Award, Newark 350
Based in: Leander, Texas. Available to travel.

07 — Writing

Selected writing.

08 — Contact

Book a talk or an evaluation.

Available for keynotes, concurrent sessions, workshops, panels, and consulting engagements.