Introduction

At Pace, we believe successful agents in production mirror how human experts actually work: selective attention, cross-referencing, spatial reasoning. Insurance workflows are particularly challenging: documents routinely span hundreds of pages of dense context and tabular data, and the relationships within them can be complex and subtle. To tackle this, we built COR, our Contextual Reasoning engine, which enables our agents to process thousands of insurance workflows accurately and reliably for our customers.

RAG Solves the Wrong Problem

We tried traditional RAG: semantic chunking, embedding similarity, vector databases, retrieve top-K passages. In some cases it performed better than providing full documents, but it still fell short for many use cases. RAG is great for document search, but our insurance workflows already have a fixed set of inputs. We didn't need broad document discovery; we needed strong contextual reasoning and retrieval within a defined, complex dataset. For example, semantic embeddings can recognize that “coverage limits” and “policy maximums” are related terms, but they can’t determine which value applies, how it affects a claim, or what the correct interpretation should be.

How COR Works

COR processes documents through four stages, each mirroring human cognitive behavior while solving specific technical problems that enable our agents to reason over insurance workflows:

Stage 1: Parsing and Smart Chunking: COR uses a combination of traditional and LLM-based approaches to parse and chunk document content. Tables stay connected to their context, headers remain linked to their sections, and visual elements are summarized inline with the text.
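
As a rough illustration of the idea (not our production parser), a structure-aware chunker can keep tables and headers attached to their surrounding prose. The `Chunk` fields, the block schema, and the `chunk_document` helper below are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    section_header: str   # the heading this chunk falls under
    page: int

def chunk_document(blocks, max_chars=2000):
    """Group parsed blocks into chunks without splitting a table away from the
    prose that introduces it. `blocks` is assumed to come from an upstream layout
    parser as dicts like {"type": "heading" | "table" | "paragraph", "text": ..., "page": ...}."""
    chunks, current, header, page = [], [], "", 1

    def flush():
        if current:
            chunks.append(Chunk(" ".join(current), header, page))
            current.clear()

    for block in blocks:
        page = block.get("page", page)
        if block["type"] == "heading":
            flush()                                # start a new chunk under the new header
            header = block["text"]                 # keep headers linked to their sections
        elif block["type"] == "table":
            # keep the table (or its inline LLM summary) attached to its surrounding context
            current.append(block.get("summary", block["text"]))
        else:
            current.append(block["text"])
        # only break on prose boundaries, never right after a table
        if sum(len(t) for t in current) > max_chars and block["type"] != "table":
            flush()
    flush()
    return chunks
```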

Stage 2: Relevance Scoring: COR then breaks the query into clear, specific parts, splitting multi-step questions and defining vague terms. It then scores each chunk from Stage 1 by asking a lightweight model whether the chunk is relevant to the query, rather than relying on vector similarity. Chunks judged highly relevant automatically bring in their adjacent chunks so the system preserves the surrounding context.
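
Sketched in simplified form below, with `llm` standing in for a lightweight model call (str in, str out) and the prompts reduced to placeholders; the function names are illustrative rather than COR's actual API:

```python
def decompose_query(query, llm):
    """Split a multi-part question into clear, specific sub-questions."""
    prompt = f"Rewrite the following as a numbered list of specific sub-questions:\n{query}"
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def score_chunks(sub_questions, chunks, llm):
    """Ask the model for a relevance judgment per chunk instead of using vector similarity."""
    scores = []
    for i, chunk in enumerate(chunks):
        prompt = (
            "Answer only 'high', 'medium', or 'low'.\n"
            f"Questions: {sub_questions}\n"
            f"Passage: {chunk.text}\n"
            "How relevant is the passage to the questions?"
        )
        scores.append((i, llm(prompt).strip().lower()))
    return scores

def select_with_neighbors(scores, chunks):
    """Highly relevant chunks pull in their adjacent chunks to preserve surrounding context."""
    keep = set()
    for i, score in scores:
        if score == "high":
            keep.update({i - 1, i, i + 1})
        elif score == "medium":
            keep.add(i)
    return [chunks[i] for i in sorted(keep) if 0 <= i < len(chunks)]
```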

Stage 3: Contextual Assembly: Selected chunks are ordered to follow the original document and enriched with a deep metadata layer that includes page numbers, section headers, table and figure references, spatial layout, entity mentions, and more. This preserves the original document flow while stripping out noise and exposing the structural signals models actually rely on.
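
A minimal sketch of the assembly step, assuming each selected chunk carries its page number, section header, and optionally a list of entity mentions from an upstream enrichment pass (the metadata tag format here is illustrative):

```python
def assemble_context(selected_chunks):
    """Order chunks to follow the document and prepend a small metadata header
    to each, so the reasoning model sees structure rather than raw noise."""
    ordered = sorted(selected_chunks, key=lambda c: c.page)   # page number as a proxy for document order
    parts = []
    for c in ordered:
        meta = f"[page {c.page} | section: {c.section_header}"
        entities = getattr(c, "entities", [])                 # optional enrichment from an upstream pass
        if entities:
            meta += f" | mentions: {', '.join(entities)}"
        parts.append(meta + "]\n" + c.text)
    return "\n\n".join(parts)
```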

Stage 4: Reasoning and Citations: The filtered context is passed to the reasoning model along with the query, so it works only with the most relevant information. This leads to higher accuracy and faster responses. Every output is tied directly to a citation that references a specific part of the input document, ensuring deterministic grounding for every result.
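
The final step, sketched under the same assumptions: the model receives only the assembled context, and a simple check confirms that every cited page actually appears in that context. The prompt wording and the `[page | section]` tag format are illustrative, not our production prompts:

```python
import re

def answer_with_citations(query, context, llm):
    """Hand the reasoning model only the filtered context and require every
    claim to cite the [page | section] tags embedded in it."""
    prompt = (
        "Use only the context below. After every statement, cite the matching "
        "[page | section] tag. If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm(prompt)

def verify_citations(answer, context):
    """Deterministic grounding check: every page the answer cites must exist
    in the context that was actually supplied."""
    cited = set(re.findall(r"\[page (\d+)", answer))
    available = set(re.findall(r"\[page (\d+)", context))
    return bool(cited) and cited <= available
```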

The Result

Our benchmarks highlight the impact of adding a cognitive architecture on top of LLMs. We evaluated three approaches:

  • Pace: Our contextual reasoning engine 

  • Textract + GPT-5: Traditional extraction then reasoning with GPT-5

  • GPT-5: Off-the-shelf model with direct file upload

Naive GPT-5 and Textract-style pipelines can overload the model with too much context, leaving it unable to consistently identify which sections actually matter for a given task. Pace agents retrieve only the specific policy sections, amendments, tables, and conditions needed for the task at hand. The same models that achieve 70-83% accuracy when given raw documents reach 95-100% through Pace.

Demo to Production

Building agentic systems that meet production SLAs requires understanding a fundamental paradox: language models have superhuman capabilities but often need human-like guidance to use them effectively.

Models can process millions of tokens and reason across complex documents faster than any human. Yet they still need information structured and prioritized the way human experts naturally approach problems. 

Pace works because we built steering systems that mirror how humans process information, then let models execute that processing at machine speed. The result is agents that think systematically rather than merely pattern-match across massive context windows.

If you are interested in solving the hardest technical problems in applied AI, we’re actively hiring and would love to chat! See our open roles.