AI Research & Development

The research behind insurance's most accurate AI.

Bevaya Labs is the applied research team behind InsurGPT™. We build specialized models on 300M+ real insurance documents, publish our methods, and ship every model into production. 93% accuracy on loss runs — verified against the frontier.

By the numbers.

Real insurance documents used in training: 300M+
Production accuracy: 98%+
Specialized models in production: 60+
Years of focused R&D: 7+

Behind the platform

How we build and govern the models.

Specialized model development: each InsurGPT™ model is purpose-built for one insurance task and trained on real proprietary documents.

Continuous model improvement: Bevaya Labs monitors for drift, retrains on cadence, and shares gains across carriers through federated learning.

Open research and tooling: shared through Bevaya Labs publications, GutenOCR, and venues like COLING.

The Benchmarks

When put to the test, InsurGPT™ beats general AI on every insurance task.

Specialized models outperform general-purpose ones on the work your team actually handles. Here's the head-to-head.

On claims accuracy, InsurGPT™ scored 99%. The strongest general model managed 62%.

A 37-point gap on the work your team handles every day — claims indexing, FNOL, demand letters, medical bills.

Source: Roots benchmark tests, December 2025.

Claims Accuracy

InsurGPT™: 99%
Mistral AI: 62%
GPT-5.0: 58%
Gemini 3.0 Pro: 55%

Featured Research

GutenOCR. Grounded OCR for production documents.

Generic OCR fails on the messy, multi-format documents insurance operations actually receive: handwritten notes, faxed pages, sixth-generation scans, multi-column tables with merged cells. GutenOCR is Bevaya Labs' proprietary grounded OCR, built specifically for the documents that defeat off-the-shelf tools.

  • Handles handwritten notes, faxed pages, and low-quality scans where generic OCR fails.
  • Every extracted field traces back to its exact source location — the foundation of X-Ray Mode.
  • Powers Document Intelligence inside the Bevaya Platform at production scale today.

Example: a grounded field extracted from Demand_Letter_FAXED.pdf (page 1 of 4)

Field: Claim Number
Value: CLM-2024-118827
Confidence: 0.99
Source location: p1 · line 14 · col 28

Source document excerpt:

WEXLER LAW FIRM
Attorneys at Law · Personal Injury

May 2, 2026
VIA EMAIL: claims-intake@bevaya-demo.com
Bevaya Insurance Company · Claims Department

RE: Daniel R. Smith v. Stoltz Trucking & Logistics LLC
Date of Loss: November 14, 2024
Claim Number: CLM-2024-118827
Policy Number: CGL-PA-7711-04

Every extracted field traces to its source token, even from faxed pages, handwritten notes, and low-quality scans.
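
For technical evaluators, the Claim Number example above can be pictured as a small record that carries its own source location. This is a minimal sketch, not the GutenOCR output schema: the class names, field names, and the `citation` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Where a value was read from (hypothetical structure)."""
    page: int
    line: int
    col: int


@dataclass(frozen=True)
class GroundedField:
    """An extracted value together with its confidence and source location."""
    name: str
    value: str
    confidence: float
    source: SourceLocation

    def citation(self) -> str:
        # Render the location in the same style as the example above.
        s = self.source
        return f"p{s.page} · line {s.line} · col {s.col}"


claim_number = GroundedField(
    name="Claim Number",
    value="CLM-2024-118827",
    confidence=0.99,
    source=SourceLocation(page=1, line=14, col=28),
)
print(claim_number.citation())
```

Keeping the location on the field itself means a reviewer can jump from any value to the place it was read from without a separate lookup.
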
INSURANCE-NATIVE REASONING

It is not extraction.
It is judgment.

The hard part is not reading the document. The hard part is understanding what it means in an insurance context. Bevaya Labs builds models that reason about insurance — not just transcribe it.

Insurance math, validated

When a loss run omits Total Incurred because Paid and Reserves are listed separately, InsurGPT™ reconciles the calculation, validates the result, and flags low-confidence outputs for human review. Insurance arithmetic isn't a transcription problem — it's a reasoning problem the models have to get right.

Derived Field Reconciliation | Reconciled

Total Incurred missing from a loss run? InsurGPT™ derives it from Paid + Reserves + Expenses, validates the math against the document's own totals, and surfaces the computed value alongside its inputs.
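
The arithmetic described here can be sketched in a few lines. The example below is an illustration under stated assumptions, not the InsurGPT™ implementation: the function names, the row layout, the one-cent tolerance, and the dollar figures are all hypothetical.

```python
from decimal import Decimal


def derive_total_incurred(row: dict) -> Decimal:
    """Total Incurred = Paid + Reserves + Expenses for one claim row."""
    return row["paid"] + row["reserves"] + row["expenses"]


def reconcile(rows: list, document_total: Decimal,
              tolerance: Decimal = Decimal("0.01")) -> dict:
    """Derive each row's total, then compare the sum with the document's own total."""
    derived = [derive_total_incurred(r) for r in rows]
    difference = abs(sum(derived, Decimal("0")) - document_total)
    return {
        "derived": derived,          # computed values, one per row
        "inputs": rows,              # surfaced alongside the computed values
        "difference": difference,
        "reconciled": difference <= tolerance,
    }


# Made-up figures for illustration only.
rows = [
    {"paid": Decimal("12500.00"), "reserves": Decimal("7500.00"), "expenses": Decimal("1250.50")},
    {"paid": Decimal("3000.00"), "reserves": Decimal("0.00"), "expenses": Decimal("410.25")},
]
result = reconcile(rows, document_total=Decimal("24660.75"))
print(result["derived"], result["reconciled"])
```

`Decimal` is used instead of `float` so that currency sums compare exactly.
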

Cross-Column Validation | Verified

Every numeric field is checked against neighboring columns and document-level subtotals. A reserve figure that doesn't roll up to the schedule total gets caught before it leaves the canvas.
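
A roll-up check of this kind might look like the sketch below. It assumes the column values and the stated subtotal have already been extracted; the function name, tolerance, and figures are hypothetical.

```python
from decimal import Decimal


def check_rollup(values: list, stated_total: Decimal,
                 tolerance: Decimal = Decimal("0.01")) -> dict:
    """Compare the sum of a column with the subtotal printed on the document."""
    computed = sum(values, Decimal("0"))
    return {
        "computed": computed,
        "stated": stated_total,
        "ok": abs(computed - stated_total) <= tolerance,
    }


# Made-up reserve column whose printed schedule total is off by 200.00.
reserves = [Decimal("5000.00"), Decimal("2500.00"), Decimal("1200.00")]
check = check_rollup(reserves, stated_total=Decimal("8900.00"))
print(check)
```

A failed check does not say which figure is wrong, only that the column and its subtotal disagree, which is enough to hold the value back for review.
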

Low-Confidence Routing | Flagged

Calculations that fall below your confidence threshold route automatically to a human reviewer with the source page, the inputs used, and the proposed value — not a black-box answer to rubber-stamp.
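
As a rough illustration of threshold-based routing, the sketch below sends a calculation to a reviewer queue with its supporting context when confidence falls below a configurable threshold. The class, the queue names, and the 0.90 threshold are assumptions, not product settings.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Calculation:
    field_name: str
    proposed_value: Decimal
    confidence: float
    source_page: int
    inputs: dict


def route(calc: Calculation, threshold: float) -> dict:
    """Return a routing decision; low-confidence results carry their evidence."""
    if calc.confidence < threshold:
        return {
            "queue": "human_review",
            "field": calc.field_name,
            "proposed_value": calc.proposed_value,
            "source_page": calc.source_page,
            "inputs": calc.inputs,
        }
    return {"queue": "auto_accept", "field": calc.field_name,
            "proposed_value": calc.proposed_value}


calc = Calculation(
    field_name="Total Incurred",
    proposed_value=Decimal("21250.50"),
    confidence=0.72,
    source_page=3,
    inputs={"paid": Decimal("12500.00"), "reserves": Decimal("7500.00"),
            "expenses": Decimal("1250.50")},
)
decision = route(calc, threshold=0.90)
print(decision["queue"])
```
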

Clinical coherence on medical bills

Drug codes extracted from a medical bill are validated against national databases to verify clinical coherence — a check a general-purpose model has no concept of. Knowing what an NDC number is, and what it should appear next to, is insurance domain knowledge encoded in the model.

NDC Validation | Verified

Every National Drug Code pulled off a bill is checked against the FDA's NDC Directory to confirm the code exists, the drug name matches, and the dosage form is consistent with what's billed.
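
One way to picture this check: normalize the billed code to the 11-digit 5-4-2 form commonly used in billing, then look it up. The sketch below uses a tiny in-memory dictionary as a stand-in for a snapshot of the FDA directory; the directory entry, drug name, and function names are invented for illustration.

```python
def normalize_ndc(ndc: str) -> str:
    """Convert a hyphenated NDC (4-4-2, 5-3-2, 5-4-1, or 5-4-2) to 11 digits."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a hyphenated NDC: {ndc!r}")
    shape = tuple(len(p) for p in parts)
    if shape not in {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}:
        raise ValueError(f"unexpected NDC segment lengths: {shape}")
    labeler, product, package = parts
    # Left-pad whichever segment is short so the result is always 5-4-2.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Hypothetical stand-in for a directory snapshot; the entry is made up.
DIRECTORY = {
    "12345067890": {"name": "EXAMPLE DRUG", "dosage_form": "TABLET"},
}


def validate_ndc(ndc: str, billed_name: str, billed_form: str,
                 directory: dict = DIRECTORY) -> list:
    """Return a list of issues; an empty list means all three checks passed."""
    try:
        key = normalize_ndc(ndc)
    except ValueError as exc:
        return [str(exc)]
    entry = directory.get(key)
    if entry is None:
        return [f"NDC {key} not found in directory"]
    issues = []
    if entry["name"].lower() != billed_name.strip().lower():
        issues.append("drug name does not match directory entry")
    if entry["dosage_form"].lower() != billed_form.strip().lower():
        issues.append("dosage form does not match directory entry")
    return issues


print(validate_ndc("12345-678-90", "Example Drug", "injection"))
```
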

CPT & ICD Coherence | Cross-checked

Procedure codes (CPT/HCPCS) are cross-referenced against diagnosis codes (ICD-10) to catch billing combinations that don't clinically hang together — the kind of mismatch a generalist OCR has no way to see.
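
A simplified way to think about this cross-check is a lookup from procedure code to the diagnosis categories it plausibly accompanies. The table below is a toy assumption for illustration only, not a clinical reference and not how InsurGPT™ encodes coherence; the function name and return labels are hypothetical.

```python
# Toy compatibility table: CPT code -> ICD-10 three-character categories.
# Illustrative assumption only: 73721 (MRI of a lower-extremity joint) is
# paired here with two knee-related categories.
COMPATIBLE = {
    "73721": {"S83", "M23"},
}


def check_coherence(cpt: str, icd10_codes: list, table: dict = COMPATIBLE) -> str:
    """Classify a procedure/diagnosis combination against the toy table."""
    allowed = table.get(cpt)
    if allowed is None:
        return "unknown_procedure"
    # The first three characters of an ICD-10 code identify its category.
    if any(code[:3] in allowed for code in icd10_codes):
        return "coherent"
    return "mismatch"


print(check_coherence("73721", ["S83.511A"]))
print(check_coherence("73721", ["J02.9"]))
```
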

Provider & Pricing Sanity | Calibrated

Billed amounts are sanity-checked against expected ranges for the procedure, provider type, and jurisdiction. Outliers are flagged before they reach reserve-setting, not after.
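
The range check can be sketched as a lookup keyed by procedure, provider type, and jurisdiction. Everything specific in the example is assumed: the key layout, the provider-type label, and the dollar range are invented, and a production system would presumably calibrate ranges from data rather than hard-code them.

```python
from decimal import Decimal

# Made-up reference range keyed by (procedure, provider type, jurisdiction).
EXPECTED_RANGES = {
    ("73721", "outpatient_imaging", "PA"): (Decimal("400.00"), Decimal("2500.00")),
}


def check_billed_amount(procedure: str, provider_type: str, jurisdiction: str,
                        billed: Decimal, ranges: dict = EXPECTED_RANGES) -> str:
    """Label a billed amount relative to its expected range."""
    bounds = ranges.get((procedure, provider_type, jurisdiction))
    if bounds is None:
        return "no_reference_range"
    low, high = bounds
    if billed < low:
        return "below_range"
    if billed > high:
        return "above_range"
    return "in_range"


print(check_billed_amount("73721", "outpatient_imaging", "PA", Decimal("6800.00")))
```
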

Intelligent document orchestration

A 40-page submission package is identified as multiple document subtypes — ACORDs, supplementals, loss runs, schedules — split into indexed components, and routed through the right workflow for each type. No human triage.

Multi-Subtype Classification | Identified

InsurGPT™ reads a single bundled PDF and identifies every document inside it — ACORD 125, ACORD 140, supplemental applications, loss runs, SOVs — without anyone pre-tagging the pages.

Component Splitting & Indexing | Indexed

The package is split at the right page boundaries, each component is indexed with its subtype and page range, and downstream nodes pull the slice they need instead of re-parsing the whole bundle.
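
Once every page has a subtype label, splitting and indexing reduces to grouping consecutive pages. The sketch below assumes the per-page labels already exist and that two adjacent documents never share a subtype; a real segmenter also has to predict boundaries between same-type documents, which this toy version does not attempt. The function name and the page labels are hypothetical.

```python
from itertools import groupby


def split_package(page_labels: list) -> list:
    """Group consecutive pages with the same subtype into indexed components."""
    components = []
    page = 1
    for subtype, run in groupby(page_labels):
        length = len(list(run))
        components.append({
            "subtype": subtype,
            "first_page": page,
            "last_page": page + length - 1,
        })
        page += length
    return components


# Hypothetical per-page labels for a seven-page bundle.
labels = ["ACORD 125", "ACORD 125", "ACORD 140",
          "loss run", "loss run", "loss run", "SOV"]
for component in split_package(labels):
    print(component)
```
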

Subtype-Aware Routing | Routed

Each component is dispatched to the workflow tuned for it — loss runs to the reconciliation flow, ACORDs to underwriting intake, SOVs to schedule normalization — with no human in the middle making the routing call.
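
In its simplest form, subtype-aware routing is a dispatch table. The mapping below mirrors the three routes named above; the workflow identifiers, the component layout, and the `unrouted` fallback for unrecognized subtypes are assumptions added for the sketch.

```python
# Subtype -> workflow, following the routes described above.
WORKFLOWS = {
    "loss run": "reconciliation",
    "ACORD 125": "underwriting_intake",
    "ACORD 140": "underwriting_intake",
    "SOV": "schedule_normalization",
}


def route_component(component: dict, workflows: dict = WORKFLOWS) -> str:
    """Pick the workflow for one indexed component of a submission package."""
    # "unrouted" is a hypothetical fallback for subtypes missing from the table.
    return workflows.get(component["subtype"], "unrouted")


components = [
    {"subtype": "ACORD 125", "first_page": 1, "last_page": 2},
    {"subtype": "loss run", "first_page": 4, "last_page": 6},
    {"subtype": "SOV", "first_page": 7, "last_page": 7},
]
print([route_component(c) for c in components])
```
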

 
Research Output

Published, so customers can verify the claims.

 

Research

Page stream segmentation with LLMs

How Bevaya Labs approaches a foundational problem in insurance document AI.

Case Study

Workers' comp carrier processes claims 100x faster

How indexing automation delivered 432% ROI in 12 months.

Architecture

Inside the Bevaya platform architecture

How specialized models, HITL controls, and integrations come together in production.

FAQ

Frequently asked questions.

What is Bevaya Labs?

Bevaya Labs is the applied research arm of Bevaya. The team develops specialized AI models for insurance, publishes findings to demonstrate the depth of the work, and releases open tools like GutenOCR. Every model the team builds is shipped into customer production deployments.

How does InsurGPT™ work?

InsurGPT™ is a mosaic of dozens of specialized models, each purpose-built for one insurance task. When a document enters the system, InsurGPT™ selects the right combination of models for that document. A loss run is processed by models trained specifically on loss run formats. An ACORD form is handled by models that know every field and variation. That specialization is why InsurGPT™ reaches 93% accuracy on loss runs while general-purpose models sit at 80–84%.

How long does it take to build a specialized model?

Months, not days. Our loss run model alone took seven months of expert annotation by insurance domain specialists before reaching production quality. Across the model portfolio, one to two years of data collection and labeling is typical to create a robust, cross-customer model for a use case. This is why prompt engineering and off-the-shelf APIs cannot match purpose-built insurance AI.

How are models maintained in production?

Bevaya Labs runs production-grade MLOps. Every model is monitored for drift, controlled rollouts use A/B testing for every new version, and data and model versions are tracked for full reproducibility. Federated learning across the customer base means every carrier benefits from platform-wide improvements while their individual data stays protected.
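
The page does not say which federated algorithm is used. As one common illustration of the idea, the sketch below shows federated averaging (FedAvg), in which each participant shares only model weights and example counts, never documents. The function name and the numbers are hypothetical.

```python
def federated_average(updates: list) -> list:
    """Average weight vectors, weighting each by its number of training examples.

    `updates` is a list of (weights, n_examples) pairs, one per participant.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Two hypothetical carriers with locally trained two-parameter models.
updates = [
    ([1.0, 2.0], 100),
    ([3.0, 4.0], 300),
]
global_weights = federated_average(updates)
print(global_weights)
```

The carrier with more examples pulls the shared model further toward its local weights, while the raw data behind those weights stays where it is.
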

Why does Bevaya Labs publish its research?

Two reasons. First, insurance buyers do not believe bold claims without substantiation, and publishing methods lets technical evaluators verify the work for themselves. Second, the team participates in the broader AI research community. Open tools like GutenOCR move the field forward and demonstrate technical depth that prompt-engineered competitors cannot match.

Who leads Bevaya Labs?

Ratish Dalvi is VP of AI and Machine Learning. He leads a team of AI researchers, ML engineers, and data annotation specialists focused on vision-language reasoning models for insurance. The team publishes at venues including COLING and maintains the Bevaya Labs research blog.

Can we evaluate the models on our own documents?

Yes. Bevaya supports proof-of-value evaluations on real customer documents, with published benchmarks for context. The Bevaya Labs team will share methodology and engage directly with technical evaluators on model architecture, training approach, and production results. For GutenOCR specifically, a live demo runs at ocr.roots.ai.

Why not build this in-house?

Building an in-house equivalent takes 12–36 months, requires a dedicated AI team and infrastructure, and starts at $5M+ in upfront cost. A single-model approach cannot match the specialized mosaic InsurGPT™ is built on, and the review experience, orchestration, and integrations all have to be built from scratch. Bevaya Labs delivers seven years of focused research, 300M+ documents of training data, and an operating platform, available today.

GET STARTED

See the research running on your documents.

Every model the Bevaya Labs team builds goes into production. Let's connect so we can show you how InsurGPT™ will work on your actual documents, with the confidence scores, source grounding, and benchmark data your team needs to evaluate it.