AI Research & Development
The research behind insurance's most accurate AI.
Bevaya Labs is the applied research team behind InsurGPT™. We build specialized models on 300M+ real insurance documents, publish our methods, and ship every model into production. 93% accuracy on loss runs — verified against the frontier.
Confirms accuracy
Double-checks every field.
Loss Run: Reads loss history
Hundreds of carrier formats.
All ACORDs: Extracts ACORDs
Every variant, 99%+ accuracy.
Indexing: Sorts 100+ doc types
Classifies every claim file.
Property schedules
Statement of Values in any layout.
Page Stream: Separates documents
Splits merged PDFs.
GutenOCR: Reads the unreadable
OCR that reads what others can't.
Grounding: Traces every answer
Shows where each came from.
By the numbers.
Behind the platform
How we build and govern the models.
Specialized model development — each InsurGPT™ model is purpose-built for one insurance task and trained on real proprietary documents.
Continuous model improvement: Bevaya Labs monitors for drift, retrains on a regular cadence, and shares gains across carriers through federated learning.
Open research and tooling published through Bevaya Labs, GutenOCR, and venues like COLING.
The Benchmarks
When put to the test, InsurGPT™ beats general AI on every insurance task we benchmarked.
Specialized models outperform general-purpose ones on the work your team actually handles. Here's the head-to-head.
On claims accuracy, InsurGPT™ scored 99%. The strongest general model managed 62%.
A 37-point gap on the work your team handles every day — claims indexing, FNOL, demand letters, medical bills.
Source: Roots benchmark tests, December 2025.
Claims Accuracy
- InsurGPT™: 99%
- Mistral AI: 62%
- GPT-5.0: 58%
- Gemini 3.0 Pro: 55%
On underwriting accuracy, InsurGPT™ scored 93%. The strongest general model reached 84%.
9 to 13 points ahead of GPT-5.0, Gemini 3.0 Pro, and GPT-4.1 — across submission intake, loss runs, and exposure schedules.
Source: Roots benchmark tests, December 2025.
Underwriting Accuracy
- InsurGPT™: 93%
- Gemini 3.0 Pro: 84%
- GPT-4.1: 81%
- GPT-5.0: 80%
Featured Research
GutenOCR. Grounded OCR for production documents.
Generic OCR fails on the messy, multi-format documents insurance operations actually receive: handwritten notes, faxed pages, sixth-generation scans, multi-column tables with merged cells. GutenOCR is Bevaya Labs' proprietary grounded OCR, built specifically for the documents that defeat off-the-shelf tools.
- Handles handwritten notes, faxed pages, and low-quality scans where generic OCR fails.
- Every extracted field traces back to its exact source location — the foundation of X-Ray Mode.
- Powers Document Intelligence inside the Bevaya Platform at production scale today.
INSURANCE-NATIVE REASONING
It is not extraction.
It is judgment.
The hard part is not reading the document. The hard part is understanding what it means in an insurance context. Bevaya Labs builds models that reason about insurance — not just transcribe it.
Insurance math, validated
When a loss run omits Total Incurred because Paid and Reserves are listed separately, InsurGPT™ reconciles the calculation, validates the result, and flags low-confidence outputs for human review. Insurance arithmetic isn't a transcription problem — it's a reasoning problem the models have to get right.
Derived Field Reconciliation | Reconciled
Total Incurred missing from a loss run? InsurGPT™ derives it from Paid + Reserves + Expenses, validates the math against the document's own totals, and surfaces the computed value alongside its inputs.
Cross-Column Validation | Verified
Every numeric field is checked against neighboring columns and document-level subtotals. A reserve figure that doesn't roll up to the schedule total gets caught before it leaves the canvas.
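To make the derivation and roll-up check concrete, here is a minimal Python sketch. It is an illustration only, not InsurGPT™ internals: the `LossRunRow` schema, the field names, and the one-cent `TOLERANCE` are all assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class LossRunRow:
    """One claim line from a loss run (hypothetical schema)."""
    claim_id: str
    paid: Decimal
    reserves: Decimal
    expenses: Decimal
    total_incurred: Optional[Decimal] = None  # None when the carrier omits it


TOLERANCE = Decimal("0.01")  # assumed rounding tolerance, not a documented value


def derive_total_incurred(row: LossRunRow) -> Decimal:
    """Return the stated Total Incurred, or derive it as Paid + Reserves + Expenses."""
    if row.total_incurred is not None:
        return row.total_incurred
    return row.paid + row.reserves + row.expenses


def rolls_up(rows: list[LossRunRow], stated_total: Decimal) -> bool:
    """Check that the per-claim totals sum to the document-level total."""
    computed = sum((derive_total_incurred(r) for r in rows), Decimal("0"))
    return abs(computed - stated_total) <= TOLERANCE


rows = [
    LossRunRow("C-001", Decimal("12000.00"), Decimal("3000.00"), Decimal("500.00")),
    LossRunRow("C-002", Decimal("800.00"), Decimal("0.00"), Decimal("75.50")),
]
print(derive_total_incurred(rows[0]))       # derived because the field is missing
print(rolls_up(rows, Decimal("16375.50")))  # compares against the document's own total
```

`Decimal` is used instead of `float` so that currency sums compare exactly, which matters when the check is a strict roll-up against a printed subtotal.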
Low-Confidence Routing | Flagged
Calculations that fall below your confidence threshold route automatically to a human reviewer with the source page, the inputs used, and the proposed value — not a black-box answer to rubber-stamp.
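The routing rule can be sketched in a few lines of Python. Everything named here is hypothetical: the `ComputedValue` type, the queue names, and the 0.90 default threshold are assumptions chosen for illustration, and in practice the threshold would be whatever the customer configures.

```python
from dataclasses import dataclass, field


@dataclass
class ComputedValue:
    """A calculated field plus the evidence a reviewer would need (hypothetical)."""
    field_name: str
    value: float
    confidence: float  # 0.0 to 1.0, assumed to be supplied by the model
    source_page: int
    inputs: dict = field(default_factory=dict)


def route(result: ComputedValue, threshold: float = 0.90) -> dict:
    """Auto-accept confident values; send the rest to review with their evidence."""
    if result.confidence >= threshold:
        return {"queue": "auto_accept", "value": result.value}
    return {
        "queue": "human_review",
        "proposed_value": result.value,
        "source_page": result.source_page,
        "inputs": result.inputs,
    }


low = ComputedValue("total_incurred", 15500.0, 0.62, 3,
                    {"paid": 12000.0, "reserves": 3000.0, "expenses": 500.0})
print(route(low)["queue"])
```

The review payload carries the source page and the inputs alongside the proposed value, so the reviewer can check the arithmetic rather than simply approve a number.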
Clinical coherence on medical bills
Drug codes extracted from a medical bill are validated against national databases to verify clinical coherence — a check a general-purpose model has no concept of. Knowing what an NDC number is, and what it should appear next to, is insurance domain knowledge encoded in the model.
NDC Validation | Verified
Every National Drug Code pulled off a bill is checked against the FDA's NDC Directory to confirm the code exists, the drug name matches, and the dosage form is consistent with what's billed.
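As a rough illustration of this kind of check, the Python sketch below normalizes a hyphenated NDC to the 11-digit 5-4-2 form commonly used in billing (10-digit codes come in 4-4-2, 5-3-2, and 5-4-1 layouts) and looks it up in a directory. The `DIRECTORY` dict is a made-up stand-in for an extract of the FDA NDC Directory, its single entry is not a real drug, and the function names are assumptions for the example.

```python
VALID_SHAPES = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}

# Made-up entry standing in for a local extract of the FDA NDC Directory.
DIRECTORY = {
    "01234567890": {"name": "EXAMPLE DRUG", "dosage_form": "TABLET"},
}


def normalize_ndc(ndc: str) -> str:
    """Convert a hyphenated NDC to the 11-digit 5-4-2 form by left-padding each segment."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three hyphen-separated digit groups: {ndc!r}")
    if tuple(len(p) for p in parts) not in VALID_SHAPES:
        raise ValueError(f"unrecognized NDC layout: {ndc!r}")
    labeler, product, package = parts
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


def check_ndc(ndc: str, billed_name: str, billed_form: str,
              directory: dict = DIRECTORY) -> list[str]:
    """Return a list of issues; an empty list means the billed line is consistent."""
    entry = directory.get(normalize_ndc(ndc))
    if entry is None:
        return ["code not found in directory"]
    issues = []
    if entry["name"].casefold() != billed_name.casefold():
        issues.append("drug name mismatch")
    if entry["dosage_form"].casefold() != billed_form.casefold():
        issues.append("dosage form mismatch")
    return issues


print(check_ndc("1234-5678-90", "Example Drug", "injection"))
```

A production check would need fuzzy name matching and a regularly refreshed directory; exact string comparison is used here only to keep the sketch short.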
CPT & ICD Coherence | Cross-checked
Procedure codes (CPT/HCPCS) are cross-referenced against diagnosis codes (ICD-10) to catch billing combinations that don't clinically hang together — the kind of mismatch a generalist OCR has no way to see.
Provider & Pricing Sanity | Calibrated
Billed amounts are sanity-checked against expected ranges for the procedure, provider type, and jurisdiction. Outliers are flagged before they reach reserve-setting, not after.
Intelligent document orchestration
A 40-page submission package is identified as multiple document subtypes — ACORDs, supplementals, loss runs, schedules — split into indexed components, and routed through the right workflow for each type. No human triage.
Multi-Subtype Classification | Identified
InsurGPT™ reads a single bundled PDF and identifies every document inside it — ACORD 125, ACORD 140, supplemental applications, loss runs, SOVs — without anyone pre-tagging the pages.
Component Splitting & Indexing | Indexed
The package is split at the right page boundaries, each component is indexed with its subtype and page range, and downstream nodes pull the slice they need instead of re-parsing the whole bundle.
Subtype-Aware Routing | Routed
Each component is dispatched to the workflow tuned for it — loss runs to the reconciliation flow, ACORDs to underwriting intake, SOVs to schedule normalization — with no human in the middle making the routing call.
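The split-and-route flow can be sketched as follows, assuming a classifier has already produced one subtype label per page. The label strings, workflow names, and the `unrouted` fallback are hypothetical. Grouping consecutive identical labels is also a simplification: it would merge two back-to-back documents of the same subtype, which a real page stream segmentation model has to tell apart.

```python
from itertools import groupby

# Subtype -> workflow. All names here are illustrative assumptions.
ROUTES = {
    "acord_125": "underwriting_intake",
    "acord_140": "underwriting_intake",
    "loss_run": "reconciliation",
    "sov": "schedule_normalization",
}


def split_components(page_labels: list[str]) -> list[dict]:
    """Group consecutive pages with the same subtype into 1-based page ranges."""
    components = []
    page = 1
    for subtype, run in groupby(page_labels):
        count = len(list(run))
        components.append({"subtype": subtype, "pages": (page, page + count - 1)})
        page += count
    return components


def route_components(components: list[dict]) -> list[dict]:
    """Attach a workflow to each component; unknown subtypes are marked unrouted."""
    return [{**c, "workflow": ROUTES.get(c["subtype"], "unrouted")} for c in components]


labels = ["acord_125"] * 4 + ["acord_140"] * 3 + ["loss_run"] * 5 + ["sov"] * 2
for component in route_components(split_components(labels)):
    print(component)
```

Because each component records its own page range, a downstream step can read only its slice of the PDF instead of re-parsing the whole bundle.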
More Capabilities
Explore the rest of the platform.
Designed, deployed, and governed together. Powered by InsurGPT™ and accessed through the AI Assistant.
Workflow Canvas
Visual builder and production runtime for every automation.
Review: Human-in-the-Loop
Configurable review queues with X-Ray verification and a patented feedback loop.
Documents: Document Intelligence
Read any insurance document — hundreds of carrier formats, scanned or digital.
Grounding: Grounded Explainability
Every value traceable to its source. X-Ray Highlight Mode brings citations to reviewers.
Analytics: Analytics Dashboard
Live accuracy, STP rates, reviewer SLA, and agent performance across every workflow.
Governance: Governed Automation
Immutable audit trails, role-based access, flow versioning. Compliance is the architecture.
Research Output
Published, so customers can verify the claims.

Research
Page stream segmentation with LLMs
How Bevaya Labs approaches a foundational problem in insurance document AI.

Case Study
Workers' comp carrier processes claims 100x faster
How indexing automation delivered 432% ROI in 12 months.

Architecture
Inside the Bevaya platform architecture
How specialized models, HITL controls, and integrations come together in production.
FAQ
Frequently asked questions.
Bevaya Labs is the applied research arm of Bevaya. The team develops specialized AI models for insurance, publishes findings to demonstrate the depth of the work, and releases open tools like GutenOCR. Every model the team builds is shipped into customer production deployments.
InsurGPT™ is a mosaic of dozens of specialized models, each purpose-built for one insurance task. When a document enters the system, InsurGPT™ selects the right combination of models for that document. A loss run is processed by models trained specifically on loss run formats. An ACORD form is handled by models that know every field and variation. That specialization is why InsurGPT™ reaches 93% accuracy on loss runs while general-purpose models sit at 80–84%.
Months, not days. Our loss run model alone took seven months of expert annotation by insurance domain specialists before reaching production quality. Across the model portfolio, one to two years of data collection and labeling is typical to create a robust, cross-customer model for a use case. This is why prompt engineering and off-the-shelf APIs cannot match purpose-built insurance AI.
Bevaya Labs runs production-grade MLOps. Every model is monitored for drift, controlled rollouts use A/B testing for every new version, and data and model versions are tracked for full reproducibility. Federated learning across the customer base means every carrier benefits from platform-wide improvements while their individual data stays protected.
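One common way to monitor for drift is the Population Stability Index (PSI), which compares the distribution of a score in production against a baseline. The sketch below is an assumption about how such a check could look, not a description of the Bevaya Labs pipeline. It expects non-empty lists of scores in the range 0 to 1, and the 0.25 alert level in the usage line is a widely used rule of thumb rather than a documented setting.

```python
import math


def psi(expected: list[float], actual: list[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""

    def proportions(scores: list[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            idx = min(int(s * bins), bins - 1)  # clamp a score of 1.0 into the top bin
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(scores), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [0.95] * 90 + [0.55] * 10  # mostly high-confidence predictions
this_week = [0.95] * 50 + [0.55] * 50  # many more mid-confidence predictions
if psi(baseline, this_week) > 0.25:
    print("confidence distribution has shifted; review before the next rollout")
```

A check like this only signals that inputs or outputs have moved. Deciding whether to retrain still depends on measured accuracy against labeled samples.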
Two reasons. First, insurance buyers do not believe bold claims without substantiation — publishing methods lets technical evaluators verify the work for themselves. Second, the team participates in the broader AI research community. Open tools like GutenOCR move the field forward and demonstrate technical depth that prompt-engineered competitors cannot match.
Ratish Dalvi is VP of AI and Machine Learning. He leads a team of AI researchers, ML engineers, and data annotation specialists focused on vision-language reasoning models for insurance. The team publishes at venues including COLING and maintains the Bevaya Labs research blog.
Yes. Bevaya supports proof-of-value evaluations on real customer documents, with published benchmarks for context. The Bevaya Labs team will share methodology and engage directly with technical evaluators on model architecture, training approach, and production results. For GutenOCR specifically, a live demo runs at ocr.roots.ai.
Building an in-house equivalent takes 12–36 months, requires a dedicated AI team and infrastructure, and starts at $5M+ in upfront cost. A single-model approach cannot match the specialized mosaic InsurGPT™ is built on, and the review experience, orchestration, and integrations all have to be built from scratch. Bevaya Labs delivers seven years of focused research, 300M+ documents of training data, and an operating platform — today.
GET STARTED
See the research running on your documents.
Every model the Bevaya Labs team builds goes into production. Let's connect so we can show you how InsurGPT™ will work on your actual documents — with the confidence scores, source grounding, and benchmark data your team needs to evaluate it.


