Insurance AI Benchmarks: Accuracy Results

Bevaya Benchmark Results

Claims extraction

Field-level accuracy on common claims data extraction. Bevaya's fine-tuned model is shown with and without a 0.9 confidence threshold.

At a 0.9 confidence threshold, Bevaya reaches 98–100% accuracy across every claims field tested.

Field	GPT-4 (% Accuracy)	Mistral 7B PE (% Accuracy)	Bevaya FT, no threshold (% Accuracy)	Bevaya FT, threshold > 0.9 (% Accuracy)
Claim Number	52	32	68	98
Claimant Name	98	88	99	100
Date of Report	87	72	92	98
Date of Service	78	57	76	98

Last updated: December 2025
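
The page does not spell out how the 0.9 confidence threshold is applied. The sketch below shows one common interpretation, stated here as an assumption: accuracy is measured only over predictions whose confidence exceeds the threshold, and the remaining predictions are set aside (for example, for manual review). The `Prediction` type, the function names, and the sample records are hypothetical illustrations, not Bevaya's implementation or benchmark data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    predicted: str     # value extracted by the model
    expected: str      # ground-truth value
    confidence: float  # model-reported score in [0, 1] (assumed to exist)


def accuracy(preds: List[Prediction]) -> Optional[float]:
    """Percentage of exact matches, or None if there is nothing to score."""
    if not preds:
        return None
    correct = sum(p.predicted == p.expected for p in preds)
    return 100 * correct / len(preds)


def thresholded_accuracy(
    preds: List[Prediction], threshold: float = 0.9
) -> Tuple[Optional[float], float]:
    """Accuracy over predictions with confidence > threshold, plus coverage.

    Coverage is the share of predictions that clear the threshold. It is
    reported alongside accuracy because a stricter threshold usually
    scores fewer predictions.
    """
    kept = [p for p in preds if p.confidence > threshold]
    coverage = len(kept) / len(preds) if preds else 0.0
    return accuracy(kept), coverage


# Made-up sample records for illustration only (not benchmark data).
sample = [
    Prediction("CLM-001", "CLM-001", 0.97),
    Prediction("CLM-002", "CLM-002", 0.95),
    Prediction("CLM-003", "CLM-008", 0.41),
    Prediction("CLM-004", "CLM-004", 0.93),
    Prediction("CLM-005", "CLM-006", 0.62),
]

overall = accuracy(sample)
kept_accuracy, coverage = thresholded_accuracy(sample, threshold=0.9)
print(f"no threshold: {overall:.0f}%")
print(f"threshold > 0.9: {kept_accuracy:.0f}% on {coverage:.0%} of predictions")
```

Under this interpretation, a thresholded score is best read together with its coverage, since the two wrong predictions in the sample are excluded rather than corrected.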

Underwriting extraction

Overall accuracy across underwriting field extracts, compared with state-of-the-art frontier models.

Bevaya delivers 93% overall accuracy on underwriting field extracts, nine points above the next-best model (Gemini 3.0 Pro at 84%).

Field	GPT-4.1 (% Accuracy)	GPT-5.0 (% Accuracy)	Gemini 3.0 Pro (% Accuracy)	Bevaya (% Accuracy)
Overall Accuracy	81	80	84	93

Last updated: December 2025
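
The nine-point margin can be reproduced from the overall accuracy row. The snippet below is a minimal check using the figures from the table above; the function name `margin_over_next_best` is a hypothetical helper introduced for illustration.

```python
# Overall underwriting accuracy (%), copied from the table above.
overall_accuracy = {
    "GPT-4.1": 81,
    "GPT-5.0": 80,
    "Gemini 3.0 Pro": 84,
    "Bevaya": 93,
}


def margin_over_next_best(scores, model):
    """Return (next-best model name, lead in percentage points) for `model`."""
    others = {name: score for name, score in scores.items() if name != model}
    runner_up = max(others, key=others.get)
    return runner_up, scores[model] - others[runner_up]


runner_up, margin = margin_over_next_best(overall_accuracy, "Bevaya")
print(f"Bevaya leads {runner_up} by {margin} points")
```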

Loss run extraction

Field-level accuracy across loss run extraction, organized by data category.

Bevaya leads on every claim-level field — including a 78% score on the notoriously hard "Type" field, where frontier models fall to 26–36%.

Field	GPT-4.1 (% Accuracy)	GPT-5.0 (% Accuracy)	Gemini 3.0 Pro (% Accuracy)	Bevaya (% Accuracy)
Accident Description	88	53	81	94
Accident State	90	53	83	92
Allocated Expense Reserves	85	60	91	99
Allocated Expenses Paid	88	57	86	99
Carrier	75	56	74	93
Claim Number	83	54	76	93
Claim Reported Date	89	59	86	95
Claimant Closed Date	93	61	90	99
Claimant Name	88	58	84	97
Date of Incident	88	58	85	97
Indemnity Paid	66	53	81	98
Indemnity Reserves	80	53	84	98
Line of Business	91	60	86	98
Medical Reserves	86	60	91	98
Nature of Injury	53	40	72	90
Paid Medical	90	59	89	97
Policy Number	84	53	81	95
Policy Year	81	53	74	87
Recoveries	97	61	91	99
Status	70	46	69	96
Total Incurred	73	48	72	88
Total Paid	74	52	73	88
Total Reserve	69	54	83	93
Type	36	26	32	78

Last updated: December 2025

Bevaya leads or ties on 19 of 22 policy-level fields, including Carrier, Experience Mod, Policy Number, Recoveries, and Total Reserve. It trails the best frontier model on Line of Business, Total Claims, and Total Open Claims.

Field	GPT-4.1 (% Accuracy)	GPT-5.0 (% Accuracy)	Gemini 3.0 Pro (% Accuracy)	Bevaya (% Accuracy)
Allocated Expense Reserves	81	90	89	90
Allocated Expenses Paid	80	89	87	89
Carrier	73	72	75	90
Experience Mod	88	90	90	93
Expiration Date	72	65	74	81
Inception Date	81	74	82	82
Indemnity Paid	77	88	84	88
Indemnity Reserves	81	88	87	89
Line of Business	76	84	81	80
Medical Reserves	86	90	88	91
Named Insured	90	92	91	93
Paid Medical	84	87	84	89
Policy Number	72	77	81	86
Policy Year	86	86	88	92
Recoveries	86	91	89	92
Total Claims	84	88	83	87
Total Closed Claims	80	85	68	89
Total Incurred	75	78	79	83
Total Open Claims	89	92	85	91
Total Paid	69	72	73	76
Total Reserve	64	81	76	82
Validation Date	78	63	84	84

Last updated: December 2025
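
One way to check the "leads or ties" summary against the policy-level table is to compare Bevaya's score with the best of the other three models on each field. The sketch below does that with the figures copied from the table above; the `tally` helper and the tuple layout are illustrative choices, not part of the published benchmark.

```python
# (field, GPT-4.1, GPT-5.0, Gemini 3.0 Pro, Bevaya), all in % accuracy,
# copied from the policy-level table above.
POLICY_LEVEL = [
    ("Allocated Expense Reserves", 81, 90, 89, 90),
    ("Allocated Expenses Paid", 80, 89, 87, 89),
    ("Carrier", 73, 72, 75, 90),
    ("Experience Mod", 88, 90, 90, 93),
    ("Expiration Date", 72, 65, 74, 81),
    ("Inception Date", 81, 74, 82, 82),
    ("Indemnity Paid", 77, 88, 84, 88),
    ("Indemnity Reserves", 81, 88, 87, 89),
    ("Line of Business", 76, 84, 81, 80),
    ("Medical Reserves", 86, 90, 88, 91),
    ("Named Insured", 90, 92, 91, 93),
    ("Paid Medical", 84, 87, 84, 89),
    ("Policy Number", 72, 77, 81, 86),
    ("Policy Year", 86, 86, 88, 92),
    ("Recoveries", 86, 91, 89, 92),
    ("Total Claims", 84, 88, 83, 87),
    ("Total Closed Claims", 80, 85, 68, 89),
    ("Total Incurred", 75, 78, 79, 83),
    ("Total Open Claims", 89, 92, 85, 91),
    ("Total Paid", 69, 72, 73, 76),
    ("Total Reserve", 64, 81, 76, 82),
    ("Validation Date", 78, 63, 84, 84),
]


def tally(rows):
    """Count fields where Bevaya leads, ties, or trails the best other model."""
    counts = {"leads": 0, "ties": 0, "trails": 0}
    trailing_fields = []
    for field, *others, bevaya in rows:
        best_other = max(others)
        if bevaya > best_other:
            counts["leads"] += 1
        elif bevaya == best_other:
            counts["ties"] += 1
        else:
            counts["trails"] += 1
            trailing_fields.append(field)
    return counts, trailing_fields


counts, trailing_fields = tally(POLICY_LEVEL)
print(counts)           # {'leads': 14, 'ties': 5, 'trails': 3}
print(trailing_fields)  # ['Line of Business', 'Total Claims', 'Total Open Claims']
```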

Bevaya delivers 100% accuracy on Recoveries and Allocated Expense Reserves, with double-digit gains over frontier models on Indemnity Paid, Indemnity Reserves, Medical Reserves, and Total Reserve.

Field	GPT-4.1 (% Accuracy)	DeepSeek R1 (% Accuracy)	Gemini 2.5 Flash (% Accuracy)	Bevaya (% Accuracy)
Indemnity Paid	65	83	74	95
Total Incurred	66	52	71	74
Recoveries	52	64	71	100
Paid Medical	44	74	74	86
Medical Reserves	44	59	64	88
Indemnity Reserves	63	70	74	98
Total Paid	67	85	93	78
Total Reserve	59	69	81	97
Allocated Expenses Paid	86	84	93	93
Allocated Expense Reserves	76	55	100	100

Last updated: July 2025

Bevaya hits 100% on Line of Business and Carrier, and 99% on Policy Number, Policy Year, and Status, the fields that anchor every claim record.

Field	GPT-4.1 (% Accuracy)	DeepSeek R1 (% Accuracy)	Gemini 2.5 Flash (% Accuracy)	Bevaya (% Accuracy)
Claimant Name	80	90	94	94
Date of Incident	80	89	94	94
Claim Reported Date	83	89	82	93
Claim Number	45	43	61	94
Line of Business	76	95	95	100
Carrier	75	88	93	100
Policy Number	65	81	91	99
Policy Year	78	94	99	99
Status	57	88	68	99
Accident State	88	90	98	96
Claimant Closed Date	72	89	85	94
Accident Description	54	60	77	78

Last updated: July 2025

Loss run benchmarks are refreshed as new frontier models are released. December 2025 results reflect testing against GPT-4.1, GPT-5.0, and Gemini 3.0 Pro. July 2025 results reflect testing against GPT-4.1, DeepSeek R1, and Gemini 2.5 Flash.

