
Integrating OCR and NLP in Claims Processing

  • Writer: Prasanna Hari
  • Sep 18
  • 9 min read

Introduction 


The purpose of this paper is to examine how life insurance claims processing can be automated. Current processing is limited chiefly by the models being used: OCR (Optical Character Recognition) and NLP (Natural Language Processing) are deployed as separate tools and processes, which significantly hinders claims processing. The proposed model discussed here is a unified OCR and NLP model, which could revolutionize how these claims are processed. We will go in depth on various models that can be used and the pros and cons of each.


Life insurance claims have been notoriously slow and error-prone due to over-reliance on manual systems instead of fully automated ones. Many companies report facing “challenges in recruiting and retaining experienced staff, particularly claim assessors.” (1) The reason is that a life insurance claim can require many supporting documents, such as death certificates and employer verifications. These documents are submitted by mail or PDF upload and are then reviewed by claims examiners; they frequently arrive with missing information and are prone to errors and delays. Processing becomes very time-consuming because examiners must follow up repeatedly, and payouts can be delayed. OCR and NLP tools are already used to automate parts of this process, but they are used separately and are complex to pair together, which leads to inefficiencies, integration issues, and limited scalability. Newer unified AI models can address these issues by consolidating the workflow into one system. Two notable systems discussed below are Donut and LayoutLMv3.


By: Om Pranay

By: Javal Gajjar

Problem Context and Challenges 

Life insurance companies face a unique type of complexity in how they process their data. Claims depend on large volumes of unstructured supporting documents, often submitted by third parties, including death certificates and employer verification letters. These documents arrive in various formats, including scanned images, handwritten statements, PDFs, and even mobile photos, and they are often inconsistent and of poor quality. Because of this, they must be reviewed manually by a claims examiner, which comes with a whole array of problems.

These steps involve several pain points. A very notable one is long processing cycles: some insurance firms reportedly take up to “four months to process a claim.” This is largely due to manual review, which takes considerable time and effort. Moreover, claims examiners sometimes overlook key information, which can delay payouts or even lead to incorrect claim outcomes, a problem made worse by the shortage of claims assessors. Because of these difficulties, OCR and NLP systems are used to process these claims, but the two systems are not connected to each other. Typically, a document is converted into text by an OCR engine and then processed by an NLP model. These tools do not integrate well and require custom code, and this complexity limits reliability and how far automation can scale.

AI Models Used Currently 


As discussed before, the AI models currently in use are a combination of OCR and NLP models; life insurance companies are not using a unified model to help them with this. These tools are deployed as fragmented systems, which limits their potential. The models explored below are those insurance companies use today, along with their limitations.


Optical Character Recognition (OCR) is the process of converting a scanned document into text that a machine can easily read. In life insurance claims processing, OCR converts handwritten or printed documents into machine-readable text, which is then sent to the NLP model for analysis. Life insurance companies use several OCR models, with Amazon Textract among the most common. A notable company using this model is RGA (Reinsurance Group of America), which announced the launch of its “FCA Optimization solution featuring Amazon Textract” (2) back in 2022. Amazon Textract has notable problems, the main one being that OCR tools struggle with complex layouts. OCR tools extract text in a sequential, reading-order format, so if words appear in a table, for example, they can be pulled out in the wrong order and interpreted incorrectly.
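To make this sequential behavior concrete, here is a minimal Python sketch of the first stage using the AWS boto3 SDK. The file name and region are hypothetical, and this only illustrates Textract’s line-by-line output, not RGA’s actual pipeline.

```python
import boto3

# Hypothetical local scan; Textract's synchronous API accepts raw bytes.
with open("death_certificate.png", "rb") as f:
    document_bytes = f.read()

textract = boto3.client("textract", region_name="us-east-1")
response = textract.detect_document_text(Document={"Bytes": document_bytes})

# Textract returns LINE blocks in reading order. Cells that belong together
# in a table come back as separate, sequential lines, which is how layout
# information is lost before the NLP stage ever sees the text.
lines = [block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"]
raw_text = "\n".join(lines)
print(raw_text)
```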


Natural Language Processing (NLP) is the process of making sense of the words fed to the system, generally using a machine learning model to interpret and extract meaning from text. In the context of insurance claims processing, NLP is generally used after OCR to extract data from text, identify document types, and flag missing or inconsistent data points. Most of these techniques are powered by BERT, a transformer-based NLP model. These NLP systems are bounded by the OCR model: an NLP model can only produce quality output if the text coming from the OCR stage is high quality as well. This is where companies run into quite a few issues.
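As an illustration of this second stage, the sketch below runs a publicly available BERT-based named-entity-recognition checkpoint over the OCR output. The checkpoint name is an assumption chosen for illustration, not any insurer’s actual model; a production system would be fine-tuned on claim-specific labels.

```python
from transformers import pipeline

# dslim/bert-base-NER is a public, general-purpose BERT NER checkpoint used
# here purely for illustration; a real deployment would use claim-specific
# labels such as policy numbers and dates of death.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

raw_text = "Certificate of Death: John A. Smith, pronounced in Springfield on 06/12/2021."
for entity in ner(raw_text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```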


Insurance companies use these two models in combination quite frequently, almost always chained sequentially in a pipeline. The limitation is that errors made in earlier stages are amplified throughout the rest of the pipeline. For example, if the OCR engine reads something incorrectly, the NLP model will extract something incorrect as well. Another challenge is the complexity of integrating the two models: the integration involves APIs and microservices passing data between them, which can introduce significant technical debt and slow down error handling. A more unified model is needed to combat these problems.
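Putting the two stages together makes the compounding visible. In this minimal sketch, run_ocr and run_nlp are placeholders standing in for the two snippets above.

```python
def run_ocr(document_bytes: bytes) -> str:
    """Stage 1 placeholder: see the Textract sketch above."""
    ...

def run_nlp(raw_text: str) -> list:
    """Stage 2 placeholder: see the BERT NER sketch above."""
    ...

def process_claim(document_bytes: bytes) -> list:
    # Any misread in stage 1 flows straight into stage 2: if OCR reads
    # "06/12/1954" as "06/12/1984", the NLP model dutifully extracts the
    # wrong date, and nothing downstream can tell it was an OCR error.
    return run_nlp(run_ocr(document_bytes))
```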


Future Unified OCR and NLP Model 


Because of the limitations of the two-step approach, researchers have created unified OCR and NLP pipelines. These models effectively combine OCR and NLP into one, eliminating many of the problems found in the separate models. They are trained to process an entire document image, capturing both the visual structure and the content of the document, so the system can better understand the document’s context and extract better information. The two most notable models are Donut (Document Understanding Transformer) and LayoutLMv3.


Donut is an OCR-free model that processes documents end-to-end. Unlike OCR-based systems, the Donut model accepts images as input and generates content directly; it requires no separate text detection or recognition. Instead, it uses a visual encoder and a textual decoder that are trained jointly. The visual encoder is used for “extracting features from a given document” (3) and the textual decoder is used to “construct a desired structured format (e.g. JSON).” The reported results are quite striking: time per image went down from 1.7 seconds to 1.2 seconds, and accuracy increased by 9 percentage points, reaching 91%. A Donut model could be fine-tuned on a dataset of life insurance claim forms and certificates, learning to extract the key information without any need for an OCR model.
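To show what “image in, structured output out” looks like in practice, here is a minimal inference sketch using the publicly released Donut checkpoints in Hugging Face transformers. The checkpoint shown is fine-tuned for receipt parsing (CORD); a claims deployment would fine-tune on insurance forms instead, and “claim_form.png” is a hypothetical file.

```python
import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Public receipt-parsing checkpoint, used purely for illustration.
checkpoint = "naver-clova-ix/donut-base-finetuned-cord-v2"
processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
model.eval()

image = Image.open("claim_form.png").convert("RGB")  # hypothetical input
pixel_values = processor(image, return_tensors="pt").pixel_values

# Donut is prompted with a task token and decodes structured output directly,
# with no separate OCR stage anywhere in the loop.
decoder_input_ids = processor.tokenizer(
    "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
).input_ids

with torch.no_grad():
    outputs = model.generate(
        pixel_values, decoder_input_ids=decoder_input_ids, max_length=512
    )

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task token
print(processor.token2json(sequence))  # nested dict of extracted fields
```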


LayoutLMv3 is another model, like Donut, that can be used to interpret the text of documents. Unlike the two-step OCR and NLP approach, LayoutLMv3 uses the spatial arrangement of text to draw conclusions about what the document is saying. Results show that “LayoutLMv3 achieves state-of-the-art performance” (4) in both text-centric and image-centric tasks. The model can handle complex handwritten and printed text with higher accuracy than a standalone OCR tool, so it would excel at processing PDFs or scanned documents submitted by customers, extracting key fields while also understanding what those fields say.
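A minimal inference sketch, assuming the Hugging Face transformers implementation of LayoutLMv3: with apply_ocr=True the processor runs Tesseract to obtain words and their bounding boxes, and the model fuses that text, the layout, and the page image. The five field labels are hypothetical claim fields, and the public base checkpoint has no trained classification head, so the predictions are meaningless until the model is fine-tuned on labeled claim documents.

```python
import torch
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

# Hypothetical claim-field label set (an assumption for illustration).
labels = ["O", "DECEASED_NAME", "DATE_OF_DEATH", "POLICY_NUMBER", "CAUSE_OF_DEATH"]

# apply_ocr=True makes the processor run Tesseract for words + bounding boxes.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=len(labels)
)  # untrained head: requires fine-tuning before use

image = Image.open("scanned_claim.png").convert("RGB")  # hypothetical scan
encoding = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits        # shape: (1, seq_len, num_labels)
predicted = [labels[i] for i in logits.argmax(-1).squeeze().tolist()]
```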


These models are compelling for life insurance because they reduce complexity, improve accuracy, and are much faster than the two-step approach. They are not yet widely used by insurance companies due to their novelty and infrastructure needs, but adoption should grow as companies realize how much time these models could save.


Benefits of Using These Models 


One primary benefit of these models is a reduction in time spent on system maintenance. Since the system uses one unified model instead of multiple tools, maintenance time drops significantly. NAVER Clova, the company that created Donut, ran an internal case study and reported that companies saw lower maintenance time when they shifted to an end-to-end model. Deployment time also went down significantly, since teams no longer had to deal with custom rules and integration.

Another benefit is improved accuracy and reliability. A key weakness of the two-step system is error compounding: for example, if the OCR system misreads some data, that error persists into the NLP logic. Donut massively outperformed separate OCR and NLP models, achieving a “94.1% F1 score” (3) while other models scored between 84% and 88%. Moreover, LayoutLMv3 outperforms its prior version, LayoutLMv2, which used the two-step OCR and NLP architecture discussed earlier.

Another great benefit is that unified models such as Donut and LayoutLMv3 are built as foundation models: AI models pre-trained on large datasets that can then be fine-tuned and adapted to a specific task. This is the same architecture used for GPT-4 and Claude, and it is easily adaptable to other use cases that may pop up in the future. Because these models are pre-trained, they do not need much data to train on; a few thousand documents can suffice for fine-tuning, rather than the far larger corpus needed to train from scratch.
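Conceptually, that adaptation is a short fine-tuning loop. The sketch below assumes the LayoutLMv3 classifier from the earlier snippet as `model` and a hypothetical `claims_dataset` of a few thousand pre-processed, labeled claim documents; the hyperparameters are placeholders.

```python
from transformers import Trainer, TrainingArguments

# Minimal fine-tuning sketch. `model` is the LayoutLMv3 classifier from the
# earlier snippet; `claims_dataset` is a hypothetical labeled dataset.
args = TrainingArguments(
    output_dir="claims-layoutlmv3",
    per_device_train_batch_size=4,
    num_train_epochs=5,
    learning_rate=5e-5,
)
trainer = Trainer(model=model, args=args, train_dataset=claims_dataset)
trainer.train()
```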


Challenges of a Unified System 


The first obvious challenge is the high initial investment required of insurance companies. Most would need to invest heavily in hardware, and perhaps in additional software engineers, to fine-tune a unified model. These unified models require high-performance GPUs for training and fine-tuning, so companies would have to shift to a GPU-based platform, which can be very costly and time-consuming. Many insurance companies also lack the MLOps infrastructure required to operate a pipeline like this one.

Another challenge is the availability of training data. These models need to be trained on life-insurance-specific documents such as death certificates and affidavits, but privacy laws such as HIPAA make it problematic to train on real data and claims documents. Companies would instead have to create their own documents to train the model, and both creating those documents and manually labeling them can be very expensive and time-consuming.
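One hedged way to sidestep the privacy problem is synthetic data. The sketch below uses the Faker library to generate realistic but entirely fictitious field values; the field names are hypothetical, and the records would still need to be rendered into document images before they could train a vision model like Donut or LayoutLMv3.

```python
from faker import Faker

fake = Faker()

def synthetic_death_certificate() -> dict:
    """Generate one fictitious record; no real policyholder data involved."""
    return {
        "deceased_name": fake.name(),
        "date_of_birth": fake.date_of_birth(minimum_age=40, maximum_age=95).isoformat(),
        "date_of_death": fake.date_this_decade().isoformat(),
        "place_of_death": fake.city(),
        "certificate_number": fake.bothify("DC-########"),  # '#' becomes a digit
    }

# A few thousand records is roughly the scale a pre-trained model needs.
records = [synthetic_death_certificate() for _ in range(5000)]
```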


Sedgwick Case Study 


As mentioned previously, few companies are using a unified model for insurance claims processing yet. However, one notable company is using a generative AI tool for claims summarization. The tool, called Sidekick, can sift through documents up to 30 pages long and provide a summary within minutes, and it reportedly has a “98%-plus accuracy rate” (5) in document summarization. Sidekick was used mainly for workers’ compensation claims and performed very well there. Much like Donut and LayoutLMv3, Sidekick is built on a transformer-based model, and it is providing Sedgwick massive returns on investment. Sedgwick reports that the model will only get better as it is used.


Future Outlook 


Gartner predicts that by 2027 most companies will implement “small, task-specific AI models” for work such as insurance claims processing. This suggests that companies unwilling to adopt AI models will fall behind. With the recent advancements in unified models, this is the perfect time for companies to start adopting them.

Moreover, AI use is expanding beyond claims processing. Insurance companies are starting to apply AI to underwriting, fraud detection, and customer service, use cases that will further amplify the return on investment from unified AI systems. Insurers already incorporating AI into their systems will have a massive leg up on their competitors.


Conclusion 


In conclusion, life insurance claims processing is complex and time-consuming, largely because of the highly manual review it requires. To combat this, many companies use separate OCR and NLP tools to partially automate the process, but this approach brings problems of its own, including system complexity and accuracy issues. A unified OCR and NLP model combats these problems; examples include Donut and LayoutLMv3, which consolidate OCR and NLP tools into one end-to-end system. These models are not yet widely used because of how new they are, but they provide a wide variety of benefits, such as faster processing times and a simpler architecture. To implement them, companies will have to address challenges such as the initial investment and the creation of training data to feed the models.

References
