This blog is a multi-part series covering OCR using Bedrock and Claude 3 Haiku. In part 1, we will explore a proof-of-concept OCR project that utilises the vision capabilities of Anthropic’s Claude 3 Haiku to extract key information from invoices.
History
Back in March 2024, Anthropic announced the release of its next-generation AI model, Claude 3, which included three advanced models optimised for different use cases: Haiku, Sonnet and Opus. These models offered improved performance, accuracy and reliability, making them suitable for various applications, including enterprise use cases.
Quick Overview of Claude 3 model family
The Claude 3 models have been trained on a diverse range of data formats, including language, images, charts and diagrams, allowing them to understand and generate multimedia content. This capability enabled businesses to build generative AI applications that can integrate diverse data sources and solve complex, cross-domain problems.
The Claude 3 model family offers several features:
- Haiku: Fastest and most cost-effective model, ideal for near-instant responsiveness
- Sonnet: 2x faster than Claude 2 and 2.1, with higher intelligence and ideal balance between intelligence and speed
- Opus: Most advanced model, with deep reasoning, advanced math and coding abilities, and top-level performance on complex tasks
- Vision capabilities: Understands structured and unstructured data across different formats, including language, images, charts and diagrams
Claude 3 models are easily accessible to customers via Amazon Bedrock to build scalable, secure and personalised AI applications.
OCR Use Case
Since the introduction of Large Language Models (LLMs) to the mainstream, Optical Character Recognition (OCR) technology has made significant strides recently. This is where the Claude 3 family model shines the most as they can interpret complex visual data.
Let us have a look at our sample invoice document saved as an image (*.jpg or *.png):
Looking at the image above, we want to build an automated process to extract key information from our document such as:
- INVOICE_NUMBER
- CUSTOMER_NUMBER
- CUSTOMER_NAME
- VENDOR_NAME
- INVOICE_AMOUNT
- INVOICE_DATE
- VENDOR_ADDRESS.
A solution can be implemented as follows:
import boto3 |
First, we establish a connection to AWS Bedrock via the python library called litellm and set up session configuration. Then we have a function to encode raw image into binary form.
Next, we create the prompt template to be sent to Claude 3 Haiku.
image_path = “./images/26_png_jpg.rf.2318e3f83a44413b5b855925e571a538.jpg” |
Let’s make some key observations from our prompt:
- We have specified the fields we want to extract.
- We have specified the format we expect for some the keys (e.g. INVOICE_DATE).
- We have enforced outputs to be in JSON format.
- We have instructed the model to produce UNK in case it cannot return the requested information from the invoice.
Running our code will produce:
pp = pprint.PrettyPrinter(indent=4) |
The next step will be to extract the JSON shaped output from the result above and for that we might create a customed function just for that:
def extract_json_from_string(input_string): |
Comparing our results to the actual image with highlighted fields:
Claude 3 was able to extract the relevant fields and to tell us where the information was ambiguous.
How Much Does This Cost?
For a very long time, Amazon Textract was considered the defacto option for anything OCR. Using machine learning, it can extract text, handwriting, layout elements and data from scanned documents. It goes beyond simple optical character recognition to identify, understand and extract specific data from documents.
One of the main challenges with Amazon Textract is the cost of using the service. So, for that matter, let us evaluate an estimated cost of our solution with Textract versus Claude 3 Haiku:
Bedrock Claude 3 Haiku (pricing)
cost = 106/1000 * $0.00025 + 14/1000 * $0.00125 = $0.000044
If we have 1000 invoices our total cost will be $0.044
Amazon Textract (pricing)
cost = Price for page with table and queries ($0.020)
If we have 1000 invoices our total cost will be $20
Closing Remarks
We are still in the early days of generative AI, but we can already see the infinite number of opportunities from using those foundational models, and Claude 3 Haiku is giving us this sentiment. Strong collaboration and a focus on innovation across industries will usher in a new era of generative AI. We cannot wait to see what customers build next.
In part 2 of this series, we will explore a direct and deeper comparison between building an OCR solution using Claude 3 Haiku and Amazon Textract.