Extract data from Image, using Claude 3.5 Sonnet

What do I want to do?

In this demo, the objective is to extract the total amount (after discount, and after taxes) from a receipt image.

Some characteristics of these receipts:

There isn't a fixed format
It can be in different languages (English, S. Chinese, T. Chinese, Malay)
It can have more than one language
It can be an image from a camera snapshot, or an electronic file sent via email

How do I plan to do it?

I am using the Anthropic message API and Claude-3-5-sonnet-20240620 model, with a simple system prompt below:

Extract the total amount from the image. It should be a number, e.g. 100.50, usually next to the word 'total', 'total amount', 'grand total'. It can be in any languages. The currency symbol is RM

Test Results

Test #1: An A4 size receipt, itemized in table format.

Result: ✅ Success

Test #2: A 58mm thermal receipt paper, captured by phone camera. The text density is very high, font size is relatively small.

Result: ✅ Success

Test #3 - A landscape A4 paper, captured by a phone camera, some part of the image is malformed.

Result: ✅ Success

It seems the total amount is successfully identified in all 3 tests. Good job Claude 3.5!

Cost? And some improvements

The input tokens spent per attempt is about 1500-2000, which is probably equivalent to $0.005. Can we reduce it?

I tried to trim the white space around the image, but they are not useful. And I also resize the image to around 500px before sending it into the API.

The input token is kept below 1000 and the success rate is still 100%.

That's pretty awesome. :)

AI Summary

gpt-4o-2024-05-13 2024-07-16 00:40:12

The blog post demonstrates extracting total amounts from receipt images using Claude 3.5 Sonnet and the Anthropic message API. The process is successful across various receipt formats and languages, with optimization techniques reducing token usage and maintaining a high success rate.

Chrome On-device AI 2025-07-28 00:54:40

Share this Post