OCR Receipt Scanner Case Study: 97.78% Confidence, One Session

The problem

The side project: a web app where users photograph receipts and get cashback rewards for qualifying purchases. The feature: dual photo upload. One photo of the receipt. One photo of the purchased product. Both required. Both processed. Cashback triggered when both are verified.

The scope was real: database schema changes, backend API updates, a new frontend UI, and an OCR pipeline that could distinguish between receipt text and product packaging text. All of it in a single feature build.

The approach: three phases, verified in sequence

I structure AI-built features in phases rather than asking the agent to "build the whole thing." Each phase has a clear output, and I verify that output before moving to the next.

Phase 1: Database schema. New tables and columns to store two images per submission instead of one. Relationships, constraints, migration scripts. The agent handled the schema design. I reviewed the migration.
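A minimal sketch of what a two-images-per-submission schema could look like. It assumes SQLite, and every table, column, and function name here is hypothetical: the project's actual schema is not shown in this write-up. The point is that the constraints, not the application code, enforce "one receipt image and one product image per submission."

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative, not the project's own.
SCHEMA = """
CREATE TABLE submissions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE submission_images (
    id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id) ON DELETE CASCADE,
    role TEXT NOT NULL CHECK (role IN ('receipt', 'product')),
    storage_key TEXT NOT NULL,
    ocr_confidence REAL,  -- only populated for the receipt image
    UNIQUE (submission_id, role)  -- at most one image per role
);
"""


def open_db(path=":memory:"):
    """Open a connection and create the sketch schema."""
    conn = sqlite3.connect(path)
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_image(conn, submission_id, role, storage_key):
    """Attach one image to a submission; the constraints reject bad rows."""
    conn.execute(
        "INSERT INTO submission_images (submission_id, role, storage_key) "
        "VALUES (?, ?, ?)",
        (submission_id, role, storage_key),
    )
```

With this shape, a second receipt image for the same submission, an unknown role, or an image for a nonexistent submission all fail with `sqlite3.IntegrityError` at insert time.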

Phase 2: Backend API. Endpoints updated to accept two files. Storage logic for both images. OCR pipeline wired to the receipt image specifically, not both. Confidence scoring persisted to the database.

Phase 3: Frontend UI. Two upload zones. Preview states for both images. Progress feedback during upload. Error handling when one image fails validation. The full user flow from "take two photos" to "cashback submitted."

Each phase worked before the next began. Standard practice, but important: it is much easier to debug Phase 2 when you know Phase 1 is correct.

Five critical bugs

None of the five would have been caught by unit tests with synthetic data.

Bug 1: Image decompression (appeared three times). The Vision API requires raw image bytes, but the upload pipeline compressed images with gzip before storing them. The OCR call was therefore receiving compressed bytes and producing garbage output. This sounds simple. It appeared three times because three separate code paths processed images, and each had the same bug independently. Small test images do not compress significantly, so tests passed. Real camera photos at full resolution, properly compressed, produced complete failures.
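One way to avoid fixing the same bug in three places is a single helper that every image-reading path calls before handing bytes to OCR. The sketch below is an illustration, not the project's code: the function name `raw_image_bytes` is hypothetical, and it assumes stored images are either gzip streams or plain image files. It relies on the fact that every gzip stream starts with the bytes `1f 8b`, which neither JPEG nor PNG files do.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def raw_image_bytes(stored: bytes) -> bytes:
    """Return bytes suitable for an OCR call.

    If the stored blob is gzip-compressed, decompress it; otherwise
    return it unchanged. Hypothetical helper for illustration.
    """
    if stored[:2] == GZIP_MAGIC:
        return gzip.decompress(stored)
    return stored
```

Because the helper is safe to call on bytes that were never compressed, each of the three code paths can call it unconditionally. A test for it should use a real full-resolution photo, since that is the case the synthetic tests missed.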

Bug 2: OCR aggregation scope. Product photos were being fed to the OCR pipeline alongside receipt photos. The model was trying to read text from a photo of a shampoo bottle and mixing it into the receipt data. The fix was a single scope filter. Finding it required tracing through the pipeline to understand where the "process all images" assumption had been made.
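The scope filter itself can be very small. The following is a sketch under assumptions: the `UploadedImage` type, the `role` labels, and the `run_ocr` callable are all hypothetical stand-ins for whatever the real pipeline uses. What it shows is the shape of the fix: the filter sits at the point where images enter OCR, so a product photo never reaches the text extractor.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class UploadedImage:
    role: str    # "receipt" or "product" (hypothetical labels)
    data: bytes  # raw image bytes


def ocr_receipts_only(
    images: Iterable[UploadedImage],
    run_ocr: Callable[[bytes], str],
) -> List[str]:
    """Run OCR on receipt images only and return the extracted texts.

    `run_ocr` stands in for the real OCR call; product photos are skipped.
    """
    return [run_ocr(img.data) for img in images if img.role == "receipt"]
```

Passing the OCR function in as a parameter also makes the scope easy to test: a fake `run_ocr` can record exactly which images it was asked to read.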

Bug 3: FormData field ordering. The backend expected receipt photo first, product photo second. The frontend was constructing FormData in a way that did not guarantee order. On most desktop browsers, the order happened to be correct. On mobile browsers with specific image sources, it flipped. The cashback submission would succeed but process the product photo as the receipt. This is exactly the kind of bug that survives QA: it works in testing because you test from a desktop with consistent behavior.
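A common way to remove this class of bug is to stop relying on position and look each file up by its field name. The sketch below shows that idea on the backend side. It is not necessarily the fix used in this project, and the field names `receipt_photo` and `product_photo` are hypothetical. It takes the parsed multipart parts as `(field_name, data)` pairs in whatever order they arrived.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical field names; the real API's names are not given in this write-up.
REQUIRED_FIELDS = ("receipt_photo", "product_photo")


def files_by_field(parts: Iterable[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """Map uploaded parts to their field names, independent of arrival order.

    Raises ValueError for unexpected, duplicate, or missing fields so a
    mislabeled upload fails loudly instead of being processed as the wrong image.
    """
    files: Dict[str, bytes] = {}
    for name, data in parts:
        if name not in REQUIRED_FIELDS:
            raise ValueError(f"unexpected upload field: {name!r}")
        if name in files:
            raise ValueError(f"duplicate upload field: {name!r}")
        files[name] = data
    missing = [name for name in REQUIRED_FIELDS if name not in files]
    if missing:
        raise ValueError("missing upload field(s): " + ", ".join(missing))
    return files
```

With a lookup like this, the receipt is whatever arrived under the receipt field, whether it was first or second in the request.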

Bugs 4 and 5 were validation edge cases: specific image dimensions and file size combinations that triggered an error path the frontend was not handling, causing silent failures with no user feedback.
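Silent failures of this kind are easier to avoid when every rejection carries a message the UI can display. The sketch below illustrates that pattern only: the limits are invented for the example, and the function name and field labels are hypothetical, since the actual dimension and file size combinations are not listed here.

```python
from typing import List

# Hypothetical limits, chosen only for illustration.
MAX_BYTES = 10 * 1024 * 1024
MIN_SIDE_PX = 300
MAX_SIDE_PX = 8000


def validation_errors(field: str, size_bytes: int, width: int, height: int) -> List[str]:
    """Return human-readable problems for one image; an empty list means valid."""
    errors: List[str] = []
    if size_bytes > MAX_BYTES:
        errors.append(f"{field}: file is larger than {MAX_BYTES // (1024 * 1024)} MB")
    if min(width, height) < MIN_SIDE_PX:
        errors.append(f"{field}: shortest side is under {MIN_SIDE_PX} px")
    if max(width, height) > MAX_SIDE_PX:
        errors.append(f"{field}: longest side is over {MAX_SIDE_PX} px")
    return errors
```

Returning all problems at once, rather than stopping at the first, means a combination of size and dimension issues produces a complete message instead of falling into an unhandled path.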

The outcome

After all five fixes, I tested with a real receipt photograph: the kind of wrinkled, slightly overexposed, taken-at-an-angle photo that actual users submit.

OCR confidence: 97.78%.

Not a test image. Not a PDF. An actual receipt, photographed with a phone, under kitchen lighting. Text extraction accurate. Line items matched. Total correct.

The full session also covered GDPR legal pages, a user feedback survey design, and the correct GDPR distinction between "data processor" and "data controller." One session, one feature, five bugs found, three adjacent tasks done. That is the breadth of what a single agentic session can cover when the pipeline is working.

What this means for your team

The lesson from this case study is not "AI builds features fast." The lesson is about the gap between "it works" and "it works for real users."

AI builds fast. Fast means you reach that gap faster. The bugs do not disappear. They arrive sooner. Your job shifts from writing code to specifying scope precisely and testing with real data, real devices, and real user inputs before you ship.

The team that adopts agentic engineering without changing its testing discipline will ship faster and break more. The team that changes both will see its gains compound.

Test with real data as early as possible. Every shortcut there shows up as a production incident. The agent builds what you specify. Specify real conditions.
