QA is still needed to make sure the output data is valid and interpreted correctly. We can already see that GPT, on its latest model, is fully able to interpret the whole image: without giving it any context, our first iteration already produces a solid reading of the document's content.
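For reference, that first iteration boils down to a single vision call along these lines (a minimal sketch assuming the OpenAI Python SDK; the model name, file path, and prompt are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder path to a scanned BL page.
with open("bl_page1.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: any current vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe and extract the line items of this delivery note (BL)."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```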
Existing OCR apps:

- [Mindee](https://www.mindee.com/product/financial-document-ocr-api) looks like a strong performer.
- A bit out of scope, ABBYY does something similar at really large scale.
- Klippa looks strong too.
- +1 for Nanonets because we can even train the model.
Overall, developing our own means Google Document AI (custom processor) or AWS Textract (forms/tables + handwriting). Once the image has been translated into understandable data (text), we can then use a well-designed AI agent to interpret it into our format (the format is still to be defined; a first draft of the per-line schema is below, and a rough pipeline sketch closes this section).


```jsonc
{
  "line_no": 10, // integer position on the BL (Poste)
  "article_code": "938222", // supplier SKU (Article / Code)
  "description": "SUPPORT EVACUATION 1200 + 3 CONSOLES GC", // (Désignation)
  "uom": "PCE", // unit (Unité) e.g. PCE, M
  "ordered_qty": 2.0, // Quantité commandée
  "delivered_qty": 2.0, // Quantité livrée
  "to_deliver_qty": 0.0, // Reste à livrer
  "weight": { "value": 2.0, "uom": "KG" }, // Poids (when shown per line)
  "batch_lot": null, // N° Lot (SATEBA docs) if present
  "serial_number": null, // if present
  "expiry_date": null, // if present
  "package_ids": [], // SSCC/pallet/case labels if present
  "dimensions_cm": { "l": null, "w": null, "h": null }, // if present
  "notes": null, // free-text notes/comments per line
  "source": { "bl_number": "81964752", "page": 1 } // traceability to the BL
}
```
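To back the QA point above, the agent output could be checked against a typed mirror of this schema. This is only a sketch, assuming pydantic v2; the class and field names simply follow the JSON draft:

```python
from pydantic import BaseModel

class Weight(BaseModel):
    value: float | None = None
    uom: str = "KG"

class DimensionsCm(BaseModel):
    l: float | None = None
    w: float | None = None
    h: float | None = None

class Source(BaseModel):
    bl_number: str
    page: int

class BlLine(BaseModel):
    line_no: int
    article_code: str
    description: str
    uom: str
    ordered_qty: float
    delivered_qty: float
    to_deliver_qty: float
    weight: Weight | None = None  # stays null when no per-line weight is shown
    batch_lot: str | None = None
    serial_number: str | None = None
    expiry_date: str | None = None
    package_ids: list[str] = []
    dimensions_cm: DimensionsCm | None = None
    notes: str | None = None
    source: Source

# BlLine.model_validate(line_dict) raises a ValidationError on malformed agent output.
```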
If a BL shows no per-line weight, I leave `weight` null and use the totals at the document level.
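To make the "OCR, then agent" idea concrete, here is a rough sketch of the two-step pipeline, assuming AWS Textract via boto3 for the OCR step and the OpenAI SDK for the interpretation step; the model name, prompt, and file path are placeholders, not a final design:

```python
import json
import boto3
from openai import OpenAI

def ocr_bl_page(image_path: str) -> str:
    """Step 1: turn the BL image into raw text lines with AWS Textract."""
    textract = boto3.client("textract")
    with open(image_path, "rb") as f:
        result = textract.detect_document_text(Document={"Bytes": f.read()})
    # Keep only LINE blocks; Textract also returns PAGE and WORD blocks.
    return "\n".join(
        block["Text"] for block in result["Blocks"] if block["BlockType"] == "LINE"
    )

def interpret_bl_text(raw_text: str) -> list[dict]:
    """Step 2: ask an LLM agent to map the raw text onto our per-line format."""
    client = OpenAI()
    prompt = (
        "Extract every line item from this delivery note (BL) and return a JSON "
        "array of objects with the fields: line_no, article_code, description, "
        "uom, ordered_qty, delivered_qty, to_deliver_qty, weight, batch_lot, "
        "serial_number, expiry_date, package_ids, dimensions_cm, notes, source. "
        "Use null when a field is not shown on the document.\n\n" + raw_text
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns bare JSON; a real pipeline would validate this.
    return json.loads(response.choices[0].message.content)

if __name__ == "__main__":
    text = ocr_bl_page("bl_81964752_page1.jpg")  # placeholder path
    print(json.dumps(interpret_bl_text(text), indent=2))
```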