Tesseract veraPDF Ghostscript ClamAV

Upload a document in JPG, PNG or PDF format to extract the text it contains or convert it to PDF/A. Validate a PDF/A file. Scan a PDF for potential threats. Configure how a group of images is preprocessed for OCR (removing table borders, cropping a ticket, adjusting contrast and brightness, resizing), name this set of parameters, and activate it on demand, either interactively or programmatically. Ask us to add specific postprocessing of the text extracted from your documents (plain text from a PDF or text read from images by OCR) to verify the result or to produce formatted data in a JSON or XML file that you can feed directly to another service.
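As a rough illustration of what a named set of preprocessing parameters and a JSON result could look like, here is a minimal Python sketch. Every name in it (`PreprocessingProfile`, `register`, `to_json`, the field names and the sample text) is hypothetical and chosen for this example only; it does not describe the service's actual data model or API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PreprocessingProfile:
    """Hypothetical named set of OCR preprocessing parameters."""
    name: str
    remove_table_borders: bool = False
    crop_ticket: bool = False
    contrast: float = 1.0       # 1.0 is assumed to mean "unchanged"
    brightness: float = 1.0     # 1.0 is assumed to mean "unchanged"
    resize_factor: float = 1.0  # 2.0 would double the image size


# In-memory registry so that a profile can be activated by name.
profiles = {}


def register(profile):
    """Store a profile under its name and return it."""
    profiles[profile.name] = profile
    return profile


def to_json(text, profile):
    """Bundle extracted text with the profile used, as a JSON string."""
    return json.dumps({"profile": asdict(profile), "text": text}, indent=2)


# Define a profile once, then look it up by name when needed.
register(PreprocessingProfile(name="receipts", crop_ticket=True,
                              contrast=1.5, resize_factor=2.0))

if __name__ == "__main__":
    print(to_json("TOTAL 12.50", profiles["receipts"]))
```

Keeping the parameters in one immutable, named object is one simple way to reuse the same settings across a whole batch of images; the JSON string is the kind of output another service could consume directly.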

All features are available for free in the interface of your personal space, or programmatically as a paid service through a simple REST API. See the User's Guide. All communications are encrypted. Your files are inaccessible to others and are automatically deleted after a certain delay.

Tesseract is an open-source optical character recognition engine sponsored by Google since 2006.

PDF/A is an ISO-standardized version of the PDF format designed for the archiving and long-term preservation of electronic documents.

The veraPDF consortium, led by the Open Preservation Foundation and the PDF Association, was created in response to the EU Commission's PREFORMA challenge to develop an open-source validator for the PDF/A format.

Ghostscript is a suite of software for processing PostScript and PDF files.

ClamAV is a free, open-source antivirus engine.