Sunday, 25 January 2009

Information Capture - EMC CAPTIVA 6

EMC Captiva InputAccel
Intelligent Document and Data Capture for the Enterprise

EMC Captiva InputAccel 6 enables you to capture information from most paper or electronic sources, transform it into digital content, and deliver it into back-end systems. By helping your business reduce or completely eliminate manual data entry, InputAccel minimizes processing errors, improves data accuracy, and boosts productivity.
With InputAccel, you also prepare your business for future growth with server configurations that scale to large enterprise installations with multiple servers. These high-availability configurations ensure that if one of your servers goes offline, others will continue to operate, protecting your work in progress and eliminating downtime.


Features

* Distributed electronic capture
* Image enhancement
* Automatic document classification
* Validation
* Back-end integration
* Web Services that enable service-oriented architectures (SOA)

Benefits

* Capture structured, unstructured, and semi-structured documents from any scanner, fax machine, or file system within the enterprise.
* Despeckle and deskew images, and adjust page orientation with InputAccel.
* Identify scanned documents so you can prioritize, configure data capture processes, and route documents based on user-defined rules.
* Set user-defined business rules and compare against other data sources to ensure accuracy.
* Deliver validated data (XML, PDF, JPG, TIFF, and ASCII) into back-end content and business process management systems.

Wednesday, 5 November 2008

BARCODE RECOGNITION FOR FREE

Often you only need to digitize paper documents and read the barcode printed on them.

In this case a data capture system can organize the workflow, especially when many people are involved in the process. Sometimes, however, there is a free solution.

Many scanners include built-in barcode recognition, so while digitizing the paper documents they can write the barcode content, together with the image path, to TXT/CSV files.
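As a rough sketch of how such an index file could be consumed downstream, the snippet below reads it into a dictionary from image path to barcode value. The two-column layout (`image_path,barcode`, no header row) and the function name `read_barcode_index` are assumptions for illustration only; every scanner driver writes its own format, so the delimiter and column order will need adjusting.

```python
import csv
import io
from typing import Dict


def read_barcode_index(csv_text: str) -> Dict[str, str]:
    """Map each image path to the barcode value decoded by the scanner.

    Assumes a hypothetical layout: one row per page, two columns
    (image_path, barcode), comma-separated, no header row.
    """
    index: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        image_path, barcode = row[0].strip(), row[1].strip()
        index[image_path] = barcode
    return index


if __name__ == "__main__":
    sample = "scan/0001.tif,INV-2008-001\nscan/0002.tif,INV-2008-002\n"
    print(read_barcode_index(sample))
```

With the index in a dictionary, each image can be filed or renamed by its barcode without any OCR engine in the loop.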

In other cases the barcode content can be used as the image filename, with a progressive sequence number appended at the end, because several documents may carry the same barcode data.
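A minimal sketch of reading the barcode back out of such filenames and grouping the pages of each document, assuming a hypothetical naming convention `<barcode>_<sequence>.<ext>` (for example `INV-2008-001_0002.tif`). The pattern and function names are illustrative, not taken from any scanner software; change the separator and extensions to match what your scanner actually produces.

```python
import re
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical convention: <barcode>_<sequence>.<ext>. The greedy ".+" lets the
# barcode itself contain underscores; the sequence is the last numeric block.
FILENAME_PATTERN = re.compile(
    r"^(?P<barcode>.+)_(?P<seq>\d+)\.(?:tiff?|jpe?g|pdf)$", re.IGNORECASE
)


def parse_scan_filename(path: str) -> Optional[Tuple[str, int]]:
    """Return (barcode, sequence) for a matching filename, else None."""
    match = FILENAME_PATTERN.match(PurePath(path).name)
    if match is None:
        return None
    return match.group("barcode"), int(match.group("seq"))


def group_by_barcode(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group image paths by barcode, ordering pages by their sequence number."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_scan_filename(path)
        if parsed is None:
            continue  # not a scanned image following the convention
        barcode, seq = parsed
        groups[barcode].append((seq, path))
    return {bc: [p for _, p in sorted(items)] for bc, items in groups.items()}
```

Sorting on the sequence number restores page order even if the file system lists the images differently.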

So sometimes there is no need to buy an OCR/barcode recognition engine; it is better to spend time choosing the right scanner.

Fujitsu, Canon, and Kodak are among the best scanner brands.

Tuesday, 4 November 2008

DATA CAPTURE SYSTEM

The picture below shows a generic data capture system (for OCR, ICR, OMR, and barcode recognition).

As we can see, a generic system consists of:

Scanning Module, which scans the paper documents;

Classification Module, which identifies the right template for each document;

Recognition Module (OCR, ICR, OMR, barcode recognition), which extracts the data from the images;

Validation Module, which validates the extracted data;

Release Module, which writes the extracted information and the images to TXT files on the file system, ECM systems, ERP systems, etc.
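To make the flow between the five modules concrete, here is a toy sketch of the pipeline. Everything in it is hypothetical: the `Document` record, the filename-based classifier, the pluggable `engine` callable standing in for a real OCR/ICR/OMR/barcode engine, and the tab-separated release format. It only shows how each module hands its result to the next, not how any commercial product implements them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Engine = Callable[[str, str], Dict[str, str]]  # (image_path, template) -> fields
Rule = Callable[[str], bool]                   # field value -> is it valid?


@dataclass
class Document:
    image_path: str
    template: str = ""                                     # set by classification
    fields: Dict[str, str] = field(default_factory=dict)   # set by recognition
    errors: List[str] = field(default_factory=list)        # set by validation


def scan(paths: List[str]) -> List[Document]:
    """Scanning module stub: wraps already-scanned image paths."""
    return [Document(image_path=p) for p in paths]


def classify(doc: Document) -> None:
    """Classification module stub: picks a template from a filename hint."""
    doc.template = "invoice" if "inv" in doc.image_path.lower() else "unknown"


def recognize(doc: Document, engine: Engine) -> None:
    """Recognition module: delegates extraction to a pluggable engine."""
    doc.fields = engine(doc.image_path, doc.template)


def validate(doc: Document, rules: Dict[str, Rule]) -> None:
    """Validation module: applies user-defined rules to each extracted field."""
    for name, rule in rules.items():
        value = doc.fields.get(name, "")
        if not rule(value):
            doc.errors.append(f"{name}: invalid value {value!r}")


def release(docs: List[Document]) -> List[str]:
    """Release module: exports valid documents as tab-separated lines."""
    lines = []
    for doc in docs:
        if doc.errors:
            continue  # in practice these would go to a manual-correction queue
        values = "\t".join(f"{k}={v}" for k, v in sorted(doc.fields.items()))
        lines.append(f"{doc.image_path}\t{doc.template}\t{values}")
    return lines


def run_pipeline(paths: List[str], engine: Engine, rules: Dict[str, Rule]) -> List[str]:
    docs = scan(paths)
    for doc in docs:
        classify(doc)
        recognize(doc, engine)
        validate(doc, rules)
    return release(docs)


if __name__ == "__main__":
    def fake_engine(image_path: str, template: str) -> Dict[str, str]:
        # Stand-in for a real recognition engine: returns canned values.
        return {"invoice_no": "INV-001" if "001" in image_path else "", "total": "12.50"}

    rules = {
        "invoice_no": lambda v: v.startswith("INV-"),
        "total": lambda v: v.replace(".", "", 1).isdigit(),
    }
    print(run_pipeline(["inv_001.tif", "inv_002.tif"], fake_engine, rules))
```

Keeping the recognition engine behind a simple callable is what lets a system swap OCR, ICR, OMR, or barcode engines without touching the other modules.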

Friday, 24 October 2008

Tesseract OCR is an OCR engine

Tesseract is an OCR engine that was developed at HP Labs.
Development started in 1985, but the project ended in 1995.
HP released it as open source in 2005 and Google took over its development, so now it is alive again!

Tuesday, 14 October 2008

EMC Captiva

Data capture system.

Scanning
Classification
Extraction
Release of the extracted data into databases and enterprise applications.

TIS

Top Image Systems, an Israeli company. Client/server platform that relies on .NET. Integration with SAP.

Autonomy Cardiff Teleform

Data capture product. Client/server architecture for automatic OCR extraction.