The Changing Face of Document Capture: Why Cognitive Document Capture is Smarter than OCR?
Trivender Singh
OCR to Cognitive Document Capture
Information capture has come a long way since the advent of Optical Character Recognition (OCR) and digital repositories such as Electronic Document Management Systems (EDMS). Modern business applications require pre-validated, structured data as input so that it is readily available for decision making and for driving business processes. The majority of business applications store data in pre-defined schemas in a relational database, so they prefer structured data.
However, a large amount of the information organisations receive is unstructured, while applications such as ERP (Enterprise Resource Planning), CRM (Customer Relationship Management), BI (Business Intelligence) and RPA (Robotic Process Automation) need structured data. Organisations therefore face the challenge of acquiring, streamlining, and adding structure to these unstructured pieces of information before feeding them to these line-of-business applications.
In order to structure the data, it’s necessary to interpret and understand incoming communications, which can arrive via disparate channels such as email, post or other electronic channels, in a variety of formats such as paper, images, and PDF/DOC/XLS attachments. It is imperative for organisations to identify the purpose of a communication swiftly, where to route it, and what actions to take. These communications can contain critical business information that is key to making business decisions. Swift action can help organisations provide better customer service, gain a competitive edge, and save costs.
In this blog we discuss how Cognitive Capture technologies provide an edge over traditional OCR tools, how they enhance and simplify data extraction from documents, and how this data can be used for decision making by downstream business processes.
What is Cognitive Capture? What is Cognitive Document Automation?
In thinking about the definition of Cognitive Capture, let us first look at the meaning of the word ‘cognitive’. According to the Oxford Dictionary, ‘cognitive’ means “connected with mental processes of understanding”. The word ‘cognitive’ is derived from ‘cognition’ which means “the mental action or process of acquiring knowledge and understanding through thought, experience, and senses.”
Cognitive Capture technologies aim to mirror human behaviour when analysing pieces of data. These technologies utilise Artificial Intelligence, Machine Learning and Neural Networks to classify documents and extract meaningful information, unlike traditional OCR tools, which predominantly focus on reading character streams from images. Cognitive Capture takes into account the context of information, document layout, spatial context and data presentation, along with other attributes of a document, to extract information. These technologies can not only detect characters in a document but can also detect structure, and clean and optimise the image before verifying the accuracy of the extracted data. In addition, external data sets can be used to enhance and verify data to improve extraction rates through a process of correlation and matching.
For example, in the case of bank statements, Cognitive Capture tools can extract account numbers, statement date, transaction line items, reference numbers, customer name, address, totals etc. Data can be recognised from a variety of different statement types, with varying length and layout, and details from the statement can be matched to existing customer records in internal systems.
Cognitive Capture technologies recognise information through trained machine learning models rather than taking a template-based approach. Template-based systems are ineffective in dealing with changing document formats and are unable to cope with new document types which the system has never seen before.
Plain OCR, i.e. just reading characters and numbers from a scanned image, is relatively straightforward and was solved long ago. The first widespread use of OCR technology in businesses was to read printed numbers from cheques using monospace fonts (also called fixed-pitch, fixed-width, or non-proportional fonts, in which every letter and character occupies the same amount of horizontal space). OCR technologies played a big role in reading scanned records in content repositories to facilitate search and retrieval of records, but they can’t solve the problem of extracting structured data from the unstructured information residing in these documents.
People often use OCR and Cognitive Capture in the same context. Cognitive Capture does rely upon good quality OCR, but Cognitive Capture technologies go much further than traditional OCR. Ultimately the combination of the two elements facilitates Cognitive OCR, which significantly enhances the use cases, practicality, and opportunity for automation.
Before we delve further into Cognitive Capture, let us look at the different types of documents businesses receive on a day-to-day basis and how cognitive document processing technologies can be used.
Structured documents always have a fixed layout with a fixed data schema, and the data items are always presented in the same location. Examples include passport application forms, surveys, direct debit mandates, questionnaires, claim forms and bank account opening forms.
Cognitive Capture technologies can automatically detect form layout and auto-create zones to process check boxes, signature fields, text areas etc. by analysing a set of sample documents. Automatic background removal algorithms can clean up the image prior to registering zones for higher extraction accuracy. Technologies are now sufficiently mature to cater for forms printed using different printers, remove image distortion caused by imperfect scans, and deskew images to register zones accurately. Different OCR engines can be used for different zones, and their results can be evaluated and voted on so that the result with the highest confidence is used. Using different OCR engines helps to target the most suitable one for a piece of information, as some engines may be better suited to handwritten content than machine-printed text, or more tuned for the data found on specific documents, e.g. cheques.
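To make the idea of voting between OCR engines concrete, here is a minimal sketch of one possible scheme: each engine reports the text it read for a zone plus a confidence score, and engines that agree pool their confidence. The engine names, the `OcrResult` type and the summed-confidence rule are all assumptions made for illustration, not how any particular product works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrResult:
    engine: str        # hypothetical engine name
    text: str          # text the engine read from the zone
    confidence: float  # engine-reported confidence, 0.0 to 1.0


def vote(results):
    """Return the reading with the highest combined confidence for one zone."""
    if not results:
        raise ValueError("at least one OCR result is required")
    combined = defaultdict(float)
    for result in results:
        # Engines that agree on the same text pool their confidence.
        combined[result.text.strip()] += result.confidence
    best_text = max(combined, key=combined.get)
    return best_text, combined[best_text]


zone_results = [
    OcrResult("engine_a", "INV-10427", 0.62),
    OcrResult("engine_b", "INV-1O427", 0.71),  # letter O misread for a zero
    OcrResult("engine_c", "INV-10427", 0.58),
]
winner, score = vote(zone_results)
print(winner, round(score, 2))
```

In this sample the two engines that agree outvote the single engine with the higher individual confidence, which is the point of combining engines rather than trusting one.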
Semi-structured documents have a predefined data schema, but the structure and layout of the document can differ. It is known what information should be in the document, but it is unknown how the information is laid out. Semi-structured documents are most likely generated from the contents of a structured database in the first place, but this structure is lost as documents are shared between organisations. Also, not all of the information in the schema will always be present on the document, and the same data fields could be referred to differently across different documents. For example, dates could be presented in different ways, amounts could vary in format with different currency symbols, and a field may be called an “Account Number” on one document and a “Customer Number” on another. Semi-structured documents are commonly found in organisations as invoices from suppliers, sales orders from customers, and remittances as receipts of payment. Cognitive Capture can be used to reduce the effort to re-key an invoice, speed up sales order processing through customer order automation, and automate cash allocation.
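The variations described above (field labels, date formats, currency symbols) can be illustrated with a small normalisation sketch. The alias table, the list of date formats and the function names are hypothetical, and the amount parser assumes a comma as the thousands separator and a dot as the decimal point; a real system would typically learn or configure these per document source.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical alias table mapping labels seen on documents to one schema field.
FIELD_ALIASES = {
    "account number": "account_number",
    "account no": "account_number",
    "customer number": "account_number",
}

# Assumed date layouts; day-first is assumed for the slash format.
DATE_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %B %Y", "%d %b %Y")


def canonical_field(label):
    """Map a printed label to a schema field name, or None if unknown."""
    key = re.sub(r"[^a-z ]", "", label.lower()).strip()
    return FIELD_ALIASES.get(key)


def parse_date(text):
    """Try each assumed format in turn; return a date or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Strip currency symbols and thousands separators; return Decimal or None."""
    cleaned = re.sub(r"[^\d.,-]", "", text).replace(",", "")
    return Decimal(cleaned) if cleaned else None


print(canonical_field("Customer Number:"))
print(parse_date("3 March 2021"), parse_date("03/03/2021"))
print(parse_amount("£1,250.00"))
```

Whatever the label or format on the page, the downstream application receives the same field name, a real date and a numeric amount.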
Unstructured documents are free-form documents which do not adhere to any structure in terms of their layout and content and most likely it is not apparent what information will be available on the document. Typical examples include letters, contracts, articles, memos, and emails. A lot of meaningful business information is buried in unstructured documents and cognitive technologies like NLP (Natural Language Processing) and Sentiment Analysis can help in extracting this information for use in business applications.
Key Capabilities of Cognitive Capture Solutions
Image Clean-Up and Optimisation
Cognitive Capture can apply advanced image optimisation techniques like deskew, despeckle, character sharpening and auto-orientation to pre-process document images. Image optimisation came about as organisations started digitising large volumes of documents entering business processes, and was conducted on documents in real time during scanning. Today, whilst still important for the digitising process, it can also optimise images received as email attachments or uploaded using mobile phone cameras.
Image optimisation ensures a good quality image is passed to the Cognitive Capture workflow and is imperative to ensure optimal automation downstream as it will improve recognition rates.
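As a toy illustration of one of these clean-up steps, the sketch below despeckles a tiny black-and-white image by removing any dark pixel that has no dark neighbour. It is deliberately simplified and uses plain Python lists; production systems use far more sophisticated filters, and the function name and sample image are assumptions.

```python
def despeckle(image):
    """Remove isolated dark pixels (1s) from a binary image given as rows of 0/1."""
    height = len(image)
    width = len(image[0]) if height else 0

    def dark_neighbours(row, col):
        # Count dark pixels in the surrounding 3x3 window, excluding the centre.
        return sum(
            image[r][c]
            for r in range(max(0, row - 1), min(height, row + 2))
            for c in range(max(0, col - 1), min(width, col + 2))
            if (r, c) != (row, col)
        )

    return [
        [1 if image[r][c] and dark_neighbours(r, c) > 0 else 0 for c in range(width)]
        for r in range(height)
    ]


noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 1, 1],  # part of a real stroke
    [0, 0, 0, 1, 1],
]
clean = despeckle(noisy)
for row in clean:
    print(row)
```

The speck disappears while the connected stroke is left untouched, so the OCR step downstream has less noise to misread as punctuation.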
Classification is important in Cognitive Capture workflows, as determining the correct document category enables the extraction of relevant data from the document. Cognitive Capture technologies can use layout-based or content-based classification, or a mix of both approaches, to classify documents. Layout-based classification is more effective for document types that have a fixed, distinct layout, i.e. structured documents. Content-based classification is more suitable for semi-structured and unstructured documents.
Cognitive Capture engines can be trained using sample images to automatically use document layout and content to prepare classification models. Image clustering can be used to automatically label documents ready for use in creating classification training models. These methodologies greatly enhance the speed at which the technology can be deployed and its ability to automate classification.
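A very small sketch of content-based classification follows. It builds a word-count profile per document category from labelled samples and assigns a new document to the category whose profile is most similar (cosine similarity). This is a stand-in for the trained models described above, not the method any specific product uses; the sample texts and function names are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one word-count profile per label from (label, text) samples."""
    profiles = defaultdict(Counter)
    for label, text in samples:
        profiles[label].update(tokens(text))
    return dict(profiles)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(profiles, text):
    """Return the best-matching label and its similarity score."""
    doc = Counter(tokens(text))
    scores = {label: cosine(doc, profile) for label, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


profiles = train([
    ("invoice", "Invoice number 1001 total amount due VAT payment terms"),
    ("invoice", "Tax invoice amount due supplier VAT registration"),
    ("remittance", "Remittance advice payment made to your bank account"),
    ("remittance", "Payment remittance advice paid invoices reference"),
])
label, score = classify(
    profiles, "Please find attached our invoice, total amount due including VAT"
)
print(label, round(score, 2))
```

The similarity score can double as a confidence value, so low-scoring documents can be routed to a person instead of being classified automatically.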
Data Extraction – Cognitive Data Capture
Cognitive Capture technologies enable data extraction from semi-structured documents by using a mix of machine learning and a rule-based approach to extract meaningful information. Examples include amounts, dates, line items and tax breakdowns from invoices and sales orders, and contract numbers, expiry dates and validity dates from contract and insurance documents.
Systems can identify the information surrounding the data field (like dates, amounts and purchase order numbers on invoices) and automatically build and update a machine learning model. Table extraction support can identify table headers and recognise start and end of tables spanning across multiple pages. Pre-trained machine learning models can be used for extracting invoice and sales order information.
Fuzzy database matching against existing datasets is an immensely powerful tool for extracting accurate data from documents. For example, supplier identification can be performed by matching data from the invoice against information held in the supplier table in the ERP system. This ensures that variations in supplier names such as Ltd, Limited, PLC and LLP are accounted for, and that clean, pre-validated supplier information is passed to the ERP system.
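The supplier example can be sketched with Python's standard `difflib` module. The supplier table, the suffix list and the 0.85 threshold below are assumptions chosen for illustration; in practice the candidates would come from the ERP supplier table and the threshold would be tuned on real data.

```python
import re
from difflib import SequenceMatcher

LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp"}

# Hypothetical extract of an ERP supplier table: id -> registered name.
SUPPLIERS = {
    "S001": "Acme Widgets Limited",
    "S002": "Northwind Traders PLC",
    "S003": "Acorn Wholesale LLP",
}


def normalise(name):
    """Lower-case, drop punctuation and remove legal-form suffixes."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def match_supplier(extracted, suppliers, threshold=0.85):
    """Return (supplier_id, score); supplier_id is None below the threshold."""
    target = normalise(extracted)
    best_id, best_score = None, 0.0
    for supplier_id, name in suppliers.items():
        score = SequenceMatcher(None, target, normalise(name)).ratio()
        if score > best_score:
            best_id, best_score = supplier_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


print(match_supplier("ACME Widgets Ltd.", SUPPLIERS))
print(match_supplier("Acme Widgts Ltd", SUPPLIERS))   # OCR dropped a letter
print(match_supplier("Globex Corporation", SUPPLIERS))
```

Returning no match below the threshold matters as much as finding one: an unknown supplier should go to a person for review rather than be forced onto the nearest record.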
In addition, it is not only human-readable information that can be extracted: data from barcodes and QR codes can also be read and used to validate or enrich extracted data. Cognitive Capture systems can allow additional rules, such as data formatters, data validators and business rules, to be built on top of the machine learning model. This is where AI/ML-based system training can be augmented by a rules-based approach to get the best of both.
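As a rough illustration of rules layered on top of model output, the sketch below applies a formatter, a required-field validator and one business rule (net plus tax must equal total) to a dictionary of extracted invoice fields. The field names, the tolerance and the rule itself are assumptions for the example.

```python
from decimal import Decimal, InvalidOperation

REQUIRED = ("invoice_number", "net", "tax", "total")


def format_fields(raw):
    """Formatter: trim whitespace and upper-case the invoice number."""
    fields = {key: str(value).strip() for key, value in raw.items()}
    if "invoice_number" in fields:
        fields["invoice_number"] = fields["invoice_number"].upper()
    return fields


def validate(fields, tolerance=Decimal("0.01")):
    """Validator plus business rule; returns a list of error messages."""
    errors = [f"missing field: {name}" for name in REQUIRED if not fields.get(name)]
    if errors:
        return errors
    try:
        net, tax, total = (Decimal(fields[k]) for k in ("net", "tax", "total"))
    except InvalidOperation:
        return ["net, tax and total must be numeric"]
    # Business rule: the amounts on the invoice must add up.
    if abs(net + tax - total) > tolerance:
        errors.append("total does not equal net + tax")
    return errors


extracted = format_fields(
    {"invoice_number": " inv-1 ", "net": "100.00", "tax": "20.00", "total": "120.00"}
)
print(extracted["invoice_number"], validate(extracted))
print(validate({**extracted, "total": "12O.00"}))  # letter O misread for a zero
```

An empty error list means the document can pass straight through; anything else is a cue to send it for manual verification.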
Natural Language Processing (NLP)
Natural Language Processing is a key AI technique for better understanding the content and sentiment of unstructured documents. NLP algorithms analyse the text of a document to understand its wording, extract entities and derive sentiment.
Entity Extraction: Cognitive Capture solutions allow named entity extraction to recognise objects like persons, products, places, URLs, email addresses, cities, countries etc. Named entities are found in unstructured natural language texts like sentences found in emails, documents etc. Entity extraction also considers the grammatical placement of data in a document.
Sentiment Analysis aims to determine the sentiment of wording on a document to derive a positive or negative tone within the document based upon the context and choice of words and phrases used by the author.
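To give a feel for both ideas, here is a deliberately simple sketch: pattern-based extraction for two entity types that follow a regular shape (email addresses and URLs), and lexicon-based sentiment scoring with basic negation handling. Real NLP engines rely on trained language models rather than hand-written patterns and word lists; the patterns, the word lists and the function names here are assumptions for illustration only.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s<>\"']+")

# Tiny hypothetical sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "happy", "helpful", "pleased"}
NEGATIVE = {"late", "damaged", "poor", "unhappy", "disappointed", "complaint"}
NEGATORS = {"not", "never", "no"}


def extract_entities(text):
    """Find email addresses and URLs; trailing punctuation is trimmed from URLs."""
    return {
        "emails": EMAIL_RE.findall(text),
        "urls": [url.rstrip(".,;)") for url in URL_RE.findall(text)],
    }


def sentiment(text):
    """Score +1 / -1 per lexicon word, flipping the sign after a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score


message = (
    "Please contact jane.doe@example.com or see https://example.com/returns. "
    "The delivery was late and the packaging was damaged."
)
print(extract_entities(message))
print(sentiment(message))
```

Combined, the two outputs are enough to route a message: the entities say who and what it concerns, and the tone suggests how urgently it should be handled.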
Machine Learning Models for Cognitive Capture
Cognitive Capture solutions support the creation of machine learning models and allow system configurators to test and tune these models.
Machine learning models can be trained using both supervised and unsupervised learning. Training requires a subject matter expert to train the system using images which are representative of the data received in real-life scenarios. As data fields are tagged, the system automatically updates the model, which is used later to perform classification and data extraction.
The machine learning model should be tested using sample test data, and benchmark tools can be used to compare different models against the same test data.
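Benchmarking in this sense can be as simple as running every candidate model over the same labelled test set and comparing accuracy. The sketch below treats a model as any function from document text to a label; the two toy models and the four test documents are hypothetical.

```python
def accuracy(model, test_set):
    """Fraction of (text, expected_label) pairs the model gets right."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(1 for text, expected in test_set if model(text) == expected)
    return correct / len(test_set)


def benchmark(models, test_set):
    """Score every model on the same data, best first."""
    scores = [(name, accuracy(model, test_set)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Two hypothetical models: a keyword rule and a naive baseline.
def keyword_model(text):
    return "remittance" if "remittance" in text.lower() else "invoice"


def always_invoice(text):
    return "invoice"


TEST_SET = [
    ("Invoice 1001 total due", "invoice"),
    ("Remittance advice", "remittance"),
    ("Tax invoice", "invoice"),
    ("Payment remittance", "remittance"),
]

results = benchmark(
    {"always_invoice": always_invoice, "keyword_model": keyword_model}, TEST_SET
)
for name, score in results:
    print(f"{name}: {score:.0%}")
```

Keeping the test set fixed is what makes the comparison fair: any difference in score comes from the models, not from the documents they happened to see.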
Machine learning models can be incrementally updated using a feedback loop in a production environment at runtime to tune them. Discrepancy checking is important to ensure the model has not been trained using conflicting training data. Having a larger dataset of trained documents will make the system more robust and better able to handle new document types and variations of existing documents.
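One simple form of discrepancy checking is to look for training samples that are effectively identical but have been given different labels. The sketch below does this for text samples; the normalisation rule and the sample data are assumptions, and a real system would compare richer features than plain text.

```python
from collections import defaultdict


def find_conflicts(training_samples):
    """Return samples whose normalised text appears under more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in training_samples:
        # Ignore case and spacing differences when comparing samples.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    return {
        key: sorted(labels)
        for key, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical feedback collected from operators in production.
feedback = [
    ("Statement of account", "statement"),
    ("Payment  Advice", "remittance"),
    ("payment advice", "invoice"),  # conflicts with the sample above
    ("Tax invoice", "invoice"),
]
conflicts = find_conflicts(feedback)
print(conflicts)
```

Flagged samples can be sent back to a subject matter expert before the model is retrained, so the feedback loop improves the model instead of confusing it.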
This cycle of training, testing and tuning is referred to as the Three Ts of Cognitive Capture. This video illustrates how Kofax Cognitive Capture can be used for supervised and unsupervised machine learning. (Thanks to Jesper Scherpenhuijsen for sharing this video)
Cognitive Capture capabilities are particularly effective in business process applications like loan processing, invoice processing, fraud prevention, claims processing, sales order processing and digital mailroom solutions, as they enable high levels of automation to be deployed at pace and with a high degree of accuracy and, importantly, can learn over time to improve performance and adapt to changes in the documents that require processing.