For years, word processing software and printers have done the job of turning digital text files into tangible documents in hard-copy format. The challenge has been moving in the opposite direction. The ability to take a hard-copy document and translate it into text that can be saved to a machine, searched, and even edited is a feat that has gradually become more achievable in recent years.
The technology underlying this ability is known as optical character recognition (OCR). As the name suggests, it allows a computer to recognize the symbols on a piece of paper and associate them with letters, numbers, punctuation marks, and other characters. This sounds simple enough in principle, but it has taken some time for the technology to develop to the point where it can reliably process documents that may be old and blurry and set in a variety of fonts, not to mention hand-written text!
OCR solutions are critical for the digitization step of document understanding AI. Before this technology, the only readily available alternatives were for a human to manually read and re-type a document or to scan a document and save it as an image that could not be edited or searched.
Considerations for Choosing the Right OCR Solution for Your Specific Application
There are several widely known and highly effective OCR solutions available on the market today. However, the unique requirements of your application will determine which software is ideal. Paying for overpowered optical character recognition technology could be a financial burden to your business. Similarly, selecting OCR software that doesn’t meet your workflow needs could cause process failure and lead to financial loss.
The following considerations will help guide your comparison of optical character recognition software.
Structured vs. Unstructured Data
When an OCR application scans a document for data, its task can be made easier or more difficult based on the organization of that data and whether it's based on some form of a template. Consider a very simple example that many readers are likely familiar with: filling in the blanks on a standardized test. Imagine a test with ten questions, each with a space to enter a one-word answer. This is structured data. Each test turned in should follow the same simple format of one answer per question, making it relatively easy for a computer to match the answer to the associated question.
Now, consider the opposite end of the spectrum: unstructured data. Imagine, instead of the example above, students are required to answer an essay question. Each student will be submitting free-form responses that may be structured very differently from one another. A question might ask students to identify five core themes of a novel, for example. It's much more difficult for a computer to digest and evaluate this type of unstructured data, but it can be done if using a sophisticated enough application. Often this requires programming certain rules and leveraging machine learning.
Template vs. Non-Template
Relatedly, basic OCR applications can do a passable job of processing data entered in templates. This might include information entered on a tax form, for example. The computer knows that it should be able to identify a filer’s Social Security number in one field, gross income in another, and filing status in yet another.
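To make the template idea concrete, here is a minimal sketch of zone-based extraction. It assumes the OCR engine has already returned each word with a bounding box; the `Box` and `Word` classes, the pixel coordinates, and the `TAX_FORM_TEMPLATE` layout are all hypothetical and do not describe any real form or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixels, from (left, top) to (right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized word plus its bounding box, as an OCR engine might report it."""
    text: str
    box: Box

    def center(self) -> tuple[float, float]:
        return ((self.box.left + self.box.right) / 2,
                (self.box.top + self.box.bottom) / 2)


# Hypothetical template: where each field sits on a made-up form.
TAX_FORM_TEMPLATE = {
    "ssn": Box(100, 50, 300, 80),
    "gross_income": Box(100, 120, 300, 150),
    "filing_status": Box(100, 190, 300, 220),
}


def extract_fields(words: list[Word], template: dict[str, Box]) -> dict[str, str]:
    """Assign each word to the template zone that contains its center point."""
    fields: dict[str, list[str]] = {name: [] for name in template}
    # Sort top-to-bottom, then left-to-right, so multi-word fields read in order.
    for word in sorted(words, key=lambda w: (w.box.top, w.box.left)):
        x, y = word.center()
        for name, zone in template.items():
            if zone.contains(x, y):
                fields[name].append(word.text)
                break
    return {name: " ".join(parts) for name, parts in fields.items()}


words = [
    Word("Single", Box(110, 195, 170, 215)),
    Word("123-45-6789", Box(110, 55, 230, 75)),
    Word("52,000", Box(110, 125, 180, 145)),
    Word("Page", Box(400, 300, 440, 320)),  # outside every zone, so ignored
]
result = extract_fields(words, TAX_FORM_TEMPLATE)
print(result)
```

Because the layout is fixed, the lookup needs no understanding of the text itself, which is why even basic tools can handle templated documents.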
By contrast, data that is not entered into a template requires the application to read it to determine where the data belongs. Consider, for example, an HR recruitment application used to read applicant resumes. While each resume should contain roughly the same information, different candidates will use different formats. The OCR application will need to determine how to take data from those nonstandard documents and compile it into a standardized format for comparison and evaluation.
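The resume scenario can be sketched with simple rules. The example below assumes the OCR step has already produced plain text and uses regular expressions plus a small table of heading aliases to map two differently formatted resumes onto one record shape. The patterns, the `SECTION_ALIASES` table, and the sample resumes are illustrative assumptions; a production system would need far broader rules or a learned model.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

# Hypothetical heading variants; real resumes use many more.
SECTION_ALIASES = {
    "education": {"education", "academic background"},
    "experience": {"experience", "work history", "employment"},
}


def normalize_resume(text: str) -> dict:
    """Map free-form resume text onto one fixed record layout."""
    record = {"email": None, "phone": None, "education": [], "experience": []}

    email = EMAIL_RE.search(text)
    if email:
        record["email"] = email.group()
    phone = PHONE_RE.search(text)
    if phone:
        record["phone"] = phone.group()

    current = None  # section that the lines being read belong to
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = stripped.rstrip(":").lower()
        matched = next(
            (name for name, aliases in SECTION_ALIASES.items() if heading in aliases),
            None,
        )
        if matched:
            current = matched
        elif current:
            record[current].append(stripped)
    return record


resume_a = """Jane Doe
jane.doe@example.com | (608) 555-0142
EDUCATION
B.S. Biology, University of Wisconsin
WORK HISTORY
Lab Technician, Acme Labs
"""

resume_b = """John Roe - john_roe@example.org - 414.555.0199

Employment:
Analyst, Example Corp

Academic Background:
B.A. Economics, Example College
"""

record_a = normalize_resume(resume_a)
record_b = normalize_resume(resume_b)
print(record_a)
print(record_b)
```

Both records end up with the same keys even though the sections appear in a different order and under different headings, which is the standardization step described above.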
Confidence Levels of Recognition
Sometimes it’s not critical that OCR-inputted data be 100% accurate. If, after processing a job candidate’s resume, an OCR program lists their undergraduate institution as the “Univers1ty of Wisconsin”, it should be fairly obvious to anyone reading that input that the student attended the University of Wisconsin. But in other instances, accuracy can be crucially important. Accounting is a prime example. The difference between a zero ("0") and an eight ("8") could represent a critical material difference.
Of course, we can’t expect a computer to identify when it has inputted data incorrectly. If it could do that, we’d simply program it to correct that error or to not make the error in the first place. But we can ask that same computer to tell us how confident it is in its recognition of a character. That’s where confidence levels, or confidence scores, come into play. These scores can tell the user of an OCR application how confident the program is with its recognition of a text component. This component could be anything from a single character to an entire document.
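One common way to act on confidence scores is to route anything below a cutoff to a person. The sketch below assumes the engine reports a score between 0 and 1 for each character; the `RecognizedChar` class, the sample scores, and the 0.90 threshold are hypothetical. It scores a field by its weakest character, on the assumption that one doubtful digit makes the whole amount doubtful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedChar:
    char: str
    confidence: float  # 0.0 (no confidence) to 1.0 (certain); scale is an assumption


def field_confidence(chars: list[RecognizedChar]) -> float:
    """Score a field by its least confident character."""
    return min((c.confidence for c in chars), default=0.0)


def needs_review(chars: list[RecognizedChar], threshold: float = 0.90) -> bool:
    """Flag the field for a human when its score falls below the threshold."""
    return field_confidence(chars) < threshold


# An invoice amount where the engine was unsure whether the middle digit is 8 or 0.
amount = [
    RecognizedChar("1", 0.99),
    RecognizedChar("8", 0.62),
    RecognizedChar("0", 0.97),
]
# A school name recognized cleanly throughout.
school = [RecognizedChar(c, 0.95) for c in "Wisconsin"]

print(needs_review(amount))
print(needs_review(school))
```

Taking the minimum is a deliberately cautious choice; averaging the scores would hide a single weak character inside a long field.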
Ability to Read Different Fonts
Humans are pretty good at comprehending that the same three words mean the same thing whether they are set in a serif, sans-serif, or script font. Computer applications can be just as good as humans at this task, if not better, provided they’re designed and built with sufficient sophistication.
OCR With Machine Learning vs. Without Machine Learning
As alluded to above, OCR tools can leverage machine learning to help with complex text recognition tasks. A great example of this is in contract review and analysis. Different entities have different formats for contracts, even when those contracts are addressing essentially the same things. A business may have multiple contracts with different vendors, all of which contain choice of law, limitation of liability, and warranty provisions, but in different parts of the document. One limitation of liability provision may begin with, “liability shall not exceed…” while another begins with, “total aggregate liability under this agreement shall be limited to…” An OCR application using machine learning can learn to identify these as similar provisions and flag them as such so they are easier to organize, compare, and analyze.
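To give a feel for what "similar provisions" means computationally, the sketch below compares clauses by word overlap using cosine similarity over word counts. This is not machine learning: it is a fixed formula used here as a crude stand-in for a trained model, which would learn its notion of similarity from labeled contracts. The three sample clauses are made up for illustration.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    va, vb = tokens(a), tokens(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


# Made-up sample clauses, not taken from any real contract.
liability_a = "Liability shall not exceed the fees paid under this agreement."
liability_b = ("Total aggregate liability under this agreement shall be "
               "limited to the fees paid.")
choice_of_law = "This agreement is governed by the laws of the State of Wisconsin."

same_topic = cosine_similarity(liability_a, liability_b)
different_topic = cosine_similarity(liability_a, choice_of_law)
print(same_topic > different_topic)
```

Word overlap breaks down as soon as two clauses express the same idea in different vocabulary, which is the gap a learned model is meant to close.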
Human Validation
A useful tool in any OCR application is the ability of a human user, through a user interface, to manually validate that data has been uploaded correctly. Even when a human user takes the time to verify every word of an inputted document against what was recognized by the OCR application, the application still saves time by eliminating the need for the human to enter all that text themselves. More often, a human user might spot check documents by validating a random sample, check certain key fields (such as dollar amounts), or use their validation as an input to further improve the OCR application’s machine learning.
API Integration
An API, or application programming interface, is a means through which one computer or application can speak to another. APIs transfer data from one point to another, which adds significant value to any OCR application. Recognizing and storing data have limited utility unless that data can be shared or sent to other systems. For example, an OCR tool used to process invoices might have an API allowing it to send that data to an accounting system.
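As a rough illustration of the invoice example, the sketch below packages extracted fields as JSON and prepares an HTTP POST request using Python's standard library. The endpoint URL and the field names are hypothetical, and the request is built but never sent, since no such accounting system exists.

```python
import json
import urllib.request

# Hypothetical endpoint; a real accounting system defines its own URL and schema.
ACCOUNTING_URL = "https://accounting.example.com/api/invoices"


def build_invoice_request(invoice: dict) -> urllib.request.Request:
    """Serialize the extracted invoice fields and wrap them in a POST request."""
    body = json.dumps(invoice).encode("utf-8")
    return urllib.request.Request(
        ACCOUNTING_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Fields as they might come out of an OCR step (made-up values).
invoice = {
    "invoice_number": "INV-001",
    "vendor": "Example Supplies",
    "total": "1250.00",
    "currency": "USD",
}
request = build_invoice_request(invoice)
print(request.get_method(), request.full_url)

# urllib.request.urlopen(request) would transmit it; omitted because the
# endpoint above is fictional.
```

A real integration would also need authentication and error handling, both of which depend on the receiving system.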
Cloud Security
Increasingly, data is stored in the cloud, meaning it’s hosted on servers that can be accessed remotely rather than on a single machine that is the only point of access. This provides a great deal of convenience, but it also opens the door to data security risks. If it’s easy for the people who should have access to that data to access it, how can one make sure it’s not so easy for those who should not have access to it?
Any time a person or business entrusts their data to a third-party cloud hosting service provider, they need to be confident in that provider’s data security, and this should be a key consideration when utilizing any cloud-based OCR solution.
Data Storage
Data storage is a major logistical and financial consideration when choosing any software application. It costs money to store data, and part of that cost goes towards ensuring the data is secure. An important factor in the data storage discussion is whether data is stored on-platform, meaning that once inputted it resides within the application that processed it, or off-platform, meaning that one application processes it and then sends it to another application for storage. Whether an OCR application provides on-platform or off-platform storage can make a big difference in terms of overall cost, logistics, and security.
NITCO Can Help You Leverage OCR Solutions for Effective Intelligent Process Automation
NITCO has years of experience working on document understanding problems. Our solutions can be implemented using Robotic Process Automation deployment and Intelligent Process Automation systems that result in higher productivity for your business. We enable machines to do the more tedious work at your organization, so your employees can focus on finding creative solutions to more sophisticated business opportunities.
Contact NITCO today to discover how we can help streamline your business processes.