Optical character recognition (OCR) software has been around for a long time and OmniPage from ScanSoft is often touted as the market leader in this space. Helping to maintain its lead, it comes bundled with many scanners as a Limited Edition.
ScanSoft's latest addition to the OmniPage range can now convert PDF documents and output them in editable Word, Excel or HTML format. Previously, PDF conversion had been the domain of Adobe's Acrobat and a handful of document conversion programs. Add to that a 40% improvement in accuracy, plus a host of new and improved features, and this upgrade becomes worthy of consideration.
Installation and use of OmniPage is dead simple, and to make things even simpler, the OCR Wizard will guide the first-time user through the process of scanning, converting and outputting documents.
Our first test for OmniPage was a page from PC Authority, Tim Dean's Login column (August 2001, page 17). Pictured, the left pane shows the raw scan and on the right is the final converted page. Clicking the OCR Wizard brings up a dialog box of options such as black-and-white or colour scan, magazine media, table, spreadsheet and language.
After the conversion, OmniPage runs a proofread, similar to a spellchecker, which focuses the reader's attention on doubtful words and characters. This particular scan of Login resulted in 100% accuracy for text. The only queries the proofreader generated were for unusual words such as "broadbandwagon". The proofreader will also learn commonly used words by adding corrections to its
Unfortunately, the font types, colour and weighting were not matched properly, hence the difference in paragraph flow. But the overall page retained the main layout of the original. Font sizes, on the other hand, were matched more accurately, even with small point sizes such as the PC Authority Web address shown at the bottom of each page, which is about 6 points. The word "Login" in reverse has remained a graphic, as have the picture of Tim and some special characters like the trademark PC Authority exclamation mark at the end of the column.
Other scans we performed with more complex layouts such as crossword puzzles, logos, background watermarks and diagonal text were not as accurate, and were sometimes left as a graphic.
You can export scanned files to Word, Excel, HTML or PDF. You can then open and edit the text while retaining the page layout, including tables, columns, headings and font information. OmniPage 11.0 retained table and spreadsheet layout in most of our tests, even when the scanned document did not have gridlines or table borders.
The PDF documents we converted showed the same level of accuracy as scanned documents, along with similar layout and font problems. Opening PDF documents with more than 10 pages took an incredibly long time, but converting them to editable text was very quick.
Version 11.0 of OmniPage has greater language recognition, with over 100 languages in its repertoire. However, the languages all have Roman alphabet characters - French, Italian, Greek and even Eskimo are all there but there's no listing for Chinese or Arabic.
Before you begin each scan, OmniPage must be told which languages are present on the page. You can specify multiple languages for a single page.
The accuracy of faxed documents was largely dependent on the quality of the fax and, in most cases, OmniPage could be forgiven for not converting blurred or joined characters. A new despeckle module has been added to improve the accuracy of faxed documents and, on the whole, it did a remarkably good job of converting faxes.
As with the previous version, OmniPage 11.0 will read back the scanned document, albeit with a robotic voice.
With ScanSoft's acquisition of Caere early last year, OmniPage Pro now has to share company space with former competitor TextBridge OCR Software. OmniPage is now being pitched at the high-end corporate market while the considerably less expensive TextBridge Pro 9.0 is aimed at the consumer market. Although the layout retention of both apps is mostly on a par, OmniPage Pro, with its PDF ability and higher accuracy, is clearly the winner - but it comes at a price.