From paper to the Web

When you create documents on a PC, moving them to a Web site is no great feat. But you have fewer options when your documents arrive in the mail – and I don’t mean e-mail.

Still, the process need not be a nightmare any longer. Caere has extended its OmniPage optical character recognition (OCR) product to create OmniPage Web, an application that walks you through the entire process of scanning, recognizing and proofing multiple printed pages. OmniPage Web creates a set of Web pages from printed information, complete with navigation buttons, a table of contents and even a common look and feel if you want one.

OmniPage Web succeeds because of the quality of its text recognition and proofing tools. I took our corporate training catalog and scanned the first few pages. OmniPage Web recognized most of the common words. When it didn’t recognize a name or an abbreviation, it asked whether it should change the word and proposed possible substitutes. The program automatically skips from one unrecognized word to the next, making the whole process smooth and quick.

When recognition is complete, the program displays all the scanned pages on the left side of the screen and an image of the HTML page on the right side. In the middle is a proofing panel that lets you make changes to the pages. The program is intelligent enough to recognize headings in the document and mark them up as headings in the HTML. It also recognizes hyperlinks and e-mail addresses in a document and creates the appropriate HTML tags.
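To make that last step concrete, here is a minimal Python sketch of how recognized text might be turned into linked HTML. It is an illustration only, not Caere's method: the `linkify` function and the `LINK_PATTERN` expression are hypothetical names, and the pattern is deliberately simple.

```python
import html
import re

# Hypothetical pattern: an http(s) URL (not ending in sentence punctuation)
# or a simple e-mail address. Real OCR software surely does more than this.
LINK_PATTERN = re.compile(
    r"(?P<url>https?://[^\s<>\"]+[^\s<>\".,;:!?)])"
    r"|(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def linkify(text: str) -> str:
    """Escape plain text and wrap URLs and e-mail addresses in <a> tags."""
    parts = []
    pos = 0
    for match in LINK_PATTERN.finditer(text):
        # Copy the ordinary text before the match, escaped for HTML.
        parts.append(html.escape(text[pos:match.start()]))
        target = match.group(0)
        # E-mail addresses get a mailto: link; URLs link to themselves.
        href = target if match.group("url") else "mailto:" + target
        parts.append(
            '<a href="{}">{}</a>'.format(html.escape(href), html.escape(target))
        )
        pos = match.end()
    # Copy whatever follows the last match.
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


print(linkify("Write to info@example.com or see http://www.example.com/catalog."))
```

The period that ends the sentence stays outside the link, which is the kind of detail a proofing pass would otherwise have to catch by hand.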

The software can put links to images of the original pages into your HTML text. OmniPage Web cleverly turns the scanned page into an image map; you can click on it to be transported to the corresponding point in the HTML document. This is a handy navigation feature for viewers familiar with the original paper format.

OmniPage Web is a great timesaver, but it’s not perfect. It doesn’t do well recognizing text printed over a variable background, such as a logo or a photo. It had trouble with the bullets in front of some of the lines in my test documents. And the color and button themes it applies to pages are, for the most part, dreadful.

Still, if you have a large amount of information on paper that you want to move to the Web as quickly as possible, no software makes the process smoother than OmniPage Web.

I’ve met my meta match

Though I’m a long-time fan of metasearch tools, I haven’t found one worth paying for since Quarterdeck’s WebCompass disappeared – until now.

X-Portal from Kaufman Consulting Service, Ltd. (KCSL) is a metasearch workhorse that does more than most metasearch applications, which scour multiple search engines to uncover as many matches as possible for your search string. (There are more than a dozen such metasearch tools online.)

Most free metasearch tools make it difficult or impossible for you to add your own search sites to their lists of engines. X-Portal allows you to control your list of active search engines. For example, you could add your own corporate site and search it at the same time you search the Internet.
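For readers curious about what a metasearch tool does with the answers it collects, here is a rough Python sketch of merging ranked results from several engines, including one you add yourself. It is a guess at the general idea, not X-Portal's algorithm: the engine names, the URLs and the reciprocal-rank scoring are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each URL earns 1/rank from every engine that returns it, so pages
    found near the top by several engines float to the top overall.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists; a real tool would fetch these over the network.
results = {
    "engine_a": ["http://a.example/1", "http://b.example/2"],
    "engine_b": ["http://b.example/2", "http://c.example/3"],
}

# Adding your own corporate site is just one more entry in the list.
results["corporate_site"] = ["http://intranet.example/training"]

for url in merge_results(results):
    print(url)
```

Because each engine is only an entry in a dictionary, a user-controlled list of engines costs almost nothing in a design like this one.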

Adding a search engine is easy. A wizard asks for the site name and URL, then brings up the site and asks you to run a query. After that, it’s placed on your list of active engines.

X-Portal is closely tied to Microsoft’s Internet Explorer browser. It adds a button to Internet Explorer’s standard toolbar. When you click on it, an X-Portal pane appears in the browser the same way a search or history pane does.

The X-Portal pane has two tabs. The Search tab lets you look for words or phrases; it returns the results ranked by relevance and highlights the search terms in them. The References tab connects you to a compendium of dictionaries and almanacs, the Columbia Encyclopedia and even a political atlas that lets you zoom in and out. All 300,000 references are stored on your hard drive, and all of them are searchable.

The program isn’t perfect. It doesn’t let you group your search engines by category, which would let you have one set for technical support, one set for shopping and so on. It also won’t let you export a list of the links you found, which would help you share them with others. Its documentation is nonexistent, though its online help is useful. It takes up 150M bytes of disk space. And currently, it works only with Internet Explorer 4.01 or later.

Many of X-Portal’s shortcomings may be rectified in upcoming releases. Even as it stands, this application might be reason enough for Communicator users to switch to Internet Explorer 5.
