Bonus OCR Stuff

I'll admit that I didn't cover OCR in my scanning book. Guilty! But for a reason: There just aren't that many people using OCR, and it's rarely the main reason anyone buys a scanner. That doesn't mean nobody uses it, though, so I felt I owed a smidgen of text to the minority of people who use OCR or are at least interested in it.

So here's the bonus section on OCR stuff, which covers the whole OCR issue in two teeny sections:

  • Some Thoughts about OCR
  • Doing OCR

Some Thoughts about OCR

First, OCR stands for Optical Character Recognition. It's software, not a hardware thing. The OCR software uses the scanner to read in text. So you put a piece of paper face-down in the scanner, run the OCR software, and soon you have all the text from that document available inside the computer just as if it was typed in. Nifty.

Second, OCR software has come a long way. Early OCR applications read text and often guessed at the results, putting special characters in the text for items they couldn't read, such as:

Four S~ore and S~~en Yea~s A~o . . .

Granted, such results were often much better than typing the text yourself, but they still fell short of the utter convenience that comes with the promise of using a computer.
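To get a feel for how much cleanup a result like that needs, here is a minimal Python sketch. It assumes the OCR program marks every character it couldn't read with a tilde (~), as in the sample line above; real programs use different markers, and the function name is made up for this illustration.

```python
def review_ocr_text(text, marker="~"):
    """Summarize how much of an OCR result was left unread.

    Assumption: the OCR program puts `marker` wherever it could not
    read a character. The "~" default comes from the sample line in
    this article, not from any particular OCR product.
    """
    # Count every placeholder character the OCR program left behind.
    unread = text.count(marker)
    # Ignore whitespace so spacing does not skew the ratio.
    total = sum(1 for ch in text if not ch.isspace())
    # Collect the words a person would have to fix by hand.
    suspect_words = [word for word in text.split() if marker in word]
    ratio = unread / total if total else 0.0
    return unread, ratio, suspect_words


sample = "Four S~ore and S~~en Yea~s A~o . . ."
unread, ratio, suspect_words = review_ocr_text(sample)
print(f"{unread} unread characters ({ratio:.0%} of the text)")
print("Words to fix:", ", ".join(suspect_words))
```

The list of suspect words is the useful part: it's the stuff you'd have to retype yourself.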

Third, most scanners come with OCR software, which means you can try it out if you're interested. Be aware that the OCR software you may already have is a limited version; often the "full" or professional version is much better and has more detailed features.

Finally, if you're really serious about OCR you'll probably want a sheet feeder. Scanning text in one sheet at a time can be tiresome. A sheet feeder automatically slides new sheets into the scanner and old sheets out, and it's controlled by the OCR program.

Doing OCR

The most popular OCR program is called OmniPage. A limited edition of that application was most likely installed on your computer when you first installed your scanning software.

All basic OCR follows the same four steps:

  • Run the OCR application
  • Place the original into the scanner
  • Scan/Read the image
  • Save the text

Run the OCR application

Start up your OCR program, most likely some form of OmniPage, just as you would any program.

Place the original into the scanner

Find something with text on it. The most successful stuff to scan is plain, evenly spaced, typewritten material. Unless you have the most current, top-of-the-line OCR software, it probably won't be able to read fancy fonts or text in boxes or multiple columns.

For this exercise, I chose a handout I created for a recent play I directed. It's just a page of rules and notes, as well as a list summarizing new features of Windows Me. (Woot! Woot!)

Scan the image

Click the Scan button. In the PC version of OmniPage (shown below), the button shows a scanner with a hand placing a page of text into it.

The scanner will read the page, creating an image, which is usually displayed in the OCR program's window, as shown above.

Save the text

Here you're given a choice. Depending on the program, there may be an option for merely reading and reviewing the text (the Perform OCR button above does that); an option for copying the text to the clipboard for pasting later; an option for saving the text to disk; or options for reading the text into a word processing program. The choice is up to you.

In my OCR program, I simply chose the option that rendered the text. The list read in perfectly. That's good, but the text on that page was pretty big. For smaller text, don't expect much.

That's pretty much all there is to it!
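If you think better in code, the steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not how OmniPage or any real OCR program works inside: the function names are made up, and the "scanner" and "recognizer" are stand-ins that fake their results so the example runs on its own. Starting the program and placing the page in the scanner are things you do by hand, so they appear only as a comment.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_ocr_job(scan_page: Callable[[], bytes],
                recognize: Callable[[bytes], str],
                out_path: Path) -> str:
    """Scan one page, turn the image into text, and save the text.

    Steps 1 and 2 (start the program, place the original face-down)
    are manual and happen before this function is called.
    """
    image = scan_page()                           # Step 3: scan the image
    text = recognize(image)                       # Step 3: read the text from it
    out_path.write_text(text, encoding="utf-8")   # Step 4: save the text
    return text


# Hypothetical stand-ins. A real scanner returns picture data and a
# real OCR engine does the hard work of recognizing characters.
def fake_scan_page() -> bytes:
    return b"Four score and seven years ago"


def fake_recognize(image: bytes) -> str:
    return image.decode("ascii")


with tempfile.TemporaryDirectory() as folder:
    out_file = Path(folder) / "handout.txt"
    text = run_ocr_job(fake_scan_page, fake_recognize, out_file)
    saved = out_file.read_text(encoding="utf-8")

print(saved)
```

Passing the scanner and the recognizer in as functions keeps the sketch honest: the order of the steps is the point, and the pieces that do the real work can be swapped for whatever software you actually have.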

  • Remember to quit your OCR software when you're done scanning.
  • Some versions of OCR software can be customized for scanning information from other languages. For example, you can tune the application to view French or German, which makes the software sensitive to the way that language's text appears. (It does not translate the text into English.)
  • As far as I know, there is no OCR software that can read handwriting.

Good luck OCR'ing!