March 9, 2016

Scanning Text into a Word Document

Filed under: Main — Tags: — admin @ 12:01 am

A scanner is just a flat-surface camera. It reads only images. To pull text off a page, and eventually get that text into a Word document, the scanned image must be processed further. The software that carries out that task is called Optical Character Recognition software, or OCR for short.

If you have a nice scanner, OCR software was likely included with the device. My Canon scanner's software features OCR as a scanning option, as shown in Figure 1. When I choose that option, the page is scanned, and the text is pulled from the document and saved as a plain text file.

Figure 1. Setting OCR mode for my Canon scanner.

In Figure 2, you see a document I scanned. It contains just a few lines of text, thoughts about coffee. The image in Figure 2 was scanned as a JPEG image file; it does not contain text, only pixels.

Figure 2. A scanned text document (graphics file).

When I use OCR scanning mode, however, the image is scanned and then read by the OCR software. The image file is still saved (at least when I use my Canon scanner's software), but the text pulled from the image is presented in a text editor window. That window is shown in Figure 3, which represents the OCR software's attempt to read the text document scanned in Figure 2.

Figure 3. OCR software attempts to read text from a scanned file.

As you can see by comparing Figures 2 and 3, the results are rather mediocre. That's typical of OCR software. More expensive software may recognize words more accurately. Even so, the remaining steps are the same:

1. Copy and paste text from the scanned document into a Word document.
2. Fix the errors.

You could save the OCR text as a file. I don't; copying and pasting works well enough.

My HP OfficeJet Pro is an all-in-one printer that features a scanner. It uses a control panel on the printer (a touchscreen) to select scanning modes. OCR isn't one of the modes, but it might be on another all-in-one printer. If so, choose that mode to scan text from a document, and then select the output location, such as your computer or a thumb drive. The results from that OCR operation will probably be similar to those from the scanner software described earlier in this post: You still need to get the text into your Word document and then fix the mistakes.

By the way, if you save the OCR text as a file, follow these steps to pull that text into Word:

1. Click the Insert tab.
2. In the Text group, click the Object button’s menu triangle and choose Text From File.
3. Use the Insert File dialog box to locate the OCR-generated text file.
4. Click the Insert button to add the file's text to your Word document.

Again, you'll have to edit the text to make it presentable in your document. Remember: It's faster to fix that text than to type it all over again from scratch.

2 Comments

  1. Interesting, must give it a go before it is needed!

    Comment by glennp — March 12, 2016 @ 1:31 am

  2. The HP OfficeJet Pro is a great printer. The problem is that it won’t recognize the PC unless I install the specific HP software. Beyond that it works great as a printer, you just can’t scan and save the document to the PC.

    If I install the HP software, however, it contains a lot of bloat and actually slows the printing process. So for now, I scan to a thumb drive, then access the thumb drive’s files from the PC.

    Comment by admin — March 12, 2016 @ 10:58 am
