{"id":8143,"date":"2016-03-09T00:01:05","date_gmt":"2016-03-09T08:01:05","guid":{"rendered":"http:\/\/www.wambooli.com\/blog\/?p=8143"},"modified":"2016-03-09T08:06:39","modified_gmt":"2016-03-09T16:06:39","slug":"scanning-text-into-a-word-document","status":"publish","type":"post","link":"https:\/\/www.wambooli.com\/blog\/?p=8143","title":{"rendered":"Scanning Text into a Word Document"},"content":{"rendered":"<p>A scanner is just a flat-surface camera. It reads only images. To pull text off a page, and eventually get that text into a Word document, you must manipulate the scanned image. The software to carry out that task is called Optical Character Recognition software, OCR for short.<br \/>\n<!--more--><br \/>\nIf you have a nice scanner, then OCR software was probably included with the device. My Canon scanner&#8217;s software features OCR as a scanning option, as shown in Figure 1. When I choose that option, the media is scanned, and the text is pulled from the document and saved as a plain text file.<\/p>\n<div id=\"attachment_8145\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8145\" src=\"http:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure1.png\" alt=\"Figure 1. Setting OCR mode for my Canon scanner.\" width=\"550\" height=\"455\" class=\"size-full wp-image-8145\" srcset=\"https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure1.png 550w, https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure1-300x248.png 300w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><p id=\"caption-attachment-8145\" class=\"wp-caption-text\">Figure 1. Setting OCR mode for my Canon scanner.<\/p><\/div>\n<p>In Figure 2, you see a document I scanned. It just contains a few lines of text, thoughts about Coffee. 
The image in Figure 2 was scanned as a JPEG image file; it does not contain text, only pixels.<\/p>\n<div id=\"attachment_8146\" style=\"width: 242px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure2.jpg\" rel=\"attachment wp-att-8146\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8146\" src=\"http:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure2-232x300.jpg\" alt=\"Figure 2. A scanned text document (graphics file). Click to embiggen.\" width=\"232\" height=\"300\" class=\"size-medium wp-image-8146\" srcset=\"https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure2-232x300.jpg 232w, https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure2.jpg 510w\" sizes=\"auto, (max-width: 232px) 100vw, 232px\" \/><\/a><p id=\"caption-attachment-8146\" class=\"wp-caption-text\">Figure 2. A scanned text document (graphics file). Click to embiggen.<\/p><\/div>\n<p>When I use OCR scanning mode, however, the image is scanned and then it&#8217;s read by the OCR software. The image file is still saved (at least when I use my Canon scanner&#8217;s software), but then the text pulled from the image is presented in a text editor window. That window is shown in Figure 3, which represents the OCR software&#8217;s attempt to read the text document scanned in Figure 2.<\/p>\n<div id=\"attachment_8147\" style=\"width: 560px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-8147\" src=\"http:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure3.png\" alt=\"Figure 3. 
OCR software attempts to read text from a scanned file.\" width=\"550\" height=\"378\" class=\"size-full wp-image-8147\" srcset=\"https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure3.png 550w, https:\/\/www.wambooli.com\/blog\/wp-content\/uploads\/2016\/03\/OCR_figure3-300x206.png 300w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><p id=\"caption-attachment-8147\" class=\"wp-caption-text\">Figure 3. OCR software attempts to read text from a scanned file.<\/p><\/div>\n<p>As you can see by comparing Figures 2 and 3, the results are rather mediocre. That&#8217;s what you can expect from OCR software. I assume that more expensive software could better recognize words. Even so, the remaining steps are the same:<\/p>\n<p><strong>1. Copy and paste text from the scanned document into a Word document.<br \/>\n2. Fix the errors.<\/strong><\/p>\n<p>You could save the scanned text document. I don&#8217;t. Copying and pasting works well enough.<\/p>\n<p>My HP OfficeJet Pro is an all-in-one printer that features a scanner. It uses a control panel on the printer (a touchscreen) to select scanning modes. OCR isn&#8217;t one of the modes, but it might be on another all-in-one printer. If so, choose that mode to scan text from a document. Select the output location, such as your computer or a thumb drive. The results from that OCR operation will probably be the same as when scanning as described earlier in this post: You still need to get the text into your Word document and then edit the mistakes.<\/p>\n<p>By the way, if you save the text documents, follow these steps to pull the document&#8217;s text into Word:<\/p>\n<p><strong>1. Click the Insert tab.<br \/>\n2. In the Text group, click the Object button&#8217;s menu triangle and choose Text From File.<br \/>\n3. Use the Insert File dialog box to locate the OCR-generated text file.<br \/>\n4. 
Click the Insert button to add the OCR file&#8217;s text to your Word document.<\/strong><\/p>\n<p>Again, you&#8217;ll have to edit the text to be more presentable in your document. Remember: It&#8217;s faster to edit that text than to type it all over again from scratch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to use OCR software.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[9],"class_list":["post-8143","post","type-post","status-publish","format-standard","hentry","category-main","tag-word"],"_links":{"self":[{"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/8143","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8143"}],"version-history":[{"count":3,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/8143\/revisions"}],"predecessor-version":[{"id":8153,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/8143\/revisions\/8153"}],"wp:attachment":[{"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8143"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8143"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wambooli.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8143"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}