BW2 :: the bitwise supplement :: http://www.bitwisemag.com/2

Free OCR Program
And you may already have it!

26 August 2007

by Huw Collingbourne

I happen to need to do some OCR (Optical Character Recognition) at the moment. Having discovered that the ’80s are back in fashion (yes, I promise you!) I decided the time was right to republish some of the stuff I wrote back in the early ’80s. Back then I was a pop music journalist and spent my time interviewing stars such as Boy George, George Michael, the B52s, Judas Priest and Adam Ant. Just to show how long ago this was, I didn’t even have a computer. Everything I wrote was banged out at the keyboard of an ancient and formidably heavy Imperial 66 typewriter.

By some quirk of fate, while I no longer have copies of the magazines in which my interviews appeared, I do still have a bunch of old and faded carbon copies stuck in the bottom of a desk drawer. It is these that I needed to bring back to life by scanning them, feeding them through some OCR software and getting the original text converted into an editable format here on my PC.

It’s ages since I last looked at any professional OCR software so I don’t have any Vista-ready OCR applications sitting among the boxes of software that clutter up my office. I decided to Google around to see if I could find any free OCR programs. It was while I was doing this that I noticed a reference to something called ‘Microsoft Office Document Scanning’ which is (so I gathered) bundled with Microsoft Office. I clicked the Start menu, found the Microsoft Office 2007 program group and – it wasn’t there!

I then went into the Control Panel and started up the Office ‘repair’ procedure. To my surprise, Microsoft Office Document Scanning was listed under Tools as an uninstalled option. I selected it to be installed and, hey presto, it suddenly appeared in the Office group of the Start menu (and no, I didn’t have to put any installation disks into my DVD drive – it seems that the program was on my hard disk already but, for some reason, had not been installed into the menu).

I wasn’t expecting too much of this program, to be honest, and my low expectations were fully realised when I scanned one of my faded carbon copies as a ‘black and white’ document and found that only the white bit actually came through. Which is to say that when it scanned the pages it didn’t recognise one single piece of text! I was just about to uninstall the darn’ thing when, out of idle curiosity, I decided to try scanning again but this time selecting ‘grayscale’ rather than ‘black and white’.

To my amazement, this worked. Indeed, it not only worked – it worked very well indeed. Even with carbon copies so faded that I can barely read them it achieves, I’d say, something in excess of 80% accuracy. With good, crisp carbon copies, it comes remarkably close to 100%.
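
Why should grayscale succeed where black and white fails? I can only guess, since nothing I have seen documents how the program works internally. One plausible explanation is that a black-and-white scan reduces every pixel to pure black or pure white using a fixed brightness cut-off, and faded carbon text is simply too light to fall on the black side of it. The little Python sketch below illustrates that idea only. The pixel values, the cut-off of 128 and the function names are all made up for the example and are not taken from any real scanner or from the Office software.

```python
# Hypothetical 8-bit brightness values (0 = black, 255 = white).
# These numbers are invented for illustration, not measured from a scan.
PAPER = 235       # slightly grey paper
FADED_INK = 180   # faded carbon text, far lighter than fresh ink

# One row of pixels: paper, three pixels of faded text, paper.
scan_line = [PAPER, PAPER, FADED_INK, FADED_INK, FADED_INK, PAPER, PAPER]


def to_black_and_white(pixels, threshold=128):
    """Map each pixel to 1 (white) if it is at least as bright as the threshold, else 0 (black)."""
    return [1 if p >= threshold else 0 for p in pixels]


def midpoint_threshold(pixels):
    """Choose a cut-off halfway between the darkest and the lightest pixel."""
    return (min(pixels) + max(pixels)) // 2


# Fixed cut-off: the faded text is brighter than 128, so it vanishes.
fixed = to_black_and_white(scan_line)

# Cut-off derived from the grayscale data: the text survives.
adaptive = to_black_and_white(scan_line, midpoint_threshold(scan_line))

print("fixed cut-off:   ", fixed)
print("midpoint cut-off:", adaptive)
```

With these invented values the fixed cut-off turns the whole row white, while the cut-off worked out from the grayscale data keeps the three text pixels black. If something like this is going on, a grayscale scan preserves the small difference between faded ink and paper that a crude black-and-white conversion throws away.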

You can scan one page at a time or several pages in sequence (in which case it prompts you when it’s ready to scan the next page). When it’s finished, a click of a button copies the scanned text into Word, ready for final proofing and editing.

This may not have all the bells and whistles of an expensive standalone OCR package. Even so, it’s saved me a huge amount of retyping and, bearing in mind the exceptionally low quality (the print quality, I mean, not the content!) of the 25-year-old carbon copies which I’m feeding it, it’s doing a very satisfactory job.