ffutures wrote 2005-03-19 10:25 am

Grrrrr....

How the hell can a 1966 paperback be harder to scan and OCR than any of the Victorian books I've done? So far I'm averaging 2-3 errors a line on test runs, and nothing I try is improving things much. Time to look for another edition, I fear...

Later: Turns out that with my old scanner and computer it works fine - except that OCR is a bit on the slow side. I can live with that. I just watched Buffy episode 1 while scanning the first chapter, and I'll work my way through the next few chapters during the Dr. Who Night thing.

ci5rod.livejournal.com 2005-03-19 03:26 am (UTC)
Is it the paper? Some of the 60s paperstock has always struck me as not likely to scan well, being very grainy stuff.

gonzo21.livejournal.com 2005-03-19 03:47 am (UTC)
I was just about to ask the same question. Is the paper particularly thin? What about placing a sheet of black or white paper behind the page being scanned?

karohemd.livejournal.com 2005-03-19 04:18 am (UTC)
*nods* In ffutures's case it's most likely not high-quality literature but mass-produced on the shoddy line, with a lot of bleed on inferior, most likely non-white paper, and tightly printed.

Is there an option to zoom the page before you run the OCR? That could help as well.

cobrabay.livejournal.com 2005-03-19 04:33 am (UTC)
I was thinking zoom too. Many years back I had to do test scans & OCR of a wide variety of things for a project at work, and for some stuff I actually had to photocopy it to a larger size and lighten the image before scanning. Nowadays I'd manipulate the scanned image, perhaps despeckle and sharpen too.
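
For anyone wondering what "despeckle" amounts to, here is a rough sketch, not what any particular scanning package does. It assumes the scan is already available as a grid of grayscale values (0 = black, 255 = white) and replaces each pixel with the median of its 3x3 neighbourhood, which removes isolated specks. The function name `despeckle` and the sample grid are made up for illustration.

```python
from statistics import median_low


def despeckle(pixels):
    """Return a copy of a grayscale grid with a 3x3 median filter applied.

    `pixels` is a list of rows, each a list of ints from 0 (black) to 255 (white).
    Pixels on the border use only the neighbours that exist.
    """
    height = len(pixels)
    width = len(pixels[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the pixel and its in-bounds neighbours.
            window = [
                pixels[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns a value actually present in the window.
            row.append(median_low(window))
        out.append(row)
    return out


# A single dark speck on an otherwise white patch (illustrative data only).
noisy = [
    [255, 255, 255],
    [255, 0, 255],
    [255, 255, 255],
]
smoothed = despeckle(noisy)
```

One caution: a median filter can also erode very thin strokes, so on small, fine print it may be worth enlarging the image first or skipping this step.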

ffutures.livejournal.com 2005-03-19 04:52 am (UTC)
It's yellowing paper, and small very thin lettering.

I'm going to try it with my old scanner and PC tonight - I have a feeling it may actually work better under those conditions.

elementalv.livejournal.com 2005-03-19 05:09 am (UTC)
A less sensitive scanner might help. Cheap paper stock also tends to lead to greater ink bleed, which is another likely source of your errors. If you can clean up the scan in a photo editor (make your whites whiter and darken your blacks) before you run the OCR software, you might be able to get a better result.
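
The "whites whiter, blacks darker" step is roughly what a photo editor's levels tool does. As a minimal sketch, assuming the scan is a grid of grayscale values (0 = black, 255 = white): pick a black point and a white point, clip everything outside them, and stretch what is left across the full range. The function name `stretch_levels` and the two thresholds are assumptions for illustration; sensible values depend on the actual page.

```python
def stretch_levels(pixels, black_point, white_point):
    """Clip and linearly stretch grayscale values.

    Values at or below `black_point` become 0, values at or above
    `white_point` become 255, and values in between are scaled linearly.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = 255 / (white_point - black_point)
    out = []
    for row in pixels:
        out.append(
            [max(0, min(255, round((v - black_point) * scale))) for v in row]
        )
    return out


# Yellowed paper reads as light grey, faded ink as dark grey (illustrative data only).
page = [
    [200, 198, 60, 201],
    [199, 70, 65, 197],
]
cleaned = stretch_levels(page, black_point=80, white_point=190)
```

With these made-up thresholds the yellowed background goes to pure white and the ink to pure black, which is the kind of input OCR software generally prefers.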