Grrrrr....

Mar. 19th, 2005 10:25 am
ffutures
How the hell can a 1966 paperback be harder to scan and OCR than any of the Victorian books I've done? So far I'm averaging 2-3 errors a line on test runs, and nothing I try is improving things much. Time to look for another edition, I fear...

Later: Turns out that using my old scanner and computer it works fine - except that OCR is a bit on the slow side. I can live with that - just watched Buffy episode 1 while scanning the first chapter, I'll work my way through the next few chapters during the Dr. Who Night thing.

Date: 2005-03-19 03:26 am (UTC)
From: ci5rod.livejournal.com
Is it the paper? Some of the '60s paper stock has always struck me as unlikely to scan well, being very grainy stuff.

Date: 2005-03-19 03:47 am (UTC)
From: gonzo21.livejournal.com
I was just about to ask the same question. Is the paper particularly thin? What about placing a sheet of black or white paper behind the page being scanned?

Date: 2005-03-19 04:18 am (UTC)
From: karohemd.livejournal.com
*nods* In ffutures' case it's most likely not high-quality literature but a cheap mass-market printing, tightly printed with a lot of ink bleed on inferior, probably off-white paper.

Is there an option to zoom the page before you run the OCR? That could help as well.

Date: 2005-03-19 04:33 am (UTC)
From: cobrabay.livejournal.com
I was thinking zoom too. Many years back I had to do test scans & OCR of a wide variety of things for a project at work, and for some stuff I actually had to photocopy it to a larger size and lighten the image before scanning. Nowadays I'd manipulate the scanned image, perhaps despeckle and sharpen too.
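The despeckle step mentioned above can be sketched as a 3x3 median filter, one common way to remove isolated specks while keeping letter strokes. This is a minimal illustration, not what any particular photo editor does: it assumes the scan is already loaded as a grayscale grid (a list of rows of 0-255 values), and `despeckle` is a hypothetical helper name.

```python
from statistics import median


def despeckle(img):
    """Apply a 3x3 median filter to a grayscale image.

    img is assumed to be a list of rows of 0-255 ints. Each output pixel
    is the median of itself and its neighbours; at the edges only the
    neighbours that exist are used.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            # Even-sized edge windows can give a .5 median; truncate to int.
            row.append(int(median(window)))
        out.append(row)
    return out


# A white page with a single black speck in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = despeckle(page)
```

A lone dark pixel is outvoted by its eight white neighbours and disappears, whereas a stroke several pixels wide keeps its dark centre.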

Date: 2005-03-19 04:52 am (UTC)
From: ffutures.livejournal.com
It's yellowing paper, and small very thin lettering.

I'm going to try it with my old scanner and PC tonight - I have a feeling it may actually work better under those conditions.

Date: 2005-03-19 05:09 am (UTC)
From: elementalv.livejournal.com
A less sensitive scanner might help. Cheap paper stock also tends to cause more ink bleed, which is another likely source of your errors. If you can clean up the scan in a photo editor (make your whites whiter and your blacks darker) before you run the OCR software, you might get a better result.
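The "whites whiter, blacks darker" clean-up is essentially a levels adjustment. Here is a minimal sketch, assuming a grayscale grid of 0-255 values. The black and white points (60 and 190) are arbitrary assumptions for yellowed paper and would need tuning per book, and `levels` is a hypothetical helper name.

```python
def levels(img, black=60, white=190):
    """Levels adjustment on a grayscale image (list of rows of 0-255 ints).

    Pixels at or below `black` become pure black (0), pixels at or above
    `white` become pure white (255), and values in between are stretched
    linearly across the full range. Both thresholds are assumed defaults.
    """
    def adjust(v):
        if v <= black:
            return 0
        if v >= white:
            return 255
        return round((v - black) * 255 / (white - black))

    return [[adjust(v) for v in row] for row in img]


# Faint grey ink (50) on yellowed paper (200) becomes pure black on pure white.
cleaned = levels([[50, 200], [200, 50]])
```

Raising `black` darkens thin, faint lettering. Lowering `white` bleaches more of the paper tint, at the risk of losing the lightest strokes.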
