Need software to "OCR" PDFs for searching

juanantoniod

Antonio
Power User
Location
Los Angeles, CA
Hello,

This is what I am trying to accomplish: A 'self managed' document management system.

My scanner will scan documents into PDF directly, naming them 001, 002, 003, etc.

I would like to leave these assigned file names, and then just search through the PDFs when I need to find something (a particular document).

Therefore, I need a program that OCRs the scanned PDF images, so that the text within them can be searched and found. Which program can I use to do this?

Thank you very much for any advice and assistance! :)
 

My Computer

Computer Manufacturer/Model Number
HP Pavilion Media Center PC m7350n
OS
Microsoft Windows 7 Home Premium 32-bit 7601 Multiprocessor Free Service Pack 1
CPU
Intel(R) Pentium(R) D CPU 2.80GHz
Motherboard
ASUSTek Computer INC. EMERY
Memory
2.00 GB
Graphics Card(s)
NVIDIA GeForce 6200SE TurboCache(TM)
Sound Card
Realtek High Definition Audio
Monitor(s) Displays
HP L1710 LCD Display
Screen Resolution
1280 x 1024 x 32 bits (4294967296 colors) @ 60 Hz
Hard Drives
(1) SAMSUNG SP2504C (2) EPSON Stylus Storage USB Device (3) Generic USB CF Reader USB Device (4) Generic USB MS Reader USB Device (5) Generic USB SD Reader USB Device (6) Generic USB SM Reader USB Device (7) Seagate FA GoFlex Desk USB Dev
Internet Speed
20+mbps
Slightly digressing from the topic, OCR is a somewhat inexact science because its accuracy depends on the quality of the scan. One could use Adobe Acrobat Pro (the paid version, not the free Reader), but whenever I've tried I always get an error saying the file can't be OCRed because it contains graphics other than images and text. I don't know whether it's peculiar to Adobe or there's something I'm doing wrong.
 

My Computer

Computer Manufacturer/Model Number
Too many to describe...
OS
Windows 7 x64 pro/ Windows 7 x86 Pro/ XP SP3 x86
Yes, and what I am trying to avoid is keywording every document. I want to scan all of my documents into one folder of PDFs, then run a keyword search, hoping that at least one of the keywords will locate the correct document. I know it's asking for a lot. I'm just trying to save as much effort as possible. Thanks!
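For anyone who wants to see what the "one folder, search by keyword" idea looks like in practice, here is a minimal Python sketch. It assumes (this is not from the thread) that an OCR step has already written a plain-text file next to each PDF, so 001.pdf has a matching 001.txt. The function name `find_documents` and the folder layout are hypothetical.

```python
from pathlib import Path


def find_documents(folder, keywords):
    """Return names of PDFs whose companion .txt file contains ANY keyword.

    Assumes each scanned PDF (001.pdf) has OCR text saved beside it (001.txt).
    Matching is a case-insensitive substring test.
    """
    wanted = [k.lower() for k in keywords]
    matches = []
    for txt_path in sorted(Path(folder).glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8", errors="ignore").lower()
        if any(k in text for k in wanted):
            # Report the PDF name, since that is the file you want to open.
            matches.append(txt_path.with_suffix(".pdf").name)
    return matches
```

Calling `find_documents("scans", ["invoice", "warranty"])` would list every PDF in a hypothetical "scans" folder whose text mentions either word, which matches the "at least one of the keywords" goal. OCR mistakes in the text will still cause misses, so it helps to try more than one keyword.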
 


My Computer

Computer Manufacturer/Model Number
OEM - Me
OS
Windows 7 Home Premium x64
CPU
AMD Phenom II X6 1600T
Motherboard
GigaByte GZ-990FXA-UD3
Memory
16GB PC3-10700 (1342MHz)
Graphics Card(s)
ATI Radeon 5770 HD (x2) CrossFire
Sound Card
On-board RealTek chipset
Monitor(s) Displays
3x Hanns-G 1920x1080 Monitors
Screen Resolution
3x Hanns-G 1920x1080 Monitors
Hard Drives
Intel 25-V SSD 40GB: 218 MB/s AT: 0.1ms
Intel X-25M SSD 80GB: 230MB/s AT: 0.1ms
Seagate 750GB: 133 MB/s AT: 13ms (perpendicular storage)
Buffalo HD-PCTU3 1TB External drive
PSU
OCZ Stealth X Stream 750W
Case
Cheap (unknown)
Cooling
Stock
Keyboard
HP USB
Mouse
LogiTech USB
Internet Speed
1.5 Mbps - Slow - At the tail-end of a rural network
Other Info
Printer: Epson Stylus C-84
Scanner: HP 3500C Flatbed
DVD-RW: Plextor
DVD-ROM: Unknown
WEI: 7.4
Thanks for the PDFzilla reco. I already had it, but it is great for anyone else who has not already gotten the FREE copy. It converts PDFs to almost any other format imaginable.

What I found out through Googling this is that I wanted my PDF documents to be *indexed*. Supposedly, Google Desktop automatically does this as part of its indexing, *if* you have the texttopdf program add-on, which should come with it. Also, Windows 7 can now be configured to index PDF text. So, I have it set up and am giving it some time to do the indexing before I run a test query.

Thanks to everyone here who is so helpful!
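
To illustrate what "indexed" means here, below is a rough Python sketch of a word index. It is only a toy showing the concept, not how Windows Search or Google Desktop work internally. It assumes the text of each PDF has already been extracted to a .txt file with the same base name. The names `build_index` and `lookup` are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path


def build_index(folder):
    """Map each lowercase word to the set of .txt file names that contain it."""
    index = defaultdict(set)
    for txt_path in Path(folder).glob("*.txt"):
        text = txt_path.read_text(encoding="utf-8", errors="ignore").lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index[word].add(txt_path.name)
    return index


def lookup(index, query):
    """Return sorted names of files that contain EVERY word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    # Intersect the file sets for each word; unknown words give an empty set.
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

The index is built once, which is the slow part that takes "some time", and after that each query is a quick dictionary lookup instead of a rescan of every file.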
 

I found a fix for making a PDF document with searchable text in Windows 7. Scan the document as an image file at a resolution of at least 600 dpi, then use your OCR software to convert it to a searchable PDF. This is the only method that worked for me after much trial and error. Believe me, I tried them all. My HP all-in-one printer works with HP Solution Center 14.0, which includes the OCR software. All the best.
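
If your scanner software has no OCR built in, the same "OCR it into a searchable PDF" step can be scripted. The sketch below swaps in a different tool from the one in this post: it assumes the open-source OCRmyPDF command-line program is installed and on your PATH, and nobody in this thread has tested it on Windows 7. The helper names `ocr_command` and `ocr_folder` and the folder names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_command(src, out_dir):
    """Build the OCRmyPDF command line for one scanned PDF."""
    src = Path(src)
    out_dir = Path(out_dir)
    return [
        "ocrmypdf",
        "--skip-text",  # leave pages that already contain text untouched
        "--sidecar", str(out_dir / (src.stem + ".txt")),  # also save plain text
        str(src),                  # input: the scanned PDF
        str(out_dir / src.name),   # output: searchable PDF, same file name
    ]


def ocr_folder(in_dir, out_dir):
    """Run OCR on every PDF in in_dir, writing results to out_dir."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(in_dir).glob("*.pdf")):
        subprocess.run(ocr_command(src, out_dir), check=True)
```

Running `ocr_folder("scans", "searchable")` would keep the scanner's file names (001.pdf, 002.pdf, and so on) and put the searchable copies in a separate folder, so the originals stay untouched if the OCR goes badly.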
 

My Computer

OS
windows home premium 64 bit

Windows 7 will not index all of your PDF files properly for searching. I tried everything.
 

I posted a fix on this thread that might work for you.
 
