[sane-devel] Using Find Command in XSane PDF File
tingox at gmail.com
Wed Feb 1 00:21:37 UTC 2017
On Tue, Jan 31, 2017 at 11:22 PM, Raymond Hanslits <rhanslits at yahoo.com> wrote:
> I have been scanning and creating PDF files in XSane. However, I cannot get
> the Find Command to search for words and phrases in PDF files.
> Can you help me?
Scanned documents consist of images (yes, even PDF documents) from the start.
The find command (or a search command) works with text (words, phrases).
So, after scanning your document, you need to perform text recognition
on the images in your document. This process is usually called OCR
(optical character recognition). It scans the images for characters and
creates text from the characters it finds. Recognition is never 100%
accurate, more like 97-98%, so a search can miss a word that was
misread. The text is usually saved as an invisible layer
"on top" of each image in a PDF file, so when you search you will find
the word on the correct page.
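If you want to try the text recognition step outside of XSane, here is a
minimal Python sketch. It assumes the separate command-line tool OCRmyPDF
(ocrmypdf) is installed. That tool is not part of XSane or gscan2pdf and
is only one possible choice, picked here for illustration. The function
names and file names are made up for this example.

```python
import shutil
import subprocess


def build_ocr_command(scanned_pdf, searchable_pdf, language="eng"):
    """Return the ocrmypdf command line that adds a text layer.

    Assumption: the OCRmyPDF command-line tool is used for recognition;
    "-l" selects the recognition language.
    """
    return ["ocrmypdf", "-l", language, scanned_pdf, searchable_pdf]


def add_text_layer(scanned_pdf, searchable_pdf, language="eng"):
    """Run OCR on an image-only PDF and write a searchable copy."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on this system")
    # check=True raises CalledProcessError if the OCR run fails.
    subprocess.run(
        build_ocr_command(scanned_pdf, searchable_pdf, language),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical file names: replace with your own scan.
    add_text_layer("scan.pdf", "scan-searchable.pdf")
```

The output file should look the same as the scan, but the find command in
your PDF viewer then has text to search.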
I see that XSane has an "OCR" button, but I have never used it, so I
don't know if it works or what it takes to make it work.
I usually use gscan2pdf http://gscan2pdf.sourceforge.net/ for scanning
text documents. It uses SANE, has several OCR tools you can choose
from, and usually works very well.