[sane-devel] Configuring OCR tool

Jeroen Van den Keybus jeroen.vandenkeybus at gmail.com
Mon Jul 29 23:15:15 BST 2019


> All this gets me no nearer configuring Xsane. Is anybody listening there on
> sane-devel? Do you guys not know? I'm one step short of downloading source
> and grepping it!
>

Feel free to download and grep.

But as far as I know, xsane does not integrate any OCR. gscan2pdf,
recommended previously by Jeff, does. It allows you to work on and
reorganize individual scanned pages and turn them into multi-page pdfs,
with or without OCR'ed text. Quite frankly, it is one of the best tools I
have ever used. I did hundreds of pages with it.


> gscan2pdf, eh?
> That sounds like a gnome thing, doesn't it? Gnome I don't remember well.
> I'm on Slackware64 here, and you're probably developing it under a gnome
> shell. I'll happily try it out and compare it, but you'll have to spoonfeed
> me on the dependencies. Slackware64 supports KDE & XFCE. I'm using XFCE.
>

Its current version, 2.5.5, is GTK3-based. I tried it for you on openSUSE
Plasma (KDE) with tesseract. It builds and works fine, but requires a
significant number of Perl libraries. Clone it from git, run 'perl
Makefile.PL' and observe what is missing. Then make, install and run it to
see which additional packages are missing. None of it is problematic.
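
If you would rather check for missing Perl modules up front, something
along these lines might help. It is only a sketch: the helper name
missing_perl_modules is made up for illustration, and you have to supply
the module names yourself (Makefile.PL remains the authority on what
gscan2pdf actually needs).

```shell
#!/bin/sh
# Hypothetical helper (not part of gscan2pdf): print every Perl module
# named on the command line that the local perl cannot load.
missing_perl_modules() {
    for mod in "$@"; do
        # 'perl -MFoo::Bar -e 1' exits non-zero when Foo::Bar cannot be loaded.
        if ! perl "-M$mod" -e 1 >/dev/null 2>&1; then
            printf '%s\n' "$mod"
        fi
    done
}
```

For example, 'missing_perl_modules Gtk3 Image::Magick' prints whichever of
those two modules is not installed. The names are examples only; take the
real list from the warnings that 'perl Makefile.PL' gives you.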


> Could you post the output of 'ldd /path/to/gscan2pdf' please? I'll get an
> idea of how much hassle I'm in for. I'm hoping for a short output, not a
> long one.
>

Sure:

$ ldd $(which gscan2pdf)
        not a dynamic executable
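
ldd has nothing to report because gscan2pdf is a Perl script rather than a
compiled binary, so its dependencies are Perl modules and not shared
libraries. A small sketch to confirm which interpreter a script asks for is
below; the helper name script_interpreter is hypothetical, and it assumes
your 'head' supports the -c option.

```shell
#!/bin/sh
# Hypothetical helper: print the interpreter named on a script's '#!' line,
# or print nothing if the file does not start with '#!'.
script_interpreter() {
    # Only the first two bytes are needed to recognise a '#!' script.
    if [ "$(head -c 2 "$1")" = '#!' ]; then
        # Print the rest of the first line: the interpreter and its arguments.
        head -n 1 "$1" | cut -c 3-
    fi
}
```

Running 'script_interpreter "$(which gscan2pdf)"' should then show the perl
it will be run with, which is where the real dependencies come from.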


>
>
> *Sent:* Sunday, July 28, 2019 at 10:09 AM
> *From:* "Jeff" <jffry at posteo.net>
> *To:* sane-devel at alioth-lists.debian.net
> *Subject:* Re: [sane-devel] Configuring OCR tool
> On 26/07/2019 16:16, Business Kid wrote:
> > I have sane(1.0.27) & xsane(0.999) working here on my HP LaserJet MFP
> > 130nw Multifunction printer. I wanted to use it for OCR (At which I have
> > some commercial experience). gocr seems to be the only OCR tool; but
> > that project seems to be dying, or dead.
> >
> > This query is about OCR. How do I set the ocr program & options in
> > xsane? I would like to be able to choose tesseract, or ABBYY and pass
> > options. I think tesseract has a 'stdout' option, which allows you to
> > junk the original file. In commercial work, 500G disks were being
> > swapped around regularly as they filled up and were queued for OCR.
> >
> > I did a test of GPL linux tools a few years back, and *tesseract* came
> > out best, with a new OCR engine in Beta. I was able to scan & then edit
> > one of my father's plays which had been typewritten for him by a novice
> > in the 1960s. He then corrected it by hand. Having done work for a firm
> > here 10 years back, I knew that *ABBYY* was probably the best
> > (commercial) package, then only available in M$Windoze.  ABBYY now have
> > a (commercial) linux package, with a one month free trial :-D.
>
> I can't help you with xsane, but I can suggest another scanning tool
> that supports OCR, in particular tesseract (but I am biased, because I
> am the author):
>
> gscan2pdf
>
> Regards
>
> Jeff
>
> --
> sane-devel mailing list: sane-devel at alioth-lists.debian.net
> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/sane-devel
> Unsubscribe: Send mail with subject "unsubscribe your_password"
> to sane-devel-request at lists.alioth.debian.org