[sane-devel] Announcing new GUI frontend Sanescan

Povilas Kanapickas povilas at radix.lt
Mon Jul 18 00:35:08 BST 2022


Sanescan is yet another GUI scanning and OCR frontend that uses SANE as
its backend.

The reason for its existence is that none of the current open-source
scanning and OCR applications actually do OCR well enough that text
selection works reliably when viewing the produced PDF document.
Many of the tools work well with simple cases like single-column book
pages, but something like a multi-column newspaper page or an invoice
with tables often results in multi-line selection taking characters from
all over the page.

The work on the Sanescan project has already resulted in improvements to
the Tesseract OCR engine itself
(https://github.com/tesseract-ocr/tesseract/pull/3787), which
demonstrates that Sanescan can be more than another frontend for
Tesseract OCR and can actually improve the state of the art of the
open-source scanning and OCR experience.

Currently Sanescan is of beta quality. The GUI application lacks polish,
so it is not yet recommended for end users. This post is only meant to
raise awareness among developers. However, the application already
produces PDF documents that are in certain areas significantly better
than all open-source alternatives, so the potential is there.

The code currently lives at my personal GitHub
https://github.com/p12tic/sanescan, but the plan is to move the code
somewhere under the SANE project (e.g. under the "frontend" group in
SANE GitLab https://gitlab.com/sane-project/frontend).

The short-term plan is to focus on usability and full feature parity
with all other open-source OCR applications and then do a proper 1.0
release. The long-term plan is to extract the full OCR processing
pipeline and provide it as a library to third-party applications.
Another long-term goal is to introduce additional features (such as film
scanning) that would make Sanescan the primary open-source choice for
all things related to scanning. The last goal in particular is still
many months away.

This project has received a grant from the NLnet foundation and their
NGI0 Discovery fund to improve open-source scanning and OCR capabilities.
Huge thanks to them!

Regards,
Povilas Kanapickas


