As of Recoll 1.43.3, the alternate Python rclimg.py and rclimgp.py handlers can execute OCR on image files.
The default image handler is the Perl/exiftool-based rclimg script, which has not been extended to run OCR but is considerably better at extracting metadata tags.
The difference between the two alternate handlers is that rclimg.py uses the Python exiv2 interface and rclimgp.py actually executes the Perl handler for extracting tags. On Windows only the latter is supported.
To perform image OCR, you need to tell Recoll to use an alternate handler and
also to enable OCR by setting the imgocr variable. This is done by
editing two text files in the index configuration directory (by default
~/.recoll under Unix-like systems or macOS, or
C:/users/[you]/AppData/Local/Recoll under Windows).
Example configuration for Unix-like systems or macOS:
In mimeconf (e.g. ~/.recoll/mimeconf):
    [index]
    image/gif = execm rclimg.py
    image/jp2 = execm rclimg.py
    image/jpeg = execm rclimg.py
    image/png = execm rclimg.py
    image/tiff = execm rclimg.py
    image/x-nikon-nef = execm rclimg.py
    image/x-xcf = execm rclimg.py
You can also list only a subset of the image types, and the rclimgp.py handler can be used instead of rclimg.py.
On Windows you must use rclimgp.py (because there is no Recoll support for the exiv2 library):
    [index]
    image/gif = execm rclimgp.py
    image/jp2 = execm rclimgp.py
    image/jpeg = execm rclimgp.py
    image/png = execm rclimgp.py
    image/tiff = execm rclimgp.py
    image/x-nikon-nef = execm rclimgp.py
    image/x-xcf = execm rclimgp.py
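Since the [index] entries differ only in the handler name and in the set of image types, they can be generated rather than typed by hand. The following is a minimal sketch, not part of Recoll: the function name index_section and the IMAGE_TYPES list are hypothetical helpers introduced here for illustration, and the output still has to be pasted into mimeconf manually.

```python
# Hypothetical helper (not shipped with Recoll): builds the text of an
# [index] section for mimeconf from a handler name and a list of MIME types.

# The image types used in the examples above.
IMAGE_TYPES = [
    "image/gif",
    "image/jp2",
    "image/jpeg",
    "image/png",
    "image/tiff",
    "image/x-nikon-nef",
    "image/x-xcf",
]


def index_section(handler, mime_types=IMAGE_TYPES):
    """Return an [index] section mapping each MIME type to 'execm <handler>'."""
    # Only the two alternate handlers named in this section are accepted.
    if handler not in ("rclimg.py", "rclimgp.py"):
        raise ValueError(f"unexpected handler: {handler}")
    lines = ["[index]"]
    lines += [f"{mime_type} = execm {handler}" for mime_type in mime_types]
    return "\n".join(lines) + "\n"


# Example: a subset of types handled by rclimgp.py (the Windows-compatible choice).
print(index_section("rclimgp.py", ["image/jpeg", "image/png"]), end="")
```

Restricting the handler argument to the two names above is a deliberate choice for this sketch, so that a typo does not silently produce an entry pointing at a nonexistent script.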
In recoll.conf, for example on Windows:
    ocrprogs = tesseract
    tesseractcmd = C:/Program Files(x86)/Tesseract-OCR/tesseract.exe
    tesseractlang = eng
    [c:/path/to/my/images/folder]
    imgocr = 1
The tesseractcmd setting is usually not needed on other platforms (the
command will be in the PATH).
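To find out whether tesseractcmd is likely to be needed on a given machine, you can check whether a tesseract executable is reachable through the PATH. The sketch below uses Python's standard shutil.which for this; the function name tesseractcmd_needed is a hypothetical helper, and the check is only an approximation, since the environment seen by the Recoll indexer may differ from that of your shell.

```python
import shutil


def tesseractcmd_needed(which=shutil.which):
    """Return True when no 'tesseract' executable is found on the PATH.

    The 'which' parameter exists only so the lookup can be replaced
    in tests; by default the standard shutil.which is used.
    """
    return which("tesseract") is None


if tesseractcmd_needed():
    print("tesseract not found on PATH: consider setting tesseractcmd in recoll.conf")
else:
    print("tesseract found at", shutil.which("tesseract"))
```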

