• thevoidzero@lemmy.world
    3 months ago

    For the OCR, have you tried Tesseract? It takes an image as input and can generate a PDF with selectable text, and it works well on printed documents. I don't OCR much, but it was useful the few times I tried it.
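    Here is a minimal Python sketch of the single-image case. It assumes the `tesseract` binary and its English language data are installed and on `PATH`. The helper names `tesseract_pdf_cmd` and `ocr_image_to_pdf` are hypothetical. Tesseract takes an output base name without an extension, and the trailing `pdf` argument selects its PDF output config.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    # tesseract adds the extension itself, so out_base is given without ".pdf";
    # the trailing "pdf" argument selects tesseract's PDF output config.
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_image_to_pdf(image: Path, out_base: Path, lang: str = "eng") -> Path:
    # Fail early with a clear message if tesseract is not installed.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(tesseract_pdf_cmd(image, out_base, lang), check=True)
    # Path of the searchable PDF tesseract just wrote.
    return out_base.with_name(out_base.name + ".pdf")
```

    For example, `ocr_image_to_pdf(Path("scan.png"), Path("scan_ocr"))` should write `scan_ocr.pdf`. This is the same as running `tesseract scan.png scan_ocr -l eng pdf` by hand.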

    You might be able to write a script that feeds the scanner output into tesseract and produces a PDF. As far as I could tell, it only handles a single image per run, so I had to write a script to run it on a whole PDF by splitting it into pages and stitching the results back together.
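    Below is a rough sketch of that split/OCR/stitch script. It assumes poppler-utils (`pdftoppm` to split, `pdfunite` to merge) and `tesseract` are installed. Those tool choices and all function names here are assumptions for illustration, not necessarily what I actually used. Each page is rendered to a PNG, OCR'd into its own single-page PDF, and the pages are then joined in order.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def split_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    # pdftoppm renders every page to <prefix>-<n>.png at the given DPI
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    # tesseract writes <out_base>.pdf: the page image plus an invisible text layer
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def merge_cmd(pages: list[Path], out_pdf: Path) -> list[str]:
    # pdfunite concatenates the input PDFs in the order given
    return ["pdfunite", *(str(p) for p in pages), str(out_pdf)]


def ocr_whole_pdf(pdf: Path, out_pdf: Path, lang: str = "eng", dpi: int = 300) -> None:
    # Check all external tools up front instead of failing halfway through.
    for tool in ("pdftoppm", "tesseract", "pdfunite"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not on PATH")

    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)

        # 1. Split: render each page of the input PDF to a PNG.
        subprocess.run(split_cmd(pdf, work / "page", dpi), check=True)
        # pdftoppm pads page numbers to a fixed width, so sorting by name keeps page order.
        images = sorted(work.glob("page-*.png"))
        if not images:
            raise RuntimeError(f"no pages rendered from {pdf}")

        # 2. OCR: one searchable single-page PDF per image.
        page_pdfs = []
        for image in images:
            base = image.with_suffix("")  # e.g. page-01.png -> page-01
            subprocess.run(ocr_cmd(image, base, lang), check=True)
            page_pdfs.append(base.with_name(base.name + ".pdf"))

        # 3. Stitch: join the per-page PDFs back into one file.
        subprocess.run(merge_cmd(page_pdfs, out_pdf), check=True)
```

    Usage would be something like `ocr_whole_pdf(Path("scan.pdf"), Path("scan_ocr.pdf"))`. 300 DPI is a commonly recommended resolution for OCR. Because every page goes through an image, the output may be larger than the input PDF.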