Use different Japanese trained-data files for Tesseract
They seem to work better as suggested here:
tesseract-ocr/tessdata#119

Refs: #973
eikek committed Aug 13, 2021
1 parent f79aa44 commit 326cf1c
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions docker/dockerfiles/joex.dockerfile
@@ -63,6 +63,12 @@ RUN wget ${joex_url:-https://github.com/eikek/docspell/releases/download/v$versi
  rm docspell-joex-*.zip && \
  ln -snf docspell-joex-* docspell-joex

# Using these data files for japanese, because they work better. See #973
RUN \
  wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/jpn_vert.traineddata && \
  wget https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/jpn.traineddata && \
  mv jpn*.traineddata /usr/share/tessdata

COPY joex-entrypoint.sh /opt/joex-entrypoint.sh

ENTRYPOINT ["/opt/joex-entrypoint.sh", "-J-XX:+UseG1GC"]
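
The added RUN step downloads two files and moves them into /usr/share/tessdata, but nothing in the hunk checks that they arrived. A minimal sketch of such a check follows. The helper names `tessdata_url` and `missing_traineddata` are hypothetical and not part of docspell; the URL pattern and target directory are taken from the diff above.

```shell
#!/bin/sh
# Hypothetical helpers for checking the Japanese traineddata files.
# Not part of the docspell repository; names are illustrative only.

# Print the tessdata_fast download URL for a language code,
# following the URL pattern used in the Dockerfile change.
tessdata_url() {
  printf 'https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/%s.traineddata\n' "$1"
}

# Print every language code whose <code>.traineddata file is
# missing or empty in the given directory (one code per line).
missing_traineddata() {
  dir=$1
  shift
  for lang in "$@"; do
    [ -s "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
  done
}

# Example use inside the image (commented out so that sourcing
# this file has no side effects):
#   missing_traineddata /usr/share/tessdata jpn jpn_vert
#   tesseract --list-langs
```

If `missing_traineddata` prints nothing, both files are present and non-empty. Running `tesseract --list-langs` afterwards should then list `jpn` and `jpn_vert`, assuming Tesseract reads its data from /usr/share/tessdata in this image.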