Installation Instructions

Docker Quickstart

To get the tool up and running in a Docker container:

git clone https://github.com/UB-Mannheim/ocr-gt-tools
cd ocr-gt-tools
./dev/run-docker.sh <path-to-images> <path-to-corrections>

The first time you run this, it will download the Docker image and run an Apache server in the container with all the configuration taken care of.

Navigate to http://localhost:8888/ocr-gt to use it.

Install dependencies

Install Debian packages (for other distros, YMMV).

make apt-get

(See dev/debian.mk)

Install current Git revisions of hocr-tools and ocropus:

make vendor

Create configuration

Copy the default configuration and shorten/edit as needed:

cp dist/ocr-gt-tools.default.yml dist/ocr-gt-tools.yml
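A plain cp overwrites an already edited configuration when the step is repeated. The following is a minimal sketch of a guarded copy. The helper name init_config is hypothetical and not part of ocr-gt-tools; only the two file paths come from the command above.

```shell
# Copy the default configuration only if no local configuration exists yet.
# init_config is a hypothetical helper name, not part of ocr-gt-tools.
init_config() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "keeping existing $dst"
    else
        cp "$src" "$dst" && echo "created $dst"
    fi
}

# Usage from the repository root:
#   init_config dist/ocr-gt-tools.default.yml dist/ocr-gt-tools.yml
```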

Deploy on a server

On Apache

  • Enable CGI on Apache
sudo a2enmod cgi
  • Deploy to Apache document folder:
make deploy

(See dev/apache.mk)

This will recreate out-of-date files in ./dist, create a folder $APACHE_BASEURL in $APACHE_DIR and copy all the files from ./dist to $APACHE_DIR/$APACHE_BASEURL, using sudo as user $APACHE_USER.

Deployment can be customized with four variables. The defaults are:

make APACHE_USER=www-data APACHE_GROUP=www-data APACHE_DIR=/var/www/html APACHE_BASEURL=ocr-gt deploy
  • Make sure Apache executes scripts ending in .cgi in the $APACHE_DIR/$APACHE_BASEURL folder by adding a Directory block to the site configuration:
sudo $EDITOR /etc/apache2/sites-available/000-default.conf
    <Directory "/var/www/html/ocr-gt">
        Options +ExecCGI
        AddHandler cgi-script .cgi
    </Directory>
  • Copy the default configuration to the deployment folder:
sudo -u www-data cp dist/ocr-gt-tools.default.yml $APACHE_DIR/$APACHE_BASEURL/ocr-gt-tools.yml
# "sudo $EDITOR $APACHE_DIR/$APACHE_BASEURL/ocr-gt-tools.yml" as needed
  • Restart Apache
sudo systemctl restart apache2

The web application will be available under http://localhost/ocr-gt.
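The Apache steps above enable CGI handling, but the deployed .cgi files also need the execute bit set on disk. As a rough sketch, the deployment folder can be resolved from the same variables that make deploy accepts and scanned for scripts lacking that bit. Both helper names are hypothetical; the defaults are the ones documented above.

```shell
# Resolve the deployment folder from the same variables that `make deploy`
# accepts, falling back to the defaults documented above.
deploy_target() {
    echo "${APACHE_DIR:-/var/www/html}/${APACHE_BASEURL:-ocr-gt}"
}

# Print every .cgi file under the given folder whose owner execute bit
# (octal 100) is not set. non_executable_cgi is a hypothetical helper name.
non_executable_cgi() {
    find "$1" -type f -name '*.cgi' ! -perm -100
}

# Usage after `make deploy`:
#   non_executable_cgi "$(deploy_target)"
# No output means every CGI script carries the owner execute bit.
```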

Docker

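docker run -t -p 9090:9090 kbai/ocr-gt-tools

The server is then available on port 9090 of the host. The 9090:9090 mapping assumes the container also listens on port 9090; adjust it if the image uses a different port.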

Bundled standalone server

For development and quick experimentation, we ship a standalone server, wrapping the CGI in a Plack app:

make dev-server

Testing the server

Navigate to http://localhost:9090/index.html.

Drop a file, such as this thumbnail, onto the document.

Do some transliterating and commenting.

Click "Speichern" (German for "Save").

Check out the contents of ./example/ocr-corrections/.
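To confirm that saving actually wrote something, it can help to list only the files that changed recently. This is a small sketch; the helper name recent_corrections is hypothetical, and it makes no assumption about the names or format of the saved files.

```shell
# List files under the corrections folder changed within the last N minutes
# (default 10). recent_corrections is a hypothetical helper name.
recent_corrections() {
    dir=${1:-./example/ocr-corrections}
    minutes=${2:-10}
    find "$dir" -type f -mmin "-$minutes" | sort
}

# Usage right after clicking "Speichern":
#   recent_corrections ./example/ocr-corrections 5
```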

Developing the frontend

Install the development dependencies, that is, the npm package (which pulls in nodejs) and some nodejs-based tools:

make dev-apt-get

Then run npm to bootstrap the tools for building HTML from Pug, CSS from LESS etc. and to install the frontend assets:

npm install

After changing CSS/JavaScript, make sure to regenerate the dist folder:

make dist

This will

  • download web fonts to ./dist/fonts/ and generate a matching CSS file in ./dist/css/
  • copy all CSS stylesheets to ./dist/css/ and minify them to ./dist/style.css
  • copy all JS scripts to ./dist/js/ and minify them, in the right order, to ./dist/script.js with source map
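The outputs listed above can be checked after a build with a short script. This is only a sketch: check_dist is a hypothetical helper, and it tests just the paths named in the list (the source map is left out because its file name is not given here).

```shell
# Report which of the build outputs named above are missing from a dist folder.
# check_dist is a hypothetical helper; it prints nothing and returns 0 when
# every expected path exists, and returns 1 otherwise.
check_dist() {
    dist=${1:-./dist}
    missing=0
    for path in fonts css js style.css script.js; do
        if [ ! -e "$dist/$path" ]; then
            echo "missing: $dist/$path"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   make dist && check_dist ./dist
```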

Perl

For local tests on Windows I use Strawberry Perl.

The scripts use the following Perl modules, which you can install from CPAN.

  • CGI
  • CGI::Carp
  • JSON
  • Config::IniFiles

Log files / error log files

Information from the Perl script ocr-gt-tools.cgi is stored in log/ocr-gt-tools.log.

The debug log can be written to stderr or to log/ocr-gt-tools.log; the default is stderr. If you want the debug log in log/ocr-gt-tools.log, edit ocr-gt-tools.yml and set logging:stderr to false.