extractorAPI is a tool for running object-extraction models on a GPU. It retrieves images from an IIIF manifest, or from a list of manifests, and uses a vision model to extract objects from the images, returning annotations as text files.
- Sudo privileges
- Git
- Python 3.10
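Before installing, you can sanity-check that the required tools are on PATH. The `check` helper below is a small sketch (not part of the repo); the command names match the install steps that follow:

```shell
# Check whether a command is available on PATH.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

# Commands the setup below relies on.
for cmd in git python3.10 redis-server sqlite3 openssl; do
  check "$cmd"
done
```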
sudo apt-get install redis-server python3-venv python3-dev
git clone https://github.com/jnorindr/extractorAPI
cd extractorAPI
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Set your app and database variables:
DB_NAME="<db-name>"
APP_NAME="<front-app-name-that-will-access-api>"
APP_KEY="$(openssl rand -base64 32 | tr -d '/\n')"
Create a SQLite database at the root of the API repository to store API keys:
sqlite3 $DB_NAME.db <<EOF
CREATE TABLE apps (
id INTEGER PRIMARY KEY AUTOINCREMENT,
app_name CHAR(50) NOT NULL,
app_key CHAR(80) NOT NULL
);
EOF
Add the app name and key to the database:
sqlite3 $DB_NAME.db <<EOF
INSERT INTO apps (app_name, app_key) VALUES ('$APP_NAME', '$APP_KEY');
EOF
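Several client apps can share the same key table. A small hypothetical helper wrapping the INSERT above (assumes `$DB_NAME` is set as before):

```shell
# Hypothetical helper: register a client app and its key in the apps table.
add_app() {
  sqlite3 "$DB_NAME.db" \
    "INSERT INTO apps (app_name, app_key) VALUES ('$1', '$2');"
}

# Example: add_app "my-front-app" "$(openssl rand -base64 32 | tr -d '/\n')"
```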
Show the content of the apps table:
sqlite3 -header -column $DB_NAME.db <<EOF
SELECT * FROM apps;
EOF
Copy the content of the template file:
cp .env{.template,}
Change the content according to your Celery backend, client app, and API key database:
CELERY_BROKER_URL="redis://localhost:<redis-port>" # default port: 6379
API_PORT=<api-port> # default port: 5000
DEBUG=True # False for production
CLIENT_APP_URL="<url-of-front-app-connected-to-API>"
DB_NAME="<db-name-without-extension>"
If you want to use the API for training and track experiments with Comet, add to your .env file:
COMET_API_KEY=<comet-API-key>
COMET_PROJECT_NAME=<project-name>
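Once the .env is filled in, a quick way to catch missing values is to source it and check the required variables. `check_env` is a hypothetical helper sketch, not part of the repo:

```shell
# Hypothetical helper: source an env file and report unset required variables.
check_env() {
  envfile="$1"
  set -a
  . "$envfile"
  set +a
  for var in CELERY_BROKER_URL API_PORT CLIENT_APP_URL DB_NAME; do
    eval "val=\$$var"
    if [ -n "$val" ]; then
      echo "ok: $var"
    else
      echo "unset: $var"
    fi
  done
}

# Example: check_env .env
```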
⚠️ Be sure not to overwrite a previously defined Redis password
Get the path of the Redis config file:
REDIS_CONF=$(redis-cli INFO | grep config_file | awk -F: '{print $2}' | tr -d '[:space:]')
Generate a password
REDIS_PSW="$(openssl rand -base64 32 | tr -d '/\n')"
Update the redis configuration
sudo sed -i -e "s/^requirepass [^ ]*/requirepass $REDIS_PSW/" "$REDIS_CONF"
sudo sed -i -e "s/# requirepass [^ ]*/requirepass $REDIS_PSW/" "$REDIS_CONF"
Update the CELERY_BROKER_URL inside the .env file:
sed -i -e "s~^CELERY_BROKER_URL=.*~CELERY_BROKER_URL=\"redis://:$REDIS_PSW@localhost:6379\"~" .env
Restart Redis
sudo systemctl restart redis-server
Test the password
redis-cli -a "$REDIS_PSW"
If Redis is not already running, start it:
sudo systemctl start redis-server
Launch Celery
celery -A app.app.celery worker -B -c 1 --loglevel=info -P threads
Run the app
python run.py
Or run everything at once:
bash run.sh
# Choose app to use for request
APP_NAME="<your_app_name>"
# Load environment variables ($DB_NAME and $API_PORT)
source .env
# Get API_KEY
API_KEY=$(sqlite3 $DB_NAME.db <<EOF
SELECT app_key FROM apps WHERE app_name = '$APP_NAME';
EOF
)
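If the app name is not in the database, the query above returns an empty string; a hedged guard sketch (`require_key` is hypothetical, not part of the repo):

```shell
# Abort early when no key was found for the requested app name.
require_key() {
  if [ -n "$1" ]; then
    echo "key found"
  else
    echo "error: no API key found for this app name" >&2
    return 1
  fi
}

# Example: require_key "$API_KEY" || exit 1
```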
One manifest
curl -X POST -H "X-API-Key: $API_KEY" -F manifest_url='<url-manifest>' http://127.0.0.1:$API_PORT/run_detect
Manifest list in a text file
curl -X POST -H "X-API-Key: $API_KEY" -F url_file=@iiif/test-manifests.txt http://127.0.0.1:$API_PORT/detect_all
To use a model other than the default:
curl -X POST -H "X-API-Key: $API_KEY" -F model='<model-filename>' -F manifest_url='<url-manifest>' http://127.0.0.1:$API_PORT/run_detect
Get the list of available extraction model filenames:
curl http://127.0.0.1:$API_PORT/models
Compute similarity scores for pairs of documents (here: (doc1, doc1), (doc1, doc2), (doc1, doc3), (doc2, doc2), (doc2, doc3), (doc3, doc3)):
curl -X POST -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" -d '{
"documents": {
"doc1_id": "doc1_url",
"doc2_id": "doc2_url",
"doc3_id": "doc3_url"
}
}' http://127.0.0.1:$API_PORT/run_similarity
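The documents payload is just a JSON object mapping document ids to URLs. A small hypothetical helper to build it from id=url pairs (assumes ids and URLs contain no characters needing JSON escaping):

```shell
# Hypothetical helper: build the "documents" JSON payload from id=url pairs.
build_docs_json() {
  printf '{"documents": {'
  sep=""
  for pair in "$@"; do
    id=${pair%%=*}
    url=${pair#*=}
    printf '%s"%s": "%s"' "$sep" "$id" "$url"
    sep=", "
  done
  printf '}}'
}

# Example:
# curl -X POST -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" \
#   -d "$(build_docs_json doc1_id=doc1_url doc2_id=doc2_url)" \
#   http://127.0.0.1:$API_PORT/run_similarity
```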
Choose the backbone model for feature extraction (one of: resnet34, moco_v2_800ep_pretrain, dino_deitsmall16_pretrain, dino_vitbase8_pretrain):
curl -X POST -H "X-API-Key: $API_KEY" -H "Content-Type: application/json" -d '{
"documents": {"doc1_id": "doc1_url"},
"model": "dino_vitbase8_pretrain"
}' http://127.0.0.1:$API_PORT/run_similarity