GitHub - drAbreu/soda-seq2seq: Generating bio NER and ER at scale using text generation

.env file for this repo:

    SEQ2SEQ_MODEL_PATH="./seq2seq_models"
    CACHE="./cache"
    RUNS_DIR="./runs"
    RUNTIME=""

The way to install this repo is as with any other venv repo.

    git clone https://github.com/drAbreu/soda-seq2seq.git
    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt

Training the model from a model in the 🤗 Hub. At the moment only the BART and T5 models are supported

  python -m cli.seq2seq_train ./data/sd-seq2seq-clean.csv \
  --from_pretrained t5-base \
  --task "Causal hypothesis: " \
  --delimiter "###tt9HHSlkWoUM###" \
  --skip_lines 0 \
  --eval_steps 500 \
  --logging_steps 50 \
  --num_train_epochs 10 \
  --learning_rate 7.5e-5 \
  --lr_scheduler_type 'cosine' \
  --warmup_steps 500

Training the model beginning from a locally stored checkpoint.

  python -m cli.seq2seq_train ./data/sd-seq2seq-clean.csv \
  --from_local_checkpoint ./seq_2seq_models/checkpoint-5000 \
  --base_model facebook/bart-base \
  --task "Causal hypothesis: " \
  --delimiter "###tt9HHSlkWoUM###" \
  --skip_lines 0 \
  --eval_steps 500 \
  --logging_steps 50 \
  --num_train_epochs 12 \
  --lr_scheduler_type 'cosine' \
  --warmup_steps 500 \
  --do_train \
  --do_eval

Testing the results of a model

  python -m cli.seq2seq_train ./data/sd-seq2seq-clean.csv \
  --from_local_checkpoint ./seq2seq_models/best-bart-base \
  --base_model facebook/bart-base \
  --task "Causal hypothesis: " \
  --delimiter "###tt9HHSlkWoUM###" \
  --do_predict
  
    python -m cli.seq2seq_train ./data/sd-seq2seq-clean.csv \
  --from_local_checkpoint ./seq2seq_models/best-t5-base \
  --base_model t5-base \
  --task "Causal hypothesis: " \
  --delimiter "###tt9HHSlkWoUM###" \
  --do_predict \
  --top_k 25 \
  --top_p 0.9
  
    python -m cli.seq2seq_train ./data/sd-seq2seq-clean.csv \
  --from_local_checkpoint ./seq2seq_models/best_bert_large \
  --base_model facebook/bart-large \
  --task "Causal hypothesis: " \
  --delimiter "###tt9HHSlkWoUM###" \
  --do_predict

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
cli		cli
seq2seq		seq2seq
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
blurb_benchmark.py		blurb_benchmark.py
callbacks.py		callbacks.py
data_classes.py		data_classes.py
data_collator.py		data_collator.py
data_utils.py		data_utils.py
main.py		main.py
metrics.py		metrics.py
requirements.txt		requirements.txt
setup_logger.py		setup_logger.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

drAbreu/soda-seq2seq

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages