[references][tf/pt] Add early stopping callback #924

Closed
felixdittrich92 opened this issue May 20, 2022 · 5 comments · Fixed by #1397
Labels
ext: references (Related to references folder)
framework: pytorch (Related to PyTorch backend)
framework: tensorflow (Related to TensorFlow backend)
topic: character classification (Related to the task of character classification)
topic: text detection (Related to the task of text detection)
topic: text recognition (Related to the task of text recognition)
type: enhancement (Improvement)

Comments

@felixdittrich92
Contributor

🚀 The feature

Add an EarlyStopping callback to the references for both frameworks (TensorFlow and PyTorch)

Motivation, pitch

To make it easier for users to choose some hyperparameters, I would suggest adding an EarlyStopping callback. This would also be useful when training a model remotely (for example on AWS).

Alternatives

No response

Additional context

No response

felixdittrich92 added the labels type: enhancement, ext: references, framework: pytorch, framework: tensorflow, topic: text detection, topic: text recognition and topic: character classification on May 20, 2022
felixdittrich92 added this to the 1.0.0 milestone on May 20, 2022
@SkaarFacee
Contributor

This is something I would like to work on. Could I get more details about it? I haven't quite understood the AWS part.

@felixT2K
Contributor

felixT2K commented Nov 1, 2023

Sure :)

AWS was just one example.
The main idea is to stop training early if there has been no improvement (for example in the validation loss) for n epochs.
With this, users could, for example, automatically shut down an AWS instance as soon as training has finished, or get notified in some other way.
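To make "no improvement for n epochs" concrete, here is a minimal sketch that replays a list of per-epoch validation losses and reports the epoch at which training would stop. The function name `stop_epoch` and the loss values are illustrative assumptions, not part of the references scripts.

```python
from typing import List, Optional


def stop_epoch(val_losses: List[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it runs to the end.

    Training stops once `patience` consecutive epochs fail to beat the best loss seen so far.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best loss: remember it and reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical loss history: best value at epoch 3, then three epochs without improvement
print(stop_epoch([0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60], patience=3))
```

With `patience=3` this prints `6`. The improvement at epoch 7 is never seen, which is the usual trade-off when picking the patience: too small and training stops on a temporary plateau, too large and compute is wasted.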

Each training script in references should support early stopping
(every inner folder contains a utils.py file, so that would be the place to add the function / class).

EarlyStopping Reference: https://github.com/Bjarten/early-stopping-pytorch/blob/master/pytorchtools.py

Requirements:
The implementation should not depend on TensorFlow or PyTorch, because we want to use the same implementation in every training script (TF and PT).
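A minimal, framework-agnostic sketch of such a class, assuming it only ever receives the validation loss as a plain Python float. The class name `EarlyStopping` and the `patience` / `min_delta` parameters are hypothetical; the `update(val_loss)` method returning `True` when training should stop mirrors the `early_stopping.update(val_loss)` call in the example loop. Unlike the linked PyTorch reference, it does not save a checkpoint (which would tie it to one framework); the training scripts already save the best weights themselves.

```python
class EarlyStopping:
    """Framework-agnostic early stopping: works on plain floats, imports neither TF nor PT.

    Hypothetical parameters:
        patience: number of epochs to wait without improvement before stopping
        min_delta: minimum decrease of the loss that counts as an improvement
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        if patience < 1:
            raise ValueError("patience must be >= 1")
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.counter = 0

    def update(self, val_loss: float) -> bool:
        """Record one validation loss and return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Significant improvement: remember the new best loss and reset the counter
            self.best_loss = val_loss
            self.counter = 0
        else:
            # No (significant) improvement this epoch
            self.counter += 1
        return self.counter >= self.patience


# Usage with made-up losses: epochs 3 and 4 improve by less than min_delta
early_stopping = EarlyStopping(patience=2, min_delta=0.01)
for epoch, val_loss in enumerate([0.50, 0.45, 0.449, 0.448], start=1):
    if early_stopping.update(val_loss):
        print(f"Stopping early at epoch {epoch}")
        break
```

This prints `Stopping early at epoch 4`. Because the class only compares floats, the TF and PT scripts could share it by converting their loss to a float before calling `update`.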

Example:

# Excerpt from a PyTorch training script in references: `mb`, `model`, `min_loss` (initialised to infinity),
# `exp_name`, `args` and `early_stopping` are defined earlier in the script
for epoch in mb:
    fit_one_epoch(model, train_loader, batch_transforms, optimizer, scheduler, mb, amp=args.amp)
    # Validation loop at the end of each epoch
    val_loss, exact_match, partial_match = evaluate(model, val_loader, batch_transforms, val_metric, amp=args.amp)
    if val_loss < min_loss:
        print(f"Validation loss decreased {min_loss:.6} --> {val_loss:.6}: saving state...")
        torch.save(model.state_dict(), f"./{exp_name}.pt")
        min_loss = val_loss
    mb.write(
        f"Epoch {epoch + 1}/{args.epochs} - Validation loss: {val_loss:.6} "
        f"(Exact: {exact_match:.2%} | Partial: {partial_match:.2%})"
    )
    # W&B
    if args.wb:
        wandb.log(
            {
                "val_loss": val_loss,
                "exact_match": exact_match,
                "partial_match": partial_match,
            }
        )
    # <---------------- This: stop training once early_stopping.update() reports too many epochs without improvement
    if args.early_stop and early_stopping.update(val_loss):
        break
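The loop reads `args.early_stop`, so each script's argument parser needs a matching flag. A minimal sketch with the standard `argparse` module; only `--early-stop` is implied by the loop above, while `--early-stop-epochs` (patience) and `--early-stop-delta` (minimum improvement) and their defaults are hypothetical.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Early-stopping options (sketch)")
    # `--early-stop` becomes `args.early_stop`, as read in the training loop
    parser.add_argument("--early-stop", action="store_true", help="Enable early stopping")
    # Hypothetical: number of epochs without improvement before stopping
    parser.add_argument("--early-stop-epochs", type=int, default=5, help="Patience in epochs")
    # Hypothetical: minimum decrease of the validation loss that counts as an improvement
    parser.add_argument("--early-stop-delta", type=float, default=0.01, help="Minimum improvement")
    return parser.parse_args(argv)


args = parse_args(["--early-stop", "--early-stop-epochs", "3"])
print(args.early_stop, args.early_stop_epochs, args.early_stop_delta)
```

This prints `True 3 0.01` (argparse turns the dashes into underscores). The two numeric values would then be passed to whatever object backs `early_stopping` in the loop; its constructor is not specified in this thread.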

@felixdittrich92
Contributor Author

@SkaarFacee any updates? :)

@SkaarFacee
Contributor

I have made some progress, but had to stop midway due to work commitments. I hope to resolve the issue over the coming weekend.

@felixdittrich92
Contributor Author

Sounds good ☺️

felixdittrich92 linked a pull request on Dec 4, 2023 that will close this issue
felixdittrich92 removed this from the 1.0.0 milestone on Oct 10, 2024

3 participants