Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[To keep] -- [build] tf upgrade by keeping keras v2 #1542

Merged
merged 18 commits into from
Oct 7, 2024

Conversation

felixdittrich92
Copy link
Contributor

@felixdittrich92 felixdittrich92 commented Apr 10, 2024

This PR:

  • upgrade TF by staying at keras v2

@odulcy-mindee This would work if we want to upgrade tf by keeping keras at version 2 without the need to change the current model saving/loading / refactoring everything (would be only a temp fix)

I'm super unhappy with this PR but i don't see an alternative atm 😓

@felixdittrich92 felixdittrich92 added topic: build Related to dependencies and build topic: ci Related to CI framework: tensorflow Related to TensorFlow backend labels Apr 15, 2024
Copy link
Collaborator

@odulcy-mindee odulcy-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you set TF_USE_LEGACY_KERAS=1 everywhere as it is the default behavior of your PR ?

pyproject.toml Outdated Show resolved Hide resolved
@felixdittrich92
Copy link
Contributor Author

felixdittrich92 commented Apr 18, 2024

Why did you set TF_USE_LEGACY_KERAS=1 everywhere as it is the default behavior of your PR ?

The behavior of this env variable is super strange:

By installing TF>= 2.16 it installs automatically keras>=3.0.0 so tf.keras is additonal installed (to keep keras v2 backwards compatibility with tf >= 2.16)

I'm not sure why it works for everything ..without the latency scripts there it jumps all the time to keras 3 (also on the CI job) without setting TF_USE_LEGACY_KERAS=1 in front of your script - which leads to errors

@felixdittrich92 felixdittrich92 added this to the 0.9.0 milestone Apr 24, 2024
@felixdittrich92 felixdittrich92 self-assigned this Apr 24, 2024
@felixdittrich92
Copy link
Contributor Author

@odulcy-mindee Now it works

  • Added as function instead of os.environ .. in every script which makes it easier to remove later on if we have a solution with keras v3

@felixdittrich92 felixdittrich92 changed the title [DRAFT] tf upgrade by keeping keras v2 [build] tf upgrade by keeping keras v2 Apr 24, 2024
@felixdittrich92 felixdittrich92 marked this pull request as ready for review April 24, 2024 09:55
Copy link

codecov bot commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 94.28571% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.46%. Comparing base (dccc26b) to head (9940348).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
doctr/file_utils.py 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1542      +/-   ##
==========================================
+ Coverage   96.39%   96.46%   +0.07%     
==========================================
  Files         164      164              
  Lines        7823     7869      +46     
==========================================
+ Hits         7541     7591      +50     
+ Misses        282      278       -4     
Flag Coverage Δ
unittests 96.46% <94.28%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why no more tf.function ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before it was disabled globally so no tf.function has had any effect
the functions where the tf.function annotation is removed makes use of numpy which is not compatible with the compilation (which is triggered for the tf.function annotated functions)

@felixdittrich92 felixdittrich92 marked this pull request as draft April 26, 2024 16:18
@felixdittrich92
Copy link
Contributor Author

On hold

@felixdittrich92 felixdittrich92 changed the title [build] tf upgrade by keeping keras v2 [To keep] -- [build] tf upgrade by keeping keras v2 May 15, 2024
@felixdittrich92 felixdittrich92 modified the milestones: 0.9.0, 1.0.0, 0.10.0 Jun 6, 2024
@felixdittrich92
Copy link
Contributor Author

@odulcy-mindee Our fallback if we don't have a working version with keras v3 end of the year :)

@felixdittrich92
Copy link
Contributor Author

@odulcy-mindee After digging (again) a bit deeper into the keras v3 upgrade:

  • Mostly every model would require a re-training (sar_resnet31 complete broken - keras v3 LSTMCell bug)
  • Regression bug for LSTM's which makes it not possible to load from the keras v2 saved weights
  • Most of our models have to run in eager mode but keras v3 ignores tf.config.run_functions_eagerly(True) completly on GPU 😅
  • By default keras v3 uses jit compilation on GPU anyway disabling this takes no effect (on CPU all models expect sar_resnet31 can be trained and loaded afterwards without trouble - on GPU only a few models because jit raises issues)

So i would suggest to stick with the solution of the PR ftm. wdyt ?

)


def ensure_keras_v2() -> None: # pragma: no cover
Copy link
Contributor Author

@felixdittrich92 felixdittrich92 Oct 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@odulcy-mindee
The breaking point here is that doctr needs to be imported before tensorflow:

two options:

  • we keep the ensure_keras_v2 to make it explicit
  • we import doctr (in the training / eval scripts) first and ignore ruff's (isort)

What would you prefer ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep ensure_keras_v2 function, because option 2 will be a hell to maintain in my opinion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes i agree :D
@odulcy-mindee btw could you check your slack ? ^^

@odulcy-mindee
Copy link
Collaborator

@felixdittrich92

  1. When you write "keras v3 LSTMCell bug", do you mean is it a known bug or does it have a different behavior than before ?
  2. About the jit error, is it related to what has been written here ? Keras 2 <> Keras 3 incompatibilities keras-team/keras#18467

Keras 3 has jit_compile=True by default -- this might not work with all TF ops, so with some custom models/layers you might have set jit_compile=False if you see an XLA related error.

If you don't have any solution, let's keep the PR likes that

@felixdittrich92
Copy link
Contributor Author

@felixdittrich92

  1. When you write "keras v3 LSTMCell bug", do you mean is it a known bug or does it have a different behavior than before ?
  2. About the jit error, is it related to what has been written here ? Keras 2 <> Keras 3 incompatibilities keras-team/keras#18467

Keras 3 has jit_compile=True by default -- this might not work with all TF ops, so with some custom models/layers you might have set jit_compile=False if you see an XLA related error.

If you don't have any solution, let's keep the PR likes that

1.It's a bug but i need to open a ticket for this (not done yet) get_initial_state is available and takes the same args but if you pass anything it raises an error
2. Correct :) but disabling on the keras.Model with self.jit_compile = False does exactly nothing 😓

@felixdittrich92 felixdittrich92 marked this pull request as ready for review October 4, 2024 17:10
@felixdittrich92
Copy link
Contributor Author

@odulcy-mindee
Then i would say ready to review 😅

@felixdittrich92
Copy link
Contributor Author

felixdittrich92 commented Oct 4, 2024

We will move to keras v3 at any time but it's really hard and i spent already a lot (to much 😅) time on it
But i will come up with some PR's which prepares further for v3 but will work also with v2

@felixdittrich92
Copy link
Contributor Author

Closes: #1673

Copy link
Collaborator

@odulcy-mindee odulcy-mindee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's keep it like that, thank you !

@felixdittrich92 felixdittrich92 merged commit 90c3fff into mindee:main Oct 7, 2024
80 of 81 checks passed
@felixdittrich92 felixdittrich92 deleted the tf-update-keras-v2 branch October 7, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
framework: tensorflow Related to TensorFlow backend topic: build Related to dependencies and build topic: ci Related to CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants