PERMANENT DRAFT: TF grouped convolutions check #1183

Closed
wants to merge 8 commits into from

Conversation

felixT2K
Contributor

@felixT2K felixT2K commented Apr 25, 2023

This PR:

  • only a test PR at the moment
  • fix for TF mobilenet grouped convolutions issue
    tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Conv2DBackpropInput_device_/job:localhost/replica:0/task:0/device:CPU:0}} Gradients for grouped convolutions are not supported on CPU. Please file a feature request if you run into this issue. Computed input depth 576 doesn't match filter input depth 1 [Op:Conv2DBackpropInput]
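
The "filter input depth 1" in the error above follows from how a grouped convolution splits its input channels: each filter only sees `in_channels // groups` of them. A minimal sketch in plain Python (no TensorFlow needed) of that arithmetic. The helper name `grouped_kernel_shape` is hypothetical, the 3x3 kernel size is an assumption for illustration, and the `(kh, kw, in_channels // groups, out_channels)` layout is the Keras `Conv2D` kernel convention:

```python
def grouped_kernel_shape(kernel_size, in_channels, out_channels, groups):
    """Kernel shape of a grouped 2D convolution in the (kh, kw, in/groups, out) layout."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    kh, kw = kernel_size
    return (kh, kw, in_channels // groups, out_channels)


# Depthwise case from the traceback: 576 input channels, one group per channel.
# The 3x3 kernel size is assumed; only the channel arithmetic matters here.
shape = grouped_kernel_shape((3, 3), in_channels=576, out_channels=576, groups=576)
print(shape)  # (3, 3, 1, 576)
```

With `groups == in_channels` the third dimension collapses to 1, which is the depthwise case that the CPU gradient kernel rejects when it is expressed as a grouped `Conv2D`.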

The big disadvantage is that we would need to retrain every model that uses mobilenet as a backbone, as well as the classification models themselves.

The export for TF mobilenet from #1182 still works with the changes from this PR

python3 /home/felix/Desktop/doctr/references/classification/train_tensorflow.py mobilenet_v3_small --pretrained

WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-3.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-3.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-4.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-4.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-5.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-5.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-6.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-6.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-7.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-7.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-8.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-8.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-9.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-9.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-10.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-10.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-11.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-11.block.layer_with_weights-2.kernel
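
Each warning in the log above is printed twice, so the 22 "could not be found" lines cover 11 distinct kernels. A small sketch, using only the standard library, of how such a log could be reduced to the distinct unrestored values. The function name `unrestored_values` is hypothetical and the regular expression only targets the message format shown above:

```python
import re

# Matches the "Value in checkpoint could not be found" warnings shown above.
PATTERN = re.compile(
    r"Value in checkpoint could not be found in the restored object: (\S+)"
)


def unrestored_values(log_text):
    """Return the distinct checkpoint paths that were not restored, in first-seen order."""
    seen = {}  # dict keys preserve insertion order and drop duplicates
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


sample = """\
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
"""
for path in unrestored_values(sample):
    print(path)
```

The paths listed this way are the kernels whose saved values no longer have a matching variable in the rebuilt model.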

@frgfm What do you think: does it make sense to retrain everything in order to fix this issue?

And a related question for @olivmindee @charlesmindee:
Could you retrain the models? The rotation classification model in particular, and the detection/recognition models, depend on your datasets (compute would not be a problem on my side, but the data is 😅)

@felixdittrich92 felixdittrich92 added labels type: bug (Something isn't working), help wanted (Extra attention is needed), module: models (Related to doctr.models), framework: tensorflow (Related to TensorFlow backend), awaiting response (Waiting for feedback), topic: character classification (Related to the task of character classification) on Apr 25, 2023
@felixT2K
Contributor Author

felixT2K commented Apr 26, 2023

OK, SeparableConv2D and DepthwiseConv2D currently do not support different row/column stride values, which would break the rectangular pooling implementations 😩

An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Current implementation only supports equal length strides in the row and column dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
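
The two constraints quoted above can be written as a small pre-check. This is a sketch in plain Python, not Keras code: the helper name `depthwise_args_supported` is hypothetical, and the `(2, 1)` stride is only an assumed example of a rectangular (non-square) stride.

```python
def depthwise_args_supported(strides, dilation_rate=(1, 1)):
    """Check strides/dilation against the two constraints quoted from the docs."""
    # A single int is shorthand for the same value on both spatial axes.
    if isinstance(strides, int):
        strides = (strides, strides)
    if isinstance(dilation_rate, int):
        dilation_rate = (dilation_rate, dilation_rate)

    row_stride, col_stride = strides
    if row_stride != col_stride:
        return False  # unequal row/column strides are not supported
    if any(s != 1 for s in strides) and any(d != 1 for d in dilation_rate):
        return False  # stride != 1 cannot be combined with dilation_rate != 1
    return True


print(depthwise_args_supported((2, 2)))  # True
print(depthwise_args_supported((2, 1)))  # False
```

Any layer that relies on a stride such as `(2, 1)` fails the first check, which is why swapping in the depthwise layers would break the rectangular variants.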

@felixT2K
Contributor Author

Maybe we can keep this as a draft and check whether the next TF release brings a fix

@frgfm
Collaborator

frgfm commented Apr 29, 2023

Thanks for the suggestion Felix! Yes as you mentioned, I think this is a risky move 😅
I used to ping the TF team regularly to fix this, but it hasn't helped so far. It's a shame, you can always expect differences of support between frameworks but grouped convolutions ... the TF team seems to have too much stuff to handle at the moment so it's complicated to get a time estimate :/

@felixT2K
Contributor Author

Thanks for the suggestion Felix! Yes as you mentioned, I think this is a risky move 😅 I used to ping the TF team regularly to fix this, but it hasn't helped so far. It's a shame, you can always expect differences of support between frameworks but grouped convolutions ... the TF team seems to have too much stuff to handle at the moment so it's complicated to get a time estimate :/

Yeah, let's keep this as a draft, and I will check after each TF release whether it has been fixed 😅

@felixT2K felixT2K force-pushed the fix-grouped-convolutions-issue branch from cce9376 to fc759fd Compare April 29, 2023 14:16
@felixT2K felixT2K force-pushed the fix-grouped-convolutions-issue branch from fc759fd to b06df1a Compare May 23, 2023 10:07
@felixdittrich92 felixdittrich92 changed the title DRAFT: Fix for grouped convolutions PERMANENT DRAFT: TF grouped convolutions check Jul 5, 2023
@felixT2K felixT2K force-pushed the fix-grouped-convolutions-issue branch from b06df1a to 177d28e Compare July 10, 2023 14:29
@felixdittrich92 felixdittrich92 removed labels help wanted (Extra attention is needed), awaiting response (Waiting for feedback) on Jul 22, 2023
@felixT2K felixT2K force-pushed the fix-grouped-convolutions-issue branch from 177d28e to 68f12d4 Compare August 25, 2023 12:27
@codecov

codecov bot commented Aug 25, 2023

Codecov Report

Merging #1183 (62976f1) into main (3deac68) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
- Coverage   95.78%   95.76%   -0.02%     
==========================================
  Files         154      154              
  Lines        6903     6903              
==========================================
- Hits         6612     6611       -1     
- Misses        291      292       +1     
Flag Coverage Δ
unittests 95.76% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes

@felixT2K felixT2K closed this Apr 11, 2024
Labels
framework: tensorflow (Related to TensorFlow backend), module: models (Related to doctr.models), topic: character classification (Related to the task of character classification), type: bug (Something isn't working)
3 participants