PERMANENT DRAFT: TF grouped convolutions check #1183

felixT2K · 2023-04-25T10:40:01Z

This PR:

only a test PR at the moment
fix for TF mobilenet grouped convolutions issue
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Conv2DBackpropInput_device_/job:localhost/replica:0/task:0/device:CPU:0}} Gradients for grouped convolutions are not supported on CPU. Please file a feature request if you run into this issue. Computed input depth 576 doesn't match filter input depth 1 [Op:Conv2DBackpropInput]

The big disadvantage is that we would need to retrain all models that use mobilenet as a backbone, as well as the classification models themselves.

The export for TF mobilenet from #1182 still works with the changes from this PR
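For context on the error message above ("Computed input depth 576 doesn't match filter input depth 1"): in a grouped convolution, each filter only sees `in_channels // groups` input channels, so a depthwise-style layer with `groups == in_channels` has a filter input depth of 1. The sketch below only does the shape arithmetic. The helper name and the 3x3 kernel size are illustrative assumptions; the channel count 576 is taken from the error message, and the reading that the failing layer uses `groups=576` is inferred from it rather than confirmed in this thread.

```python
def grouped_conv_kernel_shape(kernel_size, in_channels, out_channels, groups=1):
    """Hypothetical helper: kernel shape of a grouped 2D convolution.

    Uses the (height, width, in_channels // groups, out_channels) layout
    that Keras uses for Conv2D kernels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must both be divisible by groups")
    kernel_h, kernel_w = kernel_size
    return (kernel_h, kernel_w, in_channels // groups, out_channels)


# Depthwise-style case: one group per input channel (kernel size is an assumption).
depthwise = grouped_conv_kernel_shape((3, 3), in_channels=576, out_channels=576, groups=576)
# Regular convolution for comparison: a single group sees all input channels.
regular = grouped_conv_kernel_shape((3, 3), in_channels=576, out_channels=576, groups=1)

print(depthwise)  # filter input depth is 1
print(regular)    # filter input depth is 576
```

The mismatch reported by `Conv2DBackpropInput` is between these two depths: the input has 576 channels while the filter's input depth is 1.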

python3 /home/felix/Desktop/doctr/references/classification/train_tensorflow.py mobilenet_v3_small --pretrained

WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-3.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-3.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-4.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-4.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-5.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-5.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-6.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-6.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-7.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-7.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-8.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-8.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-9.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-9.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-10.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-10.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-11.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-11.block.layer_with_weights-2.kernel
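Each warning in the log above is printed twice, so it can help to reduce it to the distinct kernels that no longer match the checkpoint (11 in this log, for `layer_with_weights-1` through `layer_with_weights-11`). A minimal sketch of that step, using only the standard library; the function name is hypothetical and the sample lines are copied from the log above:

```python
import re

# Matches the object path at the end of each "could not be found" warning.
PATTERN = re.compile(
    r"Value in checkpoint could not be found in the restored object: (\S+)"
)


def unmatched_checkpoint_keys(log_text):
    """Return the unique unmatched checkpoint keys, in order of first appearance."""
    seen = {}
    for match in PATTERN.finditer(log_text):
        seen.setdefault(match.group(1), None)  # dict preserves insertion order
    return list(seen)


sample_log = """\
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
"""

keys = unmatched_checkpoint_keys(sample_log)
for key in keys:
    print(key)
```

These are the kernels that would have to be retrained or converted if the layer types change.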

@frgfm What do you think: does it make sense to retrain everything in order to fix this issue?

And a related question for @olivmindee and @charlesmindee:
Could you retrain the models? The rotation classification model and the detection/recognition models in particular depend on your datasets (computation power would not be a problem on my side, but the data is 😅).

felixT2K · 2023-04-26T07:06:27Z

OK, SeparableConv2D and DepthwiseConv2D currently do not support different row/col stride values, which would break the rectangular pooling implementations 😩

An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Current implementation only supports equal length strides in the row and column dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
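The documented constraint quoted above can be expressed as a small pre-check. This is a sketch of the rule as worded in the quote, not the Keras implementation; the helper name is hypothetical:

```python
def check_depthwise_strides(strides, dilation_rate=1):
    """Hypothetical helper mirroring the documented constraint.

    Normalizes ints to (row, col) pairs and raises ValueError when the
    strides differ per dimension, or when both strides and dilation_rate
    contain a value other than 1.
    """
    def as_pair(value):
        return (value, value) if isinstance(value, int) else tuple(value)

    stride_pair = as_pair(strides)
    dilation_pair = as_pair(dilation_rate)
    if len(stride_pair) != 2 or len(dilation_pair) != 2:
        raise ValueError("strides and dilation_rate must be an int or a pair of ints")
    if stride_pair[0] != stride_pair[1]:
        raise ValueError(f"row and column strides must be equal, got {stride_pair}")
    if any(v != 1 for v in stride_pair) and any(v != 1 for v in dilation_pair):
        raise ValueError("strides != 1 cannot be combined with dilation_rate != 1")
    return stride_pair, dilation_pair


print(check_depthwise_strides(2))  # square stride passes

try:
    check_depthwise_strides((2, 1))  # rectangular stride, as used for rectangular pooling
except ValueError as err:
    print("rejected:", err)
```

A rectangular stride such as `(2, 1)` is exactly the case that fails the check, which is why the swap would break the rectangular pooling implementations.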

felixT2K · 2023-04-27T09:08:15Z

Maybe we can keep this as Draft and check if the next TF release brings a fix

frgfm · 2023-04-29T09:15:06Z

Thanks for the suggestion Felix! Yes as you mentioned, I think this is a risky move 😅
I used to ping the TF team regularly to fix this, but it hasn't helped so far. It's a shame, you can always expect differences of support between frameworks but grouped convolutions ... the TF team seems to have too much stuff to handle at the moment so it's complicated to get a time estimate :/

felixT2K · 2023-04-29T12:23:29Z

> Thanks for the suggestion Felix! Yes as you mentioned, I think this is a risky move 😅 I used to ping the TF team regularly to fix this, but it hasn't helped so far. It's a shame, you can always expect differences of support between frameworks but grouped convolutions ... the TF team seems to have too much stuff to handle at the moment so it's complicated to get a time estimate :/

Yeah, let's keep this as a draft, and I will check after each TF release whether it is fixed 😅
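A per-release check of this kind could look roughly like the sketch below. It is an assumption about how such a check might be written, not the test used in this PR: it builds a small `Conv2D` layer with `groups > 1`, pins it to the CPU, and asks for the gradient with respect to the input, which is the path that goes through `Conv2DBackpropInput`. The function name and tensor sizes are illustrative, and the function returns `None` when TensorFlow is not installed.

```python
def grouped_conv_grad_supported_on_cpu():
    """Return True/False for CPU grouped-conv gradient support, None without TF."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    try:
        with tf.device("/CPU:0"):
            x = tf.random.normal((1, 8, 8, 4))
            # groups == in_channels gives a depthwise-style grouped convolution.
            layer = tf.keras.layers.Conv2D(4, 3, groups=4, padding="same")
            with tf.GradientTape() as tape:
                tape.watch(x)
                loss = tf.reduce_sum(layer(x))
            tape.gradient(loss, x)
        return True
    except tf.errors.OpError:
        # Covers InvalidArgumentError (as in the traceback above) and other
        # op-level errors raised by the forward or backward pass.
        return False


result = grouped_conv_grad_supported_on_cpu()
print("grouped conv gradients on CPU:", result)
```

Running this against each new TF release gives a quick yes/no answer without touching the model code.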

codecov · 2023-08-25T12:48:20Z

Codecov Report

Merging #1183 (62976f1) into main (3deac68) will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
- Coverage   95.78%   95.76%   -0.02%     
==========================================
  Files         154      154              
  Lines        6903     6903              
==========================================
- Hits         6612     6611       -1     
- Misses        291      292       +1

Flag	Coverage Δ
unittests	`95.76% <ø> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes

felixdittrich92 requested review from frgfm, charlesmindee and odulcy-mindee April 25, 2023 11:37

felixT2K mentioned this pull request Apr 26, 2023

[tests/TF/build] enable missing classification onnx tests and set tensorflow lower bound to 2.11 #1182

Merged

felixT2K force-pushed the fix-grouped-convolutions-issue branch from cce9376 to fc759fd Compare April 29, 2023 14:16

felixT2K force-pushed the fix-grouped-convolutions-issue branch from fc759fd to b06df1a Compare May 23, 2023 10:07

felixdittrich92 mentioned this pull request Jun 28, 2023

[tests] update test cases #1233

Merged

felixdittrich92 changed the title ~~DRAFT: Fix for grouped convolutions~~ PERMANENT DRAFT: TF grouped convolutions check Jul 5, 2023

felixT2K force-pushed the fix-grouped-convolutions-issue branch from b06df1a to 177d28e Compare July 10, 2023 14:29

felixdittrich92 removed the help wanted and awaiting response labels Jul 22, 2023

felixT2K added 7 commits August 25, 2023 14:25

test fix for grouped convolutions

dcd30ac

remove checkpoints which would need retraining

95f13f2

check stride issue in depthwiseConv2D

97df814

SeperableConv instead of DepthwiseConv to overcome the rect break

6014961

rebase

2f67239

up

1b98182

rebase

68f12d4

felixT2K force-pushed the fix-grouped-convolutions-issue branch from 177d28e to 68f12d4 Compare August 25, 2023 12:27

versions

62976f1

felixT2K closed this Apr 11, 2024
