RAM OOM Problem #16
@kjh21212 I'm facing the same RAM issue. Were you able to solve it?
I have the same issue.
When I run the run_common_voice.py code, retracing warnings are shown. I think the tf.function retracing affects the speed of training. Does the retracing warning have a connection with the OOM error?
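By retracing I mean tf.function building a new graph for each new input signature; a minimal sketch of when that happens (the function here is made up, not from run_common_voice.py):

```python
import tensorflow as tf

@tf.function
def square(x):
    return x * x

# Plain Python scalars are baked into the graph, so each new value retraces:
square(1)   # traces
square(2)   # traces again
# Tensors of the same dtype/shape reuse one traced graph:
square(tf.constant(1))   # traces once
square(tf.constant(2))   # reuses the graph, no retrace
```

Each retrace builds and keeps another graph, which is why I suspect it could be related to memory growth.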
@Nambee Seems like there's something with GradientTape, RNN layers, or TFRecords. I implemented DeepSpeech2 with a TFRecord dataset in Keras. When I trained it using the .fit function, there was no OOM error, but when I trained using GradientTape, the memory kept going up until OOM. However, when I trained SEGAN (no recurrent network, only Conv) with a generator dataset using GradientTape, it worked fine.
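For reference, this is the shape of the GradientTape loop I mean; a minimal sketch with a placeholder dense model standing in for DeepSpeech2:

```python
import tensorflow as tf

# Placeholder model, optimizer and loss; illustrative only.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

def train_step(x, y):
    # The kind of custom step that leaked for me, as opposed to model.fit:
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = loss_fn(y, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```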
Please try again with the latest commit. I have updated it to use TensorFlow 2.2.0 and solved the retracing issue.
@noahchalifour Just executed the current repository code with one GPU. I am also running into the OOM error, using a GeForce GTX 1080 Ti card.
I have figured out that it helps if we wrap the whole training loop in @tf.function:

```python
@tf.function
def train():
    for batch in train_dataset:
        train_step(batch)
```

The downside of this trick is we can't use native Python functions and unimplemented TF functions in graph mode (like …). A fuller sketch of the trick is below.
@usimarit Are you able to train/use the model? I can only afford a very small batch size (4-8 samples) when running on a single GeForce 1080 Ti (~11 GB RAM), and I am not even sure if it's working. How long did you have to train your model?
I guess a small batch size is normal for ASR models. I trained a CTC model on an RTX 2080 Ti (11 GB) on about a 300-hour dataset, and it took 3 days for 12 epochs with batch size 4.
@usimarit Oh, I misinterpreted the issue then. Yeah, that batch size is what I am using too. Didn't expect such a small batch size to work out :)
@noahchalifour But I'm also facing the problem, even using TensorFlow 2.2.0 and the latest commit.
@usimarit I have tried it, but it still doesn't work.
When I run your code, a RAM OOM happens in the eval part, and I don't know why.
My desktop RAM size is 128 GB and I am using 4 GPUs.
The memory increases with every eval batch.
Also, batch processing with 4 GPUs is slower than with a single GPU.
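If the eval loop runs eagerly, the same @tf.function trick from the comments above may be worth trying on the eval step too; a minimal sketch with placeholder names, not this repo's actual code:

```python
import tensorflow as tf

# Placeholder model and loss; illustrative only.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function  # traced once, so eval batches don't keep building new graphs
def eval_step(x, y):
    predictions = model(x, training=False)
    return loss_fn(y, predictions)
```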