fix txt writing issue #11

Open
wants to merge 1 commit into base: main

Conversation

Gaaaavin

For details, please see #10

@White-Mask-230

White-Mask-230 commented Jul 25, 2024

Issue #4 reported a bug where torch tried to allocate 184.32 GiB. I think your change might solve that bug too. Can you try it, please?

@Gaaaavin
Author

@White-Mask-230 I don't think #4 is related to this issue.
Moreover, I'm also having a problem similar to #4.

I'm training with a customized dataset that has ~1400 images and ~366k initial points. I got the error "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1050588.79 GiB"!!!

I'm also waiting for that issue to be addressed.

@Snosixtyboo
Collaborator

Hi, thanks for bringing this up! Could you tell us about your setup (OS / CUDA / Pytorch version)?

@White-Mask-230

White-Mask-230 commented Jul 26, 2024

OK, I thought it could work because it was an optimization. I tried it, but it didn't.

As for your pull request, it looks correct to me.

@Gaaaavin
Author

@Snosixtyboo I'm using an Ubuntu 20.04 machine, with CUDA 12.1 and PyTorch 2.3.0.
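
For anyone else who needs to report the same details, here is a minimal sketch of how they could be collected. The helper name `describe_setup` is hypothetical and not part of this project; it only uses the standard library plus `torch.__version__` and `torch.version.cuda` when PyTorch can be imported.

```python
import platform


def describe_setup():
    """Collect OS, Python, PyTorch and CUDA versions for a bug report."""
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or failed to load; leave the fields as None.
        return info

    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against
    # (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_setup().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` is the CUDA version PyTorch was built with, which can differ from the system toolkit reported by `nvcc --version`; it may be worth reporting both.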

@Snosixtyboo
Collaborator

OK, thanks a lot! We have seen issues with CUDA 12.1 on Ubuntu that SEEM to go away with 12.3 or 12.5. We are trying different things to make it work with 12.1 / 11.8 etc., because there is no good reason why it shouldn't, unless there is a really low-level bug in the build toolchain. In the meantime, you could check whether 12.3 or 12.5 works for you!

Best,
Bernhard

@White-Mask-230

True, but from my point of view, every big problem starts with a small solution.

@Snosixtyboo
Collaborator

Agreed. That's why I've been looking into this bug for 3 days now. It is super elusive and seems somehow related to internal GPU memory management, depending on different build settings and how the application is loaded.

@Gaaaavin
Author

Gaaaavin commented Aug 2, 2024

The GPU memory problem should be solved as described in this issue.

I think the txt writing issue is not related to the GPU memory issue, and should be addressed by this PR.
@Snosixtyboo If there are no other concerns about this PR, please consider merging it. Thank you.

3 participants