
NaNs for CTCF and other multibatch cases when number of rounds is one #117

Open
ilibarra opened this issue Jan 25, 2023 · 0 comments

ilibarra commented Jan 25, 2023

When the number of rounds was only one, some weights (particularly the log_etas weights) became NaNs during training, due to this step in the code.

The problem was associated with one of these lines; I think it was the normalization always being one.

mubind/mubind/models/models.py

Lines 1635 to 1649 in 270dc2a

out = None
if self.enr_series:
out = torch.cumprod(binding_scores, dim=1) # cum product between rounds 0 and N
else:
out = binding_scores
# multiplication in one step
etas = torch.exp(self.log_etas)
out = out * etas[batch, :]
# fluorescent data e.g. PBM, does not require scaling, to keep numbers beyond range [0 - 1]
if not kwargs.get('scale_countsum', True):
return out
results = out.T / torch.sum(out, dim=1)
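
A standalone sketch of the suspected degeneracy (the tensor shapes and variable names here are assumptions for illustration, not mubind's actual ones): with a single round, torch.sum(out, dim=1) is just the single column itself, so every normalized entry is value / value = 1, and log_etas receives a zero gradient; a zero count turns the same division into 0/0 = NaN.

import torch

# Hypothetical minimal reproduction, not mubind code: 2 probes, 1 round.
binding_scores = torch.tensor([[0.5], [2.0]])
log_etas = torch.zeros(1, requires_grad=True)

out = torch.cumprod(binding_scores, dim=1)  # identity when there is one round
out = out * torch.exp(log_etas)             # eta scaling, as in the snippet above
results = out.T / torch.sum(out, dim=1)     # each entry is a value divided by itself

results.sum().backward()
print(results)        # tensor([[1., 1.]]) -- the normalization is always one
print(log_etas.grad)  # tensor([0.]) -- no gradient signal reaches log_etas

# A zero count would make the same division 0/0, producing NaNs that then
# propagate into log_etas through backprop.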

If this step was prevented for the CTCF case and column [1] was discarded, one could then load multiple samples with non-similar k-mers and only one round:
https://github.com/theislab/mubind/blob/fix-scatac/notebooks/batch/01_CTCF_two_batches.ipynb
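
One option might be to guard the scaling itself rather than special-casing CTCF; a minimal sketch, assuming that skipping the count-sum normalization for single-round data is acceptable (untested against mubind):

# Possible guard inside the forward pass quoted above (an assumption, not
# a verified fix): skip the count-sum normalization when there is only one
# round, since dividing a one-column matrix by its own row sums is degenerate.
if out.shape[1] <= 1 or not kwargs.get('scale_countsum', True):
    return out
results = out.T / torch.sum(out, dim=1)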

@johschnee do you remember what the crucial step was, and whether a fix is possible? Thank you.
