
Enhance stopping criterion options and handle infinite log-likelihoods #103

Conversation

andreramosfdc

Customizable Stopping Criterion:
Previously, the package only allowed stopping when the log-likelihood stopped increasing. I've introduced a new argument, stopping_criterion, that accepts two symbols:
:convergence: maintains the original behavior of stopping when the log-likelihood stops increasing.
:stability: introduces a new criterion where the algorithm stops when the log-likelihood stops changing within a given margin (this is important in my use case).
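The behavioral difference between the two symbols can be sketched like this (a Python illustration; the package itself is written in Julia, and the helper name here is hypothetical):

```python
def should_stop(logL_prev, logL_curr, atol=1e-8, criterion="convergence"):
    # Toy sketch of the two proposed stopping rules.
    if criterion == "convergence":
        # original behavior: stop once the log-likelihood no longer increases
        return logL_curr - logL_prev < atol
    elif criterion == "stability":
        # new rule: stop once the log-likelihood stops changing, in either direction
        return abs(logL_curr - logL_prev) < atol
    raise ValueError(f"unknown criterion: {criterion}")
```

Under :stability, a large decrease keeps the iteration going, whereas :convergence treats any non-increase as grounds to stop.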

Handling Infinite Log-Likelihoods:
To address a bug where infinite log-likelihood values would break the code, I added logic to handle these cases by replacing infinite values with the log of the largest representable floating-point number. This prevents the process from crashing while still allowing it to continue with large finite values.
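In spirit, the clamping does something like the following (a Python sketch of the idea; the actual patch is in Julia and the function name is made up):

```python
import math
import sys

LOG_FLOATMAX = math.log(sys.float_info.max)  # log of the largest finite float, ~709.78

def clamp_loglikelihood(logL):
    # replace +/-Inf with a large finite value of the same sign
    if math.isinf(logL):
        return math.copysign(LOG_FLOATMAX, logL)
    return logL
```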

@gdalle
Owner

gdalle commented Sep 19, 2024

Hi, thank you for these contributions! I have a few follow-up questions. Don't be discouraged by my skepticism; I'm really excited for someone to use my package 😍

Do we need this?

:stability: introduces a new criterion where the algorithm stops when the log-likelihood stops changing within a given margin (this is important in my use case).

Can you explain why it is important in your use case? Do you face a scenario where the loglikelihood goes down?

To address a bug where infinite log-likelihood values would break the code

Do you have a reproducible example of code breaking in that case? I think -Inf values for the loglikelihood are perfectly fine. And if you encounter +Inf it is possibly an issue with your model and not with my library. I'd happily take a look if you can share an MWE.

How do we do this if we need it?

:convergence: maintains the original behavior of stopping when the log-likelihood stops increasing.

That's kind of a weird name for a stopping criterion; either way, we're checking convergence. Maybe we could use :max_increase vs :max_variation?

I added logic to handle these cases by replacing infinite values with the log of the largest representable float number.

This logic cannot be merged in the current state, because it is located in one of the central functions of the package (obs_logdensities!), and this function must remain non-allocating. At the moment the tests will probably fail because

findall(i -> i < -log(-nextfloat(-Inf)), logb)

creates a new array of indices, which is very inefficient. If we are to change anything in this function (and I'm really doubtful whether we should), it has to be inside the loop.
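For comparison, an allocation-free variant would test and overwrite each entry inside the existing loop instead of building an index array, along these lines (a Python sketch of the pattern; the real function is Julia and mutates logb in place):

```python
import math
import sys

# same threshold as -log(-nextfloat(-Inf)) in the Julia snippet, ~ -709.78
LOWER = -math.log(sys.float_info.max)

def clamp_in_place(logb):
    # elementwise check-and-replace: no temporary index array is allocated
    for i in range(len(logb)):
        if logb[i] < LOWER:
            logb[i] = LOWER
    return logb
```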

@andreramosfdc
Author

andreramosfdc commented Sep 19, 2024 via email

@gdalle
Owner

gdalle commented Sep 19, 2024

Okay, this seems to confirm that we actually don't need the modifications you suggest.

Stopping criterion

Interestingly, it initially decreased before shooting up significantly after a few iterations, likely due to initialization effects.

This should never happen, at least in standard cases. The Baum-Welch algorithm is a version of the EM algorithm, which is guaranteed to increase the loglikelihood (up to floating point errors). The option to check that this increase happens is actually a really useful debugging tool, and should basically never be turned off.
If you provide me with a Minimum Working Example, I might be able to help you debug. Was it a model where you implemented a custom fitting step?

In my use case, I printed the log-likelihood at each iteration to better
understand what was happening.

Also you don't need to do that, Baum-Welch returns the vector of loglikelihood values at the end.
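Since the trace is returned anyway, a monotonicity check is cheap to write (a Python sketch; the function name is hypothetical):

```python
def check_monotone(loglik_trace, atol=1e-8):
    # EM-style algorithms should never decrease the log-likelihood
    # beyond floating-point noise; flag any iteration that does
    for step, (prev, curr) in enumerate(zip(loglik_trace, loglik_trace[1:]), start=1):
        if curr < prev - atol:
            raise RuntimeError(f"log-likelihood decreased at step {step}: {prev} -> {curr}")
    return True
```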

Infinite loglikelihood

I encountered a state where all values were identical (and equal to 0), which resulted in a normal distribution with zero variance.

Indeed, zero-variance Gaussians are one of the cases where you may encounter a loglikelihood of +Inf; I struggled with that one too. While it may sound appealing, truncating the computed loglikelihood will only mask the symptoms of the problem. It won't solve the underlying issue, namely the degenerate emission distribution, which stems from the model and/or the data. Please check out the debugging section of the docs, where I give a few tips to solve it.

@andreramosfdc
Author

andreramosfdc commented Sep 19, 2024 via email

@gdalle
Owner

gdalle commented Sep 19, 2024

My use case is one in which the transition matrices vary over time (and there is a large number of observations). Maybe that is why the guarantee that the
loglikelihood would increase does not hold.

Even with time-dependent matrices the loglikelihood should still increase. I'd love it if you could show me an artificial example to debug it together.

The challenge arises because we actually want the model to estimate a state
with zero values.

By zero values I'm assuming you mean the variances. If that is the case, I guess you should use a different distribution object (like a Dirac mass) to accurately describe these degenerate emissions.
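For intuition, a point-mass (Dirac) emission looks like this (a toy Python class for illustration; in Julia, Distributions.jl ships a Dirac distribution that can be used directly):

```python
import math

class Dirac:
    # toy point-mass distribution: all probability sits at `point`
    def __init__(self, point):
        self.point = point

    def logpdf(self, x):
        # log-density is 0 at the atom and -Inf everywhere else
        return 0.0 if x == self.point else -math.inf
```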

@andreramosfdc
Author

andreramosfdc commented Sep 19, 2024 via email

@gdalle
Owner

gdalle commented Sep 19, 2024

10:30 AM GMT-4 (NYC time) is perfect for me. Can you send me an invite at my email address, found here?

@andreramosfdc
Author

andreramosfdc commented Sep 19, 2024 via email

@andreramosfdc
Author

andreramosfdc commented Sep 20, 2024 via email

@gdalle gdalle closed this Sep 20, 2024