[fix] Fix the activation checkpointing when using SwiGLUPackedFusedOp #1127

Open
wants to merge 2 commits into main

Commits on Oct 11, 2024

  1. [fix] Fix the activation checkpointing when using SwiGLUPackedFusedOp

    According to the docs (https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function), the forward() method should not be called directly; the apply() method has to be used instead.
    After removing the direct forward() call, activation checkpointing starts working (see the sketch after this commit entry).
    warpuv committed Oct 11, 2024
    Commit 46d2823
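To illustrate the first commit's point, here is a minimal, self-contained sketch; `MulOp` and `run_block` are hypothetical stand-ins, not xformers' SwiGLUPackedFusedOp. A custom torch.autograd.Function participates in the autograd graph only when invoked through apply(); calling forward() directly bypasses autograd, which is what broke checkpointed recomputation.

```python
# Minimal sketch, assuming a hypothetical MulOp in place of SwiGLUPackedFusedOp.
import torch
from torch.utils.checkpoint import checkpoint


class MulOp(torch.autograd.Function):
    """Toy fused op: elementwise multiply with a handwritten backward."""

    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        return grad_out * w, grad_out * x


def run_block(x, w):
    # Correct: call through apply() so autograd records the op and
    # checkpointing can rebuild the backward graph during recomputation.
    return MulOp.apply(x, w)
    # Broken (the pattern the commit removes): calling MulOp.forward(...)
    # directly skips autograd entirely, so the checkpointed backward fails.


x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(4, 8, requires_grad=True)
out = checkpoint(run_block, x, w, use_reentrant=False)
out.sum().backward()
```

With apply(), the recomputed forward inside checkpoint's backward pass produces the same autograd graph as the original call, so gradients flow through the fused op as expected.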

Commits on Oct 17, 2024

  1. [fix] Fix the activation checkpointing when using SwiGLUPackedFusedOp

    The if conditional on the x.requires_grad state (used to switch behavior between inference and training modes) changes how forward() behaves when it is recomputed, which breaks activation checkpointing:
    during the recomputation phase x is detached with requires_grad==False, so a different number of tensors is saved by save_for_backward() than backward() expects (see the sketch after this commit entry).
    warpuv committed Oct 17, 2024
    Commit 4829d7e
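The second commit's issue can also be shown with a small sketch; `BadOp` and `GoodOp` below are hypothetical, not xformers code. Per the commit message, during checkpoint recomputation x arrives detached with requires_grad==False, so a forward() that branches on requires_grad saves a different set of tensors on the recompute than on the original pass, and backward() then unpacks the wrong number of saved tensors.

```python
# Sketch of the anti-pattern, assuming hypothetical BadOp/GoodOp (not xformers code).
import torch


class BadOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        if x.requires_grad:
            # Taken on the original forward pass ...
            ctx.save_for_backward(x, w)
        else:
            # ... but, per the PR description, taken during checkpoint
            # recomputation, where x is detached with requires_grad == False.
            ctx.save_for_backward(w)
        return x @ w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors  # unpacking fails if the recompute saved only w
        return grad_out @ w.t(), x.t() @ grad_out


class GoodOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        # Always save the same tensors, independent of requires_grad, so the
        # recomputed forward matches what backward() expects.
        ctx.save_for_backward(x, w)
        return x @ w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        return grad_out @ w.t(), x.t() @ grad_out


x = torch.randn(4, 8, requires_grad=True)
w = torch.randn(8, 3, requires_grad=True)
GoodOp.apply(x, w).sum().backward()  # works with or without checkpointing
```

Keeping save_for_backward() unconditional, as in GoodOp, means the recomputed forward stores exactly the tensors that backward() will unpack, which matches the intent described in the commit message.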