How can I implement custom activation storage in a model, for example, by merging the activation values of two layers, storing them, and ensuring proper backpropagation functionality?
You can use saved tensor hooks (`torch.autograd.graph.saved_tensors_hooks`) to intercept the point where autograd saves activations for the backward pass, which lets you store them however you like while keeping backpropagation intact:
https://pytorch.org/tutorials/intermediate/autograd_saved_tensors_hooks_tutorial.html
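Here is a minimal sketch of the idea, assuming you just want to divert saved activations into your own store and hand them back during backward. The `storage` list, `pack_hook`, and `unpack_hook` names are illustrative; inside `pack_hook` you could instead merge or compress tensors (e.g. combine two layers' activations), as long as `unpack_hook` reconstructs exactly what autograd originally saved:

```python
import torch
import torch.nn as nn

storage = []  # custom activation store (hypothetical; could be any container)

def pack_hook(tensor):
    # Called each time autograd saves a tensor for backward.
    # We keep the tensor ourselves and hand autograd back a key.
    storage.append(tensor)
    return len(storage) - 1

def unpack_hook(key):
    # Called during backward to recover the saved tensor.
    # Must return a tensor identical to what was packed.
    return storage[key]

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(3, 4, requires_grad=True)

# Hooks apply to every tensor saved for backward inside this context.
with torch.autograd.graph.saved_tensors_hooks(pack_hook, unpack_hook):
    out = model(x)

# Backward works normally: autograd calls unpack_hook to retrieve activations.
out.sum().backward()
```

Note that the hooks are bound to tensors at the moment they are saved during the forward pass, so calling `backward()` after leaving the context still goes through `unpack_hook`. If you need to know which layer an activation came from (e.g. to merge two specific layers), per-module forward hooks may be easier to combine with this approach.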