I have the following problem. I have a huge tensor that is used and modified throughout the model. Something like:
```
Init -> Modification -> Modification -> Modification -> Modification -> Modification -> (...) -> Loss
             |                               |
             -> Loss                         -> Loss
```
(I hope the ascii-art translates)
The tensors are quite large, multiple gigabytes each, so I really can't keep more than a few copies around; I run into memory problems quickly.
I have ideas for how to implement the modifications as in-place operations, but because of the diverging computations I can't always do so: autograd needs the pre-modification values to backpropagate through the branches. However, since the nodes where the graph splits feed directly into a loss, I wonder whether I could compute the gradient of those branch losses manually and thereby eliminate that path from the autograd engine entirely. As I understand it, the autograd engine is what makes in-place operations problematic in the first place.
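For what it's worth, here is a minimal sketch of the idea I have in mind, on a toy graph rather than the real model. The branch loss and its gradient (`sum(h**2)`, whose gradient w.r.t. `h` is `2*h`) are placeholders for whatever the real split-off losses are: the branch gradient is computed by hand under `torch.no_grad()` and injected into the backward pass via `Tensor.register_hook`, so autograd never records the branch at all.

```python
import torch

x = torch.randn(5, requires_grad=True)
h = x * 3.0  # one "Modification" step on the main path

# Branch loss: L_b = sum(h**2). Its gradient dL_b/dh = 2*h is known
# analytically, so compute it outside autograd (no graph is stored)...
with torch.no_grad():
    branch_grad = 2.0 * h

# ...and add it to the gradient flowing into h during backward.
# The hook receives dL_main/dh and must return the replacement gradient.
h.register_hook(lambda g: g + branch_grad)

main_loss = (h * 2.0).sum()
main_loss.backward()

# Reference: d/dx [sum(2*(3x)) + sum((3x)**2)] = 6 + 18*x
expected = 6.0 + 18.0 * x.detach()
assert torch.allclose(x.grad, expected)
```

Whether this actually frees me to modify the tensor in place afterwards presumably still depends on whether the main path's backward needs the pre-modification values, but at least the branch itself would no longer pin anything in memory.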