In-place operations, huge tensors, and leaves

I have the following problem. I have a huge tensor that is used and modified throughout the model. Something like:

Init -> Modification -> Modification -> Modification -> Modification -> Modification -> (...) -> Loss
                     |                                                |            
                     -> Loss                                          -> Loss

(I hope the ascii-art translates)
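In PyTorch terms, a toy version of this structure (tiny tensor, illustrative names) would look roughly like the sketch below, including the failure that an in-place modification causes when an earlier branch still needs the old value:

```python
import torch

x = torch.randn(4, requires_grad=True)  # stands in for the huge tensor

y = x * 2                    # Modification 1
branch_loss = (y * y).sum()  # graph splits: this branch goes straight into a loss
                             # (y * y saves y for its backward pass)

y.mul_(3)                    # in-place Modification 2 bumps y's version counter

final_loss = y.sum()

# backward through branch_loss needs the OLD y, but y was modified
# in place afterwards, so autograd raises a RuntimeError
caught = None
try:
    (branch_loss + final_loss).backward()
except RuntimeError as e:
    caught = e
print(caught)
```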

The tensors are quite large, multiple gigabytes each, so I really can't keep more than a few copies around before running into memory problems.

I have ideas for implementing the modifications via in-place operations, but because of the diverging computations I can't always express this. Since the nodes where the graph splits feed directly into the loss, I wonder whether I can take those computations, compute their gradient directly, and thereby eliminate that path from the autograd engine. As I understand it, the autograd engine is what makes in-place operations a problem, since it may have saved the old values for backward.
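One way this idea could be made concrete (a sketch, not necessarily the only option): backpropagate each branch loss as soon as it is computed, before the in-place modification, so the branch no longer depends on the tensor's future versions. `retain_graph=True` keeps the shared prefix of the graph alive for the later backward pass; gradients accumulate into `x.grad` across the calls. All names here are illustrative:

```python
import torch

x = torch.randn(4, requires_grad=True)  # stands in for the huge tensor

y = x * 2                    # Modification 1 (prefix shared by branch and main path)

branch_loss = (y * y).sum()  # branch that feeds directly into a loss
branch_loss.backward(retain_graph=True)  # accumulate the branch's gradients
                                         # into x.grad NOW, before y changes

y.mul_(3)                    # in-place Modification 2: safe, because the branch
                             # has already been backpropagated

final_loss = y.sum()
final_loss.backward()        # adds the main path's gradients on top

# x.grad now holds d(branch_loss + final_loss)/dx
print(x.grad)
```

The price is one extra backward pass through the shared prefix per branch, and the prefix's saved activations stay alive until the last backward. If accumulating into `.grad` as a side effect is undesirable, `torch.autograd.grad(branch_loss, x, retain_graph=True)` returns the branch gradient explicitly instead.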