Here x is a tensor and f is an nn.Module model. Both are on cuda:
x = x.to(torch.device('cuda'))
f = f.to(torch.device('cuda'))
The main loop looks like this:
x = f(x)
I noticed that training slows down pretty quickly. I believe this feedforward mechanism could be the problem. Do I need to use a different approach, like creating an intermediary tensor for x and feeding it into f?
The problem may be that, by doing that, you are growing the computational graph (if you never call backward). It depends on how many iterations your loop runs before the condition breaks it; I would check that with a counter. Note that if you feed forward a lot of times the graph will be huge and backpropagation will be much slower.
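A minimal sketch of what I mean, just to illustrate the counting (the model, input, loop bound, and stopping condition below are stand-ins, not your actual code):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
f = nn.Linear(16, 16).to(device)         # stand-in for your model
x = torch.randn(1, 16, device=device)    # stand-in for your input

steps = 0
for _ in range(10000):                   # hard upper bound while you measure
    x = f(x)                             # every call appends nodes to the same autograd graph
    steps += 1
    if x.norm() > 1e3:                   # hypothetical stopping condition
        break

print('iterations before the condition broke the loop:', steps)
print(x.grad_fn)                         # non-None: x still carries the whole history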
Backpropagation runs from the variable you call backward() on down to the leaf variables.
Leaf variables are the input nodes of the graph; intermediate variables keep the graph history alive.
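For instance (just a tiny illustration, not your code):

import torch

a = torch.randn(3, requires_grad=True)   # leaf: created directly by the user
b = a * 2                                # intermediate: produced by an op, keeps graph history
print(a.is_leaf, b.is_leaf)              # True False
print(a.grad_fn, b.grad_fn)              # None <MulBackward0 object ...>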
When you call detach you are effectively breaking the graph and creating a fresh leaf variable, so the memory held by the old graph can be reused. Otherwise the tensor still carries the old graph around, I would say.
And it’s not x = detach(x) but x = x.detach(); detach is a tensor method. I don’t know whether it exists as a standalone function in torch.
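So if you don’t actually need gradients to flow back through earlier iterations, detaching inside the loop keeps the graph small. A minimal sketch (the model, loss, optimizer, and step count are placeholders, not your actual training code):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
f = nn.Linear(16, 16).to(device)                      # stand-in for your model
x = torch.randn(1, 16, device=device)                 # stand-in input (a leaf)
optimizer = torch.optim.SGD(f.parameters(), lr=0.01)  # placeholder optimizer

for _ in range(1000):                                 # placeholder number of steps
    y = f(x)
    loss = y.pow(2).mean()                            # placeholder loss
    loss.backward()                                   # graph covers only this single step
    optimizer.step()
    optimizer.zero_grad()
    x = y.detach()                                    # fresh leaf, so the next step doesn't drag the old graph along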