Here x is a tensor, f is an nn.Module model. Both are on cuda: x = x.to(torch.device('cuda')) f = f.to(torch.device('cuda')) The main loop looks like this: while(some condition): x = f(x) I noticed that pretty quickly training slows down. I believe this feedforward mechanism could be a pro…

yep [image] Help clarifying repackage_hidden in word_language_model Every variable has a .creator attribute that is an entry point to a graph, that encodes the operation history. This allows autograd to replay it and differentiate each op. So each hidden state will have a…

Training slows down with loop like x=f(x)

vision

JuanFMontesinos (Juan Montesinos) October 29, 2019, 9:49pm 12

yep