Ben
February 20, 2019, 3:42pm
1
for example:
self.cnn = models.densenet121(pretrained=True).features
x1 = self.cnn(x1)
x2 = self.cnn(x2)
x3 = self.cnn(x3)
I get a CUDA OOM in PyTorch 1.0: the memory increases on each of these lines, which I think it should not. Similar code runs fine in 0.4.1. Any idea why?
Thanks!
rasbt
(Sebastian Raschka)
February 20, 2019, 3:44pm
2
You are probably adding more and more onto the autograd forward graph here … If you don’t want to backpropagate from x3 -> x1, add
with torch.no_grad():
    x1 = self.cnn(x1)
    x2 = self.cnn(x2)
    x3 = self.cnn(x3)
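The effect is easy to verify: outputs computed under torch.no_grad() carry no grad_fn, so no autograd graph (and none of the intermediate activations it keeps alive) accumulates across the three calls. A minimal sketch, using a small nn.Sequential as a stand-in for the DenseNet feature extractor:

```python
import torch
import torch.nn as nn

# Stand-in for the DenseNet feature extractor from the original post
cnn = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

x = torch.randn(6, 8)

# Normal forward: the output holds a reference to the autograd graph,
# so each successive call keeps more activations alive
y = cnn(x)
print(y.requires_grad, y.grad_fn is not None)  # True True

# Under no_grad no graph is built, so earlier chunks cannot
# pile up in memory between calls
with torch.no_grad():
    y = cnn(x)
print(y.requires_grad, y.grad_fn)  # False None
```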
Ben
February 20, 2019, 3:49pm
3
Yes, I want autograd, just not from x3 to x1. Actually, x is too big, so I split it into x1, x2, and x3 and feed them to self.cnn separately. In 0.4.1 this is fine, but in 1.0 I get a CUDA OOM.
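For the splitting itself, torch.chunk divides a batch along dim 0; the shapes below are placeholders, not the original data:

```python
import torch

# A batch too large to push through the network in one forward pass
x = torch.randn(12, 3, 224, 224)

# Split along the batch dimension into three equal chunks
x1, x2, x3 = torch.chunk(x, 3, dim=0)
print(x1.shape)  # torch.Size([4, 3, 224, 224])
```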
Would intermediate backprops work?
For example
optimizer.zero_grad()
x1 = self.cnn(x1)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x2 = self.cnn(x2)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x3 = self.cnn(x3)
... # calculate your loss
loss.backward()
optimizer.step()
This uses much less memory. Your effective batch size will be smaller, but reducing the batch size is a common way to prevent OOM errors.
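Put together, the pattern above looks roughly like this (the tiny linear model, MSE loss, and chunking are stand-ins, not the original code). Calling backward() at the end of each chunk frees that chunk's graph, so only one chunk's activations are alive at a time:

```python
import torch
import torch.nn as nn

# Stand-ins for self.cnn and the loss from the original post
cnn = nn.Linear(8, 4)
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(12, 8)
target = torch.randn(12, 4)

# Backprop and step once per chunk: peak memory stays at
# one chunk's graph instead of three
for xb, yb in zip(torch.chunk(x, 3), torch.chunk(target, 3)):
    optimizer.zero_grad()
    out = cnn(xb)
    loss = loss_fn(out, yb)
    loss.backward()   # frees this chunk's graph after the call
    optimizer.step()
```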
Ben
February 21, 2019, 2:26am
5
Thanks! That makes sense! I think it is equivalent to reducing the batch size.