Ben
February 20, 2019, 3:42pm
1
for example:
self.cnn = models.densenet121(pretrained=True).features
x1 = self.cnn(x1)
x2 = self.cnn(x2)
x3 = self.cnn(x3)
I get a CUDA OOM in PyTorch 1.0: the memory increases on each of these lines, which I think it should not. Similar code runs fine in 0.4.1. Any idea why?
Thanks!
rasbt
(Sebastian Raschka)
February 20, 2019, 3:44pm
2
You are probably adding more and more onto the autograd forward graph here … If you don’t want to backpropagate from x3 -> x1, add
with torch.no_grad():
    x1 = self.cnn(x1)
    x2 = self.cnn(x2)
    x3 = self.cnn(x3)
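The effect is easy to verify: outputs computed under torch.no_grad() carry no grad_fn, so no autograd graph (and none of the intermediate activations it keeps alive) accumulates across the three calls. A minimal sketch, using a small nn.Sequential as a stand-in for the DenseNet feature extractor:

```python
import torch
import torch.nn as nn

# Stand-in for the DenseNet feature extractor from the original post
cnn = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

x = torch.randn(6, 8)

# Normal forward: the output holds a reference to the autograd graph,
# so each successive call keeps more activations alive
y = cnn(x)
print(y.requires_grad, y.grad_fn is not None)  # True True

# Under no_grad no graph is built, so earlier chunks cannot
# pile up in memory between calls
with torch.no_grad():
    y = cnn(x)
print(y.requires_grad, y.grad_fn)  # False None
```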
Ben
February 20, 2019, 3:49pm
3
Yes, I want autograd, just not from x3 to x1. Actually, x is too big, so I split it into x1, x2, and x3 and feed them to self.cnn separately. In 0.4.1 this is fine, but in 1.0 I get a CUDA OOM.
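For the splitting itself, torch.chunk divides a batch along dim 0; the shapes below are placeholders, not the original data:

```python
import torch

# A batch too large to push through the network in one forward pass
x = torch.randn(12, 3, 224, 224)

# Split along the batch dimension into three equal chunks
x1, x2, x3 = torch.chunk(x, 3, dim=0)
print(x1.shape)  # torch.Size([4, 3, 224, 224])
```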
Would intermediate backprops work?
For example
optimizer.zero_grad()
x1 = self.cnn(x1)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x2 = self.cnn(x2)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x3 = self.cnn(x3)
... # calculate your loss
loss.backward()
optimizer.step()
This uses much less memory. Your effective batch size will be smaller, but reducing the batch size is a common way to prevent OOM errors.
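Put together, the pattern above looks roughly like this (the tiny linear model, MSE loss, and chunking are stand-ins, not the original code). Calling backward() at the end of each chunk frees that chunk's graph, so only one chunk's activations are alive at a time:

```python
import torch
import torch.nn as nn

# Stand-ins for self.cnn and the loss from the original post
cnn = nn.Linear(8, 4)
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(12, 8)
target = torch.randn(12, 4)

# Backprop and step once per chunk: peak memory stays at
# one chunk's graph instead of three
for xb, yb in zip(torch.chunk(x, 3), torch.chunk(target, 3)):
    optimizer.zero_grad()
    out = cnn(xb)
    loss = loss_fn(out, yb)
    loss.backward()   # frees this chunk's graph after the call
    optimizer.step()
```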
Ben
February 21, 2019, 2:26am
5
Thanks! That makes sense! I think it is equivalent to reducing the batch size.