Ben
February 20, 2019, 3:42pm
#1
For example:

```
self.cnn = models.densenet121(pretrained=True).features

x1 = self.cnn(x1)
x2 = self.cnn(x2)
x3 = self.cnn(x3)
```

I get a CUDA out-of-memory (OOM) error in PyTorch 1.0. The memory usage increases on each of those lines, which I think should not happen.

The same code runs fine in 0.4.1. Does anyone know why?

Thanks!

rasbt
(Sebastian Raschka)
February 20, 2019, 3:44pm
#2
You are probably adding more and more onto the autograd forward graph here: each forward pass keeps its activations around for backward. If you don't want to backpropagate from x3 -> x1, add

```
with torch.no_grad():
    x1 = self.cnn(x1)
    x2 = self.cnn(x2)
    x3 = self.cnn(x3)
```
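A minimal runnable illustration of the difference (using a tiny `nn.Linear` as a hypothetical stand-in for `self.cnn`; names here are made up for the sketch):

```python
import torch
import torch.nn as nn

# Tiny stand-in for self.cnn; any nn.Module behaves the same way.
cnn = nn.Linear(4, 4)
x = torch.randn(2, 4)

# Default: each forward call extends the autograd graph, so the
# activations for x1, x2, x3 all stay in memory until backward().
y = cnn(x)
print(y.requires_grad)  # True: y is attached to the graph

# Inside no_grad, no graph is recorded and nothing is retained for backward.
with torch.no_grad():
    z = cnn(x)
print(z.requires_grad)  # False: z is detached
```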

Ben
February 20, 2019, 3:49pm
#3
Yes, I want autograd, just not from x3 to x1. Actually, x is too big, so I split it into x1, x2, and x3 and feed them to self.cnn separately. In 0.4.1 this works fine, but in 1.0 I get a CUDA OOM error.

rasbt
(Sebastian Raschka)
#4
Would intermediate backprops work?

For example

```
optimizer.zero_grad()
x1 = self.cnn(x1)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x2 = self.cnn(x2)
... # calculate your loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
x3 = self.cnn(x3)
... # calculate your loss
loss.backward()
optimizer.step()
```

This uses much less memory. Your effective batch size will be smaller, but reducing the batch size is a common way to prevent OOM errors.
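A self-contained sketch of that pattern, with a toy model, loss, and data standing in for the real ones (all names here are hypothetical placeholders):

```python
import torch
import torch.nn as nn

# Toy stand-ins for self.cnn, the loss, and the data chunks x1, x2, x3.
cnn = nn.Linear(8, 1)
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.01)
criterion = nn.MSELoss()
chunks = torch.randn(3, 4, 8).unbind(0)    # x1, x2, x3
targets = torch.randn(3, 4, 1).unbind(0)

# One optimizer step per chunk: backward() frees each chunk's
# activation graph before the next chunk's forward pass runs.
for x, t in zip(chunks, targets):
    optimizer.zero_grad()
    out = cnn(x)
    loss = criterion(out, t)
    loss.backward()   # releases this chunk's graph and activations
    optimizer.step()
```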

Ben
February 21, 2019, 2:26am
#5
Thanks, that makes sense! I think it is equivalent to reducing the batch size.
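If you want to keep the original effective batch size rather than take three smaller steps, a common variant is gradient accumulation: call backward() once per chunk (freeing each chunk's graph) but step the optimizer only once. A sketch with the same hypothetical toy setup as above:

```python
import torch
import torch.nn as nn

# Hypothetical toy stand-ins for the model, loss, and data chunks.
cnn = nn.Linear(8, 1)
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.01)
criterion = nn.MSELoss()
chunks = torch.randn(3, 4, 8).unbind(0)
targets = torch.randn(3, 4, 1).unbind(0)

# backward() per chunk frees each chunk's graph, but .grad accumulates
# across chunks, so the single step sees the full batch.
optimizer.zero_grad()
for x, t in zip(chunks, targets):
    loss = criterion(cnn(x), t) / len(chunks)  # scale so grads average
    loss.backward()
optimizer.step()
```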