Is your model working with Dropout now, or are you still getting the OOM error every time?
Maybe a stale process is still alive and holding all your GPU memory? Could you check? I don’t know whether nvidia-smi works on a Windows machine.
The volatile flag is deprecated. Since you are using PyTorch 0.4.0, you should use with torch.no_grad() instead.
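As a minimal sketch of the 0.4.0 style (model, val_loader, and criterion here stand in for your own objects):

import torch

model.eval()
val_loss = 0.0
with torch.no_grad():  # replaces the deprecated volatile=True flag
    for data, target in val_loader:
        output = model(data)
        # .item() extracts a plain Python float, so no graph is retained
        val_loss += criterion(output, target).item()
val_loss /= len(val_loader)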
Did you observe the memory usage? Does it grow once you add Dropout to your model?
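A rough way to watch it from inside the script is the counters 0.4.0 added, e.g. torch.cuda.memory_allocated(); in this sketch, num_steps and train_one_step are placeholders for your own loop:

import torch

for step in range(num_steps):   # num_steps: placeholder
    train_one_step()            # placeholder for one training iteration
    # reports memory occupied by tensors on the current device;
    # a steady climb across steps suggests graphs are being kept alive
    print('step %d: %.1f MB allocated'
          % (step, torch.cuda.memory_allocated() / 1024**2))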
I get OOM when I add dropout layers. Without dropout, I could fit a [70 6000 6000 4] model on one GPU. (Any model with a dropout layer runs into OOM.)
I use os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" and os.environ["CUDA_VISIBLE_DEVICES"] = "0,1" to run on specific GPUs. And no, these GPUs show 0 memory usage when no experiments are running; I checked via nvidia-smi.
Yes, memory keeps growing when I train models with a dropout layer. I’ve trained bigger models (more parameters) on a single 12 GB GPU, but a model as small as the one mentioned above runs into OOM.
No, I mean you might protect your code on Windows with the if __name__ == '__main__': idiom, since multiprocessing there spawns fresh processes that re-import your script. Something like:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.bn1 = nn.BatchNorm1d(70)
        self.fc1 = nn.Linear(70, 100)
        self.d1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(100, 100)
        self.d2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(100, 4)

    def forward(self, x):
        x = x.view(-1, 70)
        x = self.bn1(x)
        x = F.relu(self.fc1(x))
        x = self.d1(x)
        x = F.relu(self.fc2(x))
        x = self.d2(x)
        return F.log_softmax(self.fc3(x), dim=1)

if __name__ == '__main__':
    cuda = torch.cuda.is_available()
    model = Net()
    if cuda:
        model.cuda()
    optimizer = optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.999), weight_decay=0.0)
    # ... the rest of the training code, which would otherwise run unprotected at import time
@ash_gamma Could you please also post your train and validate functions? In 0.4.0, remember to use with torch.no_grad(): during inference.
The use of train_loss += loss may be the cause, since it keeps the computation graph alive across iterations. Try train_loss += loss.item() instead.
Please refer to the latest MNIST example to update your code.
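To illustrate the point: accumulating the full loss tensor retains each iteration’s autograd graph, while .item() extracts a plain Python number. A minimal sketch (model, optimizer, criterion, and train_loader stand in for your own objects):

train_loss = 0.0
for data, target in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(data), target)
    loss.backward()
    optimizer.step()
    # BAD:  train_loss += loss         # keeps every iteration's graph alive
    # GOOD: extract a Python number so each graph can be freed
    train_loss += loss.item()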
But just so you know, I can run training scripts without hitting OOM when there is no dropout layer. Shouldn’t train_loss += loss be a problem even without dropout, then?
I don’t think the problem is on the dropout side. It is used in so many examples in the PyTorch repo that users would have reported it by now, almost a month after the last release. And before the release, we ran benchmarks on various networks, including AlexNet, which contains dropout layers.