Hello,

I’ve noticed that PyTorch (torch 0.1.10.post2, torchvision 0.1.7) uses significantly more GPU RAM than e.g. Theano running similar code. Unfortunately I cannot disclose the actual code, but I think it should be possible to reproduce this behavior with the following simple sequential architecture (all dimensions are in the form MINIBATCH_LENGTH x NUM_CHANNELS x H x W):

N = minibatch size (e.g. 20); padding mode is “same” everywhere.

Input: Nx10x1024x1024

Layer0: Conv2D, stride=2, filters=32, output: Nx32x512x512

Layer1: Conv2D, stride=2, filters=64, output: Nx64x256x256

Layer2: Conv2D, stride=2, filters=128, output: Nx128x128x128

… all the way down to 16x16 feature maps

LayerX: Conv2D, stride=2, filters=…, output: Nx…x16x16

… now go the opposite way with transposed convolutions …

LayerX+1: ConvTranspose2D, stride=2, filters=…, output: Nx…x32x32

… all the way up …

Layer(Last): ConvTranspose2D, stride=2, filters=…, output: Nx10x1024x1024
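For concreteness, here is a minimal sketch of what I mean (3×3 kernels, and the channel widths beyond 128 are placeholders of my choosing, since I can’t share the real values):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    """Symmetric conv / transposed-conv stack: 1024x1024 down to 16x16 and back.

    Channel widths past 128 are illustrative placeholders, not the real values.
    """
    def __init__(self):
        super(Net, self).__init__()
        # Six stride-2 stages: 1024 -> 512 -> 256 -> 128 -> 64 -> 32 -> 16
        widths = [10, 32, 64, 128, 256, 512, 512]
        enc, dec = [], []
        for cin, cout in zip(widths[:-1], widths[1:]):
            # 3x3 stride-2 conv with padding=1 halves H and W ("same"-style padding)
            enc += [nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                    nn.ReLU(inplace=True)]
        for cin, cout in zip(widths[::-1][:-1], widths[::-1][1:]):
            # output_padding=1 makes each transposed conv exactly double H and W
            dec += [nn.ConvTranspose2d(cin, cout, 3, stride=2,
                                       padding=1, output_padding=1),
                    nn.ReLU(inplace=True)]
        dec = dec[:-1]  # no activation after the final reconstruction layer
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

Since the network is fully convolutional, the same module also accepts smaller inputs (any H, W divisible by 64), which is handy for quick shape checks on the CPU.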

Main loop:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

net = Net()
net.cuda()
net.train()
optimizer = optim.SGD(net.parameters(), lr=...)
criterion = nn.MSELoss()
# the same pre-loaded batch stays on the GPU for every iteration
input = Variable(torch.from_numpy(<your-Nx10x1024x1024-tensor>).cuda())
target = Variable(torch.from_numpy(<your-Nx10x1024x1024-tensor>).cuda())
for epoch in xrange(...):  # Python 2; use range(...) on Python 3
    output = net(input)
    loss = criterion(output, target)
    net.zero_grad()   # clear gradients from the previous step
    loss.backward()
    optimizer.step()
```

On a GTX-1060, the corresponding code takes around 30% more GPU RAM than its Theano counterpart (execution times are about the same for the Theano and PyTorch versions).
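For reference, I measured the usage with `nvidia-smi`; a helper along these lines (note: `torch.cuda.memory_allocated` is from newer PyTorch releases than the 0.1.10 I am on, so on my version only external tools like `nvidia-smi` apply) would report the tensor memory PyTorch itself has allocated:

```python
import torch

def gpu_mem_mb():
    """Megabytes of GPU memory currently held by PyTorch tensors.

    Returns 0.0 when no CUDA device is available. Note that the CUDA
    caching allocator may hold more memory from the driver's point of
    view than this number shows.
    """
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / (1024.0 ** 2)
```

Calling this right after `loss.backward()` vs. after `optimizer.step()` would show where the peak occurs.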

Is this something that can be fixed?

Thanks,