I am trying to process a large batch on the GPU by splitting it into smaller ones, like this:
```python
X = Variable(torch.from_numpy(X), requires_grad=False)
n_samples, batch_size, n_features = X.shape

fX = []
for i in range(0, batch_size, 32):
    # split large batch into smaller ones
    x = X[:, i:i+32]  # shape is (n_samples, 32, n_features)
    # send small batch to GPU
    x = x.cuda()
    # process on GPU
    fx = recurrent_net(x)  # shape is (32, n_dimensions)
    # send to CPU
    fx = fx.cpu()
    # keep track of results for later stacking
    fX.append(fx)
fX = torch.cat(fX, dim=0)  # shape is (batch_size, n_dimensions)
```
I thought (naively, I guess :)) that this would solve my “out of memory” issue, since the smaller batches are sent to the GPU one at a time and the results are sent back to the CPU, hopefully avoiding filling up GPU memory…
However, my GPU still quickly runs out of memory.
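To make sure I am reading the symptom correctly, here is a minimal sketch of how I watch the memory grow (assuming a PyTorch build that exposes `torch.cuda.memory_allocated()`; the small `Linear` net is just a hypothetical stand-in for my `recurrent_net`):

```python
import torch
from torch.autograd import Variable

net = torch.nn.Linear(16, 8).cuda()       # hypothetical stand-in for recurrent_net
X = Variable(torch.randn(1024, 16), requires_grad=False)

outputs = []
for i in range(0, 1024, 32):
    x = X[i:i+32].cuda()                  # small batch to GPU
    fx = net(x)                           # forward pass on GPU
    outputs.append(fx.cpu())              # copies the data to CPU, but the autograd
                                          # graph (and the GPU buffers it saved for
                                          # backward) stays alive
    print(torch.cuda.memory_allocated())  # keeps growing every iteration
```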
What is the best way to achieve this?
For completeness' sake: I do want to backprop later, so my understanding is that using `volatile=True` is not an option. Correct me if I am wrong.
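For the record, the only workaround I have come up with so far is to compute a loss and backprop each small batch immediately, so that each chunk's graph can be freed before the next one is built. A rough sketch (the net, loss, and optimizer are hypothetical, and I am assuming the total loss decomposes over chunks):

```python
import torch
from torch.autograd import Variable

net = torch.nn.Linear(16, 8).cuda()       # hypothetical stand-in for recurrent_net
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
X = Variable(torch.randn(1024, 16), requires_grad=False)

optimizer.zero_grad()
for i in range(0, 1024, 32):
    x = X[i:i+32].cuda()                  # small batch to GPU
    fx = net(x)                           # forward pass on GPU
    loss = fx.pow(2).mean()               # hypothetical per-chunk loss
    loss.backward()                       # frees this chunk's graph right away;
                                          # gradients accumulate in .grad across chunks
optimizer.step()
```

But this forces me to compute the loss per chunk, which is not always what I want, hence my question above.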
Hervé, one month in PyTorch, and loving it!