I am trying to process a large batch on the GPU by splitting it into smaller ones, like this:
```python
X = Variable(torch.from_numpy(X), requires_grad=False)
n_samples, batch_size, n_features = X.shape

fX = []
for i in range(0, batch_size, 32):
    # split large batch into smaller ones
    x = X[:, i:i+32]  # shape is (n_samples, 32, n_features)
    # send small batch to GPU
    x = x.cuda()
    # process on GPU
    fx = recurrent_net(x)  # shape is (32, n_dimensions)
    # send to CPU
    fx = fx.cpu()
    # keep track of results for later stacking
    fX.append(fx)
fX = torch.cat(fX, dim=0)  # shape is (batch_size, n_dimensions)
```
I thought (naively, I guess :)) that this would solve my “out of memory” issue, since the smaller batches are sent to the GPU one at a time and the results are sent back to the CPU, hopefully avoiding filling up GPU memory…
However, my GPU still quickly runs out of memory.
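To make sure I am reading the symptom correctly, here is a minimal sketch of how I watch the memory grow (assuming a PyTorch build that exposes `torch.cuda.memory_allocated()`; the small `Linear` net is just a hypothetical stand-in for my `recurrent_net`):

```python
import torch
from torch.autograd import Variable

net = torch.nn.Linear(16, 8).cuda()       # hypothetical stand-in for recurrent_net
X = Variable(torch.randn(1024, 16), requires_grad=False)

outputs = []
for i in range(0, 1024, 32):
    x = X[i:i+32].cuda()                  # small batch to GPU
    fx = net(x)                           # forward pass on GPU
    outputs.append(fx.cpu())              # copies the data to CPU, but the autograd
                                          # graph (and the GPU buffers it saved for
                                          # backward) stays alive
    print(torch.cuda.memory_allocated())  # keeps growing every iteration
```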
What is the best way to achieve this?
For completeness' sake: I do want to backprop later, so my understanding is that using `volatile=True` is not an option. Correct me if I am wrong.
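For the record, the only workaround I have come up with so far is to compute a loss and backprop each small batch immediately, so that each chunk's graph can be freed before the next one is built. A rough sketch (the net, loss, and optimizer are hypothetical, and I am assuming the total loss decomposes over chunks):

```python
import torch
from torch.autograd import Variable

net = torch.nn.Linear(16, 8).cuda()       # hypothetical stand-in for recurrent_net
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
X = Variable(torch.randn(1024, 16), requires_grad=False)

optimizer.zero_grad()
for i in range(0, 1024, 32):
    x = X[i:i+32].cuda()                  # small batch to GPU
    fx = net(x)                           # forward pass on GPU
    loss = fx.pow(2).mean()               # hypothetical per-chunk loss
    loss.backward()                       # frees this chunk's graph right away;
                                          # gradients accumulate in .grad across chunks
optimizer.step()
```

But this forces me to compute the loss per chunk, which is not always what I want, hence my question above.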
Hervé, one month in PyTorch, and loving it!