[resolved] LSTM with image data - how to save GPU memory?

samarth-robo · June 29, 2017, 6:11pm

Posting my solution here for reference. It divides the data into ‘groups’ of at most M images, where the user picks M based on the memory of their GPU and the resolution of the images.
The N x T x C x H x W data is then processed in B x G x C x H x W chunks, where G (time steps per chunk) and B (sequences per chunk) are derived from M so that B * G <= M.
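
To see what G and B come out to for a given M, here is a minimal pure-Python sketch of the chunk arithmetic. The function name `plan_chunks` and the example sizes are made up for illustration, and the offsets default to 0 instead of being drawn at random.

```python
def plan_chunks(N, T, M, b_start=0, g_start=0):
    """Return (B, G, chunks) for N sequences of T frames and a budget of M images.

    Each entry of chunks is ((row_begin, row_end), (t_begin, t_end)),
    with half-open ranges over the sequence and time axes.
    """
    G = min(T, M)        # time steps per chunk
    B = min(N, M // G)   # sequences per chunk, so that B * G <= M
    chunks = []
    for b in range(N // B):
        rows = (b_start + b * B, b_start + (b + 1) * B)
        for g in range(T // G):
            steps = (g_start + g * G, g_start + (g + 1) * G)
            chunks.append((rows, steps))
    return B, G, chunks


B, G, chunks = plan_chunks(N=10, T=16, M=64)
print(B, G, chunks)
# 4 16 [((0, 4), (0, 16)), ((4, 8), (0, 16))]
```

In this example two of the ten sequences are left out of the pass, because N is not a multiple of B. Drawing the offset from 0..N%B (and 0..T%G along time) changes which leftover items are skipped from one pass to the next.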

import numpy as np
import torch
from torch.autograd import Variable

# data is N x T x C x H x W
# target is N x T x d
M = 64  # no. of images that can fit on the GPU 
N, T = data.size(0), data.size(1)
G = min(T, M)  # no. of time slices that can fit on the GPU
B = min(N, M // G)  # batch size that can fit on the GPU (integer division)

if train:
  data_var   = Variable(data, requires_grad=True)
  target_var = Variable(target, requires_grad=False)
else:
  data_var   = Variable(data, volatile=True)
  target_var = Variable(target, volatile=True)

loss_accum = 0 
b_start = np.random.randint(N%B + 1)
for b in xrange(N // B):
  b_idx = b_start + torch.LongTensor(xrange(b*B, (b+1)*B))
  xb = torch.index_select(data_var, dim=0, index=Variable(b_idx))
  tb = torch.index_select(target_var, dim=0, index=Variable(b_idx).cuda())
  model.reset_hidden_states(B)
  g_start = np.random.randint(T%G + 1)
  for g in xrange(T // G):
    g_idx = g_start + torch.LongTensor(xrange(g*G, (g+1)*G))
    xg = torch.index_select(xb, dim=1, index=Variable(g_idx))
    tg = torch.index_select(tb, dim=1, index=Variable(g_idx).cuda())
    model.detach_hidden_states()
    output = model(xg, cuda=cuda, async=True)

    if criterion is not None:
      loss = criterion(output, tg) 
      loss_accum += loss.data[0]

      if train:
        # SGD step
        optim.learner.zero_grad()
        loss.backward()
        optim.learner.step()

Here, model.reset_hidden_states() re-initializes the hidden states with random values drawn from a normal distribution and ‘repackages’ them, as in the thread “Help clarifying repackage_hidden in word_language_model”.
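
For readers on a current PyTorch release, the two helpers might look roughly like the sketch below. Only the method names `reset_hidden_states` and `detach_hidden_states` come from the post. The class `ToyRecurrentModel`, its layer sizes, and the method bodies are assumptions, and flat feature vectors stand in for the images to keep it short. It relies on `Tensor.detach()`, which does the job that re-wrapping in `Variable` did in 2017.

```python
import torch
import torch.nn as nn


class ToyRecurrentModel(nn.Module):
    """Hypothetical stand-in for the poster's model (no CNN front end)."""

    def __init__(self, in_dim=8, hidden_dim=16, out_dim=3):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, out_dim)
        self.hidden = None

    def reset_hidden_states(self, batch_size):
        # Fresh (h, c) drawn from a standard normal: (num_layers, B, hidden_dim).
        device = next(self.parameters()).device
        shape = (1, batch_size, self.hidden_dim)
        self.hidden = (torch.randn(shape, device=device),
                       torch.randn(shape, device=device))

    def detach_hidden_states(self):
        # Keep the values but cut the autograd history, so backward() stops
        # at the start of the current time chunk.
        self.hidden = tuple(h.detach() for h in self.hidden)

    def forward(self, x):
        out, self.hidden = self.lstm(x, self.hidden)
        return self.head(out)


model = ToyRecurrentModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(2, 12, 8)        # B x T x features
target = torch.randn(2, 12, 3)   # B x T x d

model.reset_hidden_states(batch_size=2)
for g in range(3):               # three chunks of G = 4 time steps
    chunk = slice(g * 4, (g + 1) * 4)
    model.detach_hidden_states()
    loss = nn.functional.mse_loss(model(x[:, chunk]), target[:, chunk])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Without the `detach_hidden_states()` call, the second `loss.backward()` would try to backpropagate into the previous chunk's graph, which has already been freed, and PyTorch raises an error. Detaching is also what keeps memory bounded, since only one chunk's activations are held at a time.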