How to train a neural network on a list of tensors with different shapes

I have a list of tensors with different shapes, denoted examples_train, and a corresponding label list labels_train; there are about 1200 examples in examples_train. I want to fit my model on this list, but I cannot use a DataLoader since the tensors in the list have different shapes.

My current method is to compute the loss of the examples one by one and sum the losses for backpropagation. The code is:

import numpy as np
import torch.nn.functional as F

training_indices = np.arange(len(examples_train))
np.random.shuffle(training_indices)  # shuffle the examples
optimizer.zero_grad()
loss = 0.
for idx in training_indices:
    example = examples_train[idx]
    label = labels_train[idx]
    # Add batch dimension
    example = example.unsqueeze(0)
    label = label.unsqueeze(0)
    example = example.cuda()
    label = label.cuda()
    logits = model(example)
    current_loss = F.cross_entropy(logits, label)
    loss += current_loss
loss = loss / len(examples_train)
loss.backward()
optimizer.step()

The code above works, but sometimes there is a "CUDA out of memory" error, and the optimisation is also very slow. How can I fix this problem? Should I partition the list examples_train into smaller sublists (mini-batches)?

  1. Can you use a DataLoader for variable-length data?
    Definitely yes, you should implement a collate_fn for the DataLoader (commonly the data are padded to a common size so they can be arranged as a mini-batch); see the sketch after this list.

  2. Why does the GPU memory increase over the iterations?
    The line loss += ... makes PyTorch keep all intermediate tensors from the first example until you call backward().
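
A minimal sketch of such a collate_fn, assuming the examples are 1-D tensors of varying length and the labels are scalar class indices (the name pad_collate and the batch size are just illustrative, not from the original post):

import torch
from torch.utils.data import DataLoader

def pad_collate(batch):
    # batch is a list of (example, label) pairs whose examples differ in length
    examples, labels = zip(*batch)
    max_len = max(e.size(0) for e in examples)
    padded = torch.zeros(len(examples), max_len, dtype=examples[0].dtype)
    for i, e in enumerate(examples):
        padded[i, : e.size(0)] = e  # zero-pad to the longest example in the batch
    return padded, torch.stack(labels)

# loader = DataLoader(list(zip(examples_train, labels_train)),
#                     batch_size=32, shuffle=True, collate_fn=pad_collate)

Padding is only one option; a collate_fn is free to return the examples as a plain Python list if they cannot be stacked into one tensor.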

Thanks for your reply.
Actually, in my case each example needs to be multiplied by another corresponding tensor in the forward pass, which I didn't mention above, so it still seems impossible to use a DataLoader.

For the GPU-memory growth problem, can I partition the examples_train list into small lists (mini-batches) and do backpropagation for each list to solve it?

Best Regards.

Can you give a code snippet illustrating the problem? It's not very clear how the "corresponding tensor" is used.

Actually, my code implements a graph neural network, and the examples are graphs with different numbers of nodes. There are node2edge and edge2node operations in the forward pass, which require matrices that are specific to each graph.

To solve this problem, can I delete the intermediate tensors after adding them to the loss variable?
I.e.

loss += current_loss
del current_loss

In your case, if it's impossible to batch the data, using gradient accumulation is OK. But as you mentioned, you should partition the entire dataset into mini-batch-sized chunks and only accumulate gradients for a limited number of steps (playing the role of the batch size); otherwise the GPU memory usage is going to explode. A sketch of this is below.
You don't need to manually delete the loss variable. After loss.backward(), the computation graph is released by PyTorch itself.

And just a hint: accumulating gradients step by step is not identical to a true mini-batch update when there are layers like batch norm in your network.