Long story short, I cannot change the batch size of 128 used by my data loader. When I do:
```python
for batch_idx, o_t in enumerate(train_loader):
    o_t = o_t.to(device)
    y = model(o_t)
```
I get a CUDA out of memory error.
To get around this, I tried the following:
```python
for batch_idx, o_t in enumerate(train_loader):
    mini_batch_size = 16
    y = []
    for mini_batch_idx in range(int(128 / mini_batch_size)):
        start = mini_batch_idx * mini_batch_size
        end = (mini_batch_idx + 1) * mini_batch_size
        o_t_mini = o_t[start:end]
        o_t_mini = o_t_mini.to(device)
        y_mini = model(o_t_mini)
        o_t_mini, y_mini = o_t_mini.cpu(), y_mini.cpu()
        y.append(y_mini)
    y = torch.cat(y, dim=0)
```
However, this does not help either: GPU memory usage increases linearly after each forward pass, not when the tensors are moved to the device. Moving the tensors back to the CPU has no effect on GPU memory, either.
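In case it helps to reproduce, here is a minimal, self-contained sketch of the chunked loop I am aiming for, with a toy `nn.Linear` standing in for my model. One variant I have been wondering about is wrapping the loop in `torch.no_grad()`, on the assumption that no backward pass is needed for this forward-only step (if I do need gradients later, this wouldn't apply):

```python
import torch
import torch.nn as nn

# Toy stand-in for my model; the real one is much larger.
model = nn.Linear(8, 4)
o_t = torch.randn(128, 8)  # one full batch from the loader

mini_batch_size = 16
y = []
# Assumption: gradients are not needed here, so no_grad() stops autograd
# from retaining the graph (and its activations) of each forward pass.
with torch.no_grad():
    for i in range(o_t.size(0) // mini_batch_size):
        o_t_mini = o_t[i * mini_batch_size:(i + 1) * mini_batch_size]
        y_mini = model(o_t_mini)
        y.append(y_mini)
y = torch.cat(y, dim=0)
print(y.shape)          # torch.Size([128, 4])
print(y.requires_grad)  # False: no graph is kept alive across chunks
```

Without the `no_grad()` context, each appended `y_mini` would keep its computation graph alive, which matches the linear memory growth I am seeing, but I am not sure this is the whole story.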
Any ideas why this is the case, and how I could get around this? Thanks!