Hi everyone,
Currently, I can't feed the model a batch of more than 64 samples for training (memory constraints on a single GPU). However, my custom loss calculation needs at least 128 predictions per step, because it uses PCA to reduce the feature dimension to 128, and PCA won't work when the number of samples is smaller than the number of components.
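For context, this is the kind of failure I mean. A minimal sketch, assuming something like torch.pca_lowrank for the reduction (my actual loss differs, and the shapes here are just for illustration):

import torch

out = torch.randn(64, 512)   # 64 predictions with 512-dim features
# Asking for 128 components with only 64 samples fails, since the number
# of components cannot exceed min(n_samples, n_features); here this
# raises a ValueError.
U, S, V = torch.pca_lowrank(out, q=128)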
The routine way of getting model predictions is:
for batch_idx, data in enumerate(dataloader['train']):
    batch, lbl = data[0], data[1]
    out = model(batch)
    loss = custom_loss(out, lbl)
    loss.backward()
I'm trying to collect model outputs for two training batches (64 × 2 = 128) before continuing with the loss calculation, i.e.:
out_, lbl_ = [], []
for batch_idx, data in enumerate(dataloader['train']):
    batch, lbl = data[0], data[1]
    out_.append(model(batch))
    lbl_.append(lbl)
    if (batch_idx + 1) % 2 == 0:
        # concatenate along the batch dimension to get 128 predictions
        loss = custom_loss(torch.cat(out_), torch.cat(lbl_))
        loss.backward()
        # reset the buffers so only two batches are accumulated at a time
        out_, lbl_ = [], []
but I still get this error: "RuntimeError: CUDA out of memory." ¯\(°_o)/¯
I'd be very thankful if you could help answer these questions:
- Is a separate computation graph created (requiring additional memory) each time the model is fed a training batch? (I've sketched a small check after this list.)
- How can I accumulate outputs from several batches before computing and backpropagating the loss?
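For the first question, this is the check I had in mind. A minimal sketch that reuses the model and dataloader names from above; the MiB printouts and the cutoff at four iterations are just for illustration:

import torch

out_ = []
for batch_idx, data in enumerate(dataloader['train']):
    batch = data[0]
    out_.append(model(batch))
    # if every forward pass keeps its own graph alive while the outputs
    # sit in the list, these numbers should keep growing per iteration
    print(batch_idx,
          round(torch.cuda.memory_allocated() / 2**20), 'MiB allocated,',
          round(torch.cuda.max_memory_allocated() / 2**20), 'MiB peak')
    if batch_idx == 3:   # a few iterations are enough to see the trend
        break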