I’m facing a challenge working on an NLP application, where I can use a batch size of at most 2 due to memory limits (I’m using an 8 GB GPU).
I want to increase my batch size because the model is not converging well with such a small batch.
My question is: instead of using gradient accumulation, can I use the following procedure?
“”"
“batch_size” is the required batch size, say 16
“max_batch_size” is the max batches can allocate on limited memory, say 2
1)calculate the loss for batch size 16 in forward pass without grads
2) the pass the batch size of 2 and calculate the gradients
3) update the gradients by averaging on each iteration
“”"
for epoch in range(epochs):
    for sample in dataloader:  # dataloader with batch_size = 16
        inputs, labels = sample
        # Calculate the loss for the full batch of 16 in a forward pass without grads
        with torch.no_grad():
            outputs = model(inputs)
            full_loss = loss_fn(outputs, labels)
        # Pass mini-batches of 2 (because of the memory constraint) and calculate the gradients
        for idx in range(0, batch_size, max_batch_size):
            mini_inputs = inputs[idx:idx + max_batch_size]
            mini_labels = labels[idx:idx + max_batch_size]
            # Forward pass on the mini-batch; backward() accumulates into .grad
            outputs = model(mini_inputs)
            loss = loss_fn(outputs, mini_labels)
            loss.backward()
        # Average the accumulated gradients over the number of mini-batches
        for p in model.parameters():
            if p.grad is not None:
                p.grad /= (batch_size // max_batch_size)
        optimizer.step()
        optimizer.zero_grad()
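
For reference, this is my understanding of the standard gradient accumulation pattern I’m trying to avoid (a minimal sketch using the same model, loss_fn, optimizer, batch_size, and max_batch_size names as above; here I assume the dataloader yields mini-batches of max_batch_size = 2 and that loss_fn uses mean reduction):

accum_steps = batch_size // max_batch_size  # 16 // 2 = 8
for epoch in range(epochs):
    for idx, (inputs, labels) in enumerate(dataloader):
        outputs = model(inputs)
        # Divide by accum_steps so the summed gradients equal the gradient
        # of the mean loss over the effective batch of 16
        loss = loss_fn(outputs, labels) / accum_steps
        loss.backward()
        # Step the optimizer once every accum_steps mini-batches
        if (idx + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()

As far as I can tell, the division by accum_steps here plays the same role as the p.grad averaging step in my procedure above.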