Hi. I have a beginner question that I can’t seem to find a definitive answer to anywhere.
From what I understand, batched training has two benefits:
- Batched operations can be better optimized on the GPU, leading to performance gains.
- The optimizer only calls .step() after the loss has been computed over multiple samples, which makes each update less sensitive to individual samples.
My question is as follows: if we ignore the performance benefits, are these two training loops equivalent?
Proper batching:
model = SomeModel()
optimizer = SomeOptimizer(model.parameters(), ...)
dataloader = DataLoader(dataset, batch_size=10)

for batch_data in dataloader:   # one batch of 10 samples per iteration
    loss = model(batch_data)    # loss computed over the whole batch at once
    loss.backward()
    optimizer.step()            # one parameter update per batch
    optimizer.zero_grad()
Manual batching?:
model = SomeModel()
optimizer = SomeOptimizer(model.parameters(), ...)
dataloader = DataLoader(dataset, batch_size=10)

for batch_data in dataloader:
    for i in range(10):
        loss = model(batch_data[i])  # loss for a single sample
        loss.backward()              # gradients accumulate across the 10 calls
    optimizer.step()                 # still one parameter update per batch
    optimizer.zero_grad()
If they are not equivalent, what is the difference, and can it be remedied without passing batched data into the model?
For context: due to time pressure I’m not able to implement the model with batched inputs, but I would still like to simulate the effects of batched learning (except for the performance benefits).
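In case it helps clarify what I’m after, here is a rough sketch of what I had in mind. This is just my guess: it assumes the batched loss would be the mean over the samples, so I scale each per-sample loss by 1/batch_size before calling backward().

batch_size = 10
for batch_data in dataloader:
    for i in range(batch_size):
        # Scale each per-sample loss so the accumulated gradient
        # matches the gradient of the batch mean (assuming the
        # batched loss would be a mean over the samples).
        loss = model(batch_data[i]) / batch_size
        loss.backward()        # gradients accumulate in .grad
    optimizer.step()           # single update per batch
    optimizer.zero_grad()

Is that the right way to think about it, or am I missing something?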
Thanks in advance!