How can you train your model on large batches when your GPU can’t hold more than a few samples?

Hi @ptrblck

I am writing this message to you as you have always helped me with very good answers.

I am doing Kaggle competitions, but I keep running into the problem that I can't use a bigger batch size, and with a small one I get really bad results.

I have two 2080 Ti cards with 11 GB of memory each. Training on 300x300 images with a batch size of 8 gives me very bad results, and with 16 it always tells me that CUDA ran out of memory…

Can you help me, please?

You can accumulate the gradients over multiple mini-batches and then perform a single optimizer step, to simulate a larger batch size.

@carloalbertobarbano how does this work? What do you mean by accumulating gradients over mini-batches and doing a single optimizer step? Can you elaborate a little, and if you have code I would appreciate it.

Thanks

You want batch_size=16, but you can only fit 8 images in memory: so you accumulate the gradients over two mini-batches of size 8 and perform the optimization step every two iterations (2*8 = 16). Your code would look something like this:

dataloader = DataLoader(.., batch_size=8, ..)

for i, (minibatch, labels) in enumerate(dataloader):
    output = model(minibatch)
    loss = criterion(output, labels)
    loss.backward()                  # gradients are accumulated across iterations

    if (i + 1) % 2 == 0:             # every two mini-batches of 8 -> effective batch of 16
        optimizer.step()             # update weights with the accumulated gradients
        optimizer.zero_grad()        # reset gradients for the next accumulation cycle
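
One detail that isn't shown above: most criteria (e.g. nn.CrossEntropyLoss) average the loss over the mini-batch, so accumulating two backward passes gives gradients roughly twice as large as a real batch of 16 would. A minimal sketch of the same loop with that correction, assuming a mean-reducing criterion (accumulation_steps is just a name I'm introducing here):

accumulation_steps = 2                      # 2 mini-batches of 8 -> effective batch of 16

for i, (minibatch, labels) in enumerate(dataloader):
    output = model(minibatch)
    loss = criterion(output, labels) / accumulation_steps   # rescale so the summed gradients match a real batch of 16
    loss.backward()

    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()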

@carloalbertobarbano thanks! Let me try it! So I can use any multiple of 8, like

if (i + 1) % 4 == 0:
    optimizer.step()
    optimizer.zero_grad()

and this would be like a batch size of 32, right?

Yes, that’s correct :smiley:
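
In general, the effective batch size is just the DataLoader batch size times the number of accumulated iterations; with the names from the snippet above (and accumulation_steps as a made-up counter):

effective_batch_size = batch_size * accumulation_steps   # 8 * 2 = 16, 8 * 4 = 32, and so on

so any multiple of 8 works, as long as you call optimizer.step() and optimizer.zero_grad() every accumulation_steps iterations.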

@carloalbertobarbano, by any chance would you know how to implement this in fastai?

# https://gist.github.com/thomwolf/ac7a7da6b1888c2eeac8ac8b9b05d3d3
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i+1) % evaluation_steps == 0:           # Evaluate the model when we...
            evaluate_model() 

Nope, sorry, I don't know about fast.ai. But that code looks right.
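
One small note on that snippet: model.zero_grad() and optimizer.zero_grad() do the same thing there, as long as the optimizer was built over all of the model's parameters, e.g. something like

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # hypothetical optimizer; any torch.optim optimizer behaves the same way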

@carloalbertobarbano Thank you amigo!
