Hi, I have seen a couple of posts on this error, but no one has written up their solution in detail. I am getting a strange ‘out of memory’ error when I run my imaging pipeline on 2.5 million images; on 150,000 images it works just fine.
This is the error message:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-22-f844cad716f4> in <module>
19 return log.plot_epochs(log=True)
20
---> 21 training_step(n_epochs, data, encoder, decoder, optimizer, criterion)
<ipython-input-22-f844cad716f4> in training_step(n_epochs, data, encoder, decoder, optimizer, criterion)
4 N = len(trn_dl)
5 for i, data in enumerate(trn_dl):
----> 6 trn_loss = train_batch(data, encoder, decoder, optimizer, criterion)
7 #trn_loss = train_batch(data, encoder, decoder, optimizer, criterion, batch_size)
8 pos = epoch + (1+i)/N
<ipython-input-16-b949d8041438> in train_batch(data, encoder, decoder, optimizer, criterion)
13 encoder.zero_grad()
14 loss.backward()
---> 15 optimizer.step()
16 return loss
/opt/conda/lib/python3.8/site-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
86 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
87 with torch.autograd.profiler.record_function(profile_name):
---> 88 return func(*args, **kwargs)
89 return wrapper
90
/opt/conda/lib/python3.8/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
26 def decorate_context(*args, **kwargs):
27 with self.__class__():
---> 28 return func(*args, **kwargs)
29 return cast(F, decorate_context)
30
/opt/conda/lib/python3.8/site-packages/torch/optim/adamw.py in step(self, closure)
90 state['step'] = 0
91 # Exponential moving average of gradient values
---> 92 state['exp_avg'] = torch.zeros_like(p, memory_format=torch.preserve_format)
93 # Exponential moving average of squared gradient values
94 state['exp_avg_sq'] = torch.zeros_like(p, memory_format=torch.preserve_format)
RuntimeError: CUDA out of memory. Tried to allocate 2.19 GiB (GPU 0; 15.78 GiB total capacity; 14.21 GiB already allocated; 144.75 MiB free; 14.29 GiB reserved in total by PyTorch)
The batch size for the dataloader is 32, but I’ve tried anywhere between 5 and 32 and it makes no difference to the CUDA memory usage. The numbers in the error add up: with 14.21 GiB already allocated and only ~145 MiB free, the 2.19 GiB that AdamW requests for its optimizer state can’t be satisfied. I’ve also found that a lot of memory is used up right after the ResNet forward pass.
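This is roughly how I checked the per-stage usage (a sketch, not my exact code; images and captions stand in for one batch from the dataloader, and the decoder call signature is an assumption):

import torch

def report(tag):
    # Print currently allocated GPU memory at each stage
    print(f"{tag}: {torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")

report("before forward")
features = encoder(images.to(device))             # ResNet forward pass
report("after encoder")
outputs = decoder(features, captions.to(device))  # assumed decoder signature
report("after decoder")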
The basic nn stack is:
- encoder: a ResNet with the final fc layer removed, so it exposes only the transfer-learning features (see the sketch after this list)
- decoder: an LSTM seq-to-seq model with one linear layer
The goal is to predict image captions for a set of up to 2.5 million images.
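A rough sketch of how the stack is built (the backbone choice, freezing, and all layer sizes are assumptions, not my exact code):

import torch
import torch.nn as nn
from torchvision import models

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(pretrained=True)  # assumed backbone
        # Drop the final fc layer so only the pooled features are exposed
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        for p in self.backbone.parameters():
            p.requires_grad = False  # transfer learning: freeze the backbone

    def forward(self, images):
        return self.backbone(images).flatten(1)  # (batch, 2048) feature vectors

class Decoder(nn.Module):
    def __init__(self, feature_size=2048, hidden_size=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feature_size)
        self.lstm = nn.LSTM(feature_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)  # the one linear layer

    def forward(self, features, captions):
        # Prepend the image feature as the first step of the input sequence
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.linear(hiddens)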
The training code is below:
def training_step(n_epochs, data, encoder, decoder, optimizer, criterion):
    for epoch in range(n_epochs):
        if epoch == 5: optimizer = torch.optim.AdamW(params, lr=1e-4)  # params is defined elsewhere in the notebook
        N = len(trn_dl)  # trn_dl, val_dl, and log also come from the enclosing scope
        for i, data in enumerate(trn_dl):
            trn_loss = train_batch(data, encoder, decoder, optimizer, criterion)
            #trn_loss = train_batch(data, encoder, decoder, optimizer, criterion, batch_size)
            pos = epoch + (1+i)/N
            log.record(pos=pos, trn_loss=trn_loss, end='\r')
        N = len(val_dl)
        for i, data in enumerate(val_dl):
            val_loss = validate_batch(data, encoder, decoder, criterion)
            #val_loss = validate_batch(data, encoder, decoder, criterion, batch_size)
            pos = epoch + (1+i)/N
            log.record(pos=pos, val_loss=val_loss, end='\r')
        log.report_avgs(epoch+1)
    return log.plot_epochs(log=True)

training_step(n_epochs, data, encoder, decoder, optimizer, criterion)
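train_batch itself isn’t shown above; based on the traceback frames, it ends like this (the batch layout, forward pass, and loss computation are my reconstruction, while the last four lines match the traceback):

def train_batch(data, encoder, decoder, optimizer, criterion):
    images, captions = data                                    # assumed batch layout
    features = encoder(images.to(device))
    outputs = decoder(features, captions[:, :-1].to(device))   # teacher forcing (assumed)
    loss = criterion(outputs.permute(0, 2, 1), captions.to(device))
    decoder.zero_grad()
    encoder.zero_grad()
    loss.backward()
    optimizer.step()
    return loss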
The memory usage is as follows:
- 0 after basic data loading of the images using the custom DataLoader
- jumps to 7295012864 bytes (about 6.8 GiB) after sending the encoder and decoder .to(device)
I tried the following to empty the cache, and it did nothing:

print(torch.cuda.memory_allocated())
print(torch.cuda.memory_cached())   # deprecated alias of torch.cuda.memory_reserved()
torch.cuda.empty_cache()
print(torch.cuda.memory_cached())
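For what it’s worth, torch.cuda.memory_summary() gives a much more detailed breakdown of allocated vs. reserved memory than the raw counters; I can post that output too if it would help:

print(torch.cuda.memory_summary(device=0, abbreviated=True))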
If we figure out the problem, I will post a detailed answer for future forum users. I’d appreciate any suggestions on how to fix this.