RuntimeError: CUDA out of memory?

The same code, with the same setup and the same torch version, was running fine 2 days ago. Today, when I tried to rerun it, I got this error.
I'm using Colab.

/content/WS_DAN_PyTorch-master
Dataset Name:car, Train:[223927], Val:[61159]
Batch Size:[12], Total:::Train Batches:[18661],Val Batches:[5097]
Namespace(action='train', alpha=0.95, batch_size=12, checkpoint_path='checkpoint/car', dataset='car', epochs=80, gpu_ids='0', image_size=512, input_size=448, lr=0.001, model_name='inception', momentum=0.9, multi_gpu=True, optim='sgd', parts=32, print_freq=100, resume='', scheduler='step', use_gpu=True, weight_decay=1e-05, workers=4)
/usr/local/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Start epoch 0 ==========,lr=0.001000
/usr/local/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
[W TensorIterator.cpp:924] Warning: Mixed memory format inputs detected while calling the operator. The operator will output channels_last tensor even if some of the inputs are not in channels_last format. (function operator())
[W TensorIterator.cpp:918] Warning: Mixed memory format inputs detected while calling the operator. The operator will output contiguous tensor even if some of the inputs are in channels_last format. (function operator())
Epoch: [0][0/18661]	Time 4.517 (4.517)	Data 1.092 (1.092)	Loss 10.3476 (10.3476)	Prec@1 0.000 (0.000)	Prec@5 0.000 (0.000)
loss1,loss2,loss3,feature_center_loss 9.355782508850098 9.292245864868164 9.394888877868652 0.9999999403953552
Traceback (most recent call last):
  File "train_bap.py", line 210, in <module>
    train()
  File "train_bap.py", line 145, in train
    train_prec, train_loss = engine.train(state, e)
  File "/content/WS_DAN_PyTorch-master/utils/engine.py", line 58, in train
    _, _, output3 = model(img_crop)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/WS_DAN_PyTorch-master/model/inception_bap.py", line 185, in forward
    ftm = self.Mixed_6e(x) #N x 768 x 17 x 17
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/WS_DAN_PyTorch-master/model/inception_bap.py", line 296, in forward
    branch7x7 = self.branch7x7_1(x)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/WS_DAN_PyTorch-master/model/inception_bap.py", line 422, in forward
    x = self.bn(x)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/functional.py", line 2016, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 10.49 GiB already allocated; 9.81 MiB free; 10.80 GiB reserved in total by PyTorch)

engine code:

    def train(self, state, epoch):
        batch_time = AverageMeter()
        data_time = AverageMeter()
        losses = AverageMeter()
        top1 = AverageMeter()
        top5 = AverageMeter()
        config = state['config']
        print_freq = config.print_freq
        model = state['model']
        criterion = state['criterion']
        optimizer = state['optimizer']
        train_loader = state['train_loader']
        model.train()
        end = time.time()
        for i, (img, label) in enumerate(train_loader):
            # measure data loading time
            data_time.update(time.time() - end)

            target = label.cuda()
            input = img.cuda()
            # compute output
            attention_maps, raw_features, output1 = model(input)
            features = raw_features.reshape(raw_features.shape[0], -1)

            feature_center_loss, center_diff = calculate_pooling_center_loss(
                features, state['center'], target, alfa=config.alpha)

            # update model.centers
            state['center'][target] += center_diff

            # compute refined loss
            # img_drop = attention_drop(attention_maps,input)
            # img_crop = attention_crop(attention_maps, input)
            img_crop, img_drop = attention_crop_drop(attention_maps, input)
            _, _, output2 = model(img_drop)
            _, _, output3 = model(img_crop)

Thanks for your time.

I guess this is the culprit: you are attaching the computation graph here as well. Try using .item() on it and you should be fine.
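
To illustrate the idea, here is a minimal, self-contained sketch of that pattern (a toy loop, not the WS_DAN_PyTorch code; the nn.Linear model, the running_loss accumulator, and the .detach() suggestion for center_diff are stand-ins of my own): anything accumulated purely for bookkeeping should be a plain Python number (via .item()) or a detached tensor, otherwise each batch's autograd graph stays referenced and GPU memory keeps growing.

import torch
import torch.nn as nn

# Toy stand-in for the real model/training loop, only to show the pattern.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(10, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(100):
    x = torch.randn(8, 10, device=device)
    y = torch.randint(0, 2, (8,), device=device)

    out = model(x)
    loss = criterion(out, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # .item() returns a plain Python float, so the graph built for this
    # batch can be freed instead of being kept alive by the accumulator.
    running_loss += loss.item()

    # For non-scalar bookkeeping tensors, .detach() has the same effect:
    # the values are kept but the graph is dropped. Applied to the snippet
    # above, that would look something like (hypothetical):
    #   state['center'][target] += center_diff.detach()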