CUDA out of memory during training step

I trained my model on two GPUs with DDP and it worked well.
But today it always shows "CUDA out of memory" during the training step, and it always stops at the same training iteration (247).
I monitored memory with watch -n 1 nvidia-smi and it stayed normal up to iteration 247. At iteration 247, memory suddenly increases until it exceeds 32 GB.
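In case it helps, this is roughly how GPU memory could also be logged from inside the training loop using PyTorch's torch.cuda API (a minimal sketch; log_gpu_memory is a hypothetical helper, not part of my actual code):

import torch

def log_gpu_memory(device=0):
    # Memory held by live tensors vs. memory reserved by PyTorch's caching allocator
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 3
    reserved = torch.cuda.memory_reserved(device) / 1024 ** 3
    peak = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    print(f"allocated: {allocated:.2f} GiB, reserved: {reserved:.2f} GiB, peak: {peak:.2f} GiB")

Calling something like this once per iteration might show whether the allocation really jumps only at iteration 247, matching what nvidia-smi reports.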

2022-08-16 14:15:03.840 | INFO     | utils.fit:loss_log:89 - loss_all: 0.000, heatmap_loss: 0.000, xy_offset_loss: 0.000, z_offset_loss: 0.000, wlh_loss: 0.000, angle_bin: 0.000, angle_offset: 0.000, cls_loss: 0.000
2022-08-16 14:15:03.841 | INFO     | utils.fit:training_step:185 - Parameter containing:
tensor([[-0.2428,  0.0316, -0.2120,  0.2174,  0.2807,  0.2077,  0.0292,  0.1832,
          0.1201],
        [-0.2144,  0.0829, -0.0840, -0.0549, -0.0242,  0.1138, -0.2576, -0.0103,
          0.3143],
        [ 0.0353,  0.0268, -0.2982,  0.0269, -0.0893,  0.0274,  0.1253,  0.1671,
         -0.3090]], device='cuda:0', requires_grad=True)
2022-08-16 14:15:03.843 | INFO     | utils.fit:training_step:186 - epoch: 72/200, iter: 247/906 optim_lr: 0.00010000000000000002, sche_lr: 0.00010000000000000002
time cost : 3.46462 sec
Traceback (most recent call last):
  File "3D_train.py", line 242, in <module>
    trainer.train()
  File "3D_train.py", line 235, in train
    self.fit_func.fit(epoch)
  File "/data/lianghao/lidar_and_4D_imaging_radar_fusion_demo/3D-MAN-reproduction/utils/fit.py", line 354, in fit
    all_loss = self.training_step(epoch)
  File "/data/lianghao/lidar_and_4D_imaging_radar_fusion_demo/3D-MAN-reproduction/utils/fit.py", line 150, in training_step
    output = self.model(lidar_pillar, self.opts) # time cost : 0.03377 sec
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/lianghao/lidar_and_4D_imaging_radar_fusion_demo/3D-MAN-reproduction/model/FSD_module.py", line 123, in forward
    pesu_img = self.lidar_branch(pillars)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/lianghao/lidar_and_4D_imaging_radar_fusion_demo/3D-MAN-reproduction/model/FSD_module.py", line 81, in forward
    x = self.PFNLayer(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/lianghao/lidar_and_4D_imaging_radar_fusion_demo/3D-MAN-reproduction/model/FSD_module.py", line 64, in forward
    x = self.norm(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 136, in forward
    self.weight, self.bias, bn_training, exponential_average_factor, self.eps)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 2058, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 1.83 GiB (GPU 0; 31.75 GiB total capacity; 28.34 GiB already allocated; 106.50 MiB free; 29.10 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', '-u', '3D_train.py', '--local_rank=1']' returned non-zero exit status 1.

I resumed from a checkpoint file, so the starting epoch is 72.
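For reference, the resume step follows the usual torch.load / load_state_dict pattern (a rough sketch only; the file name, model, and state-dict keys below are placeholders, not my actual code). Loading with map_location="cpu" keeps the restored tensors off cuda:0 until the model is explicitly moved there:

import torch
import torch.nn as nn

model = nn.Linear(9, 3)  # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# map_location="cpu" avoids the checkpoint tensors landing on cuda:0 during loading
checkpoint = torch.load("checkpoint_epoch_72.pth", map_location="cpu")  # placeholder path
model.load_state_dict(checkpoint["model_state_dict"])          # placeholder keys
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # resumes at epoch 72 in my case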