OOM error during training in a 3D side project

My code: genji970/gaussian-occupancy-prediction-with-nuscene. It uses 3D Gaussian splatting to generate dense voxels from a sparse point cloud and performs voxel occupancy prediction. The input is the 6 multi-view cameras from the nuScenes dataset and the labels come from the nuScenes lidar data. The repo covers the pipeline from data processing to training.

======
Loading NuScenes tables for version v1.0-trainval...
23 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
Done loading in 68.971 seconds.
======
Reverse indexing ...
Done reverse indexing in 50.5 seconds.
======
  0%|          | 3/34149 [00:01<5:01:35,  1.89it/s]
[DEBUG] Loaded 3 multiview samples
Traceback (most recent call last):
  File "...\train.py", line 132, in <module>
    occ_feat = occupancy_decoder(gaussian_embed, voxel_coords)
  File "...\.venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "...\gaussian_encoder\gaussian_decoder\gaussian_decoder.py", line 50, in forward
    dist = chunked_cdist(xyz, anchor_grid, chunk_size=512)  # (Nv, N)
  File "...\utils\ops.py", line 20, in chunked_cdist
    dist_chunk = torch.cdist(chunk, anchor_grid)  # (chunk_size, N)
  File "...\.venv\lib\site-packages\torch\functional.py", line 1222, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 48.00 MiB (GPU 0; 4.00 GiB total capacity; 6.88 GiB already allocated; 0 bytes free; 6.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
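
A rough back-of-the-envelope reading of the numbers in the traceback, as a sketch only. It assumes float32 distances and that the failed 48 MiB allocation is exactly one (chunk_size, N) output of torch.cdist; neither is confirmed by the log. It also assumes the chunks end up concatenated into the full (Nv, N) matrix, as the comment in gaussian_decoder.py suggests.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

CHUNK_SIZE = 512         # from chunked_cdist(..., chunk_size=512)
FAILED_ALLOC = 48 * MIB  # "Tried to allocate 48.00 MiB"
TOTAL_VRAM = 4 * GIB     # "4.00 GiB total capacity"
BYTES_PER_ELEM = 4       # assumption: float32 distances

# Assumption: the failed allocation is one (CHUNK_SIZE, N) distance chunk.
num_anchors = FAILED_ALLOC // (CHUNK_SIZE * BYTES_PER_ELEM)


def dist_matrix_bytes(num_rows, num_cols, bytes_per_elem=BYTES_PER_ELEM):
    """Size in bytes of a dense (num_rows, num_cols) distance matrix."""
    return num_rows * num_cols * bytes_per_elem


# Largest number of voxel rows whose full (Nv, N) matrix fits in total VRAM,
# ignoring the model, activations, and everything else on the GPU.
max_rows = TOTAL_VRAM // (num_anchors * BYTES_PER_ELEM)

print(f"implied anchors N: {num_anchors}")
print(f"rows that fit in 4 GiB: {max_rows} (about {max_rows / CHUNK_SIZE:.1f} chunks)")
```

Under these assumptions, chunking keeps each cdist call small, but the concatenated result still grows by about 48 MiB per 512 rows, so a large enough Nv exhausts 4 GiB regardless of chunk_size.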

I tried the following:

1) reducing the batch size
2) calling gc.collect() and torch.cuda.empty_cache()
3) using with torch.cuda.amp.autocast(enabled=False):
4) setting os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:16"
5) chunking the batch data

None of these fixed the error.
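
For reference, steps 2 and 4 can be sketched roughly as below. This is a minimal illustration, not the code from the repo: the helper names free_cached_memory and cuda_memory_mib are hypothetical, and the example tensor is created on the CPU so the snippet also runs on a machine without a GPU.

```python
import gc
import os

# Step 4: set the allocator config before torch is imported, so it is
# already in place before any CUDA memory is allocated.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:16"

import torch


def free_cached_memory():
    """Step 2 (hypothetical helper): collect garbage, then release cached blocks."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def cuda_memory_mib():
    """Hypothetical helper: (allocated, reserved) in MiB, or (0.0, 0.0) without CUDA."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    mib = 1024 ** 2
    return torch.cuda.memory_allocated() / mib, torch.cuda.memory_reserved() / mib


x = torch.zeros(512, 1024)  # a live tensor (CPU here; pass device="cuda" on a GPU)
free_cached_memory()        # x is still referenced, so its memory is not released
allocated, reserved = cuda_memory_mib()
print(f"allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```

torch.cuda.empty_cache() only returns cached blocks that no tensor is using, so it cannot reduce memory held by tensors that are still referenced, for example intermediate results kept for the backward pass.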

My env: GTX 1650, 4 GiB VRAM