Unable to allocate CUDA memory even though there is enough cached memory

Reducing to the smallest batch_size = 2 still didn't work. It gives this error:
RuntimeError: CUDA out of memory. Tried to allocate 144.00 MiB (GPU 0; 2.00 GiB total capacity; 1.01 GiB already allocated; 105.76 MiB free; 1.05 GiB reserved in total by PyTorch)

I tried restarting the kernel and similar fixes, but it didn't work.
When running without CUDA, the notebook freezes both locally and in Colab.
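For reference, this is roughly the kind of cleanup I tried between runs: a minimal sketch (assuming a recent PyTorch) that drops Python references, then asks the caching allocator to return unused blocks, and prints the allocated vs. reserved numbers from the error message.

```python
import gc
import torch

def free_cuda_cache():
    """Drop unreferenced tensors, then release cached CUDA blocks back to the driver.

    Returns True if CUDA is available and the cache was cleared, else False.
    """
    gc.collect()  # release Python-side references first, so their blocks become free
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached (reserved but unallocated) memory
        return True
    return False

if free_cuda_cache():
    # These are the two numbers reported in the OOM error:
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")
else:
    print("CUDA not available")
```

Note this only helps with memory that is cached but no longer referenced; it can't free tensors still held by the training loop.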

Oh, it might be a problem in my implementation, since a pretrained network using CUDA works fine.