Model not releasing memory even with torch.no_grad and torch.cuda.empty_cache()

I created a model following this transfer learning tutorial, but when I use it the GPU runs out of memory: every call to model(tensor) occupies some memory, and the larger the tensor, the more memory is occupied. This memory is not released even after torch.cuda.empty_cache(), and even when I wrap the call in with torch.no_grad():.

This is the error I get:
CUDA out of memory. Tried to allocate 5.86 GiB. GPU 0 has a total capacity of 15.89 GiB of which 3.68 GiB is free. Process 3018 has 12.21 GiB memory in use. Of the allocated memory 11.74 GiB is allocated by PyTorch, and 181.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

The "Process 3018" line is something I'm seeing for the first time; however, it is most probably just PyTorch itself.
Here is the Kaggle link

Calling torch.cuda.empty_cache() will clear the cache but will not release memory that is still allocated and in use. If you want to release all memory, you first need to delete all objects that hold allocations. The error itself could also be expected, assuming an intermediate tensor must be created during the forward pass and its allocation triggers the OOM.
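A small sketch of the deletion-then-cache-clearing order (guarded so it is a no-op without a GPU; the tensor here is just an example allocation):

```python
import torch

# empty_cache() only returns *cached* blocks to the driver; memory still
# referenced by a live tensor cannot be freed until the reference is dropped.
if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print(torch.cuda.memory_allocated())   # nonzero: x is live

    torch.cuda.empty_cache()               # no effect: x still holds its memory
    print(torch.cuda.memory_allocated())   # unchanged

    del x                                  # drop the last reference first...
    torch.cuda.empty_cache()               # ...then the cached block can be released
    print(torch.cuda.memory_allocated())   # back down for this tensor
else:
    print("CUDA not available; nothing to demonstrate")
```

The same applies to the model itself and any outputs you keep around: del them (or let them go out of scope) before calling torch.cuda.empty_cache().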