RuntimeError: CUDA error: out of memory in PyTorch

I get this error in the last batches of an epoch while training on a large speech dataset. I tried reducing the batch size (16, 8, 4, 2), but every time the error still occurs at the end of the epoch.
Can someone suggest a solution?

99%|█████████▉| 11706/11812 [29:32<01:08, 1.55it/s, avg_loss=tensor(76.1413, device='cuda:0'), iter=11705, loss=tensor(117.4730, device='cuda:0')]

RuntimeError Traceback (most recent call last)
in
3 start = time.time()
4
----> 5 run_state = run_epoch(model, optimizer, train_ldr, *run_state)
6
7 msg = "Epoch {} completed in {:.2f} (s)."

~/Hasan/Project/SpeechRNNT/speech-master/train.py in run_epoch(model, optimizer, train_ldr, it, avg_loss)
28 optimizer.zero_grad()
29 loss = model.loss(batch)
---> 30 loss.backward()
31
32 grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 200)

~/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
91 products. Defaults to False.
92 """
---> 93 torch.autograd.backward(self, gradient, retain_graph, create_graph)
94
95 def register_hook(self, hook):

~/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
88 Variable._execution_engine.run_backward(
89 tensors, grad_tensors, retain_graph, create_graph,
---> 90 allow_unreachable=True) # allow_unreachable flag
91
92

~/Hasan/Project/SpeechRNNT/speech-master/transducer/functions/transducer.py in backward(self, *args)
78 grads = parent.backward(*args)[0]
79 if self.size_average:
---> 80 grads = grads / grads.shape[0]
81 return grads, None, None, None
82

RuntimeError: CUDA error: out of memory
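
One thing that may be worth checking, since the failure is in the transducer loss backward and does not go away with smaller batches: the out-of-place division `grads / grads.shape[0]` allocates a second tensor as large as `grads`, and the size of that tensor grows with the length of the utterances in the batch, not only with the batch size. If the last batches of the epoch hold the longest utterances (an assumption; it would be the case if the loader sorts or buckets by length), a batch of 2 long utterances can need more memory than a batch of 16 short ones. Below is a rough sketch of that arithmetic in plain Python. The function name `joint_tensor_bytes`, the (B, T, U + 1, V) layout, and all the sizes are hypothetical, not values taken from this dataset or from the transducer code.

```python
def joint_tensor_bytes(batch_size, num_frames, num_labels, vocab_size,
                       bytes_per_elem=4):
    """Bytes needed for one float tensor of shape (B, T, U + 1, V).

    Assumed layout: B utterances, T acoustic frames, U target labels
    (plus one start position), V output classes, float32 by default.
    """
    return batch_size * num_frames * (num_labels + 1) * vocab_size * bytes_per_elem


# Hypothetical batch early in the epoch: 16 short utterances.
short_batch = joint_tensor_bytes(batch_size=16, num_frames=200,
                                 num_labels=30, vocab_size=30)

# Hypothetical batch late in the epoch: only 2 utterances, but long ones.
long_batch = joint_tensor_bytes(batch_size=2, num_frames=1500,
                                num_labels=200, vocab_size=30)

print("short batch: {:.1f} MB".format(short_batch / 1e6))
print("long batch:  {:.1f} MB".format(long_batch / 1e6))
```

With these made-up sizes, the batch of 2 long utterances needs about six times the memory of the batch of 16 short ones for a single tensor of this shape. If that matches what happens here, logging the longest input and label length per batch just before the crash would confirm it.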