We are running a self-attention mechanism on top of a conv1d network. We get a CUDA out-of-memory (OOM) error when we multiply the query tensor by the key tensor. Both the query and key tensors have size [16, 4, 22778]:
t_1 = torch.bmm(proj_query, proj_key)
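For context, here is a minimal sketch of the shapes involved, using a small sequence length for illustration (the assumption here, since it is not shown above, is that `proj_query` is permuted to [B, n, C] before the `bmm`, which is what makes the result an n × n score matrix):

```python
import torch

B, C, n = 16, 4, 8  # the real n is 22778; kept small here for illustration
proj_query = torch.randn(B, C, n).permute(0, 2, 1)  # [B, n, C]
proj_key = torch.randn(B, C, n)                     # [B, C, n]
t_1 = torch.bmm(proj_query, proj_key)               # [B, n, n] attention scores
print(t_1.shape)  # torch.Size([16, 8, 8])
```

With n = 22778 the output is [16, 22778, 22778], which is where the allocation blows up.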
We have tried converting our tensors to half precision; however, we still get the same OOM error:
RuntimeError: CUDA out of memory. Tried to allocate 15.46 GiB (GPU 0; 14.76 GiB total capacity; 53.88 MiB already allocated; 13.97 GiB free; 80.00 MiB reserved in total by PyTorch)
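If our arithmetic is right, the requested 15.46 GiB is exactly the size of the half-precision attention matrix itself, so casting the inputs to fp16 does not help: the N × N score tensor dominates the memory, not the inputs.

```python
B, n = 16, 22778
# [16, 22778, 22778] score matrix at 2 bytes per fp16 element
bytes_fp16 = B * n * n * 2
print(bytes_fp16 / 2**30)  # ≈ 15.46 GiB, matching the error message
```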
Do you have any recommendations to fix this issue?