Tracing failed sanity check! CUDA out of memory on simple dot product

Hello community,

I have been having issues with GPU memory and the performance of torch methods for a while. Recently, I've been trying to use torch.jit to speed things up, but I keep running into memory issues and have no clue whether I am heading in the right direction.

Since torch doesn't seem to have an optimized way to do element-wise multiplication with complex numbers (refer to the previous link), I decided to put together a small experiment for it. The complex values are stored with the real and imaginary parts in the last dimension, so the product is computed as (a + bi)(c + di) = (ac - bd) + (ad + bc)i.

When I move up to the tensor shapes (dimensions) my application will actually use, I keep getting CUDA out-of-memory exceptions.

My assumption is a memory leak, since my sample is not nearly large enough to fill the 14.73 GB of GPU memory that Colab provides (a quick check for this is sketched below, after the snippet).

This is my code snippet:

import torch

def complex_dot(x, y):
  # The last dimension of size 2 holds the real (index 0) and imaginary (index 1) parts.
  a, b = x[:, :, :, :, :, 0], x[:, :, :, :, :, 1]
  c, d = y[:, :, :, :, :, 0], y[:, :, :, :, :, 1]
  return torch.stack([a*c - b*d, a*d + b*c], dim=-1)

Complete code available on Gist
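
To check whether memory actually accumulates across calls (rather than a single call simply allocating too much), something like the following can be run; the shapes here are placeholders, not the ones from the Gist:

import torch

# Placeholder shapes; substitute the real dimensions from the experiment.
x = torch.randn(2, 4, 8, 64, 64, 2, device="cuda")
y = torch.randn(2, 4, 8, 64, 64, 2, device="cuda")

torch.cuda.reset_peak_memory_stats()
for i in range(5):
  out = complex_dot(x, y)
  torch.cuda.synchronize()
  alloc = torch.cuda.memory_allocated() / 1024**3
  peak = torch.cuda.max_memory_allocated() / 1024**3
  print(f"iter {i}: allocated {alloc:.2f} GiB, peak {peak:.2f} GiB")

If the allocated number stays flat across iterations there is no leak, and the problem is the peak itself: each product and the final stack allocate a fresh tensor, so the peak can be several times the size of the inputs.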

P.S.: This happens regardless of whether I use jit.
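
For reference, here is an untested sketch of a lower-memory variant (assuming the same real/imaginary layout in the last dimension and float inputs) that writes each term into a preallocated output, so only one intermediate tensor is alive at a time:

import torch

def complex_dot_lowmem(x, y):
  # Same math as complex_dot, but computed term by term into a
  # preallocated output instead of stacking several temporaries.
  a, b = x[..., 0], x[..., 1]
  c, d = y[..., 0], y[..., 1]
  out = torch.empty_like(x)
  out[..., 0] = a * c
  out[..., 0] -= b * d
  out[..., 1] = a * d
  out[..., 1] += b * c
  return out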