Hi,
I’m trying to compute the following batched matrix product:
x = torch.rand((256, 32, 32, 1, 256)).cuda()
y = torch.rand((256, 1, 1, 256, 1024)).cuda()
z = x @ y
expecting to obtain:
z → torch.Size([256, 32, 32, 1, 1024])
however I get:
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 10.92 GiB total capacity; 512.00 MiB already allocated; 9.78 GiB free; 514.00 MiB reserved in total by PyTorch)
Is there a way to perform this operation efficiently so that it fits in GPU RAM?
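If I’m reading the error right, matmul is materializing the broadcast copy of y with shape (256, 32, 32, 256, 1024), which at float32 is exactly the 256 GiB it tries to allocate. One workaround I’ve been considering (sketched below with scaled-down shapes so it runs anywhere, on the assumption that the singleton dims in y exist only for broadcasting) is to fold the two spatial dims of x into the matmul’s row dimension and use torch.bmm, so no broadcast copy is ever created:

```python
import torch

# Scaled-down stand-ins for the real shapes
# (real values: B=256, H=W=32, K=256, N=1024)
B, H, W, K, N = 4, 8, 8, 16, 32
x = torch.rand(B, H, W, 1, K)
y = torch.rand(B, 1, 1, K, N)

# x @ y would broadcast y up to (B, H, W, K, N) before multiplying;
# with the real sizes that intermediate is the 256 GiB allocation.
# Folding H*W into the row dimension avoids it entirely:
z = torch.bmm(x.reshape(B, H * W, K), y.squeeze(1).squeeze(1))  # (B, H*W, N)
z = z.view(B, H, W, 1, N)

# Sanity-check against the direct broadcasting product (small sizes only)
assert torch.allclose(z, x @ y, atol=1e-5)
print(z.shape)
```

With the real shapes the output z is about 1 GiB and x, y about 256 MiB each, which should fit comfortably in the 10.92 GiB card. Is this the right approach, or is there a cleaner built-in way?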