Torch.matmul error

torch.matmul(torch.randn(16, 4, 7056, 10).cuda(), torch.randn(16, 4, 10, 7056).cuda())

gives the error

RuntimeError: CUDA out of memory. Tried to allocate 11.87 GiB (GPU 0; 10.91 GiB total capacity; 99.79 MiB already allocated; 9.00 GiB free; 4.21 MiB cached)

Is there an alternative?

The output size is

torch.Size([16, 4, 7056, 7056])

So it really does require 11.87 GiB on the GPU: the result holds 16 × 4 × 7056 × 7056 float32 elements at 4 bytes each. If you want the whole output at once, you need a GPU with more than 12 GiB of memory.
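As a quick sanity check, the allocation size in the error message matches the element count times 4 bytes:

16 * 4 * 7056 * 7056 * 4 / 2**30  # ≈ 11.87 GiB, as reported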

You don’t give much context; maybe this is obvious, but in case it’s not:

You can keep the tensors on the CPU, iterate over the first two (batch) dimensions, and send each slice to the GPU for the actual matmul (which operates only on the last two dimensions), as sketched below.
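A minimal sketch of that approach, assuming float32 inputs and enough CPU RAM (roughly 12 GiB) to hold the full result; the names a, b, and out are illustrative:

import torch

a = torch.randn(16, 4, 7056, 10)   # stays on the CPU
b = torch.randn(16, 4, 10, 7056)   # stays on the CPU

out = torch.empty(16, 4, 7056, 7056)  # full result, preallocated on the CPU

for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        # Each slice's matmul only needs one 7056 x 7056 output
        # (about 190 MiB) on the GPU at a time.
        out[i, j] = torch.matmul(a[i, j].cuda(), b[i, j].cuda()).cpu()

If the per-slice transfers are too slow, you could move a whole first-dimension batch (a[i] and b[i]) at once instead, trading GPU memory for fewer host-device copies.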