I am having a hard time finding an efficient solution to the following problem.
I have a tensor A with dimension (11000, 11000)
I also have a tensor H with dimension (11000, 20)
I want to populate a tensor C with the following:
The dimension of C should be (11000, 20)
The rows of C come from multiplying all the values in A by all the values in H in a batched manner and then reducing over a dimension of A: C = (A.unsqueeze(1) * H.unsqueeze(2)).sum(dim=2).
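If that expression is exactly what's intended, note that the broadcasted intermediate holds A[i, k] * H[i, j], and summing it over k factors H[i, j] out, leaving only the row sums of A. So the same result is available without ever materializing the (11000, 20, 11000) intermediate. A small NumPy sketch (NumPy's broadcasting mirrors PyTorch's here; shapes are shrunk for the demo):

```python
import numpy as np

# Shrunk stand-ins for A (11000, 11000) and H (11000, 20)
rng = np.random.default_rng(0)
A = rng.random((7, 7)).astype(np.float32)
H = rng.random((7, 3)).astype(np.float32)

# The batched form: A[:, None, :] * H[:, :, None] has shape (7, 3, 7),
# with entry [i, j, k] equal to A[i, k] * H[i, j]
C_batched = (A[:, None, :] * H[:, :, None]).sum(axis=2)

# Summing A[i, k] * H[i, j] over k factors H[i, j] out, leaving the row sums
# of A -- no (N, lstm_size, N) intermediate is needed
C_cheap = H * A.sum(axis=1, keepdims=True)

assert np.allclose(C_batched, C_cheap)
```

In PyTorch that would be C = H * A.sum(dim=1, keepdim=True). If the intent was instead a matrix product C[i, j] = sum_k A[i, k] * H[k, j], then torch.matmul(A, H) (or A @ H) likewise avoids the giant intermediate.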
I tried increasing the value 20 (this is the lstm_size in my VAE model) to a higher number, and it causes very high memory usage in the calculation.
Below is the code to reproduce the high memory consumption
import torch

A = torch.rand(11000, 11000)
H = torch.rand(11000, 200)
# Broadcasts to an intermediate of shape (11000, 200, 11000) before the reduction
C = (A.unsqueeze(1) * H.unsqueeze(2)).sum(dim=2)
The above code is an example of how the calculation is written in my model, and it causes the session to crash.
When the model is run, I get the following error message, which points at the computation of C:
CUDA out of memory. Tried to allocate 180.30 GiB (GPU 0; 15.90 GiB total capacity; 6.81 GiB already allocated; 8.10 GiB free; 7.10 GiB reserved in total by PyTorch)
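For what it's worth, the 180.30 GiB request lines up with the broadcasted intermediate: (A.unsqueeze(1) * H.unsqueeze(2)) has shape (11000, 200, 11000), and at 8 bytes per element that works out to exactly the number in the error. A quick sanity check (the 8-bytes-per-element figure is my assumption — torch.rand defaults to float32, so the allocator is presumably accounting for more than one buffer):

```python
# Memory needed by the broadcasted intermediate (A.unsqueeze(1) * H.unsqueeze(2)),
# which has shape (11000, 200, 11000) before the .sum(dim=2) reduction.
n, lstm_size = 11000, 200
elements = n * lstm_size * n  # 24.2 billion elements

# Assumption: 8 bytes per element (float32 would be 4, but 8 reproduces
# the figure in the OOM message exactly)
gib = elements * 8 / 2**30
print(f"{gib:.2f} GiB")  # → 180.30 GiB
```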