Out-of-memory error resulting from multiplying Tensors (help needed)

I’m playing around with the LSTM architecture and I added another set of weights, the same dimension as the original set.

    def LSTMCell(input, hidden, w_ih, w_ih2, w_hh, w_hh2, b_ih=None, b_hh=None):
        hx, cx = hidden
        # Element-wise product of the two weight sets; result keeps the original shape
        w_ih = torch.mul(w_ih2, w_ih)
        w_hh = torch.mul(w_hh2, w_hh)
I get a `RuntimeError: cuda runtime error (2) : out of memory at c:\programdata\miniconda3\conda-bld\pytorch_1524543037166\work\aten\src\thc\generic/THCStorage.cu:58` at the line `w_hh = (w_hh2 * w_hh)`.

Anybody have any idea why this is happening?
Does element-wise multiplication of tensors (correction: matrices) cause an unnecessary gradient build-up?
(I need to use element-wise multiplication to keep the original tensor (correction: matrix) size.)

Any help would be appreciated. Thanks!!

I’d recommend printing the sizes of the tensors that you are multiplying.
There might be some broadcasting happening under the hood (for example, a tensor of shape 10000 multiplied by a 10000 x 1 tensor generates a 10000 x 10000 tensor as the result).
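To make the broadcasting point concrete: PyTorch follows NumPy-style broadcasting rules, so a size-1 dimension is silently expanded to match the other operand. A minimal sketch (using NumPy as a stand-in for torch, and small shapes in place of 10000 so it runs instantly):

```python
import numpy as np

# NumPy stand-in for torch tensors; broadcasting rules are the same.
a = np.ones(5)          # shape (5,)    -- think 10000
b = np.ones((5, 1))     # shape (5, 1)  -- think 10000 x 1
c = a * b               # broadcasts to (5, 5) -- i.e. 10000 x 10000!
print(c.shape)  # (5, 5)
```

With the 10000-sized shapes from the example above, the result would hold 10^8 elements instead of the 10^4 you might expect, which is exactly the kind of silent memory blow-up worth ruling out here.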


Forgive me, but I don’t understand what broadcasting means in this context.
The sizes are [4600, 400] for both matrices (I think I made a mistake by calling them tensors), and the output is the same size.
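A quick sanity check of that claim, using the [4600, 400] shapes stated above (NumPy used as a stand-in for torch; broadcasting behaves the same way): multiplying two same-shape matrices element-wise triggers no broadcasting and preserves the shape, so the product itself shouldn't be the source of the blow-up.

```python
import numpy as np

w = np.ones((4600, 400))   # shapes taken from the post
w2 = np.ones((4600, 400))
out = w2 * w               # element-wise; same shape, no broadcasting
print(out.shape)  # (4600, 400)
```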

(Worth noting: the network runs for a couple of seconds before giving the memory error.)