I wrote a custom network that uses custom RNN cells that I also wrote. Everything works on CPU and I now try to get it to run on a GPU for faster training.
I'm registering all weights, as well as a weight mask (with `requires_grad=False`), as Parameters in my RNN cells, so they get moved to the GPU by a single `.to(device)` call on my network. This seems to work. However, in the forward pass of my network I use three constructs that cause some trouble. (The code is too large to paste here, so I'll explain as well as I can.)
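For reference, the registration pattern looks roughly like this; it's a minimal sketch, and the names (`MaskedRNNCell`, `weight_ih`, `weight_hh`, `weight_mask`) are placeholders, not my actual code:

```python
import torch
import torch.nn as nn

class MaskedRNNCell(nn.Module):
    """Hypothetical minimal cell showing the registration pattern."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Trainable weights, registered as Parameters as usual.
        self.weight_ih = nn.Parameter(torch.randn(hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.randn(hidden_size, hidden_size))
        # Fixed zero-one mask: a Parameter with requires_grad=False still
        # follows the module in .to(device) but never receives gradients.
        self.weight_mask = nn.Parameter(
            torch.ones(hidden_size, hidden_size), requires_grad=False
        )

    def forward(self, x, h):
        # Apply the mask to the recurrent weights before the update.
        masked_hh = self.weight_hh * self.weight_mask
        return torch.tanh(x @ self.weight_ih.t() + h @ masked_hh.t())
```

With this, `model.to(device)` moves the mask along with the trainable weights. (As an aside, `register_buffer` exists for exactly this kind of non-trainable state.)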
- A Python list `contexbuffer`, to which I append hidden states that I have computed and need to access again later in the same forward pass.
- A torch tensor `output`, initialized with zeros, that I fill with the final outputs after all hidden states are computed.
- Torch tensors `state_mask`: zero-one masks, of which at least one is generated per point in the sequence and passed together with the input and hidden state to the RNN cell to compute the next hidden state.
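To make the three constructs concrete, a stripped-down sketch of the forward pass (all names here, including `run_sequence` and the `cell` callable, are placeholders for illustration only):

```python
import torch

def run_sequence(cell, inputs, hidden_size):
    """Placeholder sketch of the forward pass described above."""
    seq_len, batch, _ = inputs.shape
    context_buffer = []  # hidden states kept for reuse later in the same pass
    # Both tensors below land on the CPU by default, regardless of where
    # `inputs` lives:
    output = torch.zeros(seq_len, batch, hidden_size)
    h = torch.zeros(batch, hidden_size)
    for t in range(seq_len):
        # One zero-one mask per point in the sequence, also created on the CPU:
        state_mask = (torch.rand(batch, hidden_size) > 0.5).float()
        h = cell(inputs[t], h, state_mask)
        context_buffer.append(h)
        output[t] = h
    return output
```

Because `torch.zeros` and `torch.rand` default to the CPU, none of these tensors follow the network when it is moved with `.to(device)`.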
I tried moving the hidden states from the context buffer, as well as the state masks, to the GPU right before the computation in the RNN cell. This works, but it feels far from ideal. However, I didn't manage to get `output` onto the GPU; I always get

`Expected object of backend CPU but got backend CUDA for argument #2 'target'`

when calculating the loss after calling `.to(device)` on the target.
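Concretely, the situation looks like this (the `criterion` and the shapes are stand-ins for my actual setup): `output` is built on the CPU inside the forward pass, while the target has been moved, so the loss call sees two different backends:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # stand-in for the actual loss function
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

output = torch.zeros(10, 5)              # built inside forward(), lands on CPU
target = torch.randn(10, 5).to(device)   # target moved with .to(device)

if output.device != target.device:
    # On a CUDA machine this branch is taken; calling
    # criterion(output, target) here raises the backend error quoted above.
    print("device mismatch:", output.device, "vs", target.device)
else:
    loss = criterion(output, target)
    print("loss:", loss.item())
```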
Is there some way to initialize these lists/tensors so that they are moved to the GPU together with the rest of the network?