When should a variable be cuda'd?

A seemingly very basic question but I’ve gone through a few tutorials and am still at a loss. :rimshot:

What are the basic rules of thumb for deciding when something should have an if statement with .cuda()?

It almost seems random in some of the tutorials. Some parts of a model class have to return tensors with .cuda() applied, while other parts don't, even when the whole model itself then gets .cuda(). The main tutorial I've spent time with is the really good seq2seq one. Initially I downloaded the older version from the GitHub repo that didn't have the cuda options included, so I spent some time changing random things to .cuda() until it worked. There must be a more sensible way of understanding what needs .cuda().

It should almost always be possible to get away with only two calls to .cuda(): one on the data and one on the model. Everywhere else you should use tensor.new to create tensors on the same device as other tensors you already have.
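For concreteness, here's a minimal sketch of that pattern; the model, sizes, and input shapes are placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical model and input; names and sizes are made up.
model = nn.LSTM(input_size=128, hidden_size=256)
inputs = torch.randn(10, 4, 128)  # (seq_len, batch, input_size)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model = model.cuda()    # call 1: move all model parameters to the GPU
    inputs = inputs.cuda()  # call 2: move the data to the GPU

output, _ = model(inputs)   # runs on whichever device both live on
```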


That's great, thanks for the suggestion. Just to clarify what you mean by judicious use of tensor.new: I haven't been able to find any documentation for tensor.new. I'm guessing the general idea is that whenever you define a tensor to wrap in a Variable, you create it with something like existing_tensor.new(x, y).zero_(), so it lands on the same device and you avoid an is_cuda if statement.
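For reference, a small sketch of that idea (the tensor names are made up):

```python
import torch

x = torch.randn(4, 8)
if torch.cuda.is_available():
    x = x.cuda()  # pretend x arrived from the data pipeline

# Older idiom: x.new(...) allocates an uninitialized tensor with the
# same dtype and device as x, which you then zero in place.
h = x.new(4, 8).zero_()

# Equivalent, more recent spellings:
h2 = x.new_zeros(4, 8)
h3 = torch.zeros_like(x)

assert h.is_cuda == x.is_cuda  # no is_cuda branching required
```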

As for the data, I'm going to experiment with this myself soon, but I thought I'd add a question for anyone else who comes along. Torchtext and the regular DataLoader both create iterators. Can you run .cuda() on the batch iterator, or do you have to run it on every Variable batch that comes out of the iterator?
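For what it's worth, a DataLoader has no .cuda() method itself, so the usual pattern seems to be moving each batch as it comes out of the loop. A toy sketch (the dataset and model are purely illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model purely for illustration.
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)
model = nn.Linear(16, 2)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model = model.cuda()

# The iterator itself stays on the CPU; each batch is moved as it comes out.
for inputs, targets in loader:
    if use_cuda:
        inputs, targets = inputs.cuda(), targets.cuda()
    output = model(inputs)
```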

There are examples in the seq2seq tutorial of the encoder and decoder models being cuda'd.

EDIT: I just noticed that torchtext does indeed already have a built-in option for specifying the device when creating the batch iterators, as sketched below.
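A hedged sketch of that option, using the older torchtext API (moved under torchtext.legacy in later releases). The field and dataset here are empty placeholders, and note that older torchtext versions expected an int device (-1 for CPU, 0 for the first GPU) rather than a torch.device:

```python
import torch
from torchtext import data  # torchtext.legacy.data in newer releases

# Placeholder field/dataset; in a real script these come from your corpus.
TEXT = data.Field()
train = data.Dataset(examples=[], fields=[('text', TEXT)])

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_iter = data.BucketIterator(train, batch_size=32, device=device)
# Batches from train_iter now carry tensors already on `device`.
```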