Do I need to call .cuda() on the optimizer and criterion?

Calling .cuda() on an Adam optimizer gives me AttributeError: 'Adam' object has no attribute 'cuda', while calling .cuda() on the criterion works fine. Also, if I have 2 GPUs, do I need to do anything differently? When I look at my GPU usage it’s very spiky: it goes up and down, with long stretches of zero usage. I followed the data parallelism tutorial to use 2 GPUs for the model, and that’s about it. Can someone give me a list of the things I should call .cuda() on?
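
To make the setup concrete, here is a minimal sketch of the kind of script described above, assuming PyTorch. The model, tensor sizes, and learning rate are made-up placeholders, not the original code. It uses .to(device) instead of .cuda() only so that it also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch runs anywhere; on a GPU machine this is "cuda".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model (hypothetical sizes), wrapped as in the data parallelism tutorial.
model = nn.Linear(10, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# The criterion is an nn.Module, so it can be moved like the model.
criterion = nn.CrossEntropyLoss().to(device)

# The optimizer is built from the (already moved) parameters.
# It has no .cuda() method, which is where the AttributeError comes from.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer_has_cuda = hasattr(optimizer, "cuda")

# Placeholder batch; inputs and targets are moved to the same device as the model.
inputs = torch.randn(8, 10).to(device)
targets = torch.randint(0, 2, (8,)).to(device)

# One training step.
optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
```

In this sketch the things moved to the device are the model, the criterion, and each batch of inputs and targets. The optimizer is not moved; it only holds references to the model's parameters.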