Basic requirements for doing GPU-based computations

I’m using the PyTorch tensor library for tensor-based computations. I’m implementing a particular type of spiking neural network, and I don’t use PyTorch’s `nn.Module`; I just use tensors and their methods, plus some functions from `torch.nn.functional`.

Currently, to have a single codebase that can run on both GPU and CPU, I just set the device for my tensors, assuming that all computations involving GPU-allocated tensors will be performed entirely on the GPU. While monitoring my GPU usage, I found that it never reaches maximum capacity, even for a large instance of the problem (though I’m not sure it was large enough!). This made me doubt my assumption. Is it enough to put all the tensors on the GPU to have all the computations run on the GPU, or do I have to explicitly call `.cuda()` for all the built-in methods and functions?

It should be sufficient to move your tensors to the GPU: any operation whose inputs live on the GPU is dispatched to CUDA kernels automatically, without any extra `.cuda()` calls on the methods themselves.
You can check where a tensor lives by printing `tensor.device`.
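As a minimal sketch of the device-agnostic pattern (the shapes here are arbitrary, chosen just for illustration):

```python
import torch
import torch.nn.functional as F

# Pick the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors can be created directly on the device, or moved with .to().
x = torch.randn(1024, 1024, device=device)
w = torch.randn(1024, 1024).to(device)

# Both the matmul and the relu run on whatever device the inputs are on;
# no explicit .cuda() call on the functions is needed.
y = F.relu(x @ w)

print(y.device)  # confirms where the result (and the computation) lives
```

Note that mixing devices raises an error (e.g. multiplying a CPU tensor by a GPU tensor), which is a quick way to catch a tensor you forgot to move.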

If your GPU is not fully utilized, your code might have a bottleneck elsewhere (e.g. data loading or CPU-side Python overhead), or the individual operations might simply be too small, so that kernel-launch and host-to-device transfer overhead outweighs the actual performance gain.
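One way to check whether your operations are large enough to saturate the GPU is to time them at several sizes. The sketch below (sizes and iteration count are arbitrary) uses `torch.cuda.synchronize()` so the timing covers the kernels themselves, since CUDA calls are asynchronous:

```python
import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def timed_matmul(n, iters=10):
    """Average wall-clock time of an n x n matmul on `device`."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()  # ensure setup kernels are finished
    start = time.perf_counter()
    for _ in range(iters):
        c = a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the async kernels to complete
    return (time.perf_counter() - start) / iters

# Small workloads are often dominated by launch overhead, so the GPU
# can look underutilized even though it is doing all of the work.
for n in (64, 512, 2048):
    print(f"{n}x{n}: {timed_matmul(n) * 1e3:.3f} ms")
```

If the per-operation time barely grows between the small and medium sizes, launch overhead is dominating and low GPU utilization is expected; batching more work into each tensor operation is the usual remedy.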