Why is torch.sparse.cuda() much faster than torch.cuda()?

The operation torch.sparse.cuda() is much faster than torch.cuda(). Why?


I don’t think torch.cuda() is a function. Could you be clearer about what you are comparing, and how?

b.cuda() is much faster than a.cuda(), where a is a dense tensor and b is a sparse tensor, even though a.size() is roughly equal to b.size().


How many non-zero elements are in the sparse tensor? The whole point of a sparse tensor is to store only the non-zero values, so there is potentially much less data to transfer to the GPU.
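To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a COO-layout sparse tensor with float32 values and int64 indices (one index per dimension for each non-zero element); the helper names `dense_bytes` and `sparse_coo_bytes` are made up for illustration and are not part of PyTorch, and the shape and non-zero count are arbitrary examples.

```python
import math


def dense_bytes(shape, value_bytes=4):
    """Bytes needed to store every element of a dense tensor (float32 by default)."""
    return math.prod(shape) * value_bytes


def sparse_coo_bytes(shape, nnz, value_bytes=4, index_bytes=8):
    """Approximate bytes for a COO sparse tensor.

    Assumption: each non-zero stores one value plus one int64 index per dimension.
    """
    return nnz * (len(shape) * index_bytes + value_bytes)


# Hypothetical example: same size(), very different payload.
shape = (10_000, 10_000)
nnz = 100_000  # 0.1% of the elements are non-zero

print("dense :", dense_bytes(shape), "bytes")
print("sparse:", sparse_coo_bytes(shape, nnz), "bytes")
```

So two tensors can report the same size() while the sparse one has only a small fraction of the data to copy. With these example numbers the dense tensor is about 400 MB and the sparse one about 2 MB.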

Hi, I have another question. When I call cuda() on a sparse tensor in different models, the same tensor takes a different amount of time (one model is based on nn.Module, the other is a C++ CUDA extension I wrote). What can affect the time?

I found the reason: cuda() is asynchronous.
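For anyone who runs into the same thing, here is a minimal sketch of how one might time the call more fairly. The assumption is that GPU work queued earlier (or still in flight) leaks into or out of the measurement, so the idea is to call torch.cuda.synchronize() before starting the clock and again before stopping it. The `timed` and `demo` helpers are hypothetical names, not PyTorch functions, and the tiny 3x3 tensor is only a placeholder.

```python
import time


def timed(fn, sync=None):
    """Return (result, seconds) for fn().

    If sync is given, it is called before the clock starts (to drain
    previously queued work) and again before the clock stops (to wait
    for the work that fn launched).
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    return result, time.perf_counter() - start


def demo():
    # torch is imported here so that the timing helper above stays usable without it.
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available on this machine")
        return

    # Placeholder sparse tensor; substitute your own.
    indices = torch.tensor([[0, 1, 2], [2, 0, 1]])
    values = torch.tensor([1.0, 2.0, 3.0])
    sparse = torch.sparse_coo_tensor(indices, values, (3, 3))

    _, seconds = timed(sparse.cuda, sync=torch.cuda.synchronize)
    print(f"sparse.cuda() took {seconds:.6f} s (synchronized)")
```

Calling demo() on a machine with a GPU prints the synchronized time. Without the synchronize calls, the number you get can depend on whatever the model happened to queue on the GPU before the copy, which would explain why the same tensor appears to take different times in different models.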

