Hi,
what is the most elegant way to force a tensor to always stay on the CPU? I have a SparseLinear layer that won’t fit on my GPU, so I’d like that part of the net to stay on the CPU even when the rest of my model lives on the GPU.
Currently I’m using a rather ugly hack: simply replacing the cuda() method of the Tensor [1].
And one more question: why don’t the cuda() / cpu() methods of a Module call the same methods on their children? E.g. I thought that calling model.cuda() would call model.sparse_layer.cuda() (which would move the result of the sparse dot product to the GPU), but that’s not the case, since self._apply() only touches parameters and buffers. Is calling model.apply(lambda t: t.cuda()) the solution here, or shouldn’t I call cuda() on all the children?
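For context, here is a minimal sketch of the kind of mixed-device setup I mean, written with a plain nn.Linear standing in for the sparse layer (MixedDeviceNet and its layer names are hypothetical, just for illustration): the CPU-bound layer is kept on the CPU, the rest of the net is moved to the GPU explicitly, and the forward pass shuttles the activations across.

```python
import torch
import torch.nn as nn

class MixedDeviceNet(nn.Module):
    def __init__(self, n_in, n_hidden, n_out, device):
        super().__init__()
        self.device = device
        # This layer deliberately stays on the CPU (stand-in for SparseLinear).
        self.cpu_layer = nn.Linear(n_in, n_hidden)
        # The rest of the net is moved to the target device explicitly.
        self.gpu_layers = nn.Sequential(
            nn.ReLU(),
            nn.Linear(n_hidden, n_out),
        ).to(device)

    def forward(self, x):
        h = self.cpu_layer(x)   # computed on the CPU
        h = h.to(self.device)   # move only the activations across
        return self.gpu_layers(h)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = MixedDeviceNet(8, 16, 4, device)
out = net(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```

The point is that the device boundary lives in forward(), so a blanket model.cuda() must not recurse into cpu_layer.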
[1]
import types

def force_cpu_(tensor):
    # Replace this tensor's cuda() with a bound no-op that returns the
    # tensor itself, so it stays on the CPU even if .cuda() is called on it.
    tensor.cuda = types.MethodType(lambda self, *args, **kwargs: self,
                                   tensor)
    return tensor