model.cuda() vs. model.to(device)

I assumed that model.cuda() and model.to(device) do the same thing, but they actually gave me different running times.

I have the following:

import torch

device = torch.device("cuda")
model = model_name.from_pretrained("./my_module")          # load my saved model (model_name is the model class)
tokenizer = tokenizer_name.from_pretrained("./my_module")  # load the matching tokenizer
model.to(device)  # no assignment needed: .to() moves an nn.Module in place, unlike a tensor
model.eval()      # run my model for testing
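The only thing I change for the other run is that one line; everything else stays the same:

model.cuda()  # instead of model.to(device)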

However, the testing process then takes 2 min 19 sec, whereas if I use model.cuda() instead of model.to(device) it takes 1 min 08 sec. I know both are fast, but I don’t understand why the running times differ so much when the two ways of writing it should do the same thing.

I was wondering whether the time might fluctuate depending on the amount of data; I can understand that. But I just want to make sure the two ways of coding are the same.


Hi,

Yes, they do the same thing: both send each parameter to the GPU one after the other.
Are you sure you don’t have something else on the machine that could be using the GPU, the CPU, or the disk and slowing down your eval?
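For reference, a minimal sketch (using a small nn.Linear as a stand-in for the real model) showing that both calls leave the parameters on the same CUDA device:

import torch
import torch.nn as nn

device = torch.device("cuda")

model_a = nn.Linear(10, 10)  # toy stand-ins for the actual model
model_b = nn.Linear(10, 10)

model_a.cuda()      # moves every parameter and buffer to the current CUDA device
model_b.to(device)  # the generic .to() API ends up doing the same move

print(next(model_a.parameters()).device)  # cuda:0
print(next(model_b.parameters()).device)  # cuda:0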


Hi,

Yes, I didn’t modify any line of code except the way of moving the model to the GPU. If they really do the same thing, then I guess it might be due to warm-up time varying between runs.
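If it helps, here is a minimal sketch of how the eval could be timed with a warm-up pass and explicit synchronization, so one-off costs and asynchronous CUDA kernels don't skew the comparison (dataloader is a placeholder, and I'm assuming each batch is a single input tensor):

import time
import torch

model.eval()
with torch.no_grad():
    # Warm-up: the first batch pays one-off costs (CUDA context, kernel caching, etc.)
    warmup_batch = next(iter(dataloader))
    model(warmup_batch.to(device))

    torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.time()
    for batch in dataloader:
        model(batch.to(device))
    torch.cuda.synchronize()  # make sure all GPU work has finished before stopping the clock
    print(f"eval time: {time.time() - start:.1f} s")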

Note that if it is a shared machine, this kind of thing can also be caused by other users :wink:

That makes sense. Thank you!