Model.cuda() vs. model.to(device)

I suppose that model.cuda() and model.to(device) are the same, but they actually gave me different running times.

I have the following:

device = torch.device("cuda")
model = model_name.from_pretrained("./my_module")  # load my saved model
tokenizer = tokenizer_name.from_pretrained("./my_module")  # load tokenizer
model.to(device)  # I think no assignment is needed since it's not a tensor
model.eval()  # I run my model for testing

However, the later testing process takes 2 min 19 sec, whereas if I do model.cuda() instead of model.to(device), it takes 1 min 08 sec. I know they are both fast, but I don’t understand why their running times are quite different when the two ways of coding should be the same thing.

I was wondering whether the time might fluctuate due to the amount of data; I can understand that. But I just want to make sure the two ways of coding are the same.



Yes, they do the same thing: send each parameter to the GPU one after the other.
Are you sure that you don’t have something else on the machine that could be using either the GPU, the CPU or the disk and that would slow down your eval?
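A minimal sketch (using a toy `nn.Linear` model as a stand-in, with a CPU fallback so it also runs on machines without a GPU) showing that the two calls leave every parameter on the same device:

```python
import torch
import torch.nn as nn

# Fall back to CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_a = nn.Linear(4, 2)
model_a.to(device)        # model.to(device) style; modules move in place

model_b = nn.Linear(4, 2)
if device.type == "cuda":
    model_b.cuda()        # model.cuda() style

# Both approaches place every parameter on the same device.
print(all(p.device.type == device.type for p in model_a.parameters()))
print(all(p.device.type == device.type for p in model_b.parameters()))
```

Note that for `nn.Module`, both `.to(device)` and `.cuda()` modify the module in place (unlike tensors, where you must keep the returned value).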



Yes, I didn’t modify any line of code except changing the way of utilizing the GPU. If they actually do the same thing, then I guess it might be due to the warm-up time varying between runs.
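To rule out warm-up effects when timing, a common pattern is to run one untimed forward pass first and to call `torch.cuda.synchronize()` around the timed region, since CUDA kernels launch asynchronously. A rough sketch with a toy model (the layer sizes and iteration count here are arbitrary):

```python
import time

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(1024, 1024).to(device).eval()
x = torch.randn(64, 1024, device=device)

# Warm-up: the first pass pays one-time CUDA initialization costs.
with torch.no_grad():
    model(x)

if device.type == "cuda":
    torch.cuda.synchronize()  # wait for pending kernels before starting the clock
start = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        model(x)
if device.type == "cuda":
    torch.cuda.synchronize()  # make sure all work finished before stopping the clock
print(f"100 forward passes: {time.perf_counter() - start:.3f} s")
```

Without the synchronize calls, the measured time can reflect only kernel launches, not the actual GPU work.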

Note that if it is a shared machine, this kind of thing can also be caused by other users :wink:

That makes sense. Thank you!