I assumed that model.cuda()
and model.to(device)
do the same thing, but they actually gave me different running times.
I have the following:
device = torch.device("cuda")
model = model_name.from_pretrained("./my_module") # load my saved model
tokenizer = tokenizer_name.from_pretrained("./my_module") # load tokenizer
model.to(device) # no reassignment needed: nn.Module.to() moves parameters in place, unlike Tensor.to()
model.eval() # switch to evaluation mode before testing
However, the testing step that follows takes 2 min 19 sec with model.to(device)
, whereas if I use model.cuda()
instead, it takes 1 min 08 sec. I know both are fast, but I don't understand why the running times differ so much when the two calls should do the same thing.
I realize the timing might fluctuate depending on the amount of data; I can understand that. I just want to make sure the two ways of coding are the same.
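For context, the PyTorch docs describe nn.Module.cuda() as moving all parameters and buffers to the GPU, which is the same effect as Module.to(torch.device("cuda")), so the state of the model should be identical either way; a timing gap is more plausibly run-to-run variance or unsynchronized GPU timing. Below is a minimal sketch checking the equivalence (assuming torch is installed; the Linear models are toy stand-ins for the pretrained model above, not the actual code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two identical toy models standing in for the pretrained model in the question
model_a = nn.Linear(4, 4)
model_b = nn.Linear(4, 4)

if device.type == "cuda":
    model_a.cuda()      # moves all parameters and buffers to the current GPU, in place
else:
    model_a.to(device)  # no-op on CPU, shown only for symmetry
model_b.to(device)      # same effect as .cuda() when device is "cuda"

# Both models end up on the same kind of device
print(model_a.weight.device.type, model_b.weight.device.type)

# CUDA kernels launch asynchronously, so wall-clock timing can be misleading
# unless the GPU is synchronized before the timer is stopped
if device.type == "cuda":
    torch.cuda.synchronize()
```

If the timing difference persists across repeated runs with synchronization in place, it is worth benchmarking each variant several times rather than once, since the first iteration also pays one-time costs such as CUDA context initialization.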