Do the loss, optimizer and lr_scheduler all need a .to(device) call?

My network has a GPU utilization problem: training is not as efficient as it should be.
So I have some questions about moving things between the CPU and the GPU, which I suspect is a big bottleneck.
In my current code: # default gpu

What about the loss, optimizer and lr_scheduler: do they also need to be moved to the GPU?
Will doing so have a big impact on efficiency?

PS: By this I mean the loss, optimizer, lr_scheduler and all parameters associated with them (such as loss_weight).
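
To make the question concrete, here is a minimal sketch of the kind of setup I mean. It is not my real code: the tiny nn.Linear model, the tensor shapes and the values in loss_weight are hypothetical stand-ins, and I am assuming a standard PyTorch training setup. The comments mark the objects I am unsure about.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins only; not the real network or data.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model and the input batch are explicitly placed on the device.
model = nn.Linear(8, 3).to(device)
inputs = torch.randn(4, 8, device=device)
targets = torch.randint(0, 3, (4,), device=device)

# Objects the question is about:
loss_weight = torch.tensor([1.0, 2.0, 0.5])          # created on the CPU by default
criterion = nn.CrossEntropyLoss(weight=loss_weight)  # does this need .to(device)?
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # and this?
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)  # and this?

# Show where each tensor currently lives.
print("model params:", next(model.parameters()).device)
print("inputs:", inputs.device)
print("loss_weight:", loss_weight.device)
```

The sketch deliberately stops before the forward pass and the loss computation, since whether loss_weight has to be on the same device as the model output is exactly what I am asking.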

Thanks in advance