Using data.to(device) in PyTorch

Hello guys. I am very interested in learning PyTorch, so I am trying to understand some basic code in PyTorch.
Here are some of the doubts I got while going through the code:

1. I found a piece of code like this in the training function:
for batch_idx, (data, target) in enumerate(pbar):
    # get samples
    data, target = data.to(device), target.to(device)

Here my doubt is: what do data.to(device) and target.to(device) do in PyTorch?

2. I also found this line of code:
optimizer.zero_grad()
What does this piece of code do in PyTorch? One of the comments says that it will make the gradients zero.
So what is the advantage if we make the gradients zero?


data.to(device) moves the tensor to the CPU or GPU, depending on what device is set to. Moving tensors to the GPU is what enables faster computation.
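
A minimal sketch of this pattern (the tensor shapes and labels here are just placeholder assumptions):

import torch

# pick the device once; falls back to CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = torch.randn(64, 1, 28, 28)      # a batch of inputs, created on the CPU
target = torch.randint(0, 10, (64,))   # placeholder labels

# .to(device) returns a copy on the target device (a no-op if it is already there)
data, target = data.to(device), target.to(device)
print(data.device)                     # cuda:0 if a GPU is available, otherwise cpu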

In PyTorch, gradients are accumulated into each parameter's .grad attribute by loss.backward(), and the parameters are then updated by optimizer.step(). Because backward() accumulates rather than overwrites, the stale gradients from the previous backpropagation need to be cleared before the next loss.backward() call. This is achieved through optimizer.zero_grad().
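
To make the ordering concrete, here is a minimal training-loop sketch; model, loader, criterion, optimizer, and device are assumed to be defined elsewhere:

for batch_idx, (data, target) in enumerate(loader):
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()   # clear gradients left over from the previous batch
    output = model(data)
    loss = criterion(output, target)
    loss.backward()         # accumulate fresh gradients into each parameter's .grad
    optimizer.step()        # update the parameters using those gradients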


.to(x), as stated in the docs, moves a tensor from its current device to the device x, i.e. the CPU ("cpu") or a GPU ("cuda:0", "cuda:1", …). It's useful because you can specify x at the beginning of your code and from there on not care whether it is CPU or GPU; you just move all your tensors, models, etc. to it. The alternative is to call .cpu() or .cuda(), but that lacks the flexibility of .to(). There is an awesome explanation of this in the official PyTorch docs here.
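
For instance, a device-agnostic sketch (the nn.Linear model and the input sizes are arbitrary examples):

import torch
import torch.nn as nn

# specify the device once at the top of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # moves all model parameters to device
x = torch.randn(4, 10).to(device)     # moves the input to the same device
y = model(x)                          # runs on GPU or CPU transparently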

Regarding .zero_grad(), you can find an extensive discussion here.
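
You can also see the accumulation behaviour directly in a tiny sketch: calling backward() twice without zeroing adds the gradients together.

import torch

w = torch.ones(1, requires_grad=True)

(w * 3).sum().backward()
print(w.grad)             # tensor([3.])

(w * 3).sum().backward()  # without zeroing, the new gradient is added to the old one
print(w.grad)             # tensor([6.])

w.grad.zero_()            # roughly what optimizer.zero_grad() does for every parameter
print(w.grad)             # tensor([0.])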
