Variable is deprecated and has become a no-op? (torch 0.4.1)

Since Variable is deprecated, I used a tensor with requires_grad=True as the model input.
The training step is as follows:
Code:

for i, (inputs, label) in enumerate(train_loader):
    inputs, label = inputs.to(device), label.to(device)

    optimizer.zero_grad()
    output = model(inputs)

    t1 = time.time()
    loss = criterion(output, label)
    loss.backward()
    optimizer.step()
    t2 = time.time()

However, it takes 17s to compute loss.backward().

After I wrap the inputs in Variable(inputs), it only takes 0.003s to compute loss.backward():

for i, (inputs, label) in enumerate(train_loader):
    inputs, label = inputs.to(device), label.to(device)
    inputs, label = Variable(inputs), Variable(label)

    optimizer.zero_grad()
    output = model(inputs)

    t1 = time.time()
    loss = criterion(output, label)
    loss.backward()
    optimizer.step()
    t2 = time.time()

My question is: since my torch version is 0.4.1 (checked via torch.__version__) and, according to the PyTorch documentation, Variable() is a no-op now, why do I get a speed-up after using this function?
Any suggestions on how I can find out the reason? Thank you so much!

If you create a plain Variable, it won't require gradients.
Did you set requires_grad=True in the __getitem__ method of your Dataset?
If so, could you remove it and check the time again? Wrapping such inputs in a plain Variable would drop the requires_grad flag again, which would explain the speed difference you are seeing.
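
As a minimal sketch of what to look for (MyDataset and its tensors are hypothetical, just to illustrate the pattern):

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data          # e.g. a FloatTensor of samples
        self.targets = targets    # e.g. a LongTensor of labels

    def __getitem__(self, index):
        x = self.data[index]
        # If a line like the following is here, remove it;
        # plain model inputs usually should not require gradients:
        # x.requires_grad_(True)
        return x, self.targets[index]

    def __len__(self):
        return len(self.data)

You can also verify the flag directly in the training loop:

print(inputs.requires_grad)  # should be False for plain model inputs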

Also, note that CUDA calls are asynchronous. If you are using the GPU as your device, you should synchronize before starting and stopping the timer:

torch.cuda.synchronize()
t1 = time.perf_counter()
...
torch.cuda.synchronize()
t2 = time.perf_counter()
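
For example, applied to the backward call in your loop (a sketch; it assumes device is a CUDA device):

import time
import torch

torch.cuda.synchronize()   # wait for all previously queued kernels to finish
t1 = time.perf_counter()
loss.backward()
torch.cuda.synchronize()   # wait for the backward kernels to finish
t2 = time.perf_counter()
print('backward took {:.6f}s'.format(t2 - t1))

Without the synchronizations, the timer can start while earlier kernels (e.g. from the forward pass) are still running, so the measured interval does not correspond to the backward pass alone.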