0.4.1 is slower than 0.3.1 in backprop

Hello!

I was porting code from 0.3.1 to 0.4.1, but the performance was lower than before. I got the same result when I wrote and tested a simple script. The code is below.

Is there a mistake in my code?

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time


# ----------------Test Model----------------
class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()

        self.layers = nn.Sequential(
            nn.Linear(4, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 2)
        )

    def forward(self, x):
        return self.layers(x)
# ------------------------------------------


# ----------------Make Model----------------
use_cuda = torch.cuda.is_available()

if torch.__version__ == '0.4.1':
    device = torch.device("cuda" if use_cuda else "cpu")
    model = Test().to(device)
    #criterion = nn.MSELoss()
elif torch.__version__ == '0.3.1.post2':
    import torch.autograd as autograd

    if use_cuda:
        model = Test().cuda()
    else:
        model = Test()
# -------------------------------------------


optimizer = optim.Adam(model.parameters())


# -------------------Test--------------------
if torch.__version__ == '0.4.1':
    q_value = model(torch.FloatTensor(np.random.rand(1, 4)).to(device))
    expected_q_value = torch.FloatTensor(np.random.rand(1, 2)).to(device)

    print(q_value.requires_grad, expected_q_value.requires_grad)
    loss = (q_value - expected_q_value).pow(2).mean()
    #loss = criterion(q_value, expected_q_value)

    torch.cuda.synchronize()
    s = time.time()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    print(time.time() - s)
elif torch.__version__ == '0.3.1.post2':
    if use_cuda:
        a = autograd.Variable(torch.FloatTensor(np.random.rand(1, 4))).cuda()
        b = torch.FloatTensor(np.random.rand(1, 2)).cuda()
    else:
        a = autograd.Variable(torch.FloatTensor(np.random.rand(1, 4)))
        b = torch.FloatTensor(np.random.rand(1, 2))
    q_value = model(a)
    expected_q_value = autograd.Variable(b)

    print(q_value.requires_grad, expected_q_value.requires_grad)

    loss = (q_value - expected_q_value).pow(2).mean()

    torch.cuda.synchronize()
    s = time.time()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    print(time.time() - s)
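
(For a more stable number, the backward/step could also be averaged over many iterations after a warm-up pass, so that one-time costs are not included in the measurement. A rough sketch for the 0.4.1 branch, reusing the model, optimizer, and device defined above:)

# Rough sketch: average over many iterations after one warm-up step,
# so one-time costs (Adam state allocation, kernel warm-up) are excluded.
# Assumes the 0.4.1 branch above (model, optimizer, device).
n_iters = 1000
x = torch.rand(1, 4, device=device)
target = torch.rand(1, 2, device=device)

loss = (model(x) - target).pow(2).mean()   # warm-up step
optimizer.zero_grad()
loss.backward()
optimizer.step()

torch.cuda.synchronize()
s = time.time()
for _ in range(n_iters):
    loss = (model(x) - target).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()
print((time.time() - s) / n_iters)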

How much slower? I am asking because I have seen another post recently where the performance was very bad for PyTorch > 0.4.

Indeed, I wonder about three things:

  • How much slower?
  • What GPU are you using?
  • How did you install PyTorch? (conda / pip / what command?)

There are minor speed differences between the cuBLAS in CUDA 8 (the default for the PyTorch 0.3.1 binaries) and the cuBLAS in CUDA 9 (the default for the PyTorch 0.4.1 binaries).
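
If it helps narrow this down, you could check which CUDA and cuDNN versions each binary was built against, e.g. (assuming both installs expose these attributes):

import torch
print(torch.__version__)
print(torch.version.cuda)               # CUDA version the binary was built against
print(torch.backends.cudnn.version())   # cuDNN version shipped with the binary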

PyTorch 0.3.1 was 1.4 times faster than 0.4.1 in the simple code. It was not 50 times slower. Thanks.

How much slower?

  • PyTorch 0.3.1 was 1.4 times faster than 0.4.1 in the simple code. In addition, training time increased from 5 hours and 20 minutes to 6 hours and 30 minutes in my original code.

What GPU are you using?

  • I used a Titan Xp.

How did you install PyTorch? (conda / pip / what command?)

  • I am using conda. I installed 0.3.1 with the following command:
    conda install -c peterjc123 pytorch cuda90
    and I installed 0.4.1 with:
    conda install pytorch -c pytorch
    (on Windows 10)

Thanks