0.4.1 is slower than 0.3.1 in backprop

Hello!

I was porting code from 0.3.1 to 0.4.1, but the performance was lower than before. I got the same result when I wrote and tested a simple script. The code is below.

Is there a mistake in my code?

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import time


# ----------------Test Model----------------
class Test(nn.Module):
    def __init__(self):
        super(Test, self).__init__()

        self.layers = nn.Sequential(
            nn.Linear(4, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 2)
        )

    def forward(self, x):
        return self.layers(x)
# ------------------------------------------


# ----------------Make Model----------------
use_cuda = torch.cuda.is_available()

if torch.__version__ == '0.4.1':
    device = torch.device("cuda" if use_cuda else "cpu")
    model = Test().to(device)
    #criterion = nn.MSELoss()
elif torch.__version__ == '0.3.1.post2':
    import torch.autograd as autograd

    if use_cuda:
        model = Test().cuda()
    else:
        model = Test()
# -------------------------------------------


optimizer = optim.Adam(model.parameters())


# -------------------Test--------------------
if torch.__version__ == '0.4.1':
    q_value = model(torch.FloatTensor(np.random.rand(1, 4)).to(device))
    expected_q_value = torch.FloatTensor(np.random.rand(1, 2)).to(device)

    print(q_value.requires_grad, expected_q_value.requires_grad)
    loss = (q_value - expected_q_value).pow(2).mean()
    #loss = criterion(q_value, expected_q_value)

    torch.cuda.synchronize()
    s = time.time()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    print(time.time() - s)
elif torch.__version__ == '0.3.1.post2':
    if use_cuda:
        a = autograd.Variable(torch.FloatTensor(np.random.rand(1, 4))).cuda()
        b = torch.FloatTensor(np.random.rand(1, 2)).cuda()
    else:
        a = autograd.Variable(torch.FloatTensor(np.random.rand(1, 4)))
        b = torch.FloatTensor(np.random.rand(1, 2))
    q_value = model(a)
    expected_q_value = autograd.Variable(b)

    print(q_value.requires_grad, expected_q_value.requires_grad)

    loss = (q_value - expected_q_value).pow(2).mean()

    torch.cuda.synchronize()
    s = time.time()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize()
    print(time.time() - s)
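
(For a more stable number, the backward/step could also be averaged over many iterations after a warm-up pass, so that one-time costs are not included in the measurement. A rough sketch for the 0.4.1 branch, reusing the model, optimizer, and device defined above:)

# Rough sketch: average over many iterations after one warm-up step,
# so one-time costs (Adam state allocation, kernel warm-up) are excluded.
# Assumes the 0.4.1 branch above (model, optimizer, device).
n_iters = 1000
x = torch.rand(1, 4, device=device)
target = torch.rand(1, 2, device=device)

loss = (model(x) - target).pow(2).mean()   # warm-up step
optimizer.zero_grad()
loss.backward()
optimizer.step()

torch.cuda.synchronize()
s = time.time()
for _ in range(n_iters):
    loss = (model(x) - target).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()
print((time.time() - s) / n_iters)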

How much slower? I am asking because I have seen another post recently where the performance was very bad for PyTorch > 0.4.

Indeed, I wonder about three things:

  • How much slower?
  • What GPU are you using?
  • How did you install PyTorch? (conda / pip / what command?)

There are minor speed differences between the cuBLAS in CUDA 8 (the default for the PyTorch 0.3.1 binaries) and the cuBLAS in CUDA 9 (the default for the PyTorch 0.4.1 binaries).
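
If it helps narrow this down, you could check which CUDA and cuDNN versions each binary was built against, e.g. (assuming both installs expose these attributes):

import torch
print(torch.__version__)
print(torch.version.cuda)               # CUDA version the binary was built against
print(torch.backends.cudnn.version())   # cuDNN version shipped with the binary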

PyTorch 0.3.1 was 1.4 times faster than 0.4.1 in the simple code. It was not 50 times slower. Thanks.

How much slower?

  • PyTorch 0.3.1 was 1.4 times faster than 0.4.1 in the simple code. In addition, training time increased from 5 hours and 20 minutes to 6 hours and 30 minutes in my original code.

What GPU are you using?

  • I used a Titan Xp.

How did you install PyTorch? (conda / pip / what command?)

  • I am using conda. I installed 0.3.1 with the following command:
    conda install -c peterjc123 pytorch cuda90
    and I installed 0.4.1 with:
    conda install pytorch -c pytorch
    (on Windows 10)

Thanks