I have been playing around with PyTorch on Linux for some time and recently decided to try to get more of my scripts running on the GPU of my Windows desktop. Since trying this I have noticed a massive difference between GPU and CPU execution time on the same scripts, such that the GPU is significantly slower than the CPU. To illustrate this I used a tutorial program found here (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors):
```python
import torch
import datetime

print(torch.__version__)

dtype = torch.double
#device = torch.device("cpu")
device = torch.device("cuda:0")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

start = datetime.datetime.now()
learning_rate = 1e-6
for t in range(5000):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    #print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

end = datetime.datetime.now()
print(end - start)
```
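One thing I was unsure about with this timing approach: from what I have read, CUDA operations are launched asynchronously, so `datetime` may stop the clock before the GPU has actually finished (although I believe the `.item()` call in the loss line forces a sync each iteration anyway). A minimal sketch of what I understand to be the safer pattern, assuming `torch.cuda.synchronize()` is the right call:

```python
import datetime
import torch

device = torch.device("cuda:0")
a = torch.randn(1000, 1000, device=device)

torch.cuda.synchronize()        # make sure setup/initialisation work has finished
start = datetime.datetime.now()
for _ in range(100):
    b = a.mm(a)                 # some representative GPU work
torch.cuda.synchronize()        # wait for all queued kernels before stopping the clock
end = datetime.datetime.now()
print(end - start)
```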
I increased the number of epochs from 500 to 5000, as I have read that the first CUDA call is very slow due to initialisation. However, the performance issue still exists.
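Concretely, the warm-up idea I was trying to approximate looks something like this (a sketch using the `x`, `w1`, `w2` from the script above, rather than just raising the epoch count):

```python
# Warm-up: run a few untimed iterations first so one-off CUDA
# initialisation (context creation, kernel loading) is not measured.
for _ in range(10):
    _ = x.mm(w1).clamp(min=0).mm(w2)
torch.cuda.synchronize()        # ensure warm-up work is done before timing

start = datetime.datetime.now()
# ... timed training loop from the script above ...
```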
device = torch.device("cpu") the final time printed out is normal around 3-4 seconds, well
device = torch.device("cuda:0") executes in around 13-15 seconds
I have reinstalled PyTorch a number of different ways (uninstalling the previous installation each time, of course) and the problem still persists. I am hoping that someone can help me, in case I have missed a step (didn't install some other API/program) or am doing something wrong in the code.
GPU: NVIDIA GeForce GTX 1060 6GB
CUDA: 9.0 (According to
Any help would be appreciated.