I have been playing around with PyTorch on Linux for some time and recently decided to try to get more of my scripts running on the GPU of my Windows desktop. Since trying this I have noticed a massive difference between GPU and CPU execution time on the same scripts, such that the GPU is significantly slower than the CPU. To illustrate this I used a tutorial program found here (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html#pytorch-tensors):
```python
import torch
import datetime

print(torch.__version__)

dtype = torch.double
#device = torch.device("cpu")
device = torch.device("cuda:0")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

start = datetime.datetime.now()
learning_rate = 1e-6
for t in range(5000):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    #print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

end = datetime.datetime.now()
print(end - start)
```
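One thing I was unsure about with this timing approach: from what I have read, CUDA operations are launched asynchronously, so `datetime` may stop the clock before the GPU has actually finished (although I believe the `.item()` call in the loss line forces a sync each iteration anyway). A minimal sketch of what I understand to be the safer pattern, assuming `torch.cuda.synchronize()` is the right call:

```python
import datetime
import torch

device = torch.device("cuda:0")
a = torch.randn(1000, 1000, device=device)

torch.cuda.synchronize()        # make sure setup/initialisation work has finished
start = datetime.datetime.now()
for _ in range(100):
    b = a.mm(a)                 # some representative GPU work
torch.cuda.synchronize()        # wait for all queued kernels before stopping the clock
end = datetime.datetime.now()
print(end - start)
```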
I increased the number of epochs from 500 to 5000, as I have read that the first CUDA call is very slow due to initialisation. However, the performance issue still exists.
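Concretely, the warm-up idea I was trying to approximate looks something like this (a sketch using the `x`, `w1`, `w2` from the script above, rather than just raising the epoch count):

```python
# Warm-up: run a few untimed iterations first so one-off CUDA
# initialisation (context creation, kernel loading) is not measured.
for _ in range(10):
    _ = x.mm(w1).clamp(min=0).mm(w2)
torch.cuda.synchronize()        # ensure warm-up work is done before timing

start = datetime.datetime.now()
# ... timed training loop from the script above ...
```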
device = torch.device("cpu") the final time printed out is normal around 3-4 seconds, well
device = torch.device("cuda:0") executes in around 13-15 seconds
I have reinstalled PyTorch a number of different ways (uninstalling the previous installation each time, of course) and the problem still persists. I am hoping that someone can help me, in case I have missed a step (didn't install some other API/program) or am doing something wrong in the code.
GPU: NVIDIA GeForce GTX 1060 6GB
CUDA: 9.0 (According to
Any help would be appreciated.