Initial PyTorch setup and a few other questions

Hi everyone.

I'm new to PyTorch and I'm trying to figure out a few things right now.

I have the following components:

__Python VERSION: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
__pyTorch VERSION: 1.2.0+cu92
__CUDA VERSION
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_Central_Daylight_Time_2018
Cuda compilation tools, release 9.2, V9.2.148
__CUDNN VERSION: 7201
__Number CUDA Devices: 1
__Devices
Active CUDA Device: GPU 0
Available devices  1
Current cuda device  0

I'm using the following hardware and software:
Z440 G4
16 GB RAM
NVIDIA Quadro RTX 4000 (CUDA supported)
Windows Server 2012 R2
Anaconda 3.5.2 - Python 3.6.5 (I need that specific version).
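
For reference, the version printout above can be reproduced with something like the following (a rough sketch; the nvcc block comes from running nvcc --version in a shell):

import sys
import torch

print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION:', torch.version.cuda)
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('Active CUDA Device: GPU', torch.cuda.current_device())
print('Available devices ', torch.cuda.device_count())
print('Current cuda device ', torch.cuda.current_device())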

  1. After installing all the components I needed, I searched for a training example just to make sure PyTorch is working. I executed the following:
import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print("Model weight:")
print(model.weight)

print("Model bias:")
print(model.bias)

print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

And got the following output:

Model's state_dict:
weight   torch.Size([2, 5])
bias     torch.Size([2])
Model weight:
Parameter containing:
tensor([[-0.1759, -0.2537,  0.1401,  0.2342, -0.3865],
        [-0.2142,  0.2763, -0.2919,  0.2003, -0.3242]], requires_grad=True)
Model bias:
Parameter containing:
tensor([-0.0585,  0.2511], requires_grad=True)
---
Optimizer's state_dict:
state    {}
param_groups     [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [935863832152, 935863831360]}]

Next, I saved it using:

torch.save(model.state_dict(), "temp.pt")
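
As a quick sanity check of the saved file, I assume it can be loaded back into a fresh model along these lines (just a sketch, not something I have run yet):

new_model = torch.nn.Linear(5, 2)
new_model.load_state_dict(torch.load("temp.pt"))
print(new_model.weight)  # should match the weights printed above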

So far it seems OK to me. Is there anything else I can or should check?

  2. How can I check in real time that my GPU is working? I want to make sure PyTorch is using the GPU and not the CPU. Is there anything I can check? A saved log file, maybe?

Appreciate your help.
Please let me know if any further information is needed.

Push your model to the device and execute a training step:

device = 'cuda'
model.to(device)
data = torch.randn(1, 5, device=device)
out = model(data)
out.mean().backward()

If this code runs fine, then your GPU is being used.
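
If you want to watch the GPU in real time, nvidia-smi -l 1 in a command prompt shows live utilization and memory. From Python, a few quick checks (a small sketch):

import torch

print(torch.cuda.is_available())       # True if PyTorch can see the GPU
print(torch.cuda.get_device_name(0))   # prints the GPU name, e.g. the Quadro RTX 4000
x = torch.randn(1000, 1000, device='cuda')
print(torch.cuda.memory_allocated())   # non-zero once tensors live on the GPU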

For more testing, you could execute some examples.

Thank you for the answer.
I don't really have a model ready, so I picked up the following code instead and uncommented the GPU part to make sure the GPU will be used:

import torch

dtype = torch.float
#device = torch.device("cpu")
device = torch.device("cuda:0")  # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Got the following:

99 384.6168518066406
199 2.1987862586975098
299 0.02469571679830551
399 0.0005685320356860757
499 7.920750067569315e-05

If what you are doing here does not trigger any errors, it means the GPU is working well, since all of your tensors are on the GPU.

Just so you know, the code you posted computes the error and gradients by 'hand' and then updates the values of w1 and w2.
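
For comparison, the same loop written with autograd would look roughly like this (a sketch of the usual approach, not the exact code you ran):

import torch

dtype = torch.float
device = torch.device("cuda:0")

N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# requires_grad=True lets autograd track these weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # autograd computes the gradients of w1 and w2 for you
    loss.backward()

    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()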

Using the code you and ptrblck provided, here is a snippet you can copy and paste to check if the system is using the GPU for training:

import torch
import torch.optim as optim

model = torch.nn.Linear(5, 2)

#Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

device = 'cuda'
model.to(device)
data = torch.randn(1, 5, device=device)
out = model(data)
out.mean().backward()

optimizer.step()
optimizer.zero_grad()
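
If you want an explicit confirmation that the step ran on the GPU, you can print the devices afterwards:

print(next(model.parameters()).device)  # expected: cuda:0
print(out.device)                       # expected: cuda:0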