PyTorch on AMD/ROCm

rtadd · May 16, 2023, 1:30pm

Hello.
First of all, I’d like to clarify that I’m really new to all of this: not only PyTorch and ML, but even Python.
I’m learning to use this library, and I’ve managed to make it work with my RX 6700 XT by installing the amdgpu driver (with ROCm) and running the “pip install…” command shown on the PyTorch website.

The thing is that my GPU isn’t supported according to AMD’s documentation, so I had to take one extra step and set an environment variable (os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"). Without it, all my hardware was detected correctly, but any time I tried to do anything with a tensor loaded into GPU memory I got this error: Segmentation fault (core dumped)
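For reference, a minimal sketch of how that override can be applied. The value "10.3.0" is the one quoted above; the helper name describe_device is only illustrative and not part of PyTorch. Setting the variable before importing torch is the cautious ordering, not something the documentation quoted here requires.

```python
import os

# Value taken from the post above; put it in the environment before torch
# is imported so it is already set when the ROCm runtime initialises.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

import torch


def describe_device() -> str:
    """Return the GPU name if one is usable, otherwise "cpu".

    ROCm builds of PyTorch expose the GPU through the torch.cuda namespace.
    """
    if torch.cuda.is_available():
        return torch.cuda.get_device_name(0)
    return "cpu"


print("HIP version:", torch.version.hip)  # None on non-ROCm builds
print("Device:", describe_device())
```

On a machine without a usable GPU this simply reports "cpu".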

Anyway, I managed to make it work. So far I’ve only tried to do simple linear regressions with the following code:

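import os
# Set the override before importing torch, so it is already in the
# environment when the ROCm runtime initialises.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

import time as tm
import torch as trc
from torch.utils.data import Dataset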

gpu = trc.device("cuda" if trc.cuda.is_available() else "cpu")

class Dataset_1(Dataset):
    def __init__(self, transf=None, dev=None):
        self.X = trc.arange(-3.0, 3.0, 0.1, device=dev).view(-1,1)
        self.Y = (-3 * self.X) + (0.1 * trc.randn(self.X.size(), device=dev)) - 2
        self.transf = transf
    
    def __getitem__(self, indice):
        dat = [self.X[indice], self.Y[indice]]
        if self.transf:
            dat = self.transf(dat)
        return dat

class reg_lin(trc.nn.Module):
    def __init__(self, num_ent, num_sal):
        super(reg_lin, self).__init__()
        self.lineal = trc.nn.Linear(num_ent, num_sal)

    def forward(self,x):
        val = self.lineal(x)
        return val

datos1 = Dataset_1(dev=gpu)

modelo1 = reg_lin(1, 1)
modelo1 = modelo1.to(gpu)

def criterio(yhat, y):
    return trc.mean( (yhat - y) ** 2 )

optim = trc.optim.SGD(modelo1.parameters(), 0.01)

def entrenar_modelo(num_epochs):
    # "iter" shadowed the Python built-in, so the parameter is renamed.
    for epoch in range(num_epochs):
        Yhat = modelo1(datos1[:][0])
        loss = criterio(Yhat, datos1[:][1])
        loss.backward()
        optim.step()
        optim.zero_grad()

t0 = tm.time()

entrenar_modelo(5)

t1 = tm.time()

print(modelo1.state_dict())

print(t1-t0)

What I found is that running this code on the GPU is several times slower than running it on the CPU (which I do by removing “dev=gpu” from datos1 and commenting out the “modelo1.to(gpu)” line).

Is that supposed to happen? Is there anything wrong with the code?
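One caveat for anyone reproducing the comparison: GPU work is queued asynchronously and the first GPU call pays one-off start-up costs, so a plain time.time() around five iterations on such a tiny dataset may mostly measure overhead. Below is a minimal sketch of a more careful measurement. The helper name timed, the warm-up count and the repeat count are all assumptions for illustration, and the training step is a stand-in rather than the exact code above.

```python
import time

import torch


def timed(fn, device, warmup=1, repeats=3):
    """Average wall-clock seconds per call of fn (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # absorb one-off initialisation costs
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for queued GPU work
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # make sure the GPU has finished
    return (time.perf_counter() - start) / repeats


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(1, 1).to(device)
x = torch.arange(-3.0, 3.0, 0.1, device=device).view(-1, 1)
y = -3 * x - 2


def step():
    # One forward/backward pass with a mean-squared-error loss.
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    model.zero_grad()


elapsed = timed(step, device)
print(f"{device.type}: {elapsed:.6f} s per call")
```

Running this once per device gives numbers that are easier to compare, although a workload this small may still favour the CPU.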

-EDIT-
A bit more information, since I’m learning new stuff: using a DataLoader significantly reduces the difference between the GPU and CPU times. Before, the GPU took more or less twice as long; with the DataLoader it takes around 50% longer.
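The DataLoader version is not shown above, so here is only a rough sketch of what such a setup could look like. The class name LineData, the batch size of 10 and the training loop are assumptions, not the code that produced the timings. Note that DataLoader needs the dataset to define __len__, which Dataset_1 above does not.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class LineData(Dataset):
    """Noisy samples of y = -3x - 2, mirroring Dataset_1 from the post."""

    def __init__(self, device=None):
        self.x = torch.arange(-3.0, 3.0, 0.1, device=device).view(-1, 1)
        noise = 0.1 * torch.randn(self.x.size(), device=device)
        self.y = -3 * self.x + noise - 2

    def __len__(self):
        # Required by DataLoader to know how many samples exist.
        return self.x.shape[0]

    def __getitem__(self, index):
        return self.x[index], self.y[index]


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
data = LineData(device)
loader = DataLoader(data, batch_size=10)  # batch size is an assumption

model = torch.nn.Linear(1, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    for xb, yb in loader:
        loss = torch.mean((model(xb) - yb) ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print(model.state_dict())
```

With mini-batches the optimizer takes several steps per epoch instead of one, so the timings are not directly comparable with the full-batch loop.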