How requires_grad_(True) affects the training result

Hi there,
I am training a network whose loss function is given by:

def loss_func(vec, yhat):
    # yhat is the output of network
    # vec is a vector
    loss = yhat.T @ vec
    return loss

where vec is computed by the following function:

def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat

and the training process is:

for i in range(epochs):
    yhat = net(input_data)
    vec = mv(w, yhat)
    loss = loss_func(vec, yhat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

When I train the network this way, it converges to a very good result. However, when I change the third line of the loop above to vec = mv(w, yhat.data), training struggles to converge and the final result is always worse. Can anyone explain this?

Appreciate any reply.

Could you explain why you are using the deprecated .data attribute?
Using it would skip any Autograd checks (and probably recordings) and your training might break easily.
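To illustrate the point about skipped recordings, here is a minimal sketch with made-up scalar values (not the poster's actual setup): detaching the output inside the matrix-vector product removes that path from the graph, so backward computes a different gradient.

```python
import torch

w = torch.tensor(3.0)                      # stand-in for the external matrix
y = torch.tensor(2.0, requires_grad=True)  # stand-in for the network output

# Full graph: loss = y * (w * y) = w * y**2, so dloss/dy = 2 * w * y = 12
loss_full = y * (w * y)
loss_full.backward()
grad_full = y.grad.item()

y.grad = None  # reset before the second backward pass

# Detached: w * y.detach() is treated as the constant 6, so dloss/dy = w * y = 6
loss_detached = y * (w * y.detach())
loss_detached.backward()
grad_detached = y.grad.item()

print(grad_full, grad_detached)  # 12.0 6.0
```

Using .data behaves like .detach() here: the optimizer then follows a gradient that is missing the term flowing through vec, which can explain the worse convergence.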


Hi ptrblck,
Thank you very much for your reply. The reason I use .data is that I'd like to use CuPy to calculate the vec I need, and converting a torch.Tensor to a cupy.ndarray required me to do that.
In the function I gave before:

def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat

The w matrix is huge and sparse, and calculating this product with PyTorch's torch.sparse.mm() is too slow (this is also something I cannot understand).

If you want to use a 3rd party library (such as cupy) you would need to write a custom autograd.Function and implement the forward as well as backward pass manually.
This tutorial shows a small example of how to do so.
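A minimal sketch of such a Function, using plain dense torch matmuls as a stand-in for the CuPy spmv calls (which would replace them in forward and backward):

```python
import torch

class SparseMV(torch.autograd.Function):
    """Sketch of a custom matrix-vector product with a manual backward pass."""

    @staticmethod
    def forward(ctx, w, y):
        ctx.save_for_backward(w)
        return w @ y  # the CuPy spmv call would go here

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # No gradient is needed for the fixed matrix w; the gradient
        # w.r.t. y is w.T @ grad_out (another spmv in CuPy)
        return None, w.T @ grad_out
```

Autograd then treats SparseMV.apply(w, yhat) like any other differentiable op, so vec stays in the graph and no .data detaching is needed.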


Thanks ptrblck,
Could you please explain a bit why PyTorch calculates spmv so slowly? I guess both torch and cupy call some cuSPARSE routine for spmv, so why is there such a big difference in their performance?

We have an example of how to interface CuPy/PyTorch in the documentation:
https://docs.cupy.dev/en/stable/user_guide/interoperability.html#using-custom-kernels-in-pytorch

Speed of routines may be heavily affected by your device, which GPU are you using?

I would need to see your exact use case as well as setup to see which calls are actually used and how you are profiling your code.
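For the profiling itself, torch.utils.benchmark tends to give more reliable numbers than hand-rolled wall-clock timing (it handles warmup, and CUDA synchronization when run on a GPU). A sketch with made-up shape and density, runnable on CPU as well:

```python
import torch
from torch.utils import benchmark

# Hypothetical problem size and ~1% density; substitute your real w and yhat.
n = 500
w = (torch.rand(n, n) < 0.01).float().to_sparse()
y = torch.randn(n, 1)

timer = benchmark.Timer(
    stmt="torch.sparse.mm(w, y)",
    globals={"torch": torch, "w": w, "y": y},
)
measurement = timer.timeit(20)
print(measurement.mean)  # average seconds per call
```

Comparing this measurement against the equivalent CuPy call on the same matrix would show whether the gap really comes from the cuSPARSE routine or from the surrounding overhead.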