How requires_grad_(True) affects the training result

Hi there,
I am training a network whose loss function is given by:

def loss_func(vec, yhat):
    # yhat is the output of network
    # vec is a vector
    loss = yhat.T @ vec
    return loss

where vec is computed by the following function:

def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat

and the training process is:

for i in range(epochs):
    yhat = net(input_data)
    vec = mv(w, yhat)
    loss = loss_func(vec, yhat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

When I train the network this way, it converges to a very good result. However, when I change the third line of the loop above to vec = mv(w, yhat.data), training struggles to converge and the final result is always worse. Can anyone explain this?

Appreciate any reply.

Could you explain why you are using the deprecated .data attribute?
Using it would skip any Autograd checks (and probably recordings) and your training might break easily.
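To illustrate the point about skipped recordings, here is a minimal sketch with made-up scalar values (not the poster's actual setup): detaching the output inside the matrix-vector product removes that path from the graph, so backward computes a different gradient.

```python
import torch

w = torch.tensor(3.0)                      # stand-in for the external matrix
y = torch.tensor(2.0, requires_grad=True)  # stand-in for the network output

# Full graph: loss = y * (w * y) = w * y**2, so dloss/dy = 2 * w * y = 12
loss_full = y * (w * y)
loss_full.backward()
grad_full = y.grad.item()

y.grad = None  # reset before the second backward pass

# Detached: w * y.detach() is treated as the constant 6, so dloss/dy = w * y = 6
loss_detached = y * (w * y.detach())
loss_detached.backward()
grad_detached = y.grad.item()

print(grad_full, grad_detached)  # 12.0 6.0
```

Using .data behaves like .detach() here: the optimizer then follows a gradient that is missing the term flowing through vec, which can explain the worse convergence.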


Hi ptrblck,
Thank you very much for your reply. The reason I use .data is that I'd like to use CuPy to calculate the vec I need, and converting a torch.Tensor to a cupy.ndarray required me to do that.
In the function I gave before:

def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat

The w matrix is huge and sparse, and calculating this product with PyTorch's torch.sparse.mm() is too slow (this is also something I cannot understand).

If you want to use a 3rd party library (such as cupy) you would need to write a custom autograd.Function and implement the forward as well as backward pass manually.
This tutorial shows a small example of how to do so.
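A minimal sketch of such a Function, using plain dense torch matmuls as a stand-in for the CuPy spmv calls (which would replace them in forward and backward):

```python
import torch

class SparseMV(torch.autograd.Function):
    """Sketch of a custom matrix-vector product with a manual backward pass."""

    @staticmethod
    def forward(ctx, w, y):
        ctx.save_for_backward(w)
        return w @ y  # the CuPy spmv call would go here

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # No gradient is needed for the fixed matrix w; the gradient
        # w.r.t. y is w.T @ grad_out (another spmv in CuPy)
        return None, w.T @ grad_out
```

Autograd then treats SparseMV.apply(w, yhat) like any other differentiable op, so vec stays in the graph and no .data detaching is needed.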


Thanks ptrblck,
Could you please explain a bit why PyTorch calculates spmv so slowly? I guess both torch and cupy call some cuSPARSE routine for spmv, so why is there such a big difference in their performance?

We have an example of how to interface CuPy/PyTorch in the documentation:
https://docs.cupy.dev/en/stable/user_guide/interoperability.html#using-custom-kernels-in-pytorch

Speed of routines may be heavily affected by your device, which GPU are you using?

I would need to see your exact use case as well as setup to see which calls are actually used and how you are profiling your code.
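For the profiling itself, torch.utils.benchmark tends to give more reliable numbers than hand-rolled wall-clock timing (it handles warmup, and CUDA synchronization when run on a GPU). A sketch with made-up shape and density, runnable on CPU as well:

```python
import torch
from torch.utils import benchmark

# Hypothetical problem size and ~1% density; substitute your real w and yhat.
n = 500
w = (torch.rand(n, n) < 0.01).float().to_sparse()
y = torch.randn(n, 1)

timer = benchmark.Timer(
    stmt="torch.sparse.mm(w, y)",
    globals={"torch": torch, "w": w, "y": y},
)
measurement = timer.timeit(20)
print(measurement.mean)  # average seconds per call
```

Comparing this measurement against the equivalent CuPy call on the same matrix would show whether the gap really comes from the cuSPARSE routine or from the surrounding overhead.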