Hi there,
I am training a network whose loss function is given by:
def loss_func(vec, yhat):
    # yhat is the output of the network
    # vec is a vector
    loss = yhat.T @ vec
    return loss
where vec is given by the following method:
def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat
and the training process is:
for i in range(epochs):
    yhat = net(input_data)
    vec = mv(w, yhat)
    loss = loss_func(vec, yhat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
When I train the network this way, it gives me very good results. However, when I change the third line in the code block above to vec = mv(w, yhat.data), training becomes very hard to converge and the final result is always worse. Can anyone explain this?
Could you explain why you are using the deprecated .data attribute?
Using it would skip any Autograd checks (and probably recordings) and your training might break easily.
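To make the failure mode concrete, here is a small, self-contained sketch (the tensor shapes and names are made up for illustration) of what happens to the gradient when the inner yhat is replaced by yhat.data: the loss yhat.T @ (w @ yhat) should receive gradient through both occurrences of yhat, but the detached version silently drops the contribution through vec.

```python
import torch

# Full graph: loss = yhat^T (w @ yhat), so d(loss)/d(yhat) = (w + w.T) @ yhat.
torch.manual_seed(0)
w = torch.randn(3, 3)
yhat = torch.randn(3, requires_grad=True)

loss = yhat @ (w @ yhat)                # vec = w @ yhat keeps the graph
loss.backward()
full_grad = yhat.grad.clone()

yhat.grad = None
loss_detached = yhat @ (w @ yhat.data)  # vec built from yhat.data: no graph
loss_detached.backward()
partial_grad = yhat.grad.clone()

# The detached version only sees the gradient through the left factor,
# i.e. w @ yhat, and silently drops the w.T @ yhat term.
print(torch.allclose(full_grad, (w + w.T) @ yhat.detach()))  # True
print(torch.allclose(partial_grad, w @ yhat.detach()))       # True
```

So the network still trains (part of the gradient survives), but it is optimizing a biased gradient, which matches the "hard to converge, worse final result" symptom.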
Hi ptrblck,
Thank you very much for your reply. The reason I use .data is that I'd like to use cupy to calculate the vec I need, and converting a torch.Tensor to a cupy.array requires me to do that.
In the function I gave before:
def mv(w, yhat):
    # w is an external matrix
    # yhat is the output of the network
    return w @ yhat
the w matrix is huge and sparse, and computing this product with the PyTorch function torch.sparse.mm() is too slow (this is also something I cannot understand).
If you want to use a 3rd-party library (such as cupy) you would need to write a custom autograd.Function and implement the forward as well as the backward pass manually. This tutorial shows a small example of how to do so.
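For this particular matvec the custom Function is short, since the backward of vec = w @ yhat with respect to yhat is just w.T @ grad_output. A minimal sketch follows; torch.mv stands in for the external call, and in practice you would convert to cupy, run the cupy spmv there, and convert the result back (the class name ExternalMV is made up):

```python
import torch

class ExternalMV(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, yhat):
        ctx.save_for_backward(w)
        # Replace this line with the external (e.g. cupy) spmv call.
        return torch.mv(w, yhat)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # d(w @ yhat)/d(yhat) applied to grad_output is w.T @ grad_output;
        # w is a fixed external matrix, so it gets no gradient (None).
        return None, torch.mv(w.t(), grad_output)

# Sanity check against autograd on the quadratic form yhat^T (w @ yhat):
w = torch.randn(4, 4)
yhat = torch.randn(4, requires_grad=True)
vec = ExternalMV.apply(w, yhat)
loss = yhat @ vec
loss.backward()
print(torch.allclose(yhat.grad, (w + w.T) @ yhat.detach()))  # True
```

Because the backward pass is implemented explicitly, the external library never needs to participate in autograd, yet the full (w + w.T) @ yhat gradient is recovered, unlike the yhat.data version.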
Thanks ptrblck,
Could you please explain a bit why PyTorch calculates spmv so slowly? I guess both torch and cupy call functions from cuSPARSE to compute spmv, so why is there such a big difference in their performance?