You have to return a FloatTensor that is the result of torch operations inside your loss function, so that the loss function doesn't break the gradient graph. To be more specific, can you provide example code of your loss function?

As I said, I think in this code you are returning a Python float object, which doesn't have a .backward() method.
I'm not sure what tensor_mul and vec_minus are, but the operations you apply to the outputs should all be PyTorch operations; mixing in NumPy operations would probably break the graph (I'm not sure).
If you return a FloatTensor the way you described, that tensor will be a "leaf variable" (I'm not sure that's the exact term), and that kind of variable has no gradients, because you created it from a plain float, which holds no gradient history.
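A minimal sketch of the difference (the variable names are made up for illustration): a loss built entirely from torch operations carries a grad_fn, while a tensor freshly constructed from a Python float is a leaf with no history, so there is nothing for backward() to propagate through.

```python
import torch

x = torch.randn(4, requires_grad=True)

# Loss composed of torch operations: the result keeps a grad_fn,
# so .backward() can flow gradients back into x.
good_loss = (1 - x).pow(2).mean()
good_loss.backward()

# Converting to a Python float and wrapping it in a new tensor
# creates a leaf variable with no graph attached.
bad_loss = torch.tensor(float((1 - x).pow(2).mean()))

print(good_loss.grad_fn is not None)  # True
print(x.grad is not None)             # True
print(bad_loss.grad_fn)               # None
```

Calling bad_loss.backward() here would raise a RuntimeError, since bad_loss does not require grad.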

Just to note: you can change torch.ones(k).cuda() - TEN to simply 1 - TEN. Also, your loss function looks like cross entropy to me, and PyTorch has an implementation for that.
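If it is indeed cross entropy, the built-in version is a one-liner (shapes here are just an example):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5, requires_grad=True)   # batch of 8, 5 classes
targets = torch.randint(0, 5, (8,))              # one class index per sample

# F.cross_entropy applies log-softmax + NLL internally,
# so the result stays connected to the graph.
loss = F.cross_entropy(logits, targets)
loss.backward()
```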

It is not cross entropy.
It is a loss for a multilabel problem, which I want to be minimized when the model is correct on one label out of all the labels of an object in my data.
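I don't know your exact formulation, but here is a hypothetical sketch of a "right on at least one of the true labels" objective, kept entirely in torch ops so the graph survives: take the model's best probability among an object's true labels and minimize its negative log. All names (`probs`, `targets`) are illustrative, not from your code.

```python
import torch

# Hypothetical multilabel setup: per-label probabilities and a
# multi-hot target matrix (1 where the label applies to the object).
probs = torch.rand(4, 6, requires_grad=True)
targets = torch.tensor([[1, 0, 1, 0, 0, 0],
                        [0, 1, 0, 0, 1, 0],
                        [0, 0, 0, 1, 0, 0],
                        [1, 1, 0, 0, 0, 1]], dtype=torch.float)

# Best probability the model assigns to any *true* label of each object.
best_true = (probs * targets).max(dim=1)[0]

# Low loss as soon as one true label gets high probability;
# the epsilon guards against log(0).
loss = -torch.log(best_true + 1e-8).mean()
loss.backward()
```

Because every step is a torch operation, `loss` has a grad_fn and backward() works without wrapping anything in a new FloatTensor.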