Find non-zero elements in a tensor

@wasiahmad, what you are minimizing is just losses.sum(): your gradient-descent steps are merely scaled by a factor that depends on the number of non-zero elements, and that factor changes at each iteration. Nothing guarantees this will minimize sum(x)/non-zeros(x) for all x, which is (I think) what you actually want to do.
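
Here is a minimal sketch (assuming PyTorch; the tensor names `x` and `losses` are just illustrative) showing why: the non-zero count carries no gradient, so dividing by it only rescales the gradient of `losses.sum()` rather than changing the direction of the update.

```python
import torch

x = torch.randn(5, requires_grad=True)
losses = torch.relu(x)  # per-element losses, some of which may be exactly zero

# Objective 1: plain sum of the losses
g_sum = torch.autograd.grad(losses.sum(), x, retain_graph=True)[0]

# Objective 2: sum divided by the number of non-zero losses
count = (losses != 0).sum()  # integer count; no gradient flows through it
g_mean = torch.autograd.grad(losses.sum() / count, x)[0]

# The two gradients point in the same direction; only the scale differs,
# and that scale changes from one step to the next as the count changes.
print(torch.allclose(g_mean * count, g_sum))  # True
```

So per iteration the update direction is identical to minimizing the plain sum, just with an effective learning rate that fluctuates with the current non-zero count.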