Hi everyone,
I am trying to implement an idea I found in a paper where an encoder is trained with a loss function defined as: MSE(y*, y) + (number of non-zero values in one of the layers).
i.e. the loss function is composed of both the prediction MSE and a count of the non-zero values of the encoding layer.
My problem is that I can't implement the counting of non-zero values. I have tried:
- adding `(encoded_x != 0.0).sum()` to the loss function, where `encoded_x` is the output of the middle layer. Apparently the comparison is not differentiable, so this term has no effect on the gradients (?).
- an approximation: adding `encoded_x.sum()` or even `torch.pow(encoded_x, 1/8).sum()`, i.e. penalizing by the values themselves. The result is that the encoding layer ends up with really small values, but not actual zeros.
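To make the two attempts concrete, here is a minimal sketch of what I mean (a toy `encoded_x` standing in for the real encoder output; I use `abs()` in the second attempt so the penalty is sign-correct, which I assume is the intent):

```python
import torch

# Toy stand-in for the output of the encoding layer.
encoded_x = torch.tensor([0.0, 0.5, -1.2, 0.0], requires_grad=True)

# Attempt 1: count the non-zero values directly.
# The result is a tensor, but the comparison is not differentiable,
# so the count is detached from the graph and contributes no gradient.
count = (encoded_x != 0.0).sum()
print(count.requires_grad)  # False: nothing flows back from this term

# Attempt 2: penalize the magnitudes themselves (an approximation).
# This is differentiable, but it only shrinks the values; it does not
# drive entries to exactly zero.
approx = encoded_x.abs().sum()
approx.backward()
print(encoded_x.grad)  # sign of each entry; 0 where the value is already 0
```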
Any ideas on how this might be implemented? I understand the derivative of such a count is zero almost everywhere (and undefined at zero), but there must be a way.
Thanks,
Anton.