Hi everyone,

I am trying to implement an idea I found in a paper where an encoder is trained with a loss defined as: MSE(y*, y) + (number of non-zero values in one of the layers).

i.e. the loss function is the sum of the prediction MSE and a count of the non-zero values of the encoding layer.

My problem is that I can’t implement the non-zero value counting. I have tried:

- Adding (encoded_x != 0.0).sum() to the loss function, where encoded_x holds the values of the middle layer. This has no effect; I suspect the boolean comparison is not differentiable, so no gradient flows through it (?).
- An approximation: adding encoded_x.abs().sum() or even torch.pow(encoded_x.abs(), 1/8).sum() (the abs() avoids NaNs for negative activations). Here I tried to penalize by the magnitude of the values. The result is that the encoding layer just ends up with really small values, not fewer non-zero ones.

Any ideas on how this might be implemented? I understand the derivative of the count is zero almost everywhere (and undefined at 0), but there must be a way.

Thanks,

Anton.