I’ve implemented the pointer network from NEURAL COMBINATORIAL OPTIMIZATION WITH REINFORCEMENT LEARNING. Almost everything works, but when I enable the logit clipping step, the output saturates: the hyperbolic tangent clips the logits, so all outputs end up with the same value. Any suggestions?
If I deactivate this function everything works, but the output is extremely overconfident.
As an example, take a tensor of logits; after applying logit clipping it’s saturated:
```python
# logit clipping
# self.C is a constant equal to 10 in the paper
vector_pointer = self.C * torch.tanh(vector_pointer)
```
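For what it’s worth, here is a minimal sketch of why this saturates: tanh is effectively flat once the magnitude of its input exceeds roughly 3, so if the raw logits entering the clip are large (e.g. unscaled dot products), `C * tanh(logits)` collapses to ±C for every entry. The function name and the sample values below are purely illustrative, not from the paper; the point is only the contrast between large and small input magnitudes (NumPy is used so the example runs without torch).

```python
import numpy as np

def logit_clip(logits, C=10.0):
    # C * tanh(logits) bounds the logits to [-C, C], as in the paper
    return C * np.tanh(logits)

# Hypothetical large raw logits, e.g. unscaled dot products:
raw = np.array([120.0, -85.0, 40.0, -200.0, 15.0])
print(logit_clip(raw))    # every entry lands at essentially +/-10 -> saturated

# The same clip applied to logits of order 1 keeps useful resolution:
small = np.array([0.8, -0.3, 1.2, 0.1, -0.9])
print(logit_clip(small))  # values stay inside tanh's near-linear range
```

A common remedy, if the logits come from an attention-style dot product, is to scale them down before clipping (for instance dividing by the square root of the hidden dimension, as in scaled dot-product attention), so that the inputs to tanh stay of order 1.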