I’ve implemented the pointer network from NEURAL COMBINATORIAL OPTIMIZATION WITH REINFORCEMENT LEARNING. Almost everything works, but when I enable the logit clipping step, the output saturates: the hyperbolic tangent clips the logits, so all outputs end up with the same value. Any suggestions?
If I deactivate this function everything works, but the output is extremely overconfident.
As an example, take a tensor of logits; after applying logit clipping it’s saturated:
```python
# logit clipping
# self.C is a constant equal to 10 in the paper
vector_pointer = self.C * torch.tanh(vector_pointer)
```
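For what it’s worth, here is a minimal sketch of why this saturates: tanh is effectively flat once the magnitude of its input exceeds roughly 3, so if the raw logits entering the clip are large (e.g. unscaled dot products), `C * tanh(logits)` collapses to ±C for every entry. The function name and the sample values below are purely illustrative, not from the paper; the point is only the contrast between large and small input magnitudes (NumPy is used so the example runs without torch).

```python
import numpy as np

def logit_clip(logits, C=10.0):
    # C * tanh(logits) bounds the logits to [-C, C], as in the paper
    return C * np.tanh(logits)

# Hypothetical large raw logits, e.g. unscaled dot products:
raw = np.array([120.0, -85.0, 40.0, -200.0, 15.0])
print(logit_clip(raw))    # every entry lands at essentially +/-10 -> saturated

# The same clip applied to logits of order 1 keeps useful resolution:
small = np.array([0.8, -0.3, 1.2, 0.1, -0.9])
print(logit_clip(small))  # values stay inside tanh's near-linear range
```

A common remedy, if the logits come from an attention-style dot product, is to scale them down before clipping (for instance dividing by the square root of the hidden dimension, as in scaled dot-product attention), so that the inputs to tanh stay of order 1.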