Logit clipping in pointer network

Hi,
I’ve implemented the pointer network from Neural Combinatorial Optimization with Reinforcement Learning, and it is almost all working, except for the logit clipping part: if I enable that feature, the output gets saturated and clipped by the hyperbolic tangent, so all the outputs end up with the same value. Any suggestions?
If I disable this function everything is fine, but the output is extremely overconfident.

An example would be a tensor with the values:
[15,13,12.1,12,40]
after applying logit clipping it is saturated:
[10,10,10,10,10]

# logit clipping
# self.C is a constant equal to 10 in the paper
vector_pointer = self.C * torch.tanh(vector_pointer)
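For completeness, here is a minimal standalone snippet that reproduces the saturation with the example values above (nothing assumed beyond torch and the constant C from the paper):

import torch

# tanh saturates for |u| >> 1, so every logit in this range maps to ~1.0,
# and C * tanh(u) collapses all of them to ~C.
C = 10.0
logits = torch.tensor([15.0, 13.0, 12.1, 12.0, 40.0])
clipped = C * torch.tanh(logits)
print(clipped)  # tensor([10., 10., 10., 10., 10.]) up to rounding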

I think this is the expected behavior. The purpose of introducing const * tanh(u) is to increase exploration, as mentioned in the paper. By applying a softmax to these saturated numbers, you get (nearly) the same probability for all of them, which can motivate the algorithm to explore more.
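A quick way to see this with the numbers from your example (again only plain torch):

import torch

# Softmax over the raw logits is extremely peaked, while softmax over the
# clipped (saturated) logits is nearly uniform, so sampling visits every index.
logits = torch.tensor([15.0, 13.0, 12.1, 12.0, 40.0])
clipped = 10.0 * torch.tanh(logits)

print(torch.softmax(logits, dim=0))   # ~[0, 0, 0, 0, 1]  -> overconfident
print(torch.softmax(clipped, dim=0))  # ~[0.2, 0.2, 0.2, 0.2, 0.2] -> near uniform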

Oh, I think you’re right, it’s only used during training and deactivated in test mode. It just looked strange when I first saw it.
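If that’s the case, a minimal sketch of how I might gate it (reusing the names from my snippet above; whether this matches the paper’s setup exactly is an assumption on my part):

# hypothetical toggle: clip only while the module is in training mode
if self.training:
    # keep logits in C * [-1, 1] so the sampling distribution stays flatter
    vector_pointer = self.C * torch.tanh(vector_pointer)
# in eval mode the raw logits are used, so greedy decoding stays sharp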