So my question is simply: how do I do that? At first I thought of initializing the weights of the output layer to zero, but I assumed that would prevent it from learning. I’m sure there is a simple method, but I can’t find it.
It won’t. That concern only applies to hidden layers (the ones that generate “artificial” features): if those start at zero, every unit receives the same gradient and they never differentiate. For the output layer, the loss (e.g., distance to target) provides a non-zero gradient through the hidden activations, so it can learn from the very first update.
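In case it helps future readers, here is a minimal PyTorch sketch of this, assuming a plain MLP (the `AdvantageNet` name and the layer sizes are placeholders, not the paper’s exact architecture): only the final linear layer is zero-initialized, so the network outputs 0 for every input at the start, while the hidden layers keep their default random init.

```python
import torch.nn as nn


class AdvantageNet(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, num_actions)
        # Zero-init only the output layer: the net initially predicts 0
        # for every action. Gradients w.r.t. head.weight are
        # dL/dy * hidden_activations, which are non-zero, so it learns.
        nn.init.zeros_(self.head.weight)
        nn.init.zeros_(self.head.bias)

    def forward(self, x):
        return self.head(self.body(x))
```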
Hi, I was wondering if you had any luck figuring this out? I came across this topic because I was actually trying to implement that same algorithm, and I had the same question.
I tried your suggestion regarding the output layer weights and didn’t get any improvement in performance. My advantage networks end up with a loss of around 30k mean squared error, and exploitability converges around 500 mbb. I can’t figure out where I’m going wrong in replicating the neural net from the paper, but this is the one line I don’t understand; I’ve never seen such a technique discussed anywhere.