General question on model training and weight decay

Hi. I'm trying weight decay for the first time. Is 5e-4 a good value with the Adam optimizer? Does anyone have good links I could use as a resource? I also have a specific problem. My model works with the bAbI data set. My code is here: . The model trains to 95 to 99% accuracy, but when I validate it on data it has not seen during training, it only reaches about 50%. Watching it train, everything looks fine up to about 50% accuracy: if I validate at that stage, the validation accuracy at each point matches the training accuracy. In other words, I find it very strange that validation accuracy keeps pace with training accuracy and then suddenly stalls at the 50% mark.
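One way to pin down exactly when validation stops keeping pace is to log training and validation accuracy at every checkpoint and find the first point where they split. The helper below is a hypothetical, framework-agnostic sketch; the name `first_divergence` and the default `max_gap` of 0.05 are illustrative assumptions, not part of any library. In PyTorch, the validation numbers it consumes should be computed after calling `model.eval()` and inside `torch.no_grad()`, so that dropout layers are disabled during evaluation, with `model.train()` called again before training resumes.

```python
from typing import Optional, Sequence


def first_divergence(train_acc: Sequence[float],
                     val_acc: Sequence[float],
                     max_gap: float = 0.05) -> Optional[int]:
    """Return the index of the first checkpoint where training accuracy
    exceeds validation accuracy by more than max_gap, or None if the two
    curves never split by that much."""
    if len(train_acc) != len(val_acc):
        raise ValueError("train_acc and val_acc must have the same length")
    for i, (tr, va) in enumerate(zip(train_acc, val_acc)):
        # A positive gap means the model does better on data it trained on.
        if tr - va > max_gap:
            return i
    return None
```

Recording the returned index together with the epoch and step count would show whether the split always happens at the same point in training, which narrows down what changes there.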

I am using dropout at about 0.5 in most of my PyTorch components, and I have just started using weight decay. I read somewhere that 5e-4 is a good value; is this true? Either way, I still have the underlying problem of validation accuracy failing to keep up with training accuracy after 50%.
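To make the question about 5e-4 with Adam more concrete, here is a hedged pure-Python sketch of a single Adam step on one scalar weight. It assumes the update rules used by PyTorch's `torch.optim.Adam`, where `weight_decay` is added to the gradient as an L2 term, and `torch.optim.AdamW`, where the weight is shrunk directly ("decoupled" weight decay). The function `adam_step` and its arguments are illustrative, not a library API; the defaults mirror PyTorch's Adam defaults.

```python
import math


def adam_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam (decoupled=False) or AdamW (decoupled=True) update for a
    single scalar parameter. `state` holds the step count and moment
    estimates and is updated in place."""
    beta1, beta2 = betas
    if decoupled:
        # AdamW: shrink the weight directly, independent of the gradient.
        param = param - lr * weight_decay * param
    else:
        # Adam with L2 penalty: decay is folded into the gradient, so it
        # gets rescaled by Adam's adaptive denominator below.
        grad = grad + weight_decay * param
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected 1st moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected 2nd moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def new_state():
    return {"t": 0, "m": 0.0, "v": 0.0}


if __name__ == "__main__":
    # Zero data gradient, so any change comes from weight decay alone.
    print(adam_step(1.0, 0.0, new_state(), weight_decay=5e-4))                  # ~0.999
    print(adam_step(1.0, 0.0, new_state(), weight_decay=5e-4, decoupled=True))  # ~0.9999995
```

On the first step from a fresh state, the L2-style update shrinks the weight by roughly `lr` (to about 0.999), because Adam normalizes the small decay gradient into a full-size step, while the decoupled update shrinks it by only `lr * weight_decay`. So the same 5e-4 can mean quite different amounts of regularization depending on which variant is used.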

I'm trying anything to fix this. I have an open Stack Overflow question here if anyone is interested.