How can I implement the Squared-Loss Penalty from the thesis below so that it runs fast?
Multipass Deep Q-Networks (wits.ac.za)