Sometimes when I train certain models, I see sudden spikes in the loss like this.
This doesn't happen very often, and it isn't restricted to any particular network or dataset. Is there any insight into why and when this happens, and why the loss returns to normal after that batch?