Clip gradient norm from chatbot tutorial

The author probably arrived at the value by training a model and watching the gradient norms in TensorBoard. Another possibility is that it came out of hyperparameter tuning. They might also have simply taken an arbitrary value seen in other people's code.
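To make the first option concrete, here is a minimal sketch in plain Python of how one could log the global gradient norm over some training steps and then pick a clip value from what was observed. The function names, the nested-list gradient layout, and the quantile rule are all assumptions for illustration, not what the tutorial author did; the rescaling step follows the usual norm-clipping formula (scale every gradient by `max_norm / total_norm` when the norm exceeds `max_norm`).

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient values, treated as one flat vector.

    `grads` is a list of per-parameter gradient lists (hypothetical layout).
    """
    return math.sqrt(sum(g * g for layer in grads for g in layer))


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients so their global norm is at most `max_norm`.

    Returns the (possibly rescaled) gradients and the norm before clipping,
    which is the number worth logging during training.
    """
    total = global_grad_norm(grads)
    scale = min(1.0, max_norm / (total + eps))
    clipped = [[g * scale for g in layer] for layer in grads]
    return clipped, total


def suggest_clip_value(logged_norms, quantile=0.9):
    """Pick a threshold at a chosen quantile of the logged norms.

    This is only a heuristic (an assumption of this sketch): typical steps
    pass through untouched and only the unusually large ones get clipped.
    """
    ordered = sorted(logged_norms)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]
```

In practice you would record the pre-clipping norm at every step, look at its distribution (for example as a scalar plot), and choose a threshold that only cuts off the spikes. A value found this way is still a starting point that hyperparameter tuning can refine.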