Hi, after wrapping my model with model = DataParallel(model), which parameters should I pass to the clip_grad_norm_ function: model.parameters() or model.module.parameters()?
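A quick way to check this yourself is to compare the two iterators directly. The sketch below uses a small hypothetical stand-in model rather than the real ViT. It relies on the fact that DataParallel stores the wrapped model as its registered submodule module, so both calls should yield the very same Parameter objects.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real ViT; any nn.Module behaves the same way here
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
dp_model = nn.DataParallel(model)

# DataParallel registers the wrapped model as the submodule `module`
# and has no parameters of its own, so both iterators walk the same tensors.
outer = list(dp_model.parameters())
inner = list(dp_model.module.parameters())
same = len(outer) == len(inner) and all(a is b for a, b in zip(outer, inner))
print(same)
```

If same is True, passing either iterator to clip_grad_norm_ clips exactly the same gradients.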
I'm training a ViT model. With max epochs set to 100, everything was fine, but with max epochs set to 200 the loss became nan after 55 epochs, and I'm afraid the gradient clipping step is the cause.
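To narrow down where the nan first appears, one option is to check the loss and the gradient norm returned by clip_grad_norm_ on every step. Clipping only rescales gradients; if any gradient is already NaN or Inf, the returned total norm will not be finite and the clipped gradients will not be usable either. The sketch below is a minimal illustration under assumptions: the small model, AdamW, the hypothetical train_step helper, and max_norm=1.0 are placeholders, not the original training setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real ViT, optimizer and loss
model = nn.DataParallel(nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2)))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(x, y, max_norm=1.0):
    """One step with gradient clipping; skips the update if loss or gradient norm is not finite."""
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y.to(out.device))  # DataParallel gathers outputs on device_ids[0]
    if not torch.isfinite(loss):
        return loss.item(), None  # the forward pass already produced NaN/Inf
    loss.backward()
    # model.parameters() is fine here: they are the wrapped module's parameters.
    # The return value is the total gradient norm measured before clipping.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    if not torch.isfinite(total_norm):
        return loss.item(), total_norm.item()  # skip the step so the weights stay finite
    optimizer.step()
    return loss.item(), total_norm.item()

x = torch.randn(32, 8)
y = torch.randint(0, 2, (32,))
loss, norm = train_step(x, y)
print(loss, norm)
```

Logging the pre-clipping norm over epochs can also show whether gradients grow steadily before the nan appears, which would point at the learning rate schedule or mixed precision rather than at the clipping call itself.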