Why can't I train my model with Distributed Data Parallel?

I guess this is very similar to the question "Error when training AutoModelForQuestionAnswering with Distributed Data Parallel?"

Please check my reply there.