Hi,
I'm trying to pass a model to PyTorch DDP (DistributedDataParallel) for distributed training. My script fails with an "invalid scalar type" error. When I inspected the buffers of the model (GPT-Neo 1.3B), I found boolean buffers, so I now loop over the buffers and cast them to int8. With that change, DDP runs without errors.
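For reference, the buffer conversion I describe looks roughly like the sketch below. It is a minimal illustration, not my exact script: `TinyMasked` is a hypothetical stand-in for the real model, and `cast_bool_buffers` and the buffer name `causal_mask` are names made up for this example.

```python
import torch
import torch.nn as nn


class TinyMasked(nn.Module):
    """Hypothetical stand-in for a model that keeps a boolean mask buffer."""

    def __init__(self, size=4):
        super().__init__()
        self.linear = nn.Linear(size, size)
        mask = torch.tril(torch.ones(size, size, dtype=torch.bool))
        self.register_buffer("causal_mask", mask)


def cast_bool_buffers(model, dtype=torch.int8):
    """Cast every boolean buffer to `dtype` and return the names that changed."""
    changed = []
    for mod_name, module in model.named_modules():
        # Copy to a list first, because the buffers are reassigned in the loop.
        for buf_name, buf in list(module.named_buffers(recurse=False)):
            if buf.dtype == torch.bool:
                # Assigning a tensor to an existing buffer name keeps it registered.
                setattr(module, buf_name, buf.to(dtype))
                changed.append(f"{mod_name}.{buf_name}" if mod_name else buf_name)
    return changed


model = TinyMasked()
changed = cast_bool_buffers(model)
```

Keeping the returned list of names makes it possible to tell later which buffers were originally boolean.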
Problem:
After training, the model does not generate anything: I get no output at inference/prediction time. I don't know whether what I'm doing is correct.
Could you tell me whether this approach is correct, or what I should do instead? Thanks.
Note: As far as I understand, to synchronise weights across the nodes DDP needs tensors of dtype float32 or an integer type; boolean is not accepted.
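Since those buffers were boolean originally, one thing I could check is whether casting them back to bool before inference changes the behaviour. Below is a minimal sketch of that check, assuming the names of the converted buffers were recorded at conversion time. `TinyMasked`, `restore_bool_buffers`, and `causal_mask` are hypothetical names, and the assumption that the buffer is used as a mask in `torch.where` belongs to this toy module, not necessarily to the real model.

```python
import torch
import torch.nn as nn


class TinyMasked(nn.Module):
    """Hypothetical module whose mask buffer was already cast to int8."""

    def __init__(self, size=4):
        super().__init__()
        mask = torch.tril(torch.ones(size, size, dtype=torch.bool))
        self.register_buffer("causal_mask", mask.to(torch.int8))

    def forward(self, scores):
        # The mask is used as a boolean condition here.
        blocked = torch.full_like(scores, float("-inf"))
        return torch.where(self.causal_mask, scores, blocked)


def restore_bool_buffers(model, names):
    """Cast the named buffers (dotted paths) back to torch.bool."""
    for full_name in names:
        module_path, _, buf_name = full_name.rpartition(".")
        module = model.get_submodule(module_path) if module_path else model
        setattr(module, buf_name, getattr(module, buf_name).to(torch.bool))


model = TinyMasked()
restore_bool_buffers(model, ["causal_mask"])

model.eval()
with torch.no_grad():
    out = model(torch.zeros(4, 4))
```

With DDP, the underlying model is available as `ddp_model.module`, so the restore step would be applied to that object before running generation.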