Gradient hooks with DataParallel and DDP

Hi,

It seems this question has come up before. The limitation appears to be intrinsic to DataParallel; see, e.g., Yanli_Zhao's answer here:

Also see the warnings here