[resolved] Broadcasting Variables Across GPUs and Autograd

I am broadcasting Variables to different GPUs using the `Broadcast` function from https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/_functions.py#L6. Will the gradients be aggregated automatically during the backward pass? I am getting errors when calling backward, and I am not sure whether the broadcasting is being handled incorrectly. Thanks in advance!
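For context, here is a minimal sketch of the pattern I have in mind (current tensor API, two CUDA devices assumed; `Broadcast` lives in a private module, so its `apply` signature may differ between versions):

```python
import torch
from torch.nn.parallel._functions import Broadcast

# Source tensor on the first GPU (shapes and devices are placeholders).
x = torch.randn(4, 3, device="cuda:0", requires_grad=True)

# Broadcast.apply(target_gpus, *inputs) returns the broadcast copies,
# flattened across devices: here (copy on cuda:0, copy on cuda:1).
x0, x1 = Broadcast.apply([0, 1], x)

# Use each copy on its own device and combine the scalar results on cuda:0.
loss = x0.sum() + x1.sum().to("cuda:0")
loss.backward()

# Broadcast's backward reduce-adds the per-device gradients back onto x's
# device, so each element of x.grad should be 1 + 1 = 2 here.
print(x.grad)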

It works

I made sure all inputs to the forward function are CUDA tensors. However, I still get this error …
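To make "all inputs are CUDA tensors" concrete, this is roughly what I mean (the model, shapes, and device id below are just placeholders, not my actual code):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0")           # assumed target device
model = nn.Linear(3, 2).to(device)        # stand-in for the real module

raw_inputs = (torch.randn(4, 3),)         # e.g. CPU tensors from a DataLoader
inputs = tuple(t.to(device) if isinstance(t, torch.Tensor) else t
               for t in raw_inputs)       # move every tensor input onto the GPU
output = model(*inputs)
```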