I have a use case where I need to run forward passes on two different tensors before calling a single backward pass. I could concatenate the two tensors and run one forward pass, but then the batchnorm layers would compute their statistics over both tensors together, which I would like to avoid.
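To make the question concrete, here's a minimal sketch of what I mean (the model and shapes are just placeholders):

```python
import torch
import torch.nn as nn

# toy model containing a batchnorm layer
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

x1 = torch.randn(8, 16)  # first tensor
x2 = torch.randn(8, 16)  # second tensor

# two separate forward passes, so batchnorm statistics are
# computed per-tensor rather than over their concatenation
out1 = model(x1)
out2 = model(x2)

# combine the outputs and run a single backward pass;
# autograd accumulates gradients from both forward passes
loss = out1.mean() + out2.mean()
loss.backward()
```

On a single device this seems to work fine, since autograd just accumulates gradients from both graphs.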
Would nn.DataParallel behave correctly if I call the wrapped model twice in a row like this before the backward pass? I'll admit my knowledge of both DP and DDP is fairly limited at the moment.
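In other words, assuming the same placeholder model and tensors as above, would this be equivalent to the single-device version?

```python
# wrap the same model; each forward call scatters its input
# across the available GPUs and gathers the outputs
dp_model = nn.DataParallel(model)

out1 = dp_model(x1)
out2 = dp_model(x2)
(out1.mean() + out2.mean()).backward()
```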