Any way to have DataParallel and DistributedDataParallel automatically handle buffers?

I have a module whose state includes tensors registered via register_buffer. Is there any way to have these values automatically handled by DataParallel and DistributedDataParallel? I built a custom DataParallel with a custom gather that works for them, but I'm having more significant issues implementing the same thing with DDP. It would be great to just set the buffers up so that they're handled automatically.

As an extra level of complexity, some of these values should be summed across the replicas, some should be averaged, and one should take the max across replicas.
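
For concreteness, this is roughly the manual sync I'd like to avoid writing by hand under DDP (the buffer names and the per-name reduction mapping are made-up stand-ins for my real ones; assumes the process group is already initialized):

import torch.distributed as dist

def reduce_buffers(module, world_size):
    # Made-up per-buffer reductions; my real module has a handful like these.
    reductions = {"total_count": "sum", "running_avg": "mean", "peak_value": "max"}
    for name, buf in module.named_buffers():
        op = reductions.get(name, "sum")
        if op == "max":
            dist.all_reduce(buf, op=dist.ReduceOp.MAX)
        else:
            dist.all_reduce(buf, op=dist.ReduceOp.SUM)
            if op == "mean":
                buf.div_(world_size)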

Thank you for this detailed response! Three questions about my initial DataParallel implementation before moving on to DistributedDataParallel.

1 - When I inspect the gather() function, its outputs argument appears to be a list of tensors. The buffers I'm looking to adjust are registered to modules. This is how I'm currently doing it; does this seem OK? I'm doing this for every buffer, of which there are a handful, so if there is a quicker way to do this that would be great. Currently my 2-GPU test is 10x slower than my 1-GPU test.

# gather each replica's copy of the buffer onto the output device, then average
average = self.gather([replica.average for replica in self.replicas], self.output_device).mean(0)
self.module.average = average
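
One variant I've considered to cut the per-buffer boilerplate is a single loop over named_buffers() (a sketch, assuming every replica exposes the same buffer names; the .mean(0) would need swapping for the summed/max buffers):

with torch.no_grad():
    replica_bufs = [dict(r.named_buffers()) for r in self.replicas]
    for name, buf in self.module.named_buffers():
        # stack this buffer's per-replica copies on one device, then reduce
        vals = torch.stack([b[name].to(buf.device) for b in replica_bufs])
        buf.copy_(vals.mean(0))  # .sum(0) or .max(0).values for the other buffers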

2 - My understanding is that gather() is called just once, at the end of DataParallel's forward(). Some of these buffers are actually modified during the backward pass. Is there a way to have those adjusted automatically? Currently I'm calling a backwardGather() function by hand after backward() every batch (see the sketch below), which works but isn't ideal.
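
In case it helps, the shape of my current workaround is just a small wrapper so the per-batch call can't be forgotten (backwardGather() is my custom method from above; as far as I know there's no public "backward finished" hook to attach this to):

def backward_with_gather(loss, model):
    # run backward, then immediately sync the buffers it mutated
    loss.backward()
    model.backwardGather()  # my custom cross-replica gather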

3 - You mentioned that it handles parameters automatically. Is there a way to just switch these from buffers to parameters and have them handled? Or does it only handle parameters that are part of the autograd graph, so there's no way to have them adjusted on the side?
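
To make question 3 concrete, this is my understanding of the distinction (module and names made up):

import torch
import torch.nn as nn

class Example(nn.Module):
    def __init__(self):
        super().__init__()
        # in the autograd graph: DDP all-reduces its .grad each backward
        self.weight = nn.Parameter(torch.randn(4))
        # a buffer: DDP only re-broadcasts rank 0's copy each forward
        # (broadcast_buffers=True); it never reduces values back across replicas
        self.register_buffer("running_avg", torch.zeros(4))
        # a parameter outside the graph: no gradient ever reaches it,
        # so there is nothing for DDP to reduce
        self.counter = nn.Parameter(torch.zeros(1), requires_grad=False)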