DataParallel in PyTorch blitz tutorial

I was reading about data parallel here. It's mentioned that we can use it by just calling `model = nn.DataParallel(model)`. Then there is a link to another tutorial at the bottom of the page, where it is mentioned to use DataParallel like this: `self.block2 = nn.DataParallel(self.block2)`.

My question is: what is the difference between them? Does the first method do what the second method does for every block automatically? If yes, which one is best practice if I want that for every block? Also, in the second method, what happens to the blocks that are not inside DataParallel?

DataParallel can be applied to any nn.Module. For the first example, you are correct in that every nn.Module it contains (i.e. the blocks) will be data parallel as well, since the entire module is.
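A minimal sketch of the first approach (the model and layer sizes here are made up for illustration): wrapping the whole model means all of its blocks are replicated on each GPU and the input batch is split across them.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical blocks, just for illustration
        self.block1 = nn.Linear(10, 20)
        self.block2 = nn.Linear(20, 5)

    def forward(self, x):
        return self.block2(torch.relu(self.block1(x)))

model = MyModel()
if torch.cuda.device_count() > 1:
    # The entire module (block1 and block2) is replicated on each GPU,
    # and the batch is scattered across the replicas.
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```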

The second example mostly seeks to show that you can have a regular nn.Module with a data parallel module contained within it, and everything should work as expected: only the data parallel module will be parallelized across the batch dimension, while the rest of the model runs normally.
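A sketch of the second approach, again with made-up layer sizes: only block2 is wrapped, so only its forward pass is split across GPUs, and block1 runs as an ordinary single-device module.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)                   # not parallelized
        self.block2 = nn.DataParallel(nn.Linear(20, 5))   # batch split across GPUs

    def forward(self, x):
        x = torch.relu(self.block1(x))  # runs on one device
        return self.block2(x)           # input scattered, outputs gathered
```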
