DDP (multi-GPU) on a multivariate time series dataset

Hi everyone. I am doing some time series training on multiple GPUs using DDP, and my accuracy gets worse (MSE increases) as the number of GPUs increases. My guess is that this has something to do with the way the data is split across the GPUs. Given the importance of order in time series datasets, i.e., time moving sequentially from row 0 to the last row of the dataset, I am considering splitting the dataset by columns/features instead. Is this possible with PyTorch's DistributedSampler or DDP? Also, am I right in assuming that the DistributedSampler splits the data by rows? Thanks.
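
For context, my input pipeline is roughly the standard recipe below (the shapes, batch size, and variable names are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Illustrative shapes: 10_000 timesteps (rows) x 8 features (columns).
features = torch.randn(10_000, 8)
targets = torch.randn(10_000, 1)
dataset = TensorDataset(features, targets)

# Standard DDP input pipeline: requires torch.distributed to be initialized;
# each process is handed a disjoint subset of sample (row) indices.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```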

DDP doesn’t control what data is fed into the model; it is only responsible for reducing the gradients across processes. So I would say this issue is orthogonal to DDP.

Thanks fegin. I just read in torch/nn/parallel/distributed.py that the module does not control how data is split across the GPUs; the user, i.e., me, has to handle that. My guess is that a feature-wise split is possible. I’ll look further into it. If you have any ideas, please share.
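
As a quick sanity check on my by-rows assumption, I instantiated the sampler by hand (no process group is needed when num_replicas and rank are passed explicitly):

```python
import torch
from torch.utils.data import TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(10))  # 10 "rows" / timesteps

for rank in range(2):
    sampler = DistributedSampler(dataset, num_replicas=2, rank=rank, shuffle=False)
    print(rank, list(sampler))
# prints:
# 0 [0, 2, 4, 6, 8]
# 1 [1, 3, 5, 7, 9]
```

So yes, it splits by rows: with two replicas each rank gets every other timestep, and with shuffle=True (the default) the temporal order within each shard is lost entirely, which might be part of why my MSE gets worse.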

So what I basically want to do is let each GPU handle a subset of the features (all rows/timesteps for those features), instead of what currently happens, where each GPU handles batches containing all features but only a subset of the timesteps.
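
In case anyone finds this thread later, here is a minimal sketch of that idea; FeatureShardDataset is a made-up name, and I assume the feature count divides evenly across ranks:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class FeatureShardDataset(Dataset):
    """Made-up wrapper: every rank sees ALL timesteps (rows),
    but only its own contiguous slice of the feature columns."""

    def __init__(self, features, targets, rank, world_size):
        # features: (num_timesteps, num_features)
        n_features = features.shape[1]
        per_rank = n_features // world_size  # assumes even divisibility
        start = rank * per_rank
        self.features = features[:, start:start + per_rank]
        self.targets = targets

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        return self.features[idx], self.targets[idx]

# Illustrative usage inside an initialized process group:
#   rank = torch.distributed.get_rank()
#   world_size = torch.distributed.get_world_size()
#   dataset = FeatureShardDataset(features, targets, rank, world_size)
#   # Rows stay in temporal order; no DistributedSampler needed.
#   loader = DataLoader(dataset, batch_size=64, shuffle=False)
```

One caveat I realized while sketching this: with a feature-wise split, every replica sees different inputs (and needs a different input width), so DDP’s gradient averaging no longer implements ordinary data parallelism; this is closer to model/tensor parallelism, so it may need more than DDP out of the box.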