DataParallel performance in C++

It’s well established that DataParallel performs poorly in Python. Does anyone have anecdotal experience with DataParallel in C++? I assume it can achieve performance on par with DistributedDataParallel, since there’s no GIL contention, but it would be really helpful to hear about real experiences with it before investing in switching from Python to C++.

Take this with a grain of salt, as I haven’t run DataParallel in C++. That said, I don’t think DataParallel’s worse performance compared to DDP comes from the GIL. More likely it is the communication overhead of replicating the model parameters and scattering/gathering the data on every iteration.
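To make that per-iteration traffic concrete, here is a toy, standard-library-only sketch of one DataParallel-style forward pass. Threads stand in for GPUs, the "model" is a single linear layer, and `data_parallel_step` / `StepStats` are hypothetical names for illustration. This is not libtorch code and not how libtorch's own `torch::nn::parallel::data_parallel` is implemented. It only counts how many elements move to and from the primary replica in each of the replicate, scatter and gather phases. Backward-pass gradient reduction is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Element counts moved off/onto the primary replica (index 0) in one step.
struct StepStats {
    std::vector<float> outputs;      // final outputs, gathered on replica 0
    std::size_t params_copied = 0;   // replicate: weights sent to replicas 1..N-1
    std::size_t inputs_copied = 0;   // scatter: input rows sent to replicas 1..N-1
    std::size_t outputs_copied = 0;  // gather: outputs returned from replicas 1..N-1
};

// Hypothetical toy forward pass of a linear model y = dot(weights, x),
// run DataParallel-style. `batch` is row-major [batch_size x weights.size()].
// Threads stand in for GPUs; no real devices or libtorch calls are involved.
StepStats data_parallel_step(const std::vector<float>& weights,
                             const std::vector<float>& batch,
                             std::size_t num_replicas) {
    assert(num_replicas > 0 && !weights.empty());
    assert(batch.size() % weights.size() == 0);
    const std::size_t dim = weights.size();
    const std::size_t batch_size = batch.size() / dim;
    const std::size_t chunk = (batch_size + num_replicas - 1) / num_replicas;

    StepStats stats;

    // 1. Replicate: every step, each replica receives its own copy of the weights.
    std::vector<std::vector<float>> replicas(num_replicas, weights);
    stats.params_copied = (num_replicas - 1) * dim;

    // 2. Scatter + 3. forward: each replica handles one contiguous chunk of rows
    //    and writes into its own local output buffer.
    std::vector<std::vector<float>> local_out(num_replicas);
    std::vector<std::thread> workers;
    for (std::size_t r = 0; r < num_replicas; ++r) {
        const std::size_t begin = std::min(batch_size, r * chunk);
        const std::size_t end = std::min(batch_size, begin + chunk);
        if (r != 0) stats.inputs_copied += (end - begin) * dim;
        workers.emplace_back([&, r, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                const float* x = batch.data() + i * dim;
                local_out[r].push_back(std::inner_product(
                    replicas[r].begin(), replicas[r].end(), x, 0.0f));
            }
        });
    }
    for (auto& t : workers) t.join();

    // 4. Gather: concatenate every replica's outputs onto replica 0.
    for (std::size_t r = 0; r < num_replicas; ++r) {
        if (r != 0) stats.outputs_copied += local_out[r].size();
        stats.outputs.insert(stats.outputs.end(),
                             local_out[r].begin(), local_out[r].end());
    }
    return stats;
}
```

With two replicas, weights `{1, 2}` and a 4-row batch, replica 1 receives 2 weights and 2 input rows (4 elements) and sends back 2 outputs, and this repeats every iteration no matter which language drives it. In DDP, by contrast, each process keeps its own model and input shard, so the replicate and gather steps disappear and only gradients are all-reduced during the backward pass.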
