Issue with using DataParallel (includes minimal code)

One thing to note is that you will need a torch.cuda.synchronize() before calling time.time(), to make sure all pending CUDA kernels in the stream have finished. CUDA calls are asynchronous, so without synchronizing you only measure the time to launch the kernels, not to run them.
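For example, a minimal sketch of the timing pattern, using a hypothetical Linear model just to show where the synchronization points go:

```python
import time
import torch

model = torch.nn.Linear(1024, 1024).cuda()     # placeholder model
x = torch.randn(64, 1024, device="cuda")

torch.cuda.synchronize()      # wait for any pending kernels before starting the clock
start = time.time()
out = model(x)
torch.cuda.synchronize()      # wait for the forward kernels to finish before stopping it
print(f"forward took {(time.time() - start) * 1000:.2f} ms")
```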

You can also use CUDA events and their elapsed_time method to measure. See the discussion here.
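A sketch of the same measurement with torch.cuda.Event (again with a placeholder model); note that elapsed_time returns milliseconds:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()     # placeholder model
x = torch.randn(64, 1024, device="cuda")

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()                # enqueue a start marker on the current stream
out = model(x)
end.record()                  # enqueue an end marker after the forward kernels
torch.cuda.synchronize()      # make sure both events have actually been recorded
print(f"forward took {start.elapsed_time(end):.2f} ms")
```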

If you are looking for the most performant solution, DistributedDataParallel should be the way to go. [example]
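For reference, a minimal single-node DDP sketch; the model, batch shape, and master address/port here are placeholder assumptions, not from this thread:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).to(rank)   # placeholder model
    ddp_model = DDP(model, device_ids=[rank])

    x = torch.randn(64, 1024, device=f"cuda:{rank}")
    loss = ddp_model(x).sum()
    loss.backward()            # gradients are all-reduced across processes here

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()   # one process per GPU
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```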

When I uncomment the line self.tools.SomeFunc(self.model, self.features) and set the flag to True, I receive the following error:

It looks like self.model is a DataParallel instance? If so, the DataParallel wrapper itself does not have a first_term attribute. If this attribute is defined on the model instance you passed to DataParallel, you can access that original model through self.model.module (see the DataParallel code here), which should have the first_term attribute.
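A minimal sketch of the difference, using a hypothetical MyModel with a first_term attribute:

```python
import torch

class MyModel(torch.nn.Module):
    # Hypothetical model; the real one just needs some first_term attribute.
    def __init__(self):
        super().__init__()
        self.first_term = torch.nn.Parameter(torch.zeros(10))

    def forward(self, x):
        return x + self.first_term

model = torch.nn.DataParallel(MyModel())

# model.first_term               # AttributeError: DataParallel has no attribute 'first_term'
print(model.module.first_term)   # .module is the wrapped MyModel, which does have it
```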