Is GPU synchronization a speed overhead?

pwr617 · November 30, 2018, 5:34am

Hi! I defined a network and moved it to CUDA. I fed in an image and got the result. I found that fetching the result takes much longer than the model prediction itself: 0.007s for the prediction vs 0.139s for fetching the result!

I searched for related questions, and I understand that this is the time consumed by GPU synchronization. However, I defined the same network in TensorFlow and can get the result easily, with no apparent synchronization cost. Is there a mistake in the way I am using PyTorch, or a flaw in PyTorch's strategy?
After all, the 0.139 seconds spent fetching the result is far more than the 0.007 seconds the model takes to predict. Here’s my code.
def load_model(self):
    network.load_net(self.model_path, self.net)
    if self.cuda:
        self.net.cuda()
    self.net.eval()

def predict(self, data):
    # load an image
    img = cv2.imread(data, 0)
    img = img.astype(np.float32, copy=False)
    img = img.reshape((1, 1, img.shape[0], img.shape[1]))

    # inference
    t1 = time.time()
    density_map = self.net(img)
    t2 = time.time()

    # torch.cuda.synchronize()

    t3 = time.time()
    density_map = density_map.data.cpu().numpy()
    t4 = time.time()
    print(t2 - t1)
    print(t4 - t3)

    et_count = np.sum(density_map)

0.00782918930053711
0.13913774490356445
Thank you for your advice!

smth · November 30, 2018, 6:07am

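Your thinking is quite off.

GPU synchronization just waits for all of the queued work to finish. CUDA calls are asynchronous, so t2 - t1 most likely measures only how long it takes to queue the forward pass. The .cpu() call then has to wait for that work to complete before it can copy the result, which is why the time shows up there.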
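
As a rough analogy in plain Python (this is not CUDA, and slow_forward is a hypothetical stand-in that sleeps instead of computing), submitting work to a background worker returns almost at once, while asking for the result blocks until the work is done. The split between the two timings mirrors the one in the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_forward():
    """Hypothetical stand-in for GPU work; it sleeps instead of computing."""
    time.sleep(0.1)
    return 42


with ThreadPoolExecutor(max_workers=1) as pool:
    t1 = time.perf_counter()
    future = pool.submit(slow_forward)  # returns right away, like queuing a kernel
    t2 = time.perf_counter()
    value = future.result()             # blocks until the work finishes, like .cpu()
    t3 = time.perf_counter()

launch_time = t2 - t1  # time to queue the work
fetch_time = t3 - t2   # time spent waiting for the queued work
print(f"launch: {launch_time:.4f}s, fetch: {fetch_time:.4f}s")
```

The work takes the same total time either way; only the place where the wait is observed changes.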

Regardless of the framework, the time it takes should be about the same.
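
To time the forward pass itself, one option is to synchronize before reading the clock on both sides of the call. Below is a minimal sketch, not code from this thread: timed is a hypothetical helper, and the sync argument is assumed to be something like torch.cuda.synchronize when the model runs on a GPU.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    If sync is given, it is called before starting the clock (to drain
    previously queued work) and again before stopping it (to wait for
    the work queued by fn itself).
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage with PyTorch (assumes `import torch` and a CUDA model `net`):
#   density_map, seconds = timed(lambda: net(img), sync=torch.cuda.synchronize)
```

Measured this way, the forward pass should account for most of the time, and the later .cpu() call should only cover the device-to-host copy.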