Hi! I defined a network and moved it to CUDA. I fed it an image and retrieved the result, and I found that retrieving the result takes much longer than the model's forward pass: 0.007 s vs 0.139 s!

I searched for related questions, and I understand this is time consumed by GPU synchronization. However, when I defined the same network in TensorFlow I could get the result right away; there seemed to be no synchronization cost. I want to ask whether there is a mistake in the way I am using PyTorch, or whether this is a PyTorch design choice.
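From what I have read, CUDA kernels are launched asynchronously, so the forward call returns before the GPU has finished, and the wait surfaces later when the result is copied to the CPU. The recommended pattern seems to be to synchronize before reading the clock on both sides of the forward pass. A minimal sketch of what I mean (the `nn.Conv2d` model and the input shape here are placeholders, not my real network):

```python
import time

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(1, 1, kernel_size=3, padding=1).to(device).eval()  # placeholder model
x = torch.randn(1, 1, 64, 64, device=device)  # placeholder input

with torch.no_grad():
    if device == "cuda":
        torch.cuda.synchronize()  # drain pending kernels before starting the clock
    t1 = time.time()
    y = model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait until the forward pass has actually finished
    t2 = time.time()

print(f"forward pass: {t2 - t1:.4f} s")
```

With this pattern, the forward-pass timing should absorb the cost that otherwise shows up in the `.cpu()` copy.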

After all, the 0.139 seconds spent printing the result is far more than the 0.007 seconds the model spends predicting. Here's my code.

```python
def load_model(self):
    network.load_net(self.model_path, self.net)
    if self.cuda:
        self.net.cuda()
    self.net.eval()

def predict(self, data):
    # load the image as grayscale and shape it as (N, C, H, W)
    img = cv2.imread(data, 0)
    img = img.astype(np.float32, copy=False)
    img = img.reshape((1, 1, img.shape[0], img.shape[1]))
    img = torch.from_numpy(img)  # convert to a tensor before the forward pass
    if self.cuda:
        img = img.cuda()

    # inference
    t1 = time.time()
    density_map = self.net(img)
    t2 = time.time()

    # torch.cuda.synchronize()
    t3 = time.time()
    density_map = density_map.data.cpu().numpy()
    t4 = time.time()

    print(t2 - t1)
    print(t4 - t3)

    et_count = np.sum(density_map)
```

Output:

```
0.00782918930053711
0.13913774490356445
```
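I have also seen `torch.cuda.Event` recommended for measuring pure GPU time without mixing in host-side overhead. A sketch of that approach (again with a placeholder model, and only meaningful on a machine with a GPU):

```python
import torch

if torch.cuda.is_available():
    model = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).cuda().eval()  # placeholder
    x = torch.randn(1, 1, 64, 64, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    with torch.no_grad():
        start.record()   # marks a point in the GPU stream
        y = model(x)
        end.record()

    torch.cuda.synchronize()  # wait so elapsed_time() reads completed events
    print(f"GPU time: {start.elapsed_time(end):.3f} ms")
```

Is one of these the intended way to benchmark in PyTorch, or am I missing something?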

Thank you for your advice!