Hi! I defined a network and moved it to CUDA. I fed in an image and got the result, but I found that retrieving the result takes much longer than the model's forward pass: 0.139 s vs 0.007 s!
I searched for related questions and understand that this time is spent on GPU synchronization. However, I defined the same network in TensorFlow, and there I can get the result easily, with no apparent synchronization cost. Is there a mistake in the way I am using PyTorch, or is this a problem with PyTorch's strategy?
After all, 0.139 seconds to print the result is far more than the 0.007 seconds the model takes to predict. Here’s my code.
import time

import cv2
import numpy as np

def predict(self, data):
    # load an image as grayscale
    img = cv2.imread(data, 0)
    img = img.astype(np.float32, copy=False)
    # reshape to (batch, channel, height, width)
    img = img.reshape((1, 1, img.shape[0], img.shape[1]))
    t1 = time.time()
    density_map = self.net(img)
    t2 = time.time()
    t3 = time.time()
    density_map = density_map.data.cpu().numpy()
    t4 = time.time()
    print(t2 - t1)  # model prediction: 0.007 s
    print(t4 - t3)  # fetching the result: 0.139 s
    et_count = np.sum(density_map)
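For comparison, here is a minimal sketch of how the two steps could be timed with explicit synchronization. It assumes that CUDA kernels are launched asynchronously, so the clock around the forward pass may stop before the GPU has finished, and that `torch.cuda.synchronize()` waits for the queued work. The `timed` helper, the `Conv2d` stand-in for the network, and the input size are hypothetical and not taken from the code above.

```python
import time

def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    If sync is given, it is called before starting and before stopping
    the clock, so pending asynchronous work is not billed to the wrong step.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()
    return result, time.perf_counter() - start

try:
    import torch
except ImportError:  # the helper above still works without PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    # Hypothetical stand-in for self.net; any CUDA module would do.
    net = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1).cuda()
    img = torch.rand(1, 1, 256, 256, device="cuda")  # assumed input size
    with torch.no_grad():
        net(img)  # warm-up call, so one-time setup cost is not measured
        out, t_forward = timed(lambda: net(img), torch.cuda.synchronize)
        arr, t_copy = timed(lambda: out.cpu().numpy(), torch.cuda.synchronize)
    print("forward:", t_forward)
    print("copy to CPU:", t_copy)
```

If the assumption holds, most of the 0.139 s should move from the copy step to the forward step once both are measured this way.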
Thank you for your advice!