Hi all,
I found that tensor.topk() is very slow when evaluating a semantic segmentation model. Below is the code snippet. The output tensor has shape 2x5x297x817, and computing the top-1 prediction takes about 3.6 s on the GPU. Is there any way to optimize this code?
import time
import torch

output_var = model(input_var)       # output shape: 2 x 5 x 297 x 817
output = output_var.data
torch.cuda.synchronize()            # make sure the forward pass has finished
start = time.time()
_, pred = output.topk(1, 1, True, True)  # top-1 along the class dimension
torch.cuda.synchronize()            # wait for topk to finish before stopping the timer
print("Top-1 time: {}".format(time.time() - start))
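Since only the top-1 result is needed here, one option worth trying is torch.argmax along the class dimension, which returns the same indices as topk(1, ...) (up to tie-breaking) and avoids the sorting machinery of a full top-k. A minimal sketch, using a random tensor of the same shape in place of the model output:

```python
import torch

# Stand-in for the model output from the question (2 x 5 x 297 x 817).
output = torch.randn(2, 5, 297, 817)

# Original approach: top-1 along dim 1 (the class dimension).
_, pred_topk = output.topk(1, 1, True, True)

# Alternative: argmax along the same dimension; keepdim=True keeps
# the singleton class dimension so the shapes match topk's output.
pred_argmax = output.argmax(dim=1, keepdim=True)

# For distinct values the two give identical indices.
assert torch.equal(pred_topk, pred_argmax)
```

Whether this is actually faster on your GPU is something to benchmark with the same torch.cuda.synchronize() bracketing as above, since CUDA kernels launch asynchronously.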