Time profile the code and pytorch behave oddly

Originally, I want to optimize my code, figure out which operation is time consuming, to make my code run more fast.

This is my original code. It’s just some feature extracted to be inferred by model, and then compute some distances of the embeddings. It works with GPU.

I add time called almost each line to show the time consuming. kind of crazy…

There is a 9 times loop in my code. It shows that this line(blue box)of the 1st loop ost most of the time.

The time is show as follow(red box).

I change above blue box line as the blue box line as follow. That is

positive_embedding = anchor_positive_embedding[enroll_idx].mean(dim=0)


positive_embedding = anchor_positive_embedding[2]

Now the code like this, just the blue box code changed.

However, the most time consuming operation changes.

That is:


While this line doesn’t consume too much time before the blue box line changing.

It’s obviously odd. I use line_profiler to obtain time statistic. Almost the same.

Moreover, when I uncomment the


in my code, this line becomes the most time consuming, only in the 1st loop. So strange.

More infos about my environment:
torch 1.4.0
cuda 10.1
docker environment

If need more infos, I will provide.

Is it a bug? Or which operation is the most time consuming in my code, in fact?

CUDA operations are executed asynchronously, so you would need to synchronize the code before starting and stopping the code manually via torch.cuda.synchronize() or use a built-in utility such as torch.utils.benchmark from the nightly/master, which will add it for you (as well as warmup iterations).

1 Like