Forward vs Backward vs Optimize

I am training the following network:

My dataset is 512x512x500 CT data.
My training was very slow and profiling gave the following output:

Are these timings expected for the forward pass, loss.backward(), and optimizer step?
loss.backward() is the slowest step. I can understand the forward pass and loss.backward() taking a similar amount of time, but why is the optimizer step ~30x faster? Are all the weight updates happening simultaneously, while loss.backward() runs sequentially?


As answered in the other thread (please don't double-post):

optimizer.step() is ~30x faster because it's a very cheap operation: typically one small elementwise update per weight, so its cost is linear in the number of parameters. The forward and backward passes, by contrast, run the network's matmuls / convolutions, whose cost grows not only with the parameter count but also with the batch size and activation sizes, so they are far more expensive.
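To make this concrete, here is a back-of-envelope FLOP count for a single linear layer `y = x @ W` with a plain SGD update. The sizes `B`, `D_in`, `D_out` are made-up illustrative values, not taken from the network above; this is a rough sketch of why the step is so much cheaper, not a statement about PyTorch internals.

```python
# Hypothetical sizes: batch B, input dim D_in, output dim D_out.
B, D_in, D_out = 64, 1024, 1024

params = D_in * D_out                  # weights in W

# Forward: one matmul, x (B, D_in) @ W (D_in, D_out).
forward_flops = 2 * B * D_in * D_out

# Backward needs two matmuls: grad_W = x.T @ grad_y and grad_x = grad_y @ W.T,
# so it's roughly twice the forward cost.
backward_flops = 2 * forward_flops

# Plain SGD step: one multiply-add per weight (W -= lr * grad_W).
step_flops = 2 * params

print(forward_flops // step_flops)     # ratio grows with the batch size B
print(backward_flops // step_flops)
```

So the forward/step ratio is roughly the batch size, and backward/step roughly twice that; an observed ~30x gap is well within this ballpark. The updates are also launched as large elementwise kernels (effectively "simultaneous" on a GPU), but the dominant reason is simply that there is much less arithmetic to do.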

Thanks a lot for your reply.

Really sorry for the second post. I had posted this before the reply on the previous one. Will keep this in mind in the future.