Requires grad and calculation speed

kkudus · September 26, 2019, 7:19pm

Hi All,

Whether I have requires_grad_ set to True or False for my tensor does not seem to affect the time taken for operations on that tensor to be completed. This is confusing to me. I would’ve expected operations to take longer when requires_grad_ is True, as the extra step of creating the graph must be taken. Any help on understanding why I’m seeing these results would be much appreciated.
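For reference, a minimal sketch of the kind of comparison described here. The helper name `time_forward`, the tensor sizes, and the choice of operations are all assumptions made for illustration, not the code actually used, and the measured times will vary by machine.

```python
import time
import torch

def time_forward(requires_grad, n=256, repeats=50):
    """Time a few forward ops on a tensor with or without graph recording."""
    x = torch.randn(n, n, requires_grad=requires_grad)
    w = torch.randn(n, n)
    start = time.perf_counter()
    for _ in range(repeats):
        # When x requires grad, each of these ops also records a graph Node.
        y = (x @ w).relu().sum()
    elapsed = time.perf_counter() - start
    return elapsed, y

t_off, y_off = time_forward(requires_grad=False)
t_on, y_on = time_forward(requires_grad=True)

print(f"requires_grad=False: {t_off:.4f} s")
print(f"requires_grad=True:  {t_on:.4f} s")
```
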

Thanks in advance

albanD · September 26, 2019, 7:50pm

This is actually good news: many hours were spent optimizing the autograd engine so that its overhead is negligible in terms of runtime.

kkudus · September 27, 2019, 3:43pm

Very interesting! So creating the graph is a cheap operation? It’s not until you start using the graph (backprop, etc) that the computational cost becomes significant?

albanD · September 27, 2019, 3:47pm

Yes.
The overhead of creating the graph is simply creating one C++ object (the Node), wrapping some Tensors (the ones saved for the backward pass), creating the links to the previous Nodes, and linking the output to the newly created Node.
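To make the Node and link structure visible, here is a small sketch that walks the graph from an output back to the leaf using `grad_fn` and `next_functions`. The specific operations are arbitrary choices for illustration, and the exact Node class names can differ between PyTorch versions.

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor
y = x * 2
z = y.exp()
out = z.sum()

# Each result is linked to the Node that produced it via .grad_fn,
# and each Node is linked to the previous Nodes via .next_functions.
names = []
node = out.grad_fn
while node is not None:
    names.append(type(node).__name__)
    # Follow the first input edge; a leaf's AccumulateGrad node has no inputs.
    node = node.next_functions[0][0] if node.next_functions else None

print(names)
```

The chain ends at an `AccumulateGrad` node, which is where the gradient for the leaf `x` would be accumulated during a backward pass.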

Backprop needs to traverse this whole graph to know which operations to perform (fairly expensive), and then computing the gradients themselves is roughly as expensive as the forward pass.
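A rough way to see where the cost actually shows up is to time the forward pass and the backward pass separately. This is only a sketch: the sizes and operations are assumptions, and a single-run timing like this is noisy, so treat the numbers as indicative rather than as a benchmark.

```python
import time
import torch

n = 512
x = torch.randn(n, n, requires_grad=True)
w = torch.randn(n, n)

# Forward pass: computes the result and records the graph along the way.
start = time.perf_counter()
out = (x @ w).relu().sum()
forward_time = time.perf_counter() - start

# Backward pass: traverses the recorded graph and computes the gradients.
start = time.perf_counter()
out.backward()
backward_time = time.perf_counter() - start

print(f"forward:  {forward_time:.4f} s")
print(f"backward: {backward_time:.4f} s")
print(x.grad.shape)
```
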

kkudus · September 27, 2019, 6:29pm

Awesome, thanks so much for the quick and informative response!