Whether I have requires_grad_ set to True or False for my tensor does not seem to affect the time taken for operations on that tensor to be completed. This is confusing to me. I would’ve expected operations to take longer when requires_grad_ is True, as the extra step of creating the graph must be taken. Any help on understanding why I’m seeing these results would be much appreciated.
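For reference, here is a minimal benchmark sketch along the lines of what I'm doing (names and sizes are just illustrative), timing the same matmul with autograd tracking on and off:

```python
import time
import torch

x = torch.randn(1024, 1024)
w = torch.randn(1024, 1024)

def bench(requires_grad, iters=50):
    # Clone so each run starts from an identical tensor.
    a = x.clone().requires_grad_(requires_grad)
    # Warm-up so allocator/cache effects don't skew the first timed run.
    for _ in range(3):
        a @ w
    start = time.perf_counter()
    for _ in range(iters):
        a @ w
    return (time.perf_counter() - start) / iters

t_off = bench(False)
t_on = bench(True)
print(f"requires_grad=False: {t_off * 1e3:.3f} ms/op")
print(f"requires_grad=True:  {t_on * 1e3:.3f} ms/op")
```

On my machine the two timings come out essentially the same, which is what prompted the question.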

Very interesting! So creating the graph is a cheap operation? It’s not until you start using the graph (backprop, etc) that the computational cost becomes significant?

Yes.
The overhead of creating the graph is just creating one cpp object (the Node), wrapping some Tensors (the ones saved for the backward pass), creating the links to the previous Nodes, and linking the output to the newly created Node.
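You can see the Node and its links from Python, e.g. via `grad_fn` and `next_functions` (a small sketch, the tensor names are arbitrary):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
c = a * b    # autograd records one Node for this op
d = c.sum()  # and one more for the sum

# The output Tensor is linked to the freshly created Node.
print(type(c.grad_fn).__name__)  # MulBackward0

# Each Node keeps links to the Nodes of its inputs; leaf tensors
# get AccumulateGrad nodes that write into .grad during backward.
print(d.grad_fn.next_functions)
```

All of this bookkeeping happens eagerly during the forward pass, but it is just a handful of small object allocations per op, which is why it barely shows up in timings.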

Backprop needs to traverse this whole graph to know which operations to perform (fairly expensive), and computing the gradients themselves is roughly as expensive as the forward pass.
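So if you time the backward call itself, that is where the cost shows up. A rough sketch (sizes are arbitrary, and wall-clock timings like this are noisy):

```python
import time
import torch

x = torch.randn(512, 512, requires_grad=True)
w = torch.randn(512, 512)

start = time.perf_counter()
y = (x @ w).sum()  # forward: runs the op and records the graph
fwd = time.perf_counter() - start

start = time.perf_counter()
y.backward()  # traverses the graph and computes the gradients
bwd = time.perf_counter() - start

print(f"forward:  {fwd * 1e3:.3f} ms")
print(f"backward: {bwd * 1e3:.3f} ms")
print(x.grad.shape)  # gradients accumulated into the leaf tensor
```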