As mentioned in this tutorial, I currently have to provide a manual backward function in a C++ extension.
From my understanding, this backward would compute the same gradients that autograd would compute in Python (just implemented in C++), or am I mistaken here?
Is there any way to show the functions that are actually executed during autograd's backward pass (to be sure I am not missing any part of the gradient calculation)?
The usual way to check on gradients is torch.autograd.gradcheck.
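For example, a minimal sketch of how `gradcheck` could be used (`my_op` here is a hypothetical stand-in for your extension's forward; double-precision inputs matter, because `gradcheck` compares analytical gradients against finite differences):

```python
import torch

def my_op(x, w):
    # Hypothetical stand-in for the extension's forward pass.
    return torch.sigmoid(x @ w)

# gradcheck needs double precision and requires_grad=True on the inputs
# it should differentiate with respect to.
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, dtype=torch.double, requires_grad=True)

# Returns True if analytical and numerical gradients match, raises otherwise.
print(torch.autograd.gradcheck(my_op, (x, w), eps=1e-6, atol=1e-4))
```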
For autograd-traced calculations, you can traverse the graph by following .grad_fn and their .next_functions.
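A small sketch of walking that graph (the exact `*Backward` class names can vary between PyTorch versions). Note that these nodes are what autograd will call during the backward pass, not the forward ops themselves:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).sum()

def node_names(fn):
    # Collect the class names of all backward nodes reachable from fn
    # by following .next_functions.
    names = []
    stack = [fn]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        names.append(type(node).__name__)
        stack.extend(next_fn for next_fn, _ in node.next_functions)
    return names

# Typically something like ['SumBackward0', 'MulBackward0', 'AccumulateGrad'],
# ending in AccumulateGrad nodes for the leaf tensors.
print(node_names(y.grad_fn))
```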
Are the functions in .grad_fn the ones that were used during the forward pass, or the ones that have to be called to calculate the gradients? I.e., are they part of the forward or the backward pass?
As far as I know (I think this came up a couple of weeks ago), you cannot call these nodes directly, and you cannot actually access their saved inputs (except for Python-defined torch.autograd.Function subclasses).
If you only use basic ops (as opposed to custom kernels), you could see whether you can use automatic differentiation (edit: I tried, and it works out of the box).
In that case, however, it might not be necessary to use a C++ extension at all unless you do really funny stuff; Python speed has been rather decent for me. (I tried that with torch.nn.functional.bilinear once.)
Now, all that is theorizing; I don't know what is best for your use case.
I just looked at the tutorial again. I think part, possibly a large part, of the speed-up may come from the forward pass not creating a graph, so a fairer C++ vs. Python comparison would be to wrap the Python forward in with torch.no_grad(), or to implement forward and backward as an autograd Function in Python.
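For that second option, here is a minimal sketch of a Python torch.autograd.Function with a hand-written backward (a toy elementwise multiply, not your actual op), which is the Python analogue of the C++ extension's manual backward:

```python
import torch

class MyMul(torch.autograd.Function):
    """Toy op with a hand-written backward, analogous to the C++ version."""

    @staticmethod
    def forward(ctx, x, w):
        # Stash the inputs needed by the backward pass.
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # d(x*w)/dx = w and d(x*w)/dw = x, scaled by the incoming gradient.
        return grad_out * w, grad_out * x

# Verify the hand-written backward against finite differences.
a = torch.randn(5, dtype=torch.double, requires_grad=True)
b = torch.randn(5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyMul.apply, (a, b)))
```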
Thanks a lot. So to sum this up: in your opinion it should be fine to stay with Python as long as I don't want to create fancy custom kernels (which I actually don't), since the speedup from C++ would be minimal in a fair comparison?
Don't take my word for it, though; that might be part of it. So if a, say, 10% speedup is worth writing it in C++, do try. But I don't think you get the full 30% just from moving to C++; a larger part of that likely comes from the custom backward (which you could implement similarly in Python).