TensorFlow to PyTorch Questions

I have been using PyTorch for the last year, after using TensorFlow for several years and Theano before that. I very much enjoyed the functional interface of TensorFlow, but it had several bugs that caused me trouble, so I switched to PyTorch. For example, adding extra loss terms to the optimization objective via add_loss() didn't always work because of (I am told) an incompatibility between TensorFlow tensors and Keras tensors. I also find that a lot of Google's own code still uses TensorFlow 1.x and has no 2.x support.

After several months of heavy development in PyTorch, I have noticed a few things that I find a little bothersome, and I wonder whether I am simply using the library wrong. Hoping you can help me out. Here are my questions:

  1. I have quite a bit of boilerplate shared between PyTorch projects. For example, if I am training a U-Net for segmentation or fine-tuning a pre-trained DenseNet for classification, there is a lot of code common between them. This has tripped me up a few times, because if I find a bug in the common code, I have to change it everywhere. Most of that common code is handled under the hood in TensorFlow, though. These are not naive things like training (I don't use model.fit; I write my own training loop). My largest areas of common code are the initialization of conv filters and the handling of convolutional padding. My signal processing background really likes the TensorFlow padding='same' and kernel_initializer='blah' paradigm. It seems I need custom functions to do this in PyTorch, and I would imagine (unless I am way off, in which case I would like some guidance) that others have a lot of this same code as well (see the initialization sketch after this list).

  2. In PyTorch, I find myself thinking about the network as the instantiation (and initialization) of a bunch of computations that I then connect together with a forward() method. In TensorFlow, I think about the math I want to do and write the function, which seems a little simpler in my mind. That feels much easier to maintain and version control versus my PyTorch habit of making a bunch of self._conv1, self._conv2, … self._convN objects and then having to string them all together (see the sketch of this pattern after the list). Is there a similar way to create the "graph" in PyTorch that I don't know about?

  3. What is the preferred way to analyze the training procedure in the absence of TensorBoard? I realize PyTorch has a plugin to use TensorBoard, but newer versions of TensorBoard are quite broken; for example, many of the data points are not displayed when you have more than 1,000 epochs. I am also not a huge fan of the exponential moving average smoothing and would prefer a simple moving average filter instead (a sketch of one is included after this list). It seems like the torch folks could develop something much better than TensorBoard.
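
For question 1, here is a minimal sketch of one common way to approximate Keras's kernel_initializer argument in PyTorch: map an initializer name onto the torch.nn.init functions and spread it over a model with Module.apply(). The function name init_conv_weights and the scheme strings are only illustrative, not a standard API:

```python
import torch.nn as nn

def init_conv_weights(module, scheme="he_normal"):
    # Hypothetical helper: translate a Keras-style initializer name
    # into the corresponding torch.nn.init call for conv layers.
    if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Conv3d)):
        if scheme == "he_normal":
            nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        elif scheme == "glorot_uniform":
            nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage: model.apply(lambda m: init_conv_weights(m, scheme="he_normal"))
```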
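
For question 2, the bookkeeping of many self._convN attributes can often be reduced by declaring the layers once in an nn.Sequential container, so forward() stays short. A minimal sketch (the class name and layer sizes are arbitrary):

```python
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=10):
        super().__init__()
        # Declare the "graph" once as a container instead of
        # individual self._convN attributes strung together later.
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```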
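
For question 3, the simple-moving-average smoothing mentioned above is easy to compute yourself while logging losses; a minimal sketch using a fixed-length window (the class and parameter names are just illustrative):

```python
from collections import deque

class SimpleMovingAverage:
    """Mean of the last `window` values, with no exponential decay."""
    def __init__(self, window=100):
        self.values = deque(maxlen=window)

    def update(self, value):
        self.values.append(float(value))
        return sum(self.values) / len(self.values)

# usage inside a training loop:
# sma = SimpleMovingAverage(window=50)
# smoothed_loss = sma.update(loss.item())
```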

Overall, I find PyTorch to be a little faster, much easier to install on Windows, and a lot less buggy than TensorFlow. Keep up the great work!

  1. I think a lot of boilerplate code could be reduced by using a higher-level API on top of PyTorch such as Ignite, Catalyst, or PyTorch Lightning. These libraries provide a different level of abstraction and might still fit your use case; if I'm not mistaken, you still have full control over your training loop etc. (a minimal sketch follows below this list).
    For the "same" padding implementation: since the padding values depend on the input shape, you could use a helper function which calculates the padding and uses the functional API (a sketch also follows below). @rwightman implemented an approach here for his MedianPool2d layer.

  2. Could you explain what the benefits in version control would be, and what causes you the most trouble with the dynamic approach in PyTorch?

  3. If you are seeing issues in TensorBoard, feel free to create an issue on GitHub so that we can track and fix it. Personally, I use visdom, which might not have the full functionality of TensorBoard but has worked better for me. :wink:
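
As an illustration of the higher-level APIs mentioned in point 1, here is a minimal sketch of the general structure of a PyTorch Lightning module, assuming a recent pytorch_lightning release; the class name, wrapped model, loss, and learning rate are placeholders:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.criterion = nn.CrossEntropyLoss()

    def training_step(self, batch, batch_idx):
        # Lightning calls this for every batch; the loop itself,
        # device placement, etc. are handled for you.
        x, y = batch
        return self.criterion(self.model(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# usage: pl.Trainer(max_epochs=10).fit(LitClassifier(my_model), train_dataloader)
```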
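
And here is a minimal sketch of the "same"-padding helper described in point 1: compute the padding from the input shape, pad, and then call the functional conv. The name conv2d_same is just illustrative; it follows the same idea as the linked MedianPool2d approach rather than reproducing that exact code:

```python
import math
import torch.nn.functional as F

def conv2d_same(x, weight, bias=None, stride=(1, 1), dilation=(1, 1)):
    """TensorFlow-style 'same' conv: pad based on the input size (NCHW)."""
    ih, iw = x.shape[-2:]
    kh, kw = weight.shape[-2:]
    pad_h = max((math.ceil(ih / stride[0]) - 1) * stride[0]
                + (kh - 1) * dilation[0] + 1 - ih, 0)
    pad_w = max((math.ceil(iw / stride[1]) - 1) * stride[1]
                + (kw - 1) * dilation[1] + 1 - iw, 0)
    # F.pad takes (left, right, top, bottom) for a 4D input
    x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2,
                  pad_h // 2, pad_h - pad_h // 2])
    return F.conv2d(x, weight, bias, stride, 0, dilation)
```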
