I have been using pytorch for the last year, after using tensorflow for several years and theano before that. I very much enjoyed the functional interface of tensorflow. However, the interface had several bugs that gave me trouble, so I switched to pytorch. For example, adding extra loss terms to the optimization objective via add_loss() didn't always work because of (I am told) an incompatibility between tensorflow tensors and keras tensors. I also find that much of google's own code uses tensorflow 1.x and has no 2.x support.
After several months of heavy development in pytorch, I have noticed some things that I find a little bothersome, and I wonder if I am simply using the library wrong. Hoping you guys can help me out. Here are my questions…
-
I have quite a bit of boilerplate between pytorch projects. For example, if I am training a Unet for segmentation or fine-tuning a pre-trained densenet as a classifier, there is a lot of code common between them. This has tripped me up a few times: if I find a bug in the common code, I have to change it everywhere. Most of that common code is handled under the hood in tensorflow, though. These are not naive things like the training loop (I don't use model.fit; I have my own training loop). My largest areas of common code are the initialization of conv filters and the handling of convolutional padding. My signal processing background really likes the tensorflow padding='same' and kernel_initializer='blah' paradigm. It seems I need custom functions to do this in pytorch, and I would imagine (unless I am way off, in which case I would like some guidance) that others have a lot of this same code too.
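For what it's worth, recent pytorch versions do accept padding="same" on nn.Conv2d (for stride 1), and torch.nn.init covers the usual initializers, so one small factory function can centralize both. A minimal sketch; conv2d_same is a hypothetical helper name, and the choice of kaiming init for relu is just an example default:

```python
import torch
import torch.nn as nn

def conv2d_same(in_ch, out_ch, kernel_size, init="kaiming"):
    """Hypothetical helper bundling tensorflow-style padding='same'
    and kernel initialization into one place."""
    # padding="same" is accepted by nn.Conv2d in recent pytorch (stride 1 only)
    conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding="same")
    if init == "kaiming":
        nn.init.kaiming_normal_(conv.weight, nonlinearity="relu")
    elif init == "xavier":
        nn.init.xavier_uniform_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

layer = conv2d_same(3, 16, 3)
x = torch.randn(1, 3, 32, 32)
print(layer(x).shape)  # spatial size preserved: torch.Size([1, 16, 32, 32])
```

Any bug fix then lives in one function instead of being copied between the Unet and densenet projects.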
-
In pytorch, I find myself thinking about the network as the instantiation (and initialization) of a bunch of computations which I connect together with a forward() method. In tensorflow, I think about the math I want to do and write the function, which seems a little simpler in my mind. That also seems much easier to maintain and version control versus my pytorch method of making a bunch of self._conv1, self._conv2, … self._convN objects and then having to string them all together. Is there a similar way to create the "graph" in pytorch that I don't know about?
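One standard pytorch idiom that avoids the self._conv1 … self._convN pattern is nn.Sequential, which lets you build the stack as data and skip the hand-written forward() entirely. A small sketch (make_block and the channel sizes are made up for illustration):

```python
import torch
import torch.nn as nn

def make_block(channels=(3, 16, 32)):
    """Build a conv stack from a channel spec instead of N named attributes;
    nn.Sequential applies the layers in order, so no forward() is needed."""
    layers = []
    for in_ch, out_ch in zip(channels[:-1], channels[1:]):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

net = make_block()
x = torch.randn(1, 3, 32, 32)
print(net(x).shape)  # torch.Size([1, 32, 32, 32])
```

Because the architecture is now a plain tuple of channel widths, it diffs cleanly under version control.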
-
What is the preferred way to analyze the training procedure in the absence of tensorboard? I do realize pytorch has a plugin to use tensorboard, but newer versions of tensorboard are quite broken. For example, many of the data points are not displayed when you have more than 1,000 epochs. I am also not a huge fan of the exponential moving average smoothing and would prefer a simple moving average filter instead. It seems like the torch folks could develop something much better than tensorboard.
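In the meantime, the simple-moving-average smoothing is easy to do yourself on losses logged to a plain list or file. A minimal sketch with numpy (the synthetic loss curve is just for demonstration):

```python
import numpy as np

def simple_moving_average(values, window=10):
    """Boxcar (simple) moving average, as opposed to tensorboard's
    exponential smoothing; mode='valid' drops the warm-up edges."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Fake per-epoch losses: a decaying curve plus noise
losses = np.exp(-np.linspace(0, 5, 100)) + 0.05 * np.random.randn(100)
smoothed = simple_moving_average(losses, window=10)
print(len(smoothed))  # 100 - 10 + 1 = 91
```

Plotting the smoothed array with matplotlib then shows every point, regardless of how many epochs you log.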
Overall, I find pytorch to be a little bit faster, much easier to install on Windows, and a lot less buggy than tensorflow. Keep up the great work!