Hello team,
Great work on PyTorch, keep up the momentum. I wanted to try my hand at it with the launch of the new multi-label Amazon forest satellite image competition on Kaggle.
Note: new users can only post 2 links in a post, so I can't direct-link everything.
I created the following code as an example this weekend to load and train a model on the Kaggle data, and I wanted to give you my feedback on PyTorch. I hope it helps you.
- Loading data that is not from a standard dataset like MNIST or CIFAR is confusing and hard. I'm aware of the `ImageFolder` dataset, but that forces an inflexible folder hierarchy and just plain doesn't work for multilabel or multiclass tasks. First of all, it's confusing because of the `Dataset` and `DataLoader` distinction, which is non-obvious. I do think there is merit in keeping them separate, but the documentation on the role of each must be improved. If possible I would rename `Dataset` to `DataStorage` or `DataLocation`, to make it obvious that we have a pointer to a storage plus an iterator over that storage. Secondly, it's hard because none of the examples show a real-world dataset: a CSV with a list of image paths and corresponding labels.
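For illustration, here is a minimal sketch of the kind of CSV-backed multilabel dataset I had in mind. The name `CSVImageDataset` and the `image_name`/`tags` column layout (space-separated labels, as in the Amazon competition CSV) are my own assumptions, not a PyTorch API:

```python
import csv
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class CSVImageDataset(Dataset):
    """Hypothetical Dataset over a CSV of image names and multilabel tags."""

    def __init__(self, csv_path, img_dir, label_names, transform=None):
        self.img_dir = img_dir
        self.label_names = label_names
        self.transform = transform
        with open(csv_path) as f:
            self.rows = list(csv.DictReader(f))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        img = Image.open(os.path.join(self.img_dir, row['image_name'] + '.jpg'))
        if self.transform is not None:
            img = self.transform(img)
        # Multilabel target: a multi-hot vector over all known labels
        tags = set(row['tags'].split())
        target = torch.FloatTensor(
            [1.0 if name in tags else 0.0 for name in self.label_names])
        return img, target
```

With something like this, the `DataLoader` side stays unchanged; only the storage side needs to know about the CSV.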
- There are no validation-split facilities. An example of how to use `SubsetRandomSampler` to create something similar to scikit-learn's `train_test_split` would be great (see https://github.com/pytorch/pytorch/issues/1106). It should accept at least a percentage and a random seed, or at best a scikit-learn fold object (`KFold`, `StratifiedKFold`). This is critical for use in Kaggle and other ML competitions. (I will implement a naive one for the competition.)
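The naive version I have in mind would look roughly like this. The helper `train_valid_loaders` and its percentage/seed interface are my own suggestion, not an existing PyTorch function; only `SubsetRandomSampler` and `DataLoader` are real APIs:

```python
import numpy as np
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler


def train_valid_loaders(dataset, valid_fraction=0.2, seed=42, batch_size=32):
    """Split one dataset into a train and a validation DataLoader.

    Hypothetical helper: shuffles indices with a fixed seed, then hands
    disjoint index subsets to two SubsetRandomSamplers.
    """
    n = len(dataset)
    indices = np.arange(n)
    rng = np.random.RandomState(seed)
    rng.shuffle(indices)
    split = int(n * valid_fraction)
    valid_idx, train_idx = indices[:split], indices[split:]
    train_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader
```

Accepting a scikit-learn fold object instead would just mean replacing the shuffle-and-slice step with `fold.split(indices)`.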
- There is no documentation about data augmentation. The following post mentions it. However, as far as I understood the documentation, if you have a 40,000-image training dataset, even if you use PIL transforms you still get 40,000 training samples. Data augmentation would mean getting +40,000 training samples per transformation applied. As a side note, I believe data augmentation should be done at the `DataLoader` level, as mentioned in Discussion about datasets and dataloaders.
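To make the 40,000 vs. +40,000 distinction concrete: a random transform re-draws a variant of each image every epoch but leaves the dataset length unchanged, whereas concatenating a transformed copy actually grows it. A sketch of the latter, where the `TransformedDataset` wrapper is mine (only `Dataset` and `ConcatDataset` are real PyTorch classes):

```python
from torch.utils.data import ConcatDataset, Dataset


class TransformedDataset(Dataset):
    """Hypothetical wrapper applying an extra transform to every sample."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]
        return self.transform(img), target


# e.g. a 40,000-sample base plus one flipped copy gives 80,000 samples:
# augmented = ConcatDataset([base, TransformedDataset(base, hflip)])
```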
- Computing the shape after a `view` is non-trivial, e.g. the 2304 in the following code for a 32x32x3 image:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        # 32x32 -> conv 3x3 -> 30x30 -> pool /2 -> 15x15
        #       -> conv 3x3 -> 13x13 -> pool /2 -> 6x6, with 64 channels
        self.fc1 = nn.Linear(2304, 256)  # 64 * 6 * 6 = 2304
        self.fc2 = nn.Linear(256, 17)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)  # Flatten layer: (batch, 2304)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.sigmoid(x)
```
The last two points would probably fit best in a PyTorch wrapper, but I will still mention them:
- Early stopping would be nice, to combat overfitting when the loss doesn't decrease for a certain number of epochs.
- Pretty printing of epochs and accuracy/loss.
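The early-stopping idea above can be sketched as a simple patience counter. The `EarlyStopping` class and its interface are hypothetical, just to show the behaviour I mean:

```python
class EarlyStopping(object):
    """Hypothetical helper: stop after `patience` epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float('inf')
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop this would pair naturally with the pretty-printing point, e.g. `print('epoch %d  val loss %.4f' % (epoch, val_loss))` before calling `step`.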