Great work on PyTorch, keep the momentum. I wanted to try my hand at it with the launch of the new multi-label Amazon forest satellite image competition on Kaggle.
Note: new users can only post 2 links in a post so I can’t direct link everything
I created the following code as an example this weekend to load and train a model on Kaggle data and wanted to give you my feedback on PyTorch. I hope it helps you.
Loading data that is not from a standard dataset like MNIST or CIFAR is confusing and hard. I’m aware of the `ImageFolder` dataset, but that forces an inflexible hierarchy and just plain doesn’t work for multilabel or multiclass tasks.
First of all, it’s confusing because the `Dataset` / `DataLoader` distinction is non-obvious. I do think there is merit in keeping them separate, but the documentation on the role of each must be improved. If possible I would rename `Dataset` to `DataLocation` to make it obvious that we have a pointer to a storage and an iterator over that storage.
Secondly, it’s hard because none of the examples show a real-world dataset: a CSV with a list of image paths and corresponding labels.
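To make that concrete, here is a minimal sketch of what such a custom `Dataset` could look like for the Kaggle labels file. The class name, CSV column names (`image_name`, `tags`) and the `loader` parameter are my own assumptions, not an existing PyTorch API:

```python
import csv
import torch
from torch.utils.data import Dataset

def pil_loader(path):
    # Lazy import so the sketch can be read without Pillow installed
    from PIL import Image
    return Image.open(path).convert("RGB")

class KaggleAmazonDataset(Dataset):
    """Reads `image_name,tags` rows from a CSV (column names are an
    assumption) and yields (image, multi-hot target) pairs."""

    def __init__(self, csv_path, img_dir, img_ext, classes,
                 transform=None, loader=pil_loader):
        self.img_dir, self.img_ext = img_dir, img_ext
        self.transform, self.loader = transform, loader
        self.class_to_idx = {c: i for i, c in enumerate(classes)}
        self.samples = []
        with open(csv_path) as f:
            for row in csv.DictReader(f):
                # Space-separated tags -> multi-hot target vector
                target = torch.zeros(len(classes))
                for tag in row["tags"].split():
                    target[self.class_to_idx[tag]] = 1.0
                self.samples.append((row["image_name"], target))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, target = self.samples[idx]
        img = self.loader(f"{self.img_dir}/{name}{self.img_ext}")
        if self.transform is not None:
            img = self.transform(img)
        return img, target
```

A `DataLoader(dataset, batch_size=64, shuffle=True)` then iterates over it as usual; having one such example in the docs would have saved me a lot of time.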
There are no validation split facilities. An example of how to use `SubsetRandomSampler` to create something similar to `train_test_split` would be great (see https://github.com/pytorch/pytorch/issues/1106). It should accept a percentage and a random seed at least, or a scikit-learn fold object (`StratifiedKFold`) at best. This is critical for Kaggle and other ML competitions. (I will implement a naive one for the competition.)
There is no documentation about data augmentation. The following post mentions it. However, as far as I understood the documentation, if you have a training dataset of 40,000 images, even if you use `PIL` transforms you still get 40,000 training samples. Data augmentation should mean getting +40,000 training samples per transformation applied.
As a side-note, I believe data augmentation should be done at the `DataLoader` level, as mentioned in Discussion about datasets and dataloaders.
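For the "+40,000 samples per transformation" behaviour, here is a sketch of how I would fake it today with `ConcatDataset` (the `TransformedDataset` wrapper and `augment` helper are hypothetical names of mine, not PyTorch API):

```python
from torch.utils.data import ConcatDataset, Dataset

class TransformedDataset(Dataset):
    """Wraps a dataset and applies `transform` to every sample, so an
    augmented copy can be concatenated with the original."""
    def __init__(self, base, transform):
        self.base = base
        self.transform = transform

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        return self.transform(x), y

def augment(base, transforms):
    """One extra full copy of the data per transform:
    final length is len(base) * (1 + len(transforms))."""
    return ConcatDataset([base] + [TransformedDataset(base, t) for t in transforms])
```

With two transforms a 40,000-image dataset becomes 120,000 samples, which is the semantics I would expect from "data augmentation" out of the box.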
Computing the shape after a `view` is non-trivial, e.g. the 2304 in the following code for a 32x32x3 image:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(2304, 256)
        self.fc2 = nn.Linear(256, 17)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(x.size(0), -1)  # Flatten layer
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.sigmoid(x)
```
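Rather than hand-computing the 2304 (conv 3x3 and 2x2 max-pool twice: 32 → 30 → 15 → 13 → 6, times 64 channels), a workaround sketch is to push a dummy batch through the conv layers and read the size off; the `flat_size` helper below is my own, not a PyTorch function:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(3, 32, kernel_size=3)
conv2 = nn.Conv2d(32, 64, kernel_size=3)

def conv_stack(x):
    # Same conv/pool sequence as Net.forward (dropout omitted,
    # it doesn't change the shape)
    x = F.max_pool2d(conv1(x), 2)
    return F.max_pool2d(conv2(x), 2)

def flat_size(forward_fn, in_shape):
    """Run a dummy batch of shape (1, *in_shape) through the conv
    layers and return the flattened feature count."""
    with torch.no_grad():
        return forward_fn(torch.zeros(1, *in_shape)).view(1, -1).size(1)
```

`flat_size(conv_stack, (3, 32, 32))` gives the 2304 to plug into `nn.Linear`; a built-in shape-inference facility would still be nicer.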
Points 5 and 6 would probably fit best in a wrapper library around PyTorch, but I will still mention them.
Early stopping would be nice to combat overfitting: stop training when the loss hasn’t decreased for a certain number of epochs.
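The logic I have in mind is tiny, which is why it would be nice to have it out of the box; a naive sketch (class name and API are my own):

```python
class EarlyStopping:
    """Signal a stop when the monitored loss has not improved by more
    than `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Call once per epoch; returns True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop this is just `if stopper.step(valid_loss): break`.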
Pretty printing of epochs and accuracy/loss would also be welcome.
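By pretty printing I mean a one-liner per epoch, something in this spirit (the format is purely my own suggestion):

```python
def format_epoch(epoch, n_epochs, train_loss, valid_loss, accuracy):
    """One-line progress summary per epoch."""
    return (f"Epoch {epoch:3d}/{n_epochs} | "
            f"train loss {train_loss:.4f} | "
            f"valid loss {valid_loss:.4f} | "
            f"acc {accuracy:6.2%}")
```

So each epoch prints e.g. `Epoch   3/100 | train loss 0.1234 | valid loss 0.2000 | acc 90.00%` instead of leaving every user to reinvent the logging.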