Thanks @smth
I tried what you suggested. Inside `class GenericImageDataset(Dataset)`, I read the column `tmp_df[1]` from the CSV file, which holds the multi-class label, and then tried both one-hot encoding and `self.mlb = MultiLabelBinarizer()`. However, in both cases training does not seem to work.
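For context, the label handling in `__init__` is roughly along these lines (a simplified sketch, not the exact code from the notebook; `load_sample` is just a placeholder for the real file-loading logic):

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
from sklearn.preprocessing import MultiLabelBinarizer

class GenericImageDataset(Dataset):
    def __init__(self, csv_path):
        tmp_df = pd.read_csv(csv_path, header=None)
        self.X_train = tmp_df[0]  # file paths
        # Column 1 holds a single label string per row ("down", "yes", ...);
        # MultiLabelBinarizer expects an iterable of labels per sample,
        # so each label is wrapped in its own list before fitting.
        self.mlb = MultiLabelBinarizer()
        self.y_train = self.mlb.fit_transform(tmp_df[1].apply(lambda s: [s]))

    def __len__(self):
        return len(self.X_train.index)

    def __getitem__(self, index):
        x = load_sample(self.X_train[index])  # placeholder for the actual loading code
        y = torch.from_numpy(self.y_train[index]).float()
        return x, y
```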
When using the `MultiLabelBinarizer()`, torch complains that:
ValueError: Target and input must have the same number of elements. target nelement (160) != input nelement (16)
Unless I do this:
self.y_train=self.y_train.reshape((self.y_train.shape[0]*10,1)) # Must be reshaped for PyTorch!
Why is this happening? In any case, training does not seem to converge even after fixing this issue.
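If I read the error correctly, the 160 is just my batch of 16 samples times the 10 binarized label columns, while the input reaching the loss has only one value per sample. A small sketch of my understanding of the shapes (not code from the notebook):

```python
import torch

batch_size, num_classes = 16, 10

# Binarized targets: one row of 10 zeros/ones per sample
target = torch.zeros(batch_size, num_classes)
print(target.numel())  # 160 -> the "target nelement (160)" in the error

# Whatever reaches the loss as the input seems to have one value per sample
output = torch.zeros(batch_size, 1)
print(output.numel())  # 16 -> the "input nelement (16)" in the error
```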
When I use one-hot encoding, I don't even get to the training phase, as torch complains that it cannot read the key “down”, which is one of the labels.
I uploaded the full rendered notebook here:
https://github.com/QuantScientist/Deep-Learning-Boot-Camp/blob/master/Kaggle-PyTorch/tf/PyTorch%20Speech%20Recognition%20Challenge%20Starter.ipynb
To make this clear: what should be the return value of `self.y_train`?
With the `MultiLabelBinarizer` I get:
INFO:__main__:y_train [[ 1. 0. 0. ..., 0. 0. 0.]
[ 1. 0. 0. ..., 0. 0. 0.]
[ 1. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]
[ 0. 0. 0. ..., 0. 0. 1.]]
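In other words, should `self.y_train` stay as this (N, 10) binarized matrix, or should it be a flat vector of integer class indices (which, as far as I understand, is what `nn.CrossEntropyLoss` expects as a target)? A made-up three-sample example of the two options, just to make the question concrete:

```python
import numpy as np

# Option A: (N, num_classes) binarized matrix, as MultiLabelBinarizer returns
y_binarized = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                        [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
                        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])

# Option B: (N,) vector of integer class indices
y_indices = np.array([0, 4, 9])
```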
Any help would be appreciated,