Using your own data for PyTorch

Max_Power · June 25, 2018, 4:15am

Say I have my image data in the format:
x_train, y_train, x_test, y_test

How can I turn that into something PyTorch can use? I tried

train = TensorDataset(x_train, y_train) and then wrapping it in a DataLoader, but that gives me a TypeError.

ptrblck · June 25, 2018, 4:33am

How did you define x_train and y_train?
Using torch tensors should work:

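import torch
from torch.utils.data import TensorDataset

x_train = torch.randn(10, 3, 24, 24)  # 10 samples, 3 channels, 24x24 pixels
y_train = torch.empty(10, dtype=torch.long).random_(0, 10)  # class labels in [0, 10)

dataset = TensorDataset(x_train, y_train)
x, y = dataset[0]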
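Since the original question also mentions a DataLoader, here is a minimal sketch of wrapping the dataset in one and iterating over it. The batch size of 4 and shuffle=True are illustrative choices, not values from the thread; with 10 samples the last batch only holds 2, because drop_last defaults to False.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-in data: 10 RGB images of 24x24 and 10 integer labels in [0, 10)
x_train = torch.randn(10, 3, 24, 24)
y_train = torch.randint(0, 10, (10,), dtype=torch.long)

dataset = TensorDataset(x_train, y_train)

# DataLoader groups samples into batches and shuffles them every epoch
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = []
for x_batch, y_batch in loader:
    # each batch is a (images, labels) pair stacked along a new first dimension
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))

print(batch_shapes)
```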

Max_Power · June 25, 2018, 4:37am

The data is in a numpy.ndarray. Do I have to convert it into a torch tensor?

ptrblck · June 25, 2018, 4:39am

Yes! Just convert it with torch.from_numpy and it should work.
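A minimal sketch of that conversion, using random NumPy arrays as hypothetical stand-ins for x_train and y_train. torch.from_numpy shares memory with the array and keeps its dtype, so NumPy's default float64 data would clash with nn layers, whose weights default to float32; casting with .float() and .long() avoids that and gives CrossEntropyLoss the int64 labels it expects.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Hypothetical NumPy data standing in for x_train / y_train
x_np = np.random.rand(8, 3, 32, 32)        # float64 by default
y_np = np.random.randint(0, 10, size=8)    # integer class labels

# from_numpy shares memory with the array; .float() copies into float32
x_train = torch.from_numpy(x_np).float()
# .long() makes sure the labels are int64, as class-index targets must be
y_train = torch.from_numpy(y_np).long()

dataset = TensorDataset(x_train, y_train)
x, y = dataset[0]
print(x.dtype, y.dtype, x.shape)
```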

Max_Power · June 25, 2018, 4:40pm

If I have data of shape [50000, 32, 32, 3], how do I convert it to [50000, 3, 32, 32] for torch tensors?

ptrblck · June 25, 2018, 4:41pm

This should work: tensor.permute(0, 3, 1, 2).
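A small sketch of what that permute does, using 100 random images instead of 50000 to keep it light (the sizes are illustrative). permute only reorders the dimensions (N, H, W, C) to (N, C, H, W) without moving pixel values between images or channels; it returns a non-contiguous view, and calling .contiguous() copies it into the new memory layout, which some operations such as .view require.

```python
import torch

# Channels-last stand-in: [N, H, W, C]
images_nhwc = torch.randn(100, 32, 32, 3)

# Reorder dimensions: (N, H, W, C) -> (N, C, H, W)
images_nchw = images_nhwc.permute(0, 3, 1, 2)
print(images_nchw.shape)  # torch.Size([100, 3, 32, 32])

# The permuted tensor is a view with different strides; make it contiguous
images_nchw = images_nchw.contiguous()
print(images_nchw.is_contiguous())  # True
```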

Max_Power · June 25, 2018, 6:07pm

For a multi-class classification problem, what shape should the labels have, and which loss function should I use? Currently the labels have shape [50000, 1] with 10 classes (I'm wondering whether they should be [50000, 10] instead). I'm using nn.CrossEntropyLoss() and get the error “multi-target not supported”.

Thanks

ptrblck · June 25, 2018, 6:16pm

Your targets should have the shape [batch_size] and contain the class indices (0 to 9 in your case), so just squeeze your tensor and it should work.

target = target.squeeze(1)
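To illustrate the shapes involved, here is a minimal sketch with a hypothetical batch of 8 samples and random logits standing in for a model's output. It squeezes [N, 1] labels down to [N]; it also shows that if labels were one-hot encoded as [N, 10], argmax over the class dimension recovers the class indices CrossEntropyLoss expects.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 10
batch_size = 8

# Labels stored as a column vector, shape [batch_size, 1]
target = torch.randint(0, num_classes, (batch_size, 1))

# squeeze(1) drops the size-1 dimension: [batch_size, 1] -> [batch_size]
target = target.squeeze(1)

# One-hot labels of shape [batch_size, num_classes] can be turned back
# into class indices with argmax over the class dimension
one_hot = F.one_hot(target, num_classes)
recovered = one_hot.argmax(dim=1)

logits = torch.randn(batch_size, num_classes)  # stand-in for model output
loss = nn.CrossEntropyLoss()(logits, target)
print(target.shape, loss.item())
```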

nn.CrossEntropyLoss is fine. Note that it expects raw logits: it applies log_softmax internally, so your model shouldn't have a non-linearity such as softmax as its last layer.
You can find more information on the criterion in the docs.
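A minimal sketch of what “provide the logits” means in practice. The small model below is hypothetical, built only for illustration; the point is that its last layer is a plain nn.Linear with no softmax. The final check confirms that CrossEntropyLoss gives the same value as log_softmax followed by the negative log-likelihood loss, which is why an extra softmax in the model would be applied twice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical classifier for 3x32x32 images and 10 classes;
# no softmax after the last Linear layer, so it outputs raw logits
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(4, 3, 32, 32)
target = torch.randint(0, 10, (4,))

logits = model(x)
loss = nn.CrossEntropyLoss()(logits, target)

# CrossEntropyLoss combines log_softmax and the negative log-likelihood loss
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(loss, manual))  # True
```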