Using your own dataset on PyTorch

I have a dataset in numpy format (actually, I just modified the CIFAR-10/MNIST datasets provided with PyTorch). Its dimensions are consistent with what a normal CNN expects. For example, if I write:

print(our_dataset.shape, our_labels.shape)

I get:

(10000, 3, 32, 32) (10000,)

which is fine. Now I cast the data into torch format using:

train_data = torch.from_numpy(our_dataset)
our_labels = torch.from_numpy(our_labels)

encapsulate it into a TensorDataset:

train = torch.utils.data.TensorDataset(train_data, our_labels)

and finally into a DataLoader:

trainloader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

All fine here. Now I build the neural network, and when I run the training I get this error:

TypeError: DoubleSpatialConvolutionMM_updateOutput received an invalid combination of arguments - got (int, torch.DoubleTensor, torch.DoubleTensor, torch.FloatTensor, torch.FloatTensor, torch.DoubleTensor, torch.DoubleTensor, long, long, int, int, int, int), but expected (int state, torch.DoubleTensor input, torch.DoubleTensor output, torch.DoubleTensor weight, [torch.DoubleTensor bias or None], torch.DoubleTensor finput, torch.DoubleTensor fgradInput, int kW, int kH, int dW, int dH, int padW, int padH)

If I am working on cuda, then the error changes to:

_cudnn_convolution_full_forward received an invalid combination of arguments - got (torch.cuda.DoubleTensor, torch.cuda.FloatTensor, torch.cuda.FloatTensor, torch.cuda.DoubleTensor, tuple, tuple, int, bool), but expected (torch.cuda.RealTensor input, torch.cuda.RealTensor weight, torch.cuda.RealTensor bias, torch.cuda.RealTensor output, std::vector<int> pad, std::vector<int> stride, int groups, bool benchmark)

Has anyone seen these errors before? Also, is the right way to use a new dataset to first cast it to torch format, then build a TensorDataset and finally a DataLoader?
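
(Side note: a quick way to see exactly what the loader actually yields, which is what the replies below ask about; this is just a diagnostic sketch using the names from the post above.)

inputs, labels = next(iter(trainloader))
print(inputs.size(), inputs.type())   # a torch.DoubleTensor here would line up with the convolution error above
print(labels.size(), labels.type())   # a 128x1 size here would line up with the multi-target error further down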

Can you paste the output of our_dataset.dtype and our_labels.dtype?
Also, did you send the data and labels to the GPU while training?
For example:
inputs, labels = data
inputs, labels = Variable(inputs).cuda(), Variable(labels).cuda()

our_dataset.dtype should be np.float32
our_labels.dtype should be np.int64
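
(If they aren't already, here is a sketch of forcing those dtypes on the numpy side, before the torch.from_numpy calls in the original post; the array names are the ones used above.)

import numpy as np

our_dataset = our_dataset.astype(np.float32)   # conv layers use float32 weights by default
our_labels = our_labels.astype(np.int64)       # classification losses expect LongTensor (int64) targets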

You need to cast your data to double in that case.

If I type:

print(our_dataset.dtype, our_labels.dtype)

I get:

float32 int64

I did send the data to CUDA (that is the case that gave the error above). However, if I don’t send the net (and data) to CUDA, I get the other error that I posted in my original post.

@edgarriba

Well, the data is already in float.

Right, but the error says it expects a DoubleTensor.

Try np.double for our_dataset
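
(A sketch of what that would look like; note that the model weights then have to be double as well. This assumes the network object is called net.)

train_data = torch.from_numpy(our_dataset.astype(np.double))   # gives a torch.DoubleTensor
net = net.double()                                             # cast the model weights to match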

If I cast our_dataset to float32 and the labels to int64, then the error changes to:

multi-target not supported at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.7_1485444530918/work/torch/lib/THNN/generic/ClassNLLCriterion.c:20

but the shape of the labels is (10000,).

Hmm, this is becoming weird.

In the training function, where you have written:
inputs, labels = data
inputs, labels = Variable(inputs), Variable(labels)
add this:
labels = labels[:, 0]
The error is because the labels are of size batch_size x 1; you need to convert them to size batch_size.
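
(In context, the training step would look roughly like this; net, criterion and optimizer are assumed to be set up as usual, and labels.view(-1) would work just as well as the indexing.)

from torch.autograd import Variable

for i, data in enumerate(trainloader):
    inputs, labels = data
    labels = labels[:, 0]                      # (batch_size, 1) -> (batch_size,)
    inputs, labels = Variable(inputs), Variable(labels)

    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)          # e.g. criterion = nn.CrossEntropyLoss()
    loss.backward()
    optimizer.step()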

It seems that this works.

Upgrading to version 0.3.0 also resolves the ‘received an invalid combination of arguments’ problem.