Unable to create TensorDataset from NumPy data

I have a NumPy dataset of 54160 images with dimensions 60x80x1 (Height x Width x Channels).

I want to create a DataLoader for this dataset.

The code is:

import numpy as np

train_data = np.load(BASE_DATA_PATH + 'training_data-22-balanced.npy')
train = train_data[:-500]   # all but the last 500 samples for training
test = train_data[-500:]    # last 500 samples for validation

X = np.array([i[0] for i in train]).reshape(-1, 1, input_height, input_width) / 255  # scale pixels to [0, 1]
Y = [i[1] for i in train]

test_X = np.array([i[0] for i in test]).reshape(-1, 1, input_height, input_width) / 255
test_Y = [i[1] for i in test]

I followed the approach from this answer: https://stackoverflow.com/questions/44429199/how-to-load-a-list-of-numpy-arrays-to-pytorch-dataset-loader

import torch
import torch.utils.data as utils

tensor_X = torch.stack([torch.Tensor(i) for i in X])
tensor_y = torch.stack([torch.Tensor(i) for i in Y])

tensor_valid_X = torch.stack([torch.Tensor(i) for i in test_X])
tensor_valid_y = torch.stack([torch.Tensor(i) for i in test_Y])

dataset = utils.TensorDataset(tensor_X, tensor_y.long()) # create your dataset
dataloader = utils.DataLoader(dataset) # create your dataloader

valid_dataset = utils.TensorDataset(tensor_valid_X, tensor_valid_y.long()) # create your validation dataset
valid_dataloader = utils.DataLoader(valid_dataset) # create your validation dataloader

The error I get is:

File "", line 70, in <module>
    dataset = utils.TensorDataset(tensor_X, tensor_y.long()) # create your dataset

File "C:\Users\myidi\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 36, in __init__
    assert all(tensors[0].size(0) == tensor.size(0) for tensor in tensors)

AssertionError

It worked on a smaller dataset of 320 images, but with this larger dataset it gives an error. How do I fix this?

Could you print the shapes of tensor_X and tensor_y before passing them to the TensorDataset?
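For example, something along these lines (a minimal sketch reusing the tensor names from your code; TensorDataset requires all tensors to have the same size in dimension 0):

print(tensor_X.shape)
print(tensor_y.shape)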

tensor_X.shape
Out[8]: torch.Size([54160, 1, 60, 80])

tensor_y.shape
Out[9]: torch.Size([13540, 3])

Thanks for the info.
The batch dimensions are indeed different for your data and target, as the error message suggests.
I think the reshape in this line of code creates extra samples for the data:

X = np.array([i[0] for i in train]).reshape(-1, 1, input_height, input_width) / 255

The difference is a factor of 4 (54160 vs. 13540 samples), so it might be that your input_height or input_width is too small for the stored 60x80 images.
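To illustrate with made-up numbers matching the shapes above (13540 actual samples of 60x80x1, and a hypothetical input_height=30, input_width=40 whose product is 4x too small), the reshape silently turns the extra pixels into additional "samples":

import numpy as np

imgs = np.zeros((13540, 60, 80, 1))   # 13540 samples of 60x80x1, as described above
wrong = imgs.reshape(-1, 1, 30, 40)   # hypothetical too-small height/width
print(wrong.shape)                    # (54160, 1, 30, 40) -> 4x as many "samples"
right = imgs.reshape(-1, 1, 60, 80)   # height/width matching the stored images
print(right.shape)                    # (13540, 1, 60, 80) -> matches the 13540 targets

If input_height and input_width are set to 60 and 80, both tensor_X and tensor_y should end up with 13540 samples, and the TensorDataset call should work.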
