Hello,
I have a weird issue and I can't seem to find the typo. My training data set has shape (3719, 100) and my testing data set has shape (6635, 100). When I convert all of the data to tensors, the custom dataset and loader for the test data seem to be loading the train data, judging by the `.size()` outputs. Below is the code I'm working with:
class tensor_data(Dataset):
    def __init__(self, num, cats, labels, images):
        self.num = num_tensor
        self.cats = cats
        self.labels = labels
        self.images = images

    def __len__(self):
        return len(self.num)

    def __getitem__(self, idx):
        return self.num[idx], self.cats[idx], self.labels[idx], self.images[idx]
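For reference, this indexing pattern can be sanity-checked without torch at all. A minimal sketch with plain lists (class and variable names here are illustrative, not from the original code) behaves the same way:

```python
# Minimal stand-in for the Dataset pattern above, using plain lists so it
# runs without torch. Each "column" is a parallel sequence; __getitem__
# returns one aligned row across all of them.
class ListData:
    def __init__(self, num, cats, labels, images):
        # store exactly the arguments that were passed in
        self.num = num
        self.cats = cats
        self.labels = labels
        self.images = images

    def __len__(self):
        return len(self.num)

    def __getitem__(self, idx):
        return self.num[idx], self.cats[idx], self.labels[idx], self.images[idx]

data = ListData([1.0, 2.0], ["a", "b"], [0, 1], ["img0", "img1"])
print(len(data))  # 2
print(data[1])    # (2.0, 'b', 1, 'img1')
```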
Here is how I build the data that goes into the class above:
###########################train tensors#####################################
num_tensor = torch.tensor(np.asarray(train_sample.loc[:, numerical_columns]), dtype=torch.float)
col_tensor = torch.tensor(np.asarray(train_sample.loc[:, non_loca_cat_columns]), dtype=torch.long)
label_tensor = torch.tensor(np.asarray(train_sample.loc[:, 'target']), dtype=torch.long)

image_tensor = []
for i, row in train_sample.iterrows():
    image_tensor.append(train_transform(Image.open(Path(row['location']))))
images = torch.stack(image_tensor)

print("train: ", num_tensor.size(), col_tensor.size(), label_tensor.size(), images.size())
output: train: torch.Size([3719, 6]) torch.Size([3719, 8]) torch.Size([3719]) torch.Size([3719, 3, 224, 224])
train_data = tensor_data(num_tensor, col_tensor, label_tensor, images)
print('train_data:', len(train_data))
output: train_data: 3719
train_loader = DataLoader(train_data, batch_size = 10, shuffle = True)
print('train_loader: ', len(train_loader))
output: train_loader: 372
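The 372 is consistent with the default DataLoader behavior: with `drop_last=False`, `len(loader)` is the ceiling of `len(dataset) / batch_size`. A quick stdlib-only check, using the sizes from the outputs above:

```python
import math

def num_batches(n_samples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields per epoch.

    With the default drop_last=False, a final partial batch is kept,
    so the count is ceil(n_samples / batch_size).
    """
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

print(num_batches(3719, 10))  # 372 -- matches len(train_loader)
print(num_batches(6635, 10))  # 664 -- what len(test_loader) should report
```

So a test set of 6635 rows at batch size 10 should give a loader of length 664, not 372.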
So all of the above makes sense. Here is what happens when I do the same with the test data:
test_num_tensor = torch.tensor(np.asarray(test.loc[:, numerical_columns]), dtype=torch.float)
test_col_tensor = torch.tensor(np.asarray(test.loc[:, non_loca_cat_columns]), dtype=torch.long)
test_label_tensor = torch.tensor(np.asarray(test.loc[:, 'target']), dtype=torch.long)

test_image_tensor = []
for i, row in test.iterrows():
    test_image_tensor.append(test_transform(Image.open(Path(row['location']))))
test_images = torch.stack(test_image_tensor)

print("test: ", test_num_tensor.size(), test_col_tensor.size(), test_label_tensor.size(), test_images.size())
output: test: torch.Size([6635, 6]) torch.Size([6635, 8]) torch.Size([6635]) torch.Size([6635, 3, 224, 224])
test_data = tensor_data(test_num_tensor, test_col_tensor, test_label_tensor, test_images)
print('test_data:', len(test_data))
output: test_data: 3719
test_loader = DataLoader(test_data, batch_size = 10, shuffle = False)
print('test_loader: ', len(test_loader))
output: test_loader: 372
The printouts above show the test tensors have the correct size of (6635, x), but once they go through the custom dataset and then the loader, I get the lengths of the train data instead. Am I missing something simple?
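One quick way to localize this kind of mismatch is to check that the dataset actually stores the objects you passed to its constructor, e.g. `assert test_data.num is test_num_tensor`. A self-contained sketch of that technique (plain lists and hypothetical names stand in for the tensors):

```python
class Holder:
    def __init__(self, num):
        self.num = num  # should keep a reference to the argument itself

    def __len__(self):
        return len(self.num)

test_rows = list(range(6635))  # stands in for test_num_tensor
holder = Holder(test_rows)

# If the constructor is wired correctly, both checks pass; if an attribute
# accidentally points at some other object (for example a module-level
# variable with a similar name), the identity check fails even though the
# class definition "looks" right.
assert holder.num is test_rows
assert len(holder) == 6635
print("constructor stores exactly what it was given")
```

Running the same identity assertion on each attribute of the real `test_data` should show which field is not holding the tensor it was given.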