Test Loader seems to be loading train data

Hello,

I have a weird issue and can’t seem to find the typo causing it. I have a training set of shape (3719, 100) and a test set of shape (6635, 100). When I convert all the data to tensors, the custom dataset and loader for the test data seem to be loading the train data, according to the .size() outputs. Below is the code I’m working with:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
from PIL import Image

class tensor_data(Dataset):
    def __init__(self, num, cats, labels, images):
        self.num = num_tensor
        self.cats = cats
        self.labels = labels
        self.images = images
    
    def __len__(self):
        return len(self.num)
    
    def __getitem__(self, idx):
        return self.num[idx], self.cats[idx], self.labels[idx], self.images[idx]

Here is how I derive the data to use the above:

###########################train tensors#####################################
num_tensor = torch.tensor(np.asarray(train_sample.loc[:, numerical_columns]), dtype = torch.float)
col_tensor = torch.tensor(np.asarray(train_sample.loc[:, non_loca_cat_columns]), dtype = torch.long)
label_tensor = torch.tensor(np.asarray(train_sample.loc[:, 'target']), dtype = torch.long)
image_tensor = []
for i, row in train_sample.iterrows():
    image_tensor.append(train_transform(Image.open(Path(row['location']))))
images = torch.stack(image_tensor)
print("train: ", num_tensor.size(), col_tensor.size(), label_tensor.size(), images.size())
output: train:  torch.Size([3719, 6]) torch.Size([3719, 8]) torch.Size([3719]) torch.Size([3719, 3, 224, 224])

train_data = tensor_data(num_tensor, col_tensor, label_tensor, images)
print('train_data:', len(train_data))
output: train_data: 3719


train_loader = DataLoader(train_data, batch_size = 10, shuffle = True)
print('train_loader: ', len(train_loader))
output: train_loader:  372
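
As a sanity check (a quick sketch, assuming the default drop_last=False), the loader length is just the dataset length divided by the batch size, rounded up:

import math
# With drop_last=False, DataLoader emits a final partial batch,
# so len(loader) == ceil(len(dataset) / batch_size).
assert len(train_loader) == math.ceil(len(train_data) / 10)  # ceil(3719 / 10) == 372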

All of the above makes sense. Here is what happens when I do the same with the test data:

###########################test tensors#####################################
test_num_tensor = torch.tensor(np.asarray(test.loc[:, numerical_columns]), dtype = torch.float)
test_col_tensor = torch.tensor(np.asarray(test.loc[:, non_loca_cat_columns]), dtype = torch.long)
test_label_tensor = torch.tensor(np.asarray(test.loc[:, 'target']), dtype = torch.long)
test_image_tensor = []
for i, row in test.iterrows():
    test_image_tensor.append(test_transform(Image.open(Path(row['location']))))
test_images = torch.stack(test_image_tensor)
print("test: ", test_num_tensor.size(),  test_col_tensor.size(),  test_label_tensor.size(),  test_images.size())
output: test:  torch.Size([6635, 6]) torch.Size([6635, 8]) torch.Size([6635]) torch.Size([6635, 3, 224, 224])

test_data = tensor_data(test_num_tensor, test_col_tensor, test_label_tensor, test_images)
print('test_data:', len(test_data))
output: test_data: 3719

test_loader = DataLoader(test_data, batch_size = 10, shuffle = False)
print('test_loader: ', len(test_loader)) 
output: test_loader:  372

The above shows the test tensors at the correct size of (6635, x), but once they go through the custom dataset and then the loader, I get the lengths of the train data and loader instead. Am I missing something simple?

That’s the problem: it should be self.num = num, not self.num = num_tensor.
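
Since __len__ returns len(self.num), and self.num was bound to the module-level num_tensor (the train tensor), both datasets report the train length of 3719 no matter what you pass in. The corrected constructor:

    def __init__(self, num, cats, labels, images):
        self.num = num  # bind the constructor argument, not the global num_tensor
        self.cats = cats
        self.labels = labels
        self.images = images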

Wow. Yep. That was it. Thank you!