Write a Dataset class with two outputs

Hi,

I have been following the data loading tutorial here, as it covers a problem very similar to mine.
The difference is that instead of facial landmarks I have human pose information, in the same format. See the image below for the data structure.

I wish to load my data so that the images and landmarks are two separate entities that can be fed into the network at different points.
The reason is that I wish to feed in the pose information at the final stage, like in Concatenate dataset.

How can I separate this dataset into images and pose information?
Also, using this .csv method, how would I extract the classes that I wish to classify into? Other classes replace ‘Clear_path’ in the image file location.

I am not restricted to using a .csv like in the example if a better method can be suggested.

Thanks in advance for any assistance
Cheers
spoonerj

Would you like to create a Dataset which yields the image, the corresponding class, and the pose information as the target?
If so, how are the classes defined? Would you like to get the class based on the image path?

I’m not sure I understand the relationship to the other thread clearly.
Would you like to use separate models/layers for the class and pose information?

Sorry, I should have tried to make it clearer.
The aim of the dataset is to classify images into a class. The classes come from the file path: in ‘clear_path/my_image.jpg’, clear_path is the class. For another image it may be ‘looking/my_image2.jpg’, and the class would be looking.

I don’t want to use the pose information in the other columns as targets, but as supplementary information to aid the classification.

So I want to split the dataset into images and pose information. The images would first pass through the network on their own until they reach the fc layers. At the first fc layer, I then wish to pass the network the pose information for that image. I am using ImageNet models such as AlexNet and VGG.

I have managed to load the images using ImageFolder previously; however, I am not sure how I can then load the pose information and ensure that each image and its pose information stay matched as they go through the dataset.

The best way would probably be to write a custom Dataset and process the CSV file in it (a rough sketch following these steps is shown after the list):

  • load the csv in __init__ using e.g. pandas
  • split the DataFrame into the image paths and pose information
  • store the pose information as a tensor
  • create a dict for a folder to class index mapping
  • store all corresponding targets for each image path
  • in __getitem__: load the image using the path and return the image tensor, class, and pose information
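A minimal sketch along those lines (the class name, CSV layout, and column order below are assumptions for illustration, not necessarily your exact file format):

    import os

    import pandas as pd
    import torch
    from PIL import Image
    from torch.utils.data import Dataset


    class PoseCSVDataset(Dataset):
        """Assumed layout: column 0 holds relative image paths such as
        'clear_path/img001.jpg', the remaining columns hold pose keypoints."""
        def __init__(self, csv_file, root_dir, transform=None):
            df = pd.read_csv(csv_file, header=None)
            self.image_paths = df.iloc[:, 0].tolist()
            # store all pose information once as a float tensor
            self.pose = torch.tensor(df.iloc[:, 1:].values, dtype=torch.float32)
            # folder name -> class index mapping, derived from the paths
            class_names = sorted({os.path.dirname(p) for p in self.image_paths})
            self.class_to_idx = {name: i for i, name in enumerate(class_names)}
            # target class index for every image path
            self.targets = [self.class_to_idx[os.path.dirname(p)]
                            for p in self.image_paths]
            self.root_dir = root_dir
            self.transform = transform

        def __len__(self):
            return len(self.image_paths)

        def __getitem__(self, idx):
            path = os.path.join(self.root_dir, self.image_paths[idx])
            image = Image.open(path).convert('RGB')
            if self.transform is not None:
                image = self.transform(image)
            return image, self.targets[idx], self.pose[idx]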

I have followed your advice and created a dataset class which gives me the dataset in the form I think I need. See below:

    import os

    import pandas as pd
    from skimage import io
    from torch.utils.data import Dataset


    class KeypointDataset(Dataset):
        def __init__(self, csv_file, root_dir,
                     transform1=transform, transform2=transform):  # `transform` is defined elsewhere in the script
            
            self.pose_frame = pd.read_csv(csv_file, header=None)
            self.images = self.pose_frame.iloc[:,0]
            self.classes = self.pose_frame.iloc[:,1]
            self.pose_kp = self.pose_frame.iloc[:,2:]
            self.root_dir = root_dir
            self.transform1 = transform1
            self.transform2 = transform2            
            
        def __len__(self):
            return len(self.pose_frame)
        
        def __getitem__(self, idx):
            #load image
            img_name = os.path.join(self.root_dir, self.images[idx])
            img_name = img_name.replace('\\','/')
            image = io.imread(img_name)
            print(image.shape)
            #apply transform on image
            img_as_tensor = self.transform1(image)
            
            #image class
            image_class = self.classes[idx]
            
            #pose keypoints (as_matrix() has been removed from pandas; use to_numpy())
            keypoints = self.pose_kp.iloc[idx, :].to_numpy()
            keypoints = keypoints.astype('float').reshape(-1, 2)
            #keypoints to tensor
            keypoints_as_tensor = self.transform2(keypoints)
            
            return (img_as_tensor, image_class), (keypoints_as_tensor, img_name) 

My question now is how to break up the inputs for the images and keypoints in the training loop. In the example you showed in Concatenate dataset, I understand the concept of adding the second input at the fc layer with x1 and x2. What I don’t know is how to tell my training loop to load these as two inputs, and how to identify x1 and x2 as the images and keypoints respectively.
My training loop currently looks like this:

def train_model(model, criterion, optimizer, num_epochs):
        
        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0
        
        since = time.time()
        
        history = []

        for epoch in range(num_epochs):
    
            running_loss = 0.0
            total_train = 0
            correct_train = 0
            
            #iterate over data
            for i, data in enumerate(train_loader, 0):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                    
                optimizer.zero_grad()
                
                outputs = model(inputs)
                _, predicted = torch.max(outputs.data, 1)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()
            
                #accuracy            
                running_loss += loss.item()
                total_train += labels.nelement() #number of labels in the batch
                correct_train += (predicted == labels).sum().item()
                
            epoch_loss = running_loss / len(train_loader.dataset)
            epoch_acc = correct_train / total_train
            
            print('Epoch {} Training Loss: {:.4f}  Training Accuracy {:.4f}'.format(
                    epoch, epoch_loss, epoch_acc))
            
            history.append([epoch_loss, epoch_acc])
            
            if epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
        
        time_elapsed = time.time() - since
        print('Training complete in {:.0f}m {:.0f}s'.format(
                time_elapsed // 60, time_elapsed % 60))
        print('Best val Acc: {:4f}'.format(best_acc))
        
        #format history
        history = pd.DataFrame(
                history,
                columns=['train_loss','train_acc'])
        
        torch.save(model.state_dict(), save_file_name)
        #load best model weights
        model.load_state_dict(best_model_wts)
        return model, history

Many thanks for your help so far :grinning:

Your Dataset currently returns two tuples, so basically four values.
If you write the return statement as

return img_as_tensor, image_class, keypoints_as_tensor, img_name

you could simply get these values for each batch in your training loop:

for i, (img_data, target, kp_data, img_name) in enumerate(train_loader, 0):
    ...

and pass the tensors to the corresponding layers.
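As a rough sketch (assuming your model’s forward takes the image and pose tensors, as in the concatenation example, and that the class is already an integer index), the inner loop could then look like:

    # Sketch only: the model is assumed to accept (image, pose) in forward(),
    # and target is assumed to already be an integer class index tensor.
    for i, (img_data, target, kp_data, img_name) in enumerate(train_loader, 0):
        img_data = img_data.to(device)
        kp_data = kp_data.float().to(device)
        target = target.to(device)

        optimizer.zero_grad()
        outputs = model(img_data, kp_data)  # both tensors go into the forward pass
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()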

I have managed to start training an AlexNet model on the concatenated data; however, it keeps throwing an error after it has seen 320 images. The total dataset size is 6,444. I think it might be something to do with how I’ve manipulated the data to be able to concatenate it. I have flattened both inputs so they share the same dimensions, and I have a batch size of 16. My model looks like this:

class MyModel(nn.Module):
        def __init__(self):
            super(MyModel, self).__init__()
            self.cnn = models.alexnet(pretrained=False)
            print(self.cnn)
            self.cnn.classifier[6] = nn.Linear(self.cnn.classifier[6].in_features, 20)
            
            self.fc1 = nn.Linear(20 + 34, 60)
            self.fc2 = nn.Linear(60, 9)
        
        def forward(self, image, pose):
            x1 = self.cnn(image)
            x1 = x1.view(batch_size,-1) #torch.Size([16,20])
            x2 = pose.view(batch_size,-1) #torch.Size([16,34])
            
            
            x = torch.cat((x1,x2), dim=1)
            x = F.relu(self.fc1(x))
            x = self.fc2(x)
            return x

but it returns this error after 20 batches in the first epoch:

RuntimeError: shape '[16, -1]' is invalid for input of size 136

Any idea on what could be causing this error?

I’m not sure how many samples your Dataset has, but probably the last batch is smaller than the specified batch_size.
I would recommend making the batch dimension flexible and instead specifying the feature dimensions:

x1 = x1.view(-1, 20)
x2 = pose.view(-1, 34)

This makes sure your code also works with varying batch sizes.
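Applied to the forward of MyModel above, that could look roughly like this (the feature sizes 20 and 34 are taken from your layer definitions):

    def forward(self, image, pose):
        x1 = self.cnn(image).view(-1, 20)  # [N, 20] for any batch size N
        x2 = pose.view(-1, 34)             # [N, 34]
        x = torch.cat((x1, x2), dim=1)     # [N, 54]
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Alternatively, passing drop_last=True to the DataLoader would simply skip the smaller final batch, but keeping the batch dimension flexible is usually the cleaner fix.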

This worked! Thank you very much for your help :slight_smile: