Loading image frames with corresponding heart rate data from a CSV file (Python/PyTorch model implementation)

Hi there,

I have images in a folder and the corresponding heart rate labels (hr_data) in a CSV file. I want to use a DataLoader to load the images with their corresponding target labels. Do I need to implement a custom Dataset class for this? I am new to PyTorch, so I would really appreciate any help. Can anyone share a method to do this?

Thank you.

I would recommend writing a custom Dataset as described here.

The general workflow is:

  • load the data paths or the data directly (if it fits into memory and/or is small) in the __init__ method
  • return the length of the dataset in __len__
  • load, process, and return each data-target pair in __getitem__ using the passed index.

In your case you would most likely load the CSV data in the __init__ method and index it in __getitem__ using the passed index.
Also, the lazy image loading and processing would take place in the __getitem__ method.
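
A minimal sketch of this workflow (the class name, directory layout, and the assumption of one CSV row per image are mine for illustration):

import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class HRDataset(Dataset):
    def __init__(self, image_dir, csv_path, transform=None):
        # store paths and labels only; images are loaded lazily in __getitem__
        self.image_paths = sorted(
            os.path.join(image_dir, f) for f in os.listdir(image_dir))
        self.labels = torch.from_numpy(np.loadtxt(csv_path)).float()
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        # lazy loading and processing of a single data-target pair
        img = Image.open(self.image_paths[index]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        return img, self.labels[index]

# usage: dataset = HRDataset('./images', './hr_data.csv')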

I tried it the way you suggested, but I am not getting the correct results.

Note: I want my image tensor to have the shape [3, T, 128, 128], where T is the number of images, 3 is the number of channels, and 128, 128 are the width and height.

My code:

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize(128),
])

import numpy as np
import torch
import torchvision
from torch.utils.data import Dataset, DataLoader

class UBFCDataset(Dataset):
    def __init__(self):
        xy = np.loadtxt('/content/drive/My Drive/Subject3hr.csv')
        self.x_data = torch.from_numpy(xy)
        self.len = xy.shape[0]

    def __len__(self):
        return self.len

    def __getitem__(self, index):
        train_data_tensor = torchvision.datasets.ImageFolder(
            "/content/drive/My Drive/Meta-rPPG-master/da", transform=transform)
        return train_data_tensor, self.x_data

dataset = UBFCDataset()
train_loader = DataLoader(dataset=dataset, batch_size=32, shuffle=True)

for data in train_loader:
    inputs, labels = data
    inputs, labels = torch.tensor(inputs), torch.tensor(labels)

Error:

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'torchvision.datasets.folder.ImageFolder'>

The ImageFolder creation should be in the __init__ and in __getitem__ you would index the ImageFolder.
Alternatively, you could also load the images manually using a list of image paths in __getitem__.
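
For example, a rough restructuring of your snippet along those lines (reusing your paths and transform, and assuming the CSV holds one label per image; the ImageFolder target is discarded here since the CSV provides the real labels):

class UBFCDataset(Dataset):
    def __init__(self):
        xy = np.loadtxt('/content/drive/My Drive/Subject3hr.csv')
        self.x_data = torch.from_numpy(xy).float()
        # create the ImageFolder once, not on every __getitem__ call
        self.images = torchvision.datasets.ImageFolder(
            "/content/drive/My Drive/Meta-rPPG-master/da", transform=transform)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        frame, _ = self.images[index]   # drop the folder-based target
        return frame, self.x_data[index]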

Thank you @ptrblck

I am now getting proper image tensors of shape [batch, channel, width, height] and my label tensor as well, but when I try to feed them into my model I run into another error.

The ImageFolder will return a data and target sample, which will be stored in frame, while labels contain your “real” target tensor.
I guess you want to remove the target tensor returned by the ImageFolder dataset, so use:

frame = frame[0]

to only keep the returned data tensor and pass it to the model.
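
In the training loop that might look like this (the loop variable names are placeholders):

for data in train_loader:
    frame, labels = data
    frame = frame[0]        # keep only the image tensor from the ImageFolder pair
    output = model(frame)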

PS: it’s better to post code snippets by wrapping them in three backticks ```, which makes debugging easier. :wink:

Thank you so much @ptrblck. Your solution worked. Now my model is not accepting my input tensor shape and is throwing the following error.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-84d3b7e54446> in <module>()
     44         # here: 178 samples, batch_size = 4, n_iters=178/4=44.5 -> 45 iterations
     45         model = PhysNet_padding_Encoder_Decoder_MAX(frames=128)
---> 46         rPPG, x_visual, x_visual3232, x_visual1616 = model(inputs)

1 frames
<ipython-input-23-fe7087e31db6> in forward(self, x)
     85     def forward(self, x):               # x [3, T, 128,128]
     86         x_visual = x
---> 87         [batch,channel,length,width,height] = x.shape
     88 
     89         x = self.ConvBlock1(x)               # x [3, T, 128,128]

ValueError: not enough values to unpack (expected 5, got 4)

x has 4 dimensions, while you are trying to unpack the size into 5 values.
I guess the length dimension is not defined.

Yeah, but I need the length dimension, which should be the total length of the inputs. How can I get the tensor shape right?

Here is the model in which I am trying to feed my inputs:


'''
Only for research purpose, and commercial use is not allowed.
MIT License
Copyright (c) 2019
'''



import math
import torch.nn as nn
from torch.nn.modules.utils import _triple
import torch
import pdb



class PhysNet_padding_Encoder_Decoder_MAX(nn.Module):
    def __init__(self, frames=128):  
        super(PhysNet_padding_Encoder_Decoder_MAX, self).__init__()
        
        self.ConvBlock1 = nn.Sequential(
            nn.Conv3d(3, 16, [1,5,5],stride=1, padding=[0,2,2]),
            nn.BatchNorm3d(16),
            nn.ReLU(inplace=True),
        )

        self.ConvBlock2 = nn.Sequential(
            nn.Conv3d(16, 32, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock3 = nn.Sequential(
            nn.Conv3d(32, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        
        self.ConvBlock4 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock5 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock6 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock7 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock8 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        self.ConvBlock9 = nn.Sequential(
            nn.Conv3d(64, 64, [3, 3, 3], stride=1, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
        )
        
        self.upsample = nn.Sequential(
            nn.ConvTranspose3d(in_channels=64, out_channels=64, kernel_size=[4,1,1], stride=[2,1,1], padding=[1,0,0]),   #[1, 128, 32]
            nn.BatchNorm3d(64),
            nn.ELU(),
        )
        self.upsample2 = nn.Sequential(
            nn.ConvTranspose3d(in_channels=64, out_channels=64, kernel_size=[4,1,1], stride=[2,1,1], padding=[1,0,0]),   #[1, 128, 32]
            nn.BatchNorm3d(64),
            nn.ELU(),
        )
 
        self.ConvBlock10 = nn.Conv3d(64, 1, [1,1,1],stride=1, padding=0)
        
        self.MaxpoolSpa = nn.MaxPool3d((1, 2, 2), stride=(1, 2, 2))
        self.MaxpoolSpaTem = nn.MaxPool3d((2, 2, 2), stride=2)
        
        
        #self.poolspa = nn.AdaptiveMaxPool3d((frames,1,1))    # pool only spatial space 
        self.poolspa = nn.AdaptiveAvgPool3d((frames,1,1))

        
    def forward(self, x):	    	# x [3, T, 128,128]
        x_visual = x
        [batch,channel,length,width,height] = x.shape
          
        x = self.ConvBlock1(x)		     # x [16, T, 128,128]
        x = self.MaxpoolSpa(x)       # x [16, T, 64,64]
        
        x = self.ConvBlock2(x)		    # x [32, T, 64,64]
        x_visual6464 = self.ConvBlock3(x)	    	# x [64, T, 64,64]
        x = self.MaxpoolSpaTem(x_visual6464)      # x [64, T/2, 32,32]    Temporal halve
        
        x = self.ConvBlock4(x)		    # x [64, T/2, 32,32]
        x_visual3232 = self.ConvBlock5(x)	    	# x [64, T/2, 32,32]
        x = self.MaxpoolSpaTem(x_visual3232)      # x [64, T/4, 16,16]
        

        x = self.ConvBlock6(x)		    # x [64, T/4, 16,16]
        x_visual1616 = self.ConvBlock7(x)	    	# x [64, T/4, 16,16]
        x = self.MaxpoolSpa(x_visual1616)      # x [64, T/4, 8,8]

        x = self.ConvBlock8(x)		    # x [64, T/4, 8, 8]
        x = self.ConvBlock9(x)		    # x [64, T/4, 8, 8]
        x = self.upsample(x)		    # x [64, T/2, 8, 8]
        x = self.upsample2(x)		    # x [64, T, 8, 8]
        
        
        x = self.poolspa(x)     # x [64, T, 1,1]    -->  groundtruth left and right - 7 
        x = self.ConvBlock10(x)    # x [1, T, 1,1]
        
        rPPG = x.view(-1,length)            
        

        return rPPG, x_visual, x_visual3232, x_visual1616

You could write a custom Dataset and concatenate multiple images to the desired shape.
ImageFolder will create samples using single image tensors and their corresponding target.

The more interesting part is how the length should be defined. Would you like to concatenate random images? If so, you could also increase the batch size and reshape the data such that the batch dimension is split into a smaller batch size and the length dimension, as sketched below.
On the other hand, if the length dimension follows a specific “logic”, such as sequential frames, I would recommend sticking to the custom Dataset approach and implementing the loading logic in its __getitem__.
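
A sketch of the reshape idea (the smaller batch size B and clip length T are assumptions; the DataLoader would use batch_size=B*T):

import torch

B, T = 4, 8                           # assumed clip batch size and clip length
x = torch.randn(B * T, 3, 128, 128)   # what a DataLoader with batch_size=B*T yields
x = x.view(B, T, 3, 128, 128)         # regroup consecutive samples into clips
x = x.permute(0, 2, 1, 3, 4)          # -> [B, 3, T, 128, 128], as the model expects
print(x.shape)                        # torch.Size([4, 3, 8, 128, 128])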

I currently have a tensor with dimensions [batch_size, channel, width, height], but I want to add one dimension at the third index to get [batch_size, channel, length, width, height], where length will be the total length of the inputs. Can you please tell me how to add the new dimension and set its value?

You can add the dimension via tensor = tensor.unsqueeze(2), but this would only add this dimension with a size of 1.
I don’t know how this dimension should be filled, but if you would like to use multiple image tensors and stack them in this dimension, you could use:

data = [torch.randn(1, 3, 24, 24) for _ in range(10)]
x = torch.stack(data, dim=2)
print(x.shape)
> torch.Size([1, 3, 10, 24, 24])

Thank you, I have got the following code working for my model. Can you help me get the accuracy of the model after every epoch? I am getting tensors as a result; how do I compare them and get the results? I am using 1792 input images with corresponding labels.

for i in range(EPOCHS):
    for batch_num in range(total_batch_num):
        inputs = np.reshape(x_train[batch_num, :, :, :, :], [1, CHANNELS, BATCH_SIZE, IMAGE_HEIGHT, IMAGE_WIDTH])
        Target = np.reshape(y_train[batch_num, :], [1, BATCH_SIZE])
        inputs = torch.from_numpy(inputs).float()
        Target = torch.from_numpy(Target).float()
        rPPG, x_visual, x_visual3232, x_visual1616 = model(inputs)
        rPPG = (rPPG - torch.mean(rPPG)) / torch.std(rPPG)          # normalize
        Target = (Target - torch.mean(Target)) / torch.std(Target)
        loss = criterion(rPPG, Target)
        loss = loss.detach().numpy()
        print('Training: loss=' + str(loss))
        torch.save(model, 'save.model')
        print('A model has been saved')

where the predicted value is rPPG and the true value is Target.

rPPG doesn’t seem to contain predicted class indices, and based on the operations on it and the target (mean subtraction, normalization by the standard deviation) I guess you are working on a regression task?
If so, you could calculate the loss (e.g. via nn.MSELoss or any other criterion suitable for a regression task), as I’m unsure how the accuracy would be defined in this case.
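
If you still want a single epoch-level metric, the mean absolute error is a common choice for regression. A minimal sketch, assuming batches of (inputs, target) pairs and the model's four return values from above:

import torch

def epoch_mae(model, batches):
    # mean absolute error over one epoch; `batches` yields (inputs, target) pairs
    mae_sum, n = 0.0, 0
    with torch.no_grad():
        for inputs, target in batches:
            pred, _, _, _ = model(inputs)   # rPPG is the first return value
            mae_sum += (pred - target).abs().sum().item()
            n += target.numel()
    return mae_sum / n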

This is a regression task. I have already calculated the loss, but I was wondering if I can also find out its accuracy. Can I define a confusion matrix for it? If yes, how can I do that?

I’m still unsure how you would calculate the accuracy and how the confusion matrix should look.
Since you are working with continuous floating point values, comparing the prediction and target directly won’t work without binning, and the confusion matrix would otherwise have an unbounded shape.
Do you want to transform this regression task somehow into a classification?
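
For completeness, if you do decide to bin the normalized values, a confusion matrix could be built like this (the bin edges are arbitrary assumptions over the normalized range):

import torch

def binned_confusion_matrix(pred, target, edges):
    # discretize continuous values into bins, then count (true, predicted) pairs
    pred_cls = torch.bucketize(pred.flatten(), edges)
    true_cls = torch.bucketize(target.flatten(), edges)
    num_cls = len(edges) + 1
    conf_mat = torch.zeros(num_cls, num_cls, dtype=torch.long)
    for t, p in zip(true_cls, pred_cls):
        conf_mat[t, p] += 1
    return conf_mat

edges = torch.linspace(-3, 3, steps=7)   # 7 edges -> 8 bins over the normalized values
# conf_mat = binned_confusion_matrix(rPPG, Target, edges)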