How should I fill null values for pictures

Jordan_Howell · April 30, 2020, 3:24pm

Hello,

I have a model that takes the results of three separate models, one per picture type, and combines them into a final binary classification. Each photo is loaded from a file location stored in a dataframe.

Here is a snapshot:

Is there a way to skip the NaN values, or to fill them with a tensor of zeros of size 224x224?

If that is possible, should I do it in the model or in the custom dataset class?

Is there a better way to do this?

I was thinking:

class image_Dataset(Dataset):
    '''
    image class data set   
    
    '''
    def __init__(self, data, transform = None):
        '''
        Args:
        ------------------------------------------------------------
            data = dataframe
            image = column in dataframe with absolute path to the image
            label = column in dataframe that is the target classification variable
            numerical_columns =  numerical columns from data
            categorical_columns = categorical columns from data
            policy = ID variable
            
        '''
        self.image_frame = data
        self.transform = transform
        
    def __len__(self):
        return len(self.image_frame)
    
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
         
        label = self.image_frame.loc[idx, 'target']
        
        if self.image_frame[self.image_frame[idx, 'Roof'].isna() == True]:
            pic = np.ones(3,224,224)
        else:
            pic = Path(self.image_frame.loc[idx,'Roof'])
        img = Image.open(pic)
        if self.transform:
            image = self.transform(img)
        
        policy = self.image_frame.loc[idx, 'policy']
        
        numerical_data = self.image_frame.loc[idx, numerical_columns]
        
        numerical_data = torch.tensor(numerical_data, dtype = torch.float)
        
        for category in cat_columns:
            self.image_frame[category] = self.image_frame[category].astype('category')
            
            self.image_frame[category] = self.image_frame[category].astype('category').cat.codes.values
        
            
        categorical_data = self.image_frame.loc[idx, cat_columns]
        categorical_data = torch.tensor(categorical_data, dtype = torch.int64)
            
        return image, label, policy, categorical_data , numerical_data

But this still throws an error when there is a NaN value.

I also tried using a pass statement when it encountered a NaN value. No luck.

ptrblck · May 1, 2020, 1:13am

I’m not sure I understand the use case completely.
Do you need to load all three columns and would like to return a constant fake image for the NaN values?
Would it work if you only loaded the valid images, as the other two don’t seem to be very useful?
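
A minimal sketch of the "constant fake image" option, assuming the empty cells come back from pandas as float NaN (or None) and that a black 224x224 RGB image is an acceptable placeholder. The helper names `is_missing` and `load_image_or_placeholder` are hypothetical, not part of PyTorch or PIL:

```python
import math
from pathlib import Path

from PIL import Image


def is_missing(value):
    """True for None or a float NaN, which is how an empty dataframe cell usually appears."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def load_image_or_placeholder(path_value, size=(224, 224)):
    """Return (image, present).

    image   -- the RGB image at path_value, or an all-zero (black) image of
               the given size when the cell is empty
    present -- False when the placeholder was used, so the model or the loss
               can ignore that input if desired
    """
    if is_missing(path_value):
        # Image.new fills with 0 by default, i.e. a black image.
        return Image.new("RGB", size), False
    return Image.open(Path(path_value)).convert("RGB"), True
```

Inside `__getitem__` this would be called on the value of `self.image_frame.loc[idx, 'Roof']` before `self.transform` is applied, so real and placeholder images go through the same transform. Note that after a normalization transform the placeholder is no longer exactly zero; returning the `present` flag alongside the image lets the model tell the two cases apart.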

Jordan_Howell · May 1, 2020, 9:52am

Use case is this:

I have observations with different numbers and types of pictures. Each observation can have from 1 to 8 pictures across three different types of photos (roof, roof hazard and front). I want to use all of the photos, but that creates a dataframe with null values, as seen above.

For your second question, I guess I could run three different models, one for each picture type, and concatenate either the predictions or the feature sets they are reduced to. For observations with more than one photo, though, I’m not sure whether I would take the max or the average of the results. Any thoughts on that?

ptrblck · May 2, 2020, 3:47am

Are these picture types (roof, roof hazard and front) representing the target?
If so, then your approach to create separate models for these images won’t work, as you won’t know the target for new images (and would probably want to predict them).

I’m still unsure if I understand the use case.
You’ve created a pandas DataFrame and would now like to load all pictures at once (ignoring the NaN entries)?

Jordan_Howell · May 2, 2020, 12:11pm

The target is a binary 1 or 0 I’m trying to predict from the picture. There could be multiple pictures per target.

I was thinking of training three different models and then taking a weighted average of each model’s prediction per unique ID. I’m not sure what to do if an ID is missing a picture type, such as roof hazard.

The dataframe holds the file location of each image, so the loader iterates one row at a time and grabs that row’s pictures.

Jordan
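
One possible way to handle an ID that is missing a picture type, sketched under assumptions: average the predictions within each type first, then take the weighted average only over the types that are present, renormalizing the weights. The function names and the weights below are hypothetical, and whether mean or max is the better within-type reduction is left open, as in the thread:

```python
def mean_or_none(values):
    """Average several per-picture probabilities; None when there are no pictures."""
    return sum(values) / len(values) if values else None


def combine_predictions(probs, weights):
    """Weighted average over the picture types that actually have a prediction.

    probs   -- dict: picture type -> probability, or None if the ID has no
               picture of that type
    weights -- dict: picture type -> non-negative weight (assumed values)
    """
    available = {k: p for k, p in probs.items() if p is not None}
    if not available:
        raise ValueError("no predictions available for this ID")
    total = sum(weights[k] for k in available)
    # Dividing by the sum of the weights that were used renormalizes them,
    # so a missing type does not drag the result toward zero.
    return sum(weights[k] * p for k, p in available.items()) / total


# Example: two roof pictures, no roof hazard picture, one front picture.
per_type = {
    "roof": mean_or_none([0.9, 0.7]),
    "roof_hazard": mean_or_none([]),
    "front": mean_or_none([0.4]),
}
score = combine_predictions(per_type, {"roof": 2.0, "roof_hazard": 1.0, "front": 1.0})
```

With this scheme no fake image is needed at all: a missing type simply drops out of the average.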