How to encode labels for classification on custom dataset

sparshgarg23 · January 24, 2022, 9:56am

I am performing classification to identify which phenotype the person in the image belongs to.
There are eight classes in all.
My dataset is organized as follows:
Images
Character_class (contains .txt files; each .txt file states which class the corresponding image belongs to). A label looks like this:

m_la01

Number of images: 800
Number of labels corresponding to the images: 800

Given my current dataset class, shown below, what changes should I make to my dataloader so that I can train the classifier with a standard classification loss (cross-entropy)? Do I need to one-hot encode the label and return it as a tensor?
My custom dataset:

import glob

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class BlenderPoseDataset(Dataset):
    
    def __init__(self,paths,batch_size=16):
        self.img_dir='blender_data_1/images_blender/*'
        self.pose_files=paths
        self.batch_size=batch_size
        self.transforms=None
        self.image_shape=(224,224)
        self.data=[]
        self.min_data=np.load('char_min_data.npy')
        self.max_data=np.load('char_max_data.npy')
        img_list=glob.glob(self.img_dir)
        for img_path in img_list:
            for file in self.pose_files:
                self.data.append([img_path,file])
    def __len__(self):
        return len(self.data)
    
    def image_process(self,rgb_img):
        rgb_img=np.transpose(rgb_img.astype('float32'),(2,0,1))/255.0
        return rgb_img
    
    
    def __getitem__(self,idx):
        img_path,label_path=self.data[idx]
        img=cv2.imread(img_path)
        # resize, convert BGR to RGB, then scale to [0,1] in CHW layout
        img=cv2.resize(img,self.image_shape)[:,:,::-1].astype(np.float32)
        img=self.image_process(img)
        img=torch.from_numpy(img).float()
        # apply the optional transforms to the loaded image
        if self.transforms is not None:
            img=self.transforms(img)
        with open(label_path,'r') as f:
            # the label is still a raw string here, e.g. 'm_la01'
            label=f.read().strip()
        return img,label

dataset=BlenderPoseDataset(labels_paths,batch_size=32)

I have also implemented my own one-hot encoding that converts the class label to a one-hot vector and returns it, so that it can be used later during training. Please let me know if that approach sounds reasonable.

label_arr=np.zeros(8)
classes=['f_af01','f_as01','f_ca01','f_la01','m_af01','m_as01','m_ca01','m_la01']
with open(file_name,'r') as f:
    # strip the trailing newline so the comparison below can match
    label=f.readline().strip()
print(label)
# set the entry of the matching class to 1
for i in range(len(label_arr)):
    if label==classes[i]:
        label_arr[i]=1
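
A more compact way to get the same encoding, sketched under the assumption that the label file holds exactly one of the eight class names: look the name up in a dictionary to get an integer index, and build the one-hot vector from that index only if it is really needed. The helper name `encode_label` is hypothetical.

```python
import torch
import torch.nn.functional as F

classes = ['f_af01', 'f_as01', 'f_ca01', 'f_la01',
           'm_af01', 'm_as01', 'm_ca01', 'm_la01']
class_to_idx = {name: i for i, name in enumerate(classes)}

def encode_label(raw_label):
    """Hypothetical helper: map a raw label string to (class index, one-hot vector)."""
    name = raw_label.strip()          # drop whitespace and a trailing newline
    idx = class_to_idx[name]          # raises KeyError for an unknown class name
    one_hot = F.one_hot(torch.tensor(idx), num_classes=len(classes)).float()
    return idx, one_hot

idx, one_hot = encode_label('m_la01\n')
```

The integer index alone is enough for nn.CrossEntropyLoss; the one-hot vector is only an optional alternative.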

ptrblck · January 25, 2022, 12:31am

nn.CrossEntropyLoss can be used with a target containing class indices and starting from the latest release also accepts targets containing probabilities.
In the former case the model output should have the shape [batch_size, nb_classes] containing logits and the targets [batch_size] containing class indices in the range [0, nb_classes-1].
One-hot encoded targets could be seen as a probability, and should be accepted now, too.
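
To illustrate both target formats, here is a minimal sketch with random logits for 8 classes. The batch size and the tensors are made up for the example, and the probability-style targets assume PyTorch 1.10 or newer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, nb_classes = 4, 8

logits = torch.randn(batch_size, nb_classes)               # model output: [batch_size, nb_classes]
target_idx = torch.randint(0, nb_classes, (batch_size,))   # class indices: [batch_size]

criterion = nn.CrossEntropyLoss()

# Option 1: integer class indices
loss_idx = criterion(logits, target_idx)

# Option 2: one-hot targets passed as float probabilities: [batch_size, nb_classes]
target_prob = F.one_hot(target_idx, num_classes=nb_classes).float()
loss_prob = criterion(logits, target_prob)
```

For one-hot targets both calls compute the same loss value, so returning plain class indices from the dataset is usually the simpler choice.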

sparshgarg23 · January 25, 2022, 4:01am

I tried label encoding, but with the nn.LogSoftmax() activation function I got the error

RuntimeError: multi-target not supported at

ptrblck · January 25, 2022, 4:43am

Could you post the model output shape as well as the shape of the targets, please?

sparshgarg23 · January 25, 2022, 5:01am

label batch shape is [128]
model output shape is [128]

ptrblck · January 25, 2022, 5:02am

That’s wrong as described in my previous post. Make sure the model output has the shape [batch_size, nb_classes].
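
A minimal sketch of the expected shapes, using a made-up linear classifier head (the feature size and the layer are assumptions, not the model from this thread). It also shows that a model ending in nn.LogSoftmax pairs with nn.NLLLoss, while nn.CrossEntropyLoss expects raw logits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nb_features, nb_classes = 128, 512, 8

features = torch.randn(batch_size, nb_features)          # stand-in for backbone features
targets = torch.randint(0, nb_classes, (batch_size,))    # [batch_size], dtype int64

head = nn.Linear(nb_features, nb_classes)                 # last layer needs nb_classes outputs
logits = head(features)                                   # [batch_size, nb_classes]

# raw logits go straight into CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# if the model ends in LogSoftmax, pair it with NLLLoss instead
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)
```

If the model output has shape [128] instead, the last layer most likely has a single output unit or the output is being reduced somewhere before the loss.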

sparshgarg23 · January 25, 2022, 5:03am

There are 8 labels in all. My custom dataset is as follows:

classes=('m_as01','f_as01','m_af01','f_af01','m_ca01','f_ca01','m_la01','f_la01')
idx_to_class = {i:j for i, j in enumerate(classes)}
class_to_idx = {value:key for key,value in idx_to_class.items()}

class BlenderPoseDataset(Dataset):
    
    def __init__(self,paths,batch_size=16):
        self.img_dir='blender_data_1/images_blender/*'
        self.pose_files=paths
        self.batch_size=batch_size
        self.transforms=None
        self.image_shape=(224,224)
        self.data=[]
        img_list=glob.glob(self.img_dir)
        for img_path in img_list:
            for file in self.pose_files:
                self.data.append([img_path,file])
    def __len__(self):
        return len(self.data)
    
    def image_process(self,rgb_img):
        rgb_img=np.transpose(rgb_img.astype('float32'),(2,0,1))/255.0
        return rgb_img
    
    
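    def __getitem__(self,idx):
        img_path,label_path=self.data[idx]
        img=cv2.imread(img_path)
        # resize, convert BGR to RGB, then scale to [0,1] in CHW layout
        img=cv2.resize(img,self.image_shape)[:,:,::-1].astype(np.float32)
        img=self.image_process(img)
        img=torch.from_numpy(img).float()
        # apply the optional transforms to the loaded image
        if self.transforms is not None:
            img=self.transforms(img)
        with open(label_path,'r') as f:
            # strip the trailing newline so the dictionary lookup matches
            label=f.readline().strip()
        # map the class name to its integer index
        label=class_to_idx[label]
        return img,label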

Output when iterating through the train loader:

Batch of image has shape: torch.Size([128, 3, 224, 224])
Batch of pose has shape: torch.Size([128])
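
For reference, a small sketch of why the label batch comes out as a 1-D tensor: when `__getitem__` returns a plain Python int, the DataLoader's default collation stacks those ints into an int64 tensor of shape [batch_size], which is the target format nn.CrossEntropyLoss expects for class indices. `ToyDataset` is a hypothetical stand-in with random images, not the dataset above.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in: returns a random image tensor and an int label."""

    def __init__(self, n=16, nb_classes=8):
        self.labels = [i % nb_classes for i in range(n)]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img = torch.rand(3, 224, 224)
        return img, self.labels[idx]      # label is a plain Python int

loader = DataLoader(ToyDataset(), batch_size=4)
images, labels = next(iter(loader))       # images: [4, 3, 224, 224], labels: [4]
```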

sparshgarg23 · January 27, 2022, 4:13pm

Just wanted to post a quick update: there is a bug in the dataset. The nested for loop in __init__ pairs every image with every label file, which is not correct, because each image should be associated with only one label.
For example:

for img_path in img_list:            # runs 1000 times
    for file in self.pose_files:     # runs over all 1000 pose files
        self.data.append([img_path,file])
The loop body runs 1000*1000 times in total, which blows up the number of entries in the dataset and also associates each image with 1000 labels.

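Correct way:

# keep a pair only when the image and the label file share the same index
# (assumes img_list and self.pose_files are sorted in the same order)
for img_idx,img_path in enumerate(img_list):
    for file_idx,file in enumerate(self.pose_files):
        if file_idx==img_idx:
            self.data.append([img_path,file])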

Because of this problem, I think the data distribution gets distorted, which in turn causes the net to overfit. Let me know if I am on the right track. I also solved the label-encoding issue by using the class_to_idx and idx_to_class mappings.
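
A sketch of a one-to-one pairing that avoids the nested loop altogether. It assumes each label file has the same base name as its image (for example img_0001.png and img_0001.txt); that naming scheme, the function name `pair_images_with_labels`, and the directory arguments are all hypothetical, so adapt them to the actual layout.

```python
from pathlib import Path

def pair_images_with_labels(img_dir, label_dir):
    """Return a sorted list of (image_path, label_path) pairs matched by file stem."""
    # index the label files by their name without extension
    labels = {p.stem: p for p in Path(label_dir).glob('*.txt')}
    pairs = []
    for img_path in sorted(Path(img_dir).iterdir()):
        label_path = labels.get(img_path.stem)
        # skip sub-directories and images that have no matching label file
        if img_path.is_file() and label_path is not None:
            pairs.append((str(img_path), str(label_path)))
    return pairs
```

With 800 images and 800 matching label files this yields 800 entries instead of 800*800, and the result can be assigned to self.data in __init__.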