Need help with implementing handwriting recognition

nit-in · August 2, 2020, 6:19pm

I want to train a neural network to recognise only my own handwriting.
I have a few questions:

Data: How do I prepare a dataset for the network? I want to work with my own data. Can I use images like the one I have uploaded below?
IMG_20200802_214904-01|690x227
Training: How should I actually train the network? Other datasets come as image and label pairs. How do I achieve that?
Can I create a text file with the same name as my image and type into it the text shown in the image, like this?

xyz.txt
I want to use images that have
text in it that is written in multiple lines like this.
My question is how do I prepare a data
set for the network and load it.

and then train the network with the image and the text inside the .txt file?
Is this possible?
If yes, how?

I have read many tutorials but still have not found an answer.

ptrblck · August 3, 2020, 7:21am

The data preparation and the data loading depend somewhat on the overall model and training setup, i.e. what kind of model would you like to use?
If your desired architecture accepts an input image containing the complete text, your suggestion of loading the image together with its text (and encoding the text) sounds reasonable.

This tutorial gives you more information on how to write a custom Dataset.
For a general text processing tutorial, you might have a look at the Seq2Seq tutorial.
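As a rough illustration of the "same file name" idea from the question, the image/text pairs could be collected before they are handed to a custom Dataset. This is only a sketch: the helper names `find_pairs` and `read_target`, the list of image extensions, and the flat directory layout are assumptions, not something taken from the tutorials.

```python
from pathlib import Path

# Assumed image formats; adjust to whatever your scans or photos use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_pairs(data_dir):
    """Return sorted (image_path, text_path) pairs that share a file stem."""
    pairs = []
    for image_path in sorted(Path(data_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        # xyz.png is paired with xyz.txt in the same directory.
        text_path = image_path.with_suffix(".txt")
        if text_path.exists():  # skip images that have no transcription yet
            pairs.append((image_path, text_path))
    return pairs


def read_target(text_path):
    """Read the whole transcription, keeping the line breaks."""
    return Path(text_path).read_text(encoding="utf-8")
```

The two path lists can then be passed to the Dataset, which opens the image and reads the matching text inside `__getitem__`.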

nit-in · August 4, 2020, 8:04pm

I have written this dataset

import albumentations as al
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class Classification(Dataset):
    def __init__(self, image_paths, targets, resize=None):
        self.image_paths = image_paths  # paths to the handwriting images
        self.targets = targets          # paths to the matching .txt files
        self.resize = resize            # (height, width) or None
        self.aug = al.Compose([al.Normalize(always_apply=True)])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, position):
        image = Image.open(self.image_paths[position]).convert("RGB")

        # Read the whole transcription as one string, keeping the line breaks.
        with open(self.targets[position], "r") as t:
            words = [t.read()]

        if self.resize is not None:
            # PIL expects (width, height).
            image = image.resize((self.resize[1], self.resize[0]), resample=Image.BILINEAR)

        image = np.array(image)
        augmented = self.aug(image=image)
        image = augmented["image"]
        # HWC -> CHW, the layout PyTorch expects.
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)
        return {
            "images": torch.tensor(image, dtype=torch.float),
            "targets": words,
        }

This returns two things:

a tensor of size (3, 120, 480), since I have kept the image size at w x h = 480 x 120
the text from the corresponding .txt file

{'images': tensor([[[1.3927, 0.6392, 0.1768,  ..., 2.2489, 2.2489, 2.2489],
          [1.4098, 0.6906, 0.1597,  ..., 2.2489, 2.2489, 2.2489],
          [1.2899, 0.6392, 0.0741,  ..., 2.2489, 2.2489, 2.2489],
          ...,
          [1.2557, 0.5707, 1.9407,  ..., 2.2147, 2.2147, 2.2147],
          [1.1700, 0.6392, 1.9578,  ..., 2.2147, 2.2147, 2.2147],
          [0.7248, 0.1426, 1.2557,  ..., 2.2147, 2.2147, 2.2147]],
 
         [[1.5532, 0.7829, 0.3102,  ..., 2.4286, 2.4286, 2.4286],
          [1.5707, 0.8354, 0.2927,  ..., 2.4286, 2.4286, 2.4286],
          [1.4482, 0.7829, 0.2052,  ..., 2.4286, 2.4286, 2.4286],
          ...,
          [1.4132, 0.7129, 2.1134,  ..., 2.3936, 2.3936, 2.3936],
          [1.3256, 0.7829, 2.1310,  ..., 2.3936, 2.3936, 2.3936],
          [0.8704, 0.2752, 1.4132,  ..., 2.3936, 2.3936, 2.3936]],
 
         [[1.7685, 1.0017, 0.5311,  ..., 2.6400, 2.6400, 2.6400],
          [1.7860, 1.0539, 0.5136,  ..., 2.6400, 2.6400, 2.6400],
          [1.6640, 1.0017, 0.4265,  ..., 2.6400, 2.6400, 2.6400],
          ...,
          [1.6291, 0.9319, 2.3263,  ..., 2.6051, 2.6051, 2.6051],
          [1.5420, 1.0017, 2.3437,  ..., 2.6051, 2.6051, 2.6051],
          [1.0888, 0.4962, 1.6291,  ..., 2.6051, 2.6051, 2.6051]]]),

 'targets': ['I want to use images that have\ntext in it that is written in\nmultiple lines like this.\n\nMy question is how do I prepare a data\nset for the network and load it.\n']}

Is this OK, or should I try something else?

Can I go ahead with this and write the other parts, or do I need to convert the targets to a tensor as well?

ptrblck · August 5, 2020, 7:30am

You would need to convert the target to a tensor, e.g. by mapping its tokens to indices via a dictionary. The linked Seq2Seq tutorial gives you an example of how to map from one language to another, and you could adapt it to your use case.
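A minimal sketch of such a dictionary-based encoding, working at the character level, might look like the following. The function names `build_vocab`, `encode`, and `decode` are hypothetical, and reserving index 0 (for padding, say) is an assumption rather than a requirement.

```python
def build_vocab(texts):
    """Map every character seen in texts to an integer index.

    Index 0 is left unused here (assumption: reserved for padding).
    """
    chars = sorted(set("".join(texts)))
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode(text, char_to_idx):
    """Turn a string into a list of integer indices."""
    return [char_to_idx[ch] for ch in text]


def decode(ids, char_to_idx):
    """Turn indices back into a string, skipping the reserved index 0."""
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return "".join(idx_to_char[i] for i in ids if i != 0)


# Example transcriptions, as they would be read from the .txt files.
texts = ["set for the network\n", "and load it.\n"]
vocab = build_vocab(texts)
ids = encode(texts[1], vocab)
```

Inside `__getitem__`, the list could then be wrapped as `torch.tensor(ids, dtype=torch.long)`. Targets of different lengths would still need padding or a custom collate function before they can be batched.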