Writer Identification Help


I’m currently trying to implement the architecture presented in the FragNet paper for writer identification.

The paper deals with writer identification from a small amount of data. The input consists of handwritten word images, and the goal is to identify the writer from the handwriting style.

I ran into a problem while training the model:

  1. The loss doesn’t converge and the accuracy doesn’t improve when training the model on the IAM handwriting dataset.
  2. To check the data preparation, I trained the torchvision resnet50 model on the same data, and it gave good results, so the data pipeline does not seem to be the cause.

I would appreciate any help solving this problem. My code includes:

The forward function of the Feature Pyramid Network (self.convblock_lst is an nn.Sequential containing the convolutional blocks):

def forward(self, x):
    output_features = list()
    for convblock in self.convblock_lst:
        x = convblock(x)
        # Keep the output of every block so the fragment pathway can use it
        output_features.append(x)
    return output_features
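
For reference, here is a minimal self-contained sketch of such a pathway that collects one feature map per block. The block layout (conv, batch norm, ReLU, max pooling), the channel widths, and the names conv_block and FPNPathway are assumptions for illustration, not the FragNet configuration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Hypothetical block: conv -> batch norm -> ReLU -> 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class FPNPathway(nn.Module):
    def __init__(self, channels=(1, 16, 32, 64)):  # assumed channel widths
        super().__init__()
        self.convblock_lst = nn.Sequential(
            *[conv_block(c_in, c_out)
              for c_in, c_out in zip(channels[:-1], channels[1:])]
        )

    def forward(self, x):
        output_features = []
        for convblock in self.convblock_lst:
            x = convblock(x)
            output_features.append(x)  # one feature map per block
        return output_features


fpn = FPNPathway().eval()
with torch.no_grad():
    features = fpn(torch.randn(2, 1, 64, 128))  # (B, C, H, W)
shapes = [tuple(f.shape) for f in features]
print(shapes)
```

Each block halves the spatial size here, so the list runs from the highest-resolution features to the lowest-resolution ones.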

Fragment pathway forward function (self.convblock_lst is an nn.Sequential containing the convolutional blocks of the fragment pathway):

def forward(self, fragments_t, fpn_fragment_t):
    """
    fragments_t: Fragments at time step t of all word images in the batch.
        fragments_t.shape is (B, C, H, WS), where C = 1 (grayscale) and WS is
        the width of one time step, int(W/self.timestamps).
    fpn_fragment_t: The FPN features of the batch cropped at time step t,
        given as a list. The elements of the list have different shapes
        (from the high-level features to the low-level ones).
    """

    for convblock, f_t in zip(self.convblock_lst, fpn_fragment_t):
        B1, C1, H1, W1 = fragments_t.shape
        B2, C2, H2, W2 = f_t.shape
        assert B1 == B2, "Incompatible batch size"

        # Resize f_t to match the spatial size of the current fragments_t
        f_t = F.interpolate(f_t, (H1, W1))

        # Concatenate the fpn cropped features and fragments_t
        # by the channel's dimension
        fragments_t = torch.cat([fragments_t, f_t], dim=1)
        fragments_t = convblock(fragments_t)

    return fragments_t
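
A small shape check of the resize-and-concatenate step may help here. It is only a sketch: the feature shapes, out_channels, and the block layout are made up for illustration. The point it shows is that each block's input channel count has to equal the channels of the running tensor plus the channels of the matching FPN feature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fragments_t = torch.randn(2, 1, 64, 64)   # (B, C, H, W) fragment batch
fpn_fragment_t = [                        # hypothetical cropped FPN features
    torch.randn(2, 1, 64, 64),            # first entry: the fragment itself
    torch.randn(2, 16, 32, 32),
    torch.randn(2, 32, 16, 16),
]
out_channels = [16, 32, 64]               # hypothetical block widths

# Each block must accept (channels of the running tensor + channels of f_t)
convblocks = nn.ModuleList()
in_ch = fragments_t.shape[1]
for f_t, out_ch in zip(fpn_fragment_t, out_channels):
    convblocks.append(nn.Sequential(
        nn.Conv2d(in_ch + f_t.shape[1], out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
    ))
    in_ch = out_ch

x = fragments_t
with torch.no_grad():
    for convblock, f_t in zip(convblocks, fpn_fragment_t):
        # Resize to the current spatial size (default mode is 'nearest')
        f_t = F.interpolate(f_t, size=tuple(x.shape[-2:]))
        x = convblock(torch.cat([x, f_t], dim=1))
        print(tuple(x.shape))
result_shape = tuple(x.shape)
```

If a block's in_channels does not match the concatenated tensor, the convolution raises a RuntimeError, so a mismatch of this kind would show up as a crash rather than as a loss that fails to converge.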

FragNet forward function:

def forward(self, words):
    """
    words: The input batch of words with shape (B, C, H, W)
    """

    # Extract the different features from the given words
    word_features = self.fpn_pathway(words)
    words_batch, words_channels, words_height, words_width = words.shape

    outputs = list()
    for i in range(0, words_height//self.q):
        for j in range(0, words_width//self.q):
            # Get fragments q-ij from the word images
            batch_fragments_t = words[:,:,i*self.q:(i+1)*self.q,j*self.q:(j+1)*self.q]
            fragments_batch, fragments_channels, fragments_height, fragments_width = batch_fragments_t.shape 

            # Get current fragments_t from the FPN outputs
            # The first fp fragment equals the fragment itself
            batch_wf_fragment_t = [batch_fragments_t]
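            for wf in word_features:
                wf_batch, wf_channels, wf_height, wf_width = wf.shape
                wf_height_q = int(self.q*(wf_height/words_height))
                wf_width_q = int(self.q*(wf_width/words_width))
                # Crop the region of this feature map that corresponds to fragment (i, j)
                batch_wf_fragment_t.append(
                    wf[:,:,i*wf_height_q:(i+1)*wf_height_q,j*wf_width_q:(j+1)*wf_width_q])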

            # Pass the input fragments to the fragment pathway
            output = self.fragment_pathway(batch_fragments_t, batch_wf_fragment_t)
            output = self.classifier(self.flatten(output))
            output = F.softmax(output, 1) # Shape is (B, num_writers)
            outputs.append(output)

    return outputs
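
To make the fragment indexing concrete, here is a standalone sketch of how a q x q fragment at grid position (i, j) maps to a crop of a lower-resolution feature map. The tensor sizes and the single feature map wf are assumptions for illustration; q plays the role of self.q above.

```python
import torch

q = 64                                # fragment size, as self.q in the post
words = torch.randn(2, 1, 64, 128)    # (B, C, H, W)
wf = torch.randn(2, 16, 32, 64)       # hypothetical feature map at half resolution

B, C, H, W = words.shape
crops = []
for i in range(H // q):
    for j in range(W // q):
        fragment = words[:, :, i*q:(i+1)*q, j*q:(j+1)*q]
        # Scale the fragment size to the resolution of the feature map
        hq = int(q * wf.shape[2] / H)
        wq = int(q * wf.shape[3] / W)
        wf_crop = wf[:, :, i*hq:(i+1)*hq, j*wq:(j+1)*wq]
        crops.append((tuple(fragment.shape), tuple(wf_crop.shape)))

print(crops)
```

Because of the integer division, any pixels beyond the last full multiple of q in height or width are not covered by a fragment. If a feature map is so small that hq or wq becomes 0, the crop is empty.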

Thank you!