Hello,
I’m currently trying to implement the architecture presented in the FragNet paper for writer identification:
https://arxiv.org/abs/2003.07212
The paper addresses writer identification from a small amount of data: the input is handwritten word images, and the goal is to identify writers by their handwriting style.
I encountered problems while training the model:
- The loss doesn’t converge and the accuracy doesn’t improve while training on the IAM handwriting dataset.
- I tested the data preparation by training the torchvision resnet50 model on the same pipeline, and the results were very good.
I would appreciate help solving this problem. My code includes:
The forward function of the Feature Pyramid Network (self.convblock_lst is an nn.Sequential model of convolutional blocks):
def forward(self, x):
    output_features = list()
    for convblock in self.convblock_lst:
        x = convblock(x)
        output_features.append(x)
    return output_features
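For context, here is a minimal, self-contained sketch of that FPN pathway (the block sizes and strides here are placeholders, not my actual configuration): each block halves the spatial size, and the forward pass collects every intermediate feature map.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for self.convblock_lst: three conv blocks,
# each downsampling by 2 and doubling the channel count.
convblock_lst = nn.Sequential(
    nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU()),
)

x = torch.randn(2, 1, 64, 128)  # fake batch of grayscale word images
output_features = []
for convblock in convblock_lst:
    x = convblock(x)
    output_features.append(x)

print([tuple(f.shape) for f in output_features])
# [(2, 16, 32, 64), (2, 32, 16, 32), (2, 64, 8, 16)]
```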
The fragment pathway forward function (self.convblock_lst holds the convolutional blocks of the fragment pathway as an nn.Sequential model):
def forward(self, fragments_t, fpn_fragment_t):
    """
    fragments_t: Fragments at time step t of all word images in the batch.
        fragments_t.shape is (B, C, H, WS), where C = 1 (grayscale) and WS
        is the time-step width int(W/self.timestamps).
    fpn_fragment_t: The FPN (batch) features cropped at time step t, given
        as a list. The elements of the list have different shapes (from the
        high-level features to the low-level ones).
    """
    for convblock, f_t in zip(self.convblock_lst, fpn_fragment_t):
        B1, C1, H1, W1 = fragments_t.shape
        B2, C2, H2, W2 = f_t.shape
        assert B1 == B2, "Incompatible batch size"
        # Resize f_t to match the fragment's spatial shape
        f_t = F.interpolate(f_t, (H1, W1))
        # Concatenate the cropped FPN features and fragments_t
        # along the channel dimension
        fragments_t = torch.cat([fragments_t, f_t], dim=1)
        fragments_t = convblock(fragments_t)
    return fragments_t
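To illustrate one fusion step of that loop in isolation (the tensor sizes here are placeholders): an FPN crop is resized to the fragment's spatial size, then concatenated along the channel dimension.

```python
import torch
import torch.nn.functional as F

fragments_t = torch.randn(2, 8, 16, 16)  # (B, C, H, W)
f_t = torch.randn(2, 32, 4, 4)           # cropped FPN feature on a coarser grid

f_t = F.interpolate(f_t, size=(16, 16))  # defaults to mode='nearest'
fused = torch.cat([fragments_t, f_t], dim=1)
print(tuple(fused.shape))  # (2, 40, 16, 16)
```

One caveat I noticed while writing this: in the FragNet forward below, int(self.q * (wf_height / words_height)) truncates to 0 for any FPN level downsampled by more than a factor of q, which would make the crop (and this interpolate's input) empty.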
FragNet forward function:
def forward(self, words):
    """
    words: The input batch of words with shape (B, C, H, W).
    """
    # Extract the multi-scale features from the given words
    word_features = self.fpn_pathway(words)
    words_batch, words_channels, words_height, words_width = words.shape
    outputs = list()
    for i in range(words_height // self.q):
        for j in range(words_width // self.q):
            # Get fragment q-ij from the word images
            batch_fragments_t = words[:, :, i*self.q:(i+1)*self.q, j*self.q:(j+1)*self.q]
            # Get the current fragments_t from the FPN outputs;
            # the first FPN fragment is the fragment itself
            batch_wf_fragment_t = [batch_fragments_t]
            for wf in word_features:
                wf_batch, wf_channels, wf_height, wf_width = wf.shape
                wf_height_q = int(self.q * (wf_height / words_height))
                wf_width_q = int(self.q * (wf_width / words_width))
                batch_wf_fragment_t.append(
                    wf[:, :, i*wf_height_q:(i+1)*wf_height_q, j*wf_width_q:(j+1)*wf_width_q])
            # Pass the input fragments to the fragment pathway
            output = self.fragment_pathway(batch_fragments_t, batch_wf_fragment_t)
            output = self.classifier(self.flatten(output))
            output = F.softmax(output, 1)  # Shape is (B, num_writers)
            outputs.append(output)
    return outputs
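In case it's relevant: my training loop isn't shown here, but if the loss is nn.CrossEntropyLoss (an assumption), then applying F.softmax before it applies softmax twice, since CrossEntropyLoss already applies log-softmax internally. That squashes the logits into [0, 1] and flattens the gradients, which could look exactly like a loss that won't converge. A minimal sketch of the effect:

```python
import torch
import torch.nn.functional as F

# A confident prediction for class 0, as raw logits
logits = torch.tensor([[4.0, 0.0, 0.0]])
target = torch.tensor([0])

# cross_entropy expects raw logits: it applies log-softmax internally
loss_on_logits = F.cross_entropy(logits, target)
# Feeding softmaxed outputs applies softmax twice and inflates the loss
loss_on_probs = F.cross_entropy(F.softmax(logits, dim=1), target)

print(loss_on_logits.item(), loss_on_probs.item())
```

With the double softmax, the loss stays large even for a confident correct prediction, so the model gets almost no useful gradient signal.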
Thank you!