My task is to extract certain user-defined entities from receipts using a graph convolutional network (https://arxiv.org/pdf/1903.11279.pdf). I'm using the ICDAR-2019-SROIE dataset, which consists of 625 receipts in total. My user-defined entities are: [Address, Company, Date, Invoice_no, Total, Tax and Other (all remaining nodes are labelled Other)]. Each receipt is treated as a fully connected graph in which each text segment is a node. I have constructed the node features using CountVectorizer().
For example, if we have 3 nodes:
nodes = ["this is an example of first node content ", "this is second node", "this is third node"]
dict = `{"an" : 1, "content" : 2, "example" : 3, "first" : 4 ,"is" : 5, "node" : 6, "of" : 7, "second" : 8, "third" : 9, "this" : 10}`
After applying pre-padding, the node features look like:
node 1 = [10, 5, 1, 3, 7, 4, 6, 2]
node 2 = [0, 0, 0, 0, 10, 5, 8, 6]
node 3 = [0, 0, 0, 0, 10, 5, 9, 6]
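The encoding in this example can be reproduced with a short, self-contained sketch. The helper names `build_vocab` and `encode` are hypothetical, and plain whitespace tokenization with an alphabetically sorted, 1-based vocabulary is an assumption standing in for whatever CountVectorizer() produced; index 0 is reserved for padding.

```python
def build_vocab(texts):
    # Alphabetically sorted vocabulary, indexed from 1 (0 is kept for padding).
    tokens = sorted({tok for text in texts for tok in text.split()})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(texts, vocab):
    # Map each token to its index, then left-pad with zeros to the longest node.
    seqs = [[vocab[tok] for tok in text.split()] for text in texts]
    max_len = max(len(s) for s in seqs)
    return [[0] * (max_len - len(s)) + s for s in seqs]


nodes = [
    "this is an example of first node content ",
    "this is second node",
    "this is third node",
]
vocab = build_vocab(nodes)
features = encode(nodes, vocab)
for row in features:
    print(row)
```

Note that with index 0 used for padding, an embedding table needs one more row than the number of distinct tokens.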
Labels = {'address': array([1., 0., 0., 0., 0., 0., 0.]),
'company': array([0., 1., 0., 0., 0., 0., 0.]),
'date': array([0., 0., 1., 0., 0., 0., 0.]),
'invoice_no': array([0., 0., 0., 1., 0., 0., 0.]),
'other': array([0., 0., 0., 0., 1., 0., 0.]),
'tax': array([0., 0., 0., 0., 0., 1., 0.]),
'total': array([0., 0., 0., 0., 0., 0., 1.])}.
My class labels are imbalanced. The per-class node counts, in the label order above, are [72, 36, 36, 36, 1400, 36, 140].
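One common way to counter this kind of imbalance is to weight the loss by inverse class frequency. The sketch below only computes such weights from the counts listed above; the formula total / (n_classes * count) is an assumption (one widely used heuristic), not something the model in this post already does. The resulting list could be converted to a float tensor and passed as the `weight` argument of `F.nll_loss`.

```python
# Per-class node counts from the post, in label order:
# address, company, date, invoice_no, other, tax, total
counts = [72, 36, 36, 36, 1400, 36, 140]

total = sum(counts)
n_classes = len(counts)

# Inverse-frequency weights: rare classes get large weights, "other" a small one.
# This particular formula is an assumption, chosen for illustration.
class_weights = [total / (n_classes * c) for c in counts]

for name, w in zip(
    ["address", "company", "date", "invoice_no", "other", "tax", "total"],
    class_weights,
):
    print(f"{name:>10}: {w:.3f}")
```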
I have created a small dataset consisting of 1756 nodes from 36 receipts (all with the same layout) and split it into train, validation and test sets of 1400 nodes (from 26 receipts), 148 nodes (from 5 receipts) and 208 nodes (from 5 receipts) respectively. In total, there are 337 features. After padding, there are 9 features per node. I've created an adjacency matrix of shape 1756*1756, which is required for graph convolution while training the model.
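For reference, a single adjacency matrix covering many separate fully connected receipt graphs is usually block-diagonal, with one block of ones per receipt. The sketch below builds such a matrix for toy sizes. The function name `block_diagonal_adjacency`, the per-receipt node counts, and the inclusion of self-loops are all assumptions for illustration; the post does not say how the 1756*1756 matrix was actually built.

```python
def block_diagonal_adjacency(nodes_per_receipt):
    # One fully connected block (with self-loops) per receipt, zeros elsewhere.
    n = sum(nodes_per_receipt)
    adj = [[0] * n for _ in range(n)]
    start = 0
    for count in nodes_per_receipt:
        for i in range(start, start + count):
            for j in range(start, start + count):
                adj[i][j] = 1
        start += count
    return adj


# Two hypothetical receipts with 2 and 3 text segments.
adj = block_diagonal_adjacency([2, 3])
for row in adj:
    print(row)
```

With this layout, nodes from different receipts never exchange information during graph convolution.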
I have not applied any normalization to the features, as I felt it was not required (I tried row normalization, but it did not help).
This is my model:
import torch
import torch.nn as nn
import torch.nn.functional as F

nfeat = 337
nhid1 = 40
nhid2 = 30
nhid3 = 20
nhid4 = 10
embed = 50
nclass = 7
alpha = 0.2

class GAT(nn.Module):
    def __init__(self, nfeat, nhid1, nhid2, nhid3, nhid4, embed, nclass, alpha):
        super(GAT, self).__init__()
        self.embed1 = nn.Embedding(nfeat, embed)
        self.lstm1 = nn.LSTM(embed, nhid1, num_layers=1, bidirectional=True, batch_first=True)
        self.fc1 = nn.Linear(nhid1 * 2, nhid1)
        self.gc1 = GraphConvolutionLayer(nhid1, nhid2, alpha=alpha)
        self.gc2 = GraphConvolutionLayer(nhid2, nhid3, alpha=alpha)
        self.embed2 = nn.Embedding(nhid3, nhid4)
        self.lstm2 = nn.LSTM(nhid4, nhid4, num_layers=1, bidirectional=True, batch_first=True)
        self.fc2 = nn.Linear(nhid4 * 2, nclass)

    def forward(self, x, adj):
        x = self.embed1(x.long())
        x, _ = self.lstm1(x)
        x = x[:, -1, :]  # LSTM output at the last time step
        x = self.fc1(x)
        x = F.elu(self.gc1(x, adj))
        x = F.elu(self.gc2(x, adj))
        x = self.embed2(x.long())
        x, _ = self.lstm2(x)
        x = x[:, -1, :]  # LSTM output at the last time step
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

class GraphConvolutionLayer(nn.Module):
    def __init__(self, in_features, out_features, alpha):
        super(GraphConvolutionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.W = nn.Parameter(torch.empty(size=(in_features, out_features)))
        nn.init.xavier_uniform_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.empty(size=(2 * out_features, 1)))
        nn.init.xavier_uniform_(self.a.data, gain=1.414)
        self.leakyrelu = nn.LeakyReLU(self.alpha)

    def forward(self, h, adj):
        wh = torch.mm(h, self.W)
        # Pairwise concatenation [wh_i, wh_j], shape (N, N, 2 * out_features)
        hij = self._concat_features(wh)
        alpha = self.leakyrelu(torch.matmul(hij, self.a).squeeze(2))
        alpha = torch.mm(alpha, adj)
        alphaij = F.softmax(alpha, dim=1)
        ti = torch.matmul(alphaij, wh)
        return ti

    def _concat_features(self, h):
        N = h.size()[0]
        h_repeated_in_chunks = h.repeat_interleave(N, dim=0)
        h_repeated_alternating = h.repeat(N, 1)
        all_combinations_matrix = torch.cat([h_repeated_in_chunks, h_repeated_alternating], dim=1)
        return all_combinations_matrix.view(N, N, 2 * self.out_features)
Sample training loop:
---------------------
import time

def train(epoch):
    t = time.time()
    model.train()
    optimizer.zero_grad()
    # Full-graph forward pass; the loss is computed on the training nodes only
    output = model(features, adj)
    loss_train = F.nll_loss(output[idx_train], labels[idx_train])
    acc_train = accuracy(output[idx_train], labels[idx_train], 0)
    loss_train.backward()
    optimizer.step()
After about 5 epochs, the train and validation accuracy stop changing; they are the same whether I train for 10 or 100 epochs. However, both losses decrease from around 2 to 0.8.
True Labels : tensor([1, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 6,
4, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 3, 2, 4, 4, 4, 4,
4, 4, 4, 4, 1, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 3, 4,
2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 6, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5,
3, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 1, 0, 0, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 6, 4, 4, 6, 4, 4, 4, 6, 4, 4, 4, 6, 4, 6, 4, 4, 4, 4, 4, 4,
4, 5, 3, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
Predicted Labels : tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
Since 4 (other) is the dominant class, my model always predicts 4.
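This is also why accuracy alone hides the problem: a model that always predicts 4 still scores well because most nodes are "other". The sketch below uses the first 24 true labels printed above together with all-4 predictions and compares plain accuracy against per-class recall. The helper `per_class_recall` is a hypothetical name, and averaging recall only over the classes present in the sample is an assumption.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    # Fraction of nodes of each true class that were predicted correctly.
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in sorted(support)}


# First 24 true labels from the output above; predictions are all "other" (4).
y_true = [1, 0, 0] + [4] * 20 + [6]
y_pred = [4] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print(f"accuracy     = {accuracy:.3f}")
print(f"recall       = {recall}")
print(f"macro recall = {macro_recall:.3f}")
```

Tracking a per-class metric like this during training makes it obvious when the minority classes are never predicted, even while accuracy looks high.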
Any input on how to improve my model and achieve this task would be highly appreciated. Thanks a ton for reading this long post and trying to help me.