I want to create a simple GRU which counts the number of ones in a sequence of 0s and 1s. For example:
input = 100110, output = 3
input = 101, output = 2
input = 10001110101, output = 6
This is how I have defined my model:

```
import numpy as np
import torch
import torch.nn as NN
import torch.optim as optim

class Counter(NN.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Counter, self).__init__()
        self.embed = NN.Embedding(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.gru = NN.GRU(hidden_size, hidden_size, batch_first=True)
        self.linear = NN.Linear(hidden_size, output_size)

    def forward(self, inputs, hidden=0):
        embedded = self.embed(inputs)
        gru_out, gru_hid = self.gru(embedded)
        final_out = self.linear(gru_out[:, -1])
        return final_out
```
Now I create an instance of this class with input_size = 2 (0 or 1, which goes into the embedding), hidden_size = 5, and output_size = 1 (the total count of 1s in the sequence).
The loss and optimizer are:

```
cc = Counter(2, 5, 1)
criterion = NN.MSELoss()
optimizer = optim.SGD(cc.parameters(), lr=0.01)
```
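As a quick sanity check of the architecture, the shapes flowing through each layer can be traced with standalone modules mirroring the sizes above (this is an illustrative sketch, not part of the training code; the batch size 4 and sequence length 10 are arbitrary):

```python
import torch
import torch.nn as NN

# Mirrors the sizes used above: input_size=2, hidden_size=5, output_size=1
embed = NN.Embedding(2, 5)
gru = NN.GRU(5, 5, batch_first=True)
linear = NN.Linear(5, 1)

x = torch.randint(0, 2, (4, 10))   # batch of 4 sequences, 10 digits each
e = embed(x)                       # (4, 10, 5): one 5-dim vector per digit
gru_out, gru_hid = gru(e)          # gru_out: (4, 10, 5), gru_hid: (1, 4, 5)
out = linear(gru_out[:, -1])       # (4, 1): last time step -> one scalar per sequence
print(out.shape)                   # torch.Size([4, 1])
```

Note that the model's prediction for each sequence ends up with shape `(batch, 1)`, not `(batch,)`.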
I am generating my inputs and outputs with this function:

```
def get_number(digits=10):
    ones = np.random.choice(digits)
    l = [1 if i < ones else 0 for i in range(digits)]
    l = np.random.permutation(l)
    return l, ones
```
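To confirm the generator behaves as intended, here is a small self-contained check (the function is reproduced verbatim so the snippet runs on its own; the seed is only there to make the example repeatable):

```python
import numpy as np

np.random.seed(0)  # only so the example is repeatable

def get_number(digits=10):
    ones = np.random.choice(digits)  # integer in [0, digits)
    l = [1 if i < ones else 0 for i in range(digits)]
    l = np.random.permutation(l)
    return l, ones

seq, count = get_number(10)
# The label always matches the sequence's actual content:
print(int(seq.sum()) == count)  # True
```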
Now I simply iterate multiple times to train. I have also tried a scheduler and a decaying learning rate, but I am not getting any good results: all I get is some value close to the average number of ones.
```
digits = 10
temp_loss = []
for r in range(iterations):
    cc.zero_grad()
    inputs = []
    labels = []
    for b in range(batch_size):
        p, q = get_number(digits)
        inputs.append(p)
        labels.append(q * 1.0)
    outputs = cc(torch.tensor(inputs))
    loss = criterion(outputs, torch.tensor(labels))
    temp_loss.append(loss.data.numpy())
    loss.backward()
    optimizer.step()
```
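One thing worth double-checking in a loop like this (a possible contributor, not necessarily the whole story): `outputs` has shape `(batch_size, 1)` while `torch.tensor(labels)` has shape `(batch_size,)`. `MSELoss` broadcasts mismatched shapes, comparing every prediction against every label, which is exactly the kind of objective that is minimized by predicting the batch mean. A minimal sketch of the effect, with hand-picked numbers:

```python
import torch
import torch.nn as NN

criterion = NN.MSELoss()
outputs = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output
labels = torch.tensor([1.0, 2.0, 3.0])         # shape (3,), like the label list

# Broadcasting turns this into a (3, 3) all-pairs comparison, not elementwise
mismatched = criterion(outputs, labels)
matched = criterion(outputs.squeeze(1), labels)
print(mismatched.item(), matched.item())       # roughly 1.333 vs 0.0
```

Recent PyTorch versions emit a UserWarning about the size mismatch in this situation, so it is worth watching the console output during training.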
So here, since digits = 10, I get a model that predicts some value close to 5 every time.
I guess it has something to do with the GRU's hidden state, because that part is still unclear to me. Please help.