GRU for creating a simple "ones" counter

I want to create a simple GRU that counts the number of ones in a sequence of 0s and 1s. Examples:
input = 100110, output = 3
input = 101, output = 2
input = 10001110101, output = 6

This is how I have built my model.

import torch
import torch.nn as NN
import torch.optim as optim
import numpy as np

class Counter(NN.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Counter, self).__init__()
        self.embed = NN.Embedding(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.gru = NN.GRU(hidden_size, hidden_size, batch_first=True)
        self.linear = NN.Linear(hidden_size, output_size)

    def forward(self, inputs):
        # inputs: (batch, seq_len) tensor of 0/1 token ids
        embedded = self.embed(inputs)            # (batch, seq_len, hidden_size)
        gru_out, gru_hid = self.gru(embedded)    # gru_out: (batch, seq_len, hidden_size)
        final_out = self.linear(gru_out[:, -1])  # last time step -> (batch, output_size)
        return final_out

Now I make an object of this class with input size = 2 (the digit 0 or 1, which goes into the embedding), hidden size = 5, and output size = 1 (the total count of 1s in the sequence).
The loss and optimizer are:

cc = Counter(2, 5, 1)
criterion = NN.MSELoss()
optimizer = optim.SGD(cc.parameters(), lr=0.01)
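
As a quick sanity check of the output shape (an illustrative snippet, not part of my training code):

probe = torch.tensor([[1, 0, 1, 1, 0]])  # one sequence of five digits
print(cc(probe).shape)                   # torch.Size([1, 1]): (batch, output_size)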

I am generating my inputs and outputs with this function:

def get_number(digits=10):
    # Draw a count of ones in [0, digits), then scatter that many
    # ones across a length-`digits` sequence.
    ones = np.random.choice(digits)
    l = [1 if i < ones else 0 for i in range(digits)]
    l = np.random.permutation(l)
    return l, ones
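
For example (the values below are illustrative, since the output is random), a call returns a permuted 0/1 array together with its count of ones; note that np.random.choice(digits) draws from 0 to digits - 1, so the count never reaches digits:

l, ones = get_number(10)
print(l, ones)          # e.g. [0 1 1 0 0 1 0 0 0 0] 3
assert l.sum() == ones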

Now I am simply iterating multiple times to train. I also tried a scheduler and a decaying learning rate, but I am not getting any good results. All I get is some value close to the average number of ones.

digits = 10
temp_loss = []
for r in range(iterations):
    cc.zero_grad()
    inputs = []
    labels = []
    for b in range(batch_size):
        p, q = get_number(digits)
        inputs.append(p)
        labels.append(float(q))
    outputs = cc(torch.tensor(np.array(inputs)))   # (batch_size, 1)
    loss = criterion(outputs, torch.tensor(labels))
    temp_loss.append(loss.item())
    loss.backward()
    optimizer.step()

So here, since digits = 10, I get a model which predicts some value close to 5 every time. :expressionless:
I guess it has something to do with the GRU and its hidden state, because that part is still unclear to me. Please help.

I tried using L1Loss, but I am getting similar results.

OK, I have to ask: why would you ever try to train an RNN to count the number of 1’s in a binary string? I actually don’t think this is a suitable task for a neural network. For one thing, your output is not bounded.

Apart from that, why do you have an embedding layer? Your inputs are already numeric values. Embeddings are mainly needed to map words to meaningful vector representations for NLP tasks.
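
For example, you could feed each digit directly as a one-dimensional float feature (a rough sketch, not tested against your setup):

import torch
import torch.nn as NN

# No Embedding: each 0/1 digit is a single float feature per time step.
gru = NN.GRU(input_size=1, hidden_size=5, batch_first=True)
linear = NN.Linear(5, 1)

seq = torch.tensor([[1., 0., 0., 1., 1., 0.]]).unsqueeze(-1)  # (1, 6, 1)
out, _ = gru(seq)
count_estimate = linear(out[:, -1])   # shape (1, 1)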

Why would you ever try to train an RNN to count the number of 1’s in a binary string?
I am doing this for my own learning. In theory, an RNN should be able to do this task; it is not impossible.
Apart from that, why do you have an embedding layer?
I have an embedding layer (which I don't think hurts performance) because, once I succeed at this task, I plan to make an "A" counter, or a counter for any other alphabet/character. I know we can build counters with just a regex, but an RNN should also be able to do this. I was able to make an RNN that counted the 1s when I fed it the string one element at a time (the hidden state holds the count so far, the input is the next value, 1 or 0, and the output is either count + 1 or count; the idea is sketched below). But that is an easy task for an RNN, since it only has to decide whether or not to add. I wanted to make an RNN which carries the count in its hidden state.
If you have any suggestion or solution, it would be of great help.
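
Roughly, the one-element-at-a-time scheme looks like this (an illustrative sketch of the idea, not the original code):

import torch
import torch.nn as NN

# The hidden state is carried across calls, one digit at a time.
gru = NN.GRU(input_size=1, hidden_size=5, batch_first=True)
linear = NN.Linear(5, 1)

hidden = torch.zeros(1, 1, 5)               # (num_layers, batch, hidden_size)
for digit in [1, 0, 0, 1, 1, 0]:
    inp = torch.tensor([[[float(digit)]]])  # (batch=1, seq=1, feature=1)
    _, hidden = gru(inp, hidden)
count = linear(hidden[-1])                  # read the running count off the hidden state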

I was able to do this perfectly. I was making a very basic mistake in calculating the loss. Instead of

loss = criterion(outputs, torch.tensor(labels))

I needed to use

loss = criterion(outputs[:, 0], torch.tensor(labels))

The problem is that outputs has shape (batch_size, 1) while the labels tensor has shape (batch_size,), so MSELoss broadcasts them to (batch_size, batch_size) and compares every prediction against every label; that loss is minimized by predicting the batch average, which is exactly the behaviour I was seeing. With the shapes matched, the model works perfectly. In fact, the GRU does it very well. Anyway, thanks for trying to help me.
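
To make the broadcasting issue concrete (illustrative numbers, not from a real run):

import torch
import torch.nn as NN

outputs = torch.tensor([[3.0], [5.0]])      # model output: shape (2, 1)
labels = torch.tensor([3.0, 5.0])           # targets: shape (2,)

# (2, 1) vs (2,) broadcasts to (2, 2): every prediction is compared
# against every label (newer PyTorch versions emit a warning here).
bad = NN.MSELoss()(outputs, labels)         # mean of [0, 4, 4, 0] = 2.0
good = NN.MSELoss()(outputs[:, 0], labels)  # shapes match: 0.0
print(bad.item(), good.item())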