I want to create a simple GRU that counts the number of ones in a sequence of 0s and 1s. Examples:

input = 100110, output = 3

input = 101, output = 2

input = 10001110101, output = 6
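For clarity, the target is just the number of 1s in the string; a tiny helper (hypothetical, only to restate the examples) reproduces them:

```
# Count the 1s in a bit string; matches the examples above.
def count_ones(bits: str) -> int:
    return sum(int(b) for b in bits)

print(count_ones("100110"))       # 3
print(count_ones("101"))          # 2
print(count_ones("10001110101"))  # 6
```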

This is how I have defined my model:

```
import torch
import torch.nn as NN
import torch.optim as optim
import numpy as np

class Counter(NN.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Counter, self).__init__()
        self.embed = NN.Embedding(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.gru = NN.GRU(hidden_size, hidden_size, batch_first=True)
        self.linear = NN.Linear(hidden_size, output_size)

    def forward(self, inputs, hidden=None):
        embedded = self.embed(inputs)
        gru_out, gru_hid = self.gru(embedded)
        # take the GRU output at the last time step only
        final_out = self.linear(gru_out[:, -1])
        return final_out
```
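As a sanity check on the shapes (using the same layer sizes as Counter(2, 5, 1), but with standalone layers so the snippet runs on its own), a dummy batch of 4 sequences of length 10 gives a (4, 1) output:

```
import torch
import torch.nn as NN

# same layer sizes as Counter(2, 5, 1)
embed = NN.Embedding(2, 5)
gru = NN.GRU(5, 5, batch_first=True)
linear = NN.Linear(5, 1)

x = torch.randint(0, 2, (4, 10))   # dummy batch: 4 sequences of length 10
gru_out, _ = gru(embed(x))         # gru_out: (4, 10, 5)
out = linear(gru_out[:, -1])       # last time step -> (4, 1)
print(out.shape)
```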

Now I create an instance of this class with input_size = 2 (the tokens 0 and 1 that go into the embedding), hidden_size = 5, and output_size = 1 (the total count of 1s in the sequence).

The loss and optimizer are:

```
cc = Counter(2,5,1)
criterion = NN.MSELoss()
optimizer = optim.SGD(cc.parameters(),lr = 0.01)
```
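One detail I am unsure about: the model outputs shape (batch, 1) while my label list becomes shape (batch,), and as far as I understand MSELoss broadcasts mismatched shapes instead of comparing elementwise. A toy check (made-up numbers, not my training data):

```
import torch
import torch.nn as NN

crit = NN.MSELoss()
pred = torch.tensor([[1.0], [2.0]])   # shape (2, 1), like the model output
target = torch.tensor([1.0, 2.0])     # shape (2,), like the label list
# broadcasting turns this into a (2, 2) pairwise comparison
loss_broadcast = crit(pred, target)
# squeezing to matching shapes compares elementwise
loss_matched = crit(pred.squeeze(1), target)
print(loss_broadcast.item(), loss_matched.item())  # 0.5 vs 0.0
```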

I am generating my inputs and labels with this function:

```
def get_number(digits=10):
    ones = np.random.choice(digits)
    l = [1 if i < ones else 0 for i in range(digits)]
    l = np.random.permutation(l)
    return l, ones
```
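A quick check on this generator (repeating the function so the snippet is self-contained): the label always matches the sequence, but `ones` is drawn from 0..digits-1, so for digits = 10 the long-run mean label is 4.5:

```
import numpy as np

def get_number(digits=10):
    ones = np.random.choice(digits)          # 0..digits-1, never digits itself
    l = [1 if i < ones else 0 for i in range(digits)]
    l = np.random.permutation(l)
    return l, ones

np.random.seed(0)
samples = [get_number() for _ in range(1000)]
# every label equals the actual count of 1s in its sequence
assert all(seq.sum() == label for seq, label in samples)
mean_label = np.mean([label for _, label in samples])
print(mean_label)  # close to 4.5
```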

Now I simply iterate many times to train. I also tried a scheduler and a decaying learning rate, but I am not getting good results: all I get is a prediction close to the average number of ones.

```
digits = 10
for r in range(iterations):
    cc.zero_grad()
    inputs = []
    labels = []
    for b in range(batch_size):
        p, q = get_number(digits)
        inputs.append(p)
        labels.append(q * 1.0)
    outputs = cc(torch.tensor(inputs))
    loss = criterion(outputs, torch.tensor(labels))
    temp_loss.append(loss.data.numpy())
    loss.backward()
    optimizer.step()
```
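One variant I considered: making the label tensor explicitly (batch_size, 1) so it lines up with the (batch, 1) model output; a minimal sketch with made-up counts:

```
import torch

labels = [3.0, 2.0, 6.0]                      # per-sequence counts for a batch of 3
targets = torch.tensor(labels).unsqueeze(1)   # shape (3, 1), matching the model output
print(targets.shape)
```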

So here, since digits = 10, I get a model that predicts roughly the mean count (about 4.5) every time.

I guess it has something to do with the GRU and its hidden state, because that part is still unclear to me. Please help.