Loss increases, converges on a particular result

I’m trying to figure out how to run a tensor through a function, do some math inside that function, and compare the output to a predetermined “fitness goal” in the loss function. I started with a very simple example but have already run into issues.
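For reference, here is a minimal, fully differentiable version of the pattern I’m after (the function and names are just placeholders, not part of my actual code):

import torch
import torch.nn as nn

def score(x):
    # arbitrary differentiable math done entirely with torch ops
    return torch.sum(x * 2.0)

x = torch.rand(5, requires_grad=True)
goal = torch.tensor(7.0)
loss = nn.MSELoss()(score(x), goal)
loss.backward()   # x.grad gets populated, so this version is trainable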

This example is a bread shopping simulation, where every 3rd day bread goes on sale for 90% off. The output values determine what fraction of the remaining budget to spend on bread each day. Ideally the output should end up as [0, 0, 1, 0, 0, 1, 0, 0, 1] and so forth.
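For a 21-day input, that ideal pattern would look like this (just to illustrate the target behaviour; it isn’t used anywhere in the code below):

import torch

# ideal spending pattern: spend nothing except on every 3rd (discount) day
ideal = torch.tensor([1.0 if (d + 1) % 3 == 0 else 0.0 for d in range(21)])
print(ideal)   # tensor([0., 0., 1., 0., 0., 1., ...])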

import torch
import torch.nn as nn
import torch.optim as optim


def buybread(args):
    budget = 1000
    totalbread = 0
    days = range(len(args[0]))

    for i in days:
        # every 3rd day bread is on sale: $1 instead of the usual $10
        if (i + 1) % 3 == 0:
            cost = 1
        else:
            cost = 10

        # the network output for day i is the fraction of the remaining budget to spend
        spend = budget
        bought = int((args[0][i] * spend) / cost)
        totalbread += bought
        budget -= bought * cost

    argsum = torch.sum(args)
    fitness = torch.add(argsum, totalbread)
    return fitness

class TwoLayerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(21, 100)
        self.fc2 = nn.Linear(100, 21)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

m = TwoLayerNet()
loss_fn = nn.MSELoss()
optimizer = optim.Adam(m.parameters(), lr=0.01)

training_epochs = 30
fitnessgoal = torch.tensor(1000.0)
blanktensor = torch.zeros([1, 21])

for i in range(training_epochs):
    fpass = m(blanktensor)                  # forward pass: 21 spending fractions
    fitness = buybread(fpass)               # simulate the shopping run
    loss = loss_fn(fitness, fitnessgoal)    # compare against the fitness goal
    print(i, fitness, loss)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Ideally it should learn to buy bread only on the discount days, i.e. every third day, when a loaf costs a tenth of the usual price. Played perfectly, the optimal result is buying 1,000 loaves at $1 each.
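As a sanity check (again, just for illustration), feeding that ideal every-third-day pattern through buybread does give roughly the value I’d want as the goal:

ideal = torch.tensor([[1.0 if (d + 1) % 3 == 0 else 0.0 for d in range(21)]])
print(buybread(ideal))   # tensor(1007.) -- 1000 loaves bought on the first sale day, plus argsum of 7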

Instead, it converges on 129 loaves no matter what I set the fitness goal to. Often it buys 260 or more loaves on the first epoch purely by virtue of the random initialization, but it usually settles on 129 within ten epochs.

I assume that what’s actually happening is that the variable totalbread is being disregarded entirely, and PyTorch is optimizing for the highest possible value of argsum. When I add print(fpass) to the loop, I generally get something like this for the last epoch:

tensor([[0.9997, 0.9997, 0.9996, 0.9996, 0.9996, 0.9997, 0.9997, 0.9997, 0.9997,
     0.9997, 0.9997, 0.9996, 0.9996, 0.9996, 0.9996, 0.9996, 0.9996, 0.9996,
     0.9996, 0.9996, 0.9996]], grad_fn=<SigmoidBackward>)

So it’s trying to spend almost the maximum allowed budget every day, including day 0, where most of the budget gets used up at full price.
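Part of this may be the int() cast: it returns a plain Python number, so totalbread never enters the autograd graph and only argsum can carry gradients. A quick standalone check of what I mean:

import torch

x = torch.tensor(2.5, requires_grad=True)
bought = int((x * 100) / 10)    # plain Python int (25) -- the graph stops here
total = torch.sum(x) + bought   # 'bought' is just a constant as far as autograd is concerned
total.backward()
print(type(bought), x.grad)     # <class 'int'> tensor(1.) -- only the torch.sum part gets a gradient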

I had previously tried simply setting fitness = totalbread, but that ran into all sorts of problems with totalbread being the wrong type of variable, and eventually with the loss not changing at all.
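If I understand the type problem correctly, it’s that totalbread ends up as a plain Python int, so MSELoss either rejects it outright or, once it is wrapped in a tensor, there is no graph left for backward() to use. A minimal illustration (not my exact earlier code):

import torch
import torch.nn as nn

totalbread = 129                        # plain Python int coming out of the loop
goal = torch.tensor(1000.0)
loss = nn.MSELoss()(torch.tensor(float(totalbread)), goal)
print(loss.requires_grad)               # False -- no graph, so backward() errors and nothing can learn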