Hey everyone, I’m having trouble training a module and I’m not sure what’s causing it. Whenever an epoch ends, the average loss seems to reset to where it was when training began, almost as if the module doesn’t retain the training it did during the previous epoch.
I’d really appreciate any help on this.
def run_train(self, epochs: int, cycles: int, input_size: int = 0):
    self.train()
    for epoch in range(epochs):
        for cycle in range(cycles + 1):
            self.cycle_count += 1
            random_index = random.randint(0, len(self.train_data) - input_size)
            tensor_input = torch.tensor(self.train_data.iloc[random_index:random_index + input_size].values, device=self.device).type(torch.float)
            result_compare = torch.tensor(self.train_data[self.prediction_list].iloc[random_index + input_size - 1:random_index + input_size].values, device=self.device).type(torch.float)
            pred = self.forward(tensor_input)
            loss = self.loss_function(pred, result_compare)
            self.running_loss += loss
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            if cycle % 2000 == 0:
                self.print_trainrun_stats()
                self.running_loss = 0
        self.epoch_count += 1
        self.cycle_count = 0
        torch.save(self.state_dict(), fr'D:\code\playing-with-stocks\data\created data\modules\E{epoch}_model_state.pth')
    self.epoch_count = 0
    self.cycle_count = 0
Here is the loss printed out during a run as well.
avarge_loss:  tensor(188016.0781, device='cuda:0', grad_fn=)
cycle_count:  1
epoch_count:  0
avarge_loss:  tensor(20398.9414, device='cuda:0', grad_fn=)
cycle_count:  2001
epoch_count:  0
avarge_loss:  tensor(12104.1943, device='cuda:0', grad_fn=)
cycle_count:  4001
epoch_count:  0
avarge_loss:  tensor(7121.3501, device='cuda:0', grad_fn=)
cycle_count:  6001
epoch_count:  0
avarge_loss:  tensor(5201.2925, device='cuda:0', grad_fn=)
cycle_count:  8001
epoch_count:  0
avarge_loss:  tensor(4557.2524, device='cuda:0', grad_fn=)
cycle_count:  10001
epoch_count:  0
avarge_loss:  tensor(24384.3477, device='cuda:0', grad_fn=)
cycle_count:  1
epoch_count:  1
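A note on reading the log above: the entries with cycle_count 1 are printed after a single training step, because cycle 0 satisfies cycle % 2000 == 0. Assuming print_trainrun_stats (its body is not shown) averages running_loss over the cycles since the last printout, that first value would be the loss of one random sample rather than an average over 2000 samples, so it may not be directly comparable to the other entries. The sketch below only illustrates that window-size effect under this assumption. RunningAverage and window_sizes are hypothetical names that are not part of the original code, and cycles=10000 is inferred from the cycle_count values in the log.

```python
class RunningAverage:
    """Hypothetical helper: averages plain-float losses between printouts."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        # Store a plain Python float only, so nothing but the number is kept.
        self.total += value
        self.count += 1

    def pop_average(self) -> float:
        """Return the mean of the values seen since the last call, then reset."""
        if self.count == 0:
            raise ValueError("no values recorded since the last printout")
        average = self.total / self.count
        self.total, self.count = 0.0, 0
        return average


def window_sizes(cycles: int, print_every: int) -> list:
    """Mirror the loop structure from the post and record, for each printout,
    (cycle_count, number of losses that went into the printed average)."""
    meter = RunningAverage()
    sizes = []
    for cycle in range(cycles + 1):      # same bounds as the posted loop
        meter.update(1.0)                # stand-in for one training step's loss
        if cycle % print_every == 0:     # same print condition as the post
            sizes.append((cycle + 1, meter.count))
            meter.pop_average()
    return sizes


# cycles=10000 is an assumption inferred from the log (cycle_count reaches 10001).
print(window_sizes(10000, 2000))
# [(1, 1), (2001, 2000), (4001, 2000), (6001, 2000), (8001, 2000), (10001, 2000)]
```

In the actual training loop, something like meter.update(loss.item()) could stand in for self.running_loss += loss. loss.item() returns a Python float, so the accumulated value would no longer be a tensor carrying a grad_fn. Skipping the printout at cycle 0, or evaluating on a fixed validation set at the end of each epoch, would give numbers that are easier to compare across epochs.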