I have a neural network with multiple output layers, each producing softmax probabilities. Which output layer is used depends on the state my agent is in. Here is an example of my network:
```python
import torch
import torch.nn as nn


class policy(nn.Module):
    def __init__(self):
        hidden_layer = 32
        super(policy, self).__init__()
        # Shared trunk
        self.affine1 = nn.Linear(3, hidden_layer)
        self.affine2 = nn.Linear(hidden_layer, hidden_layer)
        # One output layer per state, each with its own number of actions
        self.output1 = nn.Linear(hidden_layer, 10)
        self.output2 = nn.Linear(hidden_layer, 5)
        self.output3 = nn.Linear(hidden_layer, 3)

    def forward(self, x):
        x = torch.nn.functional.relu(self.affine1(x))
        x = torch.nn.functional.relu(self.affine2(x))
        outputprobs1 = torch.nn.functional.softmax(self.output1(x), dim=-1)
        outputprobs2 = torch.nn.functional.softmax(self.output2(x), dim=-1)
        outputprobs3 = torch.nn.functional.softmax(self.output3(x), dim=-1)
        return outputprobs1, outputprobs2, outputprobs3
```
Each softmax output gives the probabilities of the actions my agent can perform, but the available actions differ from state to state. Because I know which actions my agent should perform, I want to train the policy through supervised learning. I am planning to use torch.nn.CrossEntropyLoss(), as it is a multi-class classification problem. Moreover, during each episode, the model chooses from the output probabilities multiple times.
For example, let’s say that there are 3 states my agent can be in: A, B, and C. In state A, the agent uses output1; in state B, it uses output2; and in state C, it uses output3. As a result, one example episode might be:
- Agent starts in state A: chooses action 9
- Now agent is in state C: chooses action 2
- Now agent is in state B: chooses action 4
- Now agent is in state A: chooses action 5
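To make the episode above concrete, here is a minimal sketch of how I imagine turning it into per-head training targets. The names `STATE_TO_HEAD`, `episode`, and `group_targets_by_head` are made up for illustration, and I am assuming the actions are 0-indexed (so output1 covers actions 0 through 9):

```python
import torch

# Hypothetical mapping from state to the output head it uses (0-indexed heads).
STATE_TO_HEAD = {"A": 0, "B": 1, "C": 2}

# The example episode as (state, expert_action) pairs.
# Assumption: actions are 0-indexed within each head.
episode = [("A", 9), ("C", 2), ("B", 4), ("A", 5)]


def group_targets_by_head(steps, num_heads=3):
    """Return, per head, the time steps that used it and the target actions."""
    indices = [[] for _ in range(num_heads)]
    targets = [[] for _ in range(num_heads)]
    for t, (state, action) in enumerate(steps):
        head = STATE_TO_HEAD[state]
        indices[head].append(t)
        targets[head].append(action)
    # CrossEntropyLoss expects class-index targets of dtype long.
    return (
        [torch.tensor(i, dtype=torch.long) for i in indices],
        [torch.tensor(a, dtype=torch.long) for a in targets],
    )


step_indices, head_targets = group_targets_by_head(episode)
```

With this grouping, each head would get its own target tensor of shape (N,), where N is the number of steps in which that head was used.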
Here are my questions:
How would I train this policy through supervised learning? The documentation for CrossEntropyLoss says I need an input of shape (N, C), so my C here would be 10, 5, and 3 respectively. My thinking is that I would need 3 of these inputs, one for each output layer. Would I therefore need a separate loss function for each output layer, or can I use one loss function for all of them? And would backpropagating the loss for one of the output layers negatively affect the others?
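For reference, this is a rough sketch of the kind of training step I have in mind, not something I know to be the right approach. It assumes a variant of the network above that returns raw logits instead of softmax outputs (CrossEntropyLoss applies log-softmax internally), and it reuses a single criterion for all three heads by summing the per-head losses. `PolicyLogits`, `head_ids`, and the random batch are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyLogits(nn.Module):
    """Hypothetical variant of the network above that returns raw logits."""

    def __init__(self, hidden_layer=32):
        super().__init__()
        self.affine1 = nn.Linear(3, hidden_layer)
        self.affine2 = nn.Linear(hidden_layer, hidden_layer)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_layer, n) for n in (10, 5, 3)]
        )

    def forward(self, x):
        x = F.relu(self.affine1(x))
        x = F.relu(self.affine2(x))
        return [head(x) for head in self.heads]  # no softmax applied here


torch.manual_seed(0)
model = PolicyLogits()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Made-up batch: 4 observations, the head used at each step, the expert action.
obs = torch.randn(4, 3)
head_ids = torch.tensor([0, 2, 1, 0])
actions = torch.tensor([9, 2, 4, 5])

logits = model(obs)  # list of tensors with shapes (4, 10), (4, 5), (4, 3)
loss = torch.zeros(())
for h, head_logits in enumerate(logits):
    mask = head_ids == h
    if mask.any():
        # Only the rows where head h was actually used contribute to the loss.
        loss = loss + criterion(head_logits[mask], actions[mask])

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

As far as I understand, each head's loss term here only produces gradients for that head and for the shared affine1 and affine2 layers, which is why I am asking whether the heads can interfere with one another through the shared layers.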
Moreover, I would like more clarification on how to obtain the logits needed for the input to CrossEntropyLoss. In order to obtain this, would I use
Any help with this is much appreciated, thank you!