Hello,

I am trying to implement a neural network architecture in which one network's output size depends on another network's output value. For example, suppose I have 2 parallel neural networks. Network 2 gives me some integer value, say 65; then network 1's output layer should be nn.Linear(hidden_dim, 65). Below is the diagrammatic representation of it.

Here is my code:

```python
class GaussianPolicy(nn.Module):

    def __init__(self, num_inputs, num_actions, hidden_dim, action_space=None,
                 is_recurrent=False, init_mu=0, init_std=1, min_std=1e-6):
        super(GaussianPolicy, self).__init__()

        self.recurrent = is_recurrent
        self.num_inputs = num_inputs
        self.num_actions = num_actions
        self.max_horizon = 100  # why 100? (maybe it is needed for exploration... not sure)
        self.min_horizon = 1
        self.init_mu = init_mu
        self.init_std = init_std
        self.hidden_dim = hidden_dim

        # Network 2 configuration
        self.horizon_linear1 = nn.Linear(num_inputs, hidden_dim)
        self.horizon_linear2 = nn.Linear(hidden_dim, hidden_dim)
        self.horizon_mean = nn.Linear(hidden_dim, 1)
        self.horizon_log_std = nn.Linear(hidden_dim, 1)
        # Using horizon_mean and horizon_log_std, I can sample a value from a Normal
        # distribution, scale it, and round it to get an integer value -- call this
        # "horizon". However, I only obtain the horizon value inside forward(), and
        # then I need to change the Network 1 configuration from
        #   self.mean_linear = nn.Linear(hidden_dim, num_actions)
        # to
        #   self.mean_linear = nn.Linear(hidden_dim, num_actions * horizon),
        # and similarly for self.log_std_linear. How can I do that?

        # Network 1 configuration
        self.linear1 = nn.Linear(num_inputs, hidden_dim)
        self.linear2 = nn.Linear(hidden_dim, hidden_dim)
        self.mean_linear = nn.Linear(hidden_dim, num_actions)
        self.log_std_linear = nn.Linear(hidden_dim, num_actions)

        self.apply(weights_init_)

        # action rescaling
        if action_space is None:
            self.action_scale = torch.tensor(1.)
            self.action_bias = torch.tensor(0.)
        else:
            self.action_scale = torch.FloatTensor(
                (action_space.high - action_space.low) / 2.)
            self.action_bias = torch.FloatTensor(
                (action_space.high + action_space.low) / 2.)

    def forward(self, state):
        """ horizon network """
        h_x = F.relu(self.horizon_linear1(state))
        h_x = F.relu(self.horizon_linear2(h_x))
        horizon_mean = self.horizon_mean(h_x)
        horizon_log_std = self.horizon_log_std(h_x)
        horizon_log_std = torch.clamp(horizon_log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
        horizon_std = horizon_log_std.exp()
        horizon_normal = Normal(horizon_mean, horizon_std)
        h_t = horizon_normal.rsample()
        horizon = h_t * self.max_horizon + self.min_horizon
        horizon = int(horizon.round().item())
        print("horizon value is:", horizon)

        # Re-defining the output layers with the horizon value;
        # I don't think this is right.
        self.mean_linear = nn.Linear(self.hidden_dim, self.num_actions * horizon)
        self.log_std_linear = nn.Linear(self.hidden_dim, self.num_actions * horizon)

        x = F.relu(self.linear1(state))
        x = F.relu(self.linear2(x))
        mean = self.mean_linear(x)
        log_std = self.log_std_linear(x)
        log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
        return mean, log_std, horizon
```

I thought I should use nn.Parameter, but I am still having difficulty changing network 1's output layer configuration. Please help me with this.
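For reference, this is a minimal, self-contained sketch of the nn.Parameter idea I was considering (the class name `DynamicHead` and the dimensions are just my own placeholders, not part of the real model): allocate weights once for the maximum output size `num_actions * max_horizon`, then slice off the first `num_actions * horizon` rows in each forward pass with `F.linear`, so the same parameters are reused and trained across different horizon values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicHead(nn.Module):
    """Output layer whose effective output size can change per forward call.

    Weights for the maximum possible output size (num_actions * max_horizon)
    are registered once as nn.Parameters; each call uses only the first
    num_actions * horizon rows, so gradients flow to the slice that was used.
    """
    def __init__(self, hidden_dim, num_actions, max_horizon):
        super().__init__()
        self.num_actions = num_actions
        max_out = num_actions * max_horizon
        self.weight = nn.Parameter(torch.randn(max_out, hidden_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x, horizon):
        out_dim = self.num_actions * horizon
        # F.linear applies only the sliced rows of the weight matrix.
        return F.linear(x, self.weight[:out_dim], self.bias[:out_dim])

head = DynamicHead(hidden_dim=32, num_actions=4, max_horizon=100)
x = torch.randn(2, 32)
print(head(x, horizon=65).shape)  # torch.Size([2, 260])
print(head(x, horizon=3).shape)   # torch.Size([2, 12])
```

I am not sure this is the right way, though, or whether re-creating nn.Linear layers inside forward() (as in my code above) is ever acceptable.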

Note: my question/explanation also appears as comments in the code.

Thanks,

Praneeth.