I’ve implemented a simple DDQN network in both PyTorch and TensorFlow. The network is quite shallow.
While the forward pass is much faster in PyTorch than in TF, the back-propagation step is much slower. Both backprop steps were run on the CPU.
Any ideas on how to improve it?
The network part is:
    def __init__(self, hidden_size_IP=100, hidden_size_rest=100, alpha=0.01,
                 state_size=27, action_size=8, learning_rate=1e-6):
        super().__init__()
        # build hidden layers
        self.l1 = nn.Sequential(nn.Linear(in_features=500, out_features=400),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l2 = nn.Sequential(nn.Linear(in_features=400, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l3 = nn.Sequential(nn.Linear(in_features=200, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        # build output layer
        self.Qval = nn.Linear(in_features=200, out_features=24)

    def forward(self, observation):
        if isinstance(observation, np.ndarray):
            observation = torch.from_numpy(observation).float()
        out1 = self.l1(observation)
        out2 = self.l2(out1)
        out3 = self.l3(out2)
        qval = self.Qval(out3)
        return qval
The backprop code is, for example:
    self.optimizer = optim.Adam(self.q_net.parameters(), lr=1e-4)
    label_batch = torch.rand([64, 500])
    # q_net is an instance of the network above
    Q = self.q_net(state_batch).gather(1, act_batch_torch)
    loss = mse_loss(input=Q, target=label_batch.detach())
    # standard update step
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
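To make the comparison reproducible, here is a minimal, self-contained sketch of how the forward and backward phases of one training step could be timed separately on the CPU. It is not the exact training code: the helper names `build_q_net` and `time_training_steps` are hypothetical, the batches are random placeholders, and the target is assumed to have shape `[64, 1]` so that it matches the gathered Q-values.

```python
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


def build_q_net(alpha=0.01):
    # Same layer sizes as the network above: 500 -> 400 -> 200 -> 200 -> 24.
    return nn.Sequential(
        nn.Linear(500, 400), nn.LeakyReLU(alpha),
        nn.Linear(400, 200), nn.LeakyReLU(alpha),
        nn.Linear(200, 200), nn.LeakyReLU(alpha),
        nn.Linear(200, 24),
    )


def time_training_steps(n_steps=20, batch_size=64):
    """Return average seconds spent in the forward and backward/update phases."""
    torch.manual_seed(0)
    q_net = build_q_net()
    optimizer = optim.Adam(q_net.parameters(), lr=1e-4)

    # Placeholder data (assumption): random states, actions and targets.
    state_batch = torch.rand(batch_size, 500)
    act_batch = torch.randint(0, 24, (batch_size, 1))
    target_batch = torch.rand(batch_size, 1)

    forward_time = 0.0
    backward_time = 0.0
    for _ in range(n_steps):
        t0 = time.perf_counter()
        q = q_net(state_batch).gather(1, act_batch)  # Q-value of the chosen action
        loss = F.mse_loss(q, target_batch)
        t1 = time.perf_counter()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        t2 = time.perf_counter()

        forward_time += t1 - t0
        backward_time += t2 - t1
    return forward_time / n_steps, backward_time / n_steps


if __name__ == "__main__":
    fwd, bwd = time_training_steps()
    print(f"forward: {fwd * 1e3:.3f} ms, backward + update: {bwd * 1e3:.3f} ms")
```

Averaging over several steps matters here because the first few iterations usually include one-time setup costs that would otherwise distort the numbers.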
Note that since inference is much faster on the CPU, I’m also doing backprop on the CPU. I tried moving the network to the GPU and doing backprop there, but it turned out to be slower.
Any ideas why PyTorch is slower? How can I improve the speed for this type of shallow network?
I’m using PyTorch 1.0.
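The GPU attempt can be sketched roughly as follows. This is an illustration under assumptions, not the original code: `time_step_on` is a hypothetical helper, the batches are random placeholders, and the script falls back to the CPU when CUDA is unavailable. Because CUDA kernels are launched asynchronously, the sketch calls `torch.cuda.synchronize()` before reading the clock, otherwise the measured GPU time may not reflect the work actually done.

```python
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


def time_step_on(device, n_steps=20, batch_size=64):
    """Average seconds per full training step on the given torch.device."""
    q_net = nn.Sequential(
        nn.Linear(500, 400), nn.LeakyReLU(0.01),
        nn.Linear(400, 200), nn.LeakyReLU(0.01),
        nn.Linear(200, 200), nn.LeakyReLU(0.01),
        nn.Linear(200, 24),
    ).to(device)
    optimizer = optim.Adam(q_net.parameters(), lr=1e-4)

    # Placeholder batch (assumption), created directly on the target device
    # so that no host-to-device copy is included in the timing.
    state_batch = torch.rand(batch_size, 500, device=device)
    act_batch = torch.randint(0, 24, (batch_size, 1), device=device)
    target_batch = torch.rand(batch_size, 1, device=device)

    def sync():
        # Wait for queued GPU work to finish; a no-op on the CPU.
        if device.type == "cuda":
            torch.cuda.synchronize()

    sync()
    start = time.perf_counter()
    for _ in range(n_steps):
        q = q_net(state_batch).gather(1, act_batch)
        loss = F.mse_loss(q, target_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    sync()
    return (time.perf_counter() - start) / n_steps


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device, time_step_on(device))
```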