How to make the network recognize the importance of position data

So the input to my network is a depth image (64 x 64) and the robot position (x, y, z). I combine them and feed them into fully connected layers. After training my model for 2000 episodes, it looks like my network is straight-up ignoring the position data. Maybe it is because the position data is only 3 x 1? How do I make sure my network recognizes the importance of my position data?

Any recommendations will be appreciated, thank you!

Hi and welcome! What type of network are you using? Can you show us how you actually combine the depth image with the robot position?

Yeah, so the depth image goes through three convolutional layers. Then I use
the view function to flatten the output of the convolutional layers and concatenate it with the position data. Here is my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingDoubleDQN(nn.Module):
    def __init__(self, num_frames, pos_state_size, action_size, seed, fc4_units=128, fc5_units=128):
        super(DuelingDoubleDQN, self).__init__()
        self.seed = torch.manual_seed(seed)
        self.action_size = action_size
        self.conv1 = nn.Conv2d(1, 32, kernel_size=8, stride=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc4 = nn.Linear(64 * 6 * 6 + pos_state_size, fc4_units)
        self.fc5 = nn.Linear(fc4_units, fc5_units)
        self.V = nn.Linear(fc5_units, 1)
        self.A = nn.Linear(fc5_units, action_size)

    def forward(self, pos_state, frame_state):
        c1 = F.relu(self.conv1(frame_state))
        c2 = F.relu(self.conv2(c1))
        c3 = F.relu(self.conv3(c2))
        c3 = c3.view(c3.size(0), -1)  # flatten conv output to (batch, 64*6*6)
        concate =, pos_state), 1)  # concatenate flattened image and pos data
        f4 = F.relu(self.fc4(concate))
        f5 = F.relu(self.fc5(f4))
        Value = self.V(f5)
        Advantage = self.A(f5)
        return Value, Advantage

Also, I am using a dueling double DQN for deep reinforcement learning.
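For context, since the network returns Value and Advantage separately, the two heads still have to be combined into Q-values somewhere outside the forward pass. A minimal sketch of the standard dueling aggregation (subtracting the mean advantage so V and A stay identifiable); `combine_dueling_heads` and the toy tensors are my own names, not the poster's code:

```python
import torch

def combine_dueling_heads(value, advantage):
    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    # Subtracting the mean advantage makes the V/A decomposition unique.
    return value + advantage - advantage.mean(dim=1, keepdim=True)

value = torch.zeros(4, 1)       # batch of 4 states, scalar V(s) each
advantage = torch.randn(4, 6)   # 6 discrete actions
q = combine_dueling_heads(value, advantage)
print(q.shape)  # torch.Size([4, 6])
```

Note that after the subtraction, the mean Q-value over actions equals V(s) for each state.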

I don’t see any problem in your code there. Maybe there is something wrong in the loss function? I am not super familiar with DQN, but could you show us your loss function?

Interesting. For the loss function I just follow the dueling double DQN algorithm. When I used only x, y, z, roll, pitch, yaw as input, everything worked properly, so I think my loss function is defined correctly. Maybe 2000 episodes is not enough when using images as input?
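For reference, the double-DQN target that the loss is built on can be sketched as below: the online network picks the greedy next action, and the target network evaluates it. The function name and toy tensors are assumptions for illustration, not the poster's actual training code:

```python
import torch

def double_dqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    # Online network selects the greedy action for the next state...
    best_actions = next_q_online.argmax(dim=1, keepdim=True)
    # ...and the target network evaluates that action.
    next_q = next_q_target.gather(1, best_actions)
    # Terminal transitions (done == 1) get no bootstrapped value.
    return rewards + gamma * next_q * (1 - dones)

rewards = torch.ones(4, 1)
next_q_online = torch.randn(4, 6)   # online-network Q-values for next states
next_q_target = torch.randn(4, 6)   # target-network Q-values for next states
dones = torch.zeros(4, 1)
target = double_dqn_target(rewards, next_q_online, next_q_target, dones)
print(target.shape)  # torch.Size([4, 1])
```

The loss is then typically the MSE (or Huber loss) between this target and the online network's Q-value for the action actually taken.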