How to make network recognize the important of position data

So the input to my network is a depth image (64 x 64) and the robot position (x, y, z). I used torch.cat to combine them and feed them into fully connected layers. After I trained my model for 2000 episodes, it looks to me that my network is straight-up ignoring the position data. Maybe it is because the position data is only 3 x 1? How do I make sure my network recognizes the importantness of my position data?

Any recommendations will be appreciated, thank you!

Hi and welcome! What type of network are you using? Can you showing us how you actually combine the depth image with the robot position?

Yeah, so I have the depth image go through 3 layers of convolutional layers. Then I use
the view function to flatten the output from the convolutional layers. Then I used torch.cat to concatenate the flattened output and position data. Here is my code:

class DualingDoubleDQN(nn.Module):
    def __init__(self, num_frames, pos_state_size, action_size, seed, fc4_units=128, fc5_units=128):
        self.action_size = action_size
        super(DualingDoubleDQN, self).__init__()
        self.seed = torch.manual_seed(seed)
        self.conv1 = nn.Conv2d(1, 32, kernel_size=8, stride=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc4 = nn.Linear(64* 6* 6 + pos_state_size, fc4_units) 
        self.fc5 = nn.Linear(fc4_units, fc5_units)
        self.V = nn.Linear(fc5_units, 1)
        self.A = nn.Linear(fc5_units, action_size)

    def forward(self, pos_state, frame_state):
        c1 = F.relu(self.conv1(frame_state))       
        c2 = F.relu(self.conv2(c1))   
        c3 = F.relu(self.conv3(c2))
        c3 = c3.view(c3.size(0), -1)
        #print("c3 size after view: ", c3.size())
        concate = torch.cat((c3,pos_state),1) #concate flattened image and pos data
        f4 = F.relu(self.fc4(concate))
        f5 = F.relu(self.fc5(f4))
        Value = self.V(f5)
        Advantage = self.A(f5)
        return Value, Advantage

Also, I am doing a dueling double DQN for deep reinforcement learning.

I don’t see any problem in your code there. Maybe there is something wrong in the loss function? I am not super familiar with DQN but maybe show us your loss fun?

Interesting, for the loss function I just follow the dueling double dqn algorithm. And when I use only x,y,z,row,pitch,yaw as input everything was working properly, so I think my loss function is defined correctly. Maybe 2000 episodes is not enough for using images as input?