Wrong demotions in CNN architecture

Nia3324 · January 10, 2024, 1:22am

Im currently trying to implement a CNN for a Alpha Zero game player, using pytorch but im getting an error regarding matrices multiplication. My input consists of 3 channels with 10x10 matrices.

model = Net(10, 10**2+1)
print(summary(model,(3,10,10)))

giving me the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-32-0a101a882eb1> in <cell line: 3>()
      1 model = Net(10, 10**2+1)
      2 
----> 3 print(summary(model,(3,10,10)))

9 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (40x10 and 2x101)

This is the current architecture:

def conv3x3(in_planes, out_planes):
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1)

def conv1x1(in_planes, out_planes):
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, padding=0)


class Net(nn.Module):
    def __init__(self, board_size, action_size, num_resBlocks=20, num_hidden=128):
        super().__init__()
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Initial convolution 
        self.startBlock = nn.Sequential(
            conv3x3(3, num_hidden),
            nn.BatchNorm2d(num_hidden),
            nn.ReLU()
        )
        
        # Loop of all 20 Residual Layers
        self.backBone = nn.ModuleList(
            [ResBlock(num_hidden) for i in range(num_resBlocks)]
        )
        
        
        # Outputs expected value of the state 
        self.valueHead = nn.Sequential(
            conv1x1(num_hidden, 1),
            nn.BatchNorm2d(1),
            nn.ReLU(),

            nn.Linear(in_features=1, out_features=num_hidden),
            nn.ReLU(),

            nn.Linear(in_features=num_hidden, out_features=1),
            nn.Tanh()
        )

    
        # Outputs the probabilities of each possible action 
        self.policyHead = nn.Sequential(
            conv1x1(num_hidden, 2),
            nn.BatchNorm2d(2),
            nn.ReLU(),
            nn.Linear(2, out_features=(action_size)),
            nn.Softmax(dim=1)
        )


        self.to(self.device)

    def forward(self, x):
        x = self.startBlock(x)

        for resBlock in self.backBone:
            x = resBlock(x)

        policy = self.policyHead(x)
        value = self.valueHead(x)

        return policy, value
    

class ResBlock(nn.Module):
    def __init__(self, num_hidden):
        super().__init__()
        self.conv1 = conv3x3(num_hidden, num_hidden)
        self.bn1 = nn.BatchNorm2d(num_hidden)
        self.conv2 = conv3x3(num_hidden, num_hidden)
        self.bn2 = nn.BatchNorm2d(num_hidden)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        # Skip connections
        out = self.relu(out + identity)

        return out

Thank you so much for your help, i really appreciate it!

J_Johnson · January 22, 2024, 1:48pm

Your error isn’t complete. But what we can tell is that your convolution layers probably aren’t the issue, because it’s getting all the way to the linear classifier, which is giving the error.

The error is a mismatch of sizes.

Nia3324:

self.policyHead = nn.Sequential(
            conv1x1(num_hidden, 2),
            nn.BatchNorm2d(2),
            nn.ReLU(),
            nn.Linear(2, out_features=(action_size)), #<--- here
            nn.Softmax(dim=1)
        )

It looks like the above is where the error is coming from. Your layer only takes 2 features, but the size of the data going through your model is (40, 10) by that point. You will need to either change that layer’s input size to match the data going in, or use additional convolutions to bring the size of the features down further.

On a side note, 2 features as input for that last linear layer is woefully insufficient, especially with an action space of 101. Have a look at AlphaZeros code again, especially at the output.