Batch size-dimensions mismatch

I have a PyTorch class. I am processing a tensor of shape [(256, 256, 256, 1)]. While building the model, this 4D shape is what everything is written around. But during training, when the batch size gets added, how do operations like permute() work? Wouldn't it cause a mismatch in the tensors? Also, if you take a look at this class, a 4D tensor is what gets maintained until the end, but at the end point a 5D tensor is expected. Can somebody explain how the batch size works in model building and in model training?
class VoxelMorph1(nn.Module):
def __init__(self, input_shape=(32, 32, 1), optimizer='adam', loss=None,
             metrics=None, loss_weights=None):
    super(VoxelMorph1, self).__init__()

    in_channels = 1
    out_channels = 3
    input_shape = input_shape + (in_channels,)
    
    self.moving = nn.Parameter(torch.randn(input_shape), requires_grad=True)
    self.static = nn.Parameter(torch.randn(input_shape), requires_grad=True)
    
    self.static = nn.Parameter(self.static.unsqueeze(0), requires_grad=True)
    self.moving = nn.Parameter(self.moving.unsqueeze(0), requires_grad=True)

  

    x_in = torch.cat([self.static, self.moving], dim=-1)
    x_in = x_in.permute(3, 0,1,2)
    # encoder
    x1 = nn.Conv3d(in_channels=2, out_channels=16, kernel_size=3, stride=2, padding=1)
    
    x1 = nn.LeakyReLU(negative_slope=0.2)(x1(x_in))  # 16
    print("x1",x1.shape)

    x2 = nn.Conv3d(in_channels=16, out_channels=32, kernel_size=3, stride=2, padding=1)
    x2 = nn.LeakyReLU(negative_slope=0.2)(x2(x1))  # 8
    x3 = nn.Conv3d(in_channels=32, out_channels=32, kernel_size=3, stride=2, padding=1)
    x3 = nn.LeakyReLU(negative_slope=0.2)(x3(x2))  # 4

    x4 = nn.Conv3d(in_channels=32, out_channels=32, kernel_size=3, stride=2, padding=1)
    x4 = nn.LeakyReLU(negative_slope=0.2)(x4(x3))  # 2
    #x4 = x4.permute(3, 0,1,2)

    # decoder [32, 32, 32, 32, 8, 8]
    x = nn.Conv3d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1)
    x = nn.LeakyReLU(negative_slope=0.2)(x(x4))
    x= torch.unsqueeze(x, dim=1)
    x = nn.Upsample(scale_factor=2, mode='nearest')(x)
    x= torch.squeeze(x, dim=1)
    x3 = torch.squeeze(x3, dim=1)
    #x = x.permute(0, 2,3,1)
    xd1 = torch.cat([x, x3], dim=0)  # 4
    
    x = nn.Conv3d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1)
    
    x = nn.LeakyReLU(negative_slope=0.2)(x(xd1))
    x= torch.unsqueeze(x, dim=1)
    x = nn.Upsample(scale_factor=2, mode='nearest')(x)  # 8
    x= torch.squeeze(x, dim=1)
    xd2 = torch.cat([x, x2], dim=0)  # 8
    x = nn.Conv3d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1)
    x = nn.LeakyReLU(negative_slope=0.2)(x(xd2))
    x= torch.unsqueeze(x, dim=1)
    x = nn.Upsample(scale_factor=2, mode='nearest')(x)  # 16
    x= torch.squeeze(x, dim=1)
    xd3 = torch.cat([x, x1], dim=0)  # 16
    x = nn.Conv3d(in_channels=48, out_channels=32, kernel_size=3, stride=1, padding=1)
    xd4 = nn.LeakyReLU(negative_slope=0.2)(x(xd3))
    x = nn.Conv3d(in_channels=32, out_channels=8, kernel_size=3, stride=1, padding=1)
    x = nn.LeakyReLU(negative_slope=0.2)(x(xd4))  # 16

    x= torch.unsqueeze(x, dim=1)
    x = nn.Upsample(scale_factor=2, mode='nearest')(x)  # 32
    x= torch.squeeze(x, dim=1)

    xd5 = torch.cat([x, x_in], dim=0)
    x = nn.Conv3d(in_channels=10, out_channels=8, kernel_size=3, stride=1, padding=1)
    x = nn.LeakyReLU(negative_slope=0.2)(x(xd5))  # 32

    #torch.nn.init.normal_(nn.conv3d.weight, mean=0.0, std=1e-5)
    con3d = nn.Conv3d(in_channels=8, out_channels=out_channels, kernel_size=3, stride=1,
                            padding=1, bias=True)
    torch.nn.init.normal_(con3d.weight, mean=0.0, std=1e-5)
    deformation = con3d(x)
    print("deformation",deformation.shape)
    nb, nd, nh, nw, nc = deformation.shape

Hi Devipriya,

You had written “But when training, during which the batch_size gets added” - what do you mean?

I am not sure, but I think you may be assuming that the batch dimension is treated specially by pytorch. It isn't. Each pytorch operation has its own logic for how it treats all of the dimensions of its input tensors. The batch dimension is often part of those operations, but pytorch never adds a dimension to any tensor automatically.
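For example, nn.Conv3d expects its input already laid out as (N, C, D, H, W). If you only have a single volume, you add the batch dimension yourself, e.g. with unsqueeze(0). A minimal sketch of my own (not your model):

import torch as t
import torch.nn as nn

conv = nn.Conv3d(in_channels=2, out_channels=16, kernel_size=3, stride=2, padding=1)

vol = t.randn(2, 32, 32, 32)   # a single volume: (C, D, H, W), no batch dimension
batched = vol.unsqueeze(0)     # you add the batch dimension yourself: (1, C, D, H, W)
out = conv(batched)
print(out.shape)               # torch.Size([1, 16, 16, 16, 16])

In practice a DataLoader does this for you by stacking several volumes into one batched tensor.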

If you can simplify your code example to maybe just 5-6 lines isolating just the phenomenon you observe and explain what you expected instead, that would be helpful.

I am converting code from TF to PyTorch. My input's size is [(256, 256, 256, 1)], which is 4D. But after encoding and decoding, the size of the deformation (at the end of the code) is expected to be 5D. When I asked the author about it, he explained that in TF, Conv3D adds the batch_size automatically during training. I couldn't get a grasp of that in PyTorch: should I add the batch size while building the model or not? If not, then I can't build the model, as there are many dimension mismatches, i.e. I can't get the deformation size to be 5D. All I am getting is 4D and I need 5D.

It’s still hard for me to understand what you are doing from the code example, sorry. I’m confused by a few things.

    self.moving = nn.Parameter(torch.randn(input_shape), requires_grad=True)
    self.static = nn.Parameter(torch.randn(input_shape), requires_grad=True)
  
    # you should not be instantiating an nn.Parameter with another parameter
    # but it looks like you are thinking to "add a batch dimension to this parameter"?
    # parameters don't need batch dimensions - only data need batch dimensions.  
    # also, what kind of operation do these parameters belong to?
    self.static = nn.Parameter(self.static.unsqueeze(0), requires_grad=True)
    self.moving = nn.Parameter(self.moving.unsqueeze(0), requires_grad=True)

The typical idiom in pytorch is to instantiate function objects, as you have done. The nn.Conv3d constructor returns a function object; when you call the constructor, it instantiates its own parameters (the filter weights and biases). Then you can call the op on an input:

import torch as t
import torch.nn as nn

op = nn.Conv3d(in_channels=2, out_channels=16, kernel_size=3, stride=2, padding=1)
batch, in_channel, spatial_dims = 10, 2, (30, 40, 50)
input = t.randn(batch, in_channel, *spatial_dims)  # input.shape = [10, 2, 30, 40, 50]
out = op(input)  # call the op on the input
# out.shape = [10, 16, 15, 20, 25], batch=10, out_channel=16, out_spatial = (15, 20, 25)

Just to add to the bigger picture: When you want to create a network:

  1. define a new class derived from nn.Module
  2. instantiate all of the ops in the __init__ method
  3. write a forward(self, input) method which calls each one sequentially

class MyNetwork(nn.Module):
    def __init__(self):
        super().__init__()  # needed so the module registers its submodules and parameters
        self.conv1 = nn.Conv3d(...)
        self.leakyrelu1 = nn.LeakyReLU(...)
        self.conv2 = nn.Conv3d(...)
        ...

    def forward(self, input):
        x = self.conv1(input)
        x = self.leakyrelu1(x)
        x = self.conv2(x)
        x = ...
        return x

If the graph structure is just a linear chain, as it seems to be for your case, you can use nn.Sequential to implement this forward function automatically (see the sketch below), although it might be fun to implement it yourself first.
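A minimal sketch of what that could look like (the layer sizes here are only illustrative, not the ones from your model):

import torch as t
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv3d(2, 16, kernel_size=3, stride=2, padding=1),
    nn.LeakyReLU(negative_slope=0.2),
    nn.Conv3d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.LeakyReLU(negative_slope=0.2),
)

out = encoder(t.randn(4, 2, 32, 32, 32))  # input is (N, C, D, H, W)
print(out.shape)                          # torch.Size([4, 32, 8, 8, 8])

Note that if you keep the skip connections (the torch.cat calls in your decoder), a plain nn.Sequential is no longer enough and you will want the explicit forward method.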

Then, you can run your network as:


def main():
    my_net = MyNetwork()
    input = t.randn(batch, in_channel, *spatial_dims)
    out = my_net(input)

And as I mentioned, all of the ops (self.conv1, self.leakyrelu1, etc.) contain their own parameters, and parameters never have batch dimensions. When they are called, torch automatically broadcasts the parameters across the batch dimension of the input and performs the identical computation on each element of the batch.
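To make that concrete, here is a small sketch of my own showing that a conv's parameter shapes are fixed, independent of the batch size of the data you feed it:

import torch as t
import torch.nn as nn

conv = nn.Conv3d(in_channels=2, out_channels=16, kernel_size=3, padding=1)
print(conv.weight.shape)  # torch.Size([16, 2, 3, 3, 3]) -- no batch dimension
print(conv.bias.shape)    # torch.Size([16])

# The same weights are applied to every element of the batch:
print(conv(t.randn(1, 2, 8, 8, 8)).shape)   # torch.Size([1, 16, 8, 8, 8])
print(conv(t.randn(10, 2, 8, 8, 8)).shape)  # torch.Size([10, 16, 8, 8, 8])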