Getting and using output dimensions in a neural network in code (not by manual calculation)

I’m working on a VAE that can take a flexible-dimension input, so I’d like to store the pre-adaptive-pooling dimensions in a variable. This is so that I can use the dimension in the decoder to help get back to the original size. I’m struggling to find a place to create this variable correctly. This is some of my code:

class Vanilla_3DVAE(nn.Module):
    def __init__(self):
        super(Vanilla_3DVAE, self).__init__()
        # TODO: Parametrize the grid dimensions and channel values
        self.nn_1 = nn.Sequential(
            < layers >
            nn.AdaptiveAvgPool3d(1),  # 4 -> 1x1x1
            nn.BatchNorm3d(32))
        self.nn_2 = nn.Sequential(
            nn.Linear(32, 16),  # 32x1x1x1
            nn.ReLU())
        self.mu_fc = nn.Linear(16, 16)
        self.logvar_fc = nn.Linear(16, 16)

        self.decoder_nn = nn.Sequential(
            nn.ConvTranspose3d(16, 8, 4, 2, 0),
            nn.BatchNorm3d(8),
            nn.ConvTranspose3d(8, 16, 4, 2, 0),
            nn.BatchNorm3d(16),
            nn.ConvTranspose3d(16, 8, 4, 2, 1))

Following the suggestion of someone who helped, I separated the adaptive pooling layer from self.nn_1 and stored the pre-pooling shape in a variable called dim inside the encode() function, which forward() calls. My new forward() code is:

def forward(self, x):
    mu, logvar, dim = self.encode(x)
    z = self.reparametrize(mu, logvar)
    decoded = self.decode(z)
    return (decoded, z, mu, logvar)

My new issue is: how can I use dim in self.decoder_nn? __init__ cannot change, so I can only take self as a parameter. I assume this will cause issues when trying to use dim in self.decoder_nn.

Something else I thought of doing was:

self.nn_1 = nn.Sequential( ... )
self.pre_pool_shape = x.shape
self.nn_adaptive_pool = nn.Sequential( ... )
self.nn_2 = nn.Sequential( ... )

# self.mu_fc and self.logvar_fc layers

self.decoder_nn = nn.Sequential( ... <using variable pre_pool_shape> ... )

This ran into issues because no variable x exists in __init__. If I use this method, how do I declare self.pre_pool_shape?

Thank you so much 🙂

I would suggest creating a method which takes as input the desired shape and returns a Sequential layer.
You can call this in forward.
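Roughly something like this, as a sketch (make_decoder is just an illustrative name, and I’m reusing your layer arguments as placeholders):

import torch.nn as nn

def make_decoder(out_dim):
    # out_dim would be the spatial size recorded before the adaptive pooling
    return nn.Sequential(
        nn.ConvTranspose3d(16, 8, out_dim, 2, 0),
        nn.BatchNorm3d(8),
        nn.ConvTranspose3d(8, 16, 4, 2, 0),
        nn.BatchNorm3d(16),
        nn.ConvTranspose3d(16, 8, 4, 2, 1))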

I don’t have too much experience with NNs, so I apologize if I’m asking simple questions 🙁. Suppose the method called in forward() returns a Sequential layer. How can I use that with self.decoder_nn so that the decoder can actually use the dimension?

I wasn’t sure if tagging you, @ptrblck, is okay (let me know), but do you have an idea on how to approach the issue in the post? Thank you.

I wouldn’t recommend recreating new modules in forward, as they would be reinitialized in each forward pass and thus not trained.
I’m not sure what the actual use case is and where these shapes would be used. Are you trying to pass them to the transposed conv layers?

Thanks for replying :). Yes, I’d like to somehow pass the dimension back into self.decoder_nn = nn.Sequential( ... ) and use it to help get back to the original size. The code I have to get the dimension in encode():

class Vanilla_3DVAE(nn.Module):
    def __init__(self):
        super(Vanilla_3DVAE, self).__init__()
        self.nn_1 = nn.Sequential( ... )  # same as before, minus the last 2 layers, which moved below
        self.nn_adaptivePool = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),  # 4 -> 1x1x1
            nn.BatchNorm3d(32))
        self.flatten = nn.Flatten()  # flattens the pooled 32x1x1x1 output for the linear layers
        self.nn_2 = nn.Sequential( ... )  # same as before
        self.mu_fc = nn.Linear(16, 16)
        self.logvar_fc = nn.Linear(16, 16)

        self.decoder_nn = nn.Sequential( ... )  # same as before, but would like to pass the dimension here

    def encode(self, x):
        encoded = self.nn_1(x)
        prePoolShape = list(encoded.size())
        prePoolDim = prePoolShape[2]  # <<<<< dimension <<<<<<<<<
        # print(prePoolDim)
        encoded1 = self.nn_adaptivePool(encoded)
        encoded_flat = self.flatten(encoded1)
        encoded_flat = self.nn_2(encoded_flat)
        mu = self.mu_fc(encoded_flat)
        logvar = self.logvar_fc(encoded_flat)
        return mu, logvar

    def decode(self, z):
        z_4d = self.restore_dim(z)
        return self.decoder_nn(z_4d)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparametrize(mu, logvar)
        decoded = self.decode(z)
        return (decoded, z, mu, logvar)

I’m still unsure how prePoolDim would be used, but I would probably write custom modules to get the needed flexibility. If you want to reuse the sequential pattern, you could derive a custom nn.Sequential module and allow multiple input arguments, which would then be passed to the submodules.
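As a rough sketch of such a derived container (this assumes each submodule’s forward is written to accept the extra argument, e.g. via thin wrappers around the standard layers):

import torch.nn as nn

class SequentialWithArgs(nn.Sequential):
    # nn.Sequential variant that hands an extra argument to every submodule;
    # each submodule's forward has to accept this extra argument.
    def forward(self, x, extra):
        for module in self:
            x = module(x, extra)
        return x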

I was planning to make my decoder_nn like this:

self.decoder_nn = nn.Sequential(
    nn.ConvTranspose3d(16, 8, prePoolDim, 2, 0),  # <<<<< dimension <<<<<<<<<
    nn.BatchNorm3d(8),
    nn.ConvTranspose3d(8, 16, 4, 2, 0),
    nn.BatchNorm3d(16),
    nn.ConvTranspose3d(16, 8, 4, 2, 1))

The output dimensions from this first ConvTranspose3d layer should essentially be 8xDimxDimxDim.
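(Checking with the standard transposed-conv size formula, out = (in - 1) * stride - 2 * padding + kernel_size: with in = 1 after the adaptive pooling, stride = 2, padding = 0 and kernel_size = prePoolDim, this gives out = prePoolDim, so the spatial size would indeed come back as prePoolDim in each dimension, ignoring output_padding and dilation.)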

Since I’m only allowed self in def __init__(self), would the custom nn.Sequential module idea with multiple input args work? Also, as mentioned at the end of my first post, I had this idea but wasn’t sure if it would work:

self.nn_1 = nn.Sequential( ... )
self.prePoolShape = list(x.size())  # <<<<<<<<<<
self.prePoolDim = self.prePoolShape[2]  # <<<<<<<<
self.nn_adaptive_pool = nn.Sequential( ... )
self.nn_2 = nn.Sequential( ... )

# self.mu_fc and self.logvar_fc layers

self.decoder_nn = nn.Sequential( ... <using variable prePoolDim> ... )

Would this idea work? I had issues trying it because no variable x exists in __init__, and I wasn’t sure what to put in place of x.

No, this approach won’t work, as you are trying to use a dynamic variable (prePoolDim depends on the actual input size) as the kernel size.
The nn.ConvTranspose3d layer uses that value to initialize the kernel in the desired shape, so changing the kernel shape afterwards would only work if you directly manipulate the kernel in the forward method. To do so, however, you would need a strategy for how the additional filter values should be filled, or how the filter should be sliced.

Transposed conv layers accept an output_size argument to specify the desired spatial output size in case multiple values would be valid.
In case the desired output size is fixed, you could write a wrapper to set it, and then use this custom module in the nn.Sequential container.
However, since your target size is dynamic, I would avoid nn.Sequential because of its lack of flexibility.
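For the fixed-size case, such a wrapper could look roughly like this (FixedOutputConvTranspose3d is just an illustrative name):

import torch.nn as nn

class FixedOutputConvTranspose3d(nn.ConvTranspose3d):
    # Transposed conv that always resolves to a fixed spatial output size,
    # so it can still be used inside an nn.Sequential container.
    def __init__(self, *args, output_size=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.output_size = output_size

    def forward(self, x):
        return super().forward(x, output_size=self.output_size)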


Thanks for getting back! In my case, I don’t want my desired output size to be fixed; I’d like it to match the dimensions of the input, which is why I wanted to keep track of the dimension before the adaptive pooling layer.

Keeping the arguments of __init__() the same (just self), so without adding an argument for the input dimension or anything, what do you suggest is the best way to write a flexible VAE that outputs something with the same dimensions as the input? Sorry if this is a really general question.

I would claim it depends a bit on the expected input shapes.
I.e., if you use input shapes that are powers of 2, reduce the spatial size by a factor of 2, and later increase it by the same factor in the transposed conv layers, I don’t think you would run into a lot of issues.
However, for arbitrary input shapes the upsampling layers can produce ambiguous output shapes, so you could then pass the desired output shape as an argument to the forward method of the transposed conv layers.
To do so, I would remove the nn.Sequential container and use the layers manually in the forward of the main model, which would allow you to pass the output_size argument.
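A rough sketch of that approach (the names are illustrative, not taken from your code):

import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        # same layers as before, just not wrapped in nn.Sequential,
        # so output_size can be passed per call
        self.up1 = nn.ConvTranspose3d(16, 8, 4, 2, 0)
        self.bn1 = nn.BatchNorm3d(8)
        self.up2 = nn.ConvTranspose3d(8, 16, 4, 2, 0)
        self.bn2 = nn.BatchNorm3d(16)
        self.up3 = nn.ConvTranspose3d(16, 8, 4, 2, 1)

    def forward(self, z_5d, out_size):
        # out_size is the spatial size recorded before the adaptive pooling,
        # e.g. a (D, H, W) tuple taken from encoded.shape[2:]
        x = self.bn1(self.up1(z_5d))
        x = self.bn2(self.up2(x))
        # output_size only disambiguates between the sizes this layer can
        # produce for its stride, so the earlier layers still have to bring
        # the activation close to the target size
        return self.up3(x, output_size=out_size)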
