From 2D to 3D using convolutional autoencoder

soshishimada · November 19, 2017, 3:23am

I’d like to reconstruct a 3D object from 2D images.
For that, I’m trying to use a convolutional autoencoder. However, in which layer should I lift the dimensionality from 2D to 3D?

I wrote the code below, but it raises this error: “RuntimeError: invalid argument 2: size ‘[1 x 1156 x 1156]’ is invalid for input of with 2312 elements at pytorch-src/torch/lib/TH/THStorage.c:41”

    import torch.nn as nn
    
    class dim_lifting(nn.Module):
        def __init__(self):
            super(dim_lifting, self).__init__()
            self.encode = nn.Sequential(
                nn.Conv2d(1, 34, kernel_size=5, padding=2),
                nn.MaxPool2d(2),
                nn.Conv2d(34, 16, kernel_size=5, padding=2),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 8, kernel_size=5, padding=2),
                nn.MaxPool2d(2),
                nn.LeakyReLU()
            )
    
            self.fc1 = nn.Linear(2312, 2312)
            self.decode = nn.Sequential(
                nn.ConvTranspose3d(1, 16, kernel_size=5, padding=2),
                nn.LeakyReLU(),
                nn.ConvTranspose3d(16, 32, kernel_size=5, padding=2),
                nn.LeakyReLU(),
                nn.MaxPool2d(2))
    
        def forward(self, x):
            out = self.encode(x)
            out = out.view(out.size(0), -1)
            out = self.fc1(out)
            out = out.view(1, 1156, 1156)
            out = self.decode(out)
            return out
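To see where the 2312 comes from, one option is to push a dummy input through the encoder and print each intermediate shape. This is a sketch: the 1×136×136 input size is an assumption not stated in the thread, inferred from 2312 = 8 × 17 × 17 (any side length from 136 to 143 pools down to 17 after three `MaxPool2d(2)` layers).

```python
import torch
import torch.nn as nn

# Same encoder layers as in the question.
encode = nn.Sequential(
    nn.Conv2d(1, 34, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.Conv2d(34, 16, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 8, kernel_size=5, padding=2),
    nn.MaxPool2d(2),
    nn.LeakyReLU(),
)

# Assumed input: batch of 1, one channel, 136x136 pixels.
x = torch.randn(1, 1, 136, 136)

out = x
for layer in encode:
    out = layer(out)
    # kernel_size=5 with padding=2 keeps H and W; each MaxPool2d(2) halves them
    print(type(layer).__name__, tuple(out.shape))

flat = out.view(out.size(0), -1)
print("flattened:", tuple(flat.shape))  # (1, 2312) = 8 * 17 * 17 features
```

Whatever you reshape the `fc1` output into afterwards has to contain exactly as many elements as `fc1` produces.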

ptrblck · November 20, 2017, 12:00pm

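You are trying to reshape your fc1 output, which has 2312 elements, to (1, 1156, 1156), which would need 1 * 1156 * 1156 = 1336336 elements. view() only reinterprets the existing elements, so the total number of elements has to stay the same.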
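A quick way to check this before calling `view()` is to compare element counts. In this sketch a random tensor stands in for the `fc1` output; the exact wording of the error message depends on the PyTorch version, so it is only printed here.

```python
import math

import torch

# Stand-in for the fc1 output: batch of 1 with 2312 features.
out = torch.randn(1, 2312)

target = (1, 1156, 1156)
needed = math.prod(target)
print(out.numel(), needed)  # 2312 vs 1336336

# view() cannot create or drop elements, so a mismatched count raises.
failed = False
try:
    out.view(*target)
except RuntimeError as err:
    failed = True
    print("view failed:", err)

# Any shape with the same element count is accepted, e.g. 2312 = 2 * 1156.
ok = out.view(1, 2, 1156)
print(tuple(ok.shape))
```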

You could change the output size of the linear layer to 1024:

    self.fc1 = nn.Linear(2312, 1024)

and reshape accordingly in the forward pass:

    out = out.view(out.size(0), 1, 32, 32)
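If the decoder keeps its 3D layers, note that for batched input `nn.ConvTranspose3d` expects a 5D tensor of shape (N, C, D, H, W), so the 1024 features need to be viewed as a volume rather than a 32×32 image, and the final `MaxPool2d` (which only accepts 3D or 4D input) needs its 3D counterpart `nn.MaxPool3d`. The sketch below makes those two changes; the 16×8×8 volume shape and the 136×136 input size are assumptions chosen for illustration, not something from the thread.

```python
import torch
import torch.nn as nn


class DimLifting(nn.Module):
    """Sketch of the question's model with a consistent 2D -> 3D reshape.

    Assumed (hypothetical) choices: 136x136 single-channel input, and the
    1024 fc1 features viewed as a 1-channel 16x8x8 volume.
    """

    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, 34, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(34, 16, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 8, kernel_size=5, padding=2),
            nn.MaxPool2d(2),
            nn.LeakyReLU(),
        )
        # 8 * 17 * 17 = 2312 encoder features -> 1024 = 16 * 8 * 8 voxels
        self.fc1 = nn.Linear(2312, 1024)
        self.decode = nn.Sequential(
            # stride 1, kernel 5, padding 2: spatial size is unchanged
            nn.ConvTranspose3d(1, 16, kernel_size=5, padding=2),
            nn.LeakyReLU(),
            nn.ConvTranspose3d(16, 32, kernel_size=5, padding=2),
            nn.LeakyReLU(),
            # MaxPool2d rejects 5D tensors; MaxPool3d pools D, H and W
            nn.MaxPool3d(2),
        )

    def forward(self, x):
        out = self.encode(x)                      # (N, 8, 17, 17)
        out = out.view(out.size(0), -1)           # (N, 2312)
        out = self.fc1(out)                       # (N, 1024)
        # This reshape is where the 2D features are lifted to a 3D volume
        out = out.view(out.size(0), 1, 16, 8, 8)  # (N, 1, 16, 8, 8)
        return self.decode(out)                   # (N, 32, 8, 4, 4)


model = DimLifting()
y = model(torch.randn(2, 1, 136, 136))
print(tuple(y.shape))  # (2, 32, 8, 4, 4)
```

In this design the dimensionality is lifted right after the fully connected layer: the linear layer decides how many values the volume gets, and `view()` decides how they are arranged as depth, height and width.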