I created an autoencoder with skip connections whose blocks are as follows:
class ResidualDecoderDoublingBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        # Main path: a 2x2 transposed conv (grows each side by 1), then a
        # strided 2x2 transposed conv; overall this doubles the spatial size.
        self.block = nn.Sequential(
            convT2x2(self.in_channels, self.out_channels, stride=1),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
            nn.PReLU(self.out_channels),
            convT2x2(self.out_channels, self.out_channels, stride=2, padding=1),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
        )
        # Shortcut path: a single strided transposed conv that matches the
        # main path's output shape.
        self.shortcut = nn.Sequential(
            convT2x2(self.in_channels, self.out_channels, stride=2),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
        )
        self.activate = nn.PReLU(self.out_channels)

    def forward(self, x):
        residual = self.shortcut(x)
        x = self.block(x)
        x += residual
        return self.activate(x)
class ResidualEncoderHalvingBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.in_channels, self.out_channels = in_channels, out_channels
        # Main path: a 2x2 conv (shrinks each side by 1), then a strided
        # 2x2 conv; overall this halves the spatial size for even inputs.
        self.block = nn.Sequential(
            conv2x2(self.in_channels, self.out_channels, stride=1),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
            nn.PReLU(self.out_channels),
            conv2x2(self.out_channels, self.out_channels, stride=2, padding=1),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
        )
        # Shortcut path: a single strided conv that matches the main
        # path's output shape.
        self.shortcut = nn.Sequential(
            conv2x2(self.in_channels, self.out_channels, stride=2),
            nn.BatchNorm2d(self.out_channels, eps=1e-05, momentum=0.1, affine=True),
        )
        self.activate = nn.PReLU(self.out_channels)

    def forward(self, x):
        residual = self.shortcut(x)
        x = self.block(x)
        x += residual
        return self.activate(x)
where conv2x2 is Conv2d(kernel_size=2, bias=False) and convT2x2 is ConvTranspose2d(kernel_size=2, bias=False).
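In full, the helpers look like this (the stride and padding defaults here are my assumption; the description above only fixes kernel_size and bias):

import torch.nn as nn

def conv2x2(in_channels, out_channels, stride=1, padding=0):
    # 2x2 convolution without bias, per the description above;
    # the stride/padding defaults are assumed.
    return nn.Conv2d(in_channels, out_channels, kernel_size=2,
                     stride=stride, padding=padding, bias=False)

def convT2x2(in_channels, out_channels, stride=1, padding=0):
    # 2x2 transposed convolution without bias; same assumptions.
    return nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2,
                              stride=stride, padding=padding, bias=False)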
I train the model, built by chaining these blocks, for about 25,000 iterations with a minibatch size of 64. The latent size is [2048, 1, 1] per image, and the loss is MSE (PyTorch's built-in implementation). I use the Adam optimizer with a learning rate of 0.001. If my training set is small (~18,000 samples), the model overfits and I get crisp reconstructions, which is fine for now. However, if my training set is large (~260,000 samples), then after 90,000 iterations (not epochs) the output looks like this:
The input: [image]
The output: [image]
This is true for every output image: the left side is either crisp or only negligibly blurred, but the right side is blurred so heavily that it is unrecognizable (too much information loss). The reconstruction loss stops decreasing after about 60,000 iterations, and decreasing the learning rate tenfold did not help.
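For completeness, the training loop is essentially the following sketch (autoencoder and train_loader are placeholders for my actual model and data pipeline):

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholders: `autoencoder` chains the blocks above down to a
# [2048, 1, 1] latent; `train_loader` yields minibatches of 64 images.
autoencoder = autoencoder.to(device)
criterion = nn.MSELoss()  # PyTorch's built-in MSE loss
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=0.001)

for images, _ in train_loader:
    images = images.to(device)
    optimizer.zero_grad()
    reconstruction = autoencoder(images)
    loss = criterion(reconstruction, images)  # reconstruction loss
    loss.backward()
    optimizer.step()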
I don’t know whether this is due to an inherent design flaw of mine that only shows up when the dataset is large, or something else.
Any solutions, or theories as to why this happens?