Please help me edit generator and discriminator for new size

Hi all!

I’m trying to edit a generator so it accepts differently shaped images. At the moment it works for 8 x 8 images, and I want it to accept 8 x 1 “images” instead. I’ve already tried changing the kernel size, stride, padding, etc., but I keep getting

RuntimeError: Input type (torch.cuda.DoubleTensor) and weight type (torch.cuda.FloatTensor) should be the same

Here is the generator:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 12 12:46:54 2022

@author: leon
"""

import torch
import torch.nn as nn
import torch.optim as optim


""
class Downscale(nn.Module):
    def __init__(self, in_size, out_size, normalize = True, dropout = 0.0):
        super(Downscale, self).__init__()
        
        model = [nn.Conv2d(
            in_size,
            out_size,
            kernel_size = 4,
            stride = 2,
            padding = 1,
            bias = False
            )]
        
        if normalize:
            model.append(
                nn.BatchNorm2d(out_size, 0.8)
                )
            
        model.append(
            nn.LeakyReLU(0.2)
            )
        
        if dropout:
            
            model.append(
                nn.Dropout(dropout)
                )
        
        self.model = nn.Sequential(*model)
        
    def forward(self, x):
        return self.model(x)
##############################################################################       
# #############################################################################   
class Upscale(nn.Module):
    def __init__(self, in_size, out_size, dropout = 0.0):
        super(Upscale, self).__init__()
        
        model =[
            nn.ConvTranspose2d(
                in_size,
                out_size,
                kernel_size = 4,
                stride = 2,
                padding = 1,
                bias = False
                ),
            nn.BatchNorm2d(out_size, 0.8),
            nn.ReLU(inplace = True)
            ]
        
        if dropout:
            model.append(
                nn.Dropout(dropout)
                )
            
        self.model = nn.Sequential(*model)
        
    def forward(self, x, skip_input):
        x = self.model(x)
        out = torch.cat((x, skip_input), dim = 1)
        return out
##############################################################################       
# #############################################################################   

class Generator(nn.Module):
    def __init__(self, features_g, num_channels):
        super(Generator, self).__init__()
        self.features_g = features_g
        self.num_channels = num_channels
        self.build()
        
        
    def build(self):
        # input: 1 x 8 x 8   
        self.down4 = Downscale(
            in_size = self.num_channels,
            out_size = self.features_g,
            dropout = 0.5
            )
        
        # input: 8 x 4 x 4
        self.down5 = Downscale(
            in_size = self.features_g,
            out_size = self.features_g * 2,
            dropout = 0.5
            )
        
        # input: 18 x 2 x 2
        self.down6 = Downscale(
            in_size = (self.features_g * 2  + self.num_channels),
            out_size = self.features_g * 4,
            dropout = 0.5
            )
        ## state: 32 x 1 x 1  ##
        
        
        
        
        
        # input: 32 x 1 x 1
        self.up1 = Upscale(
            in_size = self.features_g * 4,
            out_size = self.features_g * 2,
            dropout = 0.5
            )

        # input: 16 x 2 x 2
        self.up2 = Upscale(
            in_size = (self.features_g * 4 + self.num_channels),
            out_size = self.features_g * 1, 
            dropout = 0.5
            )
        

        ## state: 8 X 4 X 4 ##
        
        final = [
            nn.Upsample(scale_factor = 2),
            
            # input: 8 X 8 X 8
            
            nn.Conv2d(
                in_channels = self.features_g * 2, 
                out_channels = self.num_channels,
                kernel_size = 3,
                stride = 1,
                padding = 1
                ),
            
            # input: 1 X 8 X 8
            
            #nn.Tanh()
            nn.Sigmoid()
            ]
         
        self.final = nn.Sequential(*final)
            
    def forward(self, input, constraint_map):
        
       
        d4 = self.down4(input)
        d5 = self.down5(d4)
        d5 = torch.cat((d5, constraint_map), dim = 1)
        d6 = self.down6(d5)
        u1 = self.up1(d6, d5)
        u2 = self.up2(u1, d4)
        
        return self.final(u2)

    def define_optim(self, learning_rate, beta1):
        self.optimizer = optim.Adam(self.parameters(), lr = learning_rate, betas = (beta1, 0.999))
     
    @staticmethod    
    def init_weights(layers):
        classname = layers.__class__.__name__
        
        if classname.find('Conv') != -1:
            nn.init.normal_(layers.weight.data, 0.0, 0.02)
            
        elif classname.find('BatchNorm') != -1:
            nn.init.normal_(layers.weight.data, 1.0, 0.02)
            nn.init.constant_(layers.bias.data, 0)
         

    

and the discriminator:

class Discriminator(nn.Module):
    def __init__(self, latent_vector_size, features_d, num_channels):
        super(Discriminator, self).__init__()
        
        self.latent_vector_size = latent_vector_size
        self.features_d = features_d
        self.num_channels = 1 #num_channels 
        self.optimizer = None
        self.main = None
    
    @staticmethod    
    def discriminator_block(in_filters, out_filters, stride, normalize):
        
        layers = [
            nn.Conv2d(
                in_channels = in_filters,
                out_channels = out_filters,
                kernel_size = 3,
                stride = stride,
                padding = 1                
                )
            ]
        if normalize:
            layers.append(
                nn.InstanceNorm2d(out_filters)
                )
        
        layers.append(
            nn.LeakyReLU(0.2, inplace = True)
            )
        
        return layers
        
        
    def build(self):
        
        layers = []
        in_filters = self.num_channels
        
        for out_filters, stride, normalize in [(self.features_d, 2, False)]:#, (self.features_d * 2, 2, True),
                                               #(self.features_d * 4, 2, True), (self.features_d * 8, 1, True)]:
        
            layers.extend(self.discriminator_block(
                in_filters,
                out_filters,
                stride,
                normalize
                ))
            
            in_filters = out_filters
        
        layers.append(
            nn.Conv2d(
                in_channels = out_filters,
                out_channels = 1,
                kernel_size = 3,
                stride = 1,
                padding = 1
                )
            )
        self.main = nn.Sequential(*layers)
        
    def forward(self, input):
        return self.main(input)
    
    def define_optim(self, learning_rate, beta1):
        self.optimizer = optim.Adam(self.parameters(), lr = learning_rate, betas = (beta1, 0.999))
    
    @staticmethod   
    def init_weights(layers):
        classname = layers.__class__.__name__
        if classname.find('Conv') != -1:
            nn.init.normal_(layers.weight.data, 0.0, 0.02)
            
        elif classname.find('BatchNorm') != -1:
            nn.init.normal_(layers.weight.data, 1.0, 0.02)
            nn.init.constant_(layers.bias.data, 0)

I know it’s a lot to ask, but I’m utterly stuck and don’t know how to proceed.
Any help is greatly appreciated :slight_smile:

The error message points to a dtype mismatch, so make sure the model and the inputs use the same dtype by converting one or the other.
The common approach is to use the default float32 dtype, so cast your inputs via:

input = input.float()

which should fix the error.
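To make the failure mode concrete, here is a standalone sketch (a toy layer, not your generator) showing how the mismatch usually appears, e.g. when the data comes from NumPy as float64; the same applies on the GPU with .cuda() tensors:

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # module weights default to float32
x = torch.rand(16, 1, 8, 8, dtype=torch.float64)   # data loaded from NumPy often arrives as float64

# conv(x)               # -> RuntimeError: Input type ... and weight type ... should be the same
out = conv(x.float())   # cast the input to float32 before the forward pass

# Alternative: cast once in the Dataset/DataLoader, or convert the model instead via conv.double()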

Hi. Thanks for the reply. That seems to fix the problem, but for some reason my [batch_size, 1, 8, 1] images are being interpreted as having a dimension of [batch_size, 1, 8, 20]. I guess I’ll have to rewrite the whole thing and switch to Conv1d. Thanks anyway :slight_smile:

Could you explain the issue in a bit more detail and what “interpreted as having a dimension of” means?

The code I’ve pasted works for images of dimensions 8 x 8, but I’ve since discovered that my training data is not sampled representatively (due to gaps in the original data), so I had to reduce the images to 8 x 1 to match the original data as closely as possible. This leaves me with the issue of the network being inadequate, so I reckon I’m better off rewriting all of it to deal with essentially one-dimensional data (e.g. Conv1d instead of Conv2d, etc.) than modifying the existing network piece by piece. The object-oriented structure of the network is more confusing to me than anything else, really.

On that note, mathematically, is there any difference between applying Conv2d to images with H x W = 8 x 1 and applying Conv1d to data of size 8, if we choose the kernel size, padding, and stride so that the output dimensions are the same?

No, there is no difference between the two approaches.
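If it helps, here is a quick numerical check of that equivalence (the layer sizes are just an example, not taken from your model): a Conv2d with a (k, 1) kernel on an 8 x 1 input produces the same values as a Conv1d with kernel size k on a length-8 signal, once both layers share the same weights.

import torch
import torch.nn as nn

x = torch.rand(4, 1, 8)                          # batch of 1-channel signals of length 8

conv1d = nn.Conv1d(1, 8, kernel_size=3, stride=1, padding=1, bias=False)
conv2d = nn.Conv2d(1, 8, kernel_size=(3, 1), stride=1, padding=(1, 0), bias=False)

# give both layers the same weights: (out, in, k) -> (out, in, k, 1)
with torch.no_grad():
    conv2d.weight.copy_(conv1d.weight.unsqueeze(-1))

out1d = conv1d(x)                                # [4, 8, 8]
out2d = conv2d(x.unsqueeze(-1)).squeeze(-1)      # treat the signal as an 8 x 1 "image"

print(torch.allclose(out1d, out2d, atol=1e-6))   # True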

If using nn.Module objects is confusing, you could also use the pure functional API (e.g. via F.conv2d, F.linear, etc.), in which case you would need to store the parameters and buffers yourself.
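A minimal sketch of what that functional style could look like (the shapes and hyperparameters below are made up for illustration, not taken from your setup):

import torch
import torch.nn.functional as F

# you create and keep the parameter tensors yourself ...
weight = torch.empty(8, 1, 3, 3)
torch.nn.init.normal_(weight, 0.0, 0.02)
weight.requires_grad_()
bias = torch.zeros(8, requires_grad=True)

# ... apply the functional ops in the forward pass ...
x = torch.rand(16, 1, 8, 1)                       # batch of 8 x 1 inputs
out = F.conv2d(x, weight, bias, stride=1, padding=1)
out = F.leaky_relu(out, 0.2)

# ... and pass the parameters to the optimizer explicitly
optimizer = torch.optim.Adam([weight, bias], lr=2e-4, betas=(0.5, 0.999))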


Thanks for the help! :smiley: