Input has less dimensions


#6

I tried your code, but I was getting an input mismatch error.
clf_outputs will have two tensors of shape (2048,751) and (2048,1360)

I did this

def forward(self, x):
    x = self.base(x)
    x = F.avg_pool2d(x, x.size()[2:])
    f = x.view(x.size(0), -1)         # (Features)
    clf_outputs = {}
    num_fcs = 2

    for i in range(num_fcs):
        clf_outputs["fc%d" % i] = getattr(self, "fc%d" % i)(f)

    return clf_outputs, f

I still get the same error: the input has fewer dimensions than expected.


#7

I had a look at your code and there still were some issues. This should work now:


class augmented1(nn.Module):
    def __init__(self,ni):
        super(augmented1, self).__init__()
        self.conv1 = conv_layer(ni,ni//2,kernel_size=1)
        self.conv2 = conv_layer(ni//2,ni,kernel_size=3)
        self.classifier = nn.Linear(ni*7*7,751)

    def forward(self,x):
        x = self.conv2(self.conv1(x))
        x = x.view(x.size(0), -1)
        return self.classifier(x)



class augmented2(nn.Module):
    def __init__(self,ni):
        super(augmented2, self).__init__()
        self.conv1 = conv_layer(ni,ni//2,kernel_size=1)
        self.conv2 = conv_layer(ni//2,ni,kernel_size=3)
        self.classifier = nn.Linear(ni*7*7,1360)

    def forward(self,x):
        x = self.conv2(self.conv1(x))
        x = x.view(x.size(0), -1)
        return self.classifier(x)


class hybrid_cnn(nn.Module):
    def __init__(self,**kwargs):
        super(hybrid_cnn,self).__init__()
        resnet = models.resnet50(pretrained=False)
        self.base = nn.Sequential(*list(resnet.children())[:-2])
        
        setattr(self,"fc0",augmented1(2048))
        setattr(self,"fc1",augmented2(2048))


    def forward(self,x):
        x = self.base(x)
        #x = F.avg_pool2d(x,x.size()[2:])
        #print(x.shape)
        clf_outputs = {}
        num_fcs = 2
        for i in range(num_fcs):
            clf_outputs["fc%d" %i] = getattr(self, "fc%d" %i)(x)

        return clf_outputs, x

model = hybrid_cnn()
n1 = 2048
x = torch.randn(1, 3, 224, 224)
model(x)

In your last code snippet you were still passing a flattened tensor to the augmentedX layers, which cannot work, since they expect a 4-dimensional input ([batch_size, channels, height, width]).
I removed this part.
You were also using F.avg_pool2d(x, x.size()[2:]), which yields an output of shape [batch_size, channels, 1, 1].
This cannot work in your sub-modules, since you use a kernel size of 3 at one point.
I removed this part as well.
Now you pass an activation of shape [batch_size, 2048, 7, 7] to the sub-modules.
I had to increase in_features in both Linear layers to match this input.
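
The shape problem described above can be reproduced in isolation. This is only a minimal sketch, not the original model: a small random tensor stands in for the ResNet activation, and a plain nn.Conv2d with 16 input channels (an assumption chosen to keep it cheap) stands in for conv_layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in for the activation returned by the ResNet base.
# 16 channels instead of 2048 keeps the sketch cheap to run.
feat = torch.randn(2, 16, 7, 7)

# Global average pooling collapses the spatial dimensions to 1x1.
pooled = F.avg_pool2d(feat, feat.size()[2:])   # [2, 16, 1, 1]
flat = pooled.view(pooled.size(0), -1)         # [2, 16]

# Stand-in for conv_layer (assumption: it wraps nn.Conv2d).
conv = nn.Conv2d(16, 8, kernel_size=1)

out_4d = conv(feat)                            # works: input is [N, C, H, W]

try:
    conv(flat)                                 # 2-D input has no spatial dims
    flat_error = None
except RuntimeError as err:
    flat_error = err

print(pooled.shape, flat.shape, out_4d.shape)
print("conv on flat input failed:", flat_error is not None)
```

The flattened tensor only fits a Linear layer, while the convolution needs the [N, C, H, W] layout, which is why the flattening has to happen inside the sub-modules after their conv layers.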


(Quang Nguyen) #8

@kl_divergence I just noticed your problem and will keep track of it. I hope you can solve it soon.

@all I’m sorry to interrupt your conversation. I am wondering why we use the setattr method?

setattr(self,"fc0",augmented1(2048))
setattr(self,"fc1",augmented2(2048))

and not the normal way below:

self.fc0 = augmented1(2048)
self.fc1 = augmented2(2048)

Is there any difference between the two implementations above for the network definition?
Thanks.


#9

The second approach would be the “standard” one.
If you have a lot of additional paths, the first one could be simpler, since you can just use a for loop in forward, as @kl_divergence is doing.
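
As a rough illustration of the loop-based variant, here is a minimal sketch. The class name MultiHead, the small in_features value, and the use of plain nn.Linear heads are assumptions made for the example, not the model from this thread.

```python
import torch
import torch.nn as nn

class MultiHead(nn.Module):
    """Hypothetical model with several classifier heads named fc0, fc1, ..."""
    def __init__(self, in_features, class_counts):
        super().__init__()
        self.num_fcs = len(class_counts)
        for i, n_classes in enumerate(class_counts):
            # Equivalent to writing self.fc0 = ..., self.fc1 = ... by hand.
            setattr(self, "fc%d" % i, nn.Linear(in_features, n_classes))

    def forward(self, f):
        # The same naming scheme lets forward iterate over all heads.
        return {"fc%d" % i: getattr(self, "fc%d" % i)(f)
                for i in range(self.num_fcs)}

model = MultiHead(32, [751, 1360])
outputs = model(torch.randn(4, 32))
head_names = [name for name, _ in model.named_children()]
print({name: tuple(out.shape) for name, out in outputs.items()})
print(head_names)
```

Adding a third head then only means extending the class_counts list; nothing in forward has to change.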


#10

Thank you so much for helping me. I have learned what was going wrong. Just one question: the model expects a tensor of shape [batch_size, 3, 224, 224], but I’m passing a tensor of shape [batch_size, 3, 256, 128]. How can I make the model accept a tensor with different dimensions?


#11

To complement what @ptrblck has said: any submodule that you don’t declare as an attribute won’t be registered as a module of the model, so its parameters won’t be trained.
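
A small sketch of the difference, using hypothetical toy classes: layers kept in a plain Python list are not registered, while layers assigned as attributes (directly or through setattr) are. nn.ModuleList would be the registered counterpart of the plain list.

```python
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # A plain Python list is not inspected by nn.Module,
        # so these layers never show up in parameters().
        self.heads = [nn.Linear(8, 2), nn.Linear(8, 3)]

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc0 = nn.Linear(8, 2)              # attribute assignment
        setattr(self, "fc1", nn.Linear(8, 3))   # same effect via setattr

n_plain = len(list(PlainList().parameters()))
n_registered = len(list(Registered().parameters()))
print(n_plain, n_registered)
```

An optimizer built from model.parameters() would therefore never see the layers stored in the plain list.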


#12

You could try to upsample your input to [3, 224, 224] using one of the upsampling approaches.
You could do this before passing your input to the model, but since the width is only approx. half of the desired width, your model might perform badly. However, I think it is worth a try:

x = torch.randn(1, 3, 256, 128)
up = nn.Upsample(size=224, mode='bilinear')
x = up(x)
print(x.shape)

Just define the layer in your __init__ and call it before passing the input to the resnet.
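
A minimal sketch of that arrangement could look as follows. The wrapper name UpsampledInput and the single conv layer used as the backbone are placeholders for illustration; in this thread the backbone would be the truncated ResNet-50.

```python
import torch
import torch.nn as nn

class UpsampledInput(nn.Module):
    """Wraps a backbone and resizes every input to 224x224 first."""
    def __init__(self, backbone):
        super().__init__()
        # The upsampling layer is defined once in __init__ ...
        self.up = nn.Upsample(size=(224, 224), mode="bilinear",
                              align_corners=False)
        self.backbone = backbone

    def forward(self, x):
        # ... and applied before the input reaches the backbone.
        return self.backbone(self.up(x))

# Hypothetical stand-in backbone, used only to keep the sketch small.
backbone = nn.Conv2d(3, 4, kernel_size=3, padding=1)
model = UpsampledInput(backbone)
out = model(torch.randn(1, 3, 256, 128))
print(out.shape)
```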

Alternatively, your input is not too small for the used layers of resnet, so you could still pass it as [3, 256, 128] and set in_features in augmentedX to (ni * 8 * 4), where ni=2048.
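
The 7 x 7 and 8 x 4 feature map sizes follow from the usual output-size formula for convolutions. The sketch below only models the five stride-2 stages of torchvision's ResNet-50 (conv1, maxpool, and the first block of layer2 to layer4), on the assumption that all other layers keep the spatial size; the helper names are made up for the example.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet50_feature_size(height, width):
    """Spatial size after the convolutional part of a ResNet-50."""
    sizes = []
    for s in (height, width):
        s = conv_out(s, kernel=7, stride=2, padding=3)   # conv1
        s = conv_out(s, kernel=3, stride=2, padding=1)   # maxpool
        for _ in range(3):                               # layer2, layer3, layer4
            s = conv_out(s, kernel=3, stride=2, padding=1)
        sizes.append(s)
    return tuple(sizes)

ni = 2048
for h, w in [(224, 224), (256, 128)]:
    fh, fw = resnet50_feature_size(h, w)
    print((h, w), "->", (fh, fw), "in_features =", ni * fh * fw)
```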


(Quang Nguyen) #13

@kl_divergence @ptrblck Thanks, all, for replying to my question.

@kl_divergence Regarding your answer, it’s true that “sub module that you aren’t declaring as attribute won’t be added as module of the model and so it’s parameters won’t be trained.” But when we write those lines in the __init__ method, the two submodules augmented1(2048) and augmented2(2048) are already declared as attributes of the module.
Is that correct? If so, both approaches are correct. :smiley:

def __init__(self, ...):
    self.fc0 = augmented1(2048)
    self.fc1 = augmented2(2048)

#14

I’m not sure, it may be correct.


#15

Thank you so much. Upsampling worked. But I have one question regarding performance. I created this architecture to improve the model’s performance drastically, and I shared the results with you as well. As you told me, upsampling may degrade performance. I tried your other method of increasing in_features, but I ran out of memory, because I already have 277 M parameters and increasing it by a factor of 32 adds far too many parameters. If you know any other way to handle this, I’d be glad to learn. Thanks again!


#16

@quanguet
Yes, both are correct and your example would be the standard way.
Having a lot of repeating paths, the current approach in this thread might be cleaner, but it depends on your coding style I guess. :wink:

@kl_divergence
You can often observe a worse accuracy if you pass inputs of a different size than the inputs that were used to train the model. I’m not sure whether upsampling your input by a factor of approx. 2 in one spatial dimension will hurt your performance. You should just try it and see if it works.

You could try to add a pooling layer in both augmentedX modules, e.g. x = F.avg_pool2d(x, x.size()[2:]), so that your linear layers would get in_features=ni.
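
A possible shape of such a module, as a sketch only: PooledHead is a hypothetical name, nn.Conv2d with size-preserving padding stands in for conv_layer, and ni is kept small so the example runs quickly. Note that the pooling comes before the flattening.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PooledHead(nn.Module):
    """Hypothetical variant of augmentedX with global average pooling."""
    def __init__(self, ni, num_classes):
        super().__init__()
        # Stand-ins for conv_layer (assumption: conv with size-preserving padding).
        self.conv1 = nn.Conv2d(ni, ni // 2, kernel_size=1)
        self.conv2 = nn.Conv2d(ni // 2, ni, kernel_size=3, padding=1)
        self.classifier = nn.Linear(ni, num_classes)   # in_features is just ni

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = F.avg_pool2d(x, x.size()[2:])   # pool first: [N, ni, 1, 1]
        x = x.view(x.size(0), -1)           # then flatten: [N, ni]
        return self.classifier(x)

head = PooledHead(ni=16, num_classes=10)    # small ni keeps the sketch fast
out_square = head(torch.randn(2, 16, 7, 7))
out_tall = head(torch.randn(2, 16, 8, 4))
print(out_square.shape, out_tall.shape)
```

Because the pooling removes the spatial dimensions, the same head accepts both the 7 x 7 and the 8 x 4 feature maps without any change to in_features.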


#17

I evaluated the model for two days with upsampling. The loss goes down further than before, but the accuracy (CMC) hasn’t improved at all.


#18

Could you please suggest alternatives to upsampling? How about adding trainable parameters along the shorter axis, instead of adding zeros (upsampling)? I have read that nn.ConvTranspose2d can help. Would it be helpful here? My model hasn’t improved much even after improving the architecture.


#19

The other way would be to skip the upsampling and just pass your input as it is to the model.
Since the size of your input is large enough, you would just have to change the in_features of the Linear layer.


#20

I tried this method also

class augmentedX(nn.Module):
    def __init__(self,ni):
        super().__init__()
        self.conv1 = conv_layer(ni,ni//2,kernel_size=1)
        self.conv2 = conv_layer(ni//2,ni,kernel_size=3)
        self.classifier = nn.Linear(ni*7*7,1360)

    def forward(self,x):
        x = self.conv2(self.conv1(x))
        x = x.view(x.size(0),-1)
        x = F.avg_pool2d(x,x.size()[2:]) 
        return self.classifier(x)

In forward pass:

setattr(self,"fc0",augmented1(8*4*2048))

It runs out of memory. I have 8 GB of RAM on my GPU server.


#21

Did it work before?
7*7*2048 > 8*4*2048, so you shouldn’t run out of memory using a smaller layer.
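
A rough parameter count may help here. The sketch below assumes conv_layer wraps a plain 2-D convolution and ignores biases and batch norm in the conv layers; the helper names are made up for illustration. It compares the two Linear sizes and also checks what would happen if 8*4*2048 were passed as ni, as in the setattr line above, since ni sets the conv channel counts as well.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def conv_weights(c_in, c_out, kernel_size):
    """Weights of a square 2-D convolution, ignoring bias and batch norm."""
    return c_in * c_out * kernel_size * kernel_size

# Only the Linear input changes: the smaller feature map gives a smaller layer.
linear_7x7 = linear_params(2048 * 7 * 7, 751)
linear_8x4 = linear_params(2048 * 8 * 4, 751)
print(linear_7x7, linear_8x4)

# If 8*4*2048 is passed as ni, the conv channel counts grow as well.
ni = 8 * 4 * 2048
conv1 = conv_weights(ni, ni // 2, kernel_size=1)
conv1_gib = conv1 * 4 / 2**30   # float32 uses 4 bytes per weight
print(conv1, "weights,", conv1_gib, "GiB in float32")
```

Under these assumptions the first conv layer alone would need about as much memory as the whole GPU, so keeping ni=2048 and changing only the in_features of the Linear layer is the variant worth testing.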


#24

Tried this one as well, results haven’t improved at all.


#25

I’m not sure how to help, since it seems to be related to your general use case.
Have you tried different learning rates? Are you freezing the base model or are you fine-tuning it as well?
How are you handling the loss? Do you have different loss functions for your outputs?
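
For reference, freezing the base and combining one loss per output could be sketched like this. TwoHeadNet is a tiny hypothetical stand-in for hybrid_cnn, and the sketch assumes every sample has a label for both heads, which may not match your data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadNet(nn.Module):
    """Hypothetical stand-in for hybrid_cnn with a tiny base and two heads."""
    def __init__(self):
        super().__init__()
        self.base = nn.Linear(8, 16)
        self.fc0 = nn.Linear(16, 5)
        self.fc1 = nn.Linear(16, 7)

    def forward(self, x):
        f = torch.relu(self.base(x))
        return {"fc0": self.fc0(f), "fc1": self.fc1(f)}, f

model = TwoHeadNet()

# Freeze the base so only the heads receive gradient updates.
for p in model.base.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.01)

x = torch.randn(4, 8)
targets = {"fc0": torch.randint(0, 5, (4,)),
           "fc1": torch.randint(0, 7, (4,))}

outputs, _ = model(x)
# One cross-entropy term per head; the sum is a single scalar loss.
loss = sum(F.cross_entropy(outputs[k], targets[k]) for k in outputs)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item(), model.base.weight.grad)
```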


#26
  1. I have tried different learning rates
  2. I am not freezing the base model yet, only augmented2 for now.
  3. I am using cross entropy now, although I have tried triplet + cross entropy as well.
  4. No I don’t have different loss functions.

(Ahmed Mamoud) #29

Hi,
In the case of ResNet with strided convolutions (i.e. the input and output differ in dimensions), what is the best practice for adding the input to the output? Or doesn’t it matter if I use trivial padding for the output tensor?

Best