nn.Module with multiple inputs

Hey,
I am interested in building a network with multiple inputs. As I understand it, the forward function takes only one Variable as a parameter. I have two possible use cases:

  • the same image at multiple resolutions is used
  • different images are used

I would like some advice on designing an nn.Module in the same fashion as AlexNet, for example.
I have no idea how to:

  • give multiple inputs to the nn.Module
  • join the fc layers together

I am following the ImageNet example, which looks like this:

import torch.nn as nn


class SimpleConv(nn.Module):
    def __init__(self, num_classes):
        super(SimpleConv, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), 256 * 6 * 6)
        x = self.classifier(x)
        return x


def simple_conv(pretrained=False, num_classes=140):
    model = SimpleConv(num_classes)
    # if pretrained:
    #     model.load_state_dict(model_zoo.load_url(model_urls['alexnet']))
    return model

I hope it’s clear.
Thanks

Maybe I am mistaken, but I think the magic should happen in the forward call, where the input is a tensor and not a Variable as I was thinking?
For the second point, about merging the fc layers, I guess I should sum the layers' outputs into a final layer?

Hi,

You can pass multiple inputs to the forward call of the network; that is not a problem. Just pass a Variable and you will be fine.
As for merging the fc layers, you can use any operation you want, for example concatenating the outputs (via torch.cat([res1, res2], 1)), summing them, etc.
Here is a simplified example:

import torch
import torch.nn as nn


class SimpleConv(nn.Module):
    def __init__(self):
        super(SimpleConv, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, y):
        x1 = self.features(x)
        x2 = self.features(y)
        x = torch.cat((x1, x2), 1)
        return x
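
To make the difference between the two merge operations concrete, here is a small shape check. It is only a sketch: `res1` and `res2` are hypothetical branch outputs with made-up sizes, not tensors from the model above.

```python
import torch

# Two hypothetical branch outputs, shaped (batch, channels, height, width).
res1 = torch.randn(4, 1, 8, 8)
res2 = torch.randn(4, 1, 8, 8)

# Concatenating along dim 1 stacks the channels: 1 + 1 = 2.
concatenated = torch.cat([res1, res2], 1)

# Summing is element-wise, so the shape is unchanged.
summed = res1 + res2

print(concatenated.shape)  # torch.Size([4, 2, 8, 8])
print(summed.shape)        # torch.Size([4, 1, 8, 8])
```

Whatever layer comes next has to expect the merged shape, so the choice between the two affects the input size of the following layer.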

Oh nice! Thanks @fmassa

Is there a way to put the x.view(x.size(0), 256 * 6 * 6) operation INSIDE the nn.Sequential? This has been bugging me for some time… thanks

No, there’s not. We don’t recommend that. Just write a custom container, with two sequential parts and a reshape in the middle. See torchvision models.
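
A note for readers on more recent PyTorch releases: an `nn.Flatten` module was added after this reply was written, and it can be placed inside `nn.Sequential`. The sketch below assumes such a release. The layer sizes are toy values chosen for illustration, not the AlexNet ones from the question.

```python
import torch
import torch.nn as nn

# Toy network (assumed sizes) with the reshape step expressed as a module.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d((6, 6)),  # force a fixed 6x6 spatial size
    nn.Flatten(),                  # (N, 8, 6, 6) -> (N, 8 * 6 * 6)
    nn.Linear(8 * 6 * 6, 10),
)

out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```

The custom container with two sequential parts and a reshape in between, as recommended above, remains a valid design and works on any version.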

Would someone please explain what this function does? Does a pre-trained model mean that I can just use its weights right away without doing anything? Thanks for the help.

def simple_conv(pretrained=False, num_classes=140):
    model = SimpleConv(num_classes)
    # if pretrained:
    #     model.load_state_dict(model_zoo.load_url(model_urls['alexnet']))
    return model

Yes, pretrained models are ones that have been trained by someone earlier and that you can use in different applications.
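
Note that in the function quoted above, the two `pretrained` lines are commented out, so as written it always returns a randomly initialized model. The sketch below illustrates what loading pretrained weights amounts to. `make_model` is a hypothetical stand-in for `SimpleConv`, and the "pretrained" weights are simply taken from another instance instead of being downloaded.

```python
import torch
import torch.nn as nn


def make_model():
    # Hypothetical tiny architecture, standing in for SimpleConv.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# Pretend this model was trained by someone else; its weights play the
# role of the "pretrained" ones.
trained = make_model()
pretrained_weights = trained.state_dict()

# A fresh model starts with its own random weights...
model = make_model()
# ...and load_state_dict copies the pretrained values into it.
model.load_state_dict(pretrained_weights)

x = torch.randn(3, 4)
same = torch.equal(model(x), trained(x))
print(same)  # True
```

The state dict has to match the architecture it is loaded into, so a model built with a different `num_classes` than the pretrained one would need its last layer handled separately.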

On the second point of the question: concatenating features changes the tensor’s size, so the result is no longer compatible with the classifier unless the input size of its first Linear layer is increased to match.
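
One way to keep concatenation compatible with the classifier is to size the first Linear layer for the concatenated features. The following is a minimal sketch under assumed layer sizes. `TwoInputNet` is a hypothetical name, and the adaptive pooling layer is an addition so that the two inputs may have different resolutions.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((6, 6)),  # same output size for any input size
        )
        feat = 16 * 6 * 6
        # Two feature vectors are concatenated, so the classifier sees 2 * feat.
        self.classifier = nn.Linear(2 * feat, num_classes)

    def forward(self, x, y):
        x1 = self.features(x).flatten(1)  # (N, feat)
        x2 = self.features(y).flatten(1)  # (N, feat)
        return self.classifier(torch.cat((x1, x2), 1))


model = TwoInputNet()
# Same batch size, two different resolutions.
out = model(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```

Summing the two feature vectors instead would keep the size at `feat`, at the cost of mixing the two inputs before the classifier sees them.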

What do you mean by:

…just pass a Variable and you will be fine.

Can you show an example of usage in the training loop? Thanks a lot.
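
A training loop with a two-input model could look roughly like the sketch below. Everything in it is assumed for illustration: `PairNet`, the layer sizes, the random data standing in for a data loader, and the optimizer settings. Since PyTorch 0.4, plain tensors can be passed directly without wrapping them in `Variable`.

```python
import torch
import torch.nn as nn


class PairNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(2 * 4, num_classes)

    def forward(self, x, y):
        x1 = self.features(x).flatten(1)
        x2 = self.features(y).flatten(1)
        return self.classifier(torch.cat((x1, x2), 1))


model = PairNet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-in for a data loader that yields (image_a, image_b, target).
batches = [
    (torch.randn(4, 1, 8, 8), torch.randn(4, 1, 8, 8), torch.randint(0, 3, (4,)))
    for _ in range(5)
]

for img_a, img_b, target in batches:
    optimizer.zero_grad()
    output = model(img_a, img_b)  # both inputs go in as positional arguments
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
```

The only difference from a single-input loop is the call `model(img_a, img_b)`, which forwards both arguments to `forward(self, x, y)`.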

But changing the definition of forward would constrain you, as you might not be able to use tools that assume forward takes a single input, such as the tensorboardX library.
Maybe you could just pass a dictionary of your inputs.
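
A rough sketch of the dictionary idea, reusing the simplified model from earlier in the thread. The class name `DictInputNet` and the keys `"image_a"` and `"image_b"` are hypothetical, and whether a given third-party tool accepts a dict as the single input is not something this example verifies.

```python
import torch
import torch.nn as nn


class DictInputNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, inputs):
        # `inputs` is a single argument: a dict mapping names to tensors.
        x1 = self.features(inputs["image_a"])
        x2 = self.features(inputs["image_b"])
        return torch.cat((x1, x2), 1)


model = DictInputNet()
batch = {"image_a": torch.randn(2, 1, 8, 8), "image_b": torch.randn(2, 1, 8, 8)}
out = model(batch)
print(out.shape)  # torch.Size([2, 2, 8, 8])
```

This keeps the signature of forward at one argument while still letting the module consume any number of named inputs.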

Hi there,

Sorry for jumping in, but I am also trying to write modules that can accept multiple inputs. Since I want them to be flexible with regard to the number of inputs, I tried passing a list of Variables to the forward() method. Here is one example among other similar modules:

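from functools import partial

import torch


# Aggregation and ChannelAlignment are defined elsewhere in my code (not shown here)
class Addition(Aggregation):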
    """
    Add two input tensors, return a single output tensor of same dimensions. If input and output have different sizes,
    use largest in each dimension and zero-pad or interpolate (spatial dimensions), or convolve with a 1x1 filter
    (number of channels)
    """
    def __init__(self, in_channels: list, pad_or_interpolate: str = 'pad', pad_mode: str = 'replicate',
                 interpolate_mode: str = 'nearest'):

        assert pad_or_interpolate in ['pad', 'interpolate'], \
            "Error: Unknown value for `pad_or_interpolate` {}".format(pad_or_interpolate)

        super(Addition, self).__init__()
        self.ch_align = ChannelAlignment(in_channels)  # use 1x1 convolution to align n_channels

        if pad_or_interpolate == 'pad':
            self.sz_align = partial(self.align_sizes_pad, mode=pad_mode)
        else: 
            self.sz_align = partial(self.align_sizes_interpolate, mode=interpolate_mode)

    def forward(self, inputs: list):
        """
        Performs element-wise sum of inputs. If they have different dimensions, they are first adjusted to
        common dimensions by 1/ padding or interpolation (h and w axes) and/or 2/ 1x1 convolution.
        :param inputs: List of torch input tensors of dimensions (N, C_i, H_i, W_i)
        :return: A single torch Tensor of dimensions (N, max(C_i), max(H_i), max(W_i)), containing the element-
            wise sum of the input tensors (or their size-adjusted variants)
        """
        inputs = self.sz_align(inputs)  # Perform size alignment
        inputs = self.ch_align(inputs)  # Perform channel alignment
        stacked = torch.stack(inputs, dim=4)  # stack inputs along an extra axis (removed again by the sum)

        return torch.sum(stacked, 4)

However, I am getting weird errors:

  • Models using these modules do not train if they are more than a few layers deep (accuracy does not increase and loss is “infinite”)
  • Sometimes they crash and I get error messages such as
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

or sometimes:

  File "C:\Users\Luc\Miniconda3\envs\pytorch\lib\site-packages\torch\autograd\__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: an illegal memory access was encountered

I am not sure the issue comes from the fact that I am passing lists to forward(), but I suspect it because when I try viewing my models with pytorch-summary, I get the following message:

  File "C:\Users\Luc\Miniconda3\envs\pytorch\lib\site-packages\torchsummary\torchsummary.py", line 19, in hook
    summary[m_key]["input_shape"] = list(input[0].size())
AttributeError: 'list' object has no attribute 'size'

(even though testing the forward pass with a simple tensor returns no error).
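
The traceback above shows the summary hook calling `input[0].size()`. A forward hook receives the positional arguments of forward as a tuple, so when a list is passed as the single argument, `input[0]` is that list. The sketch below illustrates this with two hypothetical modules, `SumList` and `SumArgs`. It only addresses the `AttributeError` and does not claim to explain the CUDA errors.

```python
import torch
import torch.nn as nn


class SumList(nn.Module):
    def forward(self, inputs):   # one positional argument: a list of tensors
        return torch.stack(inputs, dim=0).sum(dim=0)


class SumArgs(nn.Module):
    def forward(self, *inputs):  # each tensor is its own positional argument
        return torch.stack(inputs, dim=0).sum(dim=0)


def record_first_input_type(store):
    # Forward hooks are called as hook(module, args, output),
    # where `args` is the tuple of positional arguments.
    def hook(module, args, output):
        store.append(type(args[0]).__name__)
    return hook


seen = []
a, b = torch.ones(1, 2, 3, 3), torch.ones(1, 2, 3, 3)

m1 = SumList()
m1.register_forward_hook(record_first_input_type(seen))
out1 = m1([a, b])

m2 = SumArgs()
m2.register_forward_hook(record_first_input_type(seen))
out2 = m2(a, b)

print(seen)  # ['list', 'Tensor']
```

With the `*inputs` signature the module still accepts any number of tensors, called as `module(*list_of_tensors)`, and a hook that expects `input[0]` to be a tensor finds one.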

I am trying to generate CNNs automatically, so there is a lot of boilerplate code, which makes it difficult for me to provide a simple reproducible example, but I hope you can assist!

Many thanks