Custom model for multiple modalities

Hello everybody,

I'm a very new PyTorch user, and I have a question that may or may not be trivial. I'm trying to implement my own model for fusing multiple modalities, e.g. RGB and optical flow. I want to use a well-known model (e.g. VGG) as the base model for a two-stream network. From what I read in the PyTorch documentation, I could do something like:

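    import torch.nn as nn
    import torchvision

    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()
            # One ImageNet-pretrained VGG16 per stream
            self.rgb = torchvision.models.vgg16(pretrained=True)
            self.optical_flow = torchvision.models.vgg16(pretrained=True)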

Later on, I will modify the first conv layer of the optical flow net so that its input channels match a stack of consecutive optical flow images, and I will remove the last two fully connected layers. I will then concatenate the features from the first fully connected layer of the two nets and add two new fully connected layers on top.

My problem is that I will not only use optical flow and RGB but other modalities as well (a combination of two at a time), and the user will be able to choose which modalities to use. So I was thinking of doing something like:

    class Model(nn.Module):
        def __init__(self, modalities):
            # modalities is a list of strings naming the modalities to be fused
            super(Model, self).__init__()
            self.base_models = {}
            for m in modalities:
                self.base_models[m] = torchvision.models.vgg16(True)

and later on modify each network accordingly. Is this possible, i.e. can I use a dictionary that contains different models as submodules of the whole module? I'm asking because the documentation says to assign submodules to regular attributes, not to dictionaries. If this is not possible, is there an alternative you can suggest?

I hope I explained the situation well. Thank you in advance.

If the models are separate from each other (for instance, not sharing components), you can simply store them in a regular Python dictionary rather than in an nn.Module class.

If you need them to be submodules of a containing nn.Module class and want to store them in some way other than direct attributes (e.g. a dictionary attribute), remember to use add_module to bind them to the container.
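To make the difference between the two cases concrete, here is a minimal sketch. The class names, the "base_" prefix, and the small nn.Linear layers (stand-ins for a real backbone such as VGG) are all assumptions made for illustration. It compares how many parameters the container sees when sub-networks live only in a plain dict versus when they are also bound with add_module.

```python
import torch.nn as nn


class PlainDictModel(nn.Module):
    """Stores sub-networks in a regular dict: they are NOT registered."""

    def __init__(self, modalities):
        super().__init__()
        self.base_models = {}
        for m in modalities:
            # nn.Linear is a small stand-in for a real backbone such as VGG.
            self.base_models[m] = nn.Linear(8, 4)


class RegisteredModel(nn.Module):
    """Same layout, but each sub-network is also bound with add_module."""

    def __init__(self, modalities):
        super().__init__()
        self.base_models = {}
        for m in modalities:
            net = nn.Linear(8, 4)
            self.add_module("base_" + m, net)  # registers the parameters
            self.base_models[m] = net          # convenient lookup by name


modalities = ["rgb", "flow"]
plain = PlainDictModel(modalities)
registered = RegisteredModel(modalities)

# Parameters visible to an optimizer, .to(device), state_dict(), etc.
n_plain = len(list(plain.parameters()))            # 0
n_registered = len(list(registered.parameters()))  # 4 (weight + bias, twice)
print(n_plain, n_registered)
```

With the plain dict, calls such as model.parameters(), model.cuda(), or model.state_dict() on the container would silently skip the sub-networks. Newer PyTorch releases also provide nn.ModuleDict, a dict-like container that performs this registration for you, which may be a tidier option if it is available in your version.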

Thank you for your answer, which is indeed very helpful. The models will not share parameters but will be concatenated and continue as a single network with common fully connected layers. So there will be two independent submodels, which then continue as a single model. Does this situation fall under the first case you described or the second? Thank you.

Then it makes sense to put them into a containing nn.Module class (2nd case) :slight_smile:. I recommend trimming the model (removing fc layers you don’t need) in __init__ to reduce memory usage :slight_smile:

If you mean in the constructor, I already do it this way. So add_module should also be called in the constructor?

add_module should be called in the constructor. I was saying that ideally you should do some surgery on the VGG models before adding them as submodules.

Thank you so much! Last question and I will stop being a pain. When you say to do the model surgery before adding them as submodules, you mean to still do the surgery in the constructor, but before calling add_module, right?

No, I’m happy to help.

And yes, your understanding is correct.

Thank you so much! I've been on other DL library forums too, and this one is really responsive! :smiley:

Do you have a working example for this? I am also interested in this implementation. But I cannot find an example where you can fuse these two networks. Thanks.