Python dictionary in the model not trained?

hongtaesuk · December 19, 2019, 5:50am

Hey guys,
can anyone explain why the behavior below happens?

    def __init__(self, config):
        super(BertForMultitask, self).__init__(config)
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.apply(self.init_weights)
        self.multitask_ratio = self.set_update_ratio(config, 0.7)

        #### Part A. This works ####
        self.classifier_student = nn.Linear(config.hidden_size, 2)
        self.classifier_squad = nn.Linear(config.hidden_size, 2)
        self.classifier_usermatch = nn.Linear(config.hidden_size, 2)
        self.classifier_stsb = nn.Linear(config.hidden_size, 1)

        #### Part B. Below doesn't ####
        self.classifier = {}
        for task in config.tasks:
            if task in ['student_response', 'squad', 'user_match']:
                self.classifier[task] = nn.Linear(config.hidden_size, self.config.num_labels)
            elif task in ['sts-b']:
                self.classifier[task] = nn.Linear(config.hidden_size, 1)

I’ve been implementing a multitask model that handles several different tasks.
After the inputs pass through a BERT model, each task gets its own Linear layer,
and I’ve trained the model that way.

However, when I define the Linear layers as in Part B, with a single dictionary
holding the Linear layer for each task, the model doesn’t seem to retain the trained
parameters of those layers after training completes.
Part A, where each layer is defined as a separate attribute rather than collected
in a single Python dictionary, works just fine.

Is there some secret behind this way of defining layers?
Does anyone have a clue?

ptrblck · December 19, 2019, 6:42am

Plain Python containers, such as list and dict, won’t be properly registered inside your module, so use nn.ModuleDict in that case (or nn.ModuleList instead of list).
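
For illustration, here is a minimal sketch of how Part B might look with nn.ModuleDict. The class name MultitaskHeads, its constructor arguments, and the hidden size of 8 are hypothetical stand-ins (the BERT body from the original post is left out so the snippet runs on its own); only the task names and output sizes come from the question.

```python
import torch
import torch.nn as nn


class MultitaskHeads(nn.Module):
    """Hypothetical stand-in for the classifier part of BertForMultitask."""

    def __init__(self, hidden_size, tasks, num_labels=2):
        super().__init__()
        # nn.ModuleDict registers every head as a child module of this module.
        self.classifier = nn.ModuleDict()
        for task in tasks:
            if task in ("student_response", "squad", "user_match"):
                self.classifier[task] = nn.Linear(hidden_size, num_labels)
            elif task == "sts-b":
                self.classifier[task] = nn.Linear(hidden_size, 1)

    def forward(self, hidden, task):
        # Pick the head that belongs to the requested task.
        return self.classifier[task](hidden)


heads = MultitaskHeads(hidden_size=8, tasks=["squad", "sts-b"])

# The heads now show up among the module's parameters.
param_names = [name for name, _ in heads.named_parameters()]
print(param_names)

# Forward pass through the regression head with a dummy batch of 3.
stsb_out = heads(torch.zeros(3, 8), "sts-b")
```

Indexing and assignment work the same way as with a normal dict, so the rest of the model code should not need to change.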

hongtaesuk · December 19, 2019, 7:00am

Thanks a lot for the clear reply!!
Should I just replace the normal dictionary with

self.classifier = nn.ModuleDict({})

and then add elements?

And what do you mean by ‘not properly registered inside the module’?
Does that mean the model won’t be able to properly record operations and data flows
for that variable?

ptrblck · December 19, 2019, 7:13am

Just create it via:

self.classifier = nn.ModuleDict()

No, Autograd will still track all operations on the parameters. However, the parameters of the child modules won’t be registered internally, so model.parameters() won’t return them (and they will therefore be missing if you pass the result of that call to the optimizer). Also, model.to() won’t move these parameters, so they will stay on their initial device.
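
The difference can be checked with a small sketch like the one below. The two toy classes are hypothetical and only mirror the dict versus nn.ModuleDict choice. A dtype change is used as a CPU-only stand-in for moving the model to a GPU, on the assumption that the default dtype is float32. The same registration mechanism also decides what ends up in state_dict(), which would explain trained heads going missing from a saved checkpoint.

```python
import torch
import torch.nn as nn


class PlainDictModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 4)
        # Plain dict: nn.Module does not see the Linear stored inside it.
        self.classifier = {"squad": nn.Linear(4, 2)}


class ModuleDictModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(4, 4)
        # nn.ModuleDict: the Linear is registered as a child module.
        self.classifier = nn.ModuleDict({"squad": nn.Linear(4, 2)})


plain = PlainDictModel()
registered = ModuleDictModel()

# parameters() is what usually gets passed to the optimizer.
plain_count = len(list(plain.parameters()))            # 2: body weight and bias
registered_count = len(list(registered.parameters()))  # 4: body plus head

# state_dict() is what usually gets saved to a checkpoint.
plain_keys = list(plain.state_dict().keys())
registered_keys = list(registered.state_dict().keys())

# model.to(...) only reaches registered parameters; a dtype change stands in
# for a device move here so the example also runs without a GPU.
plain.to(torch.float64)
registered.to(torch.float64)
plain_head_dtype = plain.classifier["squad"].weight.dtype            # unchanged
registered_head_dtype = registered.classifier["squad"].weight.dtype  # converted
```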

hongtaesuk · December 19, 2019, 7:33am

Woah…!
goosebumps!! haha
model.to() didn’t move those parameters, and that actually caused a conflict
between operations on two different devices (CPU and GPU).

I don’t think I understood everything you said,
but my guess is that the parent module I wrote, which holds the normal Python dict,
does contain child modules internally (something I didn’t know in advance),
but since they sit inside a normal Python dict they are not registered properly.
Is this correct?

Sorry to bother you.

ptrblck · December 19, 2019, 7:41am

Yes, your explanation is correct.
Therefore, you can still use lists and dicts if you explicitly don’t want to register their content inside the module.
A bit unrelated to this, but self.param = nn.Parameter(...) and self.register_buffer(...) will register the parameters/buffers properly, while a simple assignment of a tensor (self.tensor = torch.tensor(...)) will not.
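
As a rough sketch of that last point, the toy module below (its class name and attribute names are made up for illustration) shows which of the three assignments end up registered:

```python
import torch
import torch.nn as nn


class Registration(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a trainable parameter.
        self.scale = nn.Parameter(torch.ones(3))
        # Registered as a buffer: saved and moved with the module, not trained.
        self.register_buffer("running_mean", torch.zeros(3))
        # Plain tensor attribute: not registered at all.
        self.plain = torch.zeros(3)


module = Registration()

param_names = [name for name, _ in module.named_parameters()]
buffer_names = [name for name, _ in module.named_buffers()]
state_keys = list(module.state_dict().keys())

print(param_names)   # ['scale']
print(buffer_names)  # ['running_mean']
print(state_keys)    # contains 'scale' and 'running_mean', but not 'plain'
```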

Feel free to ask, in case something is unclear or you want more information.

hongtaesuk · December 24, 2019, 3:53am

Awesome!!!
so good to get such quality answers
Thanks a lot ptrblck!!
and merry Christmas