Parameter Lists in PyTorch

(Aredd Cmu) #1

My model has many parameters for each data value in the data set. I initialize them in my model constructor as follows.

        init_lev_sms = []
        init_seas_sms = []
        init_seasonalities = []

        for i in range(num_series):
            # one learnable smoothing coefficient per series
            # (0.5 is just an example starting value)
            init_lev_sms.append(nn.Parameter(torch.Tensor([0.5]), requires_grad=True))
            init_seas_sms.append(nn.Parameter(torch.Tensor([0.5]), requires_grad=True))
            # one learnable initial seasonality vector per series
            temp_seas = [0.5] * config['seasonality']
            init_seasonalities.append(nn.Parameter(torch.Tensor(temp_seas), requires_grad=True))

        self.init_lev_sms = nn.ParameterList(init_lev_sms)
        self.init_seas_sms = nn.ParameterList(init_seas_sms)
        self.init_seasonalities = init_seasonalities

And then in my forward I access them as follows. logistic is nn.Sigmoid, and idxs is a list of indices for the shuffled batch so we can look up the right parameters.

The problem: init_lev_sms and init_seas_sms are showing up when I print the model parameters, but init_seasonalities is not, and none of the gradients are being accumulated by the operations that I apply after this initialization.

        lev_sms = self.logistic(torch.stack([self.init_lev_sms[idx] for idx in idxs]).squeeze(1))
        seas_sms = self.logistic(torch.stack([self.init_seas_sms[idx] for idx in idxs]).squeeze(1))
        init_seasonalities = torch.stack([self.init_seasonalities[idx] for idx in idxs])

        seasonalities = []
        # prime seasonality
        for i in range(self.config['seasonality']):
            seasonalities.append(torch.exp(init_seasonalities[:, i]))
        seasonalities.append(torch.exp(init_seasonalities[:, 0]))

Since I’m indexing the object itself, it feels like I should be fine. How can I go about debugging this issue (i.e. visualizing how the gradients flow backward through the network)?


(ptrblck) #2

The reason that init_lev_sms and init_seas_sms show up as model parameters while init_seasonalities does not is that you are, rightly, using an nn.ParameterList for the former two, but a plain Python list for the latter.
Change the last line to:

        self.init_seasonalities = nn.ParameterList(init_seasonalities)

and it should also be shown in the parameters of your model.
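To see why this matters, here is a minimal standalone sketch (toy module, hypothetical names) contrasting the two: parameters held in a plain Python list are invisible to .parameters(), so the optimizer never sees them and .to(device) never moves them.

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # registered: nn.ParameterList attaches each entry to the module
        self.registered = nn.ParameterList(
            [nn.Parameter(torch.zeros(1)) for _ in range(3)]
        )
        # unregistered: a plain list is just a Python attribute,
        # its Parameters are not discovered by .parameters()
        self.unregistered = [nn.Parameter(torch.zeros(1)) for _ in range(3)]

m = Toy()
print(len(list(m.parameters())))                 # 3, not 6
print([name for name, _ in m.named_parameters()])  # only 'registered.*'
```

This is why an optimizer built from model.parameters() would silently skip everything in the plain list.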

(Aredd Cmu) #3

Thanks @ptrblck! That did the trick in terms of adding the parameters to the model. Should indexing and stacking as shown in the second code block be a problem in terms of backpropagating the loss through the network? All the .grad attributes on the parameters are still None after backward(). What is the best way to debug where the gradient might be getting broken, if this setup should work?
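As a quick sanity check that indexing into an nn.ParameterList and stacking does, by itself, keep the autograd graph intact (a standalone sketch with made-up shapes, not the model code):

```python
import torch
import torch.nn as nn

# four scalar parameters, mimicking one parameter per series
plist = nn.ParameterList([nn.Parameter(torch.ones(1)) for _ in range(4)])

idxs = [2, 0, 3]  # a shuffled batch of series indices
stacked = torch.stack([plist[i] for i in idxs]).squeeze(1)  # shape (3,)
loss = stacked.sum()
loss.backward()

print(plist[0].grad)  # tensor([1.]) — index 0 was used, gradient flowed back
print(plist[1].grad)  # None — index 1 was never part of the batch
```

So the indexing/stacking pattern itself is fine; if .grad is None on a parameter that was used, the break is somewhere else (e.g. a detach or copy along the way).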


I found the issue. I was using a copy function and thought it would somehow copy the gradients… it doesn’t :slight_smile:

If you ever want to debug the gradient chain, a crude way is to go inside the object and follow the grad_fn attribute of the tensor all the way back to your parameters.
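For illustration, a minimal sketch (not from the thread) of what following grad_fn looks like. Each node's next_functions points at its parents in the graph; leaves show up as AccumulateGrad nodes, which is where .grad gets written:

```python
import torch

# a tiny graph: a leaf parameter used twice, then a reduction
w = torch.ones(3, requires_grad=True)
y = (w + w).sum()

# walk the autograd graph backward from the output to the leaf
node = y.grad_fn
while node is not None:
    print(type(node).__name__)
    # next_functions is a tuple of (parent_node, input_index) pairs;
    # here we just follow the first edge
    node = node.next_functions[0][0] if node.next_functions else None
```

This prints SumBackward0, then AddBackward0, then AccumulateGrad. If the chain stops before reaching an AccumulateGrad for your parameter (e.g. because of a detach() or an in-place copy), that is where the gradient is getting broken.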

(auk) #4

Hey, can you please explain a bit more what this means? I also want to make sure my gradient chain is correct.
Should I check whether each layer has a grad_fn or not? Will that be enough to make sure?