Applying a mask to ignore certain rows in model (multinomial logit)

I’m trying to write a fairly simple MNL (multinomial logit) model that allows choice sets to have different sizes. For each choice ‘session’, the data include a single target per alternative (1 if chosen, else 0) and an (N, C) array of features, where N is the number of alternatives in that session and C is the number of features per alternative. By different-sized choice sets, I mean that N can vary across sessions.

However, DataLoader requires that all elements of a batch have the same shape. I’ve written the Dataset to pad each session’s tensor up to a fixed size and to return a mask denoting which parts of the tensor are just padding and shouldn’t be used in the model. What I can’t figure out is how to adjust the model so it actually excludes the padding rows.

Below I’ve included the Dataset and Model classes (stripped down to be as minimal as possible to help convey what I’m trying to do). Any advice would be wonderful – still new to pytorch so may be missing something obvious :slight_smile:

import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset

class ChoiceDataset(Dataset):
    def __init__(self, filename, choice_var, feature_vars):
        choiceData = pd.HDFStore(filename)
        self.sessions = choiceData["sessions"]
        self.length = len(choiceData["sessions"])
        self.n_y = len(choiceData["items"]) # number of *all* items; any individual session may contain only a subset
        self.n_x = len(feature_vars)
        self.choiceVar = choice_var
        self.features = feature_vars
        self.data = choiceData.select("data")

    def __getitem__(self, idx):
        
        # Get all data for all alternatives in this session 
        session = self.data[self.data.index==idx]
        x_f = torch.tensor(session[self.features].values)
        
        # Add padding so all tensors returned to DL will have same dimensions, (n_y, n_x)
        x_f_padded = F.pad(input=x_f, pad=(0, 0, 0, self.n_y-x_f.shape[0]), mode='constant', value=0)
        
        # Create a boolean mask (same shape as x_f_padded): False for real data, True for padding rows
        x_f_mask = torch.cat((torch.zeros(x_f.shape[0], self.n_x, dtype=torch.bool),
                              torch.ones(self.n_y - x_f.shape[0], self.n_x, dtype=torch.bool)))
        
        # Get the index of the chosen item (nn.NLLLoss expects a class index, not the full one-hot vector)
        label = torch.tensor(np.argwhere(session[self.choiceVar].values > 0).item())
        
        return (x_f_padded, x_f_mask, label)

    def __len__(self):
        return self.length
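
For reference, I’m batching this with a plain DataLoader; the file name, column names and batch size below are just placeholders for whatever the real data uses.

dataset = ChoiceDataset("choices.h5", choice_var="chosen", feature_vars=["price", "quality"])
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

for x_padded, mask, label in loader:
    # x_padded: (batch, n_y, n_x), mask: (batch, n_y, n_x), label: (batch,)
    ...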

class MultinomialLogit(torch.nn.Module):

    def __init__(self, m):
        super().__init__()
        self.linear = torch.nn.Linear(m, 1, bias=False)
        self.lsmax = torch.nn.LogSoftmax(dim=1)
        
    def forward(self, x):
        ## This currently uses *all* values, including the padding -- I want to ignore the padded rows
        y_pred = self.linear(x.float())
        y_pred = self.lsmax(y_pred)
        return y_pred.squeeze()
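
In case it helps explain what I’m after, here’s a minimal sketch of the kind of masking I’ve been trying to get working (the class name is made up, and it assumes the mask is True for padding rows, as in the Dataset above): the idea is to push the padded logits to -inf before the log-softmax so they end up with zero probability.

class MaskedMultinomialLogit(torch.nn.Module):

    def __init__(self, m):
        super().__init__()
        self.linear = torch.nn.Linear(m, 1, bias=False)
        self.lsmax = torch.nn.LogSoftmax(dim=1)

    def forward(self, x, mask):
        # x: (batch, n_y, n_x); mask: (batch, n_y, n_x), True where padded
        logits = self.linear(x.float()).squeeze(-1)             # (batch, n_y)
        row_is_pad = mask.any(dim=-1)                           # (batch, n_y): True for padded rows
        logits = logits.masked_fill(row_is_pad, float('-inf'))  # padded alternatives get zero probability
        return self.lsmax(logits)                               # log-probs over the real alternatives only

Since the label always points at a real alternative, nn.NLLLoss on this output should never index a padded row.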

The best solution I’ve found so far is to ditch the mask and instead add a feature that takes the value 1 if a row is padding (not real data) and 0 otherwise, and then include that feature in the model. It’s not the most efficient approach, but in simulations it let me recover the correct weights on the other features.
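
Concretely, that workaround just appends one extra column inside __getitem__ flagging the padded rows, roughly like this (a sketch only; n_real and x_f_flagged are made-up names):

# Inside __getitem__, after building x_f_padded:
n_real = x_f.shape[0]
pad_flag = torch.zeros(self.n_y, 1, dtype=x_f_padded.dtype)
pad_flag[n_real:] = 1                                    # 1 for padded rows, 0 for real alternatives
x_f_flagged = torch.cat((x_f_padded, pad_flag), dim=1)   # shape (n_y, n_x + 1)
# and the model's Linear layer then takes n_x + 1 input features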