How to predict a single example using a pre-trained model with a batch normalization layer?

Hi all,
This may not be a difficult question, but it has bothered me for a few days! I really need some help.
I have trained a model with the following structure:

RNNModel_GRU(
  (embed): Embedding(6, 1, padding_idx=1)
  (gru): GRU(1, 64, bidirectional=True)
  (linear1): Linear(in_features=128, out_features=32, bias=True)
  (linear2): Linear(in_features=32, out_features=1, bias=True)
  (b_n): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (dropout): Dropout(p=0.2)
  (relu): ReLU()
)

Now I want to load this model and predict new examples. Since I trained the model with 32 examples per batch, an error occurs when I feed a single new example to the model:

In [259]: scores = model(preprocess_seq)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-259-3ccf22219850> in <module>()
----> 1 scores = model(preprocess_seq)

/picb/rnomics3/wangmr/INSTALL/miniconda3/envs/GPU/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

<ipython-input-219-9bb382769aa2> in forward(self, text)
     28         linear1 = self.relu(linear1)
     29         linear1 = self.dropout(linear1) # [32 * 32]
---> 30         b_n = self.b_n(linear1) # 32 * 32
     31         linear2 = self.linear2(b_n)
     32

/picb/rnomics3/wangmr/INSTALL/miniconda3/envs/GPU/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

/picb/rnomics3/wangmr/INSTALL/miniconda3/envs/GPU/lib/python2.7/site-packages/torch/nn/modules/batchnorm.pyc in forward(self, input)
     58     @weak_script_method
     59     def forward(self, input):
---> 60         self._check_input_dim(input)
     61
     62         exponential_average_factor = 0.0

/picb/rnomics3/wangmr/INSTALL/miniconda3/envs/GPU/lib/python2.7/site-packages/torch/nn/modules/batchnorm.pyc in _check_input_dim(self, input)
    167         if input.dim() != 2 and input.dim() != 3:
    168             raise ValueError('expected 2D or 3D input (got {}D input)'
--> 169                              .format(input.dim()))
    170
    171

ValueError: expected 2D or 3D input (got 1D input)

The traceback suggests a problem with the input size.
Indeed, the size of batch.text is [length, 32] (32 = batch size) during training, but the size of the single new example is [length, 1]. I have tried repeating the single example 32 times to match the batch size, but that really looks silly! How can I break the batch-size constraint in an elegant way?

Hi,

Even when you have a single example, it is still a batch of data, just a batch of size 1. Note that in almost all math libraries, rows correspond to observations (samples) and columns correspond to features, so when you pass a single input of length L, the model treats it as L different samples with 1 feature each. To solve this, just add a batch dimension to your input.

The call can be replaced by:

scores = model(preprocess_seq.unsqueeze(0))

Bests

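For reference, a minimal shape check of what unsqueeze(0) does, with hypothetical token ids (as the replies below show, the actual tensor in this thread turns out to be 2-D with shape [length_seq, 1], so this only covers the 1-D case the reply above assumes):

import torch

preprocess_seq = torch.tensor([2, 4, 3, 5])   # hypothetical 1-D sequence of token ids
print(preprocess_seq.shape)                   # torch.Size([4])
print(preprocess_seq.unsqueeze(0).shape)      # torch.Size([1, 4]) -> a batch of size 1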

Thanks for your reply! But it doesn't seem to work. The reason is that the shape of my data follows the format of a dataset loaded by a BucketIterator.

A training batch is a tensor of shape [length_seq, batch_size], where batch_size equals 32, while an input for prediction is a tensor of shape [length_seq, batch_size] with batch_size equal to 1. The number of dimensions matches between training and prediction; only the batch size differs.
When I use preprocess_seq.unsqueeze(0), the shape becomes [1, length_seq, 1], which does not match the input dimensions the model expects.

Previously, the size was [length_seq, 1].
After unsqueezing, it became [1, length_seq, 1].
You should use scores = model(preprocess_seq.reshape(1, -1)).
This will make preprocess_seq of shape [1, length_seq].
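A quick standalone check (again with hypothetical values) of what this reshape does to a [length_seq, 1] tensor:

import torch

preprocess_seq = torch.tensor([[2], [4], [3], [5]])   # shape [4, 1], i.e. [length_seq, 1]
print(preprocess_seq.reshape(1, -1).shape)            # torch.Size([1, 4]), i.e. [1, length_seq]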

Thanks for your suggestion! I tried it, and the output of model(preprocess_seq.reshape(1,-1)) has shape [length_seq, 1]. The length_seq dimension is wrongly recognized as the batch dimension; as a result, the model treats each character in my sequence as a separate sequence.


still waiting for solution :thinking:

If preprocess_seq was previously of shape [length_seq, 1],
then preprocess_seq.reshape(1, -1) will make it of shape [1, length_seq].
How come it changes back to [length_seq, 1]?

Here is the structure of the model:

class RNNModel_GRU(nn.Module):
    def __init__(self, vocab_size, embedding_size, output_size, pad_idx, hidden_size, dropout):
        super(RNNModel_GRU, self).__init__()
        self.embed = nn.Embedding(vocab_size, embedding_size, padding_idx=pad_idx)
        self.gru = nn.GRU(embedding_size, hidden_size, bidirectional=True, num_layers=1)
        self.linear1 = nn.Linear(hidden_size*2, output_size[0])
        self.linear2 = nn.Linear(output_size[0], output_size[1])
        self.b_n = nn.BatchNorm1d(32)
        self.dropout = nn.Dropout(dropout)
        self.relu = nn.ReLU()
        #self.sigmoid = nn.Sigmoid()
        #self.softmax = nn.Softmax(dim=1)

    def forward(self, text):
        embedded = self.embed(text.long()) # [seq_len, batch_size, embedding_size]
        embedded  = self.dropout(embedded)
        output, hidden = self.gru(embedded)
        # output: [seq_len, batch_size, hidden_size * 2] = [seq_len, 32, 128]
        # hidden: [num_directions, batch_size, hidden_size] = [2, 32, 64]
        # (a GRU has no cell state)

        # hidden: 2 * batch_size * hidden_size
        hidden = torch.cat([hidden[0], hidden[1]], dim=1)
        # hidden: 32 * 128
        hidden = self.dropout(hidden.squeeze())
        linear1 = self.linear1(hidden)
        linear1 = self.relu(linear1)
        linear1 = self.dropout(linear1) # [32 * 32]
        b_n = self.b_n(linear1) # 32 * 32
        linear2 = self.linear2(b_n)

        return linear2 # linear2: [batch_size, 1]

The first step of my model is the embedding. The shape after embedding is [seq_len, batch_size, embedding_size], which here comes out as [1, 14, 1]: the original seq_len is recognized as batch_size after the embedding layer.

In [130]: embed(preprocess_seq.reshape(1,-1).long())
Out[130]:
tensor([[[ 0.4355],
         [-0.0817],
         [ 0.2178],
         [ 2.4970],
         [ 2.4970],
         [ 0.2178],
         [-0.0817],
         [ 0.4355],
         [ 0.2178],
         [ 0.2178],
         [-0.0817],
         [-0.0817],
         [-0.0817],
         [-0.0817]]], grad_fn=<EmbeddingBackward>)

In [131]: embed(preprocess_seq.reshape(1,-1).long()).size()
Out[131]: torch.Size([1, 14, 1])
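For comparison, a hypothetical check (not from the original thread) of what the embedding sees when preprocess_seq is left in its original [14, 1] layout instead of being reshaped to [1, 14]:

# Keep preprocess_seq as [length_seq, 1] = [14, 1]: sequence first, batch of 1
embed(preprocess_seq.long()).size()
# torch.Size([14, 1, 1]) -> [seq_len, batch_size, embedding_size], the layout this GRU expects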

@Mengran_Wang Have you found a solution to this issue? I am facing the same problem, and it looks like most of the answers here don't directly give a solution.
A workaround I am using for the moment is to feed my network the single sample repeated batch_size times… Surely there is a better way!?

@Andrew_OBrien @Mengran_Wang Did either of you find a solution for this? I too have been doing the same thing until now, that is, repeating the same sample batch_size times.
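For anyone landing here later, a sketch of one possible way out, based only on the model definition posted above and standard PyTorch behaviour, so treat it as an assumption rather than a confirmed fix. Two things stand out: the unconditional hidden.squeeze() in forward() collapses the batch dimension when the batch size is 1 (which is exactly the 1-D input BatchNorm1d complains about in the traceback), and BatchNorm1d cannot compute batch statistics from a single sample unless the model is in eval mode.

# Sketch, assuming the RNNModel_GRU posted above and an input kept in the
# seq-first [length_seq, 1] layout used during training.

# 1) In forward(), keep the batch dimension: after torch.cat the tensor is
#    already [batch_size, hidden_size * 2], so drop the squeeze():
#        hidden = self.dropout(hidden)   # instead of self.dropout(hidden.squeeze())

# 2) At prediction time, switch to eval mode so BatchNorm1d uses its running
#    statistics and Dropout is disabled:
model.eval()
with torch.no_grad():
    scores = model(preprocess_seq)       # preprocess_seq: [length_seq, 1]

This keeps the batch-of-1 input intact end to end and avoids repeating the sample batch_size times.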