Hello, I am having difficulties with batch processing. Here are the details:

I extracted BERT embeddings for my text data with shape (batch_size, sequence_length, embed_dim); to be specific, for a batch of 16 the shape is (16, 512, 768). I passed this through a linear layer (768, 300) to reduce the embedding dimension to 300, so the result has shape (16, 512, 300). I reduced the dimension to 300 because I want to do element-wise multiplication with another embedding of dimension 300. Now I want an output of shape (16, 1), that is, one value for each sentence. I used out=nn.Linear(300, 1) as the output layer, but the output has shape (16, 512, 1), whereas I need (16, 1): one score per sentence.
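The shapes described above can be reproduced with a minimal sketch. It assumes a random tensor as a stand-in for the real BERT output, and the names `bert_out`, `proj`, and `projected` are made up for illustration. It shows that `nn.Linear` only transforms the last dimension, so the sequence dimension of 512 is carried through unchanged.

```python
import torch
import torch.nn as nn

batch_size, seq_len, bert_dim, proj_dim = 16, 512, 768, 300

# Stand-in for the BERT output (assumption: random values, only the shape matters).
bert_out = torch.randn(batch_size, seq_len, bert_dim)

proj = nn.Linear(bert_dim, proj_dim)  # 768 -> 300
out = nn.Linear(proj_dim, 1)          # 300 -> 1

projected = proj(bert_out)  # shape (16, 512, 300)
scores = out(projected)     # shape (16, 512, 1): one value per token, not per sentence

print(projected.shape)
print(scores.shape)
```
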

The code is as follows.

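```
import torch
import torch.nn as nn


class Gate(nn.Module):
    def __init__(self, input_dim):
        super(Gate, self).__init__()
        self.input_dim = input_dim
        # Learnable per-dimension gate logits. torch.Tensor(n) would leave the
        # values uninitialized, so start from zeros (sigmoid(0) = 0.5).
        self.weights = nn.Parameter(torch.zeros(self.input_dim), requires_grad=True)
        # nn.init.xavier_uniform_(self.weights)
        self.sigmoid = nn.Sigmoid()
        self.out = nn.Linear(input_dim, 1)

    def forward(self, x1, x2):
        # x1, x2: (batch_size, sequence_length, input_dim)
        gate_vec = self.sigmoid(self.weights)
        gating_f = (gate_vec * x1) + (1.0 - gate_vec) * x2
        # nn.Linear acts on the last dimension only: (16, 512, 300) -> (16, 512, 1)
        output = self.out(gating_f)
        return output
```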
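One possible way to get a (16, 1) output is to collapse the sequence dimension before the output layer. The sketch below assumes that averaging over the 512 token positions is acceptable for the task. `PooledGate` is a hypothetical name, the inputs are random stand-ins, and the gate weights are initialized to zeros as an assumption. Other reductions may suit the task better, such as taking only the first ([CLS]) token or using a mean that ignores padding positions.

```python
import torch
import torch.nn as nn


class PooledGate(nn.Module):
    """Hypothetical variant of Gate that returns one score per sentence."""

    def __init__(self, input_dim):
        super().__init__()
        # Assumption: zero-initialized gate logits, so the gate starts at 0.5.
        self.weights = nn.Parameter(torch.zeros(input_dim))
        self.out = nn.Linear(input_dim, 1)

    def forward(self, x1, x2):
        gate_vec = torch.sigmoid(self.weights)             # (input_dim,)
        gating_f = gate_vec * x1 + (1.0 - gate_vec) * x2   # (batch, seq_len, input_dim)
        pooled = gating_f.mean(dim=1)                      # (batch, input_dim)
        return self.out(pooled)                            # (batch, 1)


# Random stand-ins for the two 300-dimensional embeddings.
x1 = torch.randn(16, 512, 300)
x2 = torch.randn(16, 512, 300)
scores = PooledGate(300)(x1, x2)
print(scores.shape)
```

Pooling before `self.out` keeps the gating per token while the final linear layer sees a single vector per sentence. A plain mean also averages over padding tokens, which may or may not matter depending on how the inputs are padded.
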

Please share your suggestions. I highly appreciate your input. Thank you very much!