Batch processing

Hello, I am having difficulty with batch processing. Here are the details:
I extracted embeddings from BERT for my text data, with shape (batch_size, sequence_length, embed_dim); specifically, for a batch of 16, it is (16, 512, 768). I passed this through a linear layer (768, 300) to reduce the dimension to 300, so the result has shape (16, 512, 300). I reduced the dimension to 300 because I want to do element-wise multiplication with another embedding of dim 300. Now I want an output of shape (16, 1), that is, one value per sentence. I used out=nn.Linear(300, 1) as the output layer, but I get an output of shape (16, 512, 1), whereas I need (16, 1): one score for each sentence.
The code is below.

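import torch
import torch.nn as nn


class Gate(nn.Module):
    def __init__(self, input_dim):
        super(Gate, self).__init__()
        self.input_dim = input_dim
        # One learnable gate value per embedding dimension
        self.weights = nn.Parameter(torch.Tensor(self.input_dim), requires_grad=True)
        # nn.init.xavier_uniform_(self.weights)
        self.sigmoid = nn.Sigmoid()
        self.out = nn.Linear(input_dim, 1)

    def forward(self, x1, x2):
        # gate_vec has shape (input_dim,) and broadcasts over the last axis
        gate_vec = self.sigmoid(self.weights)
        gating_f = (gate_vec * x1) + (1.0 - gate_vec) * x2
        # nn.Linear acts on the last axis only: (16, 512, 300) -> (16, 512, 1)
        output = self.out(gating_f)
        return output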

Please provide your suggestion. I highly appreciate your inputs and thank you very much!

At which level do you want the 'gated effect'?
In your current implementation, the 'gated effect' is applied along the word embedding dimension, not along the sequence.
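
To make the shapes concrete, here is a minimal sketch, assuming random tensors as stand-ins for the two 300-dimensional embeddings and a zero-initialized gate (both are assumptions for illustration, not your actual data). The gate vector of shape (300,) broadcasts against the last axis, so every token shares the same per-dimension gate. nn.Linear(300, 1) also acts only on the last axis, which is why the sequence axis of 512 survives.

```python
import torch
import torch.nn as nn

# Stand-ins for the two embeddings (assumed random data, not real BERT output)
x1 = torch.rand(16, 512, 300)
x2 = torch.rand(16, 512, 300)

# One gate value per embedding dimension; zeros give sigmoid(0) = 0.5 everywhere
gate_vec = torch.sigmoid(torch.zeros(300))

# (300,) broadcasts over the last axis of (16, 512, 300)
gating_f = gate_vec * x1 + (1.0 - gate_vec) * x2

# nn.Linear transforms only the last axis: 300 -> 1
out_layer = nn.Linear(300, 1)
output = out_layer(gating_f)

print(gating_f.shape)  # torch.Size([16, 512, 300])
print(output.shape)    # torch.Size([16, 512, 1])
```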

If you want to select some words in the sentence (sequence) with the gate function, the implementation should look like this.

import torch
import einops


class Gate(torch.nn.Module):
    def __init__(self, axis_dim=512, axis=1):
        super(Gate, self).__init__()
        self.axis_dim = axis_dim
        self.axis = axis
        self.weights = torch.nn.Parameter(torch.Tensor(self.axis_dim), requires_grad=True)
        # torch.nn.init.xavier_uniform_(self.weights)
        self.sigmoid = torch.nn.Sigmoid()
        self.out = torch.nn.Linear(self.axis_dim, 1)

    def forward(self, x1, x2):
        print(f'[DEBUG] x1.shape: {x1.shape}, x2.shape: {x2.shape}, self.axis: {self.axis}, self.axis_dim: {self.axis_dim}')
        assert x1.size(self.axis) == self.axis_dim
        assert x2.size(self.axis) == self.axis_dim
        gate_vec = self.sigmoid(self.weights)
        x1 = einops.rearrange(x1, 'b s e -> b e s')
        x2 = einops.rearrange(x2, 'b s e -> b e s')
        gating_f = (gate_vec * x1) + (1.0 - gate_vec) * x2
        output = self.out(gating_f).squeeze(-1)  # drop only the trailing size-1 axis
        return output


if __name__ == '__main__':
    gate = Gate()
    x1 = torch.rand(16, 512, 568)
    print(f'[DEBUG] x1: {x1.shape}, {x1.min()}, {x1.max()}')
    x2 = torch.rand(16, 512, 568)
    print(f'[DEBUG] x2: {x2.shape}, {x2.min()}, {x2.max()}')
    out = gate(x1, x2)
    print(f'[DEBUG] out: {out.shape}, {out.min()}, {out.max()}')

This gives the following output log:

# python test_gate.py 
# python ebubu.py 
[DEBUG] x1: torch.Size([16, 512, 568]), 5.364418029785156e-07, 0.9999998211860657
[DEBUG] x2: torch.Size([16, 512, 568]), 0.0, 0.9999999403953552
[DEBUG] x1.shape: torch.Size([16, 512, 568]), x2.shape: torch.Size([16, 512, 568]), self.axis: 1, self.axis_dim: 512
[DEBUG] out: torch.Size([16, 568]), -0.5037943124771118, 0.500106394290924

Hi Sejong Yang,
Yes, I wanted the gate effect on the word embeddings. Thank you very much for the debugging and the explanation. I appreciate it.

Regards,

Hi Kings, actually, in my output I want an array whose length is the batch size. For example, out.shape=[16, 1], and if I flatten this, it gives me a one-dimensional array of 16 values. How do I do this? Do I have to pass the output to another linear layer of size (568, 1)? Thanks
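
Two possible ways to get from (16, 568) to (16, 1) are sketched below. This assumes that either a learned projection or a plain mean over the remaining axis is acceptable for the task. The tensor gate_out is a random stand-in for the Gate output, and the name score_head is hypothetical.

```python
import torch
import torch.nn as nn

# Random stand-in for the (16, 568) output of the Gate module
gate_out = torch.rand(16, 568)

# Option 1: an extra learned layer mapping 568 -> 1 (hypothetical score_head)
score_head = nn.Linear(568, 1)
scores = score_head(gate_out)  # shape (16, 1)
flat_scores = scores.flatten()  # shape (16,)

# Option 2: no extra parameters, average over the remaining axis
pooled = gate_out.mean(dim=1, keepdim=True)  # shape (16, 1)

print(scores.shape, flat_scores.shape, pooled.shape)
```

The same pooling idea applies to the original (16, 512, 1) output: taking the mean over dim=1 would give (16, 1), at the cost of weighting every token equally.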