Converting TF Keras code to PyTorch help

I have this simple TensorFlow code; what is the equivalent in PyTorch? I am stuck trying to port it and have run into multiple errors due to the tensor dimensions.
This is the TensorFlow code:

    Bidirectional(GRU(units=50, return_sequences=True)),
    tfa.layers.GroupNormalization(50),
    Dropout(0.2),
    Dense(units=1, activation='sigmoid')

How can I implement the same in PyTorch? I’m stuck at this step:

import torch.nn as nn

class GRU(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRU, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.gru = nn.GRU(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True, bidirectional=True)
        self.gn = nn.GroupNorm(50, hidden_dim)
        self.dr = nn.Dropout(drop_prob)
        self.lin = nn.Linear(input_dim, output_dim)
        self.sig = nn.Sigmoid()

    def forward(self, x, h):
        print(x.shape)
        print(h.shape)
        out, h = self.gru(x, h)
        print(out.shape)
        out = self.gn(out)
        out = self.lin(out)
        out = self.sig(out)
        return out, h

I get this output:

torch.Size([32, 64, 7])
torch.Size([2, 32, 64])
torch.Size([32, 64, 128])

and this error:

     15         out, h = self.gru(x, h)
     16         print(out.shape)
---> 17         out = self.gn(out)
     18         out = self.lin(out)
     19         out = self.sig(out)

Expected number of channels in input to be divisible by num_groups, but got input of shape [32, 64, 128] and num_groups=50

Could you explain how the Keras model is working internally?
In your current code snippet it seems the layer shapes don’t match the GRU output, which would cause the shape mismatch error.
I also assume that units=50 refers to the hidden_size of the GRU?
If so, do you know how the GroupNorm layer is applied in Keras, i.e. which dimension is used as the “channel” dimension?
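For context on the PyTorch side: `nn.GroupNorm` (and `F.group_norm`) treat dim 1 of the input as the channel dimension, so on your GRU output of `[32, 64, 128]` it sees 64 channels, and 64 is not divisible by `num_groups=50`. A small sketch reproducing this (the group count of 4 after the permute is just illustrative, not what Keras uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

out = torch.randn(32, 64, 128)  # [batch_size, seq_len, 2*hidden_dim]

# group_norm normalizes over dim 1, so here it sees seq_len=64 as channels,
# and 64 % 50 != 0 -> the reported error.
try:
    F.group_norm(out, num_groups=50)
except RuntimeError as e:
    print("failed:", e)

# Permuting first makes the feature dim (128) the channel dim; num_groups
# must then divide 128 (4 groups of 32 channels is just an example):
gn = nn.GroupNorm(num_groups=4, num_channels=128)
res = gn(out.permute(0, 2, 1))  # input: [32, 128, 64]
print(res.shape)
```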
Additionally, do you know how the temporal output is passed to the linear layer? In particular:

  • is only the last time step used?
  • are both directions used, or only the forward direction?
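To make these two points concrete: with `batch_first=True` and `bidirectional=True`, the GRU output concatenates both directions along the feature dim, so they can be split and reduced like this (assuming units=50 maps to hidden_size=50; whether Keras does the same is exactly my question):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_dim, hidden_dim = 32, 64, 7, 50
gru = nn.GRU(input_dim, hidden_dim, batch_first=True, bidirectional=True)
x = torch.randn(batch_size, seq_len, input_dim)
out, h = gru(x)  # out: [32, 64, 100] = [batch, seq, 2*hidden_dim]

fwd = out[..., :hidden_dim]   # forward-direction features
bwd = out[..., hidden_dim:]   # backward-direction features
last_fwd = fwd[:, -1]         # forward direction: last step has seen the whole sequence
last_bwd = bwd[:, 0]          # backward direction: step 0 has seen the whole sequence
print(out.shape, last_fwd.shape, last_bwd.shape)
```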

Here is a code snippet with some assumptions, but since I’m not deeply familiar with Keras (and can’t see in the code how the model is working) it might not use the same workflow:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.gru = nn.GRU(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers, batch_first=True, bidirectional=True)
        self.gn = nn.GroupNorm(50, hidden_dim)
        self.dr = nn.Dropout(drop_prob)
        self.lin = nn.Linear(hidden_dim, output_dim)
        self.sig = nn.Sigmoid()
        
    def forward(self, x, h):
        print(x.shape)
        print(h.shape)
        out, h = self.gru(x, h)
        # permute to [batch_size, nb_features, seq_len]
        out = out.permute(0, 2, 1)
        # use only the forward direction
        out = self.gn(out[:, :self.hidden_dim, :])
        # apply dropout after the normalization, as in the Keras stack
        out = self.dr(out)
        # use the last time step
        out = out[:, :, -1]
        out = self.lin(out)
        out = self.sig(out)
        return out, h
    
num_layers = 1
hidden_dim = 50
batch_size, seq_len, nb_features = 32, 64, 7
model = MyModel(input_dim=nb_features, hidden_dim=hidden_dim, output_dim=1, n_layers=num_layers)
x = torch.randn(batch_size, seq_len, nb_features)
h = torch.randn(2*num_layers, batch_size, hidden_dim)
out, h = model(x, h)
print(out.shape)  # torch.Size([32, 1])

Thank you very much for your help!