PyTorch implementation of a TensorFlow NLP example

Novice here. I am trying to implement a ConvNet on top of Hugging Face bert-base-uncased, working from a TensorFlow example. The relevant part of the original model summary looks like this:

    tf_roberta_model (TFRobertaMode ((None, 96, 768), (N   124645632   input_1[0][0]
    dropout_38 (Dropout)            (None, 96, 768)        0           tf_roberta_model[0][0]
    conv1d (Conv1D)                 (None, 96, 128)        196736      dropout_38[0][0]

I have a couple questions:

  1. How can an input of shape (None, 96, 768) produce an output of shape (None, 96, 128) after applying tf.keras.layers.Conv1D(128, 2, padding='same')(x1)?
  2. I'm struggling to implement the same layer in PyTorch. So far I got it to run (although it does no good) with nn.Conv1d(in_channels=128, out_channels=1, kernel_size=2). What am I doing wrong?

Thanks in advance!!!

  1. Assuming that TF uses the shape [batch_size=None, sequence_length=96, channels=768], the specified conv layer keeps the temporal size (because of padding='same') and produces 128 output channels, which corresponds to [None, 96, 128].
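
     The shape arithmetic can be checked with the standard 1-D conv output-length formula; here is a minimal pure-Python sketch using the 96/768/128/kernel-2 numbers from the summary above (the variable names are just illustrative):

     ```python
     def conv1d_out_len(length, kernel_size, stride=1, padding=0):
         # Standard 1-D convolution output-length formula
         return (length + 2 * padding - kernel_size) // stride + 1

     seq_len, in_channels, filters, kernel = 96, 768, 128, 2

     # padding='same' pads so the output length equals the input length.
     # For kernel=2, stride=1 that means one extra padded position in total
     # (Keras distributes the asymmetric padding internally):
     out_len = (seq_len + 1 - kernel) // 1 + 1   # (96 + 1 - 2) // 1 + 1 = 96

     # Channels: each Conv1D filter consumes all 768 input channels and
     # emits one output channel, so 128 filters -> 128 output channels.
     # The parameter count also matches the summary:
     params = in_channels * kernel * filters + filters  # weights + biases
     print((None, out_len, filters))  # -> (None, 96, 128)
     print(params)                    # -> 196736, as in the summary
     ```

     The 196736 parameter count lining up with the printed summary is a good sanity check that the [batch, length, channels] interpretation is the right one.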

  2. The input channels should be 768 (the hidden size), not 128, and the output channels should be 128, not 1. Also, for a kernel size of 2 you would need to add padding to keep the same sequence length. Note as well that nn.Conv1d expects input of shape [batch, channels, length], so the BERT output would need to be permuted before the conv.
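
     A possible sketch of that layer in PyTorch, assuming the [batch, seq_len, hidden] layout above (the batch size of 4 and random input are just placeholders); since the kernel size is even, 'same' behavior is emulated here with explicit padding plus a trim:

     ```python
     import torch
     import torch.nn as nn

     conv = nn.Conv1d(in_channels=768, out_channels=128, kernel_size=2,
                      padding=1)          # pads one step on each side

     x = torch.randn(4, 96, 768)          # [batch, seq_len, hidden] as BERT emits
     x = x.permute(0, 2, 1)               # Conv1d wants [batch, channels, length]
     y = conv(x)                          # -> [4, 128, 97]: one step too long,
                                          # since kernel=2 only needs 1 total pad
     y = y[..., :-1]                      # trim to mimic Keras padding='same'
     y = y.permute(0, 2, 1)               # back to [batch, seq_len, channels]
     print(y.shape)                       # torch.Size([4, 96, 128])
     ```

     Recent PyTorch versions also accept `padding='same'` directly in nn.Conv1d for stride-1 convolutions, which avoids the manual trim.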