How to get the batch dimension right in the forward path of a custom layer

I would like to implement a custom layer, but I can’t get the shapes right because of the batch dimension in the forward pass. I tried to follow the way nn.Linear is implemented. What am I missing here?

import numpy as np
import torch as t
import torch.nn as nn
from torch.nn import init

class Time2Vec(nn.Module):

    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.output_dim = output_dim

        self.W = nn.Parameter(t.Tensor(output_dim, output_dim))
        self.B = nn.Parameter(t.Tensor(input_dim, output_dim))
        self.w = nn.Parameter(t.Tensor(1, 1))
        self.b = nn.Parameter(t.Tensor(input_dim, 1))
        self.reset_parameters()

    def reset_parameters(self):
        init.uniform_(self.W, 0, 1)
        init.uniform_(self.B, 0, 1)
        init.uniform_(self.w, 0, 1)
        init.uniform_(self.b, 0, 1)

    def forward(self, x):
        original = self.w * x + self.b
        x = t.repeat_interleave(x, self.output_dim, dim=-1)
        sin_trans = t.sin(t.dot(x, self.W) + self.B)
        return t.cat([sin_trans, original], -1)

And create a module:

class MyModule(nn.Module):

    def __init__(self, input, output):
        super().__init__()
        # the attribute name "t2v" is a placeholder
        self.t2v = Time2Vec(input, output)

    def forward(self, x):
        return self.t2v(x)

t2v = MyModule(3, 3)
print(t2v(t.from_numpy(np.array([[[0.1], [0.2], [0.3]], [[0.1], [0.2], [0.3]]])).float()))

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0

The first operation will use a parameter of shape [1, 1] and an input of [2, 3]:

self.w * x

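So it seems that the feature dimension isn’t even matching: the product broadcasts to [2, 3], but self.b has the shape [input_dim, 1] = [3, 1] and cannot be added to it.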
Could you explain your use case a bit?
self.W has the shape [output_dim, output_dim], which also doesn’t consider the input feature dimension (but isn’t used at all).
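
A minimal sketch of the shapes involved, assuming input_dim = 3 and random placeholder values (the names w, b, x2d and x3d are only for illustration). It shows that the multiplication broadcasts, the addition of the bias fails for a [2, 3] input, and the same expression goes through once the input has a trailing dimension of size 1:

```python
import torch

w = torch.rand(1, 1)        # same shape as self.w
b = torch.rand(3, 1)        # same shape as self.b with input_dim = 3
x2d = torch.rand(2, 3)      # 2D input: [batch, input_dim]

print((w * x2d).shape)      # [1, 1] broadcasts against [2, 3]

try:
    w * x2d + b             # [2, 3] + [3, 1] cannot be broadcast
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

x3d = torch.rand(2, 3, 1)   # 3D input: [batch, input_dim, 1]
out = w * x3d + b           # [3, 1] lines up with the last two dimensions
print(out.shape)
```

Broadcasting aligns shapes from the trailing dimension, which is why the batch dimension has to come first and the bias shape has to match the last dimensions of the input.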

Thx for your help!

Sorry, I had to fix the example; the input is actually a 3D tensor: print(t2v(t.from_numpy(np.array([[[0.1], [0.2], [0.3]], [[0.1], [0.2], [0.3]]])).float())). And it should fail at the sine part: sin_trans = t.sin(t.dot(x, self.W) + self.B)

Sure, the use case is a time embedding. Let me quote the paper mentioned in the comment:

In designing a representation for time, we identify three important properties: 1- capturing both
periodic and non-periodic patterns, 2- being invariant to time rescaling, and 3- being simple enough
so it can be combined with many models. In what follows, we provide more detail on these properties.

We propose Time2Vec, a representation for time which has the three identified properties.
For a given scalar notion of time τ, Time2Vec of τ, denoted as t2v(τ), is a vector of size k + 1
defined as follows:
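
The definition, as I recall it from the paper (treat the exact notation as an assumption: F is a periodic activation function, sine in the paper, and ω_i and φ_i are learnable parameters):

```latex
\mathbf{t2v}(\tau)[i] =
\begin{cases}
  \omega_i \tau + \varphi_i, & \text{if } i = 0, \\
  \mathcal{F}(\omega_i \tau + \varphi_i), & \text{if } 1 \le i \le k.
\end{cases}
```

The i = 0 entry is the linear (non-periodic) term, which corresponds to `original` in the code, and the remaining k entries are the periodic terms, which correspond to `sin_trans`.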

To match the dimension [output, output], we repeat the vector x output times. So we feed in a single vector and get back a 2D matrix. I have another implementation in Keras/TensorFlow, and there it works:

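import tensorflow as tf

class tf_Time2Vec(tf.keras.layers.Layer):

    def __init__(self, output_dim=None, **kwargs):
        super().__init__(**kwargs)
        self.output_dim = output_dim

    def build(self, input_shape):
        # The W and B shapes mirror the PyTorch layer above; the 'uniform'
        # initializer and trainable=True are assumptions.
        self.W = self.add_weight(name='W',
                                 shape=(self.output_dim, self.output_dim),
                                 initializer='uniform',
                                 trainable=True)

        self.B = self.add_weight(name='B',
                                 shape=(input_shape[1].value, self.output_dim),
                                 initializer='uniform',
                                 trainable=True)

        self.w = self.add_weight(name='w',
                                 shape=(1, 1),
                                 initializer='uniform',
                                 trainable=True)

        self.b = self.add_weight(name='b',
                                 shape=(input_shape[1].value, 1),
                                 initializer='uniform',
                                 trainable=True)

        super().build(input_shape)

    def call(self, x, **kwargs):
        K = tf.keras.backend

        original = self.w * x + self.b
        x = K.repeat_elements(x, self.output_dim, -1)
        sin_trans = K.sin(K.dot(x, self.W) + self.B)
        return K.concatenate([sin_trans, original], -1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[1], self.output_dim + 1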

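PyTorch’s .dot function is different from the one in TensorFlow or NumPy: torch.dot only accepts two 1D tensors, so a batched matrix product needs torch.matmul instead.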
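
A small sketch of the difference, assuming a [2, 3, 3] tensor like the one produced by repeat_interleave in the layer above (random values, names chosen only for illustration):

```python
import torch

x = torch.rand(2, 3, 3)   # [batch, input_dim, output_dim]
W = torch.rand(3, 3)      # [output_dim, output_dim]

try:
    torch.dot(x, W)       # torch.dot is defined for 1D tensors only
    dot_failed = False
except RuntimeError as err:
    dot_failed = True
    print(err)

y = torch.matmul(x, W)    # W is applied to every element of the batch
print(y.shape)
```

torch.matmul broadcasts the 2D weight over the leading batch dimension, which is the behaviour K.dot gives in the Keras version.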

I am trying to implement Time2Vec in PyTorch too, and I am running into the same problem. How exactly did you solve it? Thanks.
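
Not necessarily how the original poster solved it, but here is a minimal sketch of a version that runs, assuming the input has the shape [batch, input_dim, 1] and that replacing the dot call with torch.matmul is the only change needed. It uses torch.empty instead of t.Tensor for the uninitialized parameters; everything else follows the layer from the question:

```python
import torch
import torch.nn as nn
from torch.nn import init


class Time2Vec(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.output_dim = output_dim
        self.W = nn.Parameter(torch.empty(output_dim, output_dim))
        self.B = nn.Parameter(torch.empty(input_dim, output_dim))
        self.w = nn.Parameter(torch.empty(1, 1))
        self.b = nn.Parameter(torch.empty(input_dim, 1))
        self.reset_parameters()

    def reset_parameters(self):
        for p in (self.W, self.B, self.w, self.b):
            init.uniform_(p, 0, 1)

    def forward(self, x):
        # x: [batch, input_dim, 1]
        original = self.w * x + self.b                            # [batch, input_dim, 1]
        x = torch.repeat_interleave(x, self.output_dim, dim=-1)   # [batch, input_dim, output_dim]
        sin_trans = torch.sin(torch.matmul(x, self.W) + self.B)   # [batch, input_dim, output_dim]
        return torch.cat([sin_trans, original], -1)               # [batch, input_dim, output_dim + 1]


layer = Time2Vec(3, 3)
x = torch.tensor([[[0.1], [0.2], [0.3]], [[0.1], [0.2], [0.3]]])
out = layer(x)
print(out.shape)
```

The output shape matches what compute_output_shape reports in the Keras version: (batch, input_dim, output_dim + 1).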