Is there any PyTorch function that works like Keras' TimeDistributed?

You can get time-distributed behaviour by iterating over a list of modules (one per timestep) in the forward pass using nn.ModuleList:

# 24 timedistributed fully connected layers, one per timestep
import torch
import torch.nn as nn

num = 24
fc = nn.ModuleList([nn.Linear(8, 1) for i in range(num)])

# forward pass
x = torch.zeros(64, 24, 8)  # (batch, timesteps, features)
outs = []
for i in range(x.shape[1]):
    outs.append(fc[i](x[:, i, :]).unsqueeze(1))
out = torch.cat(outs, dim=1)  # (batch, timesteps, 1)
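Note that Keras' TimeDistributed applies the *same* layer at every timestep, whereas the snippet above uses a separate Linear per step. A minimal shared-layer wrapper might look like this (a sketch; the class name TimeDistributed is my own):

```python
import torch
import torch.nn as nn

class TimeDistributed(nn.Module):
    """Apply the same module independently to every timestep."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        # x: (batch, timesteps, features)
        outs = [self.module(x[:, t, :]) for t in range(x.size(1))]
        return torch.stack(outs, dim=1)  # (batch, timesteps, out_features)

td = TimeDistributed(nn.Linear(8, 1))
y = td(torch.zeros(64, 24, 8))
print(y.shape)  # torch.Size([64, 24, 1])
```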

The repo is no longer available.

The folder structure changed, and it seems to be available here now.


The TimeDistributed wrapper equivalent in TensorFlow states:

This wrapper allows to apply a layer to every temporal slice of an input

The reference you included collapses the batch and timestep dimensions into one.
Are these two operations the same?
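For a pointwise layer like nn.Linear the two operations do agree: collapsing batch and timesteps into one dimension, applying the layer once, and reshaping back matches looping over the timesteps. A quick check (shapes are illustrative):

```python
import torch
import torch.nn as nn

fc = nn.Linear(8, 4)
x = torch.randn(64, 24, 8)  # (samples, timesteps, features)

# loop over timesteps, applying the same layer each time
y_loop = torch.stack([fc(x[:, t, :]) for t in range(x.size(1))], dim=1)

# collapse batch and time, apply once, reshape back
y_flat = fc(x.reshape(-1, x.size(-1))).reshape(x.size(0), x.size(1), -1)

print(torch.allclose(y_loop, y_flat))  # True
```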

Thanks in advance :slight_smile:


I haven’t checked the reference, just posted the updated link as @yongen9696 had trouble accessing the old one.

How about something like this, where we use a separate feed-forward layer for each timestep:

class TimeDistributedFFN(nn.Module):
	def __init__(self, sequence_length, hidden_size):
		super().__init__()
		self.sequence_length = sequence_length
		self.fc_list = nn.ModuleList()
		for j in range(sequence_length):
			self.fc_list.append(nn.Linear(hidden_size, hidden_size))

	def forward(self, x):  # x: (batch, sequence_length, hidden_size)
		lst = []
		for j in range(self.sequence_length):
			lst.append(self.fc_list[j](x[:, j, :]).unsqueeze(1))
		return torch.cat(lst, dim=1)

Awesome! But I think when the original input X has shape (samples, timesteps, output_size) and we need Y in shape (timesteps, samples, output_size), we should use y.transpose after the view; without the transpose, the timesteps get mixed up!

I have fixed the code to

        y = y.contiguous().view(x.size(0), -1, y.size(-1))  # (samples, timesteps, output_size)

        # if timesteps-first output is needed, we have to reshape y
        if not self.batch_first:
            y = y.transpose(0, 1).contiguous()  # transpose to (timesteps, samples, output_size)
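A tiny sanity check of the view-plus-transpose above, using small illustrative sizes so the element ordering is easy to follow:

```python
import torch

samples, timesteps, output_size = 4, 3, 2
# collapsed output, as produced by applying a layer to (samples*timesteps, features)
y = torch.arange(samples * timesteps * output_size, dtype=torch.float32)
y = y.view(samples * timesteps, output_size)

y = y.contiguous().view(samples, -1, y.size(-1))  # (samples, timesteps, output_size)
y_t = y.transpose(0, 1).contiguous()              # (timesteps, samples, output_size)
print(y_t.shape)  # torch.Size([3, 4, 2])
```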


If your model contains BatchNorm, the output may differ between the two approaches. The running mean and variance will be computed over batches of batch_size*time_step samples instead of the actual batch_size.
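A small sketch of that effect (sizes are illustrative): updating BatchNorm1d once on the collapsed (batch_size*time_step, features) tensor leaves different running statistics than updating it per timestep on batches of batch_size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, time_steps, features = 8, 5, 3
# give each timestep a different mean so the difference is visible
x = torch.randn(batch_size, time_steps, features) \
    + torch.arange(time_steps, dtype=torch.float32).view(1, -1, 1)

# collapsed: a single update over batch_size*time_steps samples
bn_flat = nn.BatchNorm1d(features)
bn_flat(x.reshape(-1, features))

# per timestep: one update per step, each over batch_size samples
bn_step = nn.BatchNorm1d(features)
for t in range(time_steps):
    bn_step(x[:, t, :])

print(torch.allclose(bn_flat.running_mean, bn_step.running_mean))  # False
```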