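You can get TimeDistributed-like behaviour by iterating over the modules of an nn.ModuleList in the forward pass, one module per timestep. Note that each timestep then gets its own nn.Linear with separate weights.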

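```python
import torch
import torch.nn as nn

# 24 fc layers, one per timestep ("time distributed")
num = 24
fc = nn.ModuleList([nn.Linear(8, 1) for i in range(num)])

# forward pass
x = torch.zeros(64, 24, 8)  # (batch, timesteps, features)
outs = []
for i in range(x.shape[1]):
    # apply the i-th layer to timestep i: (64, 1, 8) -> (64, 1, 1)
    outs.append(fc[i](x[:, i, :].unsqueeze(1)))
outs = torch.cat(outs, dim=1)  # (64, 24, 1)
```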
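Because every timestep has its own layer, the loop can also be written as one batched contraction. The following is a minimal sketch, assuming random inputs instead of zeros so the comparison is meaningful; `W`, `b`, `loop_out` and `vec_out` are names introduced here for illustration. It stacks the per-timestep weights and biases and checks that `torch.einsum` gives the same result as the loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, timesteps, in_features = 64, 24, 8

# One independent Linear per timestep, as in the loop above
fc = nn.ModuleList([nn.Linear(in_features, 1) for _ in range(timesteps)])
x = torch.randn(batch, timesteps, in_features)

# Loop version
loop_out = torch.cat([fc[i](x[:, i, :].unsqueeze(1)) for i in range(timesteps)], dim=1)

# Vectorized version: stack weights/biases and contract over the feature axis
W = torch.stack([layer.weight for layer in fc])   # (timesteps, 1, in_features)
b = torch.stack([layer.bias for layer in fc])     # (timesteps, 1)
vec_out = torch.einsum("bti,toi->bto", x, W) + b  # (batch, timesteps, 1)

print(loop_out.shape)  # torch.Size([64, 24, 1])
print(torch.allclose(loop_out, vec_out, atol=1e-5))

# For comparison: a single shared layer for all timesteps (Keras-style
# TimeDistributed(Dense)). nn.Linear already acts on the last dimension.
shared = nn.Linear(in_features, 1)
print(shared(x).shape)  # torch.Size([64, 24, 1])
```

The per-timestep version has `timesteps` times as many parameters as the shared one; which one you want depends on whether the timesteps should be treated identically.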

Awesome! But I think that when the original input x has shape (samples, timesteps, input_size) and we need Y in shape (timesteps, samples, output_size), we should call y.transpose after the view. Without the transpose, viewing the flattened output directly as (timesteps, samples, ...) mixes up the timesteps!

```python
y = y.contiguous().view(x.size(0), -1, y.size(-1))  # (samples, timesteps, output_size)
# If timesteps must come first, transpose Y instead of re-viewing it
if not self.batch_first:
    y = y.transpose(0, 1).contiguous()  # transpose to (timesteps, samples, output_size)
```
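To see the difference, here is a minimal sketch of a whole `TimeDistributed` wrapper built around the snippet above. It is an illustrative assumption, not the original poster's exact module: it assumes the input is always (samples, timesteps, input_size) and uses one shared `module` for every timestep. It compares the transpose version and a plain `view` against applying the layer to each timestep separately.

```python
import torch
import torch.nn as nn


class TimeDistributed(nn.Module):
    """Hypothetical sketch: apply one shared `module` to every timestep.

    Assumes x has shape (samples, timesteps, input_size).
    """

    def __init__(self, module: nn.Module, batch_first: bool = False):
        super().__init__()
        self.module = module
        self.batch_first = batch_first

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() <= 2:
            return self.module(x)
        # Squash samples and timesteps: (samples * timesteps, input_size)
        y = self.module(x.contiguous().view(-1, x.size(-1)))
        # Rows are still in sample-major order, so view back as (samples, timesteps, output_size)
        y = y.contiguous().view(x.size(0), -1, y.size(-1))
        if not self.batch_first:
            # Swap axes rather than re-viewing, so timesteps are not mixed
            y = y.transpose(0, 1).contiguous()  # (timesteps, samples, output_size)
        return y


torch.manual_seed(0)
x = torch.randn(4, 3, 8)  # (samples=4, timesteps=3, input_size=8)
td = TimeDistributed(nn.Linear(8, 2), batch_first=False)
y = td(x)
print(y.shape)  # torch.Size([3, 4, 2])

# Reference: apply the same Linear to each timestep on its own
ref = torch.stack([td.module(x[:, t, :]) for t in range(x.size(1))])  # (3, 4, 2)
print(torch.allclose(y, ref, atol=1e-6))

# Viewing the flat output straight to (timesteps, samples, ...) scrambles rows
wrong = td.module(x.reshape(-1, 8)).view(x.size(1), x.size(0), -1)
print(torch.allclose(wrong, ref, atol=1e-6))
```

The plain `view` only reinterprets memory, so row `t * samples + s` of the flat output (which actually belongs to sample `(t * samples + s) // timesteps`) lands at position `[t, s]`; `transpose` moves the axes themselves.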

If your model contains BatchNorm, the output may be different: the batch statistics (and hence the running mean and variance) are computed over a batch of batch_size * time_steps rows instead of the actual batch_size.
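A minimal sketch of that effect, assuming a `BatchNorm1d` applied TimeDistributed-style to the flattened (samples * timesteps, features) tensor. `momentum=None` is used only so that, after a single training step, `running_mean` equals the batch mean exactly, which makes the check easy to read.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
samples, timesteps, features = 2, 5, 3
x = torch.randn(samples, timesteps, features)

# momentum=None -> cumulative average; after one batch it equals the batch statistics
bn = nn.BatchNorm1d(features, momentum=None)
bn.train()

# TimeDistributed-style: squash (samples, timesteps) into one batch axis
flat = x.reshape(-1, features)  # (samples * timesteps, features) = (10, 3)
out = bn(flat).view(samples, timesteps, features)

# Statistics come from all 10 rows, not just the 2 samples
print(torch.allclose(bn.running_mean, flat.mean(dim=0), atol=1e-6))
print(torch.allclose(bn.running_var, flat.var(dim=0), atol=1e-6))  # unbiased variance
```

In training mode the output itself is also normalized with the mean and (biased) variance of those 10 rows, so a model trained with a different effective batch size can behave differently.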