Differences between multi-dimensional input to the nn.Linear layer and iterative forwarding

I recently realized that nn.Linear can handle multi-dimensional tensors.
However, it seems that I do not fully understand the computation process in the linear layer (or the underlying matrix multiplication).

In the example code above, I have 3 * 100 = 300 512-dimensional vectors.
result1 is the result of forwarding the multi-dimensional tensor through the linear layer directly.
result2 is the result of iteratively (100 times) forwarding a (3 x 512) tensor through the linear layer and stacking the outputs.
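Since the original snippet is not shown here, a minimal sketch of the setup described is below; the output dimension of 256 and the (100, 3, 512) shape are assumptions based on the discussion:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes: 100 steps of (3, 512) inputs, output dim 256.
fc_layer = nn.Linear(512, 256)
rand_vector = torch.randn(100, 3, 512)

# result1: forward the full (100, 3, 512) tensor at once.
result1 = fc_layer(rand_vector)

# result2: forward each (3, 512) slice separately, then stack.
result2 = torch.stack([fc_layer(rand_vector[i]) for i in range(100)])

# The two results agree only up to floating point precision.
print(result1.shape, result2.shape)
print((result1 - result2).abs().max())
```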

The two results are similar but not identical, as you can see in the output of the code.
Why do the two computations yield different results?

The difference is most likely caused by the limited floating point precision, and your code comparing these use cases looks correct. :slight_smile:
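As a small standalone illustration (not from the original thread) of why this happens: float addition is not associative, so performing the same reduction in a different order can change the last bits of the result.

```python
import torch

torch.manual_seed(0)
a = torch.rand(10000)

# Sum the same values in forward and reversed order.
s1 = a.sum()
s2 = a.flip(0).sum()

# The two sums can differ slightly because the accumulation
# order differs, even though the inputs are identical.
print(s1.item(), s2.item(), (s1 - s2).abs().item())
```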


Hi ptrblck, if the 100 here is assumed to be the time step, then what fc_layer(rand_vector) is doing is the same as TimeDistributed(Dense(256)) in TensorFlow, right?
I am actually referring to this problem: https://stackoverflow.com/questions/61372645/how-to-implement-time-distributed-dense-tdd-layer-in-pytorch. It seems to me that what nn.Linear() does is already time-distributed dense, so why are people still looking for a time-distributed dense function? I would appreciate your correction. Thanks a lot.

I’m not sure how the TDD layer works in TensorFlow, but as given in the code snippet, the linear layer will be applied to all samples in dim1 separately.
If that’s what TDD does, then it should be equal. A verification with some constant inputs would be nice, in case you have a TF installation :wink: