PyTorch: different results when using `torch.matmul` and a for-loop to pass input through linear layers

Please be aware that the two operations are not the same. The second one is equivalent to passing the input through a linear layer; the one you posted is not.
Check the difference between `heads_q_res` and the stacked result: if the two agree to about 10^-8, that should be fine.
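As a rough sketch of the comparison, the snippet below builds a few linear layers, runs the input through them in a for-loop, and then reproduces the same result with a single `torch.matmul`. All names and shapes here (`heads`, `loop_res`, `matmul_res`, `n_heads`, and so on) are made up for illustration and are not taken from the question. The point it assumes is that `torch.nn.Linear` computes `x @ W.T + b`, so a plain matmul against the stacked weights only matches once the weights are transposed and the bias is added.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes, not taken from the question.
batch, d_in, d_out, n_heads = 4, 8, 5, 3
x = torch.randn(batch, d_in)

# Hypothetical per-head linear layers.
heads = [torch.nn.Linear(d_in, d_out) for _ in range(n_heads)]

with torch.no_grad():
    # For-loop: pass the input through each linear layer in turn.
    loop_res = torch.stack([head(x) for head in heads])  # (n_heads, batch, d_out)

    # Batched version: Linear computes x @ W.T + b, so the stacked
    # weights are transposed and the bias is added explicitly.
    W = torch.stack([head.weight for head in heads])  # (n_heads, d_out, d_in)
    b = torch.stack([head.bias for head in heads])    # (n_heads, d_out)
    matmul_res = torch.matmul(x, W.transpose(1, 2)) + b.unsqueeze(1)

    # Largest element-wise deviation between the two results.
    max_diff = (loop_res - matmul_res).abs().max().item()

print(f"max abs difference: {max_diff:.2e}")
print(torch.allclose(loop_res, matmul_res, atol=1e-5))
```

If the two paths are mathematically the same, any remaining difference should be at the level of floating-point rounding. A much larger difference usually means the matmul version is missing the transpose or the bias.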