Please be aware that the two operations are not the same. The second one is equivalent to passing through a linear layer; the one you posted is not.
To check, look at (heads_q_res -_stacked): agreement to about 1e-8 precision should be fine.
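For illustration only, here is a minimal NumPy sketch of that kind of check. The original two operations are not shown in this thread, so the setup is an assumption: per-head query projections are concatenated into `heads_q_res` and compared against a single linear layer whose weight is the per-head weights stacked side by side. The names `w_heads`, `w_stacked`, `q_stacked` and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the example.
batch, d_model, n_heads, d_head = 4, 16, 2, 8

x = rng.standard_normal((batch, d_model))                   # float64 input
w_heads = rng.standard_normal((n_heads, d_model, d_head))   # one weight per head

# Per-head version: project with each head's weight, then concatenate features.
heads_q_res = np.concatenate([x @ w_heads[h] for h in range(n_heads)], axis=-1)

# Single linear layer: stack the per-head weights along the output dimension.
w_stacked = np.concatenate([w_heads[h] for h in range(n_heads)], axis=-1)
q_stacked = x @ w_stacked

# Element-wise difference; the largest absolute entry is the number to look at.
max_abs_diff = np.abs(heads_q_res - q_stacked).max()
print("max abs diff:", max_abs_diff)
print("within 1e-8:", max_abs_diff < 1e-8)
```

If the two results come from operations that really are equivalent, the difference should be floating-point noise only. A difference well above 1e-8 in float64 suggests the operations are not the same (in float32 a looser tolerance would probably be needed).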