Hello, I think I am doing the same multiplication in two different ways, but the results do not match.
import torch
# Self-attention with 100 tokens
n_token = 100
c_hidden = 128
query = torch.randn(n_token, c_hidden)
key = torch.randn(n_token, c_hidden)
attn_matrix = torch.matmul(query, key.transpose(0, 1))  # (query_token_index, key_token_index)
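# attn_matrix has shape (n_token, n_token); entry (i, j) is the dot product
# of query row i with key row j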
# Graph framework: here every token is fully connected with every other token (including itself)
edge_index = torch.ones(n_token, n_token).nonzero(as_tuple=True)  # every token is connected to every token
# edge_index = (source_token_index, destination_token_index)
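# for example, with n_token = 2 nonzero() enumerates all 4 pairs in row-major order:
# edge_index[0] = tensor([0, 0, 1, 1])  (source_token_index)
# edge_index[1] = tensor([0, 1, 0, 1])  (destination_token_index)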
query_graph = query[edge_index[0]]  # query rows gathered at source_token_index
key_graph = key[edge_index[1]]  # key rows gathered at destination_token_index
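# both gathered tensors have shape (E, c_hidden), one row per edge, where E = n_token * n_token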
attn_graph_1 = (query_graph * key_graph).sum(dim=-1)
attn_graph_2 = torch.matmul(query_graph[:, None, :], key_graph[:, :, None]).squeeze()
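# shapes in attn_graph_2: (E, 1, c_hidden) @ (E, c_hidden, 1) -> (E, 1, 1), squeezed to (E,)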
diff_1 = torch.abs(attn_matrix[edge_index] - attn_graph_1).sum()
diff_2 = torch.abs(attn_matrix[edge_index] - attn_graph_2).sum()
print(diff_1)  # 0.0157
print(diff_2)  # 0.0089
I have some idea of why 'diff_1' is not 0, but I don't know why 'diff_2' is not 0. Does anyone have an idea?
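In case it helps, here is a sanity check I am planning to try. My guess (not a confirmed explanation) is that both differences come from float32 rounding, since matmul and sum may reduce the 128 products per dot product in different orders; if that is right, repeating the comparison in float64 should shrink the differences by several orders of magnitude.

# assumption: the mismatch is float32 rounding from different reduction orders
query64 = query.double()
key64 = key.double()
attn_matrix64 = torch.matmul(query64, key64.transpose(0, 1))
attn_graph64 = (query64[edge_index[0]] * key64[edge_index[1]]).sum(dim=-1)
print(torch.abs(attn_matrix64[edge_index] - attn_graph64).sum())
# if the rounding explanation holds, this should be far smaller than diff_1 / diff_2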