Hello all,
I implemented a Transformer model and I want to apply a mask to the score table.
I implemented the mask in 3 different ways that should represent exactly the same mask, yet I get 3 different results.
The score table has dimension (time_len * joint_num) x (time_len * joint_num), and the mask zeroes every joint_num x joint_num block on the diagonal:
Way 1:
t_mask = torch.ones(time_len * joint_num, time_len * joint_num)
filtered_area = torch.zeros(joint_num, joint_num)
for i in range(time_len):
    row_begin = i * joint_num
    column_begin = row_begin
    row_num = joint_num
    column_num = row_num
    # zero the diagonal block by multiplying with a (joint_num, joint_num) zero tensor
    t_mask[row_begin: row_begin + row_num, column_begin: column_begin + column_num] *= filtered_area
Way 2:
t_mask = torch.ones(time_len * joint_num, time_len * joint_num)
for i in range(time_len):
    row_begin = i * joint_num
    column_begin = row_begin
    row_num = joint_num
    column_num = row_num
    # zero the diagonal block by multiplying with a Python float
    t_mask[row_begin: row_begin + row_num, column_begin: column_begin + column_num] *= 0.0
Way 3:
t_mask = torch.ones(time_len * joint_num, time_len * joint_num)
for i in range(time_len):
    row_begin = i * joint_num
    column_begin = row_begin
    row_num = joint_num
    column_num = row_num
    # zero the diagonal block by multiplying with a 0-dim tensor
    t_mask[row_begin: row_begin + row_num, column_begin: column_begin + column_num] *= torch.tensor(0.0)
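To check whether the three constructions really produce the same mask, here is a minimal sketch that builds all three and compares them element by element. The sizes (time_len = 3, joint_num = 4), the helper build_mask, and the loop-free reference mask_ref are assumptions made up for this check; they are not taken from my actual model.

```python
import torch

# Example sizes chosen only for illustration (assumed, not from the real model).
time_len, joint_num = 3, 4
size = time_len * joint_num


def build_mask(zero_block):
    """Hypothetical helper: zero each joint_num x joint_num diagonal block
    by multiplying it in place with zero_block."""
    mask = torch.ones(size, size)
    for i in range(time_len):
        begin = i * joint_num
        end = begin + joint_num
        mask[begin:end, begin:end] *= zero_block
    return mask


mask_way1 = build_mask(torch.zeros(joint_num, joint_num))  # Way 1
mask_way2 = build_mask(0.0)                                # Way 2
mask_way3 = build_mask(torch.tensor(0.0))                  # Way 3

# Loop-free reference: entry (r, c) is 0 when r and c fall in the same time step.
frame = torch.arange(time_len).repeat_interleave(joint_num)
mask_ref = (frame.unsqueeze(0) != frame.unsqueeze(1)).float()

print(torch.equal(mask_way1, mask_way2))
print(torch.equal(mask_way1, mask_way3))
print(torch.equal(mask_way1, mask_ref))
```

If all three comparisons report equality, the masks themselves are probably not the source of the difference, and something else in the training run (for example random initialization or data shuffling) may differ between runs.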
The accuracy with Way 1 is better than the accuracy with Way 2 and Way 3.
Why does this happen?
torch version: 1.7.0