BERT attention mask for PyTorch scaled dot product attention

Hello, how can I pass a BERT attention mask to PyTorch's scaled_dot_product_attention? For example, my q/k/v tensors have shape (32, 8, 64, 128) and my padding mask has shape (32, 64). Is it correct to use unsqueeze and expand to change the mask's shape like this?
mask = mask.unsqueeze(1).expand(-1, 64, -1)     # (32, 64) -> (32, 64, 64)
mask = mask.unsqueeze(1).expand(-1, 8, -1, -1)  # (32, 64, 64) -> (32, 8, 64, 64)
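
For context, here is a minimal self-contained sketch of what I am trying. I am assuming the mask is the usual HuggingFace-style 1/0 padding mask (1 = real token, 0 = padding) and that scaled_dot_product_attention wants a boolean attn_mask broadcastable to (batch, heads, L, S); the names q, k, v and attention_mask are just placeholders I made up for the example:

import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 32, 8, 64, 128

# dummy projections, shape (batch, heads, seq_len, head_dim)
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# HuggingFace-style padding mask: 1 = real token, 0 = padding
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

# my current approach: expand (32, 64) -> (32, 8, 64, 64)
mask = attention_mask.unsqueeze(1).expand(-1, seq_len, -1)  # (32, 64, 64)
mask = mask.unsqueeze(1).expand(-1, heads, -1, -1)          # (32, 8, 64, 64)

# convert to bool so SDPA treats True as "attend" and False as "masked out"
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask.bool())
print(out.shape)  # expected: (32, 8, 64, 128)

I am also not sure whether the expand calls are even needed, or whether something like attention_mask[:, None, None, :].bool() with shape (32, 1, 1, 64) would simply broadcast.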