Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens:

I am using the pre-trained google/bigbird-pegasus-large-arxiv model.

But I receive the following warning during the forward pass:

Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3.Changing attention type to 'original_full'...
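
For reference, this is how I read the threshold in that message, written out as a small sketch (the helper name is mine, not from the library), using the terms in the warning:

```python
def min_len_for_block_sparse(block_size: int, num_random_blocks: int) -> int:
    """Minimum sequence length needed before BigBird stops falling back
    from 'block_sparse' to 'original_full', per the warning above."""
    global_tokens = 2 * block_size                   # 2 * config.block_size
    sliding_tokens = 3 * block_size                  # min. num sliding tokens
    random_tokens = num_random_blocks * block_size   # config.num_random_blocks * config.block_size
    buffer_tokens = num_random_blocks * block_size   # additional buffer
    return global_tokens + sliding_tokens + random_tokens + buffer_tokens

# With the model's defaults (block_size=64, num_random_blocks=3) this gives 704,
# so my 458-token input triggers the fallback to 'original_full'.
print(min_len_for_block_sparse(64, 3))  # 704
```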

I understand the warning, and I am aware of the time and memory savings that block_sparse attention provides over original_full.

So, how should I go about selecting a suitable block_size and num_random_blocks when I know there is a lot of variation in the sequence lengths of my inputs?
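
For context, this is roughly how I am loading the model. As far as I understand, block_size and num_random_blocks can be overridden at load time via config kwargs (the specific values below are just an example; with block_size=32 and num_random_blocks=2 the threshold would drop to (5 + 2*2) * 32 = 288 tokens, so my 458-token inputs would still use block_sparse):

```python
from transformers import BigBirdPegasusForConditionalGeneration

# Override the sparse-attention settings when loading the pre-trained checkpoint.
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-arxiv",
    block_size=32,          # default is 64
    num_random_blocks=2,    # default is 3
    attention_type="block_sparse",
)
```

But I am unsure whether shrinking these values just to keep block_sparse active on shorter inputs hurts the quality of the pre-trained model, which is why I am asking how to choose them.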