Unsupported comparison operator: Transformer Model Export (MultiHeadAttention Layer)

I am trying to export a Transformer model to TorchScript. When creating a ModuleList of MultiheadAttention layers, the following error is generated. The relevant code is attached below.

File "/anaconda3/lib/python3.7/site-packages/torch/jit/frontend.py", line 505, in build_Compare
    raise NotSupportedError(err_range, "unsupported comparison operator: " + op.__name__)
torch.jit.frontend.NotSupportedError: unsupported comparison operator: In
    kv_same = key.data_ptr() == value.data_ptr()

    tgt_len, bsz, embed_dim = query.size()
    assert embed_dim == self.embed_dim
    assert list(query.size()) == [tgt_len, bsz, embed_dim]
    assert key.size() == value.size()

    if incremental_state is not None:
        saved_state = self._get_input_buffer(incremental_state)
        if 'prev_key' in saved_state:
            ~~~~~~~~~~~~~ <--- HERE

CODE:

__constants__ = ['attentions', 'causal', 'layers_module']

def __init__(self, <parameters>):
    att_modules = []
    for _ in range(num_layers):
        att_modules.append(nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout))
    self.attentions = nn.ModuleList(att_modules)

If I create an empty ModuleList and then append the MultiheadAttention layers, there is no error in non-script mode; but since the ModuleList has to be a constant in script mode, that route is blocked as well. The error is generated in the constructor itself, not in the forward method, as confirmed while debugging.
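
For reference, here is a minimal self-contained sketch of the pattern I described (the class name, constructor arguments, and forward body are illustrative placeholders for my actual model; the ModuleList of nn.MultiheadAttention layers is the part in question):

import torch
import torch.nn as nn

# Illustrative repro sketch: the ModuleList is built fully in __init__ and
# listed in __constants__ so that script mode can unroll the loop over it.
class AttentionStack(torch.jit.ScriptModule):
    __constants__ = ['attentions']

    def __init__(self, embed_dim, num_heads, num_layers, dropout=0.0):
        super().__init__()
        att_modules = []
        for _ in range(num_layers):
            att_modules.append(nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout))
        self.attentions = nn.ModuleList(att_modules)

    @torch.jit.script_method
    def forward(self, x):
        # self-attention over x through every layer in the stack
        for attn in self.attentions:
            x, _ = attn(x, x, x)
        return x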

Support for 'in' was recently added; could you try using pytorch-nightly and see if that fixes this issue?
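
Something like this tiny scripted function (just a sketch mirroring the 'prev_key' in saved_state line from your traceback) should compile on nightly if the fix covers your case:

import torch
from typing import Dict

@torch.jit.script
def has_prev_key(saved_state: Dict[str, torch.Tensor]) -> bool:
    # membership test on a dict, the construct the error points at
    return 'prev_key' in saved_state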

@driazati Yup, that issue is resolved, but it now fails at this stage:

if hasattr(self, '_qkv_same_embed_dim') and self._qkv_same_embed_dim is False:
           ~~~~~~~ <--- HERE
            return F.multi_head_attention_forward(
                query, key, value, self.embed_dim, self.num_heads,
                self.in_proj_weight, self.in_proj_bias,
                self.bias_k, self.bias_v, self.add_zero_attn,
                self.dropout, self.out_proj.weight, self.out_proj.bias, 
                training=self.training,
                key_padding_mask=key_padding_mask, need_weights=need_weights, 
                attn_mask=attn_mask, use_separate_proj_weight=True,
                q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,

Is it because hasattr or 'and' is not supported yet?
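
In case it helps narrow it down, each construct could be probed in isolation with throwaway scripted functions (a diagnostic sketch only; the function names are made up, and each probe should be compiled on its own, since a failing one raises at the decorator):

import torch

@torch.jit.script
def probe_and(a: bool, b: bool) -> bool:
    # plain boolean 'and'
    return a and b

@torch.jit.script
def probe_hasattr(x: torch.Tensor) -> bool:
    # hasattr, as used in nn.MultiheadAttention's forward
    return hasattr(x, 'shape')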