Missing key(s) in state_dict

It looks like cap_length is created in the TextDataset's get_caption method.
I think it’s worth trying to fix this problem first.

Just wanted to make sure: you're saying cap_length is the same as cap_lens, correct?

Based on the code, it looks like in get_caption x_len is calculated, then returned to __getitem__ as cap_len.
prepare_data gets a new sample from TextDataset (so from its __getitem__), and returns sorted_cap_lens, which is finally renamed to cap_lens.

I see your confusion and think the naming in the repo could be a bit more consistent, but maybe there is a good reason to rename the same variables.
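
To make the renaming chain explicit, here is a tiny toy sketch (my own simplification, not the actual AttnGAN code):

import torch

def get_caption(words):
    x = torch.tensor(words, dtype=torch.long)
    x_len = len(words)                        # the length is born here as x_len
    return x, x_len

def getitem(words):                           # stands in for TextDataset.__getitem__
    caps, cap_len = get_caption(words)        # ...and arrives here as cap_len
    return caps, cap_len

def prepare_data(cap_len_batch):
    # sorted by length, as pack_padded_sequence later expects
    sorted_cap_lens, _ = torch.sort(cap_len_batch, 0, descending=True)
    return sorted_cap_lens                    # ...and is finally used downstream as cap_lens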


Link to my conversion (ConvertML_Models/convert.ipynb at master · rchavezj/ConvertML_Models · GitHub)

So I tried using the prepare_data function, and it looks like my cap_lens is getting a new matrix of data from the dataloader. I'm having trouble wrapping my head around why the hidden matrix keeps returning nothing but zeros. One of two cases comes to mind:

  1. The pre-trained loaded model doesn't have hidden content.
  2. The way I'm loading it makes the hidden decisions disappear.

At least now I'm getting an error that looks reasonable. When I try to create a fake inputDimension and feed it into text_encoder to perform the CoreML conversion, I get an error about argument 1 not having the proper data type.

The initial hidden state might be all zeros, so I don’t think it’s a bug.
I haven't compared your code to the other code base, but this line of code seems to confirm my assumption.

The error message states that indices should be provided as torch.long.
Could you try to cast x using x = x.long()?
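
For example (a minimal sketch, not your actual tensors):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

x = torch.zeros(5)       # FloatTensor -> emb(x) would complain about argument #1 (indices)
x = x.long()             # cast the indices to torch.long
out = emb(x)             # works now
print(out.shape)         # torch.Size([5, 4])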


It looks like there's something wrong with the input dimensions, since I'm getting a message that I'm out of bounds, based on this forum post (Embeddings index out of range error).

I honestly thought the first layer in the bottom picture was the required dimension. Unless I need to make my random torch input into some sort of embedding format that I'm not aware of.

[screenshot: inputDimen]

The Embedding layer works a bit differently than e.g. Linear.
You specify the num_embeddings, i.e. the size of the dictionary, and the embedding_dim, i.e. the size of each embedding vector.

In your case num_embeddings=27297 means that your input tensor should store indices in the range [0, 27296].

Have a look at this small example:

import torch
import torch.nn as nn

num_embeddings = 27297
embedding_dim = 300
emb = nn.Embedding(
    num_embeddings=num_embeddings,
    embedding_dim=embedding_dim
)

batch_size = 100
# random indices in [0, num_embeddings - 1] with dtype torch.long
x = torch.empty(batch_size, dtype=torch.long).random_(num_embeddings)

output = emb(x)
output.shape  # torch.Size([100, 300])

I grew up learning concepts visually. Taking your advice, does that mean I'm given a one-dimensional (1x300) input that is then embedded to look like (27297 x 300), compared to the diagram I found on the internet (S x B x I)? The example I found looks three-dimensional. Does B = num_embeddings and S = embedding_dim?

nn.Embedding basically keeps your input dimensions and appends the embedding_dim to them.
So if you provide a two-dimensional input, you will get a three-dimensional output.
I’m not familiar with your example, but it looks like you transpose your input to get the dimensions [sequence, batch_size] and pass it into the embedding layer.
The I in your diagram would therefore correspond to the embedding_dim.
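
A quick shape check along those lines (a small sketch using the numbers from your screenshots):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=27297, embedding_dim=300)

# two-dimensional input of indices: [sequence, batch_size]
x = torch.empty(15, 48, dtype=torch.long).random_(27297)
out = emb(x)
print(out.shape)   # torch.Size([15, 48, 300]) -> S x B x I, where I is the embedding_dim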


Okay, this is my overview of everything I've learned from converting a PyTorch model to ONNX. Before I converted the PyTorch model, I wanted to make sure the dimensions for captions, cap_lens and hidden were correct through the forward function, and there were no errors! :slight_smile:


However, I have a new problem… I get an error when exporting the model using the exact same inputs???

"TypeError: wrapPyFuncWithSymbolic(): incompatible function arguments. The following argument types are supported: (self: torch._C.Graph, arg0: function, arg1: List[torch::jit::Value], arg2: int, arg3: function) -> iterator".

I tried passing all 3 inputs (captions, cap_lens, hidden) as a tuple to the ONNX converter, yet I get some sort of data type error… Before showing the output terminal from the conversion, I want to show what all three inputs look like. I came to the conclusion that I need to convert all three inputs to either float or long dtype, and I don't know how to properly convert dtypes.

captions is a (48, 15) tensor with torch.LongTensor dtype:
[screenshot: captions]

cap_lens is a (48,) tensor with torch.LongTensor dtype:
[screenshot: cap_lens]

and lastly hidden is a tuple of two (2, 48, 128) tensors with torch.FloatTensor dtype:
[screenshot: hidden]
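
(For anyone following along, the dtype casts themselves are one-liners; a minimal sketch, separate from the model:)

import torch

t = torch.zeros(2, 3)               # torch.FloatTensor by default
t_long = t.long()                   # -> torch.LongTensor (what embedding indices need)
t_float = t_long.float()            # -> back to torch.FloatTensor
print(t_long.dtype, t_float.dtype)  # torch.int64 torch.float32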

# Export the model
torch_out = torch.onnx._export(text_encoder,                 # model being run
                               (captions_fake_input, cap_lens, hidden), # model input (or a tuple for multiple inputs)
                               "kol.onnx",      # where to save the model (can be a file or file-like object)
                               export_params=True)           # store the trained parameter weights inside the model file

[output]

TypeError                                 Traceback (most recent call last)
in <module>()
      3                                (captions_fake_input, cap_lens, hidden), # model input (or a tuple for multiple inputs)
      4                                "kol.onnx",      # where to save the model (can be a file or file-like object)
----> 5                                export_params=True)           # store the trained parameter weights inside the model file

~/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py in _export(*args, **kwargs)
     18 def _export(*args, **kwargs):
     19     from torch.onnx import utils
---> 20     return utils._export(*args, **kwargs)
     21 
     22 

~/anaconda/lib/python3.6/site-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_type)
    132     # training mode was.)
    133     with set_training(model, training):
--> 134         trace, torch_out = torch.jit.get_trace_graph(model, args)
    135 
    136     if orig_state_dict_keys != _unique_state_dict(model).keys():

~/anaconda/lib/python3.6/site-packages/torch/jit/__init__.py in get_trace_graph(f, args, kwargs, nderivs)
    253     if not isinstance(args, tuple):
    254         args = (args,)
--> 255     return LegacyTracedModule(f, nderivs=nderivs)(*args, **kwargs)
    256 
    257 

~/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

~/anaconda/lib/python3.6/site-packages/torch/jit/__init__.py in forward(self, *args)
    286         _tracing = True
    287         trace_inputs = _unflatten(all_trace_inputs[:len(in_vars)], in_desc)
--> 288         out = self.inner(*trace_inputs)
    289         out_vars, _ = _flatten(out)
    290         _tracing = False

~/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             hook(self, input)
    488         if torch.jit._tracing:
--> 489             result = self._slow_forward(*input, **kwargs)
    490         else:
    491             result = self.forward(*input, **kwargs)

~/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in _slow_forward(self, *input, **kwargs)
    477             tracing_state._traced_module_stack.append(self)
    478         try:
--> 479             result = self.forward(*input, **kwargs)
    480         finally:
    481             tracing_state.pop_scope()

~/Desktop/text-to-image-transcribed/code/model.py in forward(self, captions, cap_lens, hidden, mask)
    153 
    154         #print("======= Packed emb ====== ")
--> 155         emb = pack_padded_sequence(emb, cap_lens, batch_first=True)
    156         #print("emb: ", emb)
    157         #print("emb shape: ", emb.shape)

~/anaconda/lib/python3.6/site-packages/torch/onnx/__init__.py in wrapper(*args, **kwargs)
     71 
     72         symbolic_args = function._unflatten(arg_values, args)
---> 73         output_vals = symbolic_fn(tstate.graph(), *symbolic_args, **kwargs)
     74 
     75         for var, val in zip(

~/anaconda/lib/python3.6/site-packages/torch/nn/utils/rnn.py in _symbolic_pack_padded_sequence(g, input, lengths, batch_first, padding_value, total_length)
    144     outputs = g.wrapPyFuncWithSymbolic(
    145         pack_padded_sequence_trace_wrapper, [input, lengths], 2,
--> 146         _onnx_symbolic_pack_padded_sequence)
    147     return tuple(o for o in outputs)
    148 

TypeError: wrapPyFuncWithSymbolic(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Graph, arg0: function, arg1: List[torch::jit::Value], arg2: int, arg3: function) -> iterator

Invoked with: graph(%0 : Long(48, 15)
%1 : Long(48)
%2 : Float(2, 48, 128)
%3 : Float(2, 48, 128)
%4 : Float(27297, 300)
%5 : Float(512, 300)
%6 : Float(512, 128)
%7 : Float(512)
%8 : Float(512)
%9 : Float(512, 300)
%10 : Float(512, 128)
%11 : Float(512)
%12 : Float(512)) {
%13 : Float(48, 15, 300) = aten::embedding[padding_idx=-1, scale_grad_by_freq=0, sparse=0](%4, %0), scope: RNN_ENCODER/Embedding[encoder]
%16 : Float(48, 15, 300), %17 : Handle = ^Dropout(0.5, False, False)(%13), scope: RNN_ENCODER/Dropout[drop]
%15 : Float(48, 15, 300) = aten::slicedim=0, start=0, end=9223372036854775807, step=1, scope: RNN_ENCODER/Dropout[drop]
%14 : Float(48, 15, 300) = aten::as_stridedsize=[48, 15, 300], stride=[4500, 300, 1], storage_offset=0, scope: RNN_ENCODER/Dropout[drop]
%18 : Long(48) = prim::Constantvalue=, scope: RNN_ENCODER
%76 : Float(502, 300), %77 : Long(15), %78 : Handle = ^PackPadded(True)(%16, %18), scope: RNN_ENCODER
%19 : Float(15!, 48!, 300) = aten::transposedim0=0, dim1=1, scope: RNN_ENCODER
%21 : Long() = aten::selectdim=0, index=47, scope: RNN_ENCODER
%20 : Long() = aten::as_stridedsize=[], stride=[], storage_offset=47, scope: RNN_ENCODER
%22 : Byte() = aten::leother={0}, scope: RNN_ENCODER
%24 : Float(7!, 48!, 300) = aten::slicedim=0, start=0, end=7, step=1, scope: RNN_ENCODER
%23 : Float(7!, 48!, 300) = aten::as_stridedsize=[7, 48, 300], stride=[300, 4500, 1], storage_offset=0, scope: RNN_ENCODER
%25 : Float(7, 48, 300) = aten::clone(%24), scope: RNN_ENCODER
%26 : Float(336, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%28 : Float(1!, 48!, 300) = aten::slicedim=0, start=7, end=8, step=1, scope: RNN_ENCODER
%27 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=2100, scope: RNN_ENCODER
%30 : Float(1!, 46!, 300) = aten::slicedim=1, start=0, end=46, step=1, scope: RNN_ENCODER
%29 : Float(1!, 46!, 300) = aten::as_stridedsize=[1, 46, 300], stride=[300, 4500, 1], storage_offset=2100, scope: RNN_ENCODER
%31 : Float(1, 46, 300) = aten::clone(%30), scope: RNN_ENCODER
%32 : Float(46, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%34 : Float(1!, 48!, 300) = aten::slicedim=0, start=8, end=9, step=1, scope: RNN_ENCODER
%33 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=2400, scope: RNN_ENCODER
%36 : Float(1!, 43!, 300) = aten::slicedim=1, start=0, end=43, step=1, scope: RNN_ENCODER
%35 : Float(1!, 43!, 300) = aten::as_stridedsize=[1, 43, 300], stride=[300, 4500, 1], storage_offset=2400, scope: RNN_ENCODER
%37 : Float(1, 43, 300) = aten::clone(%36), scope: RNN_ENCODER
%38 : Float(43, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%40 : Float(1!, 48!, 300) = aten::slicedim=0, start=9, end=10, step=1, scope: RNN_ENCODER
%39 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=2700, scope: RNN_ENCODER
%42 : Float(1!, 29!, 300) = aten::slicedim=1, start=0, end=29, step=1, scope: RNN_ENCODER
%41 : Float(1!, 29!, 300) = aten::as_stridedsize=[1, 29, 300], stride=[300, 4500, 1], storage_offset=2700, scope: RNN_ENCODER
%43 : Float(1, 29, 300) = aten::clone(%42), scope: RNN_ENCODER
%44 : Float(29, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%46 : Float(1!, 48!, 300) = aten::slicedim=0, start=10, end=11, step=1, scope: RNN_ENCODER
%45 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=3000, scope: RNN_ENCODER
%48 : Float(1!, 20!, 300) = aten::slicedim=1, start=0, end=20, step=1, scope: RNN_ENCODER
%47 : Float(1!, 20!, 300) = aten::as_stridedsize=[1, 20, 300], stride=[300, 4500, 1], storage_offset=3000, scope: RNN_ENCODER
%49 : Float(1, 20, 300) = aten::clone(%48), scope: RNN_ENCODER
%50 : Float(20, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%52 : Float(1!, 48!, 300) = aten::slicedim=0, start=11, end=12, step=1, scope: RNN_ENCODER
%51 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=3300, scope: RNN_ENCODER
%54 : Float(1!, 12!, 300) = aten::slicedim=1, start=0, end=12, step=1, scope: RNN_ENCODER
%53 : Float(1!, 12!, 300) = aten::as_stridedsize=[1, 12, 300], stride=[300, 4500, 1], storage_offset=3300, scope: RNN_ENCODER
%55 : Float(1, 12, 300) = aten::clone(%54), scope: RNN_ENCODER
%56 : Float(12, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%58 : Float(1!, 48!, 300) = aten::slicedim=0, start=12, end=13, step=1, scope: RNN_ENCODER
%57 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=3600, scope: RNN_ENCODER
%60 : Float(1!, 10!, 300) = aten::slicedim=1, start=0, end=10, step=1, scope: RNN_ENCODER
%59 : Float(1!, 10!, 300) = aten::as_stridedsize=[1, 10, 300], stride=[300, 4500, 1], storage_offset=3600, scope: RNN_ENCODER
%61 : Float(1, 10, 300) = aten::clone(%60), scope: RNN_ENCODER
%62 : Float(10, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%64 : Float(1!, 48!, 300) = aten::slicedim=0, start=13, end=14, step=1, scope: RNN_ENCODER
%63 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=3900, scope: RNN_ENCODER
%66 : Float(1!, 4!, 300) = aten::slicedim=1, start=0, end=4, step=1, scope: RNN_ENCODER
%65 : Float(1!, 4!, 300) = aten::as_stridedsize=[1, 4, 300], stride=[300, 4500, 1], storage_offset=3900, scope: RNN_ENCODER
%67 : Float(1, 4, 300) = aten::clone(%66), scope: RNN_ENCODER
%68 : Float(4, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%70 : Float(1!, 48!, 300) = aten::slicedim=0, start=14, end=15, step=1, scope: RNN_ENCODER
%69 : Float(1!, 48!, 300) = aten::as_stridedsize=[1, 48, 300], stride=[300, 4500, 1], storage_offset=4200, scope: RNN_ENCODER
%72 : Float(1!, 2!, 300) = aten::slicedim=1, start=0, end=2, step=1, scope: RNN_ENCODER
%71 : Float(1!, 2!, 300) = aten::as_stridedsize=[1, 2, 300], stride=[300, 4500, 1], storage_offset=4200, scope: RNN_ENCODER
%73 : Float(1, 2, 300) = aten::clone(%72), scope: RNN_ENCODER
%74 : Float(2, 300) = aten::viewsize=[-1, 300], scope: RNN_ENCODER
%75 : Float(502, 300) = aten::cat[dim=0](%26, %32, %38, %44, %50, %56, %62, %68, %74), scope: RNN_ENCODER
return ();
}
, <function _symbolic_pack_padded_sequence..pack_padded_sequence_trace_wrapper at 0x1c24e95950>, [16 defined in (%16 : Float(48, 15, 300), %17 : Handle = ^Dropout(0.5, False, False)(%13), scope: RNN_ENCODER/Dropout[drop]
), [15, 15, 14, 14, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 7, 7]], 2, <function _symbolic_pack_padded_sequence.._onnx_symbolic_pack_padded_sequence at 0x1c1be48378>

Could you post the code to text_encoder?
In your notebook it’s loaded from models.py, which seems to be missing.

I'm not that familiar with ONNX, but is there a reason you are using _export instead of .export?
Exporting a model with an Embedding layer and multiple outputs works, so I would have to see your whole model to find out why it's failing.
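
For reference, the public call would look something like this (just a sketch reusing your variable names from above):

import torch.onnx

torch.onnx.export(text_encoder,                            # model being run
                  (captions_fake_input, cap_lens, hidden), # model inputs
                  "kol.onnx",                              # output file
                  export_params=True)                      # store the trained weights in the file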

I created a separate notebook in the same repo with models.py (imported) and the rest of the AttnGAN project. My notebook is similar to pretrain_DAMSM.py but modified so I can import the model into production. I also switched to .export and got the same results. I'll post the code on my git repo, but you'll need to download the COCO data, just FYI. I didn't commit the AttnGAN project, by the way.

^My project is the comment above. Below is another developer contributing to AttnGAN

Sorry, maybe I’m blind, but I still couldn’t find your model definition.
I just took the one defined from the other repo.
After some minor modifications, this code works for me:
EDIT: Sorry, my mistake. The code throws the same error and does not work!

import torch
import torch.nn as nn
import torch.onnx
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


# ############## Text2Image Encoder-Decoder #######
class RNN_ENCODER(nn.Module):
    def __init__(self, ntoken, ninput=300, drop_prob=0.5,
                 nhidden=128, nlayers=1, bidirectional=False):
        super(RNN_ENCODER, self).__init__()
        self.n_steps = 10
        self.ntoken = ntoken  # size of the dictionary
        self.ninput = ninput  # size of each embedding vector
        self.drop_prob = drop_prob  # probability of an element to be zeroed
        self.nlayers = nlayers  # Number of recurrent layers
        self.bidirectional = bidirectional
        self.rnn_type = 'LSTM'
        if bidirectional:
            self.num_directions = 2
        else:
            self.num_directions = 1
        # number of features in the hidden state
        self.nhidden = nhidden // self.num_directions

        self.define_module()
        self.init_weights()

    def define_module(self):
        self.encoder = nn.Embedding(self.ntoken, self.ninput)
        self.drop = nn.Dropout(self.drop_prob)
        if self.rnn_type == 'LSTM':
            # dropout: If non-zero, introduces a dropout layer on
            # the outputs of each RNN layer except the last layer
            self.rnn = nn.LSTM(self.ninput, self.nhidden,
                               self.nlayers, batch_first=True,
                               dropout=self.drop_prob,
                               bidirectional=self.bidirectional)
        elif self.rnn_type == 'GRU':
            self.rnn = nn.GRU(self.ninput, self.nhidden,
                              self.nlayers, batch_first=True,
                              dropout=self.drop_prob,
                              bidirectional=self.bidirectional)
        else:
            raise NotImplementedError

    def init_weights(self):
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        # Do not need to initialize RNN parameters, which have been initialized
        # http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM
        # self.decoder.weight.data.uniform_(-initrange, initrange)
        # self.decoder.bias.data.fill_(0)

    def init_hidden(self, bsz):
        weight = next(self.parameters()).data
        if self.rnn_type == 'LSTM':
            return (weight.new(self.nlayers * self.num_directions,
                                        bsz, self.nhidden).zero_(),
                    weight.new(self.nlayers * self.num_directions,
                                        bsz, self.nhidden).zero_())
        else:
            return weight.new(self.nlayers * self.num_directions,
                                       bsz, self.nhidden).zero_()

    def forward(self, captions, cap_lens, hidden, mask=None):
        # input: torch.LongTensor of size batch x n_steps
        # --> emb: batch x n_steps x ninput
        emb = self.drop(self.encoder(captions))
        #
        # Returns: a PackedSequence object
        cap_lens = cap_lens.data.tolist()
        emb = pack_padded_sequence(emb, cap_lens, batch_first=True)
        # #hidden and memory (num_layers * num_directions, batch, hidden_size):
        # tensor containing the initial hidden state for each element in batch.
        # #output (batch, seq_len, hidden_size * num_directions)
        # #or a PackedSequence object:
        # tensor containing output features (h_t) from the last layer of RNN
        output, hidden = self.rnn(emb, hidden)
        # PackedSequence object
        # --> (batch, seq_len, hidden_size * num_directions)
        output = pad_packed_sequence(output, batch_first=True)[0]
        # output = self.drop(output)
        # --> batch x hidden_size*num_directions x seq_len
        words_emb = output.transpose(1, 2)
        # --> batch x num_directions*hidden_size
        if self.rnn_type == 'LSTM':
            sent_emb = hidden[0].transpose(0, 1).contiguous()
        else:
            sent_emb = hidden.transpose(0, 1).contiguous()
        sent_emb = sent_emb.view(-1, self.nhidden * self.num_directions)
        return words_emb, sent_emb


model = RNN_ENCODER(27297)
captions = torch.empty(48, 15, dtype=torch.long).random_(27297)
cap_lens = torch.sort(torch.empty(48, dtype=torch.long).random_(1, 15), descending=True)[0]
hidden = (torch.randn(1, 48, 128), torch.randn(1, 48, 128))

output = model(captions, cap_lens, hidden)

torch.onnx.export(model, (captions, cap_lens, hidden), 'test.proto', verbose=True, export_params=True)

Could you compare your code with this one?

I tested out your code and I'm still getting the same error. Am I missing some software package?

I’m currently using a PyTorch version compiled from master.
Let me check the code with 0.4.0.

EDIT: It’s also working on 0.4.0. Which PyTorch version do you have?
Could you update to the current stable release? You will find the install instructions on the website.

I'm currently on 0.4.0 as well. How do I check whether I'm on master?

[screenshot: version]
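
(For what it's worth, you can also print the version directly; a build from master usually carries a commit suffix in the string:)

import torch

print(torch.__version__)   # e.g. '0.4.0' for the stable release; a source build from master typically includes a commit hash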

I’m sorry! It’s not working on 0.4.0 and master. I’ve mixed up the files.
Let me check it again and get back to you.


I debugged the code a bit and apparently ONNX is throwing the error when pack_padded_sequence is used.
Have a look at these docs.
At _symbolic_pack_padded_sequence, there is a statement:

There currently is no PackPadded operator in ONNX. We rely on an
optimization pass to remove this later. It is an error if all
PackPadded operators cannot be optimized out.

I'm not sure, as I haven't used ONNX a lot, but maybe you could file an issue on the ONNX GitHub.
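
If you want a small repro to attach to the issue, something like this should isolate it (an untested sketch; I'd expect it to hit the same wrapPyFuncWithSymbolic error, since the lengths become a plain Python list during tracing, just like in RNN_ENCODER.forward):

import torch
import torch.nn as nn
import torch.onnx
from torch.nn.utils.rnn import pack_padded_sequence


class PackOnly(nn.Module):
    # Minimal module that only calls pack_padded_sequence, mirroring the model's forward
    def forward(self, x, lengths):
        lengths = lengths.data.tolist()   # same conversion as in RNN_ENCODER.forward
        return pack_padded_sequence(x, lengths, batch_first=True).data


model = PackOnly()
x = torch.randn(4, 10, 8)                  # batch x seq_len x features
lengths = torch.tensor([10, 9, 7, 3])      # sorted in descending order
torch.onnx.export(model, (x, lengths), 'pack_only.proto', export_params=True)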


I'll make sure to post my issue on the ONNX GitHub repo. If your code sample worked, may I take a look at the exported ONNX model 'test.proto' in ONNX/Caffe format?