I am new to PyTorch and NLP. I want to encode a list of sentences for an NLP task and I am following this demo: Demo link.
I successfully load the sentences and build the vocabulary for the 100,000 most frequent words, but when I try to encode the sentences using
embeddings = model.encode(sentences, bsize=128, tokenize=False, verbose=True)
I get the error:
ValueError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.
I tried changing the sentences, but that didn't help.
What could be causing this error?
You should use
torch.from_numpy() to convert numpy arrays to Tensors before giving them to PyTorch functions, to improve performance.
The error you see most likely comes from the fact that not all numpy arrays can be converted to a Tensor (in particular, arrays that were flipped and so have negative strides). You can use
np.ascontiguousarray() before passing your array to PyTorch to make sure it works.
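A minimal sketch of the failure and the fix (the flipped array here is just an illustrative example, not InferSent's internal data): a flipped numpy view has negative strides, and np.ascontiguousarray() makes a contiguous copy that torch.from_numpy() accepts.

```python
import numpy as np
import torch

# A flipped view of an array has a negative stride,
# which torch.from_numpy() cannot handle directly.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
flipped = arr[::-1]            # view with a negative first stride
print(flipped.strides)         # first stride is negative

# np.ascontiguousarray makes a C-contiguous copy with positive strides,
# so the conversion to a Tensor now succeeds.
fixed = np.ascontiguousarray(flipped)
t = torch.from_numpy(fixed)
print(t.shape)                 # torch.Size([2, 3])
```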
Thanks for the reply @albanD. I tried your suggestions, but I am unable to convert a list of strings to a tensor.
So I converted the list to a one-hot encoded list and then converted it to a contiguous array. However, the encode function defined in InferSent only takes a list of strings. Now I get this error:
TypeError: split() missing 1 required positional argument: 'split_size'
Tensors cannot contain strings. You would usually use an Embedding layer to convert a string (via its token indices) to some learnable features that represent that string, and then use these features in your model.
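To sketch that idea: strings are first mapped to integer indices using a vocabulary, and an nn.Embedding layer turns those indices into learnable vectors. The vocabulary, sentence, and embedding dimension below are made-up examples, not part of InferSent.

```python
import torch
import torch.nn as nn

# Hypothetical word-to-index vocabulary (in practice, built from your corpus)
vocab = {"<pad>": 0, "hello": 1, "world": 2}

# Embedding layer: each index maps to a learnable 5-dimensional vector
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=5)

# Strings are converted to an index tensor first, never passed directly
sentence = ["hello", "world"]
indices = torch.tensor([vocab[w] for w in sentence])  # tensor([1, 2])
features = embedding(indices)                         # shape (2, 5)
print(features.shape)
```

The resulting features tensor can then be fed into the rest of the model, and the embedding weights are updated during training like any other parameters.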
Okay, let me try that and get back.