How to preprocess and train on your own text data

Hi, I am a new PyTorch user and I was wondering if there is a better way to preprocess your own text data.

For example, suppose you have a bunch of text data such as:

sentence = "I have a pen" -> words -> "I", "have", "a", "pen"
classification = 0 or 1 (pos or neg)
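
For context, this is roughly how I turn sentences into index tensors at the moment (the sentences, labels, and vocab-building code below are just made-up stand-ins for my real data):

```python
import torch

# made-up toy data standing in for my real dataset
sentences = ["I have a pen", "I hate this"]
labels = [1, 0]

# build a word -> index vocabulary; index 0 is reserved for padding
vocab = {"<pad>": 0}
for s in sentences:
    for w in s.split():
        if w not in vocab:
            vocab[w] = len(vocab)

# each sentence becomes a LongTensor of word indices (different lengths!)
encoded = [torch.tensor([vocab[w] for w in s.split()]) for s in sentences]
```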

Of course, there are many sentences, and their lengths all differ.

Keras and TensorFlow have their own utilities for things like text padding.
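
The closest thing I have found in PyTorch so far is torch.nn.utils.rnn.pad_sequence, which seems to do the same job as the Keras padding utility. This is just my attempt, continuing from the toy data above:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# variable-length index tensors (same shape as the `encoded` list above)
encoded = [torch.tensor([1, 2, 3, 4]), torch.tensor([1, 5, 6])]
lengths = torch.tensor([len(seq) for seq in encoded])

# pad with index 0 (my <pad> token) up to the longest sentence in the batch
padded = pad_sequence(encoded, batch_first=True, padding_value=0)
# padded -> tensor([[1, 2, 3, 4],
#                   [1, 5, 6, 0]])   shape: (batch_size, max_len)
```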

How do I do this the PyTorch way, i.e. embed the tokens and train a simple classifier on them?
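
To make the question concrete, here is the kind of model and training loop I have in mind: average the word embeddings of each padded sentence and put a Linear layer on top. The class name, sizes, and toy batch are made up, and I am not sure this is the idiomatic way:

```python
import torch
import torch.nn as nn

# toy batch, continuing from the padding step above
padded = torch.tensor([[1, 2, 3, 4], [1, 5, 6, 0]])   # (batch, max_len)
lengths = torch.tensor([4, 3])
targets = torch.tensor([1, 0])                        # pos / neg labels
vocab_size = 7                                        # size of my toy vocab

class MeanEmbedClassifier(nn.Module):
    """Average the word embeddings of each padded sentence, then classify."""
    def __init__(self, vocab_size, embed_dim=50, num_classes=2):
        super().__init__()
        # padding_idx=0 keeps the <pad> embedding at zero
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, padded, lengths):
        emb = self.embedding(padded)                          # (B, T, E); pads contribute zeros
        mean = emb.sum(dim=1) / lengths.unsqueeze(1).float()  # average over real tokens only
        return self.fc(mean)                                  # (B, num_classes) logits

model = MeanEmbedClassifier(vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(padded, lengths), targets)
    loss.backward()
    optimizer.step()
```

Is this a reasonable way to do it, or is there a more standard pipeline?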

There are not many good examples for this.

I think maybe torch.nn.utils.rnn.pad_sequence() does the actual padding and torch.nn.utils.rnn.pack_padded_sequence() packs the padded batch for an RNN, but I am not sure how to put them together.
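
Something like this is what I have pieced together so far, if I want an LSTM instead of averaging embeddings (again just a sketch with made-up sizes):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

encoded = [torch.tensor([1, 2, 3, 4]), torch.tensor([1, 5, 6])]
lengths = torch.tensor([len(s) for s in encoded])

padded = pad_sequence(encoded, batch_first=True)          # this is what actually pads
embedding = nn.Embedding(7, 8, padding_idx=0)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

emb = embedding(padded)                                   # (B, T, 8)
packed = pack_padded_sequence(emb, lengths, batch_first=True, enforce_sorted=False)
_, (h_n, _) = lstm(packed)          # h_n holds the state at each sentence's last real token
sentence_repr = h_n[-1]             # (B, 16), ready for a Linear classification layer
```

Is this the intended usage, or am I misunderstanding what pack_padded_sequence is for?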