Word language model to assign probabilities to sentences in test data?

Greetings all,

I’m very new to PyTorch and have been looking through some of the examples. I’m especially interested in language modeling, and would like to train the word_language_model example on a corpus (PTB data, or any other). The example seems to show how to do the training:

but then it goes on to randomly generate text and then estimate the perplexity. Instead, I would like to provide a series of sentences (which weren’t in the training data) and have the language model learned from the training data assign a probability to each of them. So if my test data is:

I would like to talk to you for a minute.
I would like to talk to you for a minuet.
I would like to talk to you for a fdasdf.

I would like to get a probability for each of these sentences based on the language model that has been learned. The example doesn’t seem to show how to deal with a separate test set, or how to get probabilities for each of the test sentences. Can anyone offer any guidance on this? This example is just a starting point for me, so if there are other examples that are closer to what I’d like to do, pointers to those would be great.

Thanks!
Ted
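
For anyone landing here with the same question, here is a minimal sketch of one common way to score a sentence: run it through the model, take a log-softmax over the vocabulary at each position, pick out the log-probability of the word that actually comes next, and sum. Everything named below (TinyLM, encode, sentence_log_prob, the toy vocabulary) is hypothetical and only stands in for a trained model and its dictionary; the model here is untrained, so the printed numbers are meaningless. The real word_language_model example has its own model class and calling convention, so check what its forward pass returns before adapting this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    """Hypothetical stand-in for a trained word-level LSTM language model."""

    def __init__(self, vocab_size, emb_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (seq_len, 1) -> unnormalized scores: (seq_len, 1, vocab_size)
        output, _ = self.rnn(self.embed(tokens))
        return self.decoder(output)


def encode(sentence, word2idx):
    # Assumption: whitespace tokenization, with "." split off as its own token.
    # Words outside the vocabulary (e.g. "fdasdf") are mapped to <unk>.
    unk = word2idx["<unk>"]
    words = ["<eos>"] + sentence.lower().replace(".", " .").split() + ["<eos>"]
    return torch.tensor([word2idx.get(w, unk) for w in words])


def sentence_log_prob(model, sentence, word2idx):
    ids = encode(sentence, word2idx)
    # Predict word t+1 from words up to t.
    inputs, targets = ids[:-1].unsqueeze(1), ids[1:]
    model.eval()
    with torch.no_grad():
        log_probs = F.log_softmax(model(inputs).squeeze(1), dim=-1)
    # Chain rule: sum the log-probability of each word that actually occurs.
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()


# Toy vocabulary; in practice this comes from the training corpus.
vocab = "<unk> <eos> i would like to talk you for a minute minuet .".split()
word2idx = {w: i for i, w in enumerate(vocab)}

torch.manual_seed(0)
model = TinyLM(len(word2idx))  # untrained, for illustration only

test_sentences = [
    "I would like to talk to you for a minute.",
    "I would like to talk to you for a minuet.",
    "I would like to talk to you for a fdasdf.",
]
scores = {s: sentence_log_prob(model, s, word2idx) for s in test_sentences}
for s, lp in scores.items():
    print(f"{lp:8.3f}  {s}")
```

The result is a log-probability (always negative); exponentiate it if you need the probability itself, or divide by the number of predicted tokens and negate before exponentiating to get a per-sentence perplexity. Note that an unknown word like "fdasdf" is scored as <unk>, so its sentence receives whatever probability the model gives to <unk> in that context, which is not necessarily small.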

Hi Ted,

I was wondering if you found the answer to your question? I am dealing with the same problem.

Thanks,

Hi Ted,
If you solved this problem, please email me: 763738799@qq.com.
Thanks
