Translation of new sentence with new words

Shalom_Broyer · September 4, 2019, 8:12am

I tried the “eng-fra” translation demo.
I want to apply this algorithm to a different domain.
My question: what should I do when I translate a new English sentence that contains a word which doesn’t exist in the original English sentences?

zhangguanheng66 · September 4, 2019, 1:38pm

Handling a word that is not in your vocabulary is a hard problem in NLP. Take a look at subword methods (such as SentencePiece), which split an unseen word into smaller units that are in the vocabulary. Pretrained embeddings may also help.
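
As a rough illustration of the two ideas above, here is a minimal sketch in plain Python. It assumes a word-to-index vocabulary similar to the one built in the demo; the `Vocab` class, the `<unk>` entry, and the `segment` helper are hypothetical names made up for this example and are not part of the tutorial or of SentencePiece. `encode` maps any unseen word to a reserved unknown index, and `segment` is a toy greedy longest-match splitter that only shows the subword idea (SentencePiece learns its pieces from data and uses a different algorithm).

```python
# Hypothetical sketch: reserved indices, with an extra <unk> slot for unseen words.
SOS_TOKEN, EOS_TOKEN, UNK_TOKEN = 0, 1, 2


class Vocab:
    """Word-level vocabulary that falls back to <unk> for unseen words."""

    def __init__(self):
        self.word2index = {"<sos>": SOS_TOKEN, "<eos>": EOS_TOKEN, "<unk>": UNK_TOKEN}
        self.index2word = {i: w for w, i in self.word2index.items()}

    def add_sentence(self, sentence):
        # Register every whitespace-separated word seen during training.
        for word in sentence.split(" "):
            if word not in self.word2index:
                index = len(self.word2index)
                self.word2index[word] = index
                self.index2word[index] = word

    def encode(self, sentence):
        # Unknown words map to UNK_TOKEN instead of raising KeyError.
        ids = [self.word2index.get(w, UNK_TOKEN) for w in sentence.split(" ")]
        return ids + [EOS_TOKEN]


def segment(word, pieces):
    """Toy greedy longest-match split; single characters are the fallback."""
    out, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a known piece or one character long.
        while end > start + 1 and word[start:end] not in pieces:
            end -= 1
        out.append(word[start:end])
        start = end
    return out


vocab = Vocab()
vocab.add_sentence("i am cold")
print(vocab.encode("i am hungry"))          # "hungry" was never seen
print(segment("walking", {"walk", "ing"}))  # split into known pieces
```

For the unknown index to be useful, the model probably needs to see it during training too, for example by replacing rare training words with `<unk>` so that its embedding gets learned.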

zhangguanheng66 · September 4, 2019, 2:19pm

Here is an example.

github.com

bentrevett/pytorch-seq2seq/blob/master/3 - Neural Machine Translation by Jointly Learning to Align and Translate.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3 - Neural Machine Translation by Jointly Learning to Align and Translate\n",
    "\n",
    "In this third notebook on sequence-to-sequence models using PyTorch and TorchText, we'll be implementing the model from [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473). This model achieves our best perplexity yet, ~27 compared to ~34 for the previous model.\n",
    "\n",
    "## Introduction\n",
    "\n",
    "As a reminder, here is the general encoder-decoder model:\n",
    "\n",
    "![](assets/seq2seq1.png)\n",
    "\n",
    "In the previous model, our architecture was set up to reduce \"information compression\" by explicitly passing the context vector, $z$, to the decoder at every time-step and by passing both the context vector and input word, $y_t$, along with the hidden state, $s_t$, to the linear layer, $f$, to make a prediction.\n",
    "\n",
    "![](assets/seq2seq7.png)\n",
    "\n",
