Dear all,
I'm working on a grammatical error correction (GEC) task built on a neural machine translation (NMT) model. The only difference between my GEC setup and the NMT baseline is that the source and target share one embedding.
The NMT baseline uses two separate fields (and therefore two separate embeddings):
from torchtext.data import Field, TabularDataset  # torchtext.legacy.data on torchtext >= 0.9

SRC = Field(tokenize=tokenizer, init_token='<sos>', eos_token='<eos>', batch_first=True)
TRG = Field(tokenize=tokenizer, init_token='<sos>', eos_token='<eos>', batch_first=True)

train_data, valid_data = TabularDataset.splits(path='…/data/', train='train.csv',
                                               validation='valid.csv', format='csv',
                                               fields=[('src', SRC), ('trg', TRG)],
                                               skip_header=True)
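After loading the data, each Field gets its own vocabulary, so the model ends up with two separate embedding matrices. Something along these lines (the min_freq value is just an example):

SRC.build_vocab(train_data, min_freq=2)
TRG.build_vocab(train_data, min_freq=2)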
My implementation of the shared embedding reuses the same Field for both the source and target sides:
TRG = Field(tokenize=tokenizer, init_token='<sos>', eos_token='<eos>', batch_first=True)

train_data, valid_data = TabularDataset.splits(path='…/data/', train='train.csv',
                                               validation='valid.csv', format='csv',
                                               fields=[('src', TRG), ('trg', TRG)],
                                               skip_header=True)
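Since src and trg now share one Field, a single vocabulary covers both sides:

TRG.build_vocab(train_data, min_freq=2)

To be concrete about what I mean by a shared embedding at the model level, here is a minimal sketch; the Encoder/Decoder classes, the dimensions, and the min_freq value are illustrative, not my actual model:

import torch.nn as nn

# Illustrative hyperparameters, not my real values
EMB_DIM, HID_DIM = 256, 512
VOCAB_SIZE = len(TRG.vocab)
PAD_IDX = TRG.vocab.stoi[TRG.pad_token]

# One embedding module, handed to both encoder and decoder, so source
# and target tokens are looked up in the same weight matrix
shared_embedding = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=PAD_IDX)

class Encoder(nn.Module):
    def __init__(self, embedding):
        super().__init__()
        self.embedding = embedding  # same object as the decoder's -> shared weights
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):
        # src: [batch, src_len]
        return self.rnn(self.embedding(src))

class Decoder(nn.Module):
    def __init__(self, embedding):
        super().__init__()
        self.embedding = embedding
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB_SIZE)
        # Optionally the output projection can be tied to the embedding too
        # (requires HID_DIM == EMB_DIM):
        # self.out.weight = embedding.weight

    def forward(self, trg, hidden):
        # trg: [batch, trg_len]
        output, hidden = self.rnn(self.embedding(trg), hidden)
        return self.out(output), hidden

enc = Encoder(shared_embedding)
dec = Decoder(shared_embedding)

As far as I understand, sharing the Field only gives one vocabulary; the weight sharing itself comes from reusing the same nn.Embedding instance in both modules.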
But the results are not good. What is the correct way to implement a shared embedding in PyTorch?
Kind regards,
Aiman Solyman