FWIW, here’s how I worked around the issue above, based on information posted in this group:
[link1] [link2]. I implemented three variants of using pre-trained embeddings, and all three models are reproducible.
1. Read pre-trained embeddings and freeze them:
embed = nn.Embedding(num_embeddings, embedding_dim, padding_idx=0)
# embed_init is a numpy array holding the pre-trained embeddings
embedpt = torch.from_numpy(embed_init).float().to(device)
# ind_init is a numpy array with the indices of words for which embeddings are available
indpt = torch.from_numpy(ind_init).long().to(device)
# after the model object has been instantiated
assert model.embed.weight.shape == embedpt.shape
model.embed.weight.data.copy_(embedpt)
model.embed.weight.requires_grad = False
2. Use pre-trained embeddings only to initialize (likely better than initializing with random values):
Same as 1, except model.embed.weight.requires_grad = True
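Variant 2 can be sketched end-to-end like this (a minimal standalone example; `embed_init` is a stand-in random array here, and the sizes are illustrative):

```python
import numpy as np
import torch
import torch.nn as nn

num_embeddings, embedding_dim = 5, 3
# stand-in for the real pre-trained matrix
embed_init = np.random.rand(num_embeddings, embedding_dim).astype(np.float32)

embed = nn.Embedding(num_embeddings, embedding_dim, padding_idx=0)
with torch.no_grad():
    embed.weight.copy_(torch.from_numpy(embed_init))
embed.weight.requires_grad = True  # the only change from variant 1

# a single gradient step now updates the whole embedding table
opt = torch.optim.SGD(embed.parameters(), lr=0.1)
loss = embed(torch.tensor([1, 2])).sum()
loss.backward()
opt.step()
```

After the step, the rows that were looked up have moved away from their pre-trained values, which is exactly the fine-tuning behaviour this variant is after.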
3. Freeze the embeddings that are available and train those that are not. E.g., the vocabulary has 1000 words; you have embeddings for 700 of them, which you want to freeze, while training the other 300.
Same as 2; additionally, in the training loop:
optimizer.zero_grad()
loss.backward()
model.embed.weight.grad[indpt] = 0  # zero the gradients of the pre-trained rows
optimizer.step()
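The whole of variant 3 can be verified with a small self-contained sketch (sizes, `embed_init`, and `ind_init` are illustrative stand-ins): after one step, the rows listed in `ind_init` are untouched while the other rows train.

```python
import numpy as np
import torch
import torch.nn as nn

num_embeddings, embedding_dim = 6, 4
embed_init = np.random.rand(num_embeddings, embedding_dim).astype(np.float32)
ind_init = np.array([1, 2, 3])  # rows with pre-trained vectors -> to be frozen

embed = nn.Embedding(num_embeddings, embedding_dim, padding_idx=0)
indpt = torch.from_numpy(ind_init).long()
with torch.no_grad():
    embed.weight.copy_(torch.from_numpy(embed_init))

opt = torch.optim.SGD(embed.parameters(), lr=0.1)
before = embed.weight.detach().clone()

opt.zero_grad()
# the batch touches a frozen row (1) and a trainable row (4)
loss = embed(torch.tensor([1, 4])).sum()
loss.backward()
embed.weight.grad[indpt] = 0  # kill the update for the pre-trained rows
opt.step()
```

One caveat worth noting: zeroing the gradient keeps the frozen rows fixed under plain SGD (as here), but optimizers that carry per-parameter state, such as Adam or SGD with momentum, can still move those rows from stale state, so check the optimizer you use.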
My code is set up to do an extended grid search on multiple GPUs, including a patience
parameter (the number of epochs with no improvement in the monitored quantity, after which training stops). The best parameter set, including the number of epochs, is saved. When the final model is run, patience
is turned off. The final model now re-traces the steps of the selected grid-search run, which was not happening with the function calls in the first post.
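The patience rule described above can be sketched as follows (a hypothetical helper, not the actual grid-search code; it assumes a lower monitored value is better):

```python
def epochs_to_run(val_losses, patience):
    """Return the number of epochs actually trained before patience stops training.

    val_losses: per-epoch values of the monitored quantity (lower is better).
    patience:   number of consecutive epochs without improvement tolerated.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # early stop triggered here
    return len(val_losses)  # patience never triggered
```

Saving the epoch count returned during the grid search, and then training the final model for exactly that many epochs with patience disabled, is what makes the final run re-trace the selected grid-search run.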