Different Output on Exact Same Setting

vitouphy · August 27, 2020, 3:45pm

I tried to create a Transformer-based Seq2Seq generation using greedy search. Once I ran the code below, the outputs are exactly the same. That’s good; nothing is wrong with this. However, if I run it again, the result is completely different.

Anyone knows what might cause this?


for _ in range(10):  # Do not forget this "loop" line.
    
    import math
    import os
    from collections import namedtuple
    import random
    import numpy as np
    import torch 
    import torch.nn as nn
    from tqdm.auto import tqdm
    from transformers import BertTokenizer
    from torch.utils.data import Dataset, DataLoader
    from src.models.transformer import Transformer
    
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', additional_special_tokens=['_eos', '_go'])
    args = {
        'max_src_len': 300,
        'max_tgt_len': 50,
        'batch_size': 64,
        'vocab_size': len(tokenizer),
        'hidden_size': 512,
        'dropout': 0.2,
        'num_layers': 4,
        'num_heads': 8,
        'sos_idx': tokenizer.encode('_go', add_special_tokens=False)[0],
        'eos_idx': tokenizer.encode('_eos', add_special_tokens=False)[0],
        'pad_idx': tokenizer.encode('[PAD]', add_special_tokens=False)[0],
    }

    args = namedtuple('args', args.keys())(*args.values())
    
    # Helper function
    def convert_att_into_mask(mask):
        return mask.bool().masked_fill(mask == 0, True).masked_fill(mask == 1, False)

    # Transformer Generatior - Pretrained
    gen_weight_path = '/data/vitou/100DaysofCode/conv_agent/checkpoints/pretrained/transformer_generative.pt'
    generator = Transformer(args).to(device)
    generator.load_state_dict(torch.load(gen_weight_path))
    generator.eval()

    tokenizer_config = {
        'add_special_tokens' : False, 
        'return_token_type_ids' : False, 
        'return_tensors' : 'pt',
        'padding': True
    }

    with torch.no_grad():
        ctx_text = "hello. do you play any video games? _eos"
        src = tokenizer(ctx_text, **tokenizer_config)
        src_input_ids, src_att_mask = [ x.to(device) for x in src.values() ]
        src_input_ids = torch.transpose(src_input_ids, 0, 1).contiguous()
        src_att_mask = convert_att_into_mask(src_att_mask)

        res = generator.generate(src_input_ids, src_att_mask[:1])
        print ("[context]: ", ctx_text)
        print ("[greedy]:  ", tokenizer.decode(res))
        print ("-------------------------------------------")

Output #1

[context]:  hello. do you play any video games? _eos
[greedy]:   of the games. i don't really watch much tv. i don't watch much tv. you? _go people like to watch tv. _go of the time. _go people do. _go of the sports. _go of the actors
------------------------------------------- 
[context]:  hello. do you play any video games? _eos
[greedy]:   of the games. i don't really watch much tv. i don't watch much tv. you? _go people like to watch tv. _go of the time. _go people do. _go of the sports. _go of the actors
-------------------------------------------
[x8] more of this (exactly the same for each iteration)

Output #2

[context]:  hello. do you play any video games? _eos
[greedy]:   i do, i love the simpsons! _eos
------------------------------------------- 
[context]:  hello. do you play any video games? _eos
[greedy]:   i do, i love the simpsons! _eos
-------------------------------------------
[x8] more of this (exactly the same for each iteration)

So the output I got is either one of the two outputs above.

Samarth_Mishra · August 27, 2020, 4:32pm

Is there any hidden randomness in Transformer in the generate function?

If it is something hard to spot (maybe something within a library function), to perfectly replicate something I fix random seeds of torch, numpy and random.

torch.manual_seed(0)
np.random.seed(0)
random.seed(0)

vitouphy · August 27, 2020, 4:36pm

I did that as well. But it has no luck.
Also, I put everything in a loop at Line 01.
It even loops through the import (LOL, I’ve never done that before).

generate function does greedy search, so I don’t think it’s the randomness over that.

Samarth_Mishra · August 28, 2020, 1:20pm

Even with the random seeds fixed, does it show different outputs? And do Output #1 and Output #2 alternate each time you run the snippet?

vitouphy · August 28, 2020, 2:28pm

Yeah, apparently fixing the seed does not resolve the problem. And the outputs are only alternating between #1 or #2 each time I run the code.

Samarth_Mishra · August 28, 2020, 4:12pm

It is also curious that those are the 2 outputs you cycle over. I’m a little out of ideas.

I would probably check the output after each step, help narrow it down more. Currently, it would seem it is something to do with generate, or some state in generator

vitouphy · August 29, 2020, 2:02pm

This is the code in the generator. It uses argmax to choose the best token at each timestep.
Now I’m starting to doubt if there is something to do with PyTorch Transformer.

def generate(self, src, src_key_padding_mask=None):
        ''' src has dimension of LEN x 1 '''
        src = self.embedding(src)
        src = self.pos_encoder(src)
        src = self.transformer_encoder(src, src_key_padding_mask=src_key_padding_mask)
        
        inputs = [self.args.sos_idx]
        for i in range(self.args.max_tgt_len):
            tgt = torch.LongTensor([inputs]).view(-1,1).to(device)
            tgt_mask = self.get_mask(i+1).to(device)
            tgt = self.embedding(tgt)
            tgt = self.pos_encoder(tgt)
            output = self.transformer_decoder(
                tgt=tgt, 
                memory=src, 
                tgt_mask=tgt_mask,
                memory_key_padding_mask = src_key_padding_mask )
            
            output = self.linear(output)
            output = self.softmax(output)
            output = output[-1] # the last timestep
            
            values, indices = output.max(dim=-1) 
            pred_token = indices.item()
            inputs.append(pred_token)

            # end when reach eos idx
            if pred_token == self.args.eos_idx:
                break

vitouphy · September 1, 2020, 12:53am

Thank you @Samarth_Mishra for trying to help. After going through it line-by-line, I have identified the problem. And as weird as it sounds, it has nothing to do with the model. It was this line.

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', additional_special_tokens=['_eos', '_go'])

For some strange reason, the BertTokenizer randomly assigns different index to _eos and _go each time the code is run. I suspect they might use their own custom random function. That was probably why the seed we set does not affect this.

Simply change the line to this, fix my problem.

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.add_tokens(['_eos', '_go'])