TEXT = data.Field(tokenize = 'spacy', tokenizer_language = 'en_core_web_sm', include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
train_data, valid_data = train_data.split(random_state = random.seed(SEED))
MAX_VOCAB_SIZE = 25_000
TEXT.build_vocab(train_data, max_size = MAX_VOCAB_SIZE, vectors = "glove.6B.100d", unk_init = torch.Tensor.normal_)
LABEL.build_vocab(train_data)
Suppose I have built the vocabulary from the training data as above. Now I want to look at the one-hot encoding of the sentences in the training data. How should I do that? (I know the iterator will automatically give me the encoded and padded sentences, but I just want to see what the encoding looks like.)
Thanks in advance!
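To make the question concrete, here is a minimal sketch of the kind of inspection meant above. It is plain Python with a toy dictionary standing in for the real vocabulary, so the names `stoi`, `numericalize`, and `one_hot` and all index values are hypothetical. With the legacy torchtext objects above, the tokens would presumably come from `train_data.examples[0].text` and the mapping from `TEXT.vocab.stoi`, which yields integer indices rather than literal one-hot vectors.

```python
# Toy stand-in for TEXT.vocab.stoi (hypothetical values; the real mapping
# is produced by TEXT.build_vocab). Here index 0 is <unk> and 1 is <pad>.
stoi = {"<unk>": 0, "<pad>": 1, "the": 2, "movie": 3, "was": 4, "great": 5}


def numericalize(tokens, stoi):
    """Map each token to its vocabulary index, falling back to <unk>."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]


def one_hot(indices, vocab_size):
    """Expand each index into a one-hot row of length vocab_size."""
    return [[1 if col == idx else 0 for col in range(vocab_size)]
            for idx in indices]


# Stand-in for one tokenized training example; "superb" is out of vocabulary.
tokens = ["the", "movie", "was", "superb"]
indices = numericalize(tokens, stoi)
encoded = one_hot(indices, len(stoi))

print(indices)
for tok, row in zip(tokens, encoded):
    print(tok, row)
```

With the real vocabulary each one-hot row would be as wide as the whole vocabulary (up to `MAX_VOCAB_SIZE` plus the special tokens), so printing the integer indices is usually the more readable view.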