# Feedback on manually implemented hierarchical softmax

I manually implemented the hierarchical softmax, since I did not find its implementation. I implemented my model as follows. The model is simple word2vec model, but instead of using negative sampling, I want to use hierarchical softmax. In hierarchical softmax, there is no output word representations like the ones used in vanila softmax, or negative sampling. In hierarchical softmax, we have a target word, and a binary tree with each leaf node corresponding to words in the vocabulary. I have one question and wanted feedback on the correctness of my approach. Thank you in advance. Following snippet correspond to implementation of binary tree for hierarchical softmax. I use huffman tree as a binary tree for softmax.

``````import numpy as np
import torch.nn as nn
import torch
import heapq

class Node:
def __init__(self, token, freq):
self.token = token
self.freq = freq
self.left = None
self.right = None

def __lt__(self, other):
return self.freq < other.freq

def __gt__(self, other):
return self.freq > other.freq

def __eq__(self, other):
if(other == None):
return False
if(not isinstance(other, Node)):
return False
return self.freq == other.freq

class HuffmanTree:
def __init__(self):
self.heap = []
self.codes = {}
self.reverse_mapping = {}
self.root = None

def make_heap(self, frequency):
for key in frequency:
node = Node(key, frequency[key])
heapq.heappush(self.heap, node)

def merge_nodes(self):
while(len(self.heap)>1):
node1 = heapq.heappop(self.heap)
node2 = heapq.heappop(self.heap)

merged = Node(None, node1.freq + node2.freq)
merged.left = node1
merged.right = node2
heapq.heappush(self.heap, merged)

def make_codes_helper(self, root, current_code):
if(root==None):
return
if(root.token != None):
self.codes[root.token] = current_code
self.reverse_mapping[current_code] = root.token
return

self.make_codes_helper(root.left, current_code + "0")
self.make_codes_helper(root.right, current_code + "1")

def make_codes(self):
root = heapq.heappop(self.heap)
self.root = root
current_code = ""
self.make_codes_helper(root, current_code)
``````

Above snippet is to construct the binary hauffman tree for softmax. Then I initialise the leaves of the binary tree as words in my vocabulary as follows.

``````import pickle

# d1 is a dictionary with tokens as keys and their frequencies as values
d1 = {k:v for k,v in sorted(d.items(), key=lambda x: x[1], reverse=True)}

# ht corresponds to one instance of a huffman tree for softmax
ht = HuffmanTree()
ht.make_heap(d1)
ht.merge_nodes()
ht.make_codes()
``````

Now I define the calculation of loss function for hierarchical softmax as follows :

``````def cal_loss(h, target):
path_to_word = ht.codes[target]
root=ht.root
for i in path_to_word:
if(i=='0'):
loss = loss +  torch.log(torch.sigmoid(torch.dot(root.vec, h)))
root = root.left
else:
loss = loss +  torch.log(torch.sigmoid(-1*torch.dot(root.vec, h)))
root = root.right
loss = loss*-1
return loss
``````

And I define my word2vec model class as follows :

``````vocab_size=30000
class Word2Vec(nn.Module):
def __init__(self):
super().__init__()
self.embedding_layer = nn.Linear(vocab_size, 300, bias=False)

def forward(self, inp):
hidd = self.embedding_layer(inp)
return hidd

model = Word2Vec()

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for word, target in tranning_set:
hidd = model(word)
loss = cal_loss(hidd, target)
loss.backward()
optimizer.step()
(1) As we can see, I’ve added `model.parameters()` in the optimizer I defined. I doubt that this just includes the parameters of the model defined in Word2Vec class. How do I ensure that the parameters corresponding to the binary tree built for softmax approximation also get updated. I know that we can easily get the model parameters using model.parameters(), but for the binary tree class I implemented, we can not do anything like that. How do I ensure that my model updates all the parameters in the huffman tree as well during the training.