I am trying to implement a model in which there are categories (genera) and subcategories (species), into which I am trying to classify images. I want to build a model which explicitly separates the output into
P[species|data] = P[genus | data] * P[species | genus, data]. I am implementing this by having one layer compute
Log[P[genus|data]] and adding that to another layer which computes
Log[P[species | genus, data]]. Unfortunately, the various genera have different numbers of species in the dataset. So, in order to compute the final output, I need to transform
Log[P[genus|data]] into a tensor where each entry is repeated a number of times equal to the number of species in the corresponding genus. I am achieving this right now in a very inefficient way which relies on a Python list comprehension, as follows (
G is a tensor of size
[batch_size, n_genera] giving the genus log-probabilities and
S is a tensor of size
[batch_size, n_species] giving the species log-probabilities with
n_species > n_genera):
genus_expand = torch.transpose(torch.stack([g for s, g in zip(n_species_list, torch.transpose(G, 0, 1)) for i in range(s)], 0), 0, 1)
final_prob = S + genus_expand
n_species_list is a list of length
n_genera, of integers giving the number of species in each genus.
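To make the above concrete, here is a self-contained sketch of that expansion with toy sizes (2 genera with 2 and 3 species, batch size 4 — all assumed for illustration only):

```python
import torch

# Toy setup (assumed sizes for illustration): 2 genera with 2 and 3 species.
n_species_list = [2, 3]
batch_size = 4
G = torch.log_softmax(torch.randn(batch_size, 2), dim=1)  # [batch_size, n_genera]
S = torch.log_softmax(torch.randn(batch_size, 5), dim=1)  # [batch_size, n_species]

# Repeat each genus column s times via a Python list comprehension,
# then stack the repeated rows and transpose back to [batch, n_species].
genus_expand = torch.transpose(
    torch.stack(
        [g for s, g in zip(n_species_list, torch.transpose(G, 0, 1)) for i in range(s)],
        0,
    ),
    0,
    1,
)
final_prob = S + genus_expand  # [batch_size, n_species]
print(genus_expand.shape)  # torch.Size([4, 5])
```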
Anyway, all that is just to explain what I am doing. Implemented this way, I only seem to be able to get about 70% GPU utilization (an implementation without this line, where I only use
P[species | data], so it is entirely on the GPU with no Python list comprehension, gets more like 90-95% utilization). I want to use something like
tile, but since the number of repetitions is different for different genera, I don't think I can make it work without essentially reproducing the above line of code.
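For concreteness, the kind of variable-count repetition I mean could be sketched with torch.repeat_interleave, which accepts a per-element tensor of repeat counts (toy sizes below are assumptions for illustration; I'm not sure this is the right building block for my case):

```python
import torch

# Toy sizes (assumed): 2 genera with 2 and 3 species, batch size 4.
n_species_list = [2, 3]
repeats = torch.tensor(n_species_list)  # repeat counts per genus
G = torch.randn(4, 2)                   # [batch_size, n_genera]
S = torch.randn(4, 5)                   # [batch_size, n_species]

# Repeat column j of G repeats[j] times along dim 1, entirely in tensor ops.
genus_expand = torch.repeat_interleave(G, repeats, dim=1)  # [4, 5]
final_prob = S + genus_expand
```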
Let me know if my explanation of my goal here is not fully clear.