Hi, I am working on a project where I am initialising an LSTM with objects detected in an image:
For example I have categories:
dog, person, car, ... -> [1,3,0,...] where the value at each index is the number of detections for that category in the image (so 1 dog, 3 people, 0 cars, ...).
Currently I am just using this frequency count vector to initialise the LSTM hidden/cell state:
[1,3,0,...] -> Linear layer -> LSTM init
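For concreteness, here is a minimal sketch of my current setup (the hidden size of 512 and the LSTM input size of 300 are just placeholder values):

```python
import torch
import torch.nn as nn

NUM_CATEGORIES = 1600   # detector label vocabulary size
HIDDEN_SIZE = 512       # placeholder LSTM hidden size

# Frequency-count vector -> linear projection -> LSTM initial states
init_proj = nn.Linear(NUM_CATEGORIES, 2 * HIDDEN_SIZE)

counts = torch.zeros(1, NUM_CATEGORIES)  # batch of 1 image
counts[0, 0] = 1.0  # e.g. 1 dog
counts[0, 1] = 3.0  # 3 people

# Split the projection into initial hidden and cell states
h0, c0 = init_proj(counts).chunk(2, dim=-1)

lstm = nn.LSTM(input_size=300, hidden_size=HIDDEN_SIZE, batch_first=True)
# unsqueeze(0) adds the (num_layers * num_directions) dimension
out, _ = lstm(torch.randn(1, 10, 300), (h0.unsqueeze(0), c0.unsqueeze(0)))
```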
I am using an object detector with 1,600 categories, and some categories are semantically similar, such as car, bus, and truck. I am looking for a way to represent these similarities using nn.Embedding. My plan is to use BERT to encode the object labels, but I am wondering how I can combine the BERT embeddings with the object frequencies ([1,3,0,...]) to produce a single 1D vector for initialising the LSTM.
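One idea I was considering is a count-weighted average of the label embeddings, so similar labels contribute similar vectors. Sketch below; the embedding table is a random placeholder standing in for precomputed BERT label embeddings, and EMB_DIM = 768 assumes BERT-base:

```python
import torch
import torch.nn as nn

NUM_CATEGORIES = 1600
EMB_DIM = 768       # BERT-base hidden size (assumption)
HIDDEN_SIZE = 512   # placeholder LSTM hidden size

# One row per label. In practice I would precompute each label's BERT
# encoding once and copy it in, e.g.:
#   label_emb.weight.data.copy_(precomputed_bert_label_embeddings)
label_emb = nn.Embedding(NUM_CATEGORIES, EMB_DIM)

proj = nn.Linear(EMB_DIM, 2 * HIDDEN_SIZE)

counts = torch.zeros(1, NUM_CATEGORIES)
counts[0, 0] = 1.0  # 1 dog
counts[0, 1] = 3.0  # 3 people

# Normalise counts to weights (clamp avoids division by zero for empty
# images), then take a weighted sum over the embedding table.
weights = counts / counts.sum(dim=-1, keepdim=True).clamp(min=1.0)
pooled = weights @ label_emb.weight          # (batch, EMB_DIM), a 1D vector per image
h0, c0 = proj(pooled).chunk(2, dim=-1)       # LSTM initial states
```

Is a pooled vector like this a reasonable way to do it, or is there a better way to keep the raw count information alongside the embeddings?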