Hi, I am working on a project where I am initialising an LSTM with objects detected in an image:
For example I have categories:
dog, person, car, ... -> [1,3,0,...] where the value at each index is the number of detections for that category in the image (so 1 dog, 3 people, 0 cars, ...).
Currently I am just using this frequency count vector to initialise the LSTM hidden/cell state:
[1,3,0,...] -> Linear layer -> LSTM init
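For concreteness, here is a minimal sketch of my current setup (the hidden size of 512 and the LSTM input size of 300 are just placeholder values):

```python
import torch
import torch.nn as nn

NUM_CATEGORIES = 1600   # detector label vocabulary size
HIDDEN_SIZE = 512       # placeholder LSTM hidden size

# Frequency-count vector -> linear projection -> LSTM initial states
init_proj = nn.Linear(NUM_CATEGORIES, 2 * HIDDEN_SIZE)

counts = torch.zeros(1, NUM_CATEGORIES)  # batch of 1 image
counts[0, 0] = 1.0  # e.g. 1 dog
counts[0, 1] = 3.0  # 3 people

# Split the projection into initial hidden and cell states
h0, c0 = init_proj(counts).chunk(2, dim=-1)

lstm = nn.LSTM(input_size=300, hidden_size=HIDDEN_SIZE, batch_first=True)
# unsqueeze(0) adds the (num_layers * num_directions) dimension
out, _ = lstm(torch.randn(1, 10, 300), (h0.unsqueeze(0), c0.unsqueeze(0)))
```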
I am using an object detector with 1,600 categories, and some categories are semantically similar, such as car, bus, and truck. I am looking for a way to represent these similarities using nn.Embedding. My plan is to use BERT to encode the object labels, but I am wondering how I can combine the BERT embeddings with the object frequencies ([1,3,0,...]) to produce a single 1D vector for initialising the LSTM.
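One idea I was considering is a count-weighted average of the label embeddings, so similar labels contribute similar vectors. Sketch below; the embedding table is a random placeholder standing in for precomputed BERT label embeddings, and EMB_DIM = 768 assumes BERT-base:

```python
import torch
import torch.nn as nn

NUM_CATEGORIES = 1600
EMB_DIM = 768       # BERT-base hidden size (assumption)
HIDDEN_SIZE = 512   # placeholder LSTM hidden size

# One row per label. In practice I would precompute each label's BERT
# encoding once and copy it in, e.g.:
#   label_emb.weight.data.copy_(precomputed_bert_label_embeddings)
label_emb = nn.Embedding(NUM_CATEGORIES, EMB_DIM)

proj = nn.Linear(EMB_DIM, 2 * HIDDEN_SIZE)

counts = torch.zeros(1, NUM_CATEGORIES)
counts[0, 0] = 1.0  # 1 dog
counts[0, 1] = 3.0  # 3 people

# Normalise counts to weights (clamp avoids division by zero for empty
# images), then take a weighted sum over the embedding table.
weights = counts / counts.sum(dim=-1, keepdim=True).clamp(min=1.0)
pooled = weights @ label_emb.weight          # (batch, EMB_DIM), a 1D vector per image
h0, c0 = proj(pooled).chunk(2, dim=-1)       # LSTM initial states
```

Is a pooled vector like this a reasonable way to do it, or is there a better way to keep the raw count information alongside the embeddings?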