Thanks again @ptrblck. I am looking at visit data by customer, so each outer list holds the visits for a specific customer. Ex.
customer 1 (two visits): [[1, 2, 2], [1, 2, 2, 3, 4]]
customer 2 (one visit): [[8, 9, 10]]
customer 3 (two visits): [[1, 2, 2, 3, 4], [1, 2, 2, 5, 6, 7]]
Each visit contains the items ordered. So customer 1 ordered items [1, 2, 2] on visit 1 and items [1, 2, 2, 3, 4] on visit 2.
If I one-hot encode it, this is the layout I expect to get, and it works well… but it is manual:
tensor([# batch 1: first visit of each customer
        [[0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0.],
         [0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.]],
        # batch 2: second visit of each customer
        [[0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
         [0., 1., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0.]]])
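For reference, the layout above can be produced without hand-writing each row; a minimal sketch (the 12-item vocabulary size is my assumption from the example, and duplicate items in a visit just set the same entry to 1):

```python
import torch

# Nested visit data from the example above.
customers = [
    [[1, 2, 2], [1, 2, 2, 3, 4]],
    [[8, 9, 10]],
    [[1, 2, 2, 3, 4], [1, 2, 2, 5, 6, 7]],
]

n_items = 12  # assumed vocabulary size
max_visits = max(len(c) for c in customers)

# Shape: (max_visits, n_customers, n_items); missing visits stay all-zero.
encoded = torch.zeros(max_visits, len(customers), n_items)
for c_idx, visits in enumerate(customers):
    for v_idx, visit in enumerate(visits):
        encoded[v_idx, c_idx, torch.tensor(visit)] = 1.0
```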
Here the first visit of each customer is in the first batch, the second visit of each customer is in the second, and so on. However, if I code this up manually I would have to create a mask etc., which is fine, but I am trying out PyTorch's pack_padded_sequence approach instead, with the intent of keeping the visit order maintained. How should I handle a nested list of visits?
Here is the encoding code:
import numpy as np
import torch

seqs = inputs
# subtract one: the last visit in each customer's sequence is reserved for the label
lengths = np.array([len(seq) for seq in seqs]) - 1
n_samples = len(lengths)
maxlen = np.max(lengths)
x = torch.zeros(maxlen, n_samples, 12)  # maxlen = number of visits, n_samples = customers
y = torch.zeros(maxlen, n_samples, 12)
for idx, (seq, label) in enumerate(zip(seqs, labels)):
    for xi, visit in zip(x[:, idx, :], seq[:-1]):
        xi[visit] = 1.
    for yi, visit in zip(y[:, idx, :], label[1:]):
        yi[visit] = 1.
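What I was imagining for the packing step, as a rough self-contained sketch mirroring the shapes above (the enforce_sorted=False flag is my assumption; it needs a reasonably recent PyTorch):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Padded multi-hot batch shaped like the one built above:
# (max_visits=2, n_customers=3, n_items=12); all-zero rows are padding.
x = torch.zeros(2, 3, 12)
# Number of real (non-padded) visits per customer.
lengths = torch.tensor([2, 1, 2])

# pack_padded_sequence skips the padded time steps, so the RNN never
# sees them and no manual mask is needed. enforce_sorted=False allows
# passing lengths in any order instead of pre-sorting the batch.
packed = pack_padded_sequence(x, lengths, enforce_sorted=False)

rnn = torch.nn.GRU(input_size=12, hidden_size=8)
out, h = rnn(packed)  # out is also a PackedSequence
```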
Thank you in advance!!