Passing a vector alongside sequences in a batch

Hi all, I hope you are well,

I have a problem where I have a CNN with two inputs, one is a 75 x 4 matrix and the other one is a single value. With a batch size of 512, the size of the training batch matrix is naturally 512 x 75 x 4, and the network trains fine without the second input.
But, as soon as I try to add the vector of single values, each 1 x 75 x 4 matrix is passed alongside the whole vector (512 entries), which raises an error.

During training, I would like to forward-pass each 1 x 75 x 4 matrix alongside its corresponding single value, but I've yet to find out how.

Mat

Could you post the error message and a small code snippet so that we could have a look?

Hi, thank you for your response, sure :

Here is my collate function, which retrieves the lengths of the one-hot-encoded sequences and then returns the padded sequences and lengths (`tag` corresponds to the y; this is a regression problem).

```
import numpy as np
import torch

def collate(batch):
    # Get sequence lengths
    lengths = torch.IntTensor([np.array(t['seq']).shape[0] for t in batch])
    # `device` is defined earlier in the script
    batch_ = [torch.Tensor(t['seq']).to(device) for t in batch]

    # Get tags
    tags = torch.FloatTensor([t['tag'] for t in batch])

    return batch_, lengths, tags
```

Here is a snippet of my forward function in my CNN :

```
    def forward(self, x, length):
        # Would like to resize x based on the length here,
        # something like: x = x[:length, :]
        x = self.conv1(x)
        x = F.relu(x)
        x = self.batch1(x)
```

And this is my training loop :

```
    for i_batch, item in enumerate(train_loader):
        seqs = item[0]
        lengths = item[1]
        tags = item[2]
        # (1):
        outputs = custom_model(seqs, lengths)
```

The line marked # (1) raises an error because `lengths` is a vector, but I would like to apply the resize to each of my sequences based on the corresponding value in the `lengths` vector.

If you have more questions, don't hesitate to ask!

Mat

Thanks for the code!
Could you please also post the error message?

You're welcome!
Instead of the # resize line, I've used:

```
def forward(self, x, length):
    # This line replaces the resize line in the previous forward code
    x = torch.narrow(x, 0, 0, length)
    x = self.conv1(x)
```

And it raises the error: `narrow(): argument 'length' (position 4) must be int, not Tensor`

Further analysis showed that `length` is my 512 x 1 tensor with the 512 different lengths (512 being the batch size).
I would like to apply the narrow function to each sequence, so that each sequence is "narrowed" by its corresponding length in the lengths vector.
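In other words, what I'm after is something like this per-sample loop (a sketch with made-up shapes, not my actual model code):

```python
import torch

x = torch.randn(3, 6, 4)            # padded batch: (batch, max_len, 4)
lengths = torch.tensor([4, 6, 2])   # true length of each sequence

# Narrow each sequence individually, keeping only its real timesteps;
# each result then has a different first dimension
trimmed = [seq.narrow(0, 0, n.item()) for seq, n in zip(x, lengths)]
```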

Unfortunately this won't work, as `x` would contain tensors with different lengths, if I understand the use case correctly.
Could you zero-pad `x` and use `lengths` as a mask?
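Something along these lines (a rough sketch, assuming the conv runs over the length dimension with the 4 one-hot channels first; shapes are made up):

```python
import torch

x = torch.randn(2, 4, 6)        # zero-padded batch: (batch, channels, max_len)
lengths = torch.tensor([4, 6])  # true length of each sequence

# mask[b, t] is True for real timesteps and False for padding
mask = torch.arange(6).unsqueeze(0) < lengths.unsqueeze(1)   # (batch, max_len)

# Zero out the padded positions, broadcasting the mask over the channel dim
x = x * mask.unsqueeze(1)
```

The same mask could be reapplied after each conv layer so the padded positions never contribute downstream.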


Hi again!
Here is an explanation of my use case:

I have one-hot encoded sequences which are L x 4 in dimensions (L being the length of the sequence).
They look like this :
[[1,0,0,0],
[1,0,0,0],
[1,0,0,0],
[0,1,0,0],
[0,0,1,0],
[1,0,0,0],
[0,0,0,1],
[1,0,0,0]] (just a silly example).

These sequences are associated with a value that I am trying to predict.

They have various sizes, and I've tried zero-padding them before (which works), but I am afraid this will lead to some problems because we are looking at aligned motifs (for example, a [1,0,0,0] followed by a [0,0,1,0] that are always at the same position). I am afraid that padding would remove this information.
As for a mask, I hadn't thought of this. Do you have any general idea how it could be implemented so that the network "cuts" the padded part of the sequences?

Edit: For now I've worked around the problem by using a batch_size of one, but it greatly increases computing time, which is going to be a burden when I try to scale this up to datasets of over 1, 2, or 3M sequences.

What would be the desired shape of `x` after the `narrow` operation?
Would it be `[batch_size, 4, varying_shape]` in your use case?
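In the meantime, if zero-padding turns out to be acceptable, the collate function could do the padding itself and return a single stacked batch instead of a list. A minimal sketch using `pad_sequence` (keeping your `seq`/`tag` sample keys, which I'm assuming from your earlier snippet):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_padded(batch):
    # True length of each sequence, before padding
    lengths = torch.tensor([len(t['seq']) for t in batch])
    seqs = [torch.as_tensor(t['seq'], dtype=torch.float32) for t in batch]
    # Zero-pad to the longest sequence in the batch: (batch, max_len, 4)
    padded = pad_sequence(seqs, batch_first=True)
    tags = torch.tensor([t['tag'] for t in batch], dtype=torch.float32)
    return padded, lengths, tags
```

The returned `lengths` could then be used to build the padding mask inside `forward`.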