Tensor Manipulation: Splitting then Padding a New Dimension

HHousen · March 31, 2020, 3:22am

Let's say I have a tensor of torch.Size([32, 53, 768]). How would I go about converting it to torch.Size([32, 12, 5, 768])? In this conversion, the dimension with original size 53 should be split at variable indexes (there will be 11 indexes so that 12 new sequences are formed). If the distance between two indexes is less than 5, then zeros should be added so that each sequence is 5 units long.

Essentially, I would like to split a sequence at certain indexes, pad the new sequences to some value, and end up with a tensor of torch.Size([15, 10]) (if starting with torch.Size([100]), splitting to create 15 sequences, and padding to 10).
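To make the 1-D case concrete, here is a minimal sketch of the operation described above. The segment lengths are hypothetical (the thread does not give any), and it assumes no segment is longer than the target length of 10. It uses torch.split, which still builds one view per segment, so it is a baseline rather than a fully vectorized answer.

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

# A sequence of 100 values (starting at 1 so zero padding is easy to spot).
seq = torch.arange(1, 101, dtype=torch.float32)

# Hypothetical segment lengths: 15 segments that sum to 100, none above 10.
lengths = [10] * 5 + [5] * 10
target_len = 10

# Split into 15 variable-length pieces, then zero-pad to a common length.
pieces = list(torch.split(seq, lengths))
padded = pad_sequence(pieces, batch_first=True)  # pads to the longest piece

# If the longest piece is shorter than target_len, pad the rest of the way.
padded = F.pad(padded, (0, target_len - padded.size(1)))

print(padded.shape)
```

With these lengths, the first five rows are full and the remaining ten rows each hold 5 values followed by 5 zeros.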

I would like to do this without a loop since it will happen in the forward pass of a model and loops dramatically decrease performance in my understanding. It would be fine if a loop was used if it would not greatly impact model training time or performance.

My end goal is to be able to take the mean across the newly formed padded dimension and, in the case of the first example, end up with torch.Size([32, 12, 1, 768]).

Example Image: See Post 5

Thank you. Any tips or suggestions are welcome.

charan_Vjy · March 31, 2020, 4:06am

A couple of clarifications

How do you decide the split at 53? Is it arbitrary (number dependent)? To rephrase, how do you decide that 53 has to be split into (12, 5)? 12 * 5 = 60, which means 7 values would have been padded.
Is the padding arbitrary, or do you want to pad all the sequences equally? For example, I could pad 53 with 7 values to obtain a length of 60 and then split it as (12, 5). Or do you have another scheme in mind?

HHousen · March 31, 2020, 4:17am

Thanks for the help.

It is arbitrary in this example, meaning those numbers will vary, but they will always be provided. It may sometimes happen that no sequences need padding, but it will never happen that all sequences need padding.
For the padding, all sequences should be padded equally. So, once the large original sequence is split at the indexes provided, each sub-sequence will be padded to the constant length.
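One possible loop-free way to express this scheme is sketched below. It is only a sketch under assumptions: the same split points are shared by every item in the batch, the split is given as segment lengths (hypothetical values here) rather than indexes, and no segment is longer than the pad length. The helper name split_and_pad is made up for illustration. Each position along the original dimension is mapped to a (segment, offset) pair and written into a zero tensor with a single indexed assignment.

```python
import torch

def split_and_pad(x, lengths, pad_len):
    """Hypothetical helper. x: [B, L, D]; lengths: 1-D LongTensor summing to L,
    with every entry <= pad_len. Returns a zero-padded [B, S, pad_len, D] tensor."""
    B, L, D = x.shape
    n_seg = lengths.numel()
    dev = x.device
    # Segment id of every position, e.g. lengths [2, 1] -> [0, 0, 1].
    seg_id = torch.repeat_interleave(torch.arange(n_seg, device=dev), lengths)
    # Start index of each segment, then the offset of each position inside it.
    starts = torch.cumsum(lengths, dim=0) - lengths
    offset = torch.arange(L, device=dev) - starts[seg_id]
    out = x.new_zeros(B, n_seg, pad_len, D)
    out[:, seg_id, offset] = x  # one scatter-style write, no Python loop
    return out

x = torch.randn(32, 53, 768)
# Hypothetical lengths: 12 segments summing to 53, none longer than 5.
lengths = torch.tensor([5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3])
out = split_and_pad(x, lengths, pad_len=5)

# Plain mean over the padded dimension (padding zeros are included).
mean_padded = out.mean(dim=2, keepdim=True)
# Alternative that divides by the true segment length instead of pad_len.
mean_masked = out.sum(dim=2, keepdim=True) / lengths.view(1, -1, 1, 1)

print(out.shape, mean_padded.shape)
```

Note that the plain mean counts the padding zeros, so short segments are pulled toward zero; dividing the sum by the true lengths avoids that, if that is what is wanted. If the split points differ per batch item, this sketch would need per-item index tensors instead.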

charan_Vjy · March 31, 2020, 4:39am

Can you clarify the padding scheme with an example?

HHousen · March 31, 2020, 2:53pm

Yeah, sure.

charan_Vjy · April 1, 2020, 6:22am

I’m not sure whether this can be done without the use of loops. I will try. If I am able to do it, I will let you know.

HHousen · April 1, 2020, 1:17pm

Thank you so much for trying.