Filter Out Undesired Rows

Marina_Drygala · November 6, 2018, 4:57pm

I am generating artificial data. I would like to filter out the rows of my input tensor that don’t satisfy a certain condition and then save the indices so that I can remove the corresponding rows from my output tensor.

ptrblck · November 6, 2018, 5:08pm

Would it be possible to use your condition to index the tensor?
Here is a small example:

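import torch

x = torch.randn(10, 2)
condition = x > 0.           # element-wise boolean mask
row_cond = condition.all(1)  # True for rows where every element is positive
x[row_cond, :]               # keep only those rows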
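
Since the question also asks about saving the indices, here is a minimal sketch of the same row-mask idea on a small fixed tensor, with the integer indices of the kept rows recovered through `torch.nonzero`. The tensor values and the names `kept_idx` and `x_filt` are illustrative assumptions, not something from the original posts.

```python
import torch

# Small fixed tensor (illustrative values) so the result is easy to check by eye.
x = torch.tensor([[ 1.0,  2.0],
                  [-1.0,  3.0],
                  [ 0.5,  0.1],
                  [ 2.0, -4.0]])

# True for rows where every element is positive.
row_cond = (x > 0).all(dim=1)

# Integer indices of the rows that pass the condition.
kept_idx = torch.nonzero(row_cond).squeeze(1)

# Boolean-mask indexing keeps only the passing rows.
x_filt = x[row_cond]

print(row_cond)  # tensor([ True, False,  True, False])
print(kept_idx)  # tensor([0, 2])
```

The boolean mask and the integer index tensor select the same rows, so either can be stored and reused on another tensor with the same number of rows.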

Marina_Drygala · November 6, 2018, 5:17pm

I have a function that computes a value using all the inputs in a row, and I would like to filter out the rows that don’t meet a certain threshold for that value.

Marina_Drygala · November 6, 2018, 5:18pm

Let me know if you want me to be more specific.

ptrblck · November 6, 2018, 5:49pm

Could you use this calculated value to index your tensor?
Let’s say your function computes the sum of each row:

value = x.sum(1)      # one value per row
x[value > threshold]  # threshold is your cutoff

If that doesn’t work, could you post the shape of your tensor and the value you’ve calculated?
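
If the per-row function cannot easily be written as a single vectorized call like `x.sum(1)`, one option is to apply it row by row and stack the results into a 1-D tensor. The sketch below assumes a hypothetical function `row_score` (mean of squares) as a stand-in for the real function, and illustrative values for `x` and `threshold`.

```python
import torch

def row_score(row: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the real per-row function: mean of squares.
    return (row ** 2).mean()

# Illustrative data and cutoff (assumptions, not from the thread).
x = torch.tensor([[1.0, 1.0],
                  [3.0, 1.0],
                  [0.0, 2.0]])
threshold = 1.5

# Iterating over a 2-D tensor yields its rows; stack the scalar results.
value = torch.stack([row_score(row) for row in x])  # shape (3,)

# Keep the rows whose value exceeds the threshold.
x_filt = x[value > threshold]

print(value)   # tensor([1., 5., 2.])
print(x_filt)  # rows 1 and 2 of x
```

A Python loop over rows is slower than a vectorized expression, so it is mainly useful when the function has no obvious batched form.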

Marina_Drygala · November 6, 2018, 7:49pm

A small example of what I am doing, with 5 training examples:

Input:
tensor([[ 0.0166, -0.2023, -0.2503, -0.3227, -0.2823, 0.8440],
[ 0.4075, 0.0052, -0.7873, -0.3248, 0.1329, 0.3014],
[ 0.2826, 0.4441, 0.2709, 0.4514, 0.2911, 0.6008],
[-0.1225, 0.0034, -0.2977, 0.3847, 0.5563, 0.6625],
[-0.3808, -0.5172, 0.4302, -0.2792, 0.1753, 0.5419]])
Output:
tensor([[ 0.2294, -0.2380, 0.2742, -0.0511, 0.4272, 0.2381, -0.1149, -0.8085,
0.2283, -0.8853, 0.1314, 0.0665, -0.2199, 0.8177, 0.0667, 0.4147],
[ 0.4232, -0.5899, -0.3844, 0.9617, -0.9795, -0.0679, -0.0792, 0.7093,
-0.0951, 0.2633, -0.0480, -0.5599, -0.5668, -0.4858, -0.9084, -0.6490],
[ 0.2353, 0.6581, 0.0493, -0.4584, 0.4395, -0.3839, -0.2215, -0.5482,
-0.3140, -0.9266, 0.4267, 0.3888, 0.1986, 0.4910, 0.4238, 0.0442],
[ 0.1059, 0.0764, 0.5336, 0.6717, 0.7181, 0.5796, -0.2438, -0.0445,
-0.2032, 0.5817, 0.1111, 0.9255, 0.5072, -0.8547, 0.2925, 0.9609],
[ 0.8882, -0.0157, 0.3318, -0.9381, -0.3188, 0.4876, -0.9110, 0.8712,
-0.6576, 0.3162, -0.0379, 0.1762, 0.0969, -0.9348, -0.2148, -0.6321]])

I have a function theta that takes a row of my input tensor and outputs a value.

If I apply theta to each row of my input, I get:
tensor([0.2515, 0.2275, 0.2988, 0.2581, 0.2819])

Now if I select 0.26 as my threshold value, I would like to filter out rows 0, 1, and 3 from my input and output tensors and be left with just rows 2 and 4.

ptrblck · November 6, 2018, 7:58pm

Thanks for the example.
Let’s call the first tensor input, the second output and the last theta.
Would the following work:

threshold = 0.26
idx = theta > threshold
input_filt = input[idx]
output_filt = output[idx]

If you want to keep the rows that fail the condition instead, you could negate the mask:

input_filt = input[~idx]
output_filt = output[~idx]

or just flip the > sign.

Let me know if that’s what you want.
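
For reference, a minimal sketch of this approach that also saves the kept and removed row indices. The theta values and the threshold are the ones posted in this thread. The tensors `inp` and `out` are small stand-ins with five rows each (an assumption made to keep the example short), and the names `kept` and `removed` are illustrative.

```python
import torch

# theta values and threshold as posted in the thread.
theta = torch.tensor([0.2515, 0.2275, 0.2988, 0.2581, 0.2819])
threshold = 0.26

# Stand-in tensors with 5 rows each (the real ones are 5x6 and 5x16).
inp = torch.arange(10, dtype=torch.float32).reshape(5, 2)
out = torch.arange(15, dtype=torch.float32).reshape(5, 3)

idx = theta > threshold                   # boolean mask, one entry per row
kept = torch.nonzero(idx).squeeze(1)      # indices of rows that pass
removed = torch.nonzero(~idx).squeeze(1)  # indices of rows that are dropped

# The same index tensor selects matching rows from both tensors.
inp_filt = inp[kept]
out_filt = out[kept]

print(kept)     # tensor([2, 4])
print(removed)  # tensor([0, 1, 3])
```

Indexing with `kept` and indexing with the boolean mask `idx` give the same rows. The integer form is handy when the indices need to be stored or logged.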

J_Johnson · November 14, 2021, 2:49pm

@ptrblck - This is a very elegant solution. Thank you. Is there a way to apply this in the case where the mask should remove values that are equal to the previous value?

For example, suppose a tensor has:
tensor([
[0, 2.4],
[0, -4.1],
[1, 0.5],
[0, 1.6]
])
The mask should keep only the last two rows, since their first-column values differ from the value in the preceding row.
tensor([
[1, 0.5],
[0, 1.6]
])

How would you suggest applying a mask in this case?

J_Johnson · November 14, 2021, 3:08pm

Figured it out. Pretty simple actually.

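condition = x[1:, 0:1] != x[:-1, 0:1]  # compare each row's first column with the previous row's
row_cond = condition.all(1)
y = x[1:, :]                           # the first row has no predecessor, so it is dropped
y = y[row_cond, :]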

Cheers!
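
A self-contained sketch of the snippet above, run on the example tensor from the question. The second half is a variant that is not from the thread: it assumes you may also want to keep the first row (which has no predecessor) by prepending `True` to the mask. The names `changed`, `keep`, and `z` are illustrative.

```python
import torch

# Example tensor from the question.
x = torch.tensor([[0.0,  2.4],
                  [0.0, -4.1],
                  [1.0,  0.5],
                  [0.0,  1.6]])

# True where a row's first column differs from the previous row's.
changed = x[1:, 0] != x[:-1, 0]  # one entry per row, starting at row 1

# As in the post above: the first row is always dropped.
y = x[1:][changed]

# Variant (assumption, not from the thread): keep the first row as well.
keep = torch.cat([torch.tensor([True]), changed])
z = x[keep]

print(changed)  # tensor([False,  True,  True])
print(y)        # rows [1, 0.5] and [0, 1.6]
print(keep)     # tensor([ True, False,  True,  True])
```

Comparing the 1-D column `x[1:, 0]` directly gives a 1-D mask, so the `.all(1)` step is not needed when only one column is compared.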