Check if a value is present in each group of another va

Doulson_Mathis · May 16, 2022, 9:41am

Hello !

I have a dataframe with two columns: an ‘ID’ column and a ‘Var1’ column with values between 1 and 3.
I’d like to mark every row of an ID group with 1 if the value 3 appears anywhere in that group, and 0 otherwise.

The attachment makes this clearer. I’d like to obtain the ‘Result’ column, based on each ‘ID’ group and on the occurrence of 3 in the ‘Var1’ column.

Thank you in advance for your help!

Andrei_Cristea · May 16, 2022, 11:35am

This is a pandas question rather than a PyTorch question, but you can try this:

df["Result"] = (df["Var1"] == 3).groupby(df["ID"]).transform("sum").astype(bool).astype(int)
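
To see what the one-liner produces, here is a minimal sketch on made-up data. The values below are an assumption for illustration, not the data from the attachment; only the column names ID, Var1 and Result come from the question.

```python
import pandas as pd

# Hypothetical sample data: three ID groups, only groups 1 and 3 contain a 3.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 3, 3, 3],
    "Var1": [1, 3, 2, 1, 2, 3, 3, 1],
})

# Flag every row of a group with 1 if any row in that group has Var1 == 3.
df["Result"] = (
    (df["Var1"] == 3)
    .groupby(df["ID"])
    .transform("sum")
    .astype(bool)
    .astype(int)
)

print(df)
# Result column: 1, 1, 1, 0, 0, 1, 1, 1
```

Every row of ID 1 and ID 3 gets 1, including the rows where Var1 itself is not 3, while every row of ID 2 gets 0.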

Doulson_Mathis · May 16, 2022, 12:15pm

Thank you very much, it works!
But I don’t understand what .transform("sum") does. What does it sum exactly?

Andrei_Cristea · May 16, 2022, 12:27pm

Basically what this ugly chain does is:

Create a column that is populated with True if Var1 == 3, and False otherwise:
(df["Var1"] == 3)
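Group by ID and, inside each group, sum these boolean values (True counts as 1, False as 0). So you get an integer that counts how many times 3 appears inside each ID group. Unlike a plain aggregation, transform returns a result with the same length as the original column, so every row of a group receives that group’s count: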
.groupby(df["ID"]).transform("sum")
Then cast this sum to bool and then back to int, as a quick and dirty way to collapse all numbers >=1 to 1 and keep zeros intact:
.astype(bool).astype(int)
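
The three steps above can be run one at a time to inspect the intermediate values. This is only a sketch on a small made-up dataframe; the names is_three, count_per_group and result_any are illustrative and not from the thread. It also shows transform("any") as a possible shorter route to the same flag, which is a swapped-in alternative rather than the chain from the reply.

```python
import pandas as pd

# Hypothetical data: group 1 contains 3 twice, group 2 never does.
df = pd.DataFrame({"ID": [1, 1, 2, 2, 2], "Var1": [3, 3, 1, 2, 1]})

# Step 1: boolean Series, True where Var1 == 3.
is_three = df["Var1"] == 3
print(is_three.tolist())          # [True, True, False, False, False]

# Step 2: per-group count of True values, broadcast back to every row.
count_per_group = is_three.groupby(df["ID"]).transform("sum")
print(count_per_group.tolist())   # [2, 2, 0, 0, 0]

# Step 3: collapse any count >= 1 to 1 and keep 0 as 0.
result = count_per_group.astype(bool).astype(int)
print(result.tolist())            # [1, 1, 0, 0, 0]

# Alternative: ask directly whether any row in the group is True.
result_any = is_three.groupby(df["ID"]).transform("any").astype(int)
print(result_any.tolist())        # [1, 1, 0, 0, 0]
```

Both routes give the same flag here; the "any" version skips the count and the double cast.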