I have a dataframe with two columns: an ‘ID’ column and a ‘Var1’ column whose values range from 1 to 3.
I’d like to mark each ID with 1 if value 3 is present for this group and 0 otherwise.
The attachment makes this clearer: I’d like to obtain the ‘Result’ column, based on each ‘ID’ group and on the occurrence of 3 in the ‘Var1’ column.
Thank you in advance for your help!
This is a pandas question rather than a PyTorch question, but you can try this:
df["Result"] = (df["Var1"] == 3).groupby(df["ID"]).transform("sum").astype(bool).astype(int)
Thank you very much, it works!
But I don’t get what .transform("sum") does. What exactly does it sum?
Basically what this ugly chain does is:
- Create a column that is populated with True if Var1 == 3, and False otherwise:
(df["Var1"] == 3)
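- Group by ID and, inside each group, sum these boolean values (True counts as 1, False as 0). Unlike a plain aggregation, transform("sum") puts each group’s total back on every row of that group, so you get an integer that counts how many times 3 appears inside each ID group:
.groupby(df["ID"]).transform("sum")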
- Then cast this sum to bool and then back to int, as a quick and dirty way to collapse all numbers >=1 to 1 and keep zeros intact:
.astype(bool).astype(int)
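To make the steps above concrete, here is a minimal sketch that runs each one separately. The sample data is made up for illustration (it is not the data from the attachment), and the intermediate names is_three, per_group, counts and alt are hypothetical. It also contrasts a plain .sum(), which returns one value per group, with .transform("sum"), which returns one value per original row.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 3, 3, 3],
    "Var1": [1, 3, 2, 1, 2, 3, 3, 1],
})

# Step 1: boolean mask, True wherever Var1 equals 3.
is_three = df["Var1"] == 3

# Plain aggregation: one total per group (indexed by ID, length 3).
per_group = is_three.groupby(df["ID"]).sum()

# transform("sum"): same totals, but repeated on every row of the group,
# so the result lines up with the original dataframe (length 8).
counts = is_three.groupby(df["ID"]).transform("sum")

# Step 3: any count >= 1 becomes True, then 1; zero stays 0.
df["Result"] = counts.astype(bool).astype(int)

# A possibly more readable variant: ask directly whether any row matches.
alt = is_three.groupby(df["ID"]).transform("any").astype(int)

print(df)
```

With this sample data, group 2 contains no 3 and gets Result 0, while groups 1 and 3 get 1 on every row. The transform("any") variant should give the same column without the double cast.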