df_train.head():
   country  league            home_team        away_team   home_odds  draw_odds  away_odds  home_score  away_score  dow  month
0  Brazil   Copa do Nordeste  Sport Recife     Imperatriz  1.36       4.31       7.66       2           2           4    2
1  Brazil   Copa do Nordeste  ABC              America RN  2.62       3.30       2.48       2           1           6    2
2  Brazil   Copa do Nordeste  Frei Paulistano  Nautico     5.19       3.58       1.62       0           2           6    2
3  Brazil   Copa do Nordeste  Botafogo PB      Confianca   2.06       3.16       3.50       1           1           6    2
4  Brazil   Copa do Nordeste  Fortaleza        Ceara       2.19       2.98       3.38       1           1           6    2
df_train.shape:
(76544, 11)
df_test.head():
   country    league          home_team      away_team      home_odds  draw_odds  away_odds  home_score  away_score  dow  month
0  World      Club Friendly   Westerlo       Gent           2.93       3.47       2.19       NaN         NaN         4    6
1  Malaysia   Super League    Johor DT       Selangor       1.27       5.59       8.26       NaN         NaN         4    6
2  Argentina  Reserve League  Lanus 2        River Plate 2  2.54       3.12       2.65       NaN         NaN         4    6
3  Asia       AFC Cup         Bali United    Kedah          1.58       4.08       4.93       NaN         NaN         4    6
4  Ethiopia   Premier League  Defence Force  Adama City     2.93       2.16       3.38       NaN         NaN         4    6
df_test.shape:
(599, 11)
I currently encode the string columns with scikit-learn's LabelEncoder, applied to the pandas DataFrames as follows:
import pandas as pd
from sklearn import preprocessing

def encode_features(df_train, df_test):
    features = ['country', 'league', 'home_team', 'away_team']
    # Fit on train and test together so every label in either set gets a code
    df_combined = pd.concat([df_train[features], df_test[features]])
    for feature in features:
        le = preprocessing.LabelEncoder()
        le = le.fit(df_combined[feature])
        df_train[feature] = le.transform(df_train[feature])
        df_test[feature] = le.transform(df_test[feature])
    return df_train, df_test

df_train, df_test = encode_features(df_train, df_test)
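To make clear what this encoding produces, here is a minimal plain-Python sketch of the mapping that, as far as I understand, each LabelEncoder builds: the distinct strings are sorted and numbered from 0. The helper names `fit_label_mapping` and `transform` are hypothetical and only for illustration, and the sample values are the `country` entries from the rows shown above.

```python
def fit_label_mapping(values):
    """Map each distinct string to an integer index, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(values)))}


def transform(values, mapping):
    """Replace every string with its integer code from the mapping."""
    return [mapping[value] for value in values]


# 'country' values taken from the head() outputs above (illustrative subset).
train_countries = ["Brazil", "Brazil", "Brazil"]
test_countries = ["World", "Malaysia", "Argentina", "Asia", "Ethiopia"]

# Fit on the union of train and test, as encode_features does.
mapping = fit_label_mapping(train_countries + test_countries)
train_codes = transform(train_countries, mapping)
test_codes = transform(test_countries, mapping)

print(mapping)
print(train_codes, test_codes)
```

The result is one integer index per distinct string, which is the kind of input I would like to produce on the PyTorch side.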
What is the best way to encode these string values using PyTorch?