What is the correct way to use NaNLabelEncoder?

Rohit · May 18, 2022, 8:42am

I am training a forecasting model using the TFT implementation from the pytorch_forecasting library, and I am preparing my dataset with the TimeSeriesDataSet class from the same library. The class has a parameter, categorical_encoders, that accepts a dictionary mapping column names to label encoders for the categorical variables. I create the dictionary and pass it to this parameter as follows.

categorical_encoders = { ..., 'mycatvar' : NaNLabelEncoder(add_nan=True, warn=True), ... }

dataset = TimeSeriesDataSet(..., categorical_encoders=categorical_encoders, ...)

I have two questions:

1. Since I am preparing my test data separately in a different notebook, how do I ensure that the test data is encoded using the same label mapping as the training data?
2. When should I call NaNLabelEncoder(..).fit() and NaNLabelEncoder(..).transform()? I assume these are only needed when I use NaNLabelEncoder explicitly, rather than relying on TimeSeriesDataSet to do the encoding for me.
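
To make the first question concrete, here is a library-free sketch of the behaviour I am after: fit the mapping once on the training data, persist it, and reuse it unchanged in the other notebook. The class SimpleLabelEncoder, the reserved index 0 for unknown values, and the file name are all hypothetical stand-ins for illustration; this is not pytorch_forecasting code and may not match how NaNLabelEncoder works internally.

```python
import json
import os
import tempfile


class SimpleLabelEncoder:
    """Hypothetical stand-in for a label encoder with a reserved 'unknown' class."""

    UNKNOWN = 0  # assumption: index 0 is reserved for NaN / unseen categories

    def __init__(self, classes=None):
        # Mapping from category value to integer code.
        self.classes_ = dict(classes or {})

    def fit(self, values):
        # Learn the mapping from the training data only; codes start at 1.
        self.classes_ = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
        return self

    def transform(self, values):
        # Unseen categories fall back to the reserved UNKNOWN code.
        return [self.classes_.get(v, self.UNKNOWN) for v in values]


# --- "training notebook": fit once and persist the mapping ---
train_values = ["b", "a", "c", "a"]
encoder = SimpleLabelEncoder().fit(train_values)

mapping_path = os.path.join(tempfile.mkdtemp(), "mycatvar_mapping.json")
with open(mapping_path, "w") as fh:
    json.dump(encoder.classes_, fh)

# --- "test notebook": load the saved mapping, do NOT fit again ---
with open(mapping_path) as fh:
    restored = SimpleLabelEncoder(json.load(fh))

test_values = ["c", "a", "z"]  # "z" never appeared in training
test_codes = restored.transform(test_values)
print(test_codes)
```

The point of the sketch is that fit() is called only on the training data, and the test side only ever calls transform() with the restored mapping. What I want to know is the equivalent, recommended way to achieve this with NaNLabelEncoder and TimeSeriesDataSet.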