I am training a forecasting model with the TFT (Temporal Fusion Transformer) implementation from the pytorch_forecasting library, and I am preparing my dataset with the `TimeSeriesDataSet` class from the same library. `TimeSeriesDataSet` has a `categorical_encoders` parameter that accepts a dictionary mapping column names to label encoders, which are then used to encode the categorical variables. I create the dictionary and pass it to this parameter as follows:
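```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data.encoders import NaNLabelEncoder

categorical_encoders = { ..., 'mycatvar' : NaNLabelEncoder(add_nan=True, warn=True), ... }
dataset = TimeSeriesDataSet(..., categorical_encoders=categorical_encoders, ...)
```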
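To make my understanding of fit vs. transform concrete, here is a toy sketch in plain Python. `ToyNaNLabelEncoder` is a hypothetical stand-in, not pytorch_forecasting's `NaNLabelEncoder`. It only mimics the idea I associate with `add_nan=True`: `fit` learns the category-to-integer mapping from the training column, `transform` applies that mapping without changing it, and code 0 is reserved for NaN or categories never seen during fitting.

```python
import math


def _is_nan(value) -> bool:
    # True only for float NaN (numpy float64 NaN also passes, since it subclasses float).
    return isinstance(value, float) and math.isnan(value)


class ToyNaNLabelEncoder:
    """Hypothetical stand-in, NOT pytorch_forecasting's NaNLabelEncoder."""

    def __init__(self, add_nan: bool = True):
        self.add_nan = add_nan
        self.classes_: dict = {}

    def fit(self, values):
        # Learn the mapping from the training data only; NaNs are skipped.
        seen = sorted({v for v in values if not _is_nan(v)})
        offset = 1 if self.add_nan else 0  # reserve 0 for NaN/unseen values
        self.classes_ = {v: i + offset for i, v in enumerate(seen)}
        return self

    def transform(self, values):
        # Apply the already-learned mapping; this never updates classes_.
        encoded = []
        for v in values:
            if not _is_nan(v) and v in self.classes_:
                encoded.append(self.classes_[v])
            elif self.add_nan:
                encoded.append(0)  # NaN or category unseen during fit
            else:
                raise KeyError(f"Unknown category: {v!r}")
        return encoded


enc = ToyNaNLabelEncoder(add_nan=True).fit(["b", "a", "b", float("nan")])
print(enc.classes_)                    # {'a': 1, 'b': 2}
print(enc.transform(["a", "c", "b"]))  # [1, 0, 2]
```

The point of the sketch is that encoding test data with `transform` is only consistent if it uses an encoder that was fitted once, on the training data.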
I have two questions:
- Since I prepare my test data separately, in a different notebook, how do I ensure that it is encoded with the same label mapping as the training data?
- When should I use `NaNLabelEncoder(..).fit()` and `NaNLabelEncoder(..).transform()`? I assume these are only needed when I use `NaNLabelEncoder` explicitly, rather than relying on `TimeSeriesDataSet` to do the encoding for me.
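For the first question, one generic approach I am considering (I am not sure it is the intended one for pytorch_forecasting) is to persist the fitted mappings from the training notebook and reload them in the test notebook. The sketch below uses a hypothetical hard-coded mapping and a hypothetical `encode` helper in place of real fitted encoders; the same pickle pattern should apply to the fitted encoder objects themselves, assuming they are picklable.

```python
import os
import pickle
import tempfile

# --- training notebook ---
# Hypothetical fitted mapping for one categorical column; in practice it
# would come from the encoder fitted on the training data.
fitted_mappings = {"mycatvar": {"a": 1, "b": 2}}

path = os.path.join(tempfile.gettempdir(), "categorical_encoders.pkl")
with open(path, "wb") as f:
    pickle.dump(fitted_mappings, f)

# --- test notebook ---
with open(path, "rb") as f:
    loaded = pickle.load(f)


def encode(values, mapping):
    # Reuse the training mapping unchanged; unseen or NaN values fall back
    # to 0, mirroring an add_nan-style reserved code (assumption).
    return [mapping.get(v, 0) for v in values]


print(encode(["b", "c", "a"], loaded["mycatvar"]))  # [2, 0, 1]
```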