I tried to play with the TemporalFusionTransformer model from PyTorch Forecasting (I'm a beginner in PyTorch), following this guide here.
I'm running everything on the CPU and not using a GPU. To my surprise, even for such a small dataset, PyTorch runs out of memory and Linux kills the process. I've tried a number of things, but the only one that really makes a difference is reducing the hidden size from hidden_size=160 to hidden_size=16.
With hidden_size=160 and hidden_continuous_size=160, I couldn't run this on a machine with 128 GB of RAM and had to move to one with 512 GB.
I also generated another dataset myself, a univariate time series of [84961 rows x 8 columns]; with hidden_size=64 and hidden_continuous_size=32 it consumes 80 GB of RAM. Am I missing something, or is it really supposed to be that hungry? Is there anything I can do to optimize the memory consumption?
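For reference, the only change that really helps is the scaled-down configuration below. It reuses the `training` dataset defined in the script further down; hidden_continuous_size=16 is an untested guess on my part, only hidden_size=16 is verified.

tft_small = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.001,
    hidden_size=16,              # down from 160 -- the only change that really helps
    hidden_continuous_size=16,   # untested guess, scaled down alongside hidden_size
    attention_head_size=4,
    dropout=0.1,
    output_size=7,
    loss=QuantileLoss(),
)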
Here's the code:
import numpy as np
import pandas as pd
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
from lightning.pytorch.loggers import TensorBoardLogger
from pytorch_forecasting import (
    TimeSeriesDataSet,
    GroupNormalizer,
    TemporalFusionTransformer,
    QuantileLoss,
    RMSE,
    MAE,
    MAPE,
)
import lightning.pytorch as pl
from torchmetrics import R2Score
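# LD2011_2014.txt: the UCI electricity load dataset (15-minute readings, ';'-separated, ',' as decimal)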
data = pd.read_csv('LD2011_2014.txt', index_col=0, sep=';', decimal=',')
data.index = pd.to_datetime(data.index)
data.sort_index(inplace=True)
print(data.head(5))
# resample the 15-minute readings to hourly means; treat zero readings as missing
data = data.resample('1h').mean().replace(0., np.nan)
earliest_time = data.index.min()
df = data[['MT_002', 'MT_004', 'MT_005', 'MT_006', 'MT_008']]
print(df)
df_list = []
for label in df:
    ts = df[label]

    # trim each series to its active range (first to last non-NaN reading)
    start_date = min(ts.ffill().dropna().index)
    end_date = max(ts.bfill().dropna().index)
    active_range = (ts.index >= start_date) & (ts.index <= end_date)
    ts = ts[active_range].fillna(0.)

    tmp = pd.DataFrame({'power_usage': ts})
    date = tmp.index

    # time index and calendar features
    tmp['hours_from_start'] = (date - earliest_time).seconds / 60 / 60 + (date - earliest_time).days * 24
    tmp['hours_from_start'] = tmp['hours_from_start'].astype('int')
    tmp['days_from_start'] = (date - earliest_time).days
    tmp['date'] = date
    tmp['consumer_id'] = label
    tmp['hour'] = date.hour
    tmp['day'] = date.day
    tmp['day_of_week'] = date.dayofweek
    tmp['month'] = date.month

    # stack all time series vertically
    df_list.append(tmp)
time_df = pd.concat(df_list).reset_index(drop=True)
# match results in the original paper
time_df = time_df[(time_df['days_from_start'] >= 1096)
                  & (time_df['days_from_start'] < 1346)].copy()
print(time_df)
# mean power usage per consumer
print(time_df[['consumer_id', 'power_usage']].groupby('consumer_id').mean())
#---------------------------------------------------
# Hyperparameters from the guide / original paper:
# batch_size=64 (32 below), attention heads=4, hidden sizes=160, lr=0.001, gradient_clip=0.1
max_prediction_length = 24
max_encoder_length = 7 * 24
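# reserve the last max_prediction_length hours of each series for validation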
training_cutoff = time_df["hours_from_start"].max() - max_prediction_length
training = TimeSeriesDataSet(
    time_df[lambda x: x.hours_from_start <= training_cutoff],
    time_idx="hours_from_start",
    target="power_usage",
    group_ids=["consumer_id"],
    min_encoder_length=max_encoder_length // 2,
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["consumer_id"],
    time_varying_known_reals=["hours_from_start", "day", "day_of_week", "month", "hour"],
    time_varying_unknown_reals=["power_usage"],
    target_normalizer=GroupNormalizer(
        groups=["consumer_id"], transformation="softplus"
    ),  # we normalize by group
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)
validation = TimeSeriesDataSet.from_dataset(training, time_df, predict=True, stop_randomization=True)
# create dataloaders for our model
batch_size = 32
# if you have a strong GPU, feel free to increase the number of workers
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
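# batch_size * 10 for validation: no gradients are kept during evaluation, so larger batches fit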
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size * 10, num_workers=0)
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5, verbose=True, mode="min")
lr_logger = LearningRateMonitor()
logger = TensorBoardLogger("lightning_logs")  # relative to the working directory, not the filesystem root
trainer = pl.Trainer(
    max_epochs=45,
    accelerator='cpu',
    devices=1,
    enable_model_summary=True,
    gradient_clip_val=0.1,
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.001,
    hidden_size=160,
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=160,
    output_size=7,  # one output per quantile of QuantileLoss
    loss=QuantileLoss(),
    logging_metrics=[MAE(), RMSE(), MAPE(), R2Score()],
    log_interval=10,
    reduce_on_plateau_patience=4,
)
trainer.fit(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
)
best_model_path = trainer.checkpoint_callback.best_model_path
print(best_model_path)
best_tft = TemporalFusionTransformer.load_from_checkpoint(best_model_path)
If it's just that memory-intensive, how do I then run a multivariate time series? The dataset itself always fits easily into RAM, but it explodes during processing. Is there a way to do this via distributed training in a cloud environment (e.g. Google Cloud)?
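From the Lightning docs, my understanding is that distributed training would only change the Trainer, roughly as in the sketch below. The devices/num_nodes values are placeholders, and I don't know whether DDP actually helps here, since it replicates the model per process and only splits the batches:

# Untested sketch: Lightning's built-in DDP across multiple machines.
# devices / num_nodes are placeholder values for illustration.
trainer = pl.Trainer(
    max_epochs=45,
    accelerator="gpu",
    devices=2,        # GPUs per node (placeholder)
    num_nodes=2,      # machines in the cluster (placeholder)
    strategy="ddp",
    gradient_clip_val=0.1,
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
)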