Hi, professionals. I'm trying to use the PatchTST model (multivariate time-series forecasting with Transformers). I found "tsai", which makes the PatchTST model easier to use, so I'm trying the tsai module in my project.
These are my X and y shapes: ((38874, 23, 6), (38874, 1, 1)).
I use 23 columns to predict one column, and my dataframe has 38874 rows.
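For context, this is the data layout I believe tsai expects for forecasting (the comments reflect my own understanding of the convention, so treat them as assumptions):

# My understanding of tsai's forecasting convention (assumption, not from the docs):
#   X: (n_samples, n_vars, seq_len)        -> 23 variables, lookback window of 6 steps
#   y: (n_samples, n_target_vars, horizon) -> 1 target variable, 1 step ahead
print(X.shape)   # (38874, 23, 6)
print(y.shape)   # (38874, 1, 1)
assert X.shape[0] == y.shape[0]   # one target window per input window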
These are my initial hyperparameters.
arch_config = dict(
    n_layers=3,          # number of encoder layers
    n_heads=4,           # number of attention heads
    d_model=16,          # dimension of the model
    d_ff=128,            # dimension of the fully connected network
    attn_dropout=0.0,    # dropout applied to the attention weights
    dropout=0.3,         # dropout applied to all linear layers in the encoder except the q, k & v projections
    patch_len=4,         # length of the patches cut from the time series
    stride=2,            # stride used when creating patches
    padding_patch=True,  # pad the end of the sequence before patching
)
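In case it matters, this is how I understand the patching arithmetic (a sketch based on the PatchTST paper; the extra replication-padded patch is my inference from the ReplicationPad1d layer in the summary below):

# Patch-count arithmetic as I understand it (sketch, not from the tsai docs):
seq_len, patch_len, stride = 6, 4, 2    # seq_len comes from X.shape[-1]
padded_len = seq_len + stride           # padding_patch=True -> ReplicationPad1d: 6 -> 8
n_patches = (padded_len - patch_len) // stride + 1
print(n_patches)                        # 3 -> matches the "16 x 3 x 16" shapes below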
This is my model architecture.
PatchTST (Input shape: 16 x 23 x 6)
=============================================================
Layer (type)        Output Shape          Param #  Trainable
=============================================================
                    16 x 23 x 1
RevIN                                     46       True
                    16 x 23 x 8
ReplicationPad1d
                    16 x 4 x 3
Unfold
                    16 x 23 x 3 x 16
Linear                                    80       True
Dropout
Linear                                    272      True
Linear                                    272      True
Linear                                    272      True
Dropout
Linear                                    272      True
Dropout
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
                    16 x 3 x 128
Linear                                    2176     True
GELU
Dropout
                    16 x 3 x 16
Linear                                    2064     True
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
Linear                                    272      True
Linear                                    272      True
Linear                                    272      True
Dropout
Linear                                    272      True
Dropout
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
                    16 x 3 x 128
Linear                                    2176     True
GELU
Dropout
                    16 x 3 x 16
Linear                                    2064     True
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
Linear                                    272      True
Linear                                    272      True
Linear                                    272      True
Dropout
Linear                                    272      True
Dropout
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
                    16 x 3 x 128
Linear                                    2176     True
GELU
Dropout
                    16 x 3 x 16
Linear                                    2064     True
Dropout
                    16 x 16 x 3
Transpose
BatchNorm1d                               32       True
                    16 x 3 x 16
Transpose
                    16 x 23 x 48
Flatten
                    16 x 23 x 1
Linear                                    49       True
=============================================================
Total params: 16,351
Total trainable params: 16,351
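Reading the head of that summary, the numbers seem to fit together like this (my own interpretation of the figures above, not something from the docs):

# My reading of the head in the summary above (interpretation, not confirmed):
n_patches, d_model = 3, 16
flattened = n_patches * d_model     # 48 -> the "16 x 23 x 48" after Flatten
head_params = flattened * 1 + 1     # Linear(48 -> 1) applied per variable: 49 params
print(flattened, head_params)       # 48 49, matching the Flatten/Linear rows above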
This is my training setup and launch code.
from tsai.all import *

learn = TSForecaster(X, y, splits=splits, batch_size=16, path="models",
                     arch="PatchTST", arch_config=arch_config,
                     metrics=[r2_score, mae, rmse], cbs=ShowGraph())

n_epochs = 100
# lr_max = learn.lr_find().valley
lr_max = 0.0025
learn.fit_one_cycle(n_epochs, lr_max=lr_max)
learn.export("patchTST.pt")
Why am I getting tensor size mismatch errors?!! I don't know where the tensor sizes 368 and 16 come from. Are my hyperparameters and data shapes mismatched in the first place? I really can't figure out why…
Please help me!!
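In case it helps, this is the shape check I plan to run to see where the mismatch happens (a sketch using the standard fastai Learner API; I haven't confirmed it pinpoints the 368):

# Pull one batch from the training DataLoader and push it through the model,
# then compare the prediction shape with the target shape (standard fastai calls).
xb, yb = learn.dls.one_batch()
print(xb.shape, yb.shape)   # batch input vs. batch target
out = learn.model(xb)
print(out.shape)            # if this differs from yb.shape, the loss will complain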