RuntimeError: The size of tensor a (368) must match the size of tensor b (16) at non-singleton dimension 0

Hi, professionals. I'm trying to use the PatchTST model (multivariate time series forecasting using Transformers). I found the "tsai" library, which makes using the PatchTST model easier, so I'm trying to use the tsai module in my project.

These are my X and y shapes: ((38874, 23, 6), (38874, 1, 1)).
I use 23 columns to predict one column. My dataframe has 38874 rows.
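For reference, here is a minimal sketch of how arrays with these shapes are typically built in tsai (assuming a dataframe df with 23 feature columns and one target column; the column names, stride and valid_size below are hypothetical and only illustrate the idea):

from tsai.all import *  # tsai examples generally rely on the star import

x_cols = list(df.columns[:23])  # the 23 input variables (hypothetical selection)
y_col = "target"                # the single variable to forecast (hypothetical name)

# window_len=6 yields samples of shape (n_windows, 23, 6); horizon=1 gives a one-step-ahead target
X, y = SlidingWindow(window_len=6, horizon=1, stride=1,
                     get_x=x_cols, get_y=y_col)(df)

# y may come back as (n_windows, 1); an extra expand_dims would give the (38874, 1, 1) shape shown above
splits = TimeSplitter(valid_size=0.2)(y)  # chronological train/valid split, used later as `splits`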

These are my initial hyperparameters.
arch_config = dict(
    n_layers=3,          # number of encoder layers
    n_heads=4,           # number of heads
    d_model=16,          # dimension of the model
    d_ff=128,            # dimension of the fully connected network
    attn_dropout=0.0,    # dropout applied to the attention weights
    dropout=0.3,         # dropout applied to all linear layers in the encoder except the q, k & v projections
    patch_len=4,         # length of the patches the time series is split into
    stride=2,            # stride used when creating patches
    padding_patch=True,  # pad the end of the series before creating patches
)
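For what it's worth, the number of patches these settings produce can be sanity-checked with the formula used in the official PatchTST implementation (a sketch; I'm assuming tsai's padding_patch=True pads the end of the series by `stride` steps, which is what the ReplicationPad1d layer in the summary below does):

seq_len, patch_len, stride = 6, 4, 2             # seq_len comes from X.shape[-1]
n_patches = (seq_len - patch_len) // stride + 1  # = 2
n_patches += 1                                   # padding_patch adds one extra patch
print(n_patches)                                 # 3, matching the "x 3 x" dimensions in the summary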


This is my model architecture.
PatchTST (Input shape: 16 x 23 x 6)

Layer (type) Output Shape Param # Trainable

                 16 x 23 x 1         

RevIN 46 True


                 16 x 23 x 8         

ReplicationPad1d


                 16 x 4 x 3          

Unfold


                 16 x 23 x 3 x 16    

Linear 80 True
Dropout
Linear 272 True
Linear 272 True
Linear 272 True
Dropout
Linear 272 True
Dropout
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose


                 16 x 3 x 128        

Linear 2176 True
GELU
Dropout


                 16 x 3 x 16         

Linear 2064 True
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose
Linear 272 True
Linear 272 True
Linear 272 True
Dropout
Linear 272 True
Dropout
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose


                 16 x 3 x 128        

Linear 2176 True
GELU
Dropout


                 16 x 3 x 16         

Linear 2064 True
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose
Linear 272 True
Linear 272 True
Linear 272 True
Dropout
Linear 272 True
Dropout
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose


                 16 x 3 x 128        

Linear 2176 True
GELU
Dropout


                 16 x 3 x 16         

Linear 2064 True
Dropout


                 16 x 16 x 3         

Transpose
BatchNorm1d 32 True


                 16 x 3 x 16         

Transpose


                 16 x 23 x 48        

Flatten


                 16 x 23 x 1         

Linear 49 True


Total params: 16,351
Total trainable params: 16,351
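The head dimensions in this summary are consistent with the config above; here is a quick cross-check derived from the printed shapes (not from the tsai source):

n_patches, d_model, horizon = 3, 16, 1
head_in = n_patches * d_model        # 48 -> the "16 x 23 x 48" Flatten output
head_params = head_in * horizon + 1  # 49 -> parameter count of the final Linear layer
# the final output shape 16 x 23 x 1 is (batch, n_vars, horizon), i.e. all 23 variables are predicted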

This is my training code.
learn = TSForecaster(X, y, splits=splits, batch_size=16, path="models",
                     arch="PatchTST", arch_config=arch_config,
                     metrics=[r2_score, mae, rmse], cbs=ShowGraph())

n_epochs = 100
# lr_max = learn.lr_find().valley
lr_max = 0.0025
learn.fit_one_cycle(n_epochs, lr_max=lr_max)
learn.export('patchTST.pt')
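One way to see where the 368 and 16 come from is to run a single batch through the model outside of training (a debugging sketch using the standard fastai one_batch() idiom; the printed shapes are what I would expect from the summary above, not verified on this dataset):

xb, yb = learn.dls.one_batch()    # one batch from the training DataLoader
preds = learn.model(xb)
print(preds.shape, yb.shape)      # expected: torch.Size([16, 23, 1]) vs torch.Size([16, 1, 1])
# note that 16 * 23 = 368, which is likely where the mismatched size in the error message comes from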

Why am I getting a tensor size mismatch error?!! I don't know where the tensor sizes 368 and 16 come from. Are my hyperparameters and data shapes mismatched in the first place? I really can't figure it out…
Please help me!!

Your code is unfortunately not properly formatted and contains seemingly random values without any explanation of what they refer to, so I'm also unsure where the mismatch is coming from.

Hi, I'm Koo from South Korea, currently majoring in Industrial Engineering at Korea University. First of all, I really appreciate your feedback on my question.
The question I posted was about a multivariate time series forecasting problem using a Transformer (PatchTST).
My objective was to predict a plant's temperature (a temperature sensor in the market) using the various variables in X.

Since I can see that you are very professional and knowledgeable in PyTorch, I hope I can hear more from you about my problem.
I attached my ipynb file and the dataset that I use in it.
The ipynb file doesn't have many lines, since it uses the "tsai" module, which contains the original PatchTST model from the official GitHub.
Would you do me the favor of looking through my code and dataset?

If you can't, I still appreciate your help and the work you've done on this site.
Have a nice day and take care.

Best regards
Koo

On Fri, Dec 15, 2023 at 11:56 PM, ptrblck via PyTorch Forums <noreply@discuss.pytorch.org> wrote:

(Attachment PatchTST_Practice.ipynb is missing)

(Attachment airconditioner.csv is missing)

In the working tsai PatchTST tutorial, the number of predicted features is the same as the number of input features.
I have a case similar to yours, where I want to predict only a subset of the input time series features.
The error points inside the tsai code; it looks like they were not expecting this case.

File ~\.conda\envs\tsai\lib\site-packages\torch\nn\functional.py:3315, in mse_loss(input, target, size_average, reduce, reduction)
   3308 r"""mse_loss(input, target, size_average=None, reduce=None, reduction='mean') -> Tensor
   3309 
   3310 Measures the element-wise mean squared error.
   3311 
   3312 See :class:`~torch.nn.MSELoss` for details.
   3313 """
   3314 if has_torch_function_variadic(input, target):
-> 3315     return handle_torch_function(
   3316         mse_loss, (input, target), input, target, size_average=size_average, reduce=reduce, reduction=reduction
   3317     )
   3318 if not (target.size() == input.size()):
   3319     warnings.warn(
   3320         f"Using a target size ({target.size()}) that is different to the input size ({input.size()}). "
   3321         "This will likely lead to incorrect results due to broadcasting. "
   3322         "Please ensure they have the same size.",
   3323         stacklevel=2,
   3324     )

My impression is that changing the tsai code would be required to make this work, i.e. filtering the unused features out of the model output before calculating MSELoss against the target features. That being said, even if that works, the warning above suggests the broadcasting could lead to incorrect results.
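For example, one possible workaround (an untested sketch, not part of tsai's API; target_idx is hypothetical and has to match the position of the forecast variable among the 23 inputs) would be to pass a custom loss function that slices the model output down to the target channel before computing the MSE:

import torch.nn.functional as F

def subset_mse(preds, target, target_idx=0):
    # preds: (batch, n_vars, horizon) from PatchTST; target: (batch, 1, horizon)
    preds = preds[:, target_idx:target_idx + 1, :]  # keep only the channel we actually forecast
    return F.mse_loss(preds, target)

# learn = TSForecaster(X, y, splits=splits, arch="PatchTST", arch_config=arch_config,
#                      loss_func=subset_mse, ...)   # loss_func is the standard fastai Learner argument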

Does anyone else have more insights here?