I have a time series dataset and some reference datasets, where not every point has a label. I train a model to perform classification. With a BERT-style transformer, one ideally needs to supply a sequence length (maximum length). Suppose my time series has a maximum length of 365 time steps and labels exist for only 84 of them. I set the data points without reference labels to an ignore value so that they are not used when calculating the loss. Here is the question: during training, will the model produce an output for all the inputs (the 365 time steps) and use only the labelled points in the loss calculation, or will it predict only the labelled data points? I need a way to verify this.
Can anyone please answer? I need to know. Or perhaps this is not the way these models are trained?
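
To make the setup concrete, here is a minimal sketch in plain Python of the masking described above, with no real model involved. Everything in it is an assumption for illustration: random numbers stand in for the model's per-step logits, 3 classes are assumed, `masked_cross_entropy` is a hypothetical helper, and the ignore value of -100 is chosen to mirror the default `ignore_index` of PyTorch's `CrossEntropyLoss`. It shows one way to check that a prediction exists for all 365 steps while only the 84 labelled steps contribute to the loss.

```python
import math
import random

IGNORE = -100  # assumed ignore value (mirrors PyTorch's default ignore_index)

def masked_cross_entropy(logits, labels, ignore=IGNORE):
    """Mean cross-entropy over labelled steps; returns (loss, steps used)."""
    total, count = 0.0, 0
    for step_logits, label in zip(logits, labels):
        if label == ignore:
            continue  # unlabelled step: a prediction exists but adds no loss
        m = max(step_logits)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(z - m) for z in step_logits))
        total += log_norm - step_logits[label]
        count += 1
    return total / count, count

random.seed(0)
seq_len, num_classes, num_labelled = 365, 3, 84

# Stand-in for the model output: one logit vector per time step (all 365).
logits = [[random.gauss(0, 1) for _ in range(num_classes)]
          for _ in range(seq_len)]

# 84 randomly chosen steps carry a class label, the rest carry IGNORE.
labels = [IGNORE] * seq_len
for t in random.sample(range(seq_len), num_labelled):
    labels[t] = random.randrange(num_classes)

loss, used = masked_cross_entropy(logits, labels)

# Changing the logits at an unlabelled step must leave the loss unchanged.
unlabelled_t = labels.index(IGNORE)
perturbed = [row[:] for row in logits]
perturbed[unlabelled_t] = [10.0, -10.0, 0.0]
loss_perturbed, _ = masked_cross_entropy(perturbed, labels)

print(len(logits), used)       # number of predictions vs. steps in the loss
print(loss == loss_perturbed)  # True if unlabelled steps are really ignored
```

The same two checks could be applied to a real training setup: compare the shape of the model's logits with the input length, and confirm that perturbing predictions at ignored positions does not change the loss.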