Just as language-modeling problems use “missing” (mask) tokens to direct prediction to only the missing “words”, is there an equivalent “missing token” for numerical values that lets a model *impute* missing values in a time series? This feels trivial, but I’m stumped on how to do it!

Hello, if I have understood your question correctly, you can treat your numerical values in a time series as a *series of characters* or a *series of words* in a language model.

@AbdulsalamBande I see, but that feels cumbersome. If I want high precision, a floating-point number broken into “characters” would become a very long sequence.
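To put a number on the length concern, here is a quick stdlib sketch. It assumes one token per character of the value printed to a fixed number of decimals (`char_tokens` and the 10-decimal default are illustrative choices, not from any library; real tokenizers may group digits differently).

```python
def char_tokens(value: float, decimals: int = 10) -> list[str]:
    """Split a float, printed to a fixed number of decimals, into one-character tokens."""
    return list(f"{value:.{decimals}f}")


row = [0.5, 1.2, 0.2]  # first time step of the example input
tokens = [tok for v in row for tok in char_tokens(v)]
print(len(row), "numbers ->", len(tokens), "character tokens")
# prints: 3 numbers -> 36 character tokens
```

Each value costs `decimals + 2` tokens (more for negative or larger numbers), so sequence length grows linearly with the precision you want, whereas a numeric input keeps one slot per value.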

Is there a standard workflow where the network’s attention can be directed to only missing numerical values? For instance, if my input is

```python
import numpy as np
x = np.array([[0.5, 1.2, 0.2], [0.6, 1.3, 0.3], [np.nan, np.nan, np.nan], [0.8, 1.4, 0.5], [0.9, 1.5, 0.6]])
```

I’d like the model to 1) copy over the input values where they are not missing, and 2) compute the loss based only on its predictions for the missing values (the `np.nan` entries).
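One common pattern, sketched here in NumPy under stated assumptions rather than as a standard API: the numeric “missing token” is a placeholder value plus a 0/1 mask channel fed alongside the data. Note that the true values behind the real `np.nan` entries are unknown, so there is nothing to compute a loss against there; instead, during training you hide some *observed* entries yourself (much like masked language modeling) and supervise only on those. The helper names `encode`, `hide_random`, `masked_mse`, and `merge`, the `fill_value` of 0.0, and the 0.2 hiding rate are all hypothetical choices, and `np.zeros_like(x)` stands in for whatever network you use.

```python
import numpy as np


def encode(x, fill_value=0.0):
    """Numeric stand-in for a [MASK] token: placeholder value plus a 0/1 mask channel."""
    observed = ~np.isnan(x)
    filled = np.where(observed, x, fill_value)
    # The mask channel lets the model tell a genuine 0.0 apart from a missing entry.
    return np.concatenate([filled, observed.astype(x.dtype)], axis=-1), observed


def hide_random(x, rate, rng):
    """Hide a random fraction of the *observed* entries so they can act as training targets."""
    observed = ~np.isnan(x)
    hidden = observed & (rng.random(x.shape) < rate)
    return np.where(hidden, np.nan, x), hidden


def masked_mse(pred, target, loss_mask):
    """Mean squared error over the entries where loss_mask is True (0.0 if there are none)."""
    if not loss_mask.any():
        return 0.0
    return float(np.mean((pred[loss_mask] - target[loss_mask]) ** 2))


def merge(x, pred):
    """Copy observed values through; use the prediction only where x is NaN."""
    return np.where(np.isnan(x), pred, x)


x = np.array([[0.5, 1.2, 0.2], [0.6, 1.3, 0.3], [np.nan, np.nan, np.nan],
              [0.8, 1.4, 0.5], [0.9, 1.5, 0.6]])
rng = np.random.default_rng(0)

# Training step: the real NaN row has no ground truth, so supervise on entries we hid ourselves.
x_train, hidden = hide_random(x, rate=0.2, rng=rng)
model_input, _ = encode(x_train)   # shape (5, 6): values + mask channel
pred = np.zeros_like(x)            # placeholder for model(model_input)
loss = masked_mse(pred, x, hidden)

# Inference: encode the real gaps, then keep observed values and fill only the NaNs.
model_input, _ = encode(x)
pred = np.zeros_like(x)            # placeholder for model(model_input)
imputed = merge(x, pred)
```

`merge` covers point 1 (copying observed values through) and `masked_mse` covers point 2 (a loss restricted to masked positions). In a real training loop the same `where`/boolean-mask logic would be written with your framework's tensor ops so gradients flow through the prediction; a learned embedding for missing entries is an alternative to the fixed `fill_value`.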