About time series models and system identification

Greetings to everyone!

I am a young data science researcher working on control engineering problems with deep learning and neural networks, and I am still quite new to DL and NN architectures. I am trying to predict the input and output sensor values of a dynamic, complex system, i.e. system identification: I want to predict the temperature sensor reading given the system's future fuel inputs.

I have 3 features and about 300k rows of data (covering the train, validation and test splits), where each row is one sensor reading at 1-minute resolution. As a first step I want to train an LSTM / GRU model, and I use the Darts library with its PyTorch-based forecasting models. I also have some past covariates (`time_varying_unknown_reals`, i.e. variables whose values I only know in the past), and in the next stage I want to move to the TFT model. To briefly describe my machine: one Tesla T4 GPU with 16 GB memory, 12 GB RAM and 5 CPU cores.

In my first attempts the training took very long and the results were not encouraging; I can share the loss curves if that helps. My impression is that the model simply does not fit properly. Do you think I should first work on a small subset of the data and try to overfit it, or should I train on the full dataset and accept the long run times? Would hyper-parameter tuning with Optuna help the model fit properly?
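
To make my setup concrete, this is roughly what my current code looks like. The file path, column names and hyper-parameter values below are placeholders rather than my exact ones, and I am using `BlockRNNModel` because, as far as I understand, it is the Darts LSTM/GRU variant that accepts past covariates:

```python
import pandas as pd
from darts import TimeSeries
from darts.dataprocessing.transformers import Scaler
from darts.models import BlockRNNModel

# 1-minute sensor readings; file name and column names are placeholders
df = pd.read_csv("plant_data.csv", parse_dates=["timestamp"])

# Target: temperature sensor. Covariates: fuel input and another sensor,
# used here as past covariates (TFT with future covariates comes later).
target = TimeSeries.from_dataframe(df, time_col="timestamp", value_cols=["temperature"])
covariates = TimeSeries.from_dataframe(df, time_col="timestamp", value_cols=["fuel_input", "pressure"])

# Chronological train/validation split (the test set is held out elsewhere)
train, val = target.split_after(0.8)
cov_train, cov_val = covariates.split_after(0.8)

# Scale target and covariates to [0, 1] before training
target_scaler, cov_scaler = Scaler(), Scaler()
train, val = target_scaler.fit_transform(train), target_scaler.transform(val)
cov_train, cov_val = cov_scaler.fit_transform(cov_train), cov_scaler.transform(cov_val)

model = BlockRNNModel(
    model="LSTM",                 # or "GRU"
    input_chunk_length=120,       # 2 hours of history per training sample
    output_chunk_length=30,       # predict 30 minutes ahead
    hidden_dim=64,
    n_rnn_layers=2,
    dropout=0.1,
    batch_size=256,
    n_epochs=50,
    pl_trainer_kwargs={"accelerator": "gpu", "devices": 1},
)

model.fit(
    series=train,
    past_covariates=cov_train,
    val_series=val,
    val_past_covariates=cov_val,
)
```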

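Regarding Optuna, this is the kind of search I imagine running on top of the snippet above. The search ranges, epoch count and the 30-step validation window are just examples; please tell me if this is a sensible way to combine Optuna with Darts:

```python
import optuna
from darts.metrics import mae
from darts.models import BlockRNNModel

# train / val / cov_train are the scaled series from the snippet above

def objective(trial):
    model = BlockRNNModel(
        model="LSTM",
        input_chunk_length=trial.suggest_int("input_chunk_length", 60, 240),
        output_chunk_length=30,
        hidden_dim=trial.suggest_int("hidden_dim", 32, 256),
        n_rnn_layers=trial.suggest_int("n_rnn_layers", 1, 3),
        dropout=trial.suggest_float("dropout", 0.0, 0.3),
        batch_size=512,
        n_epochs=10,  # short runs per trial, just to rank configurations
        pl_trainer_kwargs={"accelerator": "gpu", "devices": 1},
    )
    model.fit(series=train, past_covariates=cov_train)

    # Forecast the 30 minutes right after the training series and score it
    # against the start of the validation series
    pred = model.predict(n=30, series=train, past_covariates=cov_train)
    return mae(val[:30], pred)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```
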
Also, during these training runs I only consume about 1 GB of the 16 GB GPU memory. Does that look like reasonable GPU utilisation to you? Is the consumption so low simply because there are only 3 feature columns? The GPU duty cycle does reach 100%.
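
For reference, these are the only knobs I am aware of that might change how much of the GPU gets used; the values here are just examples, so please correct me if I am looking in the wrong place:

```python
from darts.models import BlockRNNModel

# train / val / cov_train / cov_val are the scaled series from my first snippet
model = BlockRNNModel(
    model="LSTM",
    input_chunk_length=120,
    output_chunk_length=30,
    batch_size=1024,  # my understanding: larger batches -> more GPU memory used per step
    n_epochs=50,
    pl_trainer_kwargs={"accelerator": "gpu", "devices": 1},
)

model.fit(
    series=train,
    past_covariates=cov_train,
    val_series=val,
    val_past_covariates=cov_val,
    num_loader_workers=4,  # extra CPU workers so data loading doesn't starve the GPU
)
```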