Training and validation loss characteristics

I am working on an image reconstruction problem and comparing the performance of several models. For every model, the gap between training loss and validation loss in the first epoch is roughly the same. From the second epoch onwards, that gap shrinks sharply compared to the first epoch; it becomes very small and stays small for all of the remaining epochs.
Can anyone help me understand why this is happening? Is it a form of overfitting?

When training starts, your model does very poorly on the training data because it has never seen any data yet. The training loss reported for the first epoch is the average over all of its batches, so it mixes the very high losses from the beginning of the epoch ("never seen this before") with the lower losses from the end of the epoch ("this is starting to make sense"), as the small illustration below shows.
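A quick numeric sketch of that averaging effect (the batch losses here are made up purely for illustration):

```python
# Hypothetical per-batch training losses during the first epoch: the model
# starts out bad and improves as the epoch progresses.
first_epoch_batch_losses = [2.1, 1.4, 0.9, 0.6, 0.5]

epoch_average = sum(first_epoch_batch_losses) / len(first_epoch_batch_losses)
loss_at_epoch_end = first_epoch_batch_losses[-1]

print(epoch_average)      # ~1.10 -- what gets reported as the epoch's training loss
print(loss_at_epoch_end)  # 0.50 -- closer to what the validation pass sees afterwards
```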

By the time the validation pass runs, at the end of the epoch, the model has already started to understand the problem and usually performs better. In addition, dropout layers are disabled during validation, so a lower loss on the validation data is expected at first. Eventually the two curves converge; if you keep training past that point and the training loss keeps falling while the validation loss stops improving, that is when you are starting to overfit the training data.
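A minimal sketch of a typical loop that produces this pattern, assuming a PyTorch setup; the tiny autoencoder and random "images" below are hypothetical placeholders for your own model and data loaders:

```python
import torch
import torch.nn as nn

# Hypothetical tiny autoencoder and random data, only so the loop runs end to end.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                      nn.Dropout(0.5), nn.Linear(64, 784))
train_loader = [torch.rand(32, 784) for _ in range(50)]
val_loader = [torch.rand(32, 784) for _ in range(10)]

criterion = nn.MSELoss()  # a common reconstruction loss; yours may differ
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    # Training pass: dropout is active, and the reported number is the average
    # over every batch in the epoch, including the earliest (worst) batches.
    model.train()
    running = 0.0
    for images in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), images)
        loss.backward()
        optimizer.step()
        running += loss.item()
    train_loss = running / len(train_loader)

    # Validation pass: runs only after all of the epoch's updates, with dropout
    # disabled (model.eval()) and no gradient tracking, so early on it can
    # easily come out lower than the epoch-averaged training loss.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(images), images).item()
                       for images in val_loader) / len(val_loader)

    print(f"epoch {epoch}: train {train_loss:.4f}  val {val_loss:.4f}")
```

If you want the two numbers to be directly comparable, one option is to re-evaluate the training set in eval mode at the end of each epoch instead of relying on the running average collected during training.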