What is the cause of training loss being lower than validation loss?

My training loss is much lower than my validation loss. I need to know the cause and how to fix it.

How much lower?
Could you please show some plots/values?

train loss: 4.63e-5     valid loss: 0.0094
train loss: 0.000427    valid loss: 0.0020
train loss: 0.000344    valid loss: 0.0025

It seems you’re just overfitting your training data.

But I added dropout to my model.

Hi @Rexedoziem,

Can you share some code / more info on what you’re modelling?

Dropout won’t solve everything, but it can help. Your model might simply be too large: a bigger model tends to have lower bias (error relative to the ground truth) but higher variance (its predictions are more sensitive to the particular training data). Try varying the size of your network and see whether the gap between training and validation loss persists; one way to do that is sketched below.
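
To make that concrete, here is a minimal sketch of one way to shrink the effective capacity of a BERT-style encoder: freeze the embeddings and the lower transformer layers so that only the top layers (and your classification head) are trained. The checkpoint name and the 8-of-12 layer split here are illustrative assumptions, not a recommendation:

from transformers import BertModel

encoder = BertModel.from_pretrained('bert-base-uncased')

# Freeze the embeddings and the first 8 of the 12 transformer layers
for param in encoder.embeddings.parameters():
    param.requires_grad = False
for layer in encoder.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
total = sum(p.numel() for p in encoder.parameters())
print(f'Trainable parameters: {trainable:,} / {total:,}')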


import torch
import torch.nn as nn
from transformers import BertModel  # the checkpoint below is a BERT one, so BertModel is what is actually used


class DebertaModel(nn.Module):
    # Note: despite the name, this wraps a bert-base-uncased checkpoint,
    # and the class name shadows transformers.DebertaModel.
    def __init__(self):
        super().__init__()
        self.deberta = BertModel.from_pretrained('…/input/bert-base-uncased', return_dict=False)  # path as given in the post
        self.dropout = nn.Dropout(0.1)
        self.classified = nn.Linear(768, config.out_features)  # config is defined elsewhere in the notebook

    def forward(self, input_ids, attention_mask, token_type_ids):
        # With return_dict=False, the encoder returns (sequence_output, pooled_output)
        _, output = self.deberta(input_ids, attention_mask, token_type_ids)
        output_dropout = self.dropout(output)
        output = self.classified(output_dropout)
        return output


model = DebertaModel()
model.to(device)
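
As a quick sanity check of the wrapper, something like the following should run end-to-end (the tokenizer checkpoint, the max_length, and the sample sentence are my own assumptions):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
batch = tokenizer(
    ['A problem is a change for you.'],
    padding='max_length', truncation=True, max_length=128,
    return_tensors='pt', return_token_type_ids=True,
)
with torch.no_grad():
    logits = model(
        batch['input_ids'].to(device),
        batch['attention_mask'].to(device),
        batch['token_type_ids'].to(device),
    )
print(logits.shape)  # expected: (1, config.out_features)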

from transformers import get_linear_schedule_with_warmup

# No weight decay for biases and LayerNorm parameters
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
optimizer_parameters = [
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.001},
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0},
]

num_train_step = int(len(train_df) / config.TRAIN_BATCH_SIZE * config.EPOCHS)
optimizer = torch.optim.AdamW(optimizer_parameters, lr=1e-3)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=num_train_step,
)
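
For reference, this optimizer/scheduler pair is normally stepped once per batch in the order below (loss_fn, train_loader, and the batch keys here are placeholder names, not from the original code). One thing worth double-checking: lr=1e-3 is high for fine-tuning a pretrained transformer, where values in the 1e-5 to 5e-5 range are more common.

model.train()
for batch in train_loader:
    optimizer.zero_grad()
    outputs = model(
        batch['input_ids'].to(device),
        batch['attention_mask'].to(device),
        batch['token_type_ids'].to(device),
    )
    loss = loss_fn(outputs, batch['targets'].to(device))
    loss.backward()
    optimizer.step()
    scheduler.step()  # the linear schedule advances once per optimizer step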

I believe I need to clean the text before sending it to the model; here is a sample so you can see why.

df['full_text'][1]

Out[4]:

“When a problem is a change you have to let it do the best on you no matter what is happening it can change your mind. sometimes you need to wake up and look what is around you because problems are the best way to change what you want to change along time ago. A\n\nproblem is a change for you because it can make you see different and help you to understand how tings wok.\n\nFirst of all it can make you see different then the others. For example i remember that when i came to the United States i think that nothing was going to change me because i think that nothing was going to change me because everything was different that my country and then i realist that wrong because a problem may change you but sometimes can not change the way it is, but i remember that i was really shy but i think that change a lot because sometimes my problems make me think that there is more thing that i never see in my life but i just need to see it from a different way and dont let nothing happened and ruing the change that i want to make because of just a problem. For example i think that nothing was going to change me and that i dont need to be shy anymore became i need to start seeing everything in a different ways because you can get mad at every one but you need to know what is going to happened after,\n\npeople may see you different but the only way that you know how to change is to do the best and don’t let nothing or not body to change nothing about you. The way you want to change not one have that and can’t do nothing about it because is your choice and your problems and you can decide what to do with it.\n\nsecond of all can help you to understand how things work. For instance my mom have a lot of problems but she have faith when she is around people, my mom is scare of high and i’m not scare of high i did not understand why my mos is scare of high and in not scare of high and every time i see my mom in a airplane it make me laugh because she is scare and is funny, but i see it from a different way and i like the high but also she have to understand that hoe things work in other people because it can no be the same as you. For example i think that my mom and me are different because we are and i have to understand that she does not like high and i need to understand that. to help someone to understand how things work you need to start to see how things work in that persons life.\n\nA problem is a change for you and can make you a different and help you to understand. Everyone has a different opinion and a different was to understand then others. everyone can see the different opinion and what other people think.”

So I think I need to clean it first.
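
A minimal cleaning pass might just normalize the hard line breaks and repeated whitespace visible above. The function and column names here are placeholders of mine, and heavier cleaning may not be needed, since BERT-style tokenizers cope reasonably well with raw text:

import re

def clean_text(text: str) -> str:
    # Collapse hard line breaks and runs of whitespace into single spaces
    text = text.replace('\n', ' ')
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

df['clean_text'] = df['full_text'].apply(clean_text)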