How does training a model work?

I am getting various errors while training my model, which is supposed to segment a given image into different categories. Right now I just want to understand what happens in these steps, which show up in most training scripts:

outputs = model(batch_img_train)
loss = loss_function(outputs, batch_mask_train)
loss.backward()
optimizer.step()

I just don't understand what outputs actually contains and what the loss function does with outputs and the mask; I want to understand the inner workings. Can someone explain it?

By the way, I tried the following and I don't understand what I printed:

from tqdm import tqdm

BATCH_SIZE = 10
EPOCHS = 1

def train(model):
    model.train()
    for epoch in range(EPOCHS):
        for i in tqdm(range(0, len(img_train), BATCH_SIZE)):
            batch_img_train = img_train[i:i+BATCH_SIZE].view(-1, 3, 224, 224)
            batch_mask_train = mask_train[i:i+BATCH_SIZE].view(-1, 1, 224, 224)

            model.zero_grad()                                  # clear old gradients

            outputs = model(batch_img_train)                   # forward pass
            loss = loss_function(outputs, batch_mask_train)    # compare prediction to target
            loss.backward()                                    # compute gradients
            optimizer.step()                                   # update parameters

            return outputs, loss                               # returns after the first batch


outputs, loss = train(model)

print(outputs[0])
print(loss)

tensor([[[-0.0091, -0.1961,  0.0587,  ..., -0.1641, -0.0139, -0.2890],
         [-0.0064,  0.0030, -0.1327,  ...,  0.0016, -0.0392,  0.0583],
         [ 0.0580, -0.1432,  0.0927,  ..., -0.0062, -0.0150, -0.2169],
         ...,
         [-0.0160, -0.0555, -0.0218,  ..., -0.0440,  0.0779,  0.0119],
         [ 0.0780, -0.2582,  0.3273,  ..., -0.1301, -0.0121, -0.3491],
         [-0.0095,  0.0300,  0.2434,  ...,  0.0927, -0.1081,  0.1011]],

        [[ 0.0240,  0.0760,  0.1297,  ..., -0.0281,  0.1930, -0.0558],
         [ 0.2875, -0.0392,  0.1630,  ..., -0.2731,  0.1639, -0.1631],
         [ 0.1795,  0.1011,  0.0933,  ..., -0.1308,  0.1352, -0.1574],
         ...,
         [ 0.2370, -0.0927,  0.1744,  ...,  0.0010,  0.2705, -0.2871],
         [ 0.2685,  0.0470,  0.0728,  ..., -0.0878,  0.3259, -0.0947],
         [ 0.0521, -0.0432,  0.2411,  ..., -0.0805,  0.0145, -0.1734]],

        [[-0.0826, -0.0991, -0.0454,  ..., -0.0914, -0.0570, -0.1069],
         [-0.0284, -0.2223,  0.2041,  ..., -0.2442, -0.0794, -0.2244],
         [-0.1062, -0.1029,  0.2294,  ..., -0.0914, -0.1032,  0.0496],
         ...,
         [-0.0181, -0.2399,  0.0967,  ..., -0.3608, -0.0362, -0.2599],
         [ 0.0174, -0.0861, -0.0526,  ...,  0.0006, -0.0621,  0.0562],
         [-0.0683, -0.2384, -0.1297,  ..., -0.2269, -0.1719, -0.2036]],

        ...,

        [[ 0.0466,  0.0729,  0.1712,  ...,  0.0808, -0.0174,  0.0344],
         [ 0.0591,  0.1214,  0.2544,  ..., -0.1711,  0.0215, -0.1528],
         [ 0.0919,  0.0274, -0.1394,  ...,  0.0419,  0.1209,  0.0010],
         ...,
         [ 0.1275, -0.0068,  0.1960,  ..., -0.0925,  0.0209, -0.0808],
         [-0.0907, -0.0289,  0.0956,  ..., -0.0043,  0.0141, -0.0482],
         [-0.0100, -0.0397,  0.1704,  ..., -0.0348,  0.0571,  0.0355]],

        [[-0.1661, -0.2054, -0.2219,  ..., -0.3749, -0.1241, -0.1909],
         [ 0.0185, -0.1433, -0.1410,  ..., -0.1159,  0.0940, -0.0041],
         [-0.1563, -0.1719, -0.0610,  ...,  0.0081,  0.0230, -0.1936],
         ...,
         [-0.0505, -0.0652, -0.1203,  ...,  0.0068,  0.1381, -0.0275],
         [-0.0941, -0.2070, -0.1704,  ..., -0.1199, -0.0481, -0.2115],
         [-0.0044, -0.0275, -0.1157,  ...,  0.0380, -0.0144,  0.1001]],

        [[-0.0658,  0.0374,  0.0149,  ...,  0.2753, -0.0432,  0.1743],
         [ 0.3474,  0.0585,  0.2438,  ...,  0.0770,  0.1662,  0.0813],
         [-0.0568,  0.0906,  0.1045,  ...,  0.1397,  0.1213,  0.0352],
         ...,
         [ 0.3072,  0.2205,  0.1899,  ...,  0.0265,  0.2470,  0.0975],
         [-0.1063,  0.1827,  0.0146,  ...,  0.1447, -0.0308,  0.0969],
         [ 0.1026,  0.1702,  0.2469,  ...,  0.0686,  0.1107,  0.1228]]],
       grad_fn=<SelectBackward>)
tensor(105.4350, grad_fn=<MseLossBackward>)

The outputs tensor is created by your model and is the result of its forward method. For your use case it would usually contain the class logits, i.e. one raw, unnormalized score per class for every pixel.
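
As a small sketch of what that typically looks like (the number of classes and the shapes here are assumptions for illustration, not taken from your model):

import torch

num_classes = 5                                   # assumed for illustration
logits = torch.randn(10, num_classes, 224, 224)   # stand-in for model(batch_img_train)

# Taking the argmax over the class dimension turns the per-pixel logits
# into a predicted segmentation map with one class index per pixel:
pred_mask = logits.argmax(dim=1)
print(pred_mask.shape)   # torch.Size([10, 224, 224]), values in 0..num_classes-1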

The loss_function compares outputs with the target mask and reduces the mismatch to a single scalar (judging by grad_fn=<MseLossBackward> in your printout, you are currently using MSELoss; for multi-class segmentation, nn.CrossEntropyLoss on the raw logits is the more common choice). The loss.backward() call then computes the gradients of this loss w.r.t. all parameters of the model, and optimizer.step() uses those gradients (and internal estimates, depending on which optimizer is used) to update all passed parameters.
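
Here is a minimal, self-contained sketch of that whole sequence, using a toy 1x1-conv "model" and nn.CrossEntropyLoss (a common choice for multi-class segmentation; the 5 classes and all shapes are assumptions), so each step is visible in isolation:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 5, kernel_size=1)                 # toy model: 5-class logits per pixel
loss_function = nn.CrossEntropyLoss()                  # expects logits and class-index targets
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(2, 3, 224, 224)                        # fake image batch
target = torch.randint(0, 5, (2, 224, 224))            # fake per-pixel class labels

optimizer.zero_grad()                                  # clear gradients from the previous step
outputs = model(x)                                     # forward pass -> logits [2, 5, 224, 224]
loss = loss_function(outputs, target)                  # scalar measuring the current error
loss.backward()                                        # compute d(loss)/d(param) for every parameter
optimizer.step()                                       # update the parameters using those gradients

print(model.weight.grad.shape)                         # gradients are now populated: [5, 3, 1, 1]

Before loss.backward() the .grad attributes of the parameters are empty; afterwards they hold the accumulated gradients, which is why they have to be cleared (model.zero_grad() or optimizer.zero_grad()) at the start of each iteration, as your script already does.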

I would recommend taking a look at some courses, e.g. FastAI, which might be a good starting point.