Question about batch size and loss function

Yolkandwhite · March 20, 2020, 4:26am

My code runs, but each epoch takes too long and the loss value is too high.

I think the DataLoader isn't using the right batch size:
it seems to feed the whole dataset into the model at once.
There are 3607 samples (3607 images and 3607 masks).
I want the batch size to be 1.
How can I fix this?

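import os
import re
import time

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class BasicDataset(Dataset):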
    def __init__(self, imgs_dir, masks_dir):
        self.imgs_dir = imgs_dir
        self.masks_dir = masks_dir
        self.mriids = next(os.walk(self.imgs_dir))[2]
        self.maskids = next(os.walk(self.masks_dir))[2]
        
        def atoi(text):
            return int(text) if text.isdigit() else text

        def natural_keys(text):
            return [atoi(c) for c in re.split(r'(\d+)', text) ] 
        
        self.mriids = sorted(self.mriids, key = natural_keys)
        self.maskids = sorted(self.maskids, key = natural_keys)

    def __len__(self):
        return len(self.mriids)

    def __getitem__(self, idx):

        mriidx = self.mriids[idx] #img file name
        maskidx = self.maskids[idx] #mask file name
        
        mask_file = os.path.join(self.masks_dir, maskidx)
        img_file = os.path.join(self.imgs_dir, mriidx)
        
        img = Image.open(img_file).convert("RGB")
        mask = Image.open(mask_file).convert("L")
        
        mask = np.array(mask)
        img = np.array(img)
        
        mask = np.expand_dims(mask, axis=2)
        
        img = np.transpose(img, (2, 0, 1))
        mask = np.transpose(mask, (2, 0, 1))

        # unique pixel values in the mask; np.unique sorts them, so [1:] drops the smallest
        obj_ids = np.unique(mask)
        obj_ids = obj_ids[1:]

        num_objs = len(obj_ids)
        labels = torch.ones((num_objs,), dtype=torch.int64)
        mask = torch.as_tensor(mask, dtype=torch.uint8)

        image_id = torch.tensor([idx])

        target = {}
        target["labels"] = labels
        target["masks"] = mask
        target["image_id"] = image_id
        
        return img, target

epochs = 100
batch_size = 1
lr = 0.00001
momentum = 0.99

optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
loss_func = nn.BCEWithLogitsLoss().to(device)

gen = BasicDataset('/home/intern/Desktop/YH/Brain_MRI/BrainMRI_train/MRI/MRI/', '/home/intern/Desktop/YH/Brain_MRI/BrainMRI_train/mask/mask/')
train_loader = DataLoader(gen, batch_size=batch_size)

total_batch = len(gen)
print(total_batch)

model.train()
print("start training")
for epoch in range(epochs):
    t0 = time.time()
    for mri, true_mask in train_loader:
        
        mri = mri.type(torch.FloatTensor)
        true_mask = true_mask["masks"]
        true_mask = true_mask.type(torch.FloatTensor)

        mri = mri.to(device)
        true_mask = true_mask.to(device)

        pred_mask = model(mri)
        loss = loss_func(pred_mask, true_mask)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    t1 = time.time()
    print('[Epoch:{}], loss = {}, time = {}'.format(epoch+1, loss, t1-t0))
print('training Finished!')

3607
start training
[Epoch:1], loss = nan, time = 141.68572974205017
[Epoch:2], loss = nan, time = 143.46247911453247
[Epoch:3], loss = nan, time = 143.64162826538086

--------------------------------------------------------------------------- 
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-72-0b9f410e970f> in <module>()
     19 #         true_mask = true_mask.squeeze(1)
     20 
---> 21         mri = mri.to(device)
     22         true_mask = true_mask.to(device)
     23 

KeyboardInterrupt:

Nisan_Aryal · March 20, 2020, 4:34am

From what I see in the code, print('[Epoch:{}], loss = {}, time = {}'.format(epoch+1, loss, t1-t0)) is outside the train_loader loop, so even though the batch size is 1, you only print once the epoch has finished. That may be why it looks as if all the data is loaded at once: the printing happens only after the whole epoch.

Yolkandwhite · March 20, 2020, 5:12am

I wrote this training loop by adapting my earlier MNIST code.
That code works well and runs fast (about 12 seconds per epoch),
even though its print statement is also outside the data-loading loop and it ran on the CPU.
MNIST has 60000 training images, so I don't think the code below (batch size 100) loads the data 60000*100 times. Or does it?

# dataset loader
data_loader = torch.utils.data.DataLoader(dataset=mnist_train,
                                          batch_size=batch_size,   #batch_size = 100
                                          shuffle=True,
                                          drop_last=True)

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = len(data_loader)

    for X, Y in data_loader:
        # reshape input image into [batch_size by 784]
        # label is not one-hot encoded
        X = X.view(-1, 28 * 28).to(device)
        Y = Y.to(device)

        optimizer.zero_grad()
        hypothesis = linear(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()

        avg_cost += cost / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning finished')

Nisan_Aryal · March 20, 2020, 5:34am

The dataset has 60000 samples and the batch size is 100, so one epoch takes 60000/100 = 600 iterations. With a batch size of 1, as in your first post, it would take 60000 iterations, which is much slower. By the way, I'm not sure what you are asking. You said you want a batch size of 1, and I think your first code is already working with a batch size of 1. That dataset has 3607 samples, so one epoch is 3607 iterations. If MNIST takes about 12 seconds for 600 iterations, then 3607 iterations would take roughly 6 times longer, about 72 seconds. Please be a little more specific so that I can help.

P.S. The speed also depends on other factors, such as the size of the network and the preprocessing steps, so the calculation above is only a rough estimate.
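
The arithmetic in this reply can be checked with a small sketch. It assumes the usual DataLoader behaviour, where an incomplete last batch is kept unless drop_last=True. The function names are made up for illustration, and the time estimate simply scales linearly with the iteration count, ignoring model size and preprocessing.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches one pass over the data produces (hypothetical helper)."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def estimate_epoch_seconds(ref_seconds, ref_iterations, iterations):
    """Rough linear scaling; ignores differences in per-iteration cost."""
    return ref_seconds / ref_iterations * iterations


mnist_iters = iterations_per_epoch(60000, 100, drop_last=True)
mri_iters = iterations_per_epoch(3607, 1)
print(mnist_iters, mri_iters)  # 600 3607
print(round(estimate_epoch_seconds(12, mnist_iters, mri_iters), 1))
```

The measured 143 seconds per epoch in the first post is higher than this estimate, which fits the caveat that a larger network and image loading cost more per iteration.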

Yolkandwhite · March 20, 2020, 6:34am

I think I'm misunderstanding what batch size means.
I googled it and found this definition:

Batch size is a term used in machine learning and refers to the number of training examples utilized in one iteration.

If this is right, then 100 training samples should be loaded in each iteration.

This is how I pictured the data being consumed across iterations:

(100/60000)
(200/60000)
(300/60000)
…
(60000/60000)

But when I printed X and Y in the inner for loop, it looked like 600 samples per iteration, which would mean 100 iterations per epoch.

The result was the same for the image-and-mask code at the very top (batch size = 1).
What I wanted was to get 1 sample per inner-loop iteration instead of all 3607 samples.

If I'm misunderstanding the concept, is the batch size equal to the number of iterations?

Nisan_Aryal · March 20, 2020, 7:32am

Batch size means the number of training samples loaded in one iteration. If your batch size is 100, you should get 100 samples per iteration. The batch size does not equal the number of iterations except by coincidence. Looking at the code, I can't find a problem. Check the batch size again: with 60000 samples, 100 iterations per epoch would require a batch size of 600. Make sure you aren't confusing the 100 with the number of epochs; that is the only other variable I see that could produce 100.

Yolkandwhite · March 20, 2020, 7:47am

As you can see in this code, I printed the iteration number and the shape of the data coming in.

# parameters
training_epochs = 1
batch_size = 100

# dataset loader
data_loader = torch.utils.data.DataLoader(dataset=mnist_train,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          drop_last=True)

for epoch in range(training_epochs):
    t0 = time.time()
    avg_cost = 0
    total_batch = len(data_loader)
    i = 1
    for X, Y in data_loader:
        print(i, end=" ")
        X = X.view(-1, 28 * 28).to(device)
        Y = Y.to(device)
        print(X.shape, Y.shape)

        optimizer.zero_grad()
        hypothesis = linear(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()

        avg_cost += cost / total_batch
        i += 1
        t1 = time.time()

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.5f}'.format(avg_cost), 'time : ',t1-t0)
print('Learning finished')

Output:

1 torch.Size([100, 784]) torch.Size([100])
2 torch.Size([100, 784]) torch.Size([100])
3 torch.Size([100, 784]) torch.Size([100])
4 torch.Size([100, 784]) torch.Size([100])
5 torch.Size([100, 784]) torch.Size([100])

...

598 torch.Size([100, 784]) torch.Size([100])
599 torch.Size([100, 784]) torch.Size([100])
600 torch.Size([100, 784]) torch.Size([100])
Epoch: 0001 cost = 0.33757 time :  28.640324354171753
Learning finished

It seems it got 600 samples in one iteration even though the batch size was 100.

This is so confusing.

Nisan_Aryal · March 20, 2020, 7:54am

The code is working fine; you just got confused.

First, each image is 28*28 pixels, which equals 784 values. The tensor shape torch.Size([100, 784]) tells you that 100 images of size 784 are loaded in one iteration (an iteration is one pass through the body of the DataLoader loop). The number that counts up to 600 is the iteration index, i.e. how many iterations it takes to go through all the data in the DataLoader. So the DataLoader is loading 100 images at a time. Hope this clears up your confusion.
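
The same point can be seen without PyTorch. The sketch below is a plain-Python stand-in for a DataLoader (the batches generator is hypothetical, not the library's implementation): the loop counter counts iterations, while the length of each batch is the batch size.

```python
def batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding up to `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(60000))  # stand-in for the 60000 MNIST images
num_iterations = 0
samples_seen = 0
for batch in batches(samples, batch_size=100):
    num_iterations += 1         # what the printed counter 1..600 tracks
    samples_seen += len(batch)  # 100 samples arrive in every iteration

print(num_iterations, samples_seen)  # 600 60000
```

With batch_size=1 and 3607 samples, the same loop would run 3607 times with one sample each time.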

Yolkandwhite · March 20, 2020, 8:11am

Now I fully understand. I was confusing the iterations inside the DataLoader loop with the outer epoch loop. Thank you very much for replying.

May I ask another question about the code at the very top?
I changed the number of epochs to 10, so the 3607 samples go through the model 10 times.
I didn't touch the rest of the hyperparameters.
The average cost was nan in every epoch.

3607
start training
Epoch: 001 cost = nan time :  143.31351041793823
Epoch: 002 cost = nan time :  144.00150156021118
Epoch: 003 cost = nan time :  143.94768810272217
Epoch: 004 cost = nan time :  144.0049524307251
Epoch: 005 cost = nan time :  143.99128079414368
Epoch: 006 cost = nan time :  143.97400498390198
Epoch: 007 cost = nan time :  143.93819332122803
Epoch: 008 cost = nan time :  144.09143900871277
Epoch: 009 cost = nan time :  143.93545937538147
Epoch: 010 cost = nan time :  143.99345064163208
training Finished!

Do you have any idea why the average cost is nan in every epoch?

Nisan_Aryal · March 20, 2020, 8:21am

In my experience, nan occurs when the loss goes to infinity, which can happen for various reasons. Print the loss (in your case, the cost) inside the DataLoader loop (where you printed 1 to 600). If the loss grows exponentially until it reaches nan, decrease the learning rate and try again. Normally, if everything else is right, decreasing the learning rate is enough. Otherwise there may be a problem with the model or with the data. Hope it helps.
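
One way to follow this advice is to stop at the first loss that is not finite, so the last good values are still on screen. This is a minimal sketch with a made-up helper name, operating on plain floats; in a PyTorch loop the per-batch value could be obtained with loss.item().

```python
import math


def first_non_finite(losses):
    """Return the index of the first nan/inf loss, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


# Illustrative values only, not taken from the run in this thread.
history = [0.9, 3.5, 120.0, float("inf"), float("nan")]
print(first_non_finite(history))  # 3
```

Looking at the few values just before that index shows whether the loss was blowing up gradually (which points to the learning rate) or was wrong from the first step (which points to the data or the model).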

Yolkandwhite · March 20, 2020, 8:51am

I set the learning rate to 0.0000001 and printed the loss values inside the loop.

I found that the loss somehow zigzags between positive and negative values. That looks weird, because I learned that binary cross-entropy never gives a negative value.
Also, the negative loss values were much larger in magnitude than the positive ones.

Does this mean I am feeding in the wrong data?
Or does it just have something to do with the hyperparameters?

Here is part of the output I got:


...

tensor(6.9173, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.1122, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(7.1797, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.8887, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.5463, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.6460, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.9631, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.4065, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.0100, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.5298, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.1861, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(4.7786, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.5828, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.4706, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.6942, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.4059, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.5020, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(6.0613, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.6552, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(5.9129, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-257.4940, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-251.9309, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-497.9838, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-362.6104, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-454.6986, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-897.4257, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-838.2051, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-843.5089, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-823.7191, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-781.8899, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-752.6201, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-602.7383, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-525.3030, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-152.3427, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
tensor(-209.9590, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)

...

Nisan_Aryal · March 20, 2020, 9:30am

Binary cross-entropy is used for classification with two classes, and it expects the targets to be between 0 and 1. Check whether your labels (the mask values) are in that range. It seems the input you pass to the loss is wrong.
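
A worked example makes the range requirement concrete. The sketch below implements the per-element binary cross-entropy-with-logits formula in plain Python rather than calling PyTorch, and it assumes the mask pixels are 0 or 255, as an 8-bit image loaded with convert("L") would typically give. With a target of 255 the loss is strongly negative; after dividing the target by 255 it is non-negative again. The logit value is arbitrary.

```python
import math


def bce_with_logits(logit, target):
    """Stable form of -[t*log(s) + (1-t)*log(1-s)], where s = sigmoid(logit)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


logit = 2.0  # arbitrary prediction for one pixel

raw_target = 255.0                  # foreground pixel straight from a uint8 mask
scaled_target = raw_target / 255.0  # same pixel rescaled into [0, 1]

print(bce_with_logits(logit, raw_target))     # negative
print(bce_with_logits(logit, scaled_target))  # small positive value
print(bce_with_logits(logit, 0.0))            # background pixel, positive
```

If the masks in the first post do contain 0 and 255, this would also be consistent with the zigzag: batches whose mask has foreground pixels would give large negative losses, while all-background masks would give positive ones. Scaling the mask into [0, 1] before computing the loss would be the comparable fix.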

pourya_farzi · September 2, 2020, 10:24am

First of all, check your model, your optimizer, and the type of loss function.
Then specify the shape of your predictions and the shape of your targets. The loss function may need to change depending on the task you are working on. For example, for classification we use cross-entropy loss, while for regression we use MSE loss, L1 loss, and so on.
After that, this thread explains how the loss is computed:
Interpreting loss value - PyTorch Forums