BrokenPipeError


(Zhila Mousavian) #1

Hello,
How can I solve a BrokenPipeError?
I defined a dataset class with a custom __getitem__ and then defined the training function from the PyTorch tutorials, like this:

def train_model(model, criterion, optimizer, scheduler, num_epochs=2):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)

    return model

I then fine-tune a ResNet model pretrained on ImageNet, as below:
model_ft = models.resnet18(pretrained=True)
#print(model_ft)
#model_ft = models.resnet18(pretrained=False)

num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 3)

print(model_ft.fc.weight)

model_ft = model_ft.to(device)
criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized

optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs

exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=2, gamma=0.1)

##Train and evaluate
##It should take around 15-25 min on CPU. On GPU though, it takes less than a minute.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
num_epochs=20)

I get a BrokenPipeError.

I don't know where I should put def run(): and if __name__ == '__main__': run() in order to solve this problem.

Thanks in advance for the help.


#2

Could you wrap your whole code in a function named run() and just add the main guard in your script:

def run():
    # your complete code here

if __name__=='__main__':
    run()
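
Some background on why this usually helps (a sketch, not taken from the original script): on Windows, worker processes are started with the "spawn" method, so each worker re-imports the main script. Any code left at module level, including the code that starts the workers, then runs again inside every worker, and the parent process typically ends up with a BrokenPipeError. The example below uses the standard library's multiprocessing as a stand-in for DataLoader(num_workers=4); load_item and the num_workers argument of run are hypothetical names chosen for illustration.

```python
import multiprocessing as mp


def load_item(index):
    # Stand-in for Dataset.__getitem__. It lives at module level so that
    # worker processes can find it after re-importing this file.
    return index * index


def run(num_workers=2):
    indices = list(range(8))
    if num_workers == 0:
        # Same idea as DataLoader(num_workers=0): no worker processes at all.
        return [load_item(i) for i in indices]
    # "spawn" is the only start method available on Windows; every worker
    # re-imports this file before it does any work.
    ctx = mp.get_context("spawn")
    with ctx.Pool(num_workers) as pool:
        return pool.map(load_item, indices)


if __name__ == "__main__":
    # Only the process that was launched directly reaches this block.
    # The re-imported copies inside the workers skip it.
    print(run(num_workers=2))
```

With real PyTorch code the same layout applies: keep imports and class or function definitions at module level, and move everything that creates or iterates over a DataLoader into run(). Setting num_workers=0 is another commonly used way to avoid worker processes entirely, at the cost of loading the data in the main process.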

(Zhila Mousavian) #3

Thank you for replying.

Yes, I did that and I no longer get this error, but the training process does not run.
When I type print(model_ft.fc.weight) I get this error:
name 'model_ft' is not defined


#4

Are you trying to print these parameters inside the run() function? Make sure that model_ft is defined in the scope where the print statement is called.
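
A minimal sketch of what "scope" means here, assuming the model is created inside run(). The dictionary is only a hypothetical stand-in for the real model, and the two function names are made up for the example. A name assigned inside a function is local to it, so it has to be returned if it should be usable afterwards:

```python
def run_without_return():
    # model_ft is a local variable; it is gone once the function returns.
    model_ft = {"fc.weight": [0.1, 0.2, 0.3]}  # hypothetical stand-in for the model
    print(model_ft["fc.weight"])  # fine: same scope as the definition


def run_with_return():
    model_ft = {"fc.weight": [0.1, 0.2, 0.3]}
    return model_ft  # hand the object back to the caller


if __name__ == "__main__":
    run_without_return()
    try:
        print(model_ft)  # no such name outside the function yet
    except NameError as err:
        print("NameError:", err)

    model_ft = run_with_return()  # now the name exists out here as well
    print(model_ft["fc.weight"])
```

So either print inside run(), or let run() return the model and assign the result to a name under the main guard.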


(Zhila Mousavian) #5

When I did not need to print the class probabilities for the validation images along with the image names, I had no problems with data loading and training.

Now, when I define a dataset class and use __getitem__ in order to print the image names with their probabilities, the same training function gives a BrokenPipeError.

I don't get this error when I wrap the whole code inside def run(): and add if __name__ == '__main__': run(), but the training process inside the code does not run!
The whole file is shown below:

(Zhila Mousavian) #6
from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
import pandas as pd
from skimage import io, transform


plt.ion()   # interactive mode









data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}




data_dir = 'D:/datbases/TW'

root_dir = 'D:/datbases/TW'




import torchvision.datasets as datasets
class MonaDataset(datasets.folder.ImageFolder):
    def __init__(self, root, transform=None, target_transform=None,
                 loader=datasets.folder.default_loader):
        super(MonaDataset, self).__init__(root, transform, target_transform, loader)

    def __getitem__(self, index):
        path, target = self.samples[index]
        sample = self.loader(path)
        if self.transform is not None:
            sample = self.transform(sample)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return sample, target, path

dataset = MonaDataset('D:/datbases/TW')
print(len(dataset))
x, y, im_path = dataset[0]


print("x is: {}, y is: {}, im_path is: {}".format(x, y, im_path))

image_datasets = {x: MonaDataset(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}


class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")













def run():
    ##Training the model
    def train_model(model, criterion, optimizer, scheduler, num_epochs=2):
        since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    
    return model 

    

    ##Finetuning the convnet
    ##Load a pretrained model and reset final fully connected layer.

    
    model_ft = models.resnet18(pretrained=True)
    #print(model_ft)
    #model_ft = models.resnet18(pretrained=False)

    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 3)

    print(model_ft.fc.weight)



    model_ft = model_ft.to(device)
    criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
    optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
# Decay LR by a factor of 0.1 every 7 epochs
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=2, gamma=0.1)


##Train and evaluate
##It should take around 15-25 min on CPU. On GPU though, it takes less than a minute.


    model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=20)

    print(model_ft.fc.weight)
    
 
    if __name__=='__main__':
        run()
        

#7

Could you explain a bit why you think your model is not being trained?
Are the parameters not changing at all?
Skimming through your code I couldn’t find any obvious mistakes.


(Zhila Mousavian) #8

Thanks for the reply.

Because the training function should print the model accuracy on the training set and the validation set in each epoch, but when I run the above code, these values are not printed.

Also, after the training process, we will need to call the trained model to print the class probabilities, but when I type the model name after running the above code, I get an error.


#9

There are some indentation errors, e.g. train_model() is just calling one line of code. Could you check if that’s just an issue while copying the code into the forum or also in your script?
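
Separately from the indentation, one thing that may be worth checking once the loop actually runs (an observation from the posted code, not something confirmed by running it): MonaDataset.__getitem__ returns three values (sample, target, path), while the training loop unpacks only two (inputs, labels). Below is a pure-Python sketch of that mismatch and of printing per-image probabilities. The batch contents, fake_model, report, and two_name_loop are hypothetical stand-ins, and softmax here is a plain-Python version of what torch.softmax(outputs, dim=1) would compute for each row.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_model(sample):
    # Hypothetical stand-in for model(inputs): three class scores per sample.
    return [float(sample), 0.0, -float(sample)]


# One hypothetical batch in the shape (inputs, labels, paths).
batches = [([1, 2], [0, 2], ["a.png", "b.png"])]


def report(batches):
    rows = []
    for inputs, labels, paths in batches:  # three names for three returned values
        for sample, path in zip(inputs, paths):
            rows.append((path, softmax(fake_model(sample))))
    return rows


def two_name_loop(batches):
    # Mirrors "for inputs, labels in dataloaders[phase]".
    for inputs, labels in batches:
        pass


if __name__ == "__main__":
    for path, probs in report(batches):
        print(path, ["{:.3f}".format(p) for p in probs])
    try:
        two_name_loop(batches)
    except ValueError as err:
        print("ValueError:", err)
```

In the real loop this would mean unpacking a third name, for example paths, next to inputs and labels.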


(Zhila Mousavian) #10

I can't quite understand what you mean.


#11

The whitespaces in your code snippet seem to be wrong, e.g. train_model() just has one line of code inside the function body.
Could you check, if the indentation is right in your code?


(Zhila Mousavian) #12

I checked, but unfortunately there is no whitespace in my code.


#13

I think we are misunderstanding each other here ;)

I just wanted to point to the indentation of your code (i.e. how many spaces/tabs are in front of the code):

# Your current version
def run():
    ##Training the model
    def train_model(model, criterion, optimizer, scheduler, num_epochs=2):
        since = time.time()  # only this line of code will be executed in train_model!

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

While maybe you wanted to write your code like this:

def run():
    ##Training the model
    def train_model(model, criterion, optimizer, scheduler, num_epochs=2):
        # Now all lines will be executed in train_model!
        since = time.time()

        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0

        for epoch in range(num_epochs):
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)

Could you check for this type of error?
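
To make the difference concrete, here is a small self-contained sketch (hypothetical function names, no PyTorch needed). In the first version the loop belongs to the outer function, where num_epochs does not exist, so calling it fails with a NameError. In the second version the loop is part of train_model.

```python
import time


def outer_broken():
    def train_model(num_epochs=2):
        since = time.time()  # the only statement that belongs to train_model

    # Dedented: these lines are part of outer_broken, not of train_model,
    # so num_epochs is not defined here.
    for epoch in range(num_epochs):
        print("Epoch", epoch)


def outer_fixed():
    def train_model(num_epochs=2):
        since = time.time()
        epochs_run = []
        for epoch in range(num_epochs):  # now inside train_model
            epochs_run.append(epoch)
        return epochs_run

    return train_model(num_epochs=3)


if __name__ == "__main__":
    try:
        outer_broken()
    except NameError as err:
        print("NameError:", err)
    print(outer_fixed())
```

The same check applies to the if __name__=='__main__': line. In the code as posted above it appears indented inside run(), where it would never execute because nothing calls run(). It should start at the left margin, outside the function.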


(Zhila Mousavian) #14

Thanks a lot, and sorry. You are right. Because I have only used MATLAB so far, I run into problems like the one above in PyTorch.