If __name__ == '__main__' on Windows 10

Do we still need "if __name__ == '__main__'" in the official PyTorch version for Windows?
If I do not add that line, even the official PyTorch examples will not run…
And if we still need it, why does Windows need it while Linux does not?

Perhaps you can put:

print('__name__', __name__)

…and check what __name__ is set to.


If you are using multiprocessing, you should wrap your code into this guard:

def main():
    ...  # your code

if __name__=='__main__':
    main()

On Linux systems new processes are created with fork, which is not available on Windows. Windows uses spawn instead: each new process starts a fresh Python interpreter and re-imports your main script, so it executes all module-level code again, which breaks your script.
Here is a good explanation of what's going on.

You can find more info regarding Windows and PyTorch here.
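To see the difference concretely, the sketch below simulates what a spawned child does without starting real processes: it runs a small stand-in script once with __name__ set to '__main__' (as when you launch it yourself) and once with '__mp_main__', the name CPython's multiprocessing uses when a spawned child re-runs the main script. The script text and the run_as helper are hypothetical, chosen only for illustration.

```python
import contextlib
import io
import os
import runpy
import tempfile

# Hypothetical stand-in for a training script: one module-level print, one guarded call.
SCRIPT = """\
print("module-level code runs, __name__ =", __name__)

def main():
    print("main() runs")

if __name__ == "__main__":
    main()
"""


def run_as(run_name):
    """Run SCRIPT from a temporary file with the given __name__ and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            runpy.run_path(path, run_name=run_name)
        return buf.getvalue()


if __name__ == "__main__":
    print(run_as("__main__"), end="")     # launching the script yourself
    print(run_as("__mp_main__"), end="")  # roughly what a spawned child does
```

In the second run the module-level print still happens but main() is skipped, which is why everything that should run only once belongs behind the guard.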


Thank you very much.

One question, though: if I wrap my whole code under the main guard, wouldn't that mean the subprocesses will not do anything and the training will not be done at all?

No, that won’t be the case. You could use multiple workers while using the if-clause protection and observe your CPU workload during training.
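A minimal sketch, assuming a recent PyTorch, that shows the workers still do the loading when the setup code sits behind the guard: each sample records the worker id reported by torch.utils.data.get_worker_info(), which returns None in the main process. IndexDataset and collect are hypothetical names used only for this illustration.

```python
from torch.utils.data import DataLoader, Dataset, get_worker_info


class IndexDataset(Dataset):
    """Hypothetical Dataset that returns (index, id of the process that loaded it)."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        info = get_worker_info()  # None when loading happens in the main process
        worker_id = -1 if info is None else info.id
        return index, worker_id


def collect(num_workers):
    loader = DataLoader(IndexDataset(), batch_size=2, num_workers=num_workers)
    seen = []
    for indices, worker_ids in loader:  # default collate turns each field into a tensor
        seen.extend(zip(indices.tolist(), worker_ids.tolist()))
    return seen


def main():
    for n in (0, 2):
        print(f"num_workers={n}: loaded by {sorted({w for _, w in collect(n)})}")


if __name__ == "__main__":
    main()
```

With num_workers=0 every sample is loaded in the main process (id -1); with num_workers=2 the batches come from workers 0 and 1, even though the DataLoader was created inside the guarded main().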


Hi Ptrblck,

I'm trying to use num_workers with a custom DataLoader. I get the same error as above, and I'm confused about where to insert if __name__ == '__main__': into the code.

If I place it like this:

if __name__ == '__main__':

    dataload=dataset3D(csv_file, data_dir,transform=None)

    dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                                shuffle=True, num_workers=4)

for epoch in range(num_epochs):
    for i,data in enumerate(dataloader):
        ...  # training step

I get the error message:

    for i,data in enumerate(dataloader):
NameError: name 'dataloader' is not defined

So do I have to place it inside:

class dataset3D(Dataset):
    def __init__(self, csv_file, data_dir, transform=None):
        self.file_list = pd.read_csv(csv_file, header=None)
        self.data_dir = data_dir
        self.transform = transform

    def __len__(self):
        ...  # etc.

    def __getitem__(self, index):
        ...  # etc.

I have created a train() function and then called:

if __name__ == '__main__':
    train()

However, this causes the code to run a bit "lumpy-bumpy" and increases the GPU memory requirements by 40%, as the multiprocessing code seems to be applied to the model as well as the DataLoader. Is there a way to apply multiprocessing to the DataLoader only?

cheers,

chaslie

Creating a train() function and calling it inside the if __name__ guard is the correct way.
Are you using any CUDA operations in the “global” section of the script?
If so, make sure to move them to separate functions, since only the DataLoader should use multiple workers.

Hi Ptrblck,

I am using the following CUDA lines in the main body:

device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

class Gen_model_3D(nn.Module):
    def __init__(self, lat_dim=100):
        super(Gen_model_3D, self).__init__()
        self.D3_Encoder = D3_Encoder(lat_dim)
        self.D3_Decoder = D3_Decoder(lat_dim)
        self.cuda()
        self.lat_dim = lat_dim

These are outside the train() function, which looks like this:

def train():
    img_list=[]
    iters=0
    for epoch in range(num_epochs):
        for i,data in enumerate(dataloader):
            ...  # training step

Maybe I'm being a bit slow, but I don't understand how to get the DataLoader to use multiprocessing.

The classes can be defined in the script directly (or imported from another script).
To use multiple workers in a DataLoader you would have to set num_workers>0 while creating the DataLoader instance.

What is concerning is this statement:

The model should not be touched by the DataLoader in any way (unless you use it somehow inside your Dataset, which I don’t expect).
Are you using any CUDATensors inside the Dataset and how did you check that multiprocessing is also “applied to the model”?

I have set up the DataLoader like this:


dataload=dataset3D(csv_file, data_dir,transform=None)

dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                            shuffle=True, num_workers=6)

I only have a single GPU at my disposal. When I set num_workers=6 the model takes 10992 MB of GPU memory, and with num_workers=0 it takes only 6000 MB. I am assuming this is because it splits the data loading into 6 and runs 6 datasets through the model at the same time, which would mean that by placing the train() call inside the "if __name__ == '__main__':" section I am multiprocessing the whole model and not just the DataLoader.

No, that should not be the case. The DataLoader will use multiple workers to load the data batches in the background, while a single model is still used for the training.
The DataLoader loop:

for data in loader:
    ...  # train model

should just yield the next data faster, since you are now able to preload the next batches while the GPU is busy training the model.

If you are seeing an increase in GPU memory, are you perhaps using CUDATensors in the Dataset?
If so, note that each worker creates its own copy of the Dataset, which would increase the memory usage.
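One way to see that each worker operates on its own copy of the Dataset is a counter that __getitem__ increments. The sketch below uses a hypothetical CountingDataset with default DataLoader settings: with workers, the increments happen in the workers' copies, so the counter in the main process never moves.

```python
from torch.utils.data import DataLoader, Dataset


class CountingDataset(Dataset):
    """Hypothetical Dataset that counts how often __getitem__ runs in this copy."""

    def __init__(self, length=8):
        self.length = length
        self.calls = 0

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        self.calls += 1  # only updates the copy living in the current process
        return index


def count_main_process_calls(num_workers):
    dataset = CountingDataset()
    loader = DataLoader(dataset, batch_size=2, num_workers=num_workers)
    indices = [i for batch in loader for i in batch.tolist()]
    return dataset.calls, sorted(indices)


def main():
    for n in (0, 2):
        calls, indices = count_main_process_calls(n)
        print(f"num_workers={n}: calls seen in main process = {calls}, indices = {indices}")


if __name__ == "__main__":
    main()
```

All indices are loaded in both cases, but with num_workers=2 the main process's copy reports zero calls, because the work happened in the workers' separate copies.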

In the custom Dataset, the input data is converted to tensors using torch.from_numpy.

I then convert them to CUDA tensors inside train(). Is this the problem? Should they be converted to CUDA in the custom Dataset instead?

No, the usual workflow would be to push the data to the GPU inside the DataLoader loop.
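A sketch of that workflow, assuming a hypothetical RandomVolumeDataset that stands in for dataset3D: the Dataset only builds CPU tensors with torch.from_numpy, and the move to the GPU happens inside the DataLoader loop in the main process. Setting pin_memory only when a CUDA device is used, together with non_blocking=True, is an optional choice made here, not something from the thread.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class RandomVolumeDataset(Dataset):
    """Hypothetical stand-in for dataset3D that returns CPU tensors only."""

    def __init__(self, length=16, size=8):
        self.length = length
        self.size = size

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        volume = np.random.rand(1, self.size, self.size, self.size).astype(np.float32)
        return {"resize": torch.from_numpy(volume)}  # stays on the CPU


def iterate(device, num_workers=2):
    loader = DataLoader(RandomVolumeDataset(), batch_size=4, shuffle=True,
                        num_workers=num_workers,
                        pin_memory=(device.type == "cuda"))
    shapes = []
    for data in loader:
        # Push each batch to the target device inside the loop, in the main process.
        data_in = data["resize"].to(device, non_blocking=True)
        shapes.append((tuple(data_in.shape), data_in.device.type))
    return shapes


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    for shape, dev in iterate(device):
        print(shape, dev)


if __name__ == "__main__":
    main()
```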
Could you come up with an executable code snippet using random input tensors, which would show the increase in GPU memory usage after setting the num_workers to a value larger than 0, so that we could reproduce and debug it?

I'll try; it'll be a couple of hours…

Hi Ptrblck,

If you use this code, setting num_workers to 0 uses 2891 MB and num_workers=6 uses 6597 MB. The GPU memory usage seems to follow this fit (in MB, with $n$ = num_workers):
$\text{GPU\_mem} = 3.8333\,n^3 - 78.5\,n^2 + 950.67\,n + 2891$
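As a quick arithmetic check, the cubic fit reproduces both measurements quoted above (n = 0 gives 2891 MB, n = 6 gives about 6597 MB), which works out to roughly 618 MB per additional worker on average. The function name is hypothetical, and the fit says nothing outside the measured range.

```python
def gpu_mem_fit(num_workers):
    """Cubic fit of GPU memory (MB) vs. num_workers, as quoted in the post."""
    n = num_workers
    return 3.8333 * n**3 - 78.5 * n**2 + 950.67 * n + 2891


for n in range(7):
    print(n, round(gpu_mem_fit(n)))

# Average increase per extra worker between the two measured endpoints.
print((gpu_mem_fit(6) - gpu_mem_fit(0)) / 6)
```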

import itertools

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset

ngpu=1
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

class dataset3D(Dataset):
    def __init__(self,transform=None):
        self.transform = transform
    def __len__(self):
        count=1000
        return count

    def __getitem__(self, index):

        # note: randint's upper bound is exclusive, so this volume is all zeros
        data3dt=np.random.randint(0,1,(1,128,128,128))
        rescale_size = 99

        if np.max(data3dt)!=0:
            scrap = np.argwhere(data3dt == 1)

            max_r=np.max(scrap[:,0])
            min_r=np.min(scrap[:,0])
            max_c = np.max(scrap[:, 1])
            min_c=np.min(scrap[:,1])
            max_d = np.max(scrap[:, 2])
            min_d=np.min(scrap[:,2])
            resize3d_t = np.zeros((max_r+1, max_c+1, max_d+1))
            # print(resize3d_t.shape)
            r_range=max_r-min_r
            c_range = max_c - min_c
            d_range = max_d - min_d

            r_offset = min_r-int((resize3d_t.shape[0]-(max_r - min_r)) / 2)-1
            c_offset = min_c-int((resize3d_t.shape[1]-(max_c - min_c)) / 2)-1
            d_offset = min_d-int((resize3d_t.shape[2]-(max_d - min_d)) / 2)-1

            for count in range (len(scrap)):
                row=int(scrap[count,0]-r_offset)
                col=int(scrap[count,1]-c_offset)
                dep=int(scrap[count,2]-d_offset)
                resize3d_t[row,col,dep]=1
            for count in range (len(scrap)):
                row=int(scrap[count,0])
                col=int(scrap[count,1])
                dep=int(scrap[count,2])
                resize3d_t[row,col,dep]=1
            d3shape=resize3d_t.shape
            r_sf=d3shape[0]/rescale_size
            c_sf=d3shape[1]/rescale_size
            d_sf = d3shape[2] / rescale_size
            resize3d = np.zeros((rescale_size,rescale_size,rescale_size))
            for x, y, z in itertools.product(range(rescale_size-1),
                                             range(rescale_size-1),
                                             range(rescale_size-1)):
                resize3d[x][y][z] = resize3d_t[int(x * r_sf)][int(y * c_sf)][int(z * d_sf)]
            resize3d = torch.from_numpy(resize3d)
            resize3d = resize3d.unsqueeze(0)
            # print(resize3d.shape)
        else:
            resize3d=np.zeros((1,rescale_size,rescale_size,rescale_size))
            resize3d = torch.from_numpy(resize3d)
        data3d = {'resize': resize3d}
        if self.transform:
            data3d=self.transform(data3d)

        return data3d
class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, data3d):

        resize_array = data3d['resize']
        return {'resize':torch.from_numpy(resize_array)}

class D3_Encoder(nn.Module):
    def __init__(self, lat_dim):
        super(D3_Encoder, self).__init__()
        self.conv3d1 = nn.Conv3d(1, 96, (9,9,9), stride=(5,5,5), padding=(0, 0, 0))
        self.conv3d2 = nn.Conv3d(96,128, (9,9,9), stride=(5,5,5), padding=(0, 0, 0))

        self.fc1 = nn.Linear(3456, 256)
        self.fc2 = nn.Linear(256, lat_dim + lat_dim)

    def forward(self, data):
        data = self.conv3d1(data)

        data = self.conv3d2(data)

        data = data.view(-1, 3456)

        data = self.fc1(data)
        data = self.fc2(data)

        return data
class D3_Decoder(nn.Module):
    def __init__(self, lat_dim):
        super(D3_Decoder, self).__init__()
        self.conv3dT5 = nn.ConvTranspose3d(128, 96,(9,9,9), stride=(5,5,5), padding=(0, 0, 0))
        self.conv3dT6 = nn.ConvTranspose3d(96 , 1,(9,9,9), stride=(5,5,5), padding=(0, 0, 0))
        self.fc1 = nn.Linear(lat_dim, 256)
        self.fc2 = nn.Linear(256, 3456)

    def forward(self, z2):
        ###insert fully connected in here
        z2 = self.fc1(z2)
        z2 = self.fc2(z2)
        ### reshape the data before leaving the fully conected layer
        z2 = z2.view(-1, 128, 3, 3, 3)
        z2 = self.conv3dT5(z2)
        out = self.conv3dT6(z2)
        return out

class Gen_model_3D(nn.Module):
    def __init__(self, lat_dim=100):
        super(Gen_model_3D, self).__init__()
        self.D3_Encoder = D3_Encoder(lat_dim)
        self.D3_Decoder = D3_Decoder(lat_dim)
        self.cuda()
        self.lat_dim = lat_dim

    def D3_encode(self, data):
        data = model_3d.D3_Encoder(data)
        mean, logvar = torch.chunk(data, 2, dim=1)
        return mean, logvar

    def D3_decode(self, z2, apply_sigmoid=False):
        logits = self.D3_Decoder(z2)
        if apply_sigmoid:
            probs = torch.sigmoid(logits)
            return probs
        return logits

    def reparm(self, mean, logvar):
        epsilon = torch.randn(mean.size()).to(device)
        z = epsilon * torch.exp(logvar.mul(0.5)) + mean
        # print("z=",z.shape)
        return z
dataload=dataset3D(transform=None)

dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                         shuffle=True, num_workers=0)
def loss_fn(model_3d, data):
    mean, logvar = model_3d.D3_encode(data)
    z2=model_3d.reparm(mean, logvar)
    out=model_3d.D3_decode(z2)
    # reduction='sum' is what the deprecated size_average=False, reduce=True resolved to
    criterion = torch.nn.BCEWithLogitsLoss(reduction='sum')
    BCE = criterion(out, data)
    # logvar is the log-variance, so exp(logvar) is the variance term of the KL divergence
    KLD = -0.5 * torch.sum(1 + logvar - mean.pow(2) - logvar.exp())
    return BCE + 3 * KLD, out

model_3d= Gen_model_3D()
optimizer_gen = torch.optim.Adam(model_3d.parameters(), lr=1E-5,
               weight_decay=1e-6)

def train():
    img_list=[]
    iters=0
    for epoch in range(10):
        for i,data in enumerate(dataloader):
            data_in=data['resize']

            data_in=data_in.cuda()
            data_in = data_in.type(torch.cuda.FloatTensor)
            # data_in = data_in.unsqueeze(1)

            optimizer_gen.zero_grad()
            loss,out = loss_fn(model_3d, data_in)
            loss.backward()
            print(loss)
            optimizer_gen.step()
if __name__ == '__main__':
    img_list =  train()

Thanks for the code snippet.
You are currently creating the model instance as well as other objects at the module-level, which will most likely create the issue on Windows.
Move all instantiations into a function, e.g. def main(), and don't initialize anything at module level.

In particular, move:

ngpu=1
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
dataload=dataset3D(transform=None)

dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                         shuffle=True, num_workers=0)

model_3d= Gen_model_3D()
optimizer_gen = torch.optim.Adam(model_3d.parameters(), lr=1E-5,
               weight_decay=1e-6)

into a new function (call it main) and call train() from there.

The background for this issue, and why the if-clause protection is needed, is described in this StackOverflow post (I already posted this link earlier in this thread).
The GPU memory increase in your current code snippet is thus most likely caused by Windows executing all module-level code in each child process, including the Gen_model_3D() instantiation, whose self.cuda() call allocates GPU memory in every worker.

Thanks for the help, but you have me a little confused…

I have created:

def main():
    ngpu = 1
    device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
    dataload = dataset3D(transform=None)

    dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                             shuffle=True, num_workers=2)
    model_3d = Gen_model_3D()
    optimizer_gen = torch.optim.Adam(model_3d.parameters(), lr=1E-5,
                                     weight_decay=1e-6)
    train()

as well as:

def train():
    img_list=[]
    iters=0
    for epoch in range(10):
        for i,data in enumerate(dataloader):
            data_in=data['resize']

            data_in=data_in.cuda()
            data_in = data_in.type(torch.cuda.FloatTensor)
            # data_in = data_in.unsqueeze(1)

            optimizer_gen.zero_grad()
            loss,out = loss_fn(model_3d, data_in)
            loss.backward()
            print(loss)
            optimizer_gen.step()
if __name__ == '__main__':
    img_list =  main()

I know this is not what you meant, but I don't fully understand what you mean.

I meant the first approach, where you would call:

if __name__=='__main__':
    main()

to start the training. Inside main() all objects would be created (as in your first approach) and the training will be started. Make sure to pass all needed objects as arguments to train().
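A minimal sketch of that structure, assuming a CPU fallback and replacing the 3D VAE with a hypothetical RandomDataset and a small nn.Sequential autoencoder so it runs quickly: main() creates every object, and train() receives all of them as arguments instead of reading module-level names.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    """Hypothetical stand-in for dataset3D; returns CPU tensors."""

    def __init__(self, length=32, features=16):
        self.data = torch.randn(length, features)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return {"resize": self.data[index]}


def train(device, dataloader, model, optimizer, num_epochs=2):
    """Everything train() needs is passed in; nothing is read from module globals."""
    criterion = nn.MSELoss()
    losses = []
    for epoch in range(num_epochs):
        for i, data in enumerate(dataloader):
            data_in = data["resize"].to(device).float()
            optimizer.zero_grad()
            out = model(data_in)
            loss = criterion(out, data_in)  # toy reconstruction loss
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
    return losses


def main(num_workers=2):
    # All objects are created here, so a spawned worker that re-imports
    # this file only executes the class and function definitions.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    dataset = RandomDataset()
    dataloader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=num_workers)
    model = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
    return train(device, dataloader, model, optimizer)


if __name__ == "__main__":
    losses = main()
    print(f"{len(losses)} steps, last loss {losses[-1]:.4f}")
```

The same layout applies to the original script: keep the class and function definitions at module level, and move the device, dataload, dataloader, model_3d, and optimizer_gen assignments into main().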

I feel like a complete idiot, sorry. I know I have something wrong, but I don't quite understand what (I'm missing some skills here)…

def main(Gen_model_3D):
    ngpu = 1
    device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
    dataload = dataset3D(transform=None)

    dataloader = torch.utils.data.DataLoader(dataload, batch_size=20,
                                             shuffle=True, num_workers=2)
    model_3d = Gen_model_3D()
    optimizer_gen = torch.optim.Adam(model_3d.parameters(), lr=1E-5,
                                     weight_decay=1e-6)
    train(device,dataloader,model_3d,optimizer_gen)


def train(device,dataloader,model_3d,optimizer_gen):
    img_list=[]
    iters=0
    for epoch in range(10):
        for i,data in enumerate(dataloader):
            data_in=data['resize']

            data_in=data_in.cuda()
            data_in = data_in.type(torch.cuda.FloatTensor)
            # data_in = data_in.unsqueeze(1)

            optimizer_gen.zero_grad()
            loss,out = loss_fn(model_3d, data_in)
            loss.backward()
            print(loss)
            optimizer_gen.step()
if __name__ == '__main__':
    img_list =  main(Gen_model_3D)

gives:

NameError: name 'model_3d' is not defined
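Judging from the earlier full snippet, a likely cause is that Gen_model_3D.D3_encode calls model_3d.D3_Encoder(data): model_3d used to be a module-level global, but it now exists only as a local variable inside main(). reparm has the same problem with the global device. Passing Gen_model_3D into main() isn't needed either, since the class is defined at module level. Below is a sketch of the fix on a shrunken, hypothetical TinyVAE (linear layers instead of Conv3d): refer to submodules through self, and use torch.randn_like so the noise is created on the same device as mean.

```python
import torch
import torch.nn as nn


class TinyVAE(nn.Module):
    """Hypothetical, shrunken analogue of Gen_model_3D (linear layers instead of Conv3d)."""

    def __init__(self, in_dim=16, lat_dim=4):
        super().__init__()
        self.encoder = nn.Linear(in_dim, 2 * lat_dim)
        self.decoder = nn.Linear(lat_dim, in_dim)

    def encode(self, data):
        # Use self.encoder, not a module-level "model_3d", so the method works
        # no matter where the model instance was created.
        mean, logvar = torch.chunk(self.encoder(data), 2, dim=1)
        return mean, logvar

    def reparm(self, mean, logvar):
        # randn_like matches the device and dtype of mean, so no global "device" is needed.
        epsilon = torch.randn_like(mean)
        return epsilon * torch.exp(0.5 * logvar) + mean

    def decode(self, z):
        return self.decoder(z)


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model_3d = TinyVAE().to(device)  # local to main(), not a global
    data = torch.rand(8, 16, device=device)
    mean, logvar = model_3d.encode(data)
    out = model_3d.decode(model_3d.reparm(mean, logvar))
    return mean.shape, out.shape


if __name__ == "__main__":
    print(main())
```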