How to train a CNN on multiple sets

I have 20 .csv files which contain data about some organs. They are far too big to combine: I run out of RAM just trying to merge them. I managed to combine 10 of them, but my RAM runs out when I load the result, so merging is not an option.

I would like to train my CNN on files 2-20 and then test it on file 1. However, my approach seems to reset the net each time I use a new set.

I pasted the output here: https://justpaste.it/28tk0

But to summarize, even after 9 hours of training, the test set is still at between 10% and 20% accuracy (btw there are 23 classes in total). The problem is that it doesn’t improve at all. However, the most important part is that, for some reason, after file 14 the test accuracy is always around 40%. For file 3, it’s literally always at 13%.

From these results I conclude that I have failed to train the CNN on multiple sets and instead it always resets when I use a new set.

This is what I tried:

n_epochs = 10
for epoch in range(n_epochs):
    for i in range(2,21):
        train_dataset = mydata('./data/{}.csv'.format(i), transform= transforms.Compose(
                            [transforms.ToPILImage(), 
                             transforms.ToTensor(), 
                             transforms.Normalize(mean=(0.5,), std=(0.5,))]))
        train_loader = torch.utils.data.DataLoader(dataset=train_dataset,batch_size=batch_size, shuffle=True)
        train(epoch, i)

I have a few questions:

  1. From what I can observe, your dataset is basically a set of images stored in CSV form (correct me if I am wrong). Why don’t you store it as images or NIfTI instead?
  2. Have you tried shuffling your dataset?
  3. Can you show your network?

  1. Each row represents an image. In a row, the first column is the label and the rest are the pixel values. No. Should I? What would be the advantage?

  2. Yes, the training datasets are all shuffled.

  3. I’ll show the network and also the other code in case you’d like to see it.

Network:

class Net(nn.Module):    
    def __init__(self):
        super(Net, self).__init__()
          
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(p = 0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=1,padding=1),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, kernel_size=3, stride=1,padding=1),
            nn.Conv2d(128, 128, kernel_size=3, stride=1,padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),         
            nn.Dropout(p = 0.25),
        )
          
        self.classifier = nn.Sequential(
            nn.Dropout(p = 0.25),
            nn.Linear(128 * 8 * 8, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.25),
            nn.Linear(512, 23),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1) 
        x = self.classifier(x)
        
        return x
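
As a quick sanity check on the `128 * 8 * 8` input size of the first linear layer, the spatial size can be traced by hand. This is a minimal sketch in plain Python; the helper name `conv_out` is made up for illustration, and it assumes 33x33 inputs and the default floor rounding of `nn.MaxPool2d`.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

size = 33
# Every convolution in Net.features uses stride 1 with size-preserving padding
# (kernel 5 / padding 2 or kernel 3 / padding 1), so only the pools shrink the image.
assert conv_out(size, kernel=5, padding=2) == size
assert conv_out(size, kernel=3, padding=1) == size

after_pool1 = conv_out(size, kernel=2, stride=2)         # 33 -> 16
after_pool2 = conv_out(after_pool1, kernel=2, stride=2)  # 16 -> 8
flat_features = 128 * after_pool2 * after_pool2          # channels * height * width

print(after_pool1, after_pool2, flat_features)  # 16 8 8192
```

So the classifier input of 128 * 8 * 8 = 8192 features is consistent with 33x33 images.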

Dataloader:

class mydata(Dataset):
    def __init__(self, file_path,
                 transform=transforms.Compose([transforms.ToPILImage(),
                                               transforms.ToTensor(),
                                               transforms.Normalize(mean=(0.5,), std=(0.5,))])
                 ):
        df = pd.read_csv(file_path)

        # All columns after the first are pixels: reshape to (N, 33, 33, 1) uint8 images
        self.X = df.iloc[:, 1:].values.reshape((-1, 33, 33)).astype(np.uint8)[:, :, :, None]

        # The first column is the class label
        self.y = torch.from_numpy(df.iloc[:, 0].values)

        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.transform(self.X[idx]), self.y[idx]

Train function:

def train(epoch, i):
    
    model.train()
    
    exp_lr_scheduler.step()

    for batch_idx, (data, target) in enumerate(train_loader):
        
        optimizer.zero_grad()
        
        output = model(data)
        
        loss = criterion(output, target)
        
        loss.backward()
        
        optimizer.step()
        
        if (batch_idx + 1)% 100 == 0:
            print('File: {} Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                i, epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.data))

    print('Training data:')
    evaluate(train_loader)
    print('Test data:')
    evaluate(test_loader)

Evaluate function:

def evaluate(data_loader):
    model.eval()
    loss = 0.0
    correct = 0
    with torch.no_grad():
        for data, target in data_loader:
            output = model(data)

            # Sum (rather than average) per-sample losses; divided by dataset size below
            loss += F.cross_entropy(output, target, reduction='sum').item()

            # The index of the largest logit is the predicted class
            pred = output.argmax(dim=1)
            correct += pred.eq(target).sum().item()

    loss /= len(data_loader.dataset)

    print('\nAverage loss: {:.4f}, Accuracy: {}/{} ({:.3f}%)\n'.format(
        loss, correct, len(data_loader.dataset),
        100. * correct / len(data_loader.dataset)))

The rest:

batch_size = 64
model = Net()
optimizer = optim.Adam(model.parameters(), lr=0.003)
criterion = nn.CrossEntropyLoss()
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
test_dataset = mydata('./data/1.csv', )
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,batch_size=batch_size, shuffle=False)
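
One thing that may be worth checking in the code above: `train()` calls `exp_lr_scheduler.step()` once per call, and the outer loop calls `train()` once per file, so the scheduler advances 19 times per epoch instead of once. The sketch below is plain Python, not PyTorch; it assumes the `StepLR` rule of multiplying the learning rate by `gamma` every `step_size` scheduler steps (the exact off-by-one depends on the PyTorch version), and `lr_after` is a made-up helper name.

```python
base_lr, step_size, gamma = 0.003, 7, 0.1  # values from the post
files_per_epoch = 19                       # files 2..20

def lr_after(num_scheduler_steps):
    """Learning rate after a given number of scheduler.step() calls (StepLR rule)."""
    return base_lr * gamma ** (num_scheduler_steps // step_size)

# Learning rate at a few points within the first two passes over the files
for calls in (0, 7, 14, files_per_epoch, 2 * files_per_epoch):
    print(calls, lr_after(calls))
```

Under that assumption the learning rate has already dropped by a factor of 100 after 14 files, which could explain why the accuracy stops moving around that point. Calling the scheduler once per epoch in the outer loop would be one way to test this.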

Loading from image files has advantages: loading is faster, and your memory problem will probably go away, because a CSV takes more memory than the images it encodes (before tensor initialization). If I am not mistaken, you have around 60,000 images of (1x33x33) distributed across 20 CSV files. MNIST has 60,000 training images of (1x28x28), so I don’t think your dataset is very large. Convert them to images with labels first; then you can use a simpler dataloader. Other than that, your network and everything else look fine. Can you retrain the network this way?

Each CSV file contains between 20,000 and 23,000 images. So my training set of 19 CSV files would contain around 400,000 images. Do you still think I should? If so how should I do it? I found this stackoverflow answer but the images there don’t have a label. Thanks for the help.

Your dataset is pretty small. Consider (400,000 images x 33 (image height) x 33 (image width)) / (1024 x 1024) ≈ 415.42 megabytes, at one byte per pixel. I think you can afford to store that many images, right? Also, don’t use genfromtxt; write your own image generator. It should look something like this:

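import numpy as np

with open('something.csv', 'r') as f:
    row = f.readline().strip().split(',')  # first line; skip one line first if the file has a header
label = row[0]  # the first column is the label
image = np.array(row[1:], dtype=np.float32).reshape(33, 33)  # the other 1089 values are pixels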

Do this for all the images, hope you get the gist!
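
A fuller version of that idea might look like the sketch below, which streams one CSV row at a time and writes each image into a folder named after its label. The function name `csv_to_pngs` and its parameters are made up for illustration. It assumes the file has a header row (as `pd.read_csv` assumes by default), that pixel values are integers in 0..255, and it uses PNG because PNG is lossless.

```python
import csv
import os

import numpy as np
from PIL import Image

def csv_to_pngs(csv_path, out_dir, side=33, has_header=True):
    """Write each CSV row to <out_dir>/<label>/<file stem>_<row>.png; return the count."""
    stem = os.path.splitext(os.path.basename(csv_path))[0]
    count = 0
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        if has_header:
            next(reader)  # skip the column names
        for row_idx, row in enumerate(reader):
            label = row[0]  # first column: class label, used as the folder name
            pixels = np.array(row[1:], dtype=np.float32).reshape(side, side)
            # A 2-D uint8 array becomes a single-channel (mode "L") image
            img = Image.fromarray(pixels.astype(np.uint8))
            class_dir = os.path.join(out_dir, label)
            os.makedirs(class_dir, exist_ok=True)
            img.save(os.path.join(class_dir, '{}_{}.png'.format(stem, row_idx)))
            count += 1
    return count
```

Calling something like `csv_to_pngs('./data/6.csv', './data/train')` for each training file would then produce a `label/file_row.png` layout without ever holding a whole CSV in memory.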

I converted some of them to images and put them in folders, but now I’m getting this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-b1d4464b941e> in <module>
      1 n_epochs = 10
      2 for epoch in range(n_epochs):
----> 3     train(epoch)

<ipython-input-9-df779ce045bf> in train(epoch)
      5     exp_lr_scheduler.step()
      6 
----> 7     for batch_idx, (data, target) in enumerate(train_loader):
      8 
      9         optimizer.zero_grad()

~/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    613         if self.num_workers == 0:  # same-process loading
    614             indices = next(self.sample_iter)  # may raise StopIteration
--> 615             batch = self.collate_fn([self.dataset[i] for i in indices])
    616             if self.pin_memory:
    617                 batch = pin_memory_batch(batch)

~/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    613         if self.num_workers == 0:  # same-process loading
    614             indices = next(self.sample_iter)  # may raise StopIteration
--> 615             batch = self.collate_fn([self.dataset[i] for i in indices])
    616             if self.pin_memory:
    617                 batch = pin_memory_batch(batch)

~/.local/lib/python3.6/site-packages/torchvision/datasets/folder.py in __getitem__(self, index)
    101         sample = self.loader(path)
    102         if self.transform is not None:
--> 103             sample = self.transform(sample)
    104         if self.target_transform is not None:
    105             target = self.target_transform(target)

~/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py in __call__(self, img)
     47     def __call__(self, img):
     48         for t in self.transforms:
---> 49             img = t(img)
     50         return img
     51 

~/.local/lib/python3.6/site-packages/torchvision/transforms/transforms.py in __call__(self, pic)
    108 
    109         """
--> 110         return F.to_pil_image(pic, self.mode)
    111 
    112     def __repr__(self):

~/.local/lib/python3.6/site-packages/torchvision/transforms/functional.py in to_pil_image(pic, mode)
    101     """
    102     if not(_is_numpy_image(pic) or _is_tensor_image(pic)):
--> 103         raise TypeError('pic should be Tensor or ndarray. Got {}.'.format(type(pic)))
    104 
    105     npimg = pic

TypeError: pic should be Tensor or ndarray. Got <class 'PIL.Image.Image'>.

The code (network is the same as before):

def train(epoch):
    
    model.train()
    
    exp_lr_scheduler.step()

    for batch_idx, (data, target) in enumerate(train_loader):
        
        optimizer.zero_grad()
        
        output = model(data)
        
        loss = criterion(output, target)
        
        loss.backward()
        
        optimizer.step()
        
        if (batch_idx + 1)% 100 == 0:
            print('File: {} Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                i, epoch, (batch_idx + 1) * len(data), len(train_loader.dataset),
                100. * (batch_idx + 1) / len(train_loader), loss.data))

    print('Training data:')
    evaluate(train_loader)
    print('Test data:')
    evaluate(test_loader)

test_dataset = ImageFolder('./data/test', )
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,batch_size=batch_size, shuffle=False)

train_dataset = ImageFolder('./data/train', transform= transforms.Compose(
                    [transforms.ToPILImage(), 
                     transforms.ToTensor(), 
                     transforms.Normalize(mean=(0.5,), std=(0.5,))]))
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,batch_size=batch_size, shuffle=True)

n_epochs = 10
for epoch in range(n_epochs):
    train(epoch)

I should modify train(), right? If so, can you tell me what to change, please?

I removed transforms.ToPILImage() from train_dataset and that error is gone, but now I’m getting this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-b1d4464b941e> in <module>
      1 n_epochs = 10
      2 for epoch in range(n_epochs):
----> 3     train(epoch)

<ipython-input-9-df779ce045bf> in train(epoch)
      9         optimizer.zero_grad()
     10 
---> 11         output = model(data)
     12 
     13         loss = criterion(output, target)

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

<ipython-input-2-46779b84067d> in forward(self, x)
     32 
     33     def forward(self, x):
---> 34         x = self.features(x)
     35         x = x.view(x.size(0), -1)
     36         x = self.classifier(x)

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~/.local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
    318     def forward(self, input):
    319         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 320                         self.padding, self.dilation, self.groups)
    321 
    322 

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[64, 3, 33, 33] to have 1 channels, but got 3 channels instead

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize(mean=(0.5,), std=(0.5,))])

Try this.

Sorry, I didn’t say so, but it’s already like that:

train_dataset = ImageFolder('./data/train', transform= transforms.Compose(
                    [transforms.ToTensor(), 
                     transforms.Normalize(mean=(0.5,), std=(0.5,))]))
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,batch_size=batch_size, shuffle=True)

I’m still getting the error:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[64, 3, 33, 33] to have 1 channels, but got 3 channels instead

Congratulations, you have progressed. Is it an RGB image?

No, it’s black and white, 33x33. This is how I converted the CSV to images:

import numpy as np
import pandas as pd
from PIL import Image

file = 6
df = pd.read_csv('./data/{}.csv'.format(file))
for c in range(23000):
    label = df.iloc[c][0]                       # first column: class label
    im = df.iloc[c][1:].values.reshape(33, 33)  # remaining columns: pixel values
    im = Image.fromarray(im)
    im = im.convert("L")                        # 8-bit grayscale
    im.save("./data/train/{}/{}_{}.jpeg".format(label, file, c))

Can you save it this way?

import numpy as np
import pandas as pd
from scipy.misc import imsave  # only in older SciPy releases (removed in SciPy 1.2)

file = 6
df = pd.read_csv('./data/{}.csv'.format(file))
for c in range(23000):
    label = df.iloc[c][0]                       # first column: class label
    im = df.iloc[c][1:].values.reshape(33, 33)  # remaining columns: pixel values
    imsave("./data/train/{}/{}_{}.png".format(label, file, c), im)

For some reason, your image is currently RGB instead of black and white. If this doesn’t work, come to GitHub and let’s write a custom dataloader for you. :slight_smile:

I did that and the images look much better now. However, I still get the same error:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[64, 3, 33, 33] to have 1 channels, but got 3 channels instead

Actually, I got the answer I was originally looking for: torch.utils.data.ConcatDataset is exactly what I needed. But I want to try doing it with images as well, since training from CSV files is very slow. If the dataset is composed of images, it should be faster, right?
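
For readers with the same question, here is a minimal sketch of how `ConcatDataset` can chain several per-file datasets behind one `DataLoader`. To keep it runnable without the CSV files, it uses random `TensorDataset` stand-ins; `make_fake_file` is a made-up helper, and in the thread's setup each element of the list would instead be a `mydata(...)` instance.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def make_fake_file(n_images, n_classes=23):
    """Stand-in for one CSV-backed dataset: random 1x33x33 images with labels."""
    images = torch.randn(n_images, 1, 33, 33)
    labels = torch.randint(0, n_classes, (n_images,))
    return TensorDataset(images, labels)

# In the thread this would be something like:
#   [mydata('./data/{}.csv'.format(i)) for i in range(2, 21)]
per_file = [make_fake_file(n) for n in (50, 60, 70)]
train_dataset = ConcatDataset(per_file)

# With shuffle=True, every batch can mix samples from all of the files
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

data, target = next(iter(train_loader))
print(len(train_dataset), data.shape, target.shape)
```

Note that all the underlying datasets still have to fit in memory at the same time, so this relies on each one storing its images compactly (for example as uint8).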

I used your code and im.shape is (33,33), so I don’t understand why it’s being saved with 3 channels and not 1…

Well, looks like it’s time to build a custom DataLoader, eh? :wink:

OK :slight_smile: My GitHub is BarisSayil. How should I write the custom dataloader?
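
One possible shape for such a custom dataset is sketched below. It is not the code that was eventually written for this thread; the class name `GrayImageFolder` is made up, and it assumes a `root/<integer label>/<file>` layout and images that should be read as a single 8-bit channel.

```python
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class GrayImageFolder(Dataset):
    """Hypothetical dataset: root/<integer label>/<file>, loaded as one channel."""

    def __init__(self, root):
        self.samples = []
        for label in sorted(os.listdir(root)):
            class_dir = os.path.join(root, label)
            if not os.path.isdir(class_dir):
                continue
            for name in sorted(os.listdir(class_dir)):
                # Folder names are assumed to be integer class labels
                self.samples.append((os.path.join(class_dir, name), int(label)))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        with Image.open(path) as img:
            # convert('L') forces a single 8-bit channel however the file was saved
            pixels = np.asarray(img.convert('L'), dtype=np.float32) / 255.0
        image = torch.from_numpy(pixels).unsqueeze(0)  # shape (1, H, W), values in [0, 1]
        image = (image - 0.5) / 0.5                    # same scaling as Normalize(0.5, 0.5)
        return image, label
```

An instance can be passed to `torch.utils.data.DataLoader` in the same way as the `ImageFolder` datasets above.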