How to create a DataLoader for txt files containing 28 by 28 pixel values

sam101 · September 7, 2020, 12:17pm

I have many txt files, each containing the pixel values of a 28 by 28 image. How can I create a DataLoader for them to feed to a GAN model?

pfloat · September 7, 2020, 12:22pm

Hello,
can you show the content or format of one text file?

sam101 · September 7, 2020, 12:28pm

Here is the screenshot of the content of the file

sam101 · September 7, 2020, 2:03pm

I have attached the screenshot.

pfloat · September 7, 2020, 2:25pm

To extract the pixel values from the file, we first need to know how a single pixel is encoded.
Which color space is used? Are the pixels RGB, BGR, grayscale, or something else?

In the meantime, the float values (which seem to lie between -1 and 1?) can be extracted by splitting each line on whitespace. The idea would be something like this:

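with open("Im_1.txt", "r") as fd:
    # read the file line by line, skipping blank lines
    content = [line.strip() for line in fd if line.strip()]
# each row is split on whitespace into strings representing float numbers,
# then each string is converted to a float with map(float, ...);
# list(...) is needed because map() returns a lazy iterator in Python 3
pixel_values = [list(map(float, row.split())) for row in content]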

Now, IF the image is grayscale and the list pixel_values contains 28 rows of 28 elements, we obtain the tensor:

import torch

image = torch.tensor(pixel_values)  # shape (28, 28)
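The parsing step above can be tried without a real file. Below is a minimal, stdlib-only sketch that works on a string instead; the helper name `parse_pixel_text` and the shape check are hypothetical additions, assuming one image row per line with whitespace-separated floats.

```python
def parse_pixel_text(text, expected_rows=28, expected_cols=28):
    """Parse whitespace-separated floats, one image row per line (hypothetical helper)."""
    # skip blank lines so a trailing newline does not create an empty row
    rows = [line.split() for line in text.splitlines() if line.strip()]
    pixel_values = [[float(v) for v in row] for row in rows]
    # fail early if the text does not have the assumed 28 x 28 layout
    if len(pixel_values) != expected_rows or any(len(r) != expected_cols for r in pixel_values):
        raise ValueError("unexpected image shape")
    return pixel_values


# fake 28 x 28 grayscale content: every value is -1.0 except the first one
lines = [" ".join(["-1.0"] * 28) for _ in range(28)]
lines[0] = "0.5 " + " ".join(["-1.0"] * 27)
sample = "\n".join(lines) + "\n"

pixels = parse_pixel_text(sample)
print(len(pixels), len(pixels[0]), pixels[0][0])  # 28 28 0.5
```

Using `split()` with no argument (rather than `split(" ")`) avoids empty strings when values are separated by more than one space.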

sam101 · September 7, 2020, 3:30pm

Thank you for the answer. These are RGB values. How can I make a DataLoader for a list of txt files, like the one we have for images in a folder?

pfloat · September 8, 2020, 2:05am

Hello,
OK, then it is possible to group the float values by 3 to get color pixels, assuming the floats, read in order, form consecutive R, G, B triples and that this is the intended layout of the file.
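To make the grouping concrete, here is a stdlib-only sketch on one flat list, assuming each file holds 28 * 28 * 3 = 2352 floats in R, G, B order. It also shows reordering to channel-first, since PyTorch convolution layers expect (channels, height, width) rather than (height, width, channels). The helpers `to_hwc` and `hwc_to_chw` are hypothetical names.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]


def to_hwc(flat, height=28, width=28, channels=3):
    """Group a flat list into rows of pixels: result[y][x] == [r, g, b]."""
    if len(flat) != height * width * channels:
        raise ValueError("unexpected number of values")
    pixels = list(chunks(flat, channels))  # one entry per pixel
    return list(chunks(pixels, width))     # one entry per image row


def hwc_to_chw(image):
    """Reorder image[y][x][c] into result[c][y][x] (channel-first)."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [[[image[y][x][c] for x in range(width)] for y in range(height)]
            for c in range(channels)]


# fake data: each value equals its index, so the reordering is easy to check
flat = [float(i) for i in range(28 * 28 * 3)]
hwc = to_hwc(flat)
chw = hwc_to_chw(hwc)
print(hwc[0][1])     # second pixel of the first row: [3.0, 4.0, 5.0]
print(chw[1][0][1])  # green channel of that same pixel: 4.0
```

If the files turn out to be BGR instead, the only change is which channel index maps to red and blue.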

Now a solution would be to subclass the torch.utils.data.Dataset class:

import itertools

import torch
from torch.utils.data import Dataset, DataLoader


def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

class TextDataset(Dataset):
    def __init__(self, text_file_paths, other_arguments=None):
        self.paths = text_file_paths
        # other processing you may want
        ...

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        path = self.paths[idx]
        with open(path, "r") as fd:
            # one string per non-empty line of the file
            content = [line.strip() for line in fd if line.strip()]
        pixel_values = [map(float, row.split()) for row in content]
        # all float values in one flat list
        all_float_values = list(itertools.chain.from_iterable(pixel_values))
        # group the float values by 3 (one RGB pixel each)
        color_pixels = list(chunks(all_float_values, 3))
        # again, split color_pixels into rows of 28 pixels
        image = list(chunks(color_pixels, 28))
        # shape (28, 28, 3); call .permute(2, 0, 1) on the result if the
        # model expects channel-first (3, 28, 28) input, as nn.Conv2d does
        return torch.tensor(image)
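A vectorized alternative to the chunking (a swapped-in technique, not the code above) is to let PyTorch reshape the flat list directly and then permute to channel-first. This sketch assumes the same layout (28 lines of 84 whitespace-separated floats in R, G, B order); `load_rgb_text_image` is a hypothetical name, and the demo writes a temporary file so it runs on its own.

```python
import os
import tempfile

import torch


def load_rgb_text_image(path, height=28, width=28, channels=3):
    """Read a text file of floats and return a (channels, height, width) tensor."""
    with open(path, "r") as fd:
        # split() with no argument handles repeated spaces and newlines
        values = [float(v) for v in fd.read().split()]
    # reshape to (H, W, C) so consecutive triples become one pixel,
    # then permute to (C, H, W), the layout conv layers expect
    image = torch.tensor(values, dtype=torch.float32).reshape(height, width, channels)
    return image.permute(2, 0, 1).contiguous()


# write a fake image file: 28 lines of 84 floats each, values in [-1, 1]
flat = [i / 2351 * 2 - 1 for i in range(28 * 28 * 3)]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for row in range(28):
        tmp.write(" ".join(str(v) for v in flat[row * 84:(row + 1) * 84]) + "\n")
    path = tmp.name

image = load_rgb_text_image(path)
os.remove(path)
print(image.shape)  # torch.Size([3, 28, 28])
```

`reshape` raises an error if the file does not contain exactly 2352 values, which doubles as a cheap format check.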

Finally, create the dataset and wrap it in a DataLoader:

text_ds = TextDataset(your_text_file_paths_list, ...)
mydataloader = DataLoader(text_ds, batch_size=16, shuffle=True)
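To mimic an image folder, the list of paths can be built by globbing a directory. This is a minimal sketch, assuming all files sit in one folder and end in `.txt`; the helper `collect_text_paths` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def collect_text_paths(folder, pattern="*.txt"):
    """Return the sorted list of matching files in folder (hypothetical helper)."""
    return sorted(str(p) for p in Path(folder).glob(pattern) if p.is_file())


# demo with a throwaway folder holding two .txt files and one other file
with tempfile.TemporaryDirectory() as folder:
    for name in ["Im_2.txt", "Im_1.txt", "notes.md"]:
        (Path(folder) / name).write_text("0.0\n")
    paths = collect_text_paths(folder)
    names = [Path(p).name for p in paths]

print(names)  # ['Im_1.txt', 'Im_2.txt']
```

Sorting makes the dataset order deterministic (glob order depends on the filesystem), while `shuffle=True` in the DataLoader still randomizes batches. Note that the sort is lexicographic, so `Im_10.txt` comes before `Im_2.txt`.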

Hope it helps

sam101 · September 8, 2020, 8:46am

Thank you so much. I really appreciate it.