Reading a CSV file for model prediction

I am reading the first 15 columns of a CSV file for model prediction. The 15th column holds the scientific name of the plant I am using in the task. When I run the code I get the error "No such file or directory: 'Alliaria petiolata'", which is the scientific name of the plant. I don't know how to solve this, since it is not a file or directory but just an entry in a column. The code is:

with torch.no_grad():
    for data in testloader:
        images = data
        output = model(images)
        class_probs_batch = [F.softmax(el, dim=0) for el in output]
        _, class_preds_batch = torch.max(output, 1)

        class_probs.append(class_probs_batch)
        class_preds.append(class_preds_batch)

and a custom Dataset class

class CustomDataSet(Dataset):
    def __init__(self, csv_file, root_dir, transform):
        self.root_dir = root_dir
        self.transforms = transform
        self.dataframe = pd.read_csv(csv_file)

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        img_path = self.dataframe.iloc[idx, 15]
        image = Image.open(img_path).convert("RGB")
        #image = Image.open(img_path)

        tensor_image = self.transforms(image)
        return tensor_image

The error is:

Traceback (most recent call last)
Input In [6], in <cell line: 1>()
      1 with torch.no_grad():
----> 2     for data in testloader:
      3         images = data
      4         output = model(images)

File ~/.conda/envs/flowering/lib/python3.10/site-packages/torch/utils/data/dataloader.py:530, in _BaseDataLoaderIter.__next__(self)
    528 if self._sampler_iter is None:
    529     self._reset()
--> 530 data = self._next_data()
    531 self._num_yielded += 1
    532 if self._dataset_kind == _DatasetKind.Iterable and \
    533         self._IterableDataset_len_called is not None and \
    534         self._num_yielded > self._IterableDataset_len_called:

File ~/.conda/envs/flowering/lib/python3.10/site-packages/torch/utils/data/dataloader.py:570, in _SingleProcessDataLoaderIter._next_data(self)
    568 def _next_data(self):
    569     index = self._next_index()  # may raise StopIteration
--> 570     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    571     if self._pin_memory:
    572         data = _utils.pin_memory.pin_memory(data)

File ~/.conda/envs/flowering/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:49, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     47 def fetch(self, possibly_batched_index):
     48     if self.auto_collation:
---> 49         data = [self.dataset[idx] for idx in possibly_batched_index]
     50     else:
     51         data = self.dataset[possibly_batched_index]

File ~/.conda/envs/flowering/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:49, in <listcomp>(.0)
     47 def fetch(self, possibly_batched_index):
     48     if self.auto_collation:
---> 49         data = [self.dataset[idx] for idx in possibly_batched_index]
     50     else:
     51         data = self.dataset[possibly_batched_index]

Input In [2], in CustomDataSet.__getitem__(self, idx)
     18     idx = idx.tolist()
     19 img_path = self.dataframe.iloc[idx, 15]
---> 20 image = Image.open(img_path).convert("RGB")
     21 #image = Image.open(img_path)
     23 tensor_image = self.transforms(image)

File ~/.conda/envs/flowering/lib/python3.10/site-packages/PIL/Image.py:3092, in open(fp, mode, formats)
   3089     filename = fp
   3091 if filename:
-> 3092     fp = builtins.open(filename, "rb")
   3093     exclusive_fp = True
   3095 try:

FileNotFoundError: [Errno 2] No such file or directory: 'Alliaria petiolata'

This description does not fit the usage of this column in:

    img_path = self.dataframe.iloc[idx, 15]
    image = Image.open(img_path).convert("RGB")

since you are indeed using it as an image path.
Based on your description it sounds as if the actual input data is not an image but is stored as features in the columns [0, 14] while column 15 contains the target name (which you would most likely have to map to a class index).
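
As a minimal sketch of the mapping step, assuming the class names live in a column I'll call `species` (a hypothetical name, as are the feature columns and the sample rows below; use whatever your CSV header says), something like this could turn each name into an integer class index:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; column names and rows are assumptions.
csv_text = """feature_a,feature_b,species
0.1,1.5,Alliaria petiolata
0.4,0.9,Bellis perennis
0.3,1.1,Alliaria petiolata
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sort the unique names so the mapping is identical on every run.
class_names = sorted(df["species"].unique())
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# Add an integer label column alongside the original string column.
df["label"] = df["species"].map(class_to_idx)

print(class_to_idx)
print(df[["species", "label"]])
```

Keeping `class_to_idx` around also lets you translate predicted indices back into names later.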

Thank you for your reply. Can you tell me what I should do here? I don't know exactly what to do. How should I change it?

When I read a column other than 15, for example:
img_path = self.dataframe.iloc[idx, 10]
it gives me: 'numpy.int64' object has no attribute 'read'

It seems you are still trying to call Image.open on the indexing result from the DataFrame so you would need to check what exactly is stored in your DataFrame and how you would like to process and use it.
From your description it seems the DataFrame contains numerical values as well as strings containing the class name. In this case, why would you want to use Image.open? Is the DataFrame also storing paths to images, which you want to open?
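
As far as I can tell, both error messages are consistent with `Image.open` receiving something that is not a path to an existing file: a string is treated as a file name, while a non-string value such as an integer is treated as an already opened file object. A rough sketch of a guard you could run on a cell before passing it to `Image.open` is below. The helper `looks_like_image_path` and the extension list are my own hypothetical choices, not part of PIL:

```python
from pathlib import Path

# Assumed set of extensions; extend it to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def looks_like_image_path(value):
    """Return True if value is a string ending in a known image extension."""
    if not isinstance(value, str):
        return False  # e.g. integer values coming from a numeric column
    return Path(value).suffix.lower() in IMAGE_EXTENSIONS


print(looks_like_image_path("Alliaria petiolata"))    # False
print(looks_like_image_path(12345))                   # False
print(looks_like_image_path("images/plant_001.jpg"))  # True
```

This only checks the form of the value. Whether the file actually exists on disk is a separate question.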

How can I check what is stored in the DataFrame? It is not working with show(), head(), or print(). I don't understand this point. I am running someone else's code and will have to make some changes later, once it works.

I think this is the data I am reading into the DataFrame:

[screenshot of the DataFrame contents]

I would recommend checking the DataFrame and thinking about how each entry should be used.
The last picture shows that the DataFrame contains mostly strings, which you won't be able to use directly as inputs to any neural network.
Check how these string values should be mapped to feature values and what the data and target should contain, then write a custom Dataset to load these samples.
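
One way to do that check, as a sketch: list every column with its zero-based position, name, dtype, and first value. The CSV text and the helper name `summarize_columns` are hypothetical placeholders; in your case you would pass the frame returned by `pd.read_csv` on your own file.

```python
import io

import pandas as pd

# Hypothetical stand-in for pd.read_csv on your file; column names are assumptions.
csv_text = """record_id,year,species
101,2019,Alliaria petiolata
102,2020,Alliaria petiolata
"""
df = pd.read_csv(io.StringIO(csv_text))


def summarize_columns(frame):
    """Return (position, name, dtype, first value) for each column.

    Assumes the frame has at least one row.
    """
    return [
        (pos, name, str(frame[name].dtype), frame[name].iloc[0])
        for pos, name in enumerate(frame.columns)
    ]


for pos, name, dtype, example in summarize_columns(df):
    print(f"{pos:>2}  {name:<12} {dtype:<8} {example!r}")
```

Note that the positions are zero-based, so `iloc[idx, 15]` addresses the 16th column of the file, not the 15th.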

@ptrblck thank you for your help and support. That part of the code is now working, but I am facing another error. I am pasting the code and the error here. I don't know how to give it the paths of the images, and it gives me a KeyError.

#Get model output
preds = test_preds.numpy()
probs = test_probs.numpy()
model_output = np.column_stack((preds, probs))
model_output = pd.DataFrame(model_output, columns = ['Label', 'f_prob', 'nf_prob'])
model_output['imagepaths'] = list(img_list['imagepaths'])
model_output.to_csv('model_predictions_2.csv')

In this code I don't know how to pass the image paths. The error is:

Traceback (most recent call last)
File ~/.conda/envs/flowering/lib/python3.10/site-packages/pandas/core/indexes/base.py:3621, in Index.get_loc(self, key, method, tolerance)
   3620 try:
-> 3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:

File ~/.conda/envs/flowering/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File ~/.conda/envs/flowering/lib/python3.10/site-packages/pandas/_libs/index.pyx:163, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5198, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5206, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'imagepaths'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
Input In [29], in <cell line: 6>()
      4 model_output = np.column_stack((preds, probs))
      5 model_output = pd.DataFrame(model_output, columns = ['Label', 'f_prob', 'nf_prob'])
----> 6 model_output['imagepaths'] = list(img_list['imagepaths'])
      7 model_output.to_csv('model_predictions_2.csv')

File ~/.conda/envs/flowering/lib/python3.10/site-packages/pandas/core/frame.py:3505, in DataFrame.__getitem__(self, key)
   3503 if self.columns.nlevels > 1:
   3504     return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
   3506 if is_integer(indexer):
   3507     indexer = [indexer]

File ~/.conda/envs/flowering/lib/python3.10/site-packages/pandas/core/indexes/base.py:3623, in Index.get_loc(self, key, method, tolerance)
   3621     return self._engine.get_loc(casted_key)
   3622 except KeyError as err:
-> 3623     raise KeyError(key) from err
   3624 except TypeError:
   3625     # If we have a listlike key, _check_indexing_error will raise
   3626     # InvalidIndexError. Otherwise we fall through and re-raise
   3627     # the TypeError.
   3628     self._check_indexing_error(key)

KeyError: 'imagepaths'


Please help me in this.
Thank you

It seems that list(img_list['imagepaths']) is failing to index the img_list. Make sure this key is valid and you will be able to read it before trying to assign it to model_output.
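
A small sketch of such a check, assuming `img_list` is a pandas DataFrame (I can't tell from the posted code how it was created): verify that the column exists and that the row counts match before copying it over. The helper `attach_column` and the sample data, including the column name `image_path`, are hypothetical.

```python
import pandas as pd


def attach_column(target, source, column):
    """Copy source[column] into target after checking it exists and fits."""
    if column not in source.columns:
        raise KeyError(
            f"{column!r} not found; available columns: {list(source.columns)}"
        )
    if len(source) != len(target):
        raise ValueError(
            f"length mismatch: source has {len(source)} rows, "
            f"target has {len(target)}"
        )
    target[column] = list(source[column])
    return target


# Hypothetical data standing in for model_output and img_list.
model_output = pd.DataFrame(
    [[0, 0.9, 0.1], [1, 0.2, 0.8]], columns=["Label", "f_prob", "nf_prob"]
)
img_list = pd.DataFrame({"image_path": ["a.jpg", "b.jpg"]})  # not 'imagepaths'

try:
    attach_column(model_output, img_list, "imagepaths")
except KeyError as err:
    print("KeyError:", err)  # the message lists the columns that do exist

attach_column(model_output, img_list, "image_path")
print(model_output)
```

Printing `img_list.columns` on your real object should show which name to use in place of 'imagepaths'.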
