Hi,
I’m working on video classification: I extracted the frames from each video in my dataset, preprocessed them (i.e., cropped the faces), and saved them as .png images. I was able to run one epoch of training and validation, but every subsequent epoch failed with an error because some images could no longer be read properly, neither with cv2 nor with PIL.Image (I tried both). If I then restart training from scratch, the error already appears in the first epoch of that run. When I locate one of these “invalid” images and run the following code:
from PIL import Image
import numpy as np
path = '' # path to image
img = Image.open(path)
im = np.array(img)
I get the following error:
OSError: unrecognized data stream contents when reading image file
If I try:
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
then I no longer get the error, but the loaded image contains a long run of zeros at the end (these zeros are not part of the original image).
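To check whether the bytes on disk are actually truncated, I can look at the end of the file directly: every valid PNG stream ends with a 12-byte IEND chunk. A minimal sketch (assuming nothing follows the final chunk):

import os

def png_is_complete(path):
    # A valid PNG ends with a 12-byte IEND chunk:
    # a 4-byte length (always 0), the ASCII type b'IEND', and a 4-byte CRC.
    try:
        with open(path, 'rb') as f:
            f.seek(-12, os.SEEK_END)
            return f.read(12)[4:8] == b'IEND'
    except OSError:
        # A file shorter than 12 bytes cannot be a valid PNG.
        return False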
Strangely, if I run the same image-opening code (without LOAD_TRUNCATED_IMAGES) in an identical conda environment on a different machine, the image opens fine, which suggests that the image is not actually corrupted.
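To confirm that both machines really see identical bytes, I can hash the file on each machine and compare; identical hashes would mean the file itself is fine and the fault lies in the decoding environment:

import hashlib

def file_md5(path):
    # Hash the raw bytes on disk, reading in 1 MiB chunks.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()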
I also tried creating a fresh conda environment on the problematic machine and installing only the newest version of Pillow (PIL), but the same error occurs.
Further, I deleted all the images and pre-processed all the frames again, but the problem repeats: the first epoch of training is fine, and then certain images (many, but not all) cannot be read anymore.
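To catch bad files outside of the training loop, I can scan the whole frame directory with PIL’s verify(), which parses the file structure without decoding the pixels. A sketch, assuming all frames sit under one root directory:

import os
from PIL import Image

def find_bad_images(root):
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith('.png'):
                continue
            full = os.path.join(dirpath, name)
            try:
                with Image.open(full) as img:
                    img.verify()  # raises on corrupt or truncated streams
            except Exception as exc:
                bad.append((full, exc))
    return bad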
Here’s how I read the clips in my custom Dataset. I have only included the relevant methods from the class:
def get_clip(self, idx):
    # Map the flat clip index to a video and a clip within that video.
    video_idx = bisect.bisect_right(self.cumulative_sizes, idx)
    if video_idx == 0:
        clip_idx = idx
    else:
        clip_idx = idx - self.cumulative_sizes[video_idx - 1]
    path = self.paths[video_idx]
    frames = sorted(os.listdir(os.path.join(self.root, path)))
    start_idx = clip_idx * (self.frames_per_clip * self.frame_dilation + self.step_between_clips - 1)
    end_idx = start_idx + self.frames_per_clip * self.frame_dilation
    video = []
    for frame_idx in range(start_idx, end_idx, self.frame_dilation):
        # img = cv2.imread(os.path.join(self.root, path, frames[frame_idx]))
        # img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2 loads BGR, so convert to RGB
        img = Image.open(os.path.join(self.root, path, frames[frame_idx]))
        img = np.array(img)
        video.append(torch.from_numpy(img))
    video = torch.stack(video)
    return video, video_idx
def __getitem__(self, idx):
    video, video_idx = self.get_clip(idx)
    if video_idx < self.videos_per_type['youtube'] + self.videos_per_type['real']:
        label = 0
    else:
        label = 1
    label = torch.tensor(label, dtype=torch.float32).unsqueeze(-1)
    if self.transform is not None:
        video = self.transform(video)
    return video, label, video_idx
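To pin down exactly which frame fails mid-training, I could factor the read into a small helper that logs the offending path before re-raising (a sketch; load_frame is just an illustrative name, not part of my class):

import numpy as np
from PIL import Image

def load_frame(full_path):
    # Same read as in get_clip, but reports the failing file so it
    # can be inspected afterwards.
    try:
        with Image.open(full_path) as img:
            return np.array(img)
    except OSError as exc:
        print(f'Failed to read {full_path}: {exc}')
        raise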
I would massively appreciate any help regarding this strange issue, as I am currently out of ideas. Please let me know if you need more information. Thanks!