The DataLoader object is created without any issue, but when I iterate through it with
for step, (x, y) in enumerate(dataloader):
the program crashes in __getitem__:
def __getitem__(self, index):
    img, target = None, None
    env = self.env
    with env.begin(write=False) as txn:
        imgbuf = txn.get(self.keys[index])
    print('The buffer information :', len(imgbuf), index, self.keys[index])
    buf = six.BytesIO()
    buf.write(imgbuf)
    buf.seek(0)
    img = Image.open(buf).convert('RGB')
  File "", line 30, in __getitem__
    img = Image.open(buf).convert('RGB')
  File "/anaconda/envs/py35/lib/python3.5/site-packages/PIL/Image.py", line 2319, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f677db0b570>
Please note, however, that if I load the default LSUN LMDB dataset from PyTorch, it works fine. The same code is used, but with the Caffe LMDB it crashes.
Thanks for your reply. I have PIL installed, and I used the same piece of code from the shared link for LSUNClass. I have updated the question with the code snippet.
The strange thing is that I can read the LSUN dataset, which is also in LMDB format, but I cannot read my Caffe LMDB dataset.
Sorry for the confusion. I have Pillow in my package list, imported as
from PIL import Image
I also tried loading the LMDB with Python 2.7, since Caffe uses Python 2.7, but it had no effect.
I tried to load the data without LMDB, using just a list of image names and labels and overriding the __getitem__ function, but it is horribly slow. That's why I tried to load from LMDB. I will raise the issue of loading a dataset as a list in a separate thread.
But I am still unable to figure out why
Image.open(buf)
fails.
I printed the buffer information with
print('The buffer information :',len(imgbuf),index,self.keys[index])
and it prints the key correctly. The problem seems to be with the I/O buffering. I tried OpenCV instead and the error goes away, but it causes problems further down the chain.
import cv2
import numpy

with env.begin(write=False) as txn:
    imgbuf = txn.get(self.keys[index])
print('The buffer information :', len(imgbuf), index, self.keys[index])
# numpy.frombuffer replaces numpy.fromstring, which is deprecated for binary data
img = cv2.imdecode(
    numpy.frombuffer(imgbuf, dtype=numpy.uint8), 1)
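A possible reason the error only shows up later: in Python, cv2.imdecode returns None instead of raising when it cannot decode its input, so a bad buffer passes silently and fails somewhere downstream. A minimal sketch of a guard that fails at the point of decoding is below. Note that decode_or_raise is a hypothetical helper written for this thread, not part of OpenCV or PyTorch.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def decode_or_raise(decode: Callable[[bytes], Optional[T]], raw: bytes, key: bytes) -> T:
    """Run decode(raw); raise a clear error if it signals failure by returning None."""
    img = decode(raw)
    if img is None:
        # Report which record failed and how large it was, to help debugging.
        raise ValueError(
            "could not decode value for key {!r} ({} bytes)".format(key, len(raw))
        )
    return img
```

Inside __getitem__ this could be called as decode_or_raise(lambda b: cv2.imdecode(numpy.frombuffer(b, dtype=numpy.uint8), 1), imgbuf, self.keys[index]), assuming cv2 and numpy are imported as in the snippet above.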
Hmm, I don't have any more pointers, but it does look like a bug in how the buffer is handled. Maybe there's a newline at the end that PIL doesn't expect, but that OpenCV is okay with?
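One way to test the buffer hypothesis is to look at the first bytes of the value returned by txn.get. An encoded JPEG starts with the bytes FF D8 FF and a PNG starts with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A, and PIL identifies a file from its leading bytes. The sketch below uses only the standard library; sniff_image_format and describe_buffer are hypothetical helper names, not PIL or lmdb functions.

```python
from typing import Optional

# Leading "magic" bytes of two common encoded image formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(raw: bytes) -> Optional[str]:
    """Return 'jpeg' or 'png' if raw starts with that signature, else None."""
    if raw.startswith(JPEG_MAGIC):
        return "jpeg"
    if raw.startswith(PNG_MAGIC):
        return "png"
    return None


def describe_buffer(raw: bytes, n: int = 16) -> str:
    """Summarise a buffer: its length, detected format, and first n bytes in hex."""
    return "len={} format={} head={}".format(
        len(raw), sniff_image_format(raw), raw[:n].hex()
    )
```

Calling print(describe_buffer(imgbuf)) in __getitem__ for both the LSUN and the Caffe database would show whether the two store the same kind of value. If neither signature matches, the value is probably not a bare image file. One possibility, which is an assumption and not confirmed in this thread, is that Caffe's tools stored a serialized Datum record that wraps the pixel data, in which case it would need to be parsed before decoding.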
OK, thanks. I don't know, because I am using the LMDB generated by Caffe and am not sure of its internals. For now I have to stick with Caffe for the use case I'm addressing.
One request: it would be great to have generic utilities for preparing datasets in LMDB, HDF5, or list form that are compatible with PyTorch (that is, that go through the iterator and batch correctly), for seamless data preparation and loading. A lot of time is wasted on this.
I already have Python scripts that do the same. It seems I will have to study the PyTorch framework in more detail, especially DataLoader, to make sure the integration is seamless. I will post more queries for pointers if I hit a problem while doing that.