Hello guys,
I'm new to PyTorch and DL in general, so I hope this is the right place to ask a question like this.
I wanted to create my first dataset, but iterating over it never stops at the end. The problem is easiest to show with code and output.
import pandas as pd
import torch
from torch.utils.data import Dataset


class DataframeDataset(Dataset):
    """Load Pytorch Dataset from Dataframe
    """

    def __init__(self, data_frame, input_key, target_key, transform=None, features=None):
        self.data_frame = data_frame
        self.input_key = input_key
        self.target_key = target_key
        self.inputs = self.data_frame[input_key]
        self.targets = self.data_frame[target_key]
        self.transform = transform
        self.features = [input_key, target_key] if features is None else features
        self.len = len(self.inputs)

    def __len__(self):
        return self.len

    def __str__(self):
        return str(self.info())

    def info(self):
        info = {
            'features': self.features,
            'num_rows': len(self)
        }
        return info

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
        data = {
            self.input_key: self.inputs[idx],
            self.target_key: self.targets[idx]
        }
        if self.transform:
            return self.transform(data)
        return data
Let's look at some mock data:
data = [("x1", "y2", "A3"), ("x1", "y2", "b3"), ("x1", "y2", "c3"), ("x1", "y2", "d3")]
df = pd.DataFrame(data, columns=['input', 'target', 'random'])
print(df.head())
  input target random
0    x1     y2     A3
1    x1     y2     b3
2    x1     y2     c3
3    x1     y2     d3
ds = DataframeDataset(data_frame=df, input_key="input", target_key="target", transform=None)
print("*" * 40)
print("Len:", len(ds))
print("Ds", ds)
print(ds[0])
Len: 4
Ds {'features': ['input', 'target'], 'num_rows': 4}
{'input': 'x1', 'target': 'y2'}
So the basic functions seem to work. However, if I iterate over the dataset with a plain for loop, the loop does not seem to know the boundaries. I get a KeyError, because indices beyond the end of the data are accessed.
for idx, data in enumerate(ds):
    print(idx, "->", data)
0 -> {'input': 'x1', 'target': 'y2'}
1 -> {'input': 'x1', 'target': 'y2'}
2 -> {'input': 'x1', 'target': 'y2'}
3 -> {'input': 'x1', 'target': 'y2'}
Traceback (most recent call last):
File "/home/warmachine/.local/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 351, in get_loc
return self._range.index(new_key)
ValueError: 4 is not in range
Changing the __len__ function does not seem to change anything. If I let it always return 0, for example, I get the same error.
    def __len__(self):
        return 0
→
ds = DataframeDataset(data_frame=df, input_key="input", target_key="target", transform=None)
print("*" * 40)
print("Len:", len(ds))
print("Ds", ds)
print(ds[0])
for idx, data in enumerate(ds):
    print(idx, "->", data)
Len: 0
Ds {'features': ['input', 'target'], 'num_rows': 0}
{'input': 'x1', 'target': 'y2'}
0 -> {'input': 'x1', 'target': 'y2'}
1 -> {'input': 'x1', 'target': 'y2'}
2 -> {'input': 'x1', 'target': 'y2'}
3 -> {'input': 'x1', 'target': 'y2'}
Traceback (most recent call last):
File "/home/warmachine/.local/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 351, in get_loc
return self._range.index(new_key)
ValueError: 4 is not in range
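The same behaviour can be reproduced without torch or pandas, which suggests it comes from how Python iterates over objects that define `__getitem__` but no `__iter__`: the for loop seems to call `__getitem__` with 0, 1, 2, ... and to stop only on an `IndexError`, without ever asking `__len__`. A minimal sketch, assuming a plain dict as a stand-in for the label-based lookup of the DataFrame columns (`NoIterSequence` and `collect` are hypothetical names, not part of any library):

```python
class NoIterSequence:
    """Hypothetical stand-in: defines __getitem__ and __len__, but no __iter__."""

    def __init__(self, rows):
        # A dict keyed by position imitates label-based lookup:
        # a missing key raises KeyError, not IndexError.
        self.rows = dict(enumerate(rows))
        self.len_calls = 0

    def __len__(self):
        self.len_calls += 1
        return 0  # deliberately wrong, to show it has no effect on the loop

    def __getitem__(self, idx):
        return self.rows[idx]  # KeyError once idx runs past the last row


def collect(seq):
    """Iterate with a plain for loop and report how the loop ended."""
    seen = []
    try:
        for item in seq:
            seen.append(item)
    except KeyError as err:
        return seen, f"KeyError: {err}"
    return seen, "stopped cleanly"


ds = NoIterSequence(["a", "b", "c", "d"])
items, outcome = collect(ds)
print(items, outcome)
print("__len__ calls during iteration:", ds.len_calls)
```

All four items come back, then the lookup for index 4 fails with a KeyError, and `__len__` is not called at all during the loop.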
What am I doing wrong?
If I leave __len__ returning self.len and try this:
for idx in range(0, len(ds)):
    data = ds[idx]
    print(idx, "->", data)
Everything works as expected, so what am I doing wrong?
0 -> {'input': 'x1', 'target': 'y2'}
1 -> {'input': 'x1', 'target': 'y2'}
2 -> {'input': 'x1', 'target': 'y2'}
3 -> {'input': 'x1', 'target': 'y2'}
Process finished with exit code 0
I would prefer to iterate without range.
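One possible way to get there, sketched on a plain-Python stand-in rather than the real class: add an explicit bounds check to `__getitem__` that raises `IndexError`, since that is the exception a plain for loop treats as the end of the sequence. `BoundedSequence` is a hypothetical name, and the dict lookup only imitates label-based access; this is a sketch under those assumptions, not a tested change to `DataframeDataset`.

```python
class BoundedSequence:
    """Hypothetical stand-in for the dataset, with an explicit bounds check."""

    def __init__(self, rows):
        # Dict keyed by position: a missing key would raise KeyError.
        self.rows = dict(enumerate(rows))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        # IndexError is what ends a for loop that iterates via __getitem__.
        if not 0 <= idx < len(self):
            raise IndexError(f"index {idx} out of range for length {len(self)}")
        return self.rows[idx]


ds = BoundedSequence(["a", "b", "c", "d"])
for idx, data in enumerate(ds):
    print(idx, "->", data)
```

In `DataframeDataset` the same check would presumably go at the top of `__getitem__`. Another option might be to define an `__iter__` method that yields `self[i]` for each `i` in `range(len(self))`.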
Thanks in advance.