Hi, I am trying to create a class to read JSON files and build a dataset for use in a CNN.
The JSON files look like this (pose keypoints from images):
{
    "0": {
        "PoseKeypoints": [
            [
                2529.287109375,
                1424.733642578125,
                0.9513001441955566
            ],
            [
                2574.495849609375,
                1384.9786376953125,
                0.9392595291137695
            ],
            [ ...
I have to store these keypoints in a PyTorch tensor.
My idea is to use iterable objects in the class, but I don't know how to do it.
Thank you!!
I have written this model:
data = []

class json_dataset(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir

    def __getitem__(self, index):
        for file in os.listdir(self.root_dir):
            if file.endswith('json'):
                json_path = os.path.join(self.root_dir, file)
                # json_data = pd.read_json(json_path, lines=True)
                json_data = json.load(open(json_path))
                for keypoints in json_data.items():
                    valores = keypoints['PoseKeypoints']
                    keypoints_normalized.append(valores)
                data.append(json_data)
        data = torch.FloatTensor(data)
        return data
What do you think?
The general loading looks alright (replace torch.FloatTensor with e.g. torch.from_numpy or torch.tensor), but the for loop looks wrong.
In the __getitem__ method you would use the index to load a single sample, while it seems you are trying to iterate over all JSON files and return the very first one at all indices.
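As a rough sketch of what that could look like (the class and attribute names here are just placeholders, and this assumes every JSON file has the "0"/"PoseKeypoints" layout shown above):

```python
import json
import os

import torch
from torch.utils.data import Dataset


class JsonKeypointDataset(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        # collect the json file names once, so __getitem__ can index into them
        self.files = sorted(f for f in os.listdir(root_dir) if f.endswith('.json'))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # load only the single file corresponding to this index
        json_path = os.path.join(self.root_dir, self.files[index])
        with open(json_path) as f:
            json_data = json.load(f)
        keypoints = json_data["0"]["PoseKeypoints"]
        return torch.tensor(keypoints, dtype=torch.float32)
```

Each call to `dataset[i]` then returns one tensor of shape `[num_keypoints, 3]` for the i-th file, instead of looping over everything on every call.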
Now I have this:
class json_dataset(Dataset):
    keypoints = []

    def __init__(self, root_dir):
        self.root_dir = root_dir

    def __str__(self):
        return str(data)  # ???

    def __getitem__(self, index):
        for file in os.listdir(self.root_dir):
            if file.endswith('json'):
                json_path = os.path.join(self.root_dir, file)
                json_data = json.load(open(json_path))
                for k in json_data["0"]["PoseKeypoints"]:
                    keypoints.append(k)
                data.append(keypoints)
        data = torch.Tensor(data)
        return data
How do I do what you suggested with the index?
Also, I want to check whether the returned data is correct, but when I print it I get an empty tensor. I tested the same code outside the class and there it returns correctly. The __str__ function is meant for this check.
Thank you!
In the common use case you have a specific number of samples in the Dataset, and this number of samples is returned by the Dataset.__len__ function. In the __getitem__ method you are using the index (which takes values in the range [0, len(dataset)-1]) to load a single sample for this index.
E.g. if each sample is stored in a separate json file in self.root_dir, you could load the corresponding file using the index instead of iterating all files.
Add print statements to the __getitem__ method and check which objects are valid and where the tensor becomes empty.
The __str__ function uses data, which is either undefined or globally defined, so check what is being printed there.
Ok, I made some changes:
class json_dataset(Dataset):
    def __init__(self, csv_file, root_dir):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        json_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        json_data = json.load(open(json_path))
        keypoints = []
        if "0" in json_data:
            for k in json_data["0"]["PoseKeypoints"]:
                keypoints.append(k)
        data = torch.Tensor(keypoints)
        return data
The CSV file has the names of the JSON files. Now it works, thank you!
But I have a question: how does the index work so that during training you can access all the images? I need the dataset to load the files sorted by filename.
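One way to guarantee that order (a sketch, assuming the first CSV column holds the filenames; the helper name is made up) is to sort the annotations once when reading the CSV, so that index 0 always maps to the alphabetically first file:

```python
import pandas as pd


def load_sorted_annotations(csv_file):
    # hypothetical helper: sort the rows by the filename column and
    # reset the index so .iloc[index, 0] follows the sorted order
    annotations = pd.read_csv(csv_file)
    return annotations.sort_values(annotations.columns[0]).reset_index(drop=True)
```

The DataLoader itself then covers every index from 0 to len(dataset)-1 once per epoch (in order, or shuffled if shuffle=True), so sorting in __init__ is enough to fix the filename-to-index mapping.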