Dataset class returns the data twice

hello everyone, I was trying to test my concatenated model on some unlabeled data so it should go over each data in a directory and label it then go to the next, and so on. however, the data seems to be returned twice so it gives me a wrong tensor shape, and also it returns an empty array for the second data. here is the code i am using:

class Hfdata(Dataset):
    def __init__(self, data_dir, file_name):
        self.data_dir = data_dir
        self.file_name = file_name
        with h5py.File(f"{data_dir}/{file_name}", "r") as f:
            self.data = f["data"][:]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        return x

folder_path_sp = 'test_data'
file_list_sp = glob.glob(os.path.join(folder_path_sp, '*.h5'))
folder_path_tm = 'test_data_tm'
file_list_tm = glob.glob(os.path.join(folder_path_tm, '*.h5'))
model = Ensemble(ModelA(), ModelB())

# predict on the new data
predictions = []
for file_path_sp, file_path_tm in zip(file_list_sp, file_list_tm):
    file_name_sp = os.path.basename(file_path_sp)
    file_name_tm = os.path.basename(file_path_tm)
    print(f"Prediction on {file_name_sp} and {file_name_tm}")
    dataset_sp_p = Hfdata(folder_path_sp, file_name_sp)
    dataloader_sp_p = DataLoader(dataset_sp_p, batch_size=batch_size, shuffle=False)
    dataset_tm_p = Hfdata(folder_path_tm, file_name_tm)
    dataloader_tm_p = DataLoader(dataset_tm_p, batch_size=batch_size, shuffle=False)
    for i, (data_sp, data_tm) in enumerate(zip(dataloader_sp_p, dataloader_tm_p)):
        # prepare the data
        data_sp = data_sp.permute(0, 4, 1, 2, 3)
        data_sp = data_sp.float()
        data_tm = data_tm.view(-1, 1, 609).float()

        # pass the data through the model
        with torch.no_grad():
            outputs = model(data_sp, data_tm)

        # obtain the predicted labels from the predicted outputs
        _, predicted_labels = torch.max(outputs, 1)

        # append the predicted labels to the list of predictions
        predictions.append(predicted_labels)

# print the predictions
print(predictions)

and this just prints the following:
(79, 95, 79, 1)
(79, 95, 79, 1)
torch.Size([2, 79, 95, 79, 1])
torch.Size([2])

so, could you please help me with this issue?
many thanks in advance.

I don’t fully understand the issue since I would assume a RuntimeError would be raised if an empty tensor is processed in the model.
Could you explain your use case a bit more and which shapes are expected where?

Yes. Normally I have some 3D data and some 1D data (they actually come from the same source, which is 4D data, but they are separated and were used in two different models, a 3D one and a 1D one). Those two models were then concatenated into one model (the Ensemble model), which was trained and evaluated, and all of that works fine. For testing I wanted this ensemble model to label the data one sample at a time (so 3D + 1D, then give me the label). But with the code I provided, the print statement in __getitem__ seems to return duplicated data: the 3D data prints its shape (79, 95, 79, 1) twice, and the 1D data also prints twice but as an empty shape (). So indeed I have an empty tensor for the 1D data, which should be torch.Size([1, 1, 609]), while the 3D data should be torch.Size([1, 1, 79, 95, 79]).
I hope this elaborates it a bit.
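To make the symptom reproducible without the actual files, here is a minimal sketch (the (2, 79, 95, 79, 1) layout is an assumption about how the .h5 "data" entry is stored, chosen to match the prints above): when the Dataset keeps the raw array, its first axis becomes the sample axis, so __getitem__ runs once per slot and the DataLoader stacks both slots into one batch.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

# Hypothetical stand-in for the h5 contents: an array with two slots
# along axis 0 (in the real file the second slot may even be empty).
raw = np.zeros((2, 79, 95, 79, 1), dtype=np.float32)

class RawDataset(Dataset):
    def __init__(self, array):
        self.data = array          # the first axis becomes the sample axis

    def __len__(self):
        return len(self.data)      # 2, so __getitem__ is called twice

    def __getitem__(self, idx):
        x = self.data[idx]
        print(x.shape)             # prints (79, 95, 79, 1) twice
        return x

loader = DataLoader(RawDataset(raw), batch_size=2, shuffle=False)
batch = next(iter(loader))
print(batch.shape)                 # torch.Size([2, 79, 95, 79, 1])
```

This reproduces exactly the duplicated (79, 95, 79, 1) prints and the torch.Size([2, ...]) batch from the question.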

I have managed to solve it by updating the class Hfdata as follows:

class Hfdata(Dataset):
    def __init__(self, data_dir, file_name):
        self.data_dir = data_dir
        self.file_name = file_name
        self.data = []
        with h5py.File(f"{data_dir}/{file_name}", "r") as f:
            # store the whole file content as a single sample
            self.data.append(f["data"][:])
        #print(f"Number of data samples in {file_name}: {len(self.data)}")

        #print(f'length: {len(self.data)}')

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        #print(f"shape: {x.shape}")
        return x

so now I am getting the correct shape of the data and the tensor.
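For anyone hitting the same thing, the difference the fix makes can be checked in isolation. This is a minimal sketch with dummy data (not the actual files): appending the whole array to a list makes the Dataset report a single sample, so the DataLoader returns one batch containing the full volume, and the permute from the prediction loop then produces the expected shape.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class WholeFileDataset(Dataset):
    def __init__(self, array):
        self.data = []
        self.data.append(array)    # the whole array is one sample

    def __len__(self):
        return len(self.data)      # 1, so __getitem__ is called once

    def __getitem__(self, idx):
        return self.data[idx]

# dummy stand-in for f["data"][:]
volume = np.zeros((79, 95, 79, 1), dtype=np.float32)
loader = DataLoader(WholeFileDataset(volume), batch_size=1, shuffle=False)
batch = next(iter(loader))
print(batch.shape)                        # torch.Size([1, 79, 95, 79, 1])
print(batch.permute(0, 4, 1, 2, 3).shape) # torch.Size([1, 1, 79, 95, 79])
```

The permuted shape matches the torch.Size([1, 1, 79, 95, 79]) that the 3D branch of the ensemble expects.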