Morning all,
I have two custom datasets, both generated externally and both in the format [data1, data2, data3, data4], where data1 is an [n, m] array. I use the following class to combine the datasets into a single entity:
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class DoubleDataset(Dataset):
    def __init__(self):
        # concatenate the first pair of saved datasets
        data_a = torch.load('data1.pt')
        data_b = torch.load('data2.pt')
        self.data_1 = ConcatDataset((data_a, data_b))
        # concatenate the second pair
        data_c = torch.load('data3.pt')
        data_d = torch.load('data4.pt')
        self.data_2 = ConcatDataset((data_c, data_d))

    def __getitem__(self, index):
        return self.data_1[index], self.data_2[index]

    def __len__(self):
        return min(len(self.data_1), len(self.data_2))
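For context, I then feed the combined dataset to a DataLoader along these lines (the batch size here is just illustrative):

dataload = DoubleDataset()
loader = DataLoader(dataload, batch_size=25, shuffle=True)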
This works fine: I can load the data into my network, normalise by batch, and everything is hunky-dory.
I would now like to normalise data_1 and data_2 over the whole dataset rather than per batch, but when I try, I get the following error:
data_1.max()
AttributeError: 'ConcatDataset' object has no attribute 'max'
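I take it this is because ConcatDataset just wraps the underlying datasets rather than holding a single tensor, so presumably I could loop over the samples to get a dataset-wide max, something like the line below (assuming each sample's first element is a tensor), but that feels clumsy and like it belongs inside the class:

# hypothetical workaround: iterate the ConcatDataset for a dataset-wide max
global_max = max(sample[0].max() for sample in dataload.data_1)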
If I load the dataset into memory, it's possible to use the following to normalise the whole dataset, but is it possible to get this into the class? If so, where does it go?
batch_samples = data_1[0].size(0)
data = data_1[0].view(batch_samples, data_1[0].size(1), -1)
data2 = (data - data.min()) / (data.max() - data.min())  # scale to [0, 1]
data3 = (data2 - 0.5) / 0.5                              # shift to [-1, 1]
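My guess is that it would go in __init__, along the lines of the sketch below (untested, and assuming every sample's first element is a float tensor of the same shape so the stack works), but I don't know if this is the right way to do it:

class DoubleDataset(Dataset):
    def __init__(self):
        data_a = torch.load('data1.pt')
        data_b = torch.load('data2.pt')
        self.data_1 = ConcatDataset((data_a, data_b))
        data_c = torch.load('data3.pt')
        data_d = torch.load('data4.pt')
        self.data_2 = ConcatDataset((data_c, data_d))
        # hypothetical: dataset-wide min/max over the first element of every sample
        all_1 = torch.stack([self.data_1[i][0] for i in range(len(self.data_1))])
        self.min_1, self.max_1 = all_1.min(), all_1.max()

    def __getitem__(self, index):
        sample_1 = list(self.data_1[index])
        # scale to [0, 1], then to [-1, 1], using the dataset-wide stats
        sample_1[0] = (sample_1[0] - self.min_1) / (self.max_1 - self.min_1)
        sample_1[0] = (sample_1[0] - 0.5) / 0.5
        return sample_1, self.data_2[index]

    def __len__(self):
        return min(len(self.data_1), len(self.data_2))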
Failing that, if I do this:
def normalise_dataset(dataload):
    # load the whole dataset as one giant batch
    loader = DataLoader(dataload, batch_size=len(dataload), num_workers=0, shuffle=False)
    for (data_1, data_2) in loader:
        # normalise the first element of data_1 to [-1, 1]
        batch_samples = data_1[0].size(0)
        data = data_1[0].view(batch_samples, data_1[0].size(1), -1)
        data2 = (data - data.min()) / (data.max() - data.min())
        data_1_O = (data2 - 0.5) / 0.5
        data_1[0] = data_1_O
        # normalise the first element of data_2 to [-1, 1]
        batch_samples = data_2[0].size(0)
        dataI = data_2[0].view(batch_samples, data_2[0].size(1), -1)
        dataI2 = (dataI - dataI.min()) / (dataI.max() - dataI.min())
        data_2_O = (dataI2 - 0.5) / 0.5
        data_2[0] = data_2_O
    norm = ConcatDataset((data_1, data_2))
    return norm
norm = normalise_dataset(dataload)
test = DataLoader(
    norm,
    batch_size=25,
    num_workers=0,
    shuffle=False,
)
for i, (data_1, data_2) in enumerate(test):
    test1 = data_1[0]
    test2 = data_2[0]
This normalises the whole dataset; however, when I try to load test into the network I get:
RuntimeError: Expected object of scalar type Float but got scalar type Long for sequence element 1 in sequence argument at position #1 'tensors'
but the only elements that have changed are data_1[0] and data_2[0], and their formatting is the same as before.
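For what it's worth, this is the quick dtype check I would run to compare before and after (dataload is the original DoubleDataset from above, and I'm assuming every element of a sample is a tensor):

# compare element dtypes of the original and the normalised data
orig_1, orig_2 = dataload[0]
print([t.dtype for t in orig_1], [t.dtype for t in orig_2])
print(test1.dtype, test2.dtype)  # test1/test2 from the loop above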
How do I repack the original dataload with the new values of data_1 and data_2?
Chaslie
PS: sorry for the bad coding…