Boazrciasn
(Barış özcan)
November 16, 2018, 8:00am
1
Hello everyone,
The following code returns a ByteTensor in torchvision 0.2.1 on macOS:
self.train_dataset = datasets.MNIST(root=root,
                                    train=True,
                                    transform=transforms.ToTensor(),
                                    download=True)
self.test_dataset = datasets.MNIST(root=root,
                                   train=False,
                                   transform=transforms.ToTensor())

print(self.test_dataset.test_data.type())
print(self.train_dataset.train_data.type())
Is this a bug, or am I missing something here? As far as I remember, it returned a FloatTensor when I used it previously.
Thanks in advance!
ptrblck
November 16, 2018, 8:20am
2
The underlying data might still be stored as bytes. However, your ToTensor() transformation should return a FloatTensor for each sample.
Try checking the type of a sample instead, since the transformation is applied in __getitem__: print(self.test_dataset[0][0].type()).
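For context, ToTensor() converts a uint8 image to a float tensor scaled to [0, 1], so the stored bytes and the returned samples have different types. A minimal sketch of that conversion in plain torch (not the actual torchvision implementation):

```python
import torch

# Raw MNIST images are stored as uint8 (ByteTensor), values 0-255.
raw = torch.randint(0, 256, (28, 28), dtype=torch.uint8)

# ToTensor() roughly does: convert to float and scale into [0, 1].
sample = raw.float().div(255)

print(raw.type())     # torch.ByteTensor
print(sample.type())  # torch.FloatTensor
```

This is why inspecting the dataset attributes shows bytes while indexing the dataset yields floats.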
Boazrciasn
(Barış özcan)
November 16, 2018, 8:36am
3
print(self.test_dataset.test_data[0].type()) also gives a ByteTensor. As a temporary workaround I convert to FloatTensor manually, but this snippet used to work, as far as I remember (maybe before 0.4.1, I'm not sure, unfortunately).
ptrblck
November 16, 2018, 9:20am
4
With that statement you are still accessing the underlying test_data.
Try the following instead: print(self.train_dataset[0][0].type()), without calling .test_data.
PS: I had a small error in my previous post; you need to index the data of the returned tuple.
Boazrciasn
(Barış özcan)
November 16, 2018, 9:24am
5
Oh, I thought they were the same :)
Yes, now it prints FloatTensor. I'm trying to get a random subset of train_dataset with the following code:
for class_type in labels.unique():
    indices = np.where(labels == class_type)
    sample_indices = torch.randint(int(indices[0][0]),
                                   int(indices[0][-1]),
                                   (samples_per_class,)).long()
    real_sample_pos = torch.cat((real_sample_pos, real_indices[sample_indices]))
if shuffle:
    np.random.permutation(real_sample_pos)
self.small_train_set = self.train_dataset.train_data[real_sample_pos].type(torch.FloatTensor)
self.small_train_labels = self.train_dataset.train_labels[real_sample_pos]
Is there a more elegant way that avoids .type(torch.FloatTensor)?
Thank you very much for your help!
ptrblck
November 16, 2018, 10:16am
6
You could use a SubsetRandomSampler, keep your Dataset as it is, and just pass the sampler to your DataLoader.
Assuming real_sample_pos was somehow created, here is a small example:
real_sample_pos = torch.randperm(len(train_dataset.data))[:100]
sampler = sampler.SubsetRandomSampler(real_sample_pos)
loader = DataLoader(train_dataset,
                    sampler=sampler,
                    batch_size=10)
This avoids working with the dataset internals; as it stands, your transformations are probably not applied to self.small_train_set.
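Filling that sketch out into a self-contained example (using a toy TensorDataset as a stand-in for MNIST, so the data and names here are illustrative, not the original code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Stand-in for MNIST: 1000 fake 28x28 single-channel images with labels.
data = torch.randn(1000, 1, 28, 28)
targets = torch.randint(0, 10, (1000,))
train_dataset = TensorDataset(data, targets)

# Pick 100 random indices, playing the role of real_sample_pos above.
real_sample_pos = torch.randperm(len(train_dataset))[:100]
sampler = SubsetRandomSampler(real_sample_pos)

# The sampler restricts the loader to those 100 samples; the dataset
# (and any transforms it applies) stays untouched.
loader = DataLoader(train_dataset, sampler=sampler, batch_size=10)

for images, labels in loader:
    print(images.shape, labels.shape)
    break
```

Each batch then comes out as whatever __getitem__ returns (FloatTensors here), with no manual .type(torch.FloatTensor) call needed.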
Boazrciasn
(Barış özcan)
November 16, 2018, 10:37am
7
Thank you for the insight!