I cannot access my dataset instance after calling torch.utils.data.random_split

Hi, as the topic says, this is my code:

```python
import imutils
import cv2
from imutils import paths
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
from PIL import Image
import os
import copy

if __name__ == '__main__':
    # Data augmentation and normalization for training
    # Just normalization for validation
    data_transforms = {
        transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor()
        ]),
    }

    data_dir = './pill'

    image_datasets = datasets.ImageFolder(data_dir,
                                          data_transforms)

    dataloaders = torch.utils.data.DataLoader(image_datasets, batch_size=8,
                                              shuffle=True, num_workers=8)

    train_size = int(0.8 * len(image_datasets))
    test_size = len(image_datasets) - train_size
    train_dataset, test_dataset = torch.utils.data.random_split(image_datasets, [train_size, test_size])
```

I can get the length via len(train_dataset.dataset), but I cannot access train_dataset.dataset[0]; indexing raises this error:

```
TypeError                                 Traceback (most recent call last)
in
----> 1 train_dataset.dataset[0]

~\anaconda3\Lib\site-packages\torchvision\datasets\folder.py in __getitem__(self, index)
    138         sample = self.loader(path)
    139         if self.transform is not None:
--> 140             sample = self.transform(sample)
    141         if self.target_transform is not None:
    142             target = self.target_transform(target)

TypeError: 'set' object is not callable
```

Could you help me solve this problem? Thanks a lot!!

The error doesn’t seem to be raised by the usage of random_split and would also be raised if you indexed image_datasets directly, as the passed transformation is defined in the wrong way.
While you are using:

```python
data_transforms = {
    transforms.Compose([
        transforms.Resize((224,224)),
        transforms.ToTensor()
    ]),
}
```

which will create a set object.
Remove the curly brackets and rerun the code.
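For reference, a fixed version could look like this minimal sketch (keeping the `./pill` folder and the variable name from your code); the curly brackets are dropped so that a `transforms.Compose` object is passed instead of a set:

```python
from torchvision import datasets, transforms

data_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# pass the Compose object directly instead of wrapping it in a set
image_datasets = datasets.ImageFolder('./pill', transform=data_transform)
```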

PS: you can post code snippets by wrapping them into three backticks, which makes debugging easier. :wink:

Thanks a LOT!!! I will give it a try now.

Thank you so much for your last response!! The data can be accessed now, but in the training phase I run into another problem. Could you tell me how I might fix it?

```
Epoch 0/24

RuntimeError                              Traceback (most recent call last)
in
      1 model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
----> 2                        num_epochs=25)

in train_model(model, criterion, optimizer, scheduler, num_epochs)
     36                 # backward + optimize only if in training phase
     37                 if phase == 'train':
---> 38                     loss.backward()
     39                     optimizer.step()
     40

~\anaconda3\Lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    183                 products. Defaults to False.
    184         """
--> 185         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    186
    187     def register_hook(self, hook):

~\anaconda3\Lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    125     Variable._execution_engine.run_backward(
    126         tensors, grad_tensors, retain_graph, create_graph,
--> 127         allow_unreachable=True)  # allow_unreachable flag
    128
    129

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Exception raised from createCublasHandle at …\aten\src\ATen\cuda\CublasHandlePool.cpp:8 (most recent call first):
00007FFB63A875A200007FFB63A87540 c10.dll!c10::Error::Error [ @ ]
00007FFAFF31AEA800007FFAFF319E70 torch_cuda.dll!at::cuda::getCurrentCUDASparseHandle [ @ ]
00007FFAFF31A7D800007FFAFF319E70 torch_cuda.dll!at::cuda::getCurrentCUDASparseHandle [ @ ]
00007FFAFF31B66700007FFAFF31B1A0 torch_cuda.dll!at::cuda::getCurrentCUDABlasHandle [ @ ]
00007FFAFF31B24700007FFAFF31B1A0 torch_cuda.dll!at::cuda::getCurrentCUDABlasHandle [ @ ]
00007FFAFF31320700007FFAFF3124B0 torch_cuda.dll!at::native::sparse_mask_cuda [ @ ]
00007FFAFE81CA9700007FFAFE81B990 torch_cuda.dll!at::native::lerp_cuda_tensor_out [ @ ]
00007FFAFE81E4D200007FFAFE81DF60 torch_cuda.dll!at::native::addmm_out_cuda [ @ ]
00007FFAFE81F64300007FFAFE81F560 torch_cuda.dll!at::native::mm_cuda [ @ ]
00007FFAFF381B0F00007FFAFF31E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [ @ ]
00007FFAFF371B2200007FFAFF31E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [ @ ]
00007FFB2969D94900007FFB29698FA0 torch_cpu.dll!at::bucketize_out [ @ ]
00007FFB296D057700007FFB296D0520 torch_cpu.dll!at::mm [ @ ]
00007FFB2AA2EC7900007FFB2A93E010 torch_cpu.dll!torch::autograd::GraphRoot::apply [ @ ]
00007FFB291E715700007FFB291E6290 torch_cpu.dll!at::indexing::TensorIndex::boolean [ @ ]
00007FFB2969D94900007FFB29698FA0 torch_cpu.dll!at::bucketize_out [ @ ]
00007FFB297B210700007FFB297B20B0 torch_cpu.dll!at::Tensor::mm [ @ ]
00007FFB2A8CB71100007FFB2A8CA760 torch_cpu.dll!torch::autograd::profiler::Event::kind [ @ ]
00007FFB2A8816D800007FFB2A881580 torch_cpu.dll!torch::autograd::generated::AddmmBackward::apply [ @ ]
00007FFB2A877E9100007FFB2A877B50 torch_cpu.dll!torch::autograd::Node::operator() [ @ ]
00007FFB2ADDF9BA00007FFB2ADDF300 torch_cpu.dll!torch::autograd::Engine::add_thread_pool_task [ @ ]
00007FFB2ADE03AD00007FFB2ADDFFD0 torch_cpu.dll!torch::autograd::Engine::evaluate_function [ @ ]
00007FFB2ADE4FE200007FFB2ADE4CA0 torch_cpu.dll!torch::autograd::Engine::thread_main [ @ ]
00007FFB2ADE4C4100007FFB2ADE4BC0 torch_cpu.dll!torch::autograd::Engine::thread_init [ @ ]
00007FFB4A0E0A7700007FFB4A0BA150 torch_python.dll!THPShortStorage_New [ @ ]
00007FFB2ADDBF1400007FFB2ADDB780 torch_cpu.dll!torch::autograd::Engine::get_base_engine [ @ ]
00007FFB9E6B0E8200007FFB9E6B0D40 ucrtbase.dll!beginthreadex [ @ ]
00007FFBA0967BD400007FFBA0967BC0 KERNEL32.DLL!BaseThreadInitThunk [ @ ]
00007FFBA15CCE5100007FFBA15CCE30 ntdll.dll!RtlUserThreadStart [ @ ]
```

Could you rerun the code via:

```
CUDA_LAUNCH_BLOCKING=1 python script.py args
```

and post the stack trace here?
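(Since the paths in your trace point to a Windows setup, where the inline `VAR=value` syntax isn't available in cmd, one possible alternative is to set the variable at the very top of the script, before any CUDA work happens. A minimal sketch:)

```python
import os
# must be set before the first CUDA call, otherwise it has no effect
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import and use torch only after the variable is set
```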

Also, check the memory usage on your device, as this error message is sometimes raised instead of a proper message pointing towards an out of memory issue.
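For the memory check, you could watch `nvidia-smi` while the script runs, or query PyTorch directly. A minimal sketch, assuming the default CUDA device:

```python
import torch

if torch.cuda.is_available():
    # memory currently allocated / reserved by PyTorch's caching allocator, in MB
    print(torch.cuda.memory_allocated() / 1024**2)
    print(torch.cuda.memory_reserved() / 1024**2)
    # detailed report of the caching allocator state
    print(torch.cuda.memory_summary())
```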

Hello,
Thanks for your reply!
Could you show me a classification task example which uses "torch.utils.data.random_split"? I think the problem is not caused by GPU memory, because when I manually split the dataset into a training and a testing set, the code works. I really can access the dataset split by "torch.utils.data.random_split" now; it just shows the error when the training starts.

Here is a simple example of how to use random_split and how to access attributes of the underlying dataset:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.MNIST(
    root=PATH, download=False,
    transform=transforms.ToTensor())

train_dataset, val_dataset = torch.utils.data.random_split(
    dataset, [len(dataset) - 10000, 10000])

train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

# access the underlying targets directly in `datasets.MNIST`
print(dataset.targets)
# access them via .dataset in the Subset (created by random_split)
print(train_dataset.dataset.targets)
# access them via .dataset.dataset in the DataLoader
print(train_loader.dataset.dataset.targets)
```
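Indexing the Subset itself (rather than its `.dataset` attribute) also works, since it delegates to the wrapped dataset through the shuffled indices and applies the transform as usual. A minimal sketch, building on the example above:

```python
# a Subset supports len() and indexing like any Dataset
img, target = train_dataset[0]  # ToTensor() is applied internally
print(img.shape, target)
print(len(train_dataset), len(val_dataset))
```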

Oh, you’re so kind!! I will give it a try.