Unclear error related to dataloader, iter() and next()

Hello
I’m new to PyTorch and trying to use a custom dataset, specifically the 1994 US census data, to predict income. My problem appears when I use a DataLoader in combination with iter() and next(). The error message only shows a number.

Here is the code I use to create the loaders and iterate over them:

train_loader = torch.utils.data.DataLoader(dataset=prosessed_train_data, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=prosessed_test_data, batch_size=batch_size, shuffle=False)

print(prosessed_train_data.shape)
print(train_loader)

# fetch one batch; the built-in next() works on any Python 3 iterator
sample_train_loader = iter(train_loader)
a = next(sample_train_loader)

This is the error that pops up. The output also shows the shape of the data before it goes into the DataLoader.

Can you share the code where you define and instantiate prosessed_train_data? It looks like there is an issue there, perhaps in the __getitem__ method.
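For context, here is a minimal sketch of why the error might show only a number, assuming prosessed_train_data is a plain pandas DataFrame (a guess, since that code isn't shown at this point). A DataLoader fetches samples by calling dataset[index] with integer indices, and on a DataFrame that looks up a column label rather than a row. The toy frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the processed census data.
frame = pd.DataFrame({"age": [39, 50, 38], "income": [0, 0, 1]})

# frame[0] asks for a *column* named 0, not for row 0, so it raises KeyError.
try:
    frame[0]
    error_message = None
except KeyError as exc:
    error_message = repr(exc)

print(error_message)  # the message carries little more than the index itself

# Positional row access needs .iloc instead.
first_row = frame.iloc[0]
print(first_row["age"])
```

If that is what happens here, the number in the error would simply be the sample index the loader tried to fetch.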

#processing data
train_data = train_data.replace(" ?",np.nan).dropna() 
train_data.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']


test_data = test_data.replace(" ?",np.nan)
test_data.columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 'marital_status', 'occupation', 'relationship', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']

def string_int_converter(dataframe):
  # map each categorical string column to integer codes, on the frame passed in
  dataframe.workclass.replace(('Private', 'Self-emp-not-inc', 'Self-emp-inc', 'Federal-gov', 'Local-gov', 'State-gov', 'Without-pay', 'Never-worked'),(1,2,3,4,5,6,7,8), inplace=True,regex = True)
  dataframe.education.replace(('Bachelors', 'Some-college', '11th', 'HS-grad', 'Prof-school', 'Assoc-acdm', 'Assoc-voc', '9th', '7th-8th', '12th', 'Masters', '1st-4th', '10th', 'Doctorate', '5th-6th', 'Preschool'),(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16), inplace=True,regex = True)
  dataframe.marital_status.replace(('Married-civ-spouse', 'Divorced', 'Never-married', 'Separated', 'Widowed', 'Married-spouse-absent', 'Married-AF-spouse'),(1,2,3,4,5,6,7), inplace=True,regex = True)
  dataframe.occupation.replace(('Tech-support', 'Craft-repair', 'Other-service', 'Sales', 'Exec-managerial', 'Prof-specialty', 'Handlers-cleaners', 'Machine-op-inspct', 'Adm-clerical', 'Farming-fishing', 'Transport-moving', 'Priv-house-serv', 'Protective-serv', 'Armed-Forces'),(1,2,3,4,5,6,7,8,9,10,11,12,13,14), inplace=True,regex = True)
  dataframe.relationship.replace(('Wife', 'Own-child', 'Husband', 'Not-in-family', 'Other-relative', 'Unmarried'),(1,2,3,4,5,6), inplace=True,regex = True)
  dataframe.race.replace(('White', 'Asian-Pac-Islander', 'Amer-Indian-Eskimo', 'Other', 'Black'),(1,2,3,4,5), inplace=True,regex = True)
  dataframe.sex.replace(('Female', 'Male'),(1,2), inplace=True,regex = True)
  dataframe.native_country.replace(('United-States', 'Cambodia', 'England', 'Puerto-Rico', 'Canada', 'Germany', 'Outlying-US(Guam-USVI-etc)', 'India', 'Japan', 'Greece', 'South', 'China', 'Cuba', 'Iran', 'Honduras', 'Philippines', 'Italy', 'Poland', 'Jamaica', 'Vietnam', 'Mexico', 'Portugal', 'Ireland', 'France', 'Dominican-Republic', 'Laos', 'Ecuador', 'Taiwan', 'Haiti', 'Columbia', 'Hungary', 'Guatemala', 'Nicaragua', 'Scotland', 'Thailand', 'Yugoslavia', 'El-Salvador', 'Trinadad&Tobago', 'Peru', 'Hong', 'Holand-Netherlands'),(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41), inplace=True,regex = True)
  dataframe.income.replace((' <=50K', ' >50K'),(0,1), inplace=True,regex=True)
  return dataframe


prosessed_train_data = string_int_converter(train_data)

prosessed_test_data = string_int_converter(test_data)

Sorry, it’s kind of messy.

In the future you can wrap your code in 3 backquotes ``` to make it more readable.

Your custom dataset needs to inherit from the Dataset class (torch.utils.data.Dataset), so an error is expected with the current code. Please have a look at this tutorial and adapt it to your use case. It should just mean taking your code and putting it inside a proper class where you implement the __getitem__ and __len__ methods.
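A rough sketch of what such a class could look like is below. It is not the tutorial's code: the class name CensusDataset, the label_column parameter, and the toy frame are all made up for illustration, and it assumes every column has already been converted to numbers and that income is the label column.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class CensusDataset(Dataset):
    """Hypothetical wrapper around an all-numeric DataFrame with a label column."""

    def __init__(self, dataframe, label_column="income"):
        features = dataframe.drop(columns=[label_column]).to_numpy(dtype=np.float32)
        labels = dataframe[label_column].to_numpy()
        # torch.tensor copies the data, so the DataFrame is left untouched.
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        # Number of samples; the DataLoader uses this to build its index list.
        return len(self.labels)

    def __getitem__(self, index):
        # One (features, label) pair for an integer index.
        return self.features[index], self.labels[index]


# Tiny made-up frame standing in for the already-encoded census data.
toy_frame = pd.DataFrame(
    {"age": [39, 50, 38, 53], "hours_per_week": [40, 13, 40, 40], "income": [0, 0, 1, 1]}
)

train_loader = DataLoader(CensusDataset(toy_frame), batch_size=2, shuffle=False)
batch_features, batch_labels = next(iter(train_loader))
print(batch_features.shape, batch_labels.shape)
```

With this in place, the loader returns batches of (features, labels) tensors instead of trying to index the DataFrame directly.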
