Can't create tensors as shown in first Captum Titanic tutorial?!

rbelew · August 13, 2024, 4:26pm

I’m stumped by the simplest part of the most basic “Titanic Basic” captum tutorial: converting the data into tensors?!

After getting the data and performing the first preprocessing steps,
converting to numpy arrays and separating out train and test sets works fine:

	data = titanic_data.to_numpy()

	train_indices = np.random.choice(len(labels), int(0.7*len(labels)), replace=False)
	test_indices = list(set(range(len(labels))) - set(train_indices))
	train_features = data[train_indices]
	train_labels = labels[train_indices]
	test_features = data[test_indices]
	test_labels = labels[test_indices]

but converting to tensors doesn’t work:

	File ".../Titanic_Basic_Interpret.py", line 139, in <module>
	input_tensor = torch.as_tensor(train_features,dtype=torch.float32)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Maybe the problem is “no more magic, convert_objects has been deprecated in pandas 0.17”? It seems the tutorial was added back in 2019, but other issues seem to have used it more recently?

I’ve tried some of the suggestions there: building a separate dictionary of data types and then calling data = data.astype(dtype=dtypeDict), and converting each column separately:

	for c in titanic_data.columns:
	    titanic_data[c] = pd.to_numeric(titanic_data[c])

but these don’t go thru either. What could the issue be?!

ptrblck · August 13, 2024, 9:31pm

Your np.array has dtype object, which cannot be converted to a tensor directly.
This can happen if your array contains different, incompatible types:

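	import numpy as np

	# Older NumPy inferred object dtype for this ragged input automatically;
	# NumPy >= 1.24 raises ValueError unless dtype=object is passed explicitly.
	a = np.array([1.0, (1, 1)], dtype=object)
	print(a.dtype)
	# object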

You could try to transform the array to a numerical dtype first via .astype() before transforming it to a tensor.
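A minimal sketch of that suggestion, assuming the object array holds only values that can be cast to float. The `features` array here is a hypothetical stand-in, not the tutorial's data:

```python
import numpy as np
import torch

# Hypothetical stand-in for the feature matrix: numeric values stored in an
# object-dtype array (here, Python bools mixed with floats).
features = np.array([[1.0, True], [0.5, False]], dtype=object)
print(features.dtype)  # object

# Converting the object array directly fails, as in the traceback above.
try:
    torch.as_tensor(features, dtype=torch.float32)
except TypeError as err:
    print("direct conversion failed:", err)

# Cast to a numeric dtype first, then convert to a tensor.
features_f32 = features.astype(np.float32)
input_tensor = torch.as_tensor(features_f32, dtype=torch.float32)
print(input_tensor.dtype, input_tensor.shape)
```

If the array contains values that cannot be cast (a non-numeric string, for example), `.astype(np.float32)` raises a `ValueError` instead, which at least points at the offending value.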

rbelew · August 13, 2024, 11:24pm

hi @ptrblck thanks so much. that is indeed the problem.

I’ve tried two of the suggestions mentioned in that SO post (https://stackoverflow.com/a/21197863):

building a separate dictionary of data types and then calling data = data.astype(dtype=dtypeDict), as suggested in “Assign pandas dataframe column dtypes” on Stack Overflow:

	dtypeDict = {}
	# numpy _internal._usefields() requires "names" and "formats"?
	# dtypeDict = {"names": [], "formats": []}

	for k, v in titanic_data.dtypes.items():
		dtypeDict[k] = v  # or v.name?
		# or, a la _usefields:
		# dtypeDict['names'].append(k)
		# dtypeDict['formats'].append(v)
	data = data.astype(dtype=dtypeDict)

converting each column separately, as in the first answer to that same Stack Overflow question:
```
 for c in titanic_data.columns:
     titanic_data[c] = pd.to_numeric(titanic_data[c])
```

but these don’t go thru either. How am I supposed to do this? (And
why am I the one bumping into this issue on a captum tutorial from
2019?!)

rbelew · August 14, 2024, 3:24pm

way simpler than I’d imagined: data = data.astype(np.float32) is all that was required!
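The thread does not say which columns produced the object dtype. One plausible cause, offered here only as an assumption, is that recent pandas versions return bool columns from `pd.get_dummies`, and a frame mixing bool and float columns becomes an object array under `to_numpy()`. The sketch below uses a made-up three-row frame (not the Titanic data) to show that mix, why the per-column `pd.to_numeric` loop leaves it unchanged, and why the single `astype(np.float32)` call resolves it:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a preprocessed frame; values are made up.
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "fare": [7.25, 71.28, 7.92],
    "sex": ["male", "female", "female"],
})
# dtype=bool is passed explicitly to mimic the default of recent pandas.
df = pd.get_dummies(df, columns=["sex"], dtype=bool)
print(df.dtypes)

# Float and bool columns have no common numeric dtype in pandas,
# so the combined array falls back to object.
dtype_before = df.to_numpy().dtype
print(dtype_before)

# pd.to_numeric treats bool as already numeric and leaves it as bool,
# so the per-column loop does not change the outcome.
for c in df.columns:
    df[c] = pd.to_numeric(df[c])
dtype_after_loop = df.to_numpy().dtype
print(dtype_after_loop)

# Casting the whole array at once turns True/False into 1.0/0.0.
data = df.to_numpy().astype(np.float32)
print(data.dtype, data.shape)
```

Printing `titanic_data.dtypes` before calling `to_numpy()` is a quick way to check whether this is what happened in a given environment.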