Unexplained PyTorch datatype

In this program, I start with a NumPy array created with default settings. The array is wrapped in a PyTorch Dataset class, and the dataset is then passed to a DataLoader. Without my specifying any dtype, the batches come out as tensors with dtype=torch.float64. Why?

import numpy as np
import torch
import torch.nn.functional as F
from torch import nn, optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import transforms
np.random.seed(seed=115)
X = np.random.rand(10,3)
class Wine(Dataset):
    def __init__(self,X):
        self.x = X[:, 1:]
        self.y = X[:, 0]
        self.num_samples = X.shape[0]

    def __len__(self):
        return(self.num_samples)

    def __getitem__(self, index):
        return (self.x[index] , self.y[index] )
#        
WineDS = Wine(X)
#
print('size of train ds ',len(WineDS))
train_dataloader = DataLoader(WineDS, batch_size=2, shuffle=False)
#
train_features , train_labels = next(iter(train_dataloader))
print(train_features)
print('----')
print(train_labels)

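The float64 comes from NumPy: np.random.rand returns float64 values, which is NumPy's default floating-point type. If you do not pass collate_fn to the DataLoader, the default collation function is used, and it converts each NumPy item in the batch to a tensor without changing its dtype, so the batches end up as torch.float64. You can provide your own function to collate_fn if you want to handle the batch of data in some other way.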
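To see where the dtype comes from, here is a minimal sketch. The small 4x3 array and the use of a plain list of rows as the dataset are my own stand-ins for illustration, not part of your program.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

X = np.random.rand(4, 3)    # NumPy creates float64 unless told otherwise
print(X.dtype)              # float64

t = torch.from_numpy(X)     # conversion keeps the NumPy dtype
print(t.dtype)              # torch.float64

# A plain list of rows is enough to act as a map-style dataset for this check.
batch = next(iter(DataLoader(list(X), batch_size=2)))
print(batch.dtype)          # torch.float64
```

So nothing in the DataLoader picks float64 by itself; it just passes along what the array already had.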

Thank you. How would you change the default so that the DataLoader returns 32-bit data instead of 64-bit?

A straightforward way is to call .float() on train_features and train_labels.

train_features = train_features.float()
train_labels = train_labels.float()
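Another option, assuming the whole array fits comfortably in memory, would be to cast it once in NumPy before building the dataset. The sketch below uses a trimmed-down copy of your Wine class; the names loader, features and labels are just placeholders I picked.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class Wine(Dataset):
    # Trimmed-down version of the dataset from the question
    def __init__(self, X):
        self.x = X[:, 1:]
        self.y = X[:, 0]

    def __len__(self):
        return len(self.y)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

# Cast once here; every item the dataset returns is then float32 already
X = np.random.rand(10, 3).astype(np.float32)

loader = DataLoader(Wine(X), batch_size=2, shuffle=False)
features, labels = next(iter(loader))
print(features.dtype, labels.dtype)  # torch.float32 torch.float32
```

With this approach neither the Dataset class nor the DataLoader call needs to change.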

If you want to do this at collation level, you can write a custom collate function and pass it to DataLoader as below.

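def collate_fn(batch):
    # batch is a list of (data, label) pairs returned by Dataset.__getitem__
    X, y = [], []
    for data, label in batch:
        X.append(torch.tensor(data))
        y.append(torch.tensor(label))

    # Stack the items into batch tensors, then cast both to float32
    return torch.stack(X).float(), torch.stack(y).float()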

...
train_dataloader = DataLoader(WineDS, batch_size=2, shuffle=False, collate_fn=collate_fn)

Something like this?

import numpy as np
import torch
import torch.nn.functional as F
from torch import nn, optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from torchvision import transforms
np.random.seed(seed=115)
X = np.random.rand(10,3)
class Wine(Dataset):
    def __init__(self,X):
        self.x = X[:, 1:]
        self.y = X[:, 0]
        self.num_samples = X.shape[0]

    def __len__(self):
        return(self.num_samples)

    def __getitem__(self, index):
        x = torch.tensor(self.x[index], dtype=torch.float32)
        y = torch.tensor(self.y[index], dtype=torch.float32)
        return x, y
#        
WineDS = Wine(X)
#
print('size of train ds ',len(WineDS))
train_dataloader = DataLoader(WineDS, batch_size=2, shuffle=False)
#
train_features , train_labels = next(iter(train_dataloader))
print(train_features)
print(train_features.dtype)
print('----')
print(train_labels)
print(train_labels.dtype)

// size of train ds 10
// tensor([[0.7028, 0.4137],
// [0.7279, 0.1905]])
// torch.float32
// ----
// tensor([0.1961, 0.5768])
// torch.float32
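For a dataset this simple, it may also be possible to skip the custom class and use torch.utils.data.TensorDataset, converting the array to float32 tensors once up front. This is only a sketch of that alternative; the names features, labels, dataset, loader, batch_x and batch_y are mine.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

np.random.seed(seed=115)
X = np.random.rand(10, 3)

# Convert once; .float() casts the float64 data to float32
features = torch.from_numpy(X[:, 1:]).float()
labels = torch.from_numpy(X[:, 0]).float()

# TensorDataset pairs up rows of the tensors it is given
dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=2, shuffle=False)

batch_x, batch_y = next(iter(loader))
print(batch_x.dtype, batch_y.dtype)  # torch.float32 torch.float32
```

Compared with converting inside __getitem__, this does the cast a single time instead of once per sample.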