I am training a set of feed-forward (FF) networks on tabular data. The input is a SciPy sparse matrix. Here’s the relevant code:
# training
self.data_tr = TensorDataset(
    torch.tensor(train_csc.toarray(), dtype=torch.float32, device=self.device),
    torch.tensor(train_pd['is_case'].values, dtype=torch.float32, device=self.device),  # labels
)
# validation
self.data_va = TensorDataset(
    torch.tensor(valid_csc.toarray(), dtype=torch.float32, device=self.device),
    torch.tensor(valid_pd['is_case'].values, dtype=torch.float32, device=self.device),  # labels
)
and use it in the training loop:
train_ldr = DataLoader(dataset=self.data_tr, batch_size=param['bs'], shuffle=True)
for X_mb, y_mb in train_ldr:
    yhat_mb = model(X_mb)
    loss = criterion(yhat_mb[:, 0], y_mb)
    ...
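For context, a self-contained toy version of this dense pipeline (random data stands in for train_csc / train_pd, and the device falls back to CPU when no GPU is present):

```python
# Toy stand-in for the dense pipeline: random data replaces train_csc / train_pd.
import numpy as np
import scipy.sparse as sp
import torch
from torch.utils.data import TensorDataset, DataLoader

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
rng = np.random.default_rng(0)

train_csc = sp.random(256, 50, density=0.05, format='csc', random_state=0)  # sparse features
labels = rng.integers(0, 2, size=256).astype(np.float32)                    # binary labels

data_tr = TensorDataset(
    torch.tensor(train_csc.toarray(), dtype=torch.float32, device=device),  # densify once, keep on device
    torch.tensor(labels, dtype=torch.float32, device=device),
)
train_ldr = DataLoader(dataset=data_tr, batch_size=32, shuffle=True)
X_mb, y_mb = next(iter(train_ldr))
print(X_mb.shape, y_mb.shape)  # torch.Size([32, 50]) torch.Size([32])
```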
The dense array is stored on the GPU and sliced as required, so this runs very fast. Unfortunately, a couple of instances are so big that the dense array does not fit in GPU memory under the above approach. For those instances I have the following:
class SparseDataset(Dataset):
    def __init__(self, mat_csc, label, device='cpu'):
        self.dim = mat_csc.shape
        self.device = torch.device(device)
        # CSR gives contiguous row slices, which __getitem__ needs
        csr = mat_csc.tocsr(copy=True)
        self.indptr = torch.tensor(csr.indptr, dtype=torch.int64, device=self.device)
        self.indices = torch.tensor(csr.indices, dtype=torch.int64, device=self.device)
        self.data = torch.tensor(csr.data, dtype=torch.float32, device=self.device)
        self.label = torch.tensor(label, dtype=torch.float32, device=self.device)

    def __len__(self):
        return self.dim[0]

    def __getitem__(self, idx):
        # densify one row: scatter its stored values into a zero vector
        obs = torch.zeros((self.dim[1],), dtype=torch.float32, device=self.device)
        ind1, ind2 = self.indptr[idx], self.indptr[idx + 1]
        obs[self.indices[ind1:ind2]] = self.data[ind1:ind2]
        return obs, self.label[idx]
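As a standalone sanity check (toy matrix, CPU only), the per-row scatter in __getitem__ can be verified to reproduce the dense rows:

```python
# Verify that the CSR indptr/indices/data scatter reconstructs a dense row.
import numpy as np
import scipy.sparse as sp
import torch

mat_csc = sp.random(10, 8, density=0.3, format='csc', random_state=1)
csr = mat_csc.tocsr(copy=True)
indptr = torch.tensor(csr.indptr, dtype=torch.int64)
indices = torch.tensor(csr.indices, dtype=torch.int64)
data = torch.tensor(csr.data, dtype=torch.float32)

idx = 3  # any row index
obs = torch.zeros((mat_csc.shape[1],), dtype=torch.float32)
ind1, ind2 = indptr[idx], indptr[idx + 1]
obs[indices[ind1:ind2]] = data[ind1:ind2]  # scatter the row's stored values

assert np.allclose(obs.numpy(), mat_csc.toarray()[idx])
```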
instantiated as
self.data_tr = SparseDataset(train_csc, train_pd['is_case'].values, device)
self.data_va = SparseDataset(valid_csc, valid_pd['is_case'].values, device)
and used as
train_ldr = DataLoader(dataset=self.data_tr, batch_size=param['bs'], shuffle=True, collate_fn=my_collate)
for X_mb, y_mb in train_ldr:
    yhat_mb = model(X_mb)
    loss = criterion(yhat_mb[:, 0], y_mb)
    ...
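(my_collate isn't shown above; as a minimal sketch, assuming it does nothing beyond default-style stacking of the per-sample tensors:)

```python
# Hypothetical minimal my_collate: just stacks the (obs, label) samples into batch tensors.
import torch

def my_collate(batch):
    # batch is a list of (obs, label) pairs as returned by SparseDataset.__getitem__
    xs, ys = zip(*batch)
    return torch.stack(xs, dim=0), torch.stack(ys, dim=0)
```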
While this is VERY memory efficient, it is about 20 times slower than the first approach, even on my smallest instance (which does fit in GPU memory). I am looking for ideas to make this faster. Thanks.