```python
import torch

# Assumed to come from the torchvision object detection tutorial this snippet is
# taken from: PennFudanDataset, get_transform and get_model_instance_segmentation
# are defined there, and utils is the helper module from references/detection.
import utils


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset into train and test sets
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)
```
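The split in `main()` shuffles the indices once and slices that same permutation for both copies of the dataset, so the training and test subsets never share a sample even though each one wraps a differently transformed dataset. The plain-Python sketch below mirrors that logic without PyTorch, as an illustration only: `split_indices` and the small `Subset` class are hypothetical stand-ins for slicing `torch.randperm(...)` and for `torch.utils.data.Subset`, and the string "views" are toy data rather than PennFudan images.

```python
import random


def split_indices(n_items, n_test, seed=None):
    """Shuffle range(n_items) once and return (train_idx, test_idx).

    Mirrors `indices = torch.randperm(len(dataset)).tolist()` followed by
    `indices[:-n_test]` and `indices[-n_test:]`.
    """
    if not 0 < n_test < n_items:
        raise ValueError("n_test must be between 1 and n_items - 1")
    rng = random.Random(seed)
    indices = rng.sample(range(n_items), n_items)  # a random permutation
    return indices[:-n_test], indices[-n_test:]


class Subset:
    """Tiny stand-in for torch.utils.data.Subset: a view through a list of indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]


# Two "views" of the same underlying samples, playing the roles of dataset and
# dataset_test built with get_transform(train=True) / get_transform(train=False).
samples = [f"img_{k}" for k in range(10)]
train_view = [s + "_augmented" for s in samples]
test_view = list(samples)

train_idx, test_idx = split_indices(len(samples), n_test=3, seed=0)
train_set = Subset(train_view, train_idx)
test_set = Subset(test_view, test_idx)

print(len(train_set), len(test_set))  # 7 3
```

Because both subsets index into the same permutation, the training view can carry augmentation while the test view stays untouched, without any sample leaking across the split.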
Hi emcap,
Thanks for the reply! In my case, I also want to apply the tokenize function to the data. I am not sure that will be possible the way you described.
As I understand it, those are two very different goals.
Splitting up a dataset into train/test/validation/etc.
As alluded to in the earlier post, you can do something like the example from torchvision.
Tokenization
This is an earlier step in the process, in which you create your dataset of tokens. Once you have your dataset of tokens, split it up randomly…
OR
If you want to use the tokenization itself as the means of splitting, just take my example in which I split the array and replace the splitting step with your tokenization strategy.
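Both routes can be sketched in plain Python. This is a minimal sketch under stated assumptions: `tokenize` is a hypothetical lowercase/whitespace tokenizer standing in for your real one, `shuffle_split` is a hypothetical helper (not a PyTorch API), and since the array-splitting example mentioned in the reply is not shown here, `split_by_position` is an assumed stand-in for it.

```python
import random


def tokenize(text):
    """Hypothetical tokenizer (lowercase + whitespace split); swap in your own."""
    return text.lower().split()


def shuffle_split(items, n_test, seed=None):
    """Shuffle a copy of `items` and return (train, test) lists."""
    if not 0 < n_test < len(items):
        raise ValueError("n_test must be between 1 and len(items) - 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:-n_test], shuffled[-n_test:]


# Option 1: build the dataset of tokens first, then split it randomly.
texts = ["The cat sat", "A dog ran", "Birds fly south", "Fish swim", "Frogs jump high"]
tokenized = [tokenize(t) for t in texts]
train_tokens, test_tokens = shuffle_split(tokenized, n_test=2, seed=0)
print(len(train_tokens), len(test_tokens))  # 3 2


# Option 2: make the splitting step pluggable, so the tokenizer can replace an
# array-style split. split_by_position is a hypothetical stand-in for that split.
def split_by_position(text, size=4):
    """Cut the input into fixed-size chunks, ignoring word boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_with(data, split_fn):
    """Run whichever splitting strategy is passed in."""
    return split_fn(data)


by_position = split_with("Birds fly south", split_by_position)
by_tokens = split_with("Birds fly south", tokenize)
print(by_position)  # ['Bird', 's fl', 'y so', 'uth']
print(by_tokens)    # ['birds', 'fly', 'south']
```

Option 1 keeps tokenization and the train/test split as separate steps. Option 2 only changes which function does the cutting, so the surrounding code stays the same whether you split by position or by token.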