I am trying to train a SegFormer model on my local machine (M1 Pro), following the Roboflow tutorial: How To Train SegFormer on a Custom Dataset for Computer Vision - YouTube
I searched for the error on Google, but I have no clue how to solve this problem, which seems to be an ongoing problem/bug.
Error message:
- decode_head.classifier.weight: found shape torch.Size([150, 256, 1, 1]) in the checkpoint and torch.Size([3, 256, 1, 1]) in the model instantiated
- decode_head.classifier.bias: found shape torch.Size([150]) in the checkpoint and torch.Size([3]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
| Name | Type | Params
-----------------------------------------------------------
0 | model | SegformerForSemanticSegmentation | 3.7 M
-----------------------------------------------------------
3.7 M Trainable params
0 Non-trainable params
3.7 M Total params
14.860 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 10 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
Sanity Checking DataLoader 0: 100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.57it/s]invalid value encountered in divide
Traceback (most recent call last):
File "/Users/tonggihkang/test_2.py", line 296, in <module>
trainer.fit(segformer_finetuner)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
self._call_and_handle_interrupt(
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1166, in _run
results = self._run_stage()
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1252, in _run_stage
return self._run_train()
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1274, in _run_train
self._run_sanity_check()
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1343, in _run_sanity_check
val_loop.run()
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 207, in run
output = self.on_run_end()
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 183, in on_run_end
self._evaluation_epoch_end(self._outputs)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 293, in _evaluation_epoch_end
self.trainer._call_lightning_module_hook(hook_name, output_or_outputs)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1550, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/Users/tonggihkang/test_2.py", line 191, in validation_epoch_end
self.log(k,v)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 451, in log
value = apply_to_collection(value, (torch.Tensor, numbers.Number), self.__to_tensor, name)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/utilities/apply_func.py", line 99, in apply_to_collection
return function(data, *args, **kwargs)
File "/Users/tonggihkang/ML/lib/python3.10/site-packages/pytorch_lightning/core/module.py", line 587, in __to_tensor
else torch.tensor(value, device=self.device)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
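One possible workaround, sketched from the traceback and not verified on MPS: the failure happens when `self.log(k,v)` turns a metric value into a tensor on the MPS device. The assumption here is that the metric values are NumPy `float64` scalars, which `torch.tensor` would keep as float64, whereas a built-in Python `float` should end up in PyTorch's default dtype (float32). The helper name `to_loggable` and the variable name `metrics` are hypothetical and not part of the tutorial code.

```python
from typing import Any, Dict


def to_loggable(metrics: Dict[str, Any]) -> Dict[str, float]:
    """Return a copy of `metrics` whose values are built-in Python floats.

    Hypothetical helper: values that cannot be converted to a single
    float (for example per-class arrays or lists) are skipped.
    """
    converted: Dict[str, float] = {}
    for name, value in metrics.items():
        try:
            # float() turns float subclasses such as NumPy float64 scalars
            # into a plain Python float.
            converted[name] = float(value)
        except (TypeError, ValueError, RuntimeError):
            # Not a single scalar, so leave it out of the logged values.
            continue
    return converted


# Assumed usage inside validation_epoch_end, replacing the direct loop:
#     for k, v in to_loggable(metrics).items():
#         self.log(k, v)
```

If the values still arrive as float64, the cause is probably elsewhere and this sketch will not help; the "invalid value encountered in divide" warning also suggests that some metrics may be NaN during the sanity check, which this conversion does not address.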