Train code with PyTorch

Can you please help me solve this error?

python  main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json
Terminal Width: 80
----project SwinIR------
----project SwinIR------
export CUDA_VISIBLE_DEVICES=0
number of GPUs is: 1
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice:
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
LogHandlers setup!
25-01-18 08:01:49.768 :   task: swinir_sr_classical_patch48_x2
  model: plain
  gpu_ids: [0]
  dist: False
  scale: 2
  n_channels: 3
  path:[
    root: superresolution
    pretrained_netG: None
    pretrained_netE: None
    task: superresolution\swinir_sr_classical_patch48_x2
    log: superresolution\swinir_sr_classical_patch48_x2
    options: superresolution\swinir_sr_classical_patch48_x2\options
    models: superresolution\swinir_sr_classical_patch48_x2\models
    images: superresolution\swinir_sr_classical_patch48_x2\images
    pretrained_optimizerG: None
  ]
  datasets:[
    train:[
      name: train_dataset
      dataset_type: sr
      dataroot_H: trainsets/hr
      dataroot_L: trainsets/lr
      H_size: 96
      dataloader_shuffle: True
      dataloader_num_workers: 16
      dataloader_batch_size: 1
      phase: train
      scale: 2
      n_channels: 3
    ]
    test:[
      name: test_dataset
      dataset_type: sr
      dataroot_H: trainsets/lr
      dataroot_L: trainsets/lr
      phase: test
      scale: 2
      n_channels: 3
    ]
  ]
  netG:[
    net_type: swinir
    upscale: 1
    in_chans: 3
    img_size: 1
    window_size: 8
    img_range: 1.0
    depths: [6, 6, 6, 6, 6, 6]
    embed_dim: 180
    num_heads: [6, 6, 6, 6, 6, 6]
    mlp_ratio: 2
    upsampler: pixelshuffle
    resi_connection: 1conv
    init_type: default
    scale: 2
  ]
  train:[
    G_lossfn_type: l1
    G_lossfn_weight: 1.0
    E_decay: 0.999
    G_optimizer_type: adam
    G_optimizer_lr: 0.0001
    G_optimizer_wd: 0
    G_optimizer_clipgrad: None
    G_optimizer_reuse: True
    G_scheduler_type: MultiStepLR
    G_scheduler_milestones: [250000, 400000, 450000, 475000, 500000]
    G_scheduler_gamma: 0.5
    G_regularizer_orthstep: None
    G_regularizer_clipstep: None
    G_param_strict: True
    E_param_strict: True
    checkpoint_test: 5000
    checkpoint_save: 5000
    checkpoint_print: 200
    F_feature_layer: 34
    F_weights: 1.0
    F_lossfn_type: l1
    F_use_input_norm: True
    F_use_range_norm: False
  ]
  opt_path: options/swinir/train_swinir_sr_classical.json
  is_train: True
  merge_bn: False
  merge_bn_startpoint: -1
  find_unused_parameters: True
  num_gpu: 1
  rank: 0
  world_size: 1

Random seed: 9144
Dataset [DatasetSR - train_dataset] is created.
25-01-18 08:01:49.785 : Number of train images: 800, iters: 800
Dataset [DatasetSR - test_dataset] is created.
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ..\aten\src\ATen\native\TensorShape.cpp:3588.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Pass this initialization! Initialization was done during network defination!
Pass this initialization! Initialization was done during network defination!
Training model [ModelPlain] is created.
Copying model for E ...
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\optim\lr_scheduler.py:28: UserWarning: The verbose parameter is deprecated. Please use get_last_lr() to access the learning rate.
  warnings.warn("The verbose parameter is deprecated. Please use get_last_lr() "
Start the main training
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
Terminal Width: 80
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\optim\lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\optim\lr_scheduler.py:156: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\loss.py:101: UserWarning: Using a target size (torch.Size([1, 3, 0, 0])) that is different to the input size (torch.Size([1, 3, 48, 48])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.l1_loss(input, target, reduction=self.reduction)
Traceback (most recent call last):
  File "C:\Users\pc\Desktop\1993\SwinIR-main\main_train_psnr.py", line 288, in <module>
    main()
  File "C:\Users\pc\Desktop\1993\SwinIR-main\main_train_psnr.py", line 231, in main
    model.optimize_parameters(current_step)
  File "C:\Users\pc\Desktop\1993\SwinIR-main\models\model_plain.py", line 170, in optimize_parameters
    G_loss = self.G_lossfn_weight * self.G_lossfn(self.E, self.H)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\modules\loss.py", line 101, in forward
    return F.l1_loss(input, target, reduction=self.reduction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\nn\functional.py", line 3335, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\AppData\Local\Programs\Python\Python311\Lib\site-packages\torch\functional.py", line 76, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The size of tensor a (48) must match the size of tensor b (0) at non-singleton dimension 3
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync C:\Users\pc\Desktop\1993\SwinIR-main\wandb\offline-run-20250118_080149-5k9qasgq
wandb: Find logs at: wandb\offline-run-20250118_080149-5k9qasgq\logs

This is the JSON config file:

{//sync test
  "task": "swinir_sr_classical_patch48_x2"     //  classical image sr for x2/x3/x4/x8. root/task/images-models-options
  , "model": "plain" // "plain" | "plain2" if two inputs
  , "gpu_ids": [0]  //[2,3,5,6]   // this doesn't work!!! please check out the readme on github.
  , "dist": true

  , "scale": 2       // 2 | 3 | 4 | 8
  , "n_channels": 3  // broadcast to "datasets", 1 for grayscale, 3 for color

  , "path": {
    "root": "superresolution"            // "denoising" | "superresolution" | "dejpeg"
    , "pretrained_netG": null      // path of pretrained model. We fine-tune X3/X4/X8 models from X2 model, so that `G_optimizer_lr` and `G_scheduler_milestones` can be halved to save time.
    , "pretrained_netE": null      // path of pretrained model
  }

  , "datasets": {
    "train": {
      "name": "train_dataset"           // just name
      , "dataset_type": "sr"         // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
      , "dataroot_H": "trainsets/hr"  // path of H training dataset. DIV2K (800 training images)
      , "dataroot_L": "trainsets/lr"              // path of L training dataset

      , "H_size": 96                   // 96/144|192/384 | 128/192/256/512. LR patch size is set to 48 or 64 when compared with RCAN or RRDB.

      , "dataloader_shuffle": true
      , "dataloader_num_workers": 16
      , "dataloader_batch_size": 1  //16     // ----------------batch size------------------
      // batch size 1 | 16 | 32 | 48 | 64 | 128. Total batch size =4x8=32 in SwinIR
    }
    , "test": {                         // 這部分沒有用到~
      "name": "test_dataset"            // just name
      , "dataset_type": "sr"         // "dncnn" | "dnpatch" | "fdncnn" | "ffdnet" | "sr" | "srmd" | "dpsr" | "plain" | "plainpatch" | "jpeg"
      , "dataroot_H": "trainsets/lr"  // path of H testing dataset
      , "dataroot_L": "trainsets/lr"    // path of L testing dataset

    }
  }

  , "netG": {
    "net_type": "swinir" 
    , "upscale": 1  //2                      // 2 | 3  | 4 | 8
    , "in_chans": 3 
    , "img_size": 1  // 48                    // For fair comparison, LR patch size is set to 48 or 64 when compared with RCAN or RRDB.
    , "window_size": 8  
    , "img_range": 1.0 
    , "depths": [6, 6, 6, 6, 6, 6] 
    , "embed_dim": 180 
    , "num_heads": [6, 6, 6, 6, 6, 6]
    , "mlp_ratio": 2 
    , "upsampler": "pixelshuffle"        // "pixelshuffle" | "pixelshuffledirect" | "nearest+conv" | null
    , "resi_connection": "1conv"        // "1conv" | "3conv"

    , "init_type": "default"
  }

  , "train": {
    "G_lossfn_type": "l1"               // "l1" preferred | "l2sum" | "l2" | "ssim" | "charbonnier"
    , "G_lossfn_weight": 1.0            // default

    , "E_decay": 0.999                  // Exponential Moving Average for netG: set 0 to disable; default setting 0.999

    , "G_optimizer_type": "adam"        // fixed, adam is enough
    , "G_optimizer_lr": 1e-4            // -----------------learning rate , default: 2e-4-----------------
    , "G_optimizer_wd": 0               // weight decay, default 0
    , "G_optimizer_clipgrad": null      // unused
    , "G_optimizer_reuse": true         // 

    , "G_scheduler_type": "MultiStepLR" // "MultiStepLR" is enough
    , "G_scheduler_milestones": [250000, 400000, 450000, 475000, 500000]
    , "G_scheduler_gamma": 0.5          // ----------------- lr decrease ratio -----------------

    , "G_regularizer_orthstep": null    // unused
    , "G_regularizer_clipstep": null    // unused

    , "G_param_strict": true
    , "E_param_strict": true

    , "checkpoint_test": 5000           // 5000  for testing
    , "checkpoint_save": 5000           // 5000  for saving model
    , "checkpoint_print": 200           // for print
  }
}

The F.l1_loss call fails and raises the error because the shapes of the input and target are not compatible: the network output has shape [1, 3, 48, 48] while the target is empty, [1, 3, 0, 0], as the UserWarning right before the traceback already shows.
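
You can reproduce the failing call in isolation; the shapes below are copied from the UserWarning and the traceback in your log (self.E is the network output, self.H the target passed to the loss):

import torch
import torch.nn.functional as F

# Shapes taken from the warning in the log above
output = torch.randn(1, 3, 48, 48)   # stands in for self.E (network output)
target = torch.empty(1, 3, 0, 0)     # stands in for self.H (target) -- zero height/width!

F.l1_loss(output, target)
# RuntimeError: The size of tensor a (48) must match the size of tensor b (0)
# at non-singleton dimension 3

So the target tensor is already empty before it ever reaches F.l1_loss.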

If the error is only raised when you change the batch size, your code is most likely reshaping or viewing tensors in a wrong way. Could you check whether your model's forward method or the training loop contains any reshape or view operations and post them here?
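
To narrow it down, a small self-contained helper like the sketch below (the function name is just a suggestion) could be dropped into the training code: call it with self.E and self.H right before the G_loss line in models/model_plain.py, or with the batch coming out of your train dataloader in main_train_psnr.py.

import torch

def check_pair(output: torch.Tensor, target: torch.Tensor, where: str = '') -> None:
    # Print both shapes and fail loudly if they do not match,
    # so the mismatch is caught before the loss tries to broadcast the tensors.
    print(f'[{where}] output: {tuple(output.shape)}  target: {tuple(target.shape)}')
    if output.shape != target.shape:
        raise RuntimeError(
            f'[{where}] output/target shape mismatch: '
            f'{tuple(output.shape)} vs {tuple(target.shape)}'
        )

If the target already has zero height and width when it leaves the dataloader, the problem is in the dataset/cropping code rather than in the model's forward method.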