Hello!
I was trying to use an EC2 instance to run textual inversion (for Stable Diffusion).
I ran the following command:
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATA_DIR="cropped_selfies"
accelerate launch textual_inversion.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$DATA_DIR \
--learnable_property="object" \
--placeholder_token="<Lorenzo>" --initializer_token="man" \
--resolution=512 \
--train_batch_size=5 \
--gradient_accumulation_steps=1 \
--max_train_steps=2000 \
--learning_rate=0.01 --scale_lr \
--lr_scheduler="linear" \
--lr_warmup_steps=10 \
--output_dir="embeddings"
NB: I took the textual_inversion.py file from here: diffusers/textual_inversion.py at main · huggingface/diffusers · GitHub
I got the following output:
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py:233: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
FutureWarning,
02/14/2023 18:26:24 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu
Mixed precision type: no
{'variance_type', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
{'conv_in_kernel', 'time_cond_proj_dim', 'conv_out_kernel', 'time_embedding_type', 'num_class_embeds', 'upcast_attention', 'resnet_time_scale_shift', 'timestep_post_act', 'use_linear_projection', 'only_cross_attention', 'mid_block_type', 'dual_cross_attention', 'class_embed_type'} was not found in config. Values will be initialized to default values.
02/14/2023 18:27:00 - INFO - __main__ - ***** Running training *****
02/14/2023 18:27:00 - INFO - __main__ - Num examples = 3400
02/14/2023 18:27:00 - INFO - __main__ - Num Epochs = 3
02/14/2023 18:27:00 - INFO - __main__ - Instantaneous batch size per device = 5
02/14/2023 18:27:00 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 5
02/14/2023 18:27:00 - INFO - __main__ - Gradient Accumulation steps = 1
02/14/2023 18:27:00 - INFO - __main__ - Total optimization steps = 2000
Steps: 0%| | 0/2000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/home/ec2-user/.local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
simple_launcher(args)
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/commands/launch.py", line 552, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'textual_inversion.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=cropped_selfies', '--learnable_property=object', '--placeholder_token=<Lorenzo>', '--initializer_token=man', '--resolution=512', '--train_batch_size=5', '--gradient_accumulation_steps=1', '--max_train_steps=2000', '--learning_rate=0.01', '--scale_lr', '--lr_scheduler=linear', '--lr_warmup_steps=10', '--output_dir=embeddings']' died with <Signals.SIGKILL: 9>.
I have no idea where the problem comes from. Any ideas?
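This is not a confirmed diagnosis, but two things in the log stand out: `Device: cpu`, and the training process dying with SIGKILL without any Python exception of its own. On Linux, a SIGKILL like that often comes from the kernel's out-of-memory killer, so a reasonable first check is how much RAM the instance has and whether PyTorch sees a GPU at all. Below is a minimal sketch for that check, assuming a Linux instance; `parse_meminfo` and `report` are hypothetical helper names, not part of diffusers or accelerate.

```python
import importlib.util
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that is not "Name: <integer> [unit]".
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def report():
    # Host RAM: /proc/meminfo only exists on Linux.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        for key in ("MemTotal", "MemAvailable", "SwapTotal"):
            if key in mem:
                print(f"{key}: {mem[key] / 1024 ** 2:.1f} GiB")
    else:
        print("/proc/meminfo not found (not a Linux host?)")

    # GPU visibility: only checked if PyTorch is importable in this interpreter.
    if importlib.util.find_spec("torch") is not None:
        import torch
        print("CUDA available:", torch.cuda.is_available())
    else:
        print("torch is not installed in this interpreter")


if __name__ == "__main__":
    report()
```

If this reports little available memory and `CUDA available: False`, the kill is probably memory-related, and the kernel log on the instance would be the place to confirm it. That remains an assumption until checked.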