I’m attempting to fine-tune a Llama-3-8B model with Torchtune, using a custom dataset in Alpaca-style JSON format. However, when I launch the fine-tuning run with the command from the Torchtune documentation, it fails with a TypeError.
Error message: TypeError: JsonConfig.__init__() got an unexpected keyword argument 'tokenizer'
Example Dataset Structure (df_train.json)
{
  "instruction": "Read the input and generate the summary.",
  "input": "Here is the details of my passage.",
  "output": "summarized passage.",
  "text": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nRead the input and generate the summary.\n\n### Input:\nHere is the details of my passage.\n\n### Response:\nsummarized passage."
}
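For reference, here is a minimal sketch of how I would expect the file to load with the Hugging Face datasets library directly, outside of Torchtune (same path as in the config below):

from datasets import load_dataset

# Load the raw JSON with the datasets library alone, bypassing Torchtune,
# to rule out problems with the file itself.
ds = load_dataset("json", data_files="/dbfs/FileStore/amtbds/df_train.json")
print(ds["train"][0])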
Modified Config File (8b_lora.yaml)
dataset:
  _component_: datasets.load_dataset
  path: json
  data_files: /dbfs/FileStore/amtbds/df_train.json
Run command for fine-tuning
!tune run --nproc_per_node 2 lora_finetune_distributed --config ./8b_lora.yaml
I’ve checked the Torchtune and datasets documentation, but couldn’t find anything specific about wiring a custom JSON dataset to a tokenizer parameter in this context.
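From the traceback, the recipe calls config.instantiate(cfg_dataset, tokenizer=self._tokenizer), so whatever _component_ points at has to accept a tokenizer keyword argument. datasets.load_dataset does not; it forwards the unknown kwarg to JsonConfig, which raises the TypeError. My guess is that _component_ should instead be a Torchtune dataset builder such as torchtune.datasets.instruct_dataset. Here is a sketch of what I believe the equivalent Python call would look like (the template name and keyword arguments are my assumptions based on the instruct_dataset docs and may differ between versions):

from torchtune.datasets import instruct_dataset
from torchtune.models.llama3 import llama3_tokenizer

# Hypothetical equivalent of what the recipe would instantiate from the YAML.
tokenizer = llama3_tokenizer("/tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model")
ds = instruct_dataset(
    tokenizer=tokenizer,                                # injected by the recipe at setup time
    source="json",                                      # forwarded to datasets.load_dataset
    template="AlpacaInstructTemplate",                  # prompt template; name assumed
    data_files="/dbfs/FileStore/amtbds/df_train.json",  # extra load_dataset kwarg
)

If that is right, the dataset section of the YAML would presumably point _component_ at torchtune.datasets.instruct_dataset with source: json, template: AlpacaInstructTemplate, and the same data_files, but I haven’t been able to confirm this in the docs.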
Question: Does anyone have insights on how to correctly configure the YAML file or modify the command to avoid this error?
Any help or pointers towards relevant documentation would be greatly appreciated!
Here is the complete error log for reference:
/databricks/python/lib/python3.11/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Running with torchrun...
[2024-05-07 15:18:32,618] torch.distributed.run: [WARNING]
[2024-05-07 15:18:32,618] torch.distributed.run: [WARNING] *****************************************
[2024-05-07 15:18:32,618] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-05-07 15:18:32,618] torch.distributed.run: [WARNING] *****************************************
INFO:torchtune.utils.logging:Running LoRAFinetuneRecipeDistributed with resolved config:

batch_size: 2
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3-8B-Instruct/original/
  checkpoint_files:
  - consolidated.00.pth
  model_type: LLAMA3
  output_dir: /tmp/Meta-Llama-3-8B-Instruct/
  recipe_checkpoint: null
dataset:
  _component_: datasets.load_dataset
  data_files: /dbfs/FileStore/amtbds/df_train.json
  path: json
device: cuda
dtype: bf16
enable_activation_checkpointing: false
epochs: 2
gradient_accumulation_steps: 32
log_every_n_steps: 1
log_peak_memory_stats: false
loss:
  _component_: torch.nn.CrossEntropyLoss
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100
max_steps_per_epoch: null
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: /tmp/lora_finetune_output
model:
  _component_: torchtune.models.llama3.lora_llama3_8b
  apply_lora_to_mlp: false
  apply_lora_to_output: false
  lora_alpha: 16
  lora_attn_modules:
  - q_proj
  - v_proj
  lora_rank: 8
optimizer:
  _component_: torch.optim.AdamW
  lr: 0.0003
  weight_decay: 0.01
output_dir: /tmp/lora_finetune_output
resume_from_checkpoint: false
seed: null
shuffle: true
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3-8B-Instruct/original/tokenizer.model

DEBUG:torchtune.utils.logging:Setting manual seed to local seed 3594840783. Local seed is seed + rank = 3594840783 + 0
DEBUG:torchtune.utils.logging:Setting manual seed to local seed 3594840784. Local seed is seed + rank = 3594840783 + 1
Writing logs to /tmp/lora_finetune_output/log_1715095116.txt
INFO:torchtune.utils.logging:FSDP is enabled. Instantiating Model on CPU for Rank 0 ...
INFO:torchtune.utils.logging:Model instantiation took 21.42 secs
INFO:torchtune.utils.logging:Memory Stats after model init:
{'peak_memory_active': 13.327446016, 'peak_memory_alloc': 12.276764672, 'peak_memory_reserved': 14.600372224}
INFO:torchtune.utils.logging:Optimizer and loss are initialized.
Both worker processes fail with the same traceback (shown once here):

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/recipes/lora_finetune_distributed.py", line 615, in <module>
    sys.exit(recipe_main())
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/config/_parse.py", line 50, in wrapper
    sys.exit(recipe_main(conf))
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/recipes/lora_finetune_distributed.py", line 609, in recipe_main
    recipe.setup(cfg=cfg)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/recipes/lora_finetune_distributed.py", line 228, in setup
    self._sampler, self._dataloader = self._setup_data(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/recipes/lora_finetune_distributed.py", line 411, in _setup_data
    ds = config.instantiate(cfg_dataset, tokenizer=self._tokenizer)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/config/_instantiate.py", line 106, in instantiate
    return _instantiate_node(config, *args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/config/_instantiate.py", line 31, in _instantiate_node
    return _create_component(_component_, args, kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/config/_instantiate.py", line 20, in _create_component
    return _component_(*args, **kwargs)
  File "/databricks/python/lib/python3.11/site-packages/datasets/load.py", line 2523, in load_dataset
    builder_instance = load_dataset_builder(
  File "/databricks/python/lib/python3.11/site-packages/datasets/load.py", line 2232, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/databricks/python/lib/python3.11/site-packages/datasets/builder.py", line 371, in __init__
    self.config, self.config_id = self._create_builder_config(
  File "/databricks/python/lib/python3.11/site-packages/datasets/builder.py", line 605, in _create_builder_config
    builder_config = self.BUILDER_CONFIG_CLASS(**config_kwargs)
TypeError: JsonConfig.__init__() got an unexpected keyword argument 'tokenizer'
[2024-05-07 15:19:07,624] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 22930) of binary: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/bin/python
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/bin/tune", line 8, in <module>
    sys.exit(main())
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/_cli/tune.py", line 49, in main
    parser.run(args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/_cli/tune.py", line 43, in run
    args.func(args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/_cli/run.py", line 177, in _run_cmd
    self._run_distributed(args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torchtune/_cli/run.py", line 88, in _run_distributed
    run(args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/local_disk0/.ephemeral_nfs/envs/pythonEnv-f74a28a0-cb29-4f7a-8257-73f238be3d57/lib/python3.11/site-packages/recipes/lora_finetune_distributed.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-05-07_15:19:07
  host      : 0403-193017-nbhuvffe-10-213-69-14
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 22931)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-07_15:19:07
  host      : 0403-193017-nbhuvffe-10-213-69-14
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 22930)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================