Train

Here’s an example of training an LMM using Phi-2.

  • Replace data paths with yours in scripts/train/train_phi.sh

  • Replace output_dir with yours in scripts/train/pretrain.sh

  • Replace pretrained_model_path and output_dir with yours in scripts/train/finetune.sh

  • Adjust your GPU ids (localhost) and per_device_train_batch_size in scripts/train/pretrain.sh and scripts/train/finetune.sh

bash scripts/train/train_phi.sh

Important hyperparameters used in pretraining and finetuning are provided below.

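| Training Stage | Global Batch Size | Learning Rate | conv_version |
|----------------|-------------------|---------------|--------------|
| Pretraining    | 256               | 1e-3          | pretrain     |
| Finetuning     | 128               | 2e-5          | phi          |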

Tips:

  • Global Batch Size = number of GPUs * per_device_train_batch_size * gradient_accumulation_steps. We recommend always keeping the global batch size and learning rate as above, except when LoRA-tuning your model.

  • conv_version is a hyperparameter that selects the chat template for a given LLM. In the pretraining stage, conv_version is pretrain for all LLMs. In the finetuning stage, we use:

    • phi for Phi-2, StableLM, Qwen-1.5

    • llama for TinyLlama, OpenELM

    • gemma for Gemma
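
The two tips above can be sketched as small shell helpers. This is only an illustration, not part of the training scripts: the function names (global_batch_size, grad_accum_steps, conv_version_for) are hypothetical, and the GPU counts and per-device batch sizes in the usage lines are assumed values chosen to reproduce the global batch sizes in the table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these names come from the repository.

# Global batch size = GPUs * per_device_train_batch_size * gradient_accumulation_steps
global_batch_size() {
  local num_gpus=$1 per_device=$2 grad_accum=$3
  echo $(( num_gpus * per_device * grad_accum ))
}

# Solve the same formula for gradient_accumulation_steps, given a target
# global batch size. Fails if the target cannot be reached exactly.
grad_accum_steps() {
  local target=$1 num_gpus=$2 per_device=$3
  local per_step=$(( num_gpus * per_device ))
  if (( per_step <= 0 || target % per_step != 0 )); then
    echo "error: $target is not a multiple of $num_gpus * $per_device" >&2
    return 1
  fi
  echo $(( target / per_step ))
}

# Map a training stage and LLM name to the conv_version listed in the tips.
conv_version_for() {
  local stage=$1 llm
  llm=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [ "$stage" = "pretrain" ]; then
    echo "pretrain"   # same for all LLMs in the pretraining stage
    return 0
  fi
  case "$llm" in
    phi-2|stablelm|qwen-1.5) echo "phi" ;;
    tinyllama|openelm)       echo "llama" ;;
    gemma)                   echo "gemma" ;;
    *) echo "error: no conv_version listed for '$2'" >&2; return 1 ;;
  esac
}

# Assumed hardware: 8 GPUs with 32 samples each, no accumulation.
global_batch_size 8 32 1          # prints 256 (pretraining target)
# Assumed hardware: 4 GPUs with 8 samples each.
grad_accum_steps 128 4 8          # prints 4 (finetuning target)
conv_version_for finetune Phi-2   # prints phi
```

If you have fewer GPUs or less memory, lowering per_device_train_batch_size and raising gradient_accumulation_steps by the same factor keeps the global batch size unchanged.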