Train
Here’s an example of training an LMM using Phi-2.
1. Replace the data paths with yours in `scripts/train/train_phi.sh`.
2. Replace `output_dir` with yours in `scripts/train/pretrain.sh`.
3. Replace `pretrained_model_path` and `output_dir` with yours in `scripts/train/finetune.sh`.
4. Adjust your GPU ids (localhost) and `per_device_train_batch_size` in `scripts/train/pretrain.sh` and `scripts/train/finetune.sh`.

Then start training with `bash scripts/train/train_phi.sh`.
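Because the global batch size depends on how many GPU ids you list, it can help to count them before launching. This is a minimal sketch that assumes the scripts select GPUs with a DeepSpeed-style string such as `localhost:0,1,2,3`; that format is an assumption, so check it against your copy of `pretrain.sh` and `finetune.sh`. The `count_gpus` helper is hypothetical and not part of the repo.

```shell
#!/bin/sh
# Hypothetical helper: count the GPUs in a DeepSpeed-style host string such as
# "localhost:0,1,2,3". The string format is an assumption about the training scripts.
count_gpus() {
    ids=${1#localhost:}    # strip the "localhost:" prefix -> "0,1,2,3"
    # Put one id per line, then count the non-empty lines.
    printf '%s\n' "$ids" | tr ',' '\n' | grep -c .
}

count_gpus "localhost:0,1,2,3"   # prints 4
```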
Important hyperparameters used in pretraining and finetuning are provided below.
| Training Stage | Global Batch Size | Learning rate | conv_version |
|---|---|---|---|
| Pretraining | 256 | 1e-3 | pretrain |
| Finetuning | 128 | 2e-5 | phi |
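To keep the global batch sizes in the table fixed on different hardware, `gradient_accumulation_steps` can be derived from the other two factors, since global batch size = number of GPUs × `per_device_train_batch_size` × `gradient_accumulation_steps`. A minimal sketch, with a hypothetical helper name `grad_accum_steps`, that refuses combinations that cannot reach the target exactly:

```shell
#!/bin/sh
# Hypothetical helper: derive gradient_accumulation_steps so that
#   num_gpus * per_device_train_batch_size * gradient_accumulation_steps
# equals the target global batch size (256 for pretraining, 128 for finetuning).
grad_accum_steps() {
    target=$1; num_gpus=$2; per_device=$3
    per_step=$((num_gpus * per_device))   # samples per forward/backward pass across all GPUs
    if [ $((target % per_step)) -ne 0 ]; then
        echo "global batch $target is not divisible by $num_gpus x $per_device" >&2
        return 1
    fi
    echo $((target / per_step))
}

grad_accum_steps 256 8 32   # pretraining on 8 GPUs: prints 1
grad_accum_steps 128 4 4    # finetuning on 4 GPUs: prints 8
```

If the helper reports a divisibility error, adjust `per_device_train_batch_size` or the GPU list rather than the global batch size.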
Tips:

- Global Batch Size = number of GPUs × `per_device_train_batch_size` × `gradient_accumulation_steps`. We recommend always keeping the global batch size and learning rate as above, except when LoRA-tuning your model.
- `conv_version` is a hyperparameter for choosing the chat template that matches each LLM. In the pretraining stage, `conv_version` is the same for all LLMs: `pretrain`. In the finetuning stage, we use:
  - `phi` for Phi-2, StableLM, and Qwen-1.5
  - `llama` for TinyLlama and OpenELM
  - `gemma` for Gemma
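The finetuning-stage mapping from LLM to `conv_version` can be written as a small lookup. This is only an illustrative sketch: the helper `conv_version_for` and the lowercase family labels it accepts are hypothetical, not identifiers used by the repo; the returned values follow the mapping in the tips.

```shell
#!/bin/sh
# Hypothetical helper: choose the finetuning conv_version from an LLM family label.
# The labels are illustrative; the returned values follow the mapping in the tips.
conv_version_for() {
    case "$1" in
        phi-2|stablelm|qwen-1.5) echo phi ;;
        tinyllama|openelm)       echo llama ;;
        gemma)                   echo gemma ;;
        *) echo "unknown LLM family: $1" >&2; return 1 ;;
    esac
}

# Pretraining always uses conv_version=pretrain, whatever the LLM.
conv_version_for stablelm   # prints phi
conv_version_for openelm    # prints llama
```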