LLaMA Factory concepts
WebUI, training parameters, tuning algorithms, distributed training, quantization, and inference — the LLaMA Factory cheat-sheet
Before diving into the single- and multi-node experiments, it helps to be familiar with these core concepts.
WebUI
LLaMA Factory exposes a zero-code WebUI for fine-tuning. Run llamafactory-cli webui to launch it. The UI has four panels: Train / Eval & Predict / Chat / Export.
- Train — configure model path, training stage, tuning method, dataset, learning rate, epochs, output dir
- Eval & Predict — set data path, truncation length, top-p, temperature, output dir
- Chat — pick inference engine and dtype, then chat with the model interactively
- Export — set max shard size, quantization level, export device and target dir, then click Export
Data handling
LLaMA Factory supports the Alpaca and ShareGPT dataset formats. To use a custom dataset, you must register it in dataset_info.json, which holds the definitions of all preprocessed local datasets and online datasets.
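As an illustration of the registration step, here is a minimal sketch that writes one Alpaca-style record and adds a matching entry to dataset_info.json. The dataset name `my_alpaca_demo`, the file name, and the temporary directory are hypothetical, and the entry shows only a `file_name` key, which assumes the data file uses the default Alpaca column names (`instruction`, `input`, `output`).

```python
import json
import tempfile
from pathlib import Path

# One Alpaca-style record: an instruction, an optional input, and the expected output.
alpaca_records = [
    {
        "instruction": "Translate the sentence to French.",
        "input": "Good morning.",
        "output": "Bonjour.",
    }
]

# Registry entry that maps a dataset name to its data file.
# "my_alpaca_demo" and "my_alpaca_demo.json" are hypothetical names.
dataset_info_entry = {"my_alpaca_demo": {"file_name": "my_alpaca_demo.json"}}


def register_dataset(data_dir: Path) -> Path:
    """Write the data file and merge the entry into dataset_info.json."""
    data_dir.mkdir(parents=True, exist_ok=True)
    data_path = data_dir / "my_alpaca_demo.json"
    data_path.write_text(
        json.dumps(alpaca_records, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    info_path = data_dir / "dataset_info.json"
    # Keep any entries that are already registered.
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info.update(dataset_info_entry)
    info_path.write_text(
        json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return info_path


# Demo run in a throwaway directory so nothing in a real checkout is touched.
demo_dir = Path(tempfile.mkdtemp()) / "data"
info_path = register_dataset(demo_dir)
registered = json.loads(info_path.read_text(encoding="utf-8"))
print(registered)
```

Once registered, the name (here `my_alpaca_demo`) is what you would pass as the dataset value in a training config.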
Training methods
Pre-training and post-training are both supported. Post-training techniques include:
- Supervised Fine-Tuning (SFT)
- RLHF (Reinforcement Learning from Human Feedback)
- DPO (Direct Preference Optimization)
- KTO (Kahneman–Tversky Optimization)
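To make the preference-based methods above more concrete, the following is a minimal sketch of the DPO objective for a single preference pair. It is a standalone illustration of the published loss, not LLaMA Factory's own code; the function name and the example log-probabilities are made up for the demo.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, given summed sequence log-probs."""
    # Implicit rewards: how far the policy has moved from the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is 0 and the loss is log(2).
loss_neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy raises the chosen answer and lowers the rejected one: the loss drops.
loss_better = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(loss_neutral, loss_better)
```

The loss only depends on log-probability differences against the reference model, which is why DPO needs no separate reward model or sampling loop.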
Training parameters
| Name | Description |
|---|---|
| model_name_or_path | Model name or path |
| stage | Training stage: rm / pt / sft / ppo / dpo / kto / orpo |
| do_train | true to train, false to evaluate |
| finetuning_type | Tuning method: freeze / lora / full |
| lora_target | LoRA target modules (default: all) |
| dataset | Dataset name(s), comma-separated for multiple |
| template | Dataset template, which must match the model |
| output_dir | Output path |
| logging_steps | Logging interval, in steps |
| save_steps | Checkpoint save interval, in steps |
| overwrite_output_dir | Allow overwriting the output directory |
| per_device_train_batch_size | Per-device training batch size |
| gradient_accumulation_steps | Gradient accumulation steps |
| max_grad_norm | Gradient clipping threshold |
| learning_rate | Learning rate |
| lr_scheduler_type | LR schedule: linear / cosine / polynomial / constant |
| num_train_epochs | Number of epochs |
| bf16 | Whether to train in bf16 precision |
| warmup_ratio | LR warm-up ratio |
| warmup_steps | LR warm-up steps |
| push_to_hub | Push the model to Hugging Face Hub |
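Several of these parameters interact. The sketch below shows the usual arithmetic linking per_device_train_batch_size, gradient_accumulation_steps, num_train_epochs, and the warm-up settings. The helper names and the example numbers are hypothetical. The step count is an approximation, and the rule that an explicit warmup_steps takes priority over warmup_ratio is an assumption based on Hugging Face Trainer behavior.

```python
import math


def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_devices):
    """Samples consumed per optimizer step across all devices."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


def optimizer_steps(num_samples, num_train_epochs, global_batch):
    """Approximate number of optimizer steps for the whole run."""
    return math.ceil(num_samples / global_batch) * num_train_epochs


def warmup_step_count(total_steps, warmup_ratio=0.0, warmup_steps=0):
    """Assumption: explicit warmup_steps wins, otherwise the ratio is applied."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(total_steps * warmup_ratio)


# Hypothetical run: 4 GPUs, batch 2 per device, 8 accumulation steps, 10,000 samples.
global_batch = effective_batch_size(2, 8, 4)
total_steps = optimizer_steps(10_000, 3, global_batch)
warmup = warmup_step_count(total_steps, warmup_ratio=0.1)

print(global_batch, total_steps, warmup)
```

When you change the number of GPUs, adjusting gradient_accumulation_steps so that the effective batch size stays constant keeps runs comparable.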
Training acceleration
Supported acceleration techniques: FlashAttention, Unsloth, Liger Kernel. Toggle them via the training config.
Tuning algorithms
- Full Parameter Fine-tuning
- Freeze
- LoRA
- GaLore
- BAdam
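To give a feel for why LoRA is so much cheaper than full-parameter fine-tuning, here is a rough sketch that counts trainable parameters for a single linear layer. It assumes the standard LoRA formulation (two low-rank factors per adapted weight matrix); the 4096 x 4096 layer size and rank 8 are illustrative values, not LLaMA Factory defaults.

```python
def full_trainable_params(d_in, d_out):
    """Full fine-tuning updates every entry of the d_out x d_in weight matrix."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """LoRA trains A (rank x d_in) and B (d_out x rank) and freezes the base weight."""
    return rank * (d_in + d_out)


# Illustrative layer size and rank.
full = full_trainable_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, rank=8)

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```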
Distributed training
Single-node multi-GPU and multi-node multi-GPU distributed training are supported with three engines:
- DDP (PyTorch DistributedDataParallel)
- DeepSpeed (ZeRO stages)
- FSDP (Fully Sharded Data Parallel)
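The engines differ mainly in how much training state each GPU has to hold. The following back-of-the-envelope sketch uses the commonly cited accounting for mixed-precision Adam (2 bytes per parameter for weights, 2 for gradients, 12 for optimizer states) to estimate per-GPU memory under each ZeRO stage. It ignores activations, buffers, and fragmentation, and the 7B-parameter, 8-GPU setup is hypothetical. Stage 0 corresponds to plain data parallelism, where every GPU keeps a full replica.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimate model-state bytes per GPU for ZeRO stages 0 to 3."""
    weights = 2 * num_params   # half-precision weights
    grads = 2 * num_params     # half-precision gradients
    optim = 12 * num_params    # fp32 master weights plus Adam momentum and variance
    if stage >= 1:
        optim /= num_gpus      # stage 1 shards optimizer states
    if stage >= 2:
        grads /= num_gpus      # stage 2 also shards gradients
    if stage >= 3:
        weights /= num_gpus    # stage 3 also shards the weights
    return weights + grads + optim


# Hypothetical setup: 7B parameters on 8 GPUs.
estimates = {stage: zero_bytes_per_gpu(7e9, 8, stage) for stage in range(4)}
for stage, num_bytes in estimates.items():
    print(f"ZeRO stage {stage}: {num_bytes / 1e9:.2f} GB per GPU")
```

Higher stages trade extra communication for lower memory, so the lowest stage that fits is often a reasonable starting point.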
Merge
After training a LoRA adapter on top of a pre-trained model, you can merge the base model and the adapter and export them as a single standalone model, so inference no longer has to load the two pieces separately. The export can optionally quantize the merged model.
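Conceptually, merging folds the low-rank update into the base weight. The toy sketch below assumes the standard LoRA formulation, W' = W + (alpha / r) * B * A, and checks that the merged weight gives the same output as running the base weight and the adapter side by side. The tiny matrices and helper names are invented for illustration; this is not LLaMA Factory's export code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: 2 x 3 base weight with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -1.0, 0.0]]   # rank x d_in
B = [[2.0], [1.0]]       # d_out x rank
alpha, rank = 2.0, 1
x = [[1.0], [2.0], [3.0]]  # input as a column vector

merged = merge_lora(W, A, B, alpha, rank)
merged_out = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
two_path_out = [
    [b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)
]

print(merged_out, two_path_out)
```

Because the two paths agree, the merged model needs only one matrix multiply per adapted layer at inference time.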
Quantization
Quantization lowers the numerical precision of model weights to reduce VRAM usage and accelerate inference. Supported methods:
- AQLM
- AWQ
- GPTQ
- QLoRA
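The methods listed above are considerably more sophisticated, but the core idea can be shown with plain round-to-nearest quantization. The sketch below is a simplified absmax scheme, explicitly not AWQ, GPTQ, AQLM, or QLoRA, together with a rough estimate of weight memory at different bit widths. The sample weights and the 7B parameter count are illustrative.

```python
def quantize_absmax(values, bits=8):
    """Map floats to signed integer codes using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def weight_memory_gib(num_params, bits):
    """Memory for the weights alone, ignoring scales and other metadata."""
    return num_params * bits / 8 / 2 ** 30


weights = [0.12, -0.53, 0.98, -1.27, 0.004]
codes, scale = quantize_absmax(weights, bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(codes, max_error)
print(f"7B weights: {weight_memory_gib(7e9, 16):.1f} GiB at 16-bit, "
      f"{weight_memory_gib(7e9, 4):.1f} GiB at 4-bit")
```

The rounding error per weight is bounded by half the scale, which is why methods such as GPTQ and AWQ put their effort into choosing scales and groupings that keep this error from hurting model quality.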
Inference
Supported inference modes:
- Original-model inference
- Fine-tuned-model inference
- Multimodal-model inference
- Batch inference
General-capability evaluation
After training, evaluate model effectiveness with:
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
NLG evaluation
Get BLEU / ROUGE scores for generation quality:
llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml
Experiment tracking
LLaMA Factory integrates with several training-visualization tools:
- TensorBoard
- Wandb
- MLflow
- SwanLab (used in the experiments later in this section)
Toggle these in the WebUI under Other parameter settings → Enable external logging panel.
License: please respect LLaMA Factory's licensing terms — see LLaMA-Factory Apache-2.0 license.
