LLaMA Factory concepts

WebUI, training parameters, tuning algorithms, distributed training, quantization, and inference — the LLaMA Factory cheat-sheet

Before diving into the single- and multi-node experiments, it helps to be familiar with these core concepts.

WebUI

LLaMA Factory exposes a zero-code WebUI for fine-tuning. Run `llamafactory-cli webui` to launch it. The UI has four panels: Train, Eval & Predict, Chat, and Export.

Train — configure model path, training stage, tuning method, dataset, learning rate, epochs, output dir
Eval & Predict — set data path, truncation length, top-p, temperature, output dir
Chat — pick inference engine and dtype, then chat with the model interactively
Export — set max shard size, quantization level, export device and target dir, then click Export

LLaMA Factory supports the Alpaca and ShareGPT dataset formats. To use a custom dataset, register it in dataset_info.json, the file that lists every available dataset, both preprocessed local files and online datasets.
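
As a rough illustration of the registration step, the Python sketch below writes a tiny Alpaca-style file and a matching dataset_info.json entry into a temporary directory. The dataset name my_dataset, the file name my_dataset.json, and the sample record are hypothetical. The column mapping follows the usual Alpaca convention (instruction / input / output) and is an assumption to check against the dataset_info.json shipped with your LLaMA Factory version.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample in Alpaca format: instruction, optional input, output.
samples = [
    {
        "instruction": "Translate to French.",
        "input": "Good morning",
        "output": "Bonjour",
    }
]

# Registration entry (assumed layout): dataset name -> file and column mapping.
dataset_info = {
    "my_dataset": {
        "file_name": "my_dataset.json",
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
}

data_dir = Path(tempfile.mkdtemp())  # stand-in for the real data directory
(data_dir / "my_dataset.json").write_text(
    json.dumps(samples, ensure_ascii=False, indent=2), encoding="utf-8"
)
(data_dir / "dataset_info.json").write_text(
    json.dumps(dataset_info, indent=2), encoding="utf-8"
)

# Sanity check: every registered file exists and contains the mapped columns.
registered = json.loads((data_dir / "dataset_info.json").read_text(encoding="utf-8"))
for name, entry in registered.items():
    rows = json.loads((data_dir / entry["file_name"]).read_text(encoding="utf-8"))
    missing = [col for col in entry["columns"].values() if col not in rows[0]]
    print(name, "rows:", len(rows), "missing columns:", missing)
```

In a real checkout you would add the entry to the existing dataset_info.json rather than overwrite the file, then refer to the dataset by its registered name.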

Training methods

Pre-training and post-training are both supported. Post-training techniques include:

Supervised Fine-Tuning (SFT)
RLHF (Reinforcement Learning from Human Feedback)
DPO (Direct Preference Optimization)
KTO (Kahneman–Tversky Optimization)

Training parameters

Name	Description
`model_name_or_path`	Model name or path
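`stage`	Training stage: `pt` / `sft` / `rm` / `ppo` / `dpo` / `kto` (values are lowercase in config files; depending on the release, ORPO is either its own `orpo` stage or a preference-loss option of `dpo`)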
`do_train`	`true` to train, `false` to evaluate
`finetuning_type`	Tuning method: `freeze` / `lora` / `full`
`lora_target`	LoRA target modules — default `all`
`dataset`	Dataset(s) — comma-separated for multiple
`template`	Prompt template used to format the dataset; must match the model
`output_dir`	Output path
`logging_steps`	Log emission interval
`save_steps`	Checkpoint save interval
`overwrite_output_dir`	Allow overwriting the output directory
`per_device_train_batch_size`	Per-device training batch size
`gradient_accumulation_steps`	Gradient accumulation steps
`max_grad_norm`	Gradient clipping threshold
`learning_rate`	Learning rate
`lr_scheduler_type`	LR schedule: `linear` / `cosine` / `polynomial` / `constant`
`num_train_epochs`	Number of epochs
`bf16`	Whether to train in bfloat16 (bf16) precision
`warmup_ratio`	LR warm-up ratio
`warmup_steps`	LR warm-up steps
`push_to_hub`	Push the model to Hugging Face Hub
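
To make the interaction between these parameters concrete, here is a minimal Python sketch that assembles a flat config from the names in the table, renders it as YAML-style text, and estimates the effective batch size and warm-up step count. All values are illustrative, and the paths and dataset name are hypothetical. The warm-up rule (an explicit `warmup_steps` taking precedence over `warmup_ratio`) mirrors how the Hugging Face Trainer is generally understood to behave and is an assumption here, as is the simplified steps-per-epoch estimate.

```python
import math

# Illustrative values only; keys are the parameter names from the table above.
config = {
    "model_name_or_path": "path/to/base-model",  # placeholder path
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "lora_target": "all",
    "dataset": "my_dataset",        # hypothetical registered dataset
    "template": "llama3",           # example value; must match the model
    "output_dir": "saves/example",  # hypothetical output path
    "logging_steps": 10,
    "save_steps": 500,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "max_grad_norm": 1.0,
    "learning_rate": 0.0001,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 3,
    "bf16": True,
    "warmup_ratio": 0.1,
}


def effective_batch_size(cfg, num_devices):
    """Samples consumed per optimizer step across all devices."""
    return (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )


def estimated_warmup_steps(cfg, num_samples, num_devices):
    """Approximate warm-up steps; an explicit warmup_steps wins over the ratio."""
    steps_per_epoch = math.ceil(num_samples / effective_batch_size(cfg, num_devices))
    total_steps = steps_per_epoch * cfg["num_train_epochs"]
    explicit = cfg.get("warmup_steps", 0)
    return explicit if explicit > 0 else math.ceil(total_steps * cfg["warmup_ratio"])


def to_yaml(cfg):
    """Render a flat dict as YAML-style 'key: value' lines (booleans lowercased)."""
    lines = []
    for key, value in cfg.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


print(to_yaml(config))
print("effective batch size:", effective_batch_size(config, num_devices=4))
print("estimated warm-up steps:",
      estimated_warmup_steps(config, num_samples=10_000, num_devices=4))
```

Doubling `gradient_accumulation_steps` while halving `per_device_train_batch_size` keeps the effective batch size unchanged, which is the usual way to fit the same training recipe into less VRAM.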

Training acceleration

Supported acceleration techniques: FlashAttention, Unsloth, Liger Kernel. Toggle them via the training config.

Tuning algorithms

Full Parameter Fine-tuning
Freeze
LoRA
GaLore
BAdam
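
To give a feel for why LoRA is so much cheaper than full-parameter fine-tuning, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard LoRA formulation, in which a d_out x d_in weight is frozen and two low-rank factors B (d_out x r) and A (r x d_in) are trained instead. The 4096 x 4096 size and the ranks are illustrative, not taken from any particular model.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters in the low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 4096  # illustrative projection size
for rank in (8, 16, 64):
    lora = lora_trainable_params(d_out, d_in, rank)
    full = full_params(d_out, d_in)
    print(f"rank={rank:>2}: {lora:,} trainable vs {full:,} full "
          f"({100 * lora / full:.2f}%)")
```

Freeze tuning sits between the two extremes: it updates all parameters of a chosen subset of layers and leaves the rest untouched.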

Distributed training

Single-node multi-GPU and multi-node multi-GPU distributed training are supported with three engines:

DDP (PyTorch DistributedDataParallel)
DeepSpeed (ZeRO stages)
FSDP (Fully Sharded Data Parallel)
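
The practical difference between these engines is how much model state each GPU has to hold. The sketch below is a back-of-the-envelope estimate, assuming mixed-precision Adam training at 16 bytes per parameter (2 for fp16 weights, 2 for gradients, 12 for optimizer state) and ignoring activations and buffers. Stage 0 stands in for plain DDP, where every GPU keeps a full replica. The 7.5B-parameter, 64-GPU setting is illustrative.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Estimated model-state bytes per GPU for ZeRO stage 0 (no sharding) to 3."""
    weights = 2 * num_params   # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 master weights + Adam moments
    if stage >= 1:
        optim /= num_gpus      # stage 1 shards optimizer state
    if stage >= 2:
        grads /= num_gpus      # stage 2 also shards gradients
    if stage >= 3:
        weights /= num_gpus    # stage 3 also shards the weights
    return weights + grads + optim


num_params, num_gpus = 7.5e9, 64  # illustrative setting
for stage in range(4):
    gb = zero_bytes_per_gpu(num_params, num_gpus, stage) / 1e9
    print(f"ZeRO stage {stage}: about {gb:.1f} GB of model state per GPU")
```

FSDP with full sharding is roughly comparable to ZeRO stage 3 in this accounting, though real memory use depends on activations, offloading, and implementation details not modeled here.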

Merge

After training a LoRA adapter on top of a pre-trained model, you can merge the base model and the adapter and export them as a single standalone model, so inference no longer has to load the two pieces separately. The export can optionally quantize the merged model.
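
Conceptually, merging folds the adapter's low-rank update into the base weight. The sketch below illustrates this on a toy 2 x 2 matrix in pure Python, assuming the standard LoRA formulation W' = W + (alpha / r) * B * A. It is not LLaMA Factory's export code, and all numbers are made up for the demonstration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (2 x 2)
lora_b = [[1.0], [0.5]]       # adapter factor B (2 x 1, rank 1)
lora_a = [[2.0, -2.0]]        # adapter factor A (1 x 2)
alpha, rank = 2, 1
x = [[1.0], [2.0]]            # input column vector

# Unmerged inference: base path plus scaled adapter path.
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
y_unmerged = [[b[0] + (alpha / rank) * a[0]] for b, a in zip(base_out, adapter_out)]

# Merged inference: a single matrix multiply with the folded weight.
merged = merge_lora(w, lora_b, lora_a, alpha, rank)
y_merged = matmul(merged, x)

print("merged weight:", merged)
print("outputs match:", y_merged == y_unmerged)
```

Because the merged weight has the same shape as the base weight, the exported model needs no adapter-aware code at inference time.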

Quantization

Quantization lowers the numerical precision of model weights to reduce VRAM usage and speed up inference. Supported methods:

AQLM
AWQ
GPTQ
QLoRA
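
The basic idea shared by these methods can be sketched with plain round-to-nearest absmax quantization. This is a deliberately simple stand-in, not the AQLM, AWQ, GPTQ, or QLoRA algorithm, and the weight values are made up. It shows how floats map to small integers plus one scale factor, and how much precision is lost in the round trip.

```python
def quantize_absmax(weights, bits=8):
    """Map floats to signed integers using a single absmax scale."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for 8 bits
    scale = (max(abs(w) for w in weights) / qmax) or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 1.27, -1.0]  # illustrative values
quantized, scale = quantize_absmax(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", quantized)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight now needs 8 bits instead of 16 or 32, at the cost of a rounding error bounded by half the scale. The methods listed above use more elaborate schemes to keep that error from hurting model quality.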

Inference

Supported inference modes:

Original-model inference
Fine-tuned-model inference
Multimodal-model inference
Batch inference

General-capability evaluation

After training, evaluate model effectiveness with:

llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml

NLG evaluation

Get BLEU / ROUGE scores for generation quality:

llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml
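
For intuition about what such scores measure, here is a simplified ROUGE-1 F1 in Python, based on unigram overlap between a reference and a generated sentence. It is a teaching sketch with whitespace tokenization, not the scorer LLaMA Factory uses, and the example sentences are made up.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Unigram-overlap F1 between a reference and a candidate string."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.4f}")
```

BLEU works in the opposite direction, emphasizing n-gram precision of the generated text with a brevity penalty, so the two metrics are usually reported together.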

Experiment tracking

LLaMA Factory integrates with several training-visualization tools:

TensorBoard
Wandb
MLflow
SwanLab (used in the experiments later in this section)

Toggle these in the WebUI under Other parameter settings → Enable external logging panel.

License: LLaMA Factory is released under the Apache-2.0 license; please respect its terms.