Alaya NeW Cloud

LLaMA Factory concepts

WebUI, training parameters, tuning algorithms, distributed training, quantization, and inference — the LLaMA Factory cheat-sheet

Before diving into the single- and multi-node experiments, it helps to be familiar with these core concepts.

WebUI

LLaMA Factory exposes a zero-code WebUI for fine-tuning. Run llamafactory-cli webui to launch it. The UI has four panels: Train / Eval & Predict / Chat / Export.

  • Train — configure model path, training stage, tuning method, dataset, learning rate, epochs, output dir
  • Eval & Predict — set data path, truncation length, top-p, temperature, output dir
  • Chat — pick inference engine and dtype, then chat with the model interactively
  • Export — set max shard size, quantization level, export device and target dir, then click Export

Data handling

LLaMA Factory supports Alpaca and ShareGPT dataset formats. To use a custom dataset, you must register it in dataset_info.json, which holds all preprocessed local datasets and online dataset definitions.

Training methods

Pre-training and post-training are both supported. Post-training techniques include:

  • Supervised Fine-Tuning (SFT)
  • RLHF (Reinforcement Learning from Human Feedback)
  • DPO (Direct Preference Optimization)
  • KTO (Kahneman–Tversky Optimization)

Training parameters

NameDescription
model_name_or_pathModel name or path
stageTraining stage: rm / pt / sft / PPO / DPO / KTO / ORPO
do_traintrue to train, false to evaluate
finetuning_typeTuning method: freeze / lora / full
lora_targetLoRA target modules — default all
datasetDataset(s) — comma-separated for multiple
templateDataset template — must match the model
output_dirOutput path
logging_stepsLog emission interval
save_stepsCheckpoint save interval
overwrite_output_dirAllow overwriting the output directory
per_device_train_batch_sizePer-device training batch size
gradient_accumulation_stepsGradient accumulation steps
max_grad_normGradient clipping threshold
learning_rateLearning rate
lr_scheduler_typeLR schedule: linear / cosine / polynomial / constant
num_train_epochsNumber of epochs
bf16Whether to use bf16
warmup_ratioLR warm-up ratio
warmup_stepsLR warm-up steps
push_to_hubPush the model to Hugging Face Hub

Training acceleration

Supported acceleration techniques: FlashAttention, Unsloth, Liger Kernel. Toggle them via the training config.

Tuning algorithms

  • Full Parameter Fine-tuning
  • Freeze
  • LoRA
  • GaLore
  • BAdam

Distributed training

Single-node multi-GPU and multi-node multi-GPU distributed training are supported with three engines:

  • DDP (PyTorch DistributedDataParallel)
  • DeepSpeed (ZeRO stages)
  • FSDP (Fully Sharded Data Parallel)

Merge

After training a LoRA adapter on top of a pre-trained model, you can merge and export the base model + LoRA adapter into a single standalone model. The merge can optionally include quantization, so each inference call doesn't have to load both pieces separately.

Quantization

Quantization compresses precision to reduce VRAM usage and accelerate inference. Supported methods:

  • AQLM
  • AWQ
  • GPTQ
  • QLoRA

Inference

Supported inference modes:

  • Original-model inference
  • Fine-tuned-model inference config
  • Multimodal-model inference
  • Batch inference

General-capability evaluation

After training, evaluate model effectiveness with:

llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml

NLG evaluation

Get BLEU / ROUGE scores for generation quality:

llamafactory-cli train examples/extras/nlg_eval/llama3_lora_predict.yaml

Experiment tracking

LLaMA Factory integrates with several training-visualization tools:

  • TensorBoard
  • Wandb
  • MLflow
  • SwanLab (used in the experiments later in this section)

Toggle these in the WebUI under Other parameter settings → Enable external logging panel.

License: please respect LLaMA Factory's licensing terms — see LLaMA-Factory Apache-2.0 license.

Last updated on

Was this page helpful?

On this page