
Fine-Tune LLaMA 3 with QLoRA on a Single GPU

Large Language Models like LLaMA 3 have revolutionized what's possible with AI. But fine-tuning one of these models seems impossible without a cluster of A100 GPUs, right? Wrong.

With QLoRA (Quantized Low-Rank Adaptation), you can fine-tune LLaMA 3 8B on a single GPU with as little as 24GB of VRAM. This guide walks you through the entire process.
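Before diving in, it helps to see what QLoRA actually does: the base weight matrix W stays frozen (and 4-bit quantized), and training only updates two small matrices B and A whose product forms a low-rank correction, giving an effective weight W + (α/r)·B·A. Here's a toy, dependency-free sketch of that idea — the dimensions and values are made up for illustration, and this is not how peft implements it internally:

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for the toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 8, 2        # toy model dim and LoRA rank (r << d)
alpha = 4          # LoRA scaling numerator
scale = alpha / r

# Frozen base weight (in real QLoRA this is 4-bit quantized, never trained).
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# The only trainable parameters: B (d x r) and A (r x d).
B = [[0.1] * r for _ in range(d)]
A = [[0.2] * d for _ in range(r)]

# Effective weight used in the forward pass: W + scale * (B @ A).
BA = matmul(B, A)
W_eff = [[W[i][j] + scale * BA[i][j] for j in range(d)] for i in range(d)]

# Only 2*d*r values are trained instead of d*d. At toy scale that's 32 vs 64;
# at real scale (d = 4096, r = 16) the ratio becomes dramatic.
trainable = 2 * d * r
print(trainable, d * d)
```

At LLaMA scale, this is why the adapter adds well under 1% of the base model's parameters while still steering its behavior.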

What You'll Learn

  • How QLoRA reduces memory requirements by 75%+
  • Dataset preparation for instruction fine-tuning
  • Training configuration and hyperparameter tuning
  • Evaluation and benchmarking your fine-tuned model
  • Deploying with vLLM for production inference

Prerequisites

  • Python 3.10+ with PyTorch 2.x
  • A GPU with 24GB+ VRAM (RTX 4090, A5000, or Colab A100)
  • Hugging Face account with LLaMA 3 access

Step 1: Install Dependencies

pip install torch transformers peft bitsandbytes
pip install trl datasets accelerate
pip install flash-attn --no-build-isolation  # optional speedup; requires the CUDA toolkit and an Ampere or newer GPU

Step 2: Load Model in 4-bit Quantization

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA 3 ships without a pad token; training needs one

This loads the 8B model in roughly 6GB of VRAM, compared with the ~16GB the full bfloat16 weights would need.
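Those figures are easy to sanity-check with back-of-the-envelope math. The parameter count below is the published LLaMA 3 8B size; actual usage lands above the raw weight size because a few layers stay in higher precision and activations need room:

```python
# Rough VRAM estimate for LLaMA 3 8B weights at different precisions.
n_params = 8_030_000_000  # published LLaMA 3 8B parameter count (approx.)

bf16_gb = n_params * 2 / 1024**3    # bfloat16: 2 bytes per parameter
nf4_gb = n_params * 0.5 / 1024**3   # 4-bit NF4: ~0.5 bytes per parameter

print(f"bf16 weights: {bf16_gb:.1f} GB")  # ~15 GB
print(f"nf4 weights:  {nf4_gb:.1f} GB")   # ~3.7 GB, before overhead
```

In a live session you can read the actual number with `model.get_memory_footprint()`, which reports the loaded model's memory use in bytes.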

Step 3: Configure LoRA

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 13,631,488 (0.16% of 8B)
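You can verify that parameter count by hand: each LoRA adapter on a linear layer of shape (d_in, d_out) adds r·(d_in + d_out) trainable parameters, and LLaMA 3 8B has 32 decoder layers. The projection shapes below are the published LLaMA 3 8B dimensions (k/v are smaller because of grouped-query attention) — this is just back-of-the-envelope arithmetic, not part of the training script:

```python
r = 16           # LoRA rank from the config above
num_layers = 32  # decoder layers in LLaMA 3 8B

# (d_in, d_out) of each targeted projection in LLaMA 3 8B.
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

# Each adapter is a pair of low-rank matrices, A (d_in x r) and B (r x d_out),
# contributing r * (d_in + d_out) trainable parameters.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * num_layers
print(f"{total:,}")  # 13,631,488 — matches print_trainable_parameters()
```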

This is just the beginning — the full article covers training, evaluation, merging adapters, and deploying with vLLM.

Ready for the Full Deep Dive?

This article is part of the LLM Fine-Tuning Workshop course, which includes 18 lessons, Colab notebooks, and 14 hours of content.

Need Help Fine-Tuning Your Model?

Book a session and I'll help you fine-tune on your specific dataset.

Book a Session →