What is LoRA and how does the Hugging Face PEFT library simplify fine-tuning large models?
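Fine-tuning all parameters of a 7B model requires enormous compute and memory. LoRA (Low-Rank Adaptation) sidesteps this by keeping the original pretrained weights frozen and injecting small trainable rank-decomposition matrices into the layers' weight matrices. For a weight matrix W ∈ ℝ^{d×k}, LoRA adds an update ΔW = BA, where B ∈ ℝ^{d×r} and A ∈ ℝ^{r×k} with rank r ≪ min(d, k), so the adapted layer computes Wx + (α/r)·BAx, where α is a scaling factor (lora_alpha in PEFT). Only A and B are trained, which cuts the trainable parameters per matrix from d·k to r·(d + k). Depending on the rank and on which matrices are adapted, that is a reduction of roughly 100× to 10,000×; the LoRA paper reports up to 10,000× for GPT-3 175B.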
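To make the saving concrete, here is a minimal sketch that counts trainable parameters for a single weight matrix with and without LoRA. The helper name lora_param_counts and the 4096 × 4096 example shape are illustrative assumptions, not part of any library.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k          # every entry of W is trainable in full fine-tuning
    lora = r * (d + k)    # B is d x r, A is r x k
    return full, lora


# Illustrative shape: a square 4096 x 4096 projection adapted with rank 16.
full, lora = lora_param_counts(d=4096, k=4096, r=16)
reduction = full / lora
print(f"full={full:,} lora={lora:,} reduction={reduction:.0f}x")
```

For this one matrix the reduction is 128×. Larger factors come from using a smaller rank or from adapting only a few of the model's matrices while the rest stay frozen.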
The Hugging Face PEFT (Parameter-Efficient Fine-Tuning) library wraps a transformers model with LoRA (or other methods such as Prefix Tuning and (IA)³) and integrates with the Trainer API for a complete fine-tuning workflow. QLoRA combines 4-bit quantisation of the frozen base weights with LoRA adapters, which makes it possible to fine-tune a 7B model on a single 24 GB GPU.
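A back-of-the-envelope estimate shows why 4-bit loading matters. The sketch below counts only the bytes needed to store the base weights. It ignores activations, optimizer state, the LoRA adapter itself, and quantisation constants, so real usage is higher. The helper name weight_memory_gib and the round figure of 7e9 parameters are assumptions for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


n_params = 7e9  # assumed round figure for a "7B" model
bf16_gib = weight_memory_gib(n_params, 16)
nf4_gib = weight_memory_gib(n_params, 4)
print(f"16-bit weights: {bf16_gib:.1f} GiB")
print(f"4-bit weights:  {nf4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights need about 13 GiB and the 4-bit weights about 3.3 GiB, which leaves far more of a 24 GB card free for activations and the adapter's optimizer state.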
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import (
    LoraConfig, PeftModel, TaskType,
    get_peft_model, prepare_model_for_kbit_training,
)
import torch

model_id = 'mistralai/Mistral-7B-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in 4-bit for QLoRA (requires the bitsandbytes package and a CUDA GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type='nf4',
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map='auto'
)

# Prepare for k-bit training (freezes the base weights, enables gradient checkpointing)
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,              # rank: lower = fewer params = faster, less expressive
    lora_alpha=32,     # updates are scaled by lora_alpha / r; 2*r is a common choice
    lora_dropout=0.05,
    target_modules=[   # which weight matrices to add LoRA to
        'q_proj', 'k_proj', 'v_proj', 'o_proj',
        'gate_proj', 'up_proj', 'down_proj',
    ],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# For this config on Mistral-7B this reports about 41.9M trainable parameters,
# well under 1% of the ~7.2B base weights. The reported total and percentage
# vary with the PEFT version when the base model is loaded in 4-bit.

# ... train here (for example with the Trainer API); the training loop is omitted ...

# After training, save the LoRA adapter only (not the full model)
model.save_pretrained('./lora-adapter')

# Load and merge for inference: reload the base model unquantised, then fold the adapter in
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, './lora-adapter').merge_and_unload()
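Conceptually, merging folds the scaled low-rank update into the base weight, W' = W + (α/r)·BA, so inference needs no extra matrix multiplications. The sketch below checks that identity on tiny hand-made matrices in plain Python. It illustrates the arithmetic only and is not how PEFT implements merge_and_unload internally; all names and values are illustrative assumptions.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, scale=1.0):
    """Return X + scale * Y, elementwise."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


# Toy sizes: d = 2, k = 3, r = 1, alpha = 2 (so the scaling alpha / r is 2).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]        # frozen base weight, d x k
B = [[1.0],
     [2.0]]                   # d x r
A = [[0.5, -1.0, 0.0]]        # r x k
alpha, r = 2.0, 1
x = [[1.0], [2.0], [3.0]]     # input column vector, k x 1

scaling = alpha / r

# Unmerged: base path plus the scaled adapter path.
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)

# Merged: fold the update into a single weight matrix first.
W_merged = add(W, matmul(B, A), scale=scaling)
merged_out = matmul(W_merged, x)

print(unmerged, merged_out)
```

Both paths give the same output vector, which is why a merged model can be served like an ordinary checkpoint with no adapter overhead.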
