Tag: llm
3 articles
· llm / cost-optimization
LLM cost routing: when Haiku beats Opus and when it does not
Routing 1M classification tokens from Opus 4.7 to Haiku 4.5 saves $6.00, an 80% reduction. Here is the task taxonomy, the latency case, and the tools to implement it.
· llm / fine-tuning
How to fine-tune a small LLM in 2026 (LoRA on a laptop)
Fine-tune Llama 3.1 8B with QLoRA on a consumer GPU: pinned Unsloth install, exact training config, GGUF export to Ollama, and eight failure modes.
· ollama / lm-studio
Ollama vs LM Studio on Mac — which survives daily use?
LM Studio wins on throughput and memory. Ollama wins on time-to-first-token and CLI setup. Here is when each choice makes sense on Apple Silicon.