Tag: llm

3 articles

· llm / cost-optimization

LLM cost routing: when Haiku beats Opus and when it does not

Routing 1M classification tokens from Opus 4.7 to Haiku 4.5 saves $6.00, an 80% reduction. Here is the task taxonomy, the latency case, and the tools to implement it.

· llm / fine-tuning

How to fine-tune a small LLM in 2026 (LoRA on a laptop)

Fine-tune Llama 3.1 8B with QLoRA on a consumer GPU: a pinned Unsloth install, the exact training config, GGUF export to Ollama, and eight failure modes.

· ollama / lm-studio

Ollama vs LM Studio on Mac — which survives daily use?

LM Studio wins on throughput and memory. Ollama wins on time-to-first-token and CLI setup. Here is when each choice makes sense on Apple Silicon.