Tag: llm

14 articles

Jun 11, 2026 · ai / claude

Claude Sonnet 4 for developers — what changed from Claude 3

Sonnet 4 is a reliability upgrade for agentic work, not a raw benchmark jump. What changed in the API, where reward hacking dropped 69%, and whether to upgrade now.

Jun 11, 2026 · ai-tools / llm

Context engineering in 2026 — six patterns that work

Context engineering decides what your model sees at inference. Six patterns with code: ordering, caching, compaction, sub-agent isolation, and more.

Jun 8, 2026 · cloudflare / cloudflare-workers

How to Set Up Cloudflare Workers AI: Step-by-Step Guide

Run inference at the edge with Workers AI: scaffold a Worker, add the AI binding, call models, stream responses over SSE, and generate images. Includes pricing and rate limits.

Jun 8, 2026 · nextjs / llm

How to stream LLM responses in Next.js with the Vercel AI SDK

Stream LLM responses token-by-token in Next.js using AI SDK v6. Covers the route handler, the client hook, and the two Vercel timeout traps most tutorials skip.

Jun 5, 2026 · llm / api

GitHub Models 2026 — free LLM API for developers reviewed

We tested GitHub Models' free-tier LLM API: rate limits, OpenAI compatibility, and whether 150 calls a day is enough for a real side project.

Jun 4, 2026 · ai-tools / llm

Prompt caching in 2026 — Anthropic, OpenAI, and Gemini compared

Prompt caching can cut the cost of cached input tokens by up to 90%. Anthropic requires explicit markers, OpenAI caches automatically, Gemini bills cache storage hourly. Here is which one fits your workload.

Jun 4, 2026 · llm / openai

LLM structured outputs: JSON mode, function calling, and Zod

Grammar-constrained sampling is the only reliable LLM primitive. How OpenAI, Anthropic, Zod, and Vercel AI SDK v6 compare — and where each still fails you.

Jun 4, 2026 · claude / anthropic

Claude API 2026: Prompt Caching, Tool Use & Batches

A practical guide to the three Claude API features that separate toy prototypes from production integrations: prompt caching, tool use, and the Message Batches API.

Jun 4, 2026 · openrouter / llm

OpenRouter vs direct API — when the gateway pays off

OpenRouter wins for multi-model projects and automatic failover. Direct API wins at high volume or for compliance-critical workloads. Here is how to decide.

Jun 4, 2026 · editors / zed

Zed AI in 2026 — how the built-in LLM features stack up

Zed AI is fast and private but lacks codebase indexing, which puts it behind Cursor on unfamiliar repos. Worth it if editor speed and BYOK matter more than semantic search.

May 30, 2026 · prompt-engineering / llm

Best prompt engineering tools for LLM apps in 2026

PromptLayer for PM-owned prompts, LangSmith for LangChain stacks, Braintrust for eval-first teams. A persona-grouped breakdown of 8 LLM tools for 2026.

May 17, 2026 · llm / cost-optimization

LLM cost routing: when Haiku beats Opus and when it does not

Routing 1M classification tokens from Opus 4.7 to Haiku 4.5 saves $6.00, an 80% reduction. Here is the task taxonomy, the latency case, and the tools to implement it.

May 17, 2026 · llm / fine-tuning

How to fine-tune a small LLM in 2026 (LoRA on a laptop)

Fine-tune Llama 3.1 8B with QLoRA on a consumer GPU — pinned Unsloth install, exact training config, GGUF export to Ollama, and eight failure modes.

May 16, 2026 · ollama / lm-studio

Ollama vs LM Studio on Mac — which survives daily use?

LM Studio wins on throughput and memory. Ollama wins on time-to-first-token and CLI setup. Here is when each choice makes sense on Apple Silicon.