LLM

Definition

LLM coverage in this archive spans 35 posts from Jan 2023 to Apr 2026 and treats LLM work as a production discipline: evaluation loops, tool boundaries, escalation paths, and cost control. The most closely related topics are ai, go, and architecture. Recurring title motifs include ai, production, llm, and stop.

Key claims

  • The archive repeatedly argues that an LLM creates leverage only when it is wired into an existing workflow.
  • Early posts lean on llm and patterns, while newer posts lean on models and production as constraints shifted.
  • This topic repeatedly intersects with ai, go, and architecture, so design choices here rarely stand alone.

Practical checklist

  • Define quality gates up front: eval sets, guardrails, and explicit rollback criteria.
  • Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
  • When boundary questions appear, cross-read ai and go before committing implementation details.
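
The first checklist item can be made concrete with a small gate check. What follows is a minimal sketch in Go, not code taken from any post in the archive: the names EvalResult, Gate, and MinPassRate, and the 0.95 threshold, are all assumptions chosen for illustration.

```go
package main

import "fmt"

// EvalResult is a hypothetical summary of one evaluation run.
type EvalResult struct {
	Passed int // eval cases that met expectations
	Total  int // eval cases that were run
}

// Gate holds a rollback criterion agreed on before launch.
type Gate struct {
	MinPassRate float64 // assumed threshold, e.g. 0.95
}

// PassRate returns the fraction of passing cases, or 0 for an empty run.
func (r EvalResult) PassRate() float64 {
	if r.Total == 0 {
		return 0
	}
	return float64(r.Passed) / float64(r.Total)
}

// ShouldRollback reports whether the run falls below the gate.
func (g Gate) ShouldRollback(r EvalResult) bool {
	return r.PassRate() < g.MinPassRate
}

func main() {
	gate := Gate{MinPassRate: 0.95}
	run := EvalResult{Passed: 90, Total: 100}
	fmt.Printf("pass rate %.2f, rollback: %t\n", run.PassRate(), gate.ShouldRollback(run))
}
```

Treating an empty run as a pass rate of 0 means a missing eval set fails the gate instead of silently passing it, which is one reasonable reading of "explicit rollback criteria".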

Failure modes

  • Shipping agent behavior without hard boundaries for tools, data access, and approvals.
  • Optimizing for model novelty while ignoring reliability, latency, or cost drift.
  • Applying guidance from anywhere in the 2023 to 2026 span without rechecking its assumptions against current context.
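
As an illustration of the first failure mode, a hard boundary for tools can be as small as a deny-by-default policy table. This is a sketch under stated assumptions rather than the archive's implementation: ToolPolicy, Authorize, and the tool names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolPolicy is a hypothetical per-tool rule.
type ToolPolicy struct {
	Allowed          bool
	RequiresApproval bool
}

var (
	ErrToolDenied    = errors.New("tool not on allowlist")
	ErrNeedsApproval = errors.New("tool call requires human approval")
)

// Authorize checks a model-requested tool call against the policy table.
// Tools missing from the table are denied by default.
func Authorize(policies map[string]ToolPolicy, tool string, approved bool) error {
	p, ok := policies[tool]
	if !ok || !p.Allowed {
		return ErrToolDenied
	}
	if p.RequiresApproval && !approved {
		return ErrNeedsApproval
	}
	return nil
}

func main() {
	// Hypothetical tool names, for illustration only.
	policies := map[string]ToolPolicy{
		"search_docs":  {Allowed: true},
		"issue_refund": {Allowed: true, RequiresApproval: true},
	}
	fmt.Println(Authorize(policies, "search_docs", false))
	fmt.Println(Authorize(policies, "issue_refund", false))
	fmt.Println(Authorize(policies, "drop_table", false))
}
```

The check runs outside the model, so a prompt cannot talk its way past it; the model only proposes a call, and the surrounding service decides whether it executes.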

Suggested reading path

References

  • Running AI Locally: A Practical Guide for Teams Who Care About Control
    Local AI is no longer a hobby project. Here's how to set it up properly: provider abstraction, versioned models, evaluation harnesses, and cloud fallback for when local isn't enough.
    Tags: local-ai, development, ollama

  • Stop Fine-Tuning Models You Haven't Bothered to Prompt Properly
    Fine-tuning is the go-to move for teams who skipped the basics. Most of the time, better prompts and proper retrieval solve the actual problem.
    Tags: fine-tuning, llm, ai

  • Reasoning Models in Production: A Practical Guide
    Reasoning models are powerful but expensive and slow. Here's how I integrate them in Go services with routing, async patterns, and cost controls that actually work.
    Tags: reasoning, o1, llm

  • Picking an AI Model for Production (Late 2024)
    There's no best model. There's the model that fits your workload, latency budget, cost constraint, and ops tolerance. Here's how to compare them.
    Tags: ai, models, comparison

  • AI Cost Benchmarking: What Your Bill Actually Tells You
    Price-per-token is the least useful number on your AI bill. Real cost benchmarking starts with your workload, not a provider's pricing page.
    Tags: ai, cost, benchmarking

  • RAG Retrieval That Actually Works
    Most RAG failures are retrieval failures. Fixing them requires hybrid search, smarter chunking, query expansion, and reranking -- measured independently from generation.
    Tags: rag, retrieval, vector-search

  • How I Actually Test LLM Features
    LLM outputs are non-deterministic. That doesn't mean you can't test them rigorously. Here's the layered testing approach I use in production.
    Tags: llm, testing, ai

  • The Best Model Is the Smallest One That Works
    Everyone reaches for GPT-4 by default. Most production tasks don't need it. Small models are faster, cheaper, and often better when the task is well-defined.
    Tags: small-models, llm, ai

  • Stop Stuffing Your Context Window
    Bigger context windows aren't an excuse to stop thinking about what goes into them. Most teams are paying for irrelevant tokens and wondering why quality degrades.
    Tags: context-window, llm, ai

  • Function Calling Patterns That Survive Production
    Function calling is how LLMs touch real systems. Treat tools like APIs, arguments like untrusted input, and permissions like the model is an intern with root access.
    Tags: function-calling, llm, ai

  • Claude 3.5 Sonnet Analysis: Cost, Coding, and Model Routing
    Claude 3.5 Sonnet changes model routing math for coding, cost, latency, and production AI workloads.
    Tags: claude, anthropic, ai

  • LLM Structured Output in Go: JSON Schema, Validation, Retries
    How to get reliable JSON from LLMs in Go with schemas, validation, repair loops, and typed contracts.
    Tags: llm, structured-output, json

  • LLM Prompt Caching in Go: Cut Costs Without Breaking Things
    Caching LLM responses is the highest-leverage optimization most teams are not doing. Here is how I implement it in Go, with real patterns for keys, invalidation, and safety.
    Tags: llm, caching, go

  • Why I Run Multiple Models in Production
    Betting on a single model provider is like having a single database with no failover. Here is why multi-model is the only sane production strategy.
    Tags: ai, architecture, llm

  • Claude 3 First Impressions: Three Models, One Decision Framework
    Anthropic shipped three models instead of one. That is actually the most interesting part of the release.
    Tags: claude, anthropic, llm

  • LLM Evaluation: Stop Shipping on Vibes
    Your LLM feature looks great in demos and breaks in production. Here is how to build an evaluation loop that catches regressions before your users do.
    Tags: evaluation, llm, testing

  • Architecting AI-Native Applications (Without the Delusion)
    The architecture of an AI-native app is fundamentally different from bolting a model onto a CRUD app. Here is how I structure them -- with code, layers, and hard-won opinions.
    Tags: architecture, ai, design

  • Stop Paying OpenAI to Test Your Prompts
    Local LLMs are finally good enough for development. Use them for iteration, keep the API bills for production.
    Tags: llm, local-development, ollama

  • AI Engineering Is Its Own Discipline Now
    AI engineering is not ML research with a product hat. It is the discipline of making models behave in production -- and it demands its own skill set.
    Tags: ai-engineering, career, skills

  • Two Weeks With the Assistants API: What I Like, What I Hate
    I built three things with the Assistants API. One shipped, one got scrapped, and one taught me where the API's limits really are.
    Tags: openai, assistants-api, ai

  • OpenAI DevDay Happened and I Have Opinions
    OpenAI DevDay was not just a product launch. It was a platform play that changes the build-vs-buy calculus for every team shipping AI features.
    Tags: openai, ai, devday

  • LLM Security: A Field Guide for People Who Ship Things
    LLMs introduce security failure modes that most teams are not defending against. Prompt injection, data leakage, tool abuse, and cost attacks are real and exploitable today.
    Tags: security, llm, ai

  • AI Technical Debt Is Eating Your Codebase (You Just Cannot See It Yet)
    AI features create a new species of technical debt that hides in prompts, data pipelines, and model versions. By the time you notice it, the cleanup bill is brutal.
    Tags: ai, technical-debt, engineering

  • Agent Architecture Patterns That Actually Work in Production
    Most agent demos are impressive. Most agent production systems are not. Here is what separates the two.
    Tags: ai, agents, llm

  • LLM Observability: Your Existing Monitoring Is Not Enough
    Traditional monitoring tells you the service is up. It doesn't tell you the model started confidently returning garbage last Tuesday. Here's how to actually observe LLM systems.
    Tags: observability, llm, ai

  • What I Learned Building AI Features Into a Fintech Product
    Building AI features at a fintech infrastructure company taught me that the hard part isn't the model. It's defining quality, handling failures gracefully, and resisting the urge to ship a demo as a product.
    Tags: ai, product-engineering, fintech

  • Your LLM Bill Is Your Own Fault
    Everyone's complaining about LLM costs. Almost nobody has done the basics: caching, model routing, or even measuring what they're spending per feature.
    Tags: ai, cost-optimization, llm

  • Fine-Tuning vs. Prompting: A Decision Framework
    Most teams should exhaust prompting before they even think about fine-tuning. Here's how to decide which lever to pull.
    Tags: ai, fine-tuning, prompting

  • LangChain Is the New ORM: Convenient Until It Is Not
    LangChain promises to simplify LLM development. Instead it adds abstraction layers you will fight against the moment your use case gets real.
    Tags: langchain, ai, llm

  • RAG Patterns That Actually Work in Production
    RAG is the default architecture for grounding LLMs in private data. Here are the patterns that survive real traffic, with Go examples from production systems.
    Tags: rag, ai, llm

  • Claude vs GPT: A User's Honest Take
    Anthropic's Claude takes a different approach to AI safety. Here is how it compares to GPT in practice, from someone using both daily.
    Tags: ai, claude, anthropic

  • My First Week Building with GPT-4
    GPT-4 landed and everything changed. What I learned in the first week of building with it, and the architecture decisions that followed.
    Tags: ai, gpt-4, openai

  • Prompt Engineering Is Not Engineering
    The term 'prompt engineering' oversells what is essentially clear writing. It is a useful skill, not a discipline.
    Tags: ai, prompt-engineering, llm

  • LLM Integration Patterns That Actually Survive Production
    Practical patterns for integrating LLMs into real applications -- prompt management, structured outputs, caching, fallbacks, and tool use -- with Go examples.
    Tags: ai, llm, go