
Production

Definition

Production coverage in this archive spans 27 posts from Feb 2016 to Apr 2026 and treats production as an engineering discipline: evaluation loops, tool boundaries, escalation paths, and cost control. The strongest adjacent threads are ai, llm, and infrastructure. Recurring title motifs include ai, production, engineering, and kubernetes.

Key claims

The archive repeatedly argues that production only creates leverage when it is wired into an existing workflow.
Early posts (2016 to 2019) lean on containers, Kubernetes, and monitoring, while newer posts (2023 onward) lean on AI and LLM systems as the constraints shifted.
This topic repeatedly intersects with ai, llm, and infrastructure, so design choices here rarely stand alone.

Practical checklist

Define quality gates up front: eval sets, guardrails, and explicit rollback criteria.
Start with the newest post to calibrate current constraints, then backtrack to older entries for first principles.
When boundary questions appear, cross-read ai and llm before committing implementation details.

Failure modes

Shipping agent behavior without hard boundaries for tools, data access, and approvals.
Optimizing for model novelty while ignoring reliability, latency, or cost drift.
Applying 2016-era guidance in 2026 without revisiting the assumptions it was built on.

Suggested reading path

Start here (current state): AI Engineering Is Its Own Discipline Now
Then read (operating middle): Function Calling Patterns That Survive Production
Finish with (foundational context): Docker in Production: What We Learned Running Containers at Dropbyke

References

27 entries tagged “Production”

AI Production Governance: A Maturity Model April 23, 2026 · 4 min By mid-April 2026, the gap between teams shipping stable AI features and teams shipping chaos isn't tools—it's production governance. Here is how mature teams evaluate, deploy, and rollback. governance ai reliability

AI Security: Evolving Threats and Defenses February 23, 2026 · 7 min As of late February 2026, AI security is defined by adaptive attacks and layered, operational defenses. security ai threats

AI-Native Architecture Patterns 2026: Production Guide January 26, 2026 · 7 min Production AI architecture patterns for gateways, retrieval, evaluation, fallbacks, cost control, and ownership. architecture ai patterns

AI Video Applications in Practice January 12, 2026 · 4 min Video AI is practical for scoped workflows. This post covers what works, how to design for reliability, and where human review still matters. video ai applications

AI Incidents Don't Look Like Outages. That's the Problem. November 10, 2025 · 4 min Your AI system can return 200 OK and still be wrong, unsafe, or confidently hallucinating. Here's how to detect, contain, and learn from AI incidents -- drawing from the same IR principles that work for traditional systems. incident-management ai reliability

AI Workflow Automation: Decisions Are Cheap, Actions Are Expensive August 4, 2025 · 4 min The trick to AI workflow automation is simple: let the model decide, let deterministic code act, and never confuse the two. automation ai workflow

AI Customer Support That Doesn't Make People Hate You June 9, 2025 · 4 min Most AI support systems are built to deflect tickets. The ones that actually work are built around escalation, grounding, and the simple idea that customers aren't idiots. customer-support ai chatbot

AI Security: Same Principles, New Attack Surface April 28, 2025 · 5 min AI systems are exposed APIs with real blast radius. The threats are injection, leakage, and tool misuse. The defenses are the same ones we've always needed -- just applied to a new surface. security ai threats

Testing AI Where It Actually Runs April 14, 2025 · 6 min Offline evals are necessary but not sufficient. Here's how I test AI features in production with shadow mode, canaries, and rollback automation -- with Go code. testing ai production

Your AI System Looks Healthy. It Is Not. March 31, 2025 · 4 min Traditional monitoring will tell you your AI service is up. It won't tell you it's returning confident garbage. Here's what observability actually looks like for AI. observability ai monitoring

Reasoning Models in Production: A Practical Guide January 20, 2025 · 7 min Reasoning models are powerful but expensive and slow. Here's how I integrate them in Go services with routing, async patterns, and cost controls that actually work. reasoning o1 llm

Your AI Infrastructure Is Not Special December 9, 2024 · 4 min AI infrastructure at scale is just infrastructure. The same boring patterns -- gateways, caching, circuit breakers, budget enforcement -- solve the same boring problems. ai infrastructure scale

AI Safety Is Just Production Engineering November 11, 2024 · 5 min AI safety in production isn't a research problem. It's defense in depth, the same way cyber defense works -- layered controls, assumed breach, observable boundaries. ai safety production

Function Calling Patterns That Survive Production July 8, 2024 · 7 min Function calling is how LLMs touch real systems. Treat tools like APIs, arguments like untrusted input, and permissions like the model is an intern with root access. function-calling llm ai

Agentic Workflows: From Demo Magic to Production Reality April 1, 2024 · 6 min AI agents that can take actions are fundamentally different from chatbots. The engineering bar must match the blast radius. agents ai production

Why I Run Multiple Models in Production March 18, 2024 · 4 min Betting on a single model provider is like having a single database with no failover. Here is why multi-model is the only sane production strategy. ai architecture llm

AI Engineering Is Its Own Discipline Now January 8, 2024 · 4 min AI engineering is not ML research with a product hat. It is the discipline of making models behave in production -- and it demands its own skill set. ai-engineering career skills

LLM Observability: Your Existing Monitoring Is Not Enough August 21, 2023 · 5 min Traditional monitoring tells you the service is up. It doesn't tell you the model started confidently returning garbage last Tuesday. Here's how to actually observe LLM systems. observability llm ai

AI in Production Is Just Engineering. Treat It That Way. January 9, 2023 · 4 min ChatGPT changed expectations overnight, but shipping AI features that actually work is an engineering problem, not a model problem. ai production engineering

Your Staging Environment Is Lying to You June 3, 2019 · 5 min Staging never catches the real bugs. Here's how I learned to test in production without burning everything down. testing production feature-flags

The Boring Kubernetes Checklist That Actually Keeps Production Alive January 14, 2019 · 5 min Most Kubernetes outages come from skipping the basics. Here's the checklist I use after running clusters at the fintech startup and now at Decloud. kubernetes devops infrastructure

GraphQL in Production Is Harder Than They Tell You June 11, 2018 · 4 min After a year running GraphQL at the fintech startup, here's what the conference talks leave out. graphql api backend

Two Years of Kubernetes in Production — The Boring Parts Are the Hard Parts January 22, 2018 · 7 min Year two of running Kubernetes at the fintech startup. The panic is gone. Now it's networking, resource tuning, and all the operational grunt work nobody blogs about. kubernetes containers devops

A Year Running Kubernetes in Production — What Actually Happened January 16, 2017 · 6 min After a year of running Kubernetes in production, the wins are real but the sharp edges drew blood first. Here's what paid off, what bit us, and what I'd do differently. kubernetes containers devops

Why We Deleted 42 Grafana Panels December 12, 2016 · 3 min Most teams monitor too much and alert on the wrong things. Five metrics are enough to run a startup backend. monitoring observability devops

Building Resilient Systems: Lessons from Production Failures July 18, 2016 · 7 min Production incidents show where architecture bends and where it breaks. These lessons focus on designing for failure, limiting blast radius, and making recovery routine. reliability resilience architecture

Docker in Production: What We Learned Running Containers at Dropbyke February 8, 2016 · 8 min Running Docker in production at Dropbyke forced us to get serious about image builds, container networking, log aggregation, and security. Here is what actually worked. docker containers devops