GraphQL in Production: One Year Later

A year ago, we migrated our primary API from REST to GraphQL. The promise was compelling: clients request exactly what they need, strong typing, and better developer experience. We’ve now processed billions of GraphQL queries in production.

Here’s what we’ve learned.

What Worked

Client Flexibility

The primary promise delivered. Clients get exactly the data they need:

# Mobile app - minimal data for list view
query ProductList {
  products(first: 20) {
    id
    name
    thumbnailUrl
    price
  }
}

# Web app - rich data for detail view
query ProductDetail($id: ID!) {
  product(id: $id) {
    id
    name
    description
    fullImageUrl
    price
    inventory
    reviews {
      rating
      comment
      author { name }
    }
    relatedProducts {
      id
      name
      thumbnailUrl
    }
  }
}

Mobile apps use less bandwidth. Web apps get rich data. Same API.

Type Safety

GraphQL’s type system caught errors early:

type Product {
  id: ID!
  name: String!
  price: Float!
  description: String
  inventory: Int!
}

TypeScript types generated from the schema carry that safety through to client code. A breaking schema change fails the client build instead of surfacing at runtime.
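As a rough illustration of what build-time detection looks for, the sketch below compares two versions of a type's fields and reports removals and type changes. The `detectBreakingChanges` helper and the plain-object representation of a type are assumptions made for this example; real tools such as GraphQL Inspector work on the full schema and apply finer-grained rules.

```javascript
// Hypothetical helper: each schema version is a plain { fieldName: 'Type' } map.
// It conservatively flags every type change, even ones a real tool would allow.
function detectBreakingChanges(oldFields, newFields) {
  const changes = [];
  for (const [name, oldType] of Object.entries(oldFields)) {
    if (!Object.hasOwn(newFields, name)) {
      changes.push(`Field "${name}" was removed`);
    } else if (newFields[name] !== oldType) {
      changes.push(`Field "${name}" changed type from ${oldType} to ${newFields[name]}`);
    }
  }
  // Added fields are not reported: they do not break existing queries.
  return changes;
}

const v1 = { id: 'ID!', name: 'String!', price: 'Float!' };
const v2 = { id: 'ID!', name: 'String!', price: 'Int!', salePrice: 'Float' };

console.log(detectBreakingChanges(v1, v2));
```

Running a check like this in CI against the last deployed schema is one way to stop a breaking change before it ships.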

Documentation

The schema doubles as documentation:

"""
A product in the catalog.
Products can be queried by ID or searched.
"""
type Product {
  """The unique product identifier."""
  id: ID!

  """Display name for the product."""
  name: String!

  """Current price in USD."""
  price: Float!
}

GraphQL Playground provides interactive documentation. Developers explore the API without external docs.

Frontend Developer Experience

Frontend teams report significant productivity improvements:

No more waiting for backend to add endpoints
Self-service data requirements
Local development with schema mocking
Clear contract between frontend and backend

Schema Evolution

Adding fields is non-breaking:

# Before
type Product {
  id: ID!
  name: String!
  price: Float!
}

# After - add fields without breaking clients
type Product {
  id: ID!
  name: String!
  price: Float!
  salePrice: Float      # New optional field
  onSale: Boolean!      # New non-null field; its resolver always returns a value
}

Old clients continue working. New clients use new fields.
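A minimal sketch of what a resolver behind the new onSale field might look like, assuming the flag is derived from salePrice rather than stored. The derivation rule is an assumption for illustration; how onSale is actually computed is not covered here.

```javascript
const resolvers = {
  Product: {
    // Assumed rule: a product is on sale when it has a sale price below its list price.
    // The expression always yields a boolean, which is what allows Boolean! in the schema.
    onSale: (product) =>
      typeof product.salePrice === 'number' && product.salePrice < product.price,
  },
};

console.log(resolvers.Product.onSale({ price: 20, salePrice: 15 })); // true
console.log(resolvers.Product.onSale({ price: 20, salePrice: null })); // false
```

Because the value is computed, existing rows need no backfill before the field goes live.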

What Didn’t Work

N+1 Query Problem

A naive implementation issues a separate database query for every parent object:

query {
  products(first: 100) {
    id
    name
    category {      # N queries for categories
      name
    }
    reviews {       # N queries for reviews
      rating
    }
  }
}

Without optimization, this executes 201 database queries: 1 for the product list, then 100 for categories and 100 for reviews.

Solution: DataLoader

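const DataLoader = require('dataloader');

// Build loaders once per request so cached rows are never shared between users.
const createLoaders = () => ({
  category: new DataLoader(async (categoryIds) => {
    const categories = await db.categories.findByIds(categoryIds);
    const byId = new Map(categories.map((c) => [c.id, c]));
    // DataLoader expects results in the same order as the keys.
    return categoryIds.map((id) => byId.get(id) ?? null);
  }),
});

const resolvers = {
  Product: {
    // Assumes the per-request context exposes `loaders: createLoaders()`.
    category: (product, _args, { loaders }) => loaders.category.load(product.categoryId),
  },
};

DataLoader batches and caches loads within a request, as long as each request gets its own loader instances. With loaders for both category and reviews, the 201 queries above drop to 3. Essential for performance.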
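To show what batching means in practice, here is a simplified stand-in for the mechanism. This is not the DataLoader library: `createBatchLoader`, the in-memory `categoryRows`, and `findCategoriesByIds` are all hypothetical names for this sketch. It queues every key requested in the same tick and hands them to the batch function in one call.

```javascript
// Simplified stand-in for DataLoader, written only to show the mechanism.
function createBatchLoader(batchFn) {
  let pending = []; // loads queued since the last flush
  const cache = new Map(); // key -> promise, so a repeated key is fetched once

  function flush() {
    const batch = pending;
    pending = [];
    batchFn(batch.map((entry) => entry.key)).then(
      (values) => batch.forEach((entry, i) => entry.resolve(values[i])),
      (error) => batch.forEach((entry) => entry.reject(error))
    );
  }

  return {
    load(key) {
      if (cache.has(key)) return cache.get(key);
      const promise = new Promise((resolve, reject) => {
        // The first load in a tick schedules a single flush for the whole batch.
        if (pending.length === 0) queueMicrotask(flush);
        pending.push({ key, resolve, reject });
      });
      cache.set(key, promise);
      return promise;
    },
  };
}

// Hypothetical data source that counts how often it is queried.
let queryCount = 0;
const categoryRows = new Map([
  [1, { id: 1, name: 'Books' }],
  [2, { id: 2, name: 'Games' }],
]);
async function findCategoriesByIds(ids) {
  queryCount += 1;
  return ids.map((id) => categoryRows.get(id) ?? null);
}

const loader = createBatchLoader(findCategoriesByIds);
const products = [{ categoryId: 1 }, { categoryId: 2 }, { categoryId: 1 }];

// Three loads issued in the same tick are served by a single batched query.
const loaded = Promise.all(products.map((p) => loader.load(p.categoryId)));
loaded.then((categories) => {
  console.log(categories.map((c) => c.name), 'queries:', queryCount);
});
```

The cache here lives as long as the loader object, which is why a loader should not outlive the request it was created for.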

Query Complexity Attacks

Without limits, malicious queries can overwhelm servers:

query Evil {
  products(first: 1000) {
    reviews {
      author {
        products {
          reviews {
            author {
              products {
                id  # Exponential explosion: each level multiplies the result count
              }
            }
          }
        }
      }
    }
  }
}

Solutions:

// Query complexity analysis
const complexityLimiter = graphqlComplexity({
  maximumComplexity: 1000,
  variables: req.body.variables,
  onComplete: (complexity) => {
    console.log('Query Complexity:', complexity);
  },
});

// Query depth limiting
const depthLimiter = depthLimit(5);

// Rate limiting by query cost
const rateLimiter = costBasedRateLimiter({
  maxCost: 10000,
  window: '1 minute',
});

Implement query analysis before production.
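To make complexity analysis concrete, here is a small sketch that scores a query by multiplying each list field's expected size into the cost of its children. The tree representation, the `estimateCost` and `maxDepth` helpers, and the list sizes are all invented for this illustration; production libraries derive these from the parsed query and the schema. The limits of 1000 and 5 mirror the configuration above.

```javascript
// Hypothetical selection tree: { name, listSize?, children? }.
// listSize is the expected number of items a list field returns.
function estimateCost(field) {
  const children = field.children ?? [];
  const childCost = children.reduce((sum, child) => sum + estimateCost(child), 0);
  return (field.listSize ?? 1) * (1 + childCost);
}

function maxDepth(field) {
  const children = field.children ?? [];
  return 1 + Math.max(0, ...children.map((child) => maxDepth(child)));
}

// products(first: 1000) { reviews { author { products { id } } } }
const query = {
  name: 'products',
  listSize: 1000,
  children: [{
    name: 'reviews',
    listSize: 10, // assumed average, not a measured figure
    children: [{
      name: 'author',
      children: [{ name: 'products', listSize: 10, children: [{ name: 'id' }] }],
    }],
  }],
};

const cost = estimateCost(query);
const depth = maxDepth(query);
console.log({ cost, depth, rejected: cost > 1000 || depth > 5 });
```

Even this truncated version of the query scores far above the limit, while its depth alone would still pass. That is why depth limiting and cost analysis are worth running together.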

Caching Complexity

REST’s HTTP caching doesn’t work with GraphQL POST requests:

# REST - cacheable
GET /products/123
Cache-Control: max-age=3600

# GraphQL - not HTTP cacheable
POST /graphql
{"query": "{ product(id: 123) { name } }"}

Solutions:

// Persisted queries - cache by query hash
const persistedQueries = {
  'abc123': 'query ProductDetail($id: ID!) { product(id: $id) { name } }',
};

# CDN caching with the @cacheControl directive (schema definition)
type Product @cacheControl(maxAge: 3600) {
  id: ID!
  name: String! @cacheControl(maxAge: 86400)
  inventory: Int! @cacheControl(maxAge: 0)  # Never cache
}

// Response caching
const cache = new RedisCache();
const cachePlugin = responseCachePlugin({ cache });

Caching takes more deliberate design than it does with REST.
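A minimal sketch of the persisted-query idea, assuming queries are keyed by their SHA-256 hash. The `registerQuery` and `resolvePersistedQuery` helpers and the URL shape are hypothetical; the exact wire format depends on the server and client libraries in use.

```javascript
const { createHash } = require('node:crypto');

// In-memory registry; a real deployment would typically build this at compile time.
const persistedQueries = new Map();

function sha256(text) {
  return createHash('sha256').update(text).digest('hex');
}

// Register a query ahead of time and return the hash the client will send.
function registerQuery(query) {
  const hash = sha256(query);
  persistedQueries.set(hash, query);
  return hash;
}

// At request time, accept only hashes that were registered.
function resolvePersistedQuery(hash) {
  const query = persistedQueries.get(hash);
  if (query === undefined) {
    throw new Error('Unknown persisted query');
  }
  return query;
}

const productDetail = 'query ProductDetail($id: ID!) { product(id: $id) { name } }';
const hash = registerQuery(productDetail);

// Assumed URL shape: a GET carrying the hash and variables can be cached by a CDN.
const variables = encodeURIComponent(JSON.stringify({ id: '123' }));
console.log(`/graphql?id=${hash}&variables=${variables}`);
```

Since the request becomes a short GET with a stable URL, ordinary HTTP caching applies again, and unregistered queries are rejected outright.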

Monitoring Difficulty

Traditional endpoint-based monitoring doesn’t work:

# REST - clear what's happening
GET /products - 50ms
GET /users/123 - 30ms

# GraphQL - all one endpoint
POST /graphql - 150ms  (which query? what was slow?)

Solution: Per-field tracing

const server = new ApolloServer({
  plugins: [
    {
      requestDidStart() {
        return {
          executionDidStart() {
            return {
              willResolveField({ info }) {
                const start = Date.now();
                return () => {
                  const duration = Date.now() - start;
                  if (duration > 100) {
                    logger.warn(`Slow field: ${info.parentType.name}.${info.fieldName} (${duration}ms)`);
                  }
                };
              },
            };
          },
        };
      },
    },
  ],
});

Apollo Studio or similar tools provide query-level visibility.
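For query-level visibility without a hosted tool, timings can be grouped by operation name instead of by endpoint. The sketch below is a hypothetical in-memory aggregator; `OperationStats` is not part of any library, and in a real server the operation name and duration would come from the request lifecycle.

```javascript
// Hypothetical in-memory aggregator keyed by GraphQL operation name.
class OperationStats {
  constructor() {
    this.stats = new Map();
  }

  record(operationName, durationMs) {
    const name = operationName || 'anonymous'; // unnamed queries are hard to track
    const entry = this.stats.get(name) ?? { count: 0, totalMs: 0, maxMs: 0 };
    entry.count += 1;
    entry.totalMs += durationMs;
    entry.maxMs = Math.max(entry.maxMs, durationMs);
    this.stats.set(name, entry);
  }

  summary() {
    return [...this.stats].map(([name, s]) => ({
      name,
      count: s.count,
      avgMs: s.totalMs / s.count,
      maxMs: s.maxMs,
    }));
  }
}

const stats = new OperationStats();
stats.record('ProductList', 50);
stats.record('ProductList', 70);
stats.record('ProductDetail', 150);
console.log(stats.summary());
```

This only works if clients name their operations, which is one more reason to require named queries.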

Error Handling Nuances

GraphQL servers typically respond with 200 OK even when a resolver fails, and report the failure in the response body:

{
  "data": {
    "product": null
  },
  "errors": [
    {
      "message": "Product not found",
      "path": ["product"]
    }
  ]
}

Clients must check the errors array even on a successful response, because the data may be partial. Monitoring tools that only watch HTTP status codes miss these failures.
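A sketch of how a client might handle this, using the response shown above. The `unwrapGraphQLResponse` helper is hypothetical, and the reporting step is left as a comment.

```javascript
// Hypothetical client helper: split a GraphQL response body into data and errors.
function unwrapGraphQLResponse(body) {
  const errors = body.errors ?? [];
  return {
    data: body.data ?? null,
    errors,
    // Partial: a data object came back, but at least one field failed.
    isPartial: errors.length > 0 && body.data != null,
    failedPaths: errors.map((e) => (e.path ?? []).join('.')),
  };
}

const body = {
  data: { product: null },
  errors: [{ message: 'Product not found', path: ['product'] }],
};

const result = unwrapGraphQLResponse(body);
if (result.errors.length > 0) {
  // Report to monitoring here; the HTTP status alone would have said 200 OK.
  console.warn('GraphQL errors at:', result.failedPaths);
}
```

Counting responses with a non-empty errors array, per operation, gives monitoring the signal that status codes no longer provide.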

Schema Design Mistakes

Early schema decisions are hard to change:

# Our mistake: too-specific naming
type ProductSearchResult {
  products: [Product!]!
  totalCount: Int!
}

# Better: generic connection pattern
type ProductConnection {
  edges: [ProductEdge!]!
  pageInfo: PageInfo!
}

We’re still living with early design decisions.
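For readers unfamiliar with the connection pattern, here is a sketch of how a page of results could be shaped into edges and pageInfo. The `node`, `cursor`, `hasNextPage`, and `endCursor` field names follow the Relay connection convention rather than anything defined above, and the index-based cursor and `toConnection` helper are assumptions for this example.

```javascript
// Opaque cursor: here simply the base64-encoded array index (an assumption).
const encodeCursor = (index) => Buffer.from(`idx:${index}`).toString('base64');
const decodeCursor = (cursor) =>
  Number(Buffer.from(cursor, 'base64').toString('utf8').slice('idx:'.length));

// Build a connection page from an in-memory list of items.
function toConnection(items, { first, after }) {
  const start = after ? decodeCursor(after) + 1 : 0;
  const page = items.slice(start, start + first);
  const edges = page.map((node, i) => ({ node, cursor: encodeCursor(start + i) }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}

const products = ['A', 'B', 'C', 'D', 'E'].map((name, i) => ({ id: String(i + 1), name }));
const firstPage = toConnection(products, { first: 2 });
const secondPage = toConnection(products, { first: 2, after: firstPage.pageInfo.endCursor });
console.log(secondPage.edges.map((e) => e.node.name));
```

Because every list uses the same shape, client pagination code can be written once and reused across types.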

What We’d Do Differently

Start with Schema Design

We built resolvers first and then designed the schema around them. We should have worked in the opposite order:

Design schema from client perspective
Review with frontend teams
Iterate on design
Then implement resolvers

Schema-first development produces better APIs.

Implement Complexity Limiting Earlier

We added query complexity limits only after incidents. They should have been in place from day one.

Use Persisted Queries From Start

Persisted queries provide:

Better security (clients can’t send arbitrary queries)
Better caching
Smaller request payloads
Query analysis at build time

We migrated to persisted queries later; it was harder than starting with them.

Better Tooling Investment

We underinvested in:

Schema governance tools
Breaking change detection
Performance monitoring
Development tooling

Good tooling multiplies productivity.

Federation Earlier

Our monolithic schema became unwieldy. Apollo Federation lets each team own its part of the schema:

# Products service
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Float!
}

# Reviews service - extends Product
extend type Product @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
}

We’re migrating to federation now; earlier adoption would have scaled better.
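On the resolver side, the split might look roughly like the sketch below. The in-memory maps and sample data are hypothetical stand-ins for each service's own storage; `__resolveReference` is the hook Apollo Federation uses to load an entity from its key fields.

```javascript
// Products service: hypothetical in-memory store standing in for a database.
const productsById = new Map([
  ['1', { id: '1', name: 'Keyboard', price: 49.0 }],
]);

const productsResolvers = {
  Product: {
    // Invoked when the gateway asks this service for a Product by its key,
    // with a representation such as { __typename: 'Product', id: '1' }.
    __resolveReference: (reference) => productsById.get(reference.id) ?? null,
  },
};

// Reviews service: it only knows product ids, not names or prices.
const reviewsByProductId = new Map([
  ['1', [{ rating: 5, comment: 'Great' }]],
]);

const reviewsResolvers = {
  Product: {
    reviews: (product) => reviewsByProductId.get(product.id) ?? [],
  },
};

const ref = { __typename: 'Product', id: '1' };
console.log(productsResolvers.Product.__resolveReference(ref).name);
console.log(reviewsResolvers.Product.reviews(ref).length);
```

Each service can be deployed and tested on its own, and the only shared contract is the key field.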

Recommendations

For New Projects

Use GraphQL if: Multiple clients with different needs, complex data requirements, frontend teams that benefit from self-service
Avoid GraphQL if: Simple CRUD, single client, team unfamiliar with GraphQL

Essential Practices

DataLoader for N+1 prevention
Query complexity limits
Depth limits
Field-level monitoring
Schema testing
Breaking change detection

Recommended Tools

Apollo Server: Full-featured, great ecosystem
GraphQL Code Generator: Type generation
Apollo Studio: Monitoring and schema management
GraphQL Inspector: Schema change detection

Key Takeaways

Client flexibility is GraphQL’s biggest win—different clients get exactly what they need
Type safety and self-documenting schemas improve developer experience
N+1 queries require DataLoader or similar batching
Implement query complexity limits before production
Caching requires explicit strategies (persisted queries, response caching)
Monitor at the field level, not just endpoint level
Design schema from client perspective before implementing resolvers
Consider federation early for team scalability
GraphQL isn’t always the right choice—evaluate honestly

GraphQL has been net positive for us, but it requires investment in tooling and practices. The benefits are real when implemented well.