GraphQL in Production: One Year Later

June 11, 2018

A year ago, we migrated our primary API from REST to GraphQL. The promise was compelling: clients request exactly what they need, the schema is strongly typed, and developers get a better experience. We’ve now processed billions of GraphQL queries in production.

Here’s what we’ve learned.

What Worked

Client Flexibility

The primary promise delivered. Clients get exactly the data they need:

# Mobile app - minimal data for list view
query ProductList {
  products(first: 20) {
    id
    name
    thumbnailUrl
    price
  }
}

# Web app - rich data for detail view
query ProductDetail($id: ID!) {
  product(id: $id) {
    id
    name
    description
    fullImageUrl
    price
    inventory
    reviews {
      rating
      comment
      author { name }
    }
    relatedProducts {
      id
      name
      thumbnailUrl
    }
  }
}

Mobile apps use less bandwidth. Web apps get rich data. Same API.

Type Safety

GraphQL’s type system caught errors early:

type Product {
  id: ID!
  name: String!
  price: Float!
  description: String
  inventory: Int!
}

TypeScript clients generated from the schema ensure type safety end-to-end. Breaking changes are detected at build time, not at runtime.

Documentation

Schema is documentation:

"""
A product in the catalog.
Products can be queried by ID or searched.
"""
type Product {
  """The unique product identifier."""
  id: ID!

  """Display name for the product."""
  name: String!

  """Current price in USD."""
  price: Float!
}

GraphQL Playground provides interactive documentation. Developers explore the API without external docs.

Frontend Developer Experience

Frontend teams report significant productivity improvements.

Schema Evolution

Adding fields is non-breaking:

# Before
type Product {
  id: ID!
  name: String!
  price: Float!
}

# After - add fields without breaking clients
type Product {
  id: ID!
  name: String!
  price: Float!
  salePrice: Float      # New optional field
  onSale: Boolean!      # New non-null field; its resolver always returns a value
}

Old clients continue working. New clients use new fields.
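
One way to back the new non-null `onSale` field is a resolver that derives it from data every product already has. This is a minimal sketch, assuming `salePrice` is null or missing when a product is not discounted; that rule is our illustration, not something the schema itself dictates.

```javascript
// Hypothetical resolver map: onSale is computed, so it is never null,
// even for products stored before salePrice existed.
const resolvers = {
  Product: {
    onSale: (product) =>
      product.salePrice != null && product.salePrice < product.price,
  },
};
```

Because the value is computed rather than stored, no data backfill is needed before the field ships.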

What Didn’t Work

N+1 Query Problem

A naive implementation triggers an explosion of database queries:

query {
  products(first: 100) {
    id
    name
    category {      # N queries for categories
      name
    }
    reviews {       # N queries for reviews
      rating
    }
  }
}

Without optimization, this executes 201 database queries.
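
The count comes from one query for the product list plus one query per product for each nested field. The sketch below reproduces it with a fake database that counts calls; the `db` object and its method names are hypothetical stand-ins, not our real data layer.

```javascript
// Fake database that counts every query it receives.
let queryCount = 0;
const db = {
  listProducts(limit) {
    queryCount += 1;
    return Array.from({ length: limit }, (_, i) => ({
      id: i + 1,
      categoryId: (i % 5) + 1,
    }));
  },
  findCategory(id) {
    queryCount += 1;
    return { id, name: `Category ${id}` };
  },
  findReviews(productId) {
    queryCount += 1;
    return [{ productId, rating: 5 }];
  },
};

// Naive resolution: one extra query per product for each nested field.
function resolveProductList(limit) {
  return db.listProducts(limit).map((product) => ({
    ...product,
    category: db.findCategory(product.categoryId),
    reviews: db.findReviews(product.id),
  }));
}

resolveProductList(100);
console.log(queryCount); // 201 = 1 list query + 100 category + 100 review queries
```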

Solution: DataLoader

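const DataLoader = require('dataloader');

// Create loaders per request (for example in the context function) so the
// cache never leaks data between requests.
const createLoaders = () => ({
  category: new DataLoader(async (categoryIds) => {
    const categories = await db.categories.findByIds(categoryIds);
    // Return results in the same order as the requested keys.
    const byId = new Map(categories.map((c) => [c.id, c]));
    return categoryIds.map((id) => byId.get(id) ?? null);
  }),
});

const resolvers = {
  Product: {
    // Assumes the server's context exposes createLoaders() as context.loaders.
    category: (product, args, context) =>
      context.loaders.category.load(product.categoryId),
  },
};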

DataLoader batches and caches within a request. Essential for performance.
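
To make the batching idea concrete, here is a simplified sketch of what happens when several resolvers ask for keys in the same tick. It is an illustration only, not DataLoader's actual implementation, and it leaves out caching and key deduplication; `createBatchLoader` is a name we made up for this example.

```javascript
// Simplified illustration of batching; not DataLoader's real implementation.
function createBatchLoader(batchFn) {
  let pending = []; // { key, resolve, reject } entries waiting for the next flush

  async function flush() {
    const batch = pending;
    pending = [];
    try {
      const values = await batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      batch.forEach((entry) => entry.reject(err));
    }
  }

  return {
    load(key) {
      return new Promise((resolve, reject) => {
        // The first load in a tick schedules one flush for everything queued.
        if (pending.length === 0) queueMicrotask(flush);
        pending.push({ key, resolve, reject });
      });
    },
  };
}

// Usage: three loads issued in the same tick become a single batch call.
const batchCalls = [];
const loader = createBatchLoader(async (ids) => {
  batchCalls.push(ids);
  return ids.map((id) => ({ id, name: `Category ${id}` }));
});

const loaded = Promise.all([loader.load(1), loader.load(2), loader.load(3)]);
```

Once `loaded` settles, `batchCalls` holds one entry, `[1, 2, 3]`, instead of three separate lookups.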

Query Complexity Attacks

Without limits, malicious queries can overwhelm servers:

query Evil {
  products(first: 1000) {
    reviews {
      author {
        products {
          reviews {
            author {
              products {
                # Exponential explosion
              }
            }
          }
        }
      }
    }
  }
}

Solutions:

// Query complexity analysis
const complexityLimiter = graphqlComplexity({
  maximumComplexity: 1000,
  variables: req.body.variables,
  onComplete: (complexity) => {
    console.log('Query Complexity:', complexity);
  },
});

// Query depth limiting
const depthLimiter = depthLimit(5);

// Rate limiting by query cost
const rateLimiter = costBasedRateLimiter({
  maxCost: 10000,
  window: '1 minute',
});

Implement query analysis before production.
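
The idea behind depth limiting can be sketched without any library by tracking how deeply selection sets nest. This is a rough approximation that counts braces, assuming the query contains no braces inside string literals or input-object arguments; a production check would walk the parsed query AST, and a given library may count depth from a different starting point. The function names are ours.

```javascript
// Rough depth estimate: the deepest nesting of selection-set braces.
function estimateQueryDepth(query) {
  let depth = 0;
  let maxDepth = 0;
  for (const char of query) {
    if (char === '{') {
      depth += 1;
      maxDepth = Math.max(maxDepth, depth);
    } else if (char === '}') {
      depth -= 1;
    }
  }
  return maxDepth;
}

// Reject a query before execution if it nests deeper than the limit.
function assertDepthWithinLimit(query, maxDepth) {
  const depth = estimateQueryDepth(query);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
  return depth;
}

const listQuery = '{ products(first: 20) { id name } }';
const nestedQuery =
  '{ products { reviews { author { products { reviews { author { name } } } } } } }';
```

With a limit of 5, `listQuery` passes at depth 2 while `nestedQuery`, at depth 7, is rejected before any resolver runs.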

Caching Complexity

HTTP caching, which REST gets almost for free on GET requests, doesn’t apply to GraphQL’s POST requests:

# REST - cacheable
GET /products/123
Cache-Control: max-age=3600

# GraphQL - not HTTP cacheable
POST /graphql
{"query": "{ product(id: 123) { name } }"}

Solutions:

// Persisted queries - cache by query hash
const persistedQueries = {
  'abc123': 'query ProductDetail($id: ID!) { product(id: $id) { name } }',
};

# CDN caching with @cacheControl directive
type Product @cacheControl(maxAge: 3600) {
  id: ID!
  name: String! @cacheControl(maxAge: 86400)
  inventory: Int! @cacheControl(maxAge: 0)  # Never cache
}

// Response caching
const cache = new RedisCache();
const cachePlugin = responseCachePlugin({ cache });

Caching requires more thought than REST.
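
The persisted-query approach can be sketched as a registry keyed by a hash of the query text. This is a minimal in-memory version, assuming SHA-256 as the hash; the registry, the function names, and the request shape in the final comment are hypothetical, not our production setup.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical in-memory registry; a real deployment would likely populate
// this at client build time and keep it in shared storage.
const registry = new Map();

function hashQuery(query) {
  return createHash('sha256').update(query).digest('hex');
}

function registerQuery(query) {
  const hash = hashQuery(query);
  registry.set(hash, query);
  return hash;
}

// Clients send only the hash plus variables; the server looks up the text.
function lookupPersistedQuery(hash) {
  const query = registry.get(hash);
  if (query === undefined) {
    throw new Error(`Unknown persisted query: ${hash}`);
  }
  return query;
}

const productDetail =
  'query ProductDetail($id: ID!) { product(id: $id) { name } }';
const productDetailHash = registerQuery(productDetail);
// A cacheable request could then look like (hypothetical URL shape):
//   GET /graphql?id=<hash>&variables={"id":"123"}
```

Rejecting unknown hashes also means the server only executes queries it has seen before, which narrows the room for the malicious queries described earlier.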

Monitoring Difficulty

Traditional endpoint-based monitoring doesn’t work:

# REST - clear what's happening
GET /products - 50ms
GET /users/123 - 30ms

# GraphQL - all one endpoint
POST /graphql - 150ms  (which query? what was slow?)

Solution: Per-field tracing

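const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [
    {
      // Apollo Server 3 and later expect these lifecycle hooks to be async.
      async requestDidStart() {
        return {
          async executionDidStart() {
            return {
              willResolveField({ info }) {
                const start = Date.now();
                // The returned callback runs when the field finishes resolving.
                return () => {
                  const duration = Date.now() - start;
                  if (duration > 100) {
                    const field = `${info.parentType.name}.${info.fieldName}`;
                    logger.warn(`Slow field: ${field} (${duration}ms)`);
                  }
                };
              },
            };
          },
        };
      },
    },
  ],
});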

Apollo Studio or similar tools provide query-level visibility.

Error Handling Nuances

GraphQL typically returns 200 OK even when a field fails, with the errors in the response body:

{
  "data": {
    "product": null
  },
  "errors": [
    {
      "message": "Product not found",
      "path": ["product"]
    }
  ]
}

Clients must check for partial errors. Monitoring tools expecting HTTP error codes miss GraphQL errors.
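
On the client, that means inspecting the body rather than the status code. A minimal sketch of such a check, assuming the response body has already been parsed from JSON; `readGraphQLResponse` and the `partial` flag are names invented for this example.

```javascript
// Split a GraphQL response body into usable data and reported errors.
// Throws only when the server returned errors and no data at all.
function readGraphQLResponse(body) {
  const errors = body.errors ?? [];
  if (body.data == null && errors.length > 0) {
    throw new Error(errors.map((e) => e.message).join('; '));
  }
  return { data: body.data, errors, partial: errors.length > 0 };
}

const result = readGraphQLResponse({
  data: { product: null },
  errors: [{ message: 'Product not found', path: ['product'] }],
});
// result.partial is true: the HTTP status was 200, but the product field failed.
```

The same check is a natural place to report errors to monitoring, since the HTTP layer will not flag them.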

Schema Design Mistakes

Early schema decisions are hard to change:

# Our mistake: too-specific naming
type ProductSearchResult {
  products: [Product!]!
  totalCount: Int!
}

# Better: generic connection pattern
type ProductConnection {
  edges: [ProductEdge!]!
  pageInfo: PageInfo!
}

We’re still living with early design decisions.
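
For context, the connection pattern wraps each item in an edge with a cursor and reports paging state in `pageInfo`. Here is a minimal sketch of building that shape from an in-memory list. The offset-based cursor encoding and the `toConnection` helper are assumptions made for illustration; cursors derived from a stable sort key usually hold up better when data changes.

```javascript
// Opaque cursors: base64 of the item's offset (an assumption for this sketch).
const encodeCursor = (offset) =>
  Buffer.from(`offset:${offset}`).toString('base64');
const decodeCursor = (cursor) =>
  Number(Buffer.from(cursor, 'base64').toString('utf8').split(':')[1]);

// Build a connection holding `first` items after the optional `after` cursor.
function toConnection(items, { first, after }) {
  const start = after ? decodeCursor(after) + 1 : 0;
  const slice = items.slice(start, start + first);
  const edges = slice.map((node, i) => ({
    node,
    cursor: encodeCursor(start + i),
  }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + first < items.length,
      hasPreviousPage: start > 0,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}

const products = [{ id: '1' }, { id: '2' }, { id: '3' }];
const firstPage = toConnection(products, { first: 2 });
const secondPage = toConnection(products, {
  first: 2,
  after: firstPage.pageInfo.endCursor,
});
```

Because every list field shares this shape, clients can reuse one pagination helper instead of learning a new result type per query.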

What We’d Do Differently

Start with Schema Design

We built resolvers first, then designed the schema around them. We should have worked in this order:

  1. Design the schema from the client’s perspective
  2. Review it with frontend teams
  3. Iterate on the design
  4. Then implement resolvers

Schema-first development produces better APIs.

Implement Complexity Limiting Earlier

We added query complexity limits after incidents. They should have been there from day one.

Use Persisted Queries From Start

Persisted queries let the server cache by query hash. We migrated to them later; it was harder than starting with them.

Better Tooling Investment

We underinvested in tooling. Good tooling multiplies productivity.

Federation Earlier

Our monolithic schema became unwieldy. Apollo Federation lets each team own its portion of the schema:

# Products service
type Product @key(fields: "id") {
  id: ID!
  name: String!
  price: Float!
}

# Reviews service - extends Product
extend type Product @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
}

We’re migrating to federation now; earlier adoption would have scaled better.
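
On the resolver side, each service handles only its own fields. The sketch below shows the split with plain resolver maps, following our understanding of Apollo Federation's `__resolveReference` convention for entities; the in-memory maps and the sample data are hypothetical stand-ins for each service's database.

```javascript
// Hypothetical in-memory data standing in for each service's own database.
const productsById = new Map([['1', { id: '1', name: 'Desk', price: 250 }]]);
const reviewsByProductId = new Map([
  ['1', [{ rating: 5, comment: 'Sturdy' }]],
]);

// Products service: turn an entity reference ({ __typename, id })
// into a full Product.
const productsServiceResolvers = {
  Product: {
    __resolveReference: (reference) => productsById.get(reference.id) ?? null,
  },
};

// Reviews service: only sees the product's key and contributes `reviews`.
const reviewsServiceResolvers = {
  Product: {
    reviews: (product) => reviewsByProductId.get(product.id) ?? [],
  },
};
```

Neither service imports the other's code; the shared contract is the `id` key declared in `@key(fields: "id")`.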

Recommendations

For New Projects

  1. Use GraphQL if: Multiple clients with different needs, complex data requirements, frontend teams that benefit from self-service
  2. Avoid GraphQL if: Simple CRUD, single client, team unfamiliar with GraphQL

Essential Practices

Key Takeaways

GraphQL has been net positive for us, but it requires investment in tooling and practices. The benefits are real when implemented well.