Go was designed for building network services. Its concurrency primitives, garbage collector, and runtime make it excellent for high-throughput, low-latency applications. But writing performant Go requires understanding how the runtime works and following specific patterns.
Here’s how to build Go services that perform at scale.
Understanding Go’s Runtime
Goroutines and Scheduling
Goroutines are lightweight threads managed by Go’s runtime:
- ~2KB initial stack (grows as needed)
- Scheduled onto OS threads by Go’s scheduler
- M:N scheduling (many goroutines on fewer threads)
Scheduling is largely cooperative: goroutines yield at specific points:
- Channel operations
- time.Sleep
- I/O operations and syscalls
- Function calls (stack growth checks)
- Explicit calls to runtime.Gosched()
Before Go 1.14, a long-running computation with no yield points could monopolize a thread and starve other goroutines; since Go 1.14 the runtime preempts such loops asynchronously, but tight loops can still add scheduling latency on hot paths.
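On older runtimes, or when you want to keep latency smooth inside a CPU-bound loop, you can yield explicitly. A minimal sketch (the chunk size of 100,000 is arbitrary):
// sum is CPU-bound; the periodic runtime.Gosched() call lets other
// goroutines on the same processor run between chunks of work.
func sum(data []int) int {
    total := 0
    for i, v := range data {
        total += v
        if i%100_000 == 0 {
            runtime.Gosched()
        }
    }
    return total
}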
Memory Allocation
The Go allocator is optimized for small allocations:
- Per-P (processor) caches for fast allocation
- Size-class based allocation
- Low contention design
But allocation isn’t free. Reducing allocations improves performance.
Garbage Collection
Go’s GC is concurrent and low-latency:
- Sub-millisecond pauses in most cases
- Runs concurrently with application
- Triggered by heap growth
GC time correlates with live heap size and allocation rate. Reduce allocations, reduce GC pressure.
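A quick way to see this relationship in a running service is the runtime's GC trace, which prints heap sizes and pause times for every cycle:
GODEBUG=gctrace=1 ./myservice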
Reducing Allocations
Profiling First
Before optimizing, profile:
import _ "net/http/pprof"
func main() {
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// ...
}
go tool pprof http://localhost:6060/debug/pprof/heap
go tool pprof http://localhost:6060/debug/pprof/allocs
Focus on hot paths—most allocations come from few locations.
Stack vs Heap
Variables escape to the heap when:
- A pointer to them outlives the function (e.g. returned to the caller)
- They're captured by a closure that outlives the function
- They're stored in an interface value
- They're too large (or dynamically sized) for the stack
Check escape analysis:
go build -gcflags="-m" ./...
// Escapes - returned pointer
func newUser() *User {
    return &User{} // Allocated on heap
}

// Doesn't escape
func processUser() {
    user := User{} // Stack allocated
    doSomething(&user)
}
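With -m, the compiler reports why each value escapes; the output looks roughly like this (file name and line numbers are illustrative):
./user.go:3:9: &User{} escapes to heap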
sync.Pool
Reuse allocations with sync.Pool:
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func handleRequest(r *Request) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    // Use buffer
    n := copy(buf, r.Body)
    process(buf[:n])
}
sync.Pool reduces allocations for frequently-used objects.
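One caveat: putting a plain []byte into the pool boxes it into an interface{}, which itself allocates on every Put (staticcheck flags this as SA6002). Storing a pointer type avoids that extra allocation. A minimal sketch using *bytes.Buffer (render and its arguments are illustrative; imports bytes, io, sync):
var bufPool = sync.Pool{
    New: func() interface{} {
        return new(bytes.Buffer) // pointer type: Put does not allocate
    },
}

func render(w io.Writer, data []byte) error {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset() // always reset before reuse
    defer bufPool.Put(buf)

    buf.Write(data)
    _, err := w.Write(buf.Bytes())
    return err
}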
Avoid Interface Allocation
Storing values in interfaces can allocate:
// Can allocate - the value is boxed into the interface
var x interface{} = 42
// Better - use concrete types when possible
var x int = 42
Preallocate Slices
// Grows multiple times
result := []Item{}
for _, v := range input {
    result = append(result, transform(v))
}

// Preallocated - no growth
result := make([]Item, 0, len(input))
for _, v := range input {
    result = append(result, transform(v))
}
Strings and Bytes
String operations often allocate:
// Many allocations - each += copies the whole string so far
s := ""
for _, part := range parts {
    s += part
}

// Far fewer allocations - one growing buffer
var b strings.Builder
for _, part := range parts {
    b.WriteString(part)
}
s := b.String()
Concurrency Patterns
Worker Pools
Process work with bounded concurrency:
func workerPool(jobs <-chan Job, results chan<- Result, workers int) {
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range jobs {
                results <- process(job)
            }
        }()
    }
    wg.Wait()
    close(results)
}
Workers limit concurrency and reduce goroutine creation overhead.
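Wiring it up takes a little care because the pool only closes results after jobs is closed and every worker has finished. A typical usage sketch (pendingJobs and handle are illustrative):
jobs := make(chan Job, 100)
results := make(chan Result, 100)

// Producer: feed jobs, then close the channel so workers stop.
go func() {
    for _, j := range pendingJobs {
        jobs <- j
    }
    close(jobs)
}()

// Run the pool in its own goroutine; it closes results when done.
go workerPool(jobs, results, 8)

// Consumer: drain results until the pool closes the channel.
for r := range results {
    handle(r)
}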
Bounded Channels
Work queues need a bound, or memory grows without limit under load:
// Unbounded - a new goroutine per job can exhaust memory
for job := range incoming {
    go process(job)
}

// Better - a buffered channel absorbs bursts and applies backpressure
jobs := make(chan Job, 1000) // Sends block when the buffer is full
When the buffer fills, producers block instead of piling work into memory. That backpressure prevents memory exhaustion under load.
Context for Cancellation
Propagate cancellation through the call chain:
func handleRequest(ctx context.Context) error {
    ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
    defer cancel()

    resultCh := make(chan Result, 1) // Buffered so the goroutine can't leak on timeout
    go func() {
        resultCh <- slowOperation()
    }()

    select {
    case result := <-resultCh:
        return processResult(result)
    case <-ctx.Done():
        return ctx.Err() // Timeout or cancellation
    }
}
Always respect context cancellation.
Avoid Goroutine Leaks
Goroutines that never exit leak memory:
// Leak - no exit path: the goroutine blocks forever once input stops
go func() {
    for {
        select {
        case v := <-input:
            process(v)
        }
    }
}()

// Fixed - exits when done is closed
go func() {
    for {
        select {
        case v := <-input:
            process(v)
        case <-done:
            return
        }
    }
}()
HTTP Server Optimization
Connection Handling
Default settings may not be optimal:
server := &http.Server{
    Addr:           ":8080",
    Handler:        handler,
    ReadTimeout:    5 * time.Second,
    WriteTimeout:   10 * time.Second,
    IdleTimeout:    120 * time.Second,
    MaxHeaderBytes: 1 << 20, // 1 MiB
}
Tune timeouts for your workload.
Connection Pooling for Clients
Reuse connections:
var client = &http.Client{
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 100,
        IdleConnTimeout:     90 * time.Second,
    },
    Timeout: 10 * time.Second,
}
Create one client, reuse it.
JSON Performance
The standard library is good, but not the fastest:
// Standard library
json.Marshal(v)
json.Unmarshal(data, &v)
// Faster alternatives
// jsoniter - drop-in replacement
var json = jsoniter.ConfigCompatibleWithStandardLibrary
json.Marshal(v)
// easyjson - code generation
// Requires generating marshaling code
For hot paths, consider alternatives.
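As a sketch of the code-generation route with easyjson (the User type is illustrative), annotate the struct and run the generator, which writes marshaling methods into a companion file:
//easyjson:json
type User struct {
    ID   int64  `json:"id"`
    Name string `json:"name"`
}

easyjson user.go   # generates user_easyjson.go with MarshalJSON/UnmarshalJSON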
Response Compression
Compress responses:
func compressHandler(next http.Handler) http.Handler {
    return gziphandler.GzipHandler(next)
}
Reduces bandwidth, improves perceived latency.
Database Access
Connection Pooling
Configure pool appropriately:
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(25)
db.SetConnMaxLifetime(5 * time.Minute)
Tune based on database capacity and workload.
Prepared Statements
Reuse prepared statements:
stmt, err := db.Prepare("SELECT * FROM users WHERE id = ?")
if err != nil {
    return err
}
defer stmt.Close()

// Reuse the statement for many queries
for _, id := range userIds {
    rows, err := stmt.Query(id)
    if err != nil {
        return err
    }
    // ... scan rows ...
    rows.Close() // Close inside the loop, or connections leak
}
Batch Operations
Reduce round trips:
// Slow - one round trip per item
for _, item := range items {
    db.Exec("INSERT INTO items VALUES (?)", item)
}

// Faster - one transaction, statement prepared once
tx, err := db.Begin()
if err != nil {
    return err
}
stmt, err := tx.Prepare("INSERT INTO items VALUES (?)")
if err != nil {
    return err
}
for _, item := range items {
    if _, err := stmt.Exec(item); err != nil {
        tx.Rollback()
        return err
    }
}
return tx.Commit()
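Many drivers still send one round trip per Exec even inside a transaction. When that matters, a single multi-row INSERT collapses the batch into one statement. A sketch assuming MySQL-style ? placeholders and the same single-column items table (db and items in scope as above; uses strings.Join):
placeholders := make([]string, len(items))
args := make([]interface{}, len(items))
for i, item := range items {
    placeholders[i] = "(?)"
    args[i] = item
}
query := "INSERT INTO items VALUES " + strings.Join(placeholders, ", ")
if _, err := db.Exec(query, args...); err != nil {
    return err
}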
Benchmarking
Write Benchmarks
func BenchmarkProcess(b *testing.B) {
    input := generateInput()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        process(input)
    }
}

func BenchmarkProcessParallel(b *testing.B) {
    input := generateInput()
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            process(input)
        }
    })
}
Run Benchmarks
go test -bench=. -benchmem -count=5 ./...
Compare before and after:
go test -bench=. -count=10 > old.txt
# Make changes
go test -bench=. -count=10 > new.txt
benchstat old.txt new.txt
Production Considerations
GOMAXPROCS
GOMAXPROCS defaults to the number of CPUs the operating system reports. In containers that is usually the host's CPU count, not the container's quota, so tuning helps:
import _ "go.uber.org/automaxprocs" // Respects container limits
In containers, detect actual CPU limits.
Memory Limits
Set soft memory limit (Go 1.19+):
import "runtime/debug"
func init() {
debug.SetMemoryLimit(500 * 1024 * 1024) // 500MB
}
Or via environment:
GOMEMLIMIT=500MiB ./myservice
Observability
Expose runtime metrics:
import (
    "github.com/prometheus/client_golang/prometheus/promhttp"
)
http.Handle("/metrics", promhttp.Handler())
Monitor goroutine count, heap size, GC pause times.
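If you are not using the Prometheus Go collector, the standard library exposes the same signals directly. A minimal sketch that logs them periodically (the interval and log format are arbitrary; imports log, runtime, time):
// logRuntimeStats logs goroutine count, heap size, and the most
// recent GC pause on a fixed interval.
func logRuntimeStats(interval time.Duration) {
    var m runtime.MemStats
    for range time.Tick(interval) {
        runtime.ReadMemStats(&m)
        log.Printf("goroutines=%d heap_alloc_mb=%d last_gc_pause=%s",
            runtime.NumGoroutine(),
            m.HeapAlloc/(1024*1024),
            time.Duration(m.PauseNs[(m.NumGC+255)%256]))
    }
}
Run it in a goroutine from main; for dashboards, the Prometheus Go collector exports the same data as metrics.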
Key Takeaways
- Profile before optimizing—find actual hot paths
- Reduce allocations: use sync.Pool, preallocate slices, avoid unnecessary interface boxing
- Understand escape analysis to keep allocations on stack
- Use worker pools for bounded concurrency
- Configure HTTP clients and servers with appropriate timeouts and pool sizes
- Batch database operations to reduce round trips
- Write benchmarks and use benchstat for comparison
- In containers, respect CPU and memory limits
- Monitor runtime metrics in production
Go performs well by default, but understanding the runtime enables significant improvements for demanding workloads.