OpenTelemetry Adoption: Unified Observability

November 15, 2021

Observability instrumentation has been fragmented: different libraries for tracing (Jaeger, Zipkin), metrics (Prometheus, StatsD), and logging. OpenTelemetry unifies this with a single standard for collecting telemetry data. Adoption is accelerating.

Here’s how to adopt OpenTelemetry effectively.

Why OpenTelemetry?

The Fragmentation Problem

before_opentelemetry:
  tracing:
    - OpenTracing (deprecated)
    - OpenCensus (merged into OTel)
    - Jaeger client
    - Zipkin client
    - Vendor-specific (Datadog, New Relic)

  metrics:
    - Prometheus client
    - StatsD
    - Micrometer
    - Vendor-specific

  logging:
    - Language-specific (log4j, logrus, zap)
    - No correlation with traces

  problems:
    - Vendor lock-in
    - Different instrumentation per backend
    - Difficult to switch vendors
    - No unified context

OpenTelemetry Solution

opentelemetry_approach:
  unified_api:
    - Single API for traces, metrics, logs
    - Language-specific SDKs
    - Consistent across languages

  vendor_neutral:
    - Export to any backend
    - Switch backends without code changes
    - Multi-backend support

  semantic_conventions:
    - Standardized attribute names
    - Consistent across services
    - Better correlation

  context_propagation:
    - Automatic context passing
    - Trace ID in logs
    - Correlated telemetry

Core Components

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Application                               │
├─────────────────────────────────────────────────────────────────┤
│                    OpenTelemetry SDK                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Tracer    │  │   Meter     │  │   Logger    │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
├─────────────────────────────────────────────────────────────────┤
│                    Exporters                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │    OTLP     │  │  Prometheus │  │   Jaeger    │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                  OpenTelemetry Collector                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │  Receivers  │──│  Processors │──│  Exporters  │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
└───────────────────────────┬─────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
   ┌─────────┐        ┌──────────┐       ┌─────────┐
   │ Jaeger  │        │Prometheus│       │ Vendor  │
   └─────────┘        └──────────┘       └─────────┘

Instrumentation

// Go instrumentation example
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/trace"
    "go.opentelemetry.io/otel/metric"
)

var tracer = otel.Tracer("order-service")
var meter = otel.Meter("order-service")

// Metrics
var (
    ordersCounter, _ = meter.Int64Counter(
        "orders.created",
        metric.WithDescription("Number of orders created"),
    )

    orderDuration, _ = meter.Float64Histogram(
        "orders.duration",
        metric.WithDescription("Order processing duration"),
        metric.WithUnit("ms"),
    )
)

// CreateOrder validates and processes an order, emitting one span plus
// counter and histogram metrics. Duration is recorded for every outcome
// so latency percentiles include failed requests, not just successes.
func CreateOrder(ctx context.Context, order Order) error {
    // Start span; attribute names follow OTel semantic-convention style
    // (lowercase, dot-separated).
    ctx, span := tracer.Start(ctx, "CreateOrder",
        trace.WithAttributes(
            attribute.String("customer.id", order.CustomerID),
            attribute.Float64("order.total", order.Total),
        ),
    )
    defer span.End()

    start := time.Now()
    // Record duration on all return paths — the original success-only
    // recording would hide failed-request latency from the histogram.
    defer func() {
        orderDuration.Record(ctx, float64(time.Since(start).Milliseconds()))
    }()

    // Business logic
    if err := validateOrder(ctx, order); err != nil {
        // Attach the error to the span and mark it failed so backends
        // surface it in error-rate views.
        span.RecordError(err)
        span.SetStatus(codes.Error, "validation failed")
        return err
    }

    // ... more processing

    // Record metrics
    ordersCounter.Add(ctx, 1,
        metric.WithAttributes(
            attribute.String("status", "success"),
        ),
    )

    span.SetStatus(codes.Ok, "")
    return nil
}

SDK Setup

// Initialize OpenTelemetry
// initTelemetry configures the global tracer and meter providers to
// export OTLP over gRPC to the local collector, and installs W3C
// context propagation. It returns a cleanup function that flushes and
// shuts down both providers.
func initTelemetry() (func(), error) {
    ctx := context.Background()

    // Resource describes the service; these attributes are attached to
    // every span and metric this process exports.
    res, err := resource.New(ctx,
        resource.WithAttributes(
            semconv.ServiceName("order-service"),
            semconv.ServiceVersion("1.0.0"),
            semconv.DeploymentEnvironment("production"),
        ),
    )
    if err != nil {
        return nil, err
    }

    // Trace exporter: plaintext gRPC to the in-cluster collector.
    traceExporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    // Trace provider: batch spans for export; sample 10% of new traces
    // but always follow the parent's sampling decision for consistency.
    tp := trace.NewTracerProvider(
        trace.WithBatcher(traceExporter),
        trace.WithResource(res),
        trace.WithSampler(trace.ParentBased(trace.TraceIDRatioBased(0.1))),
    )
    otel.SetTracerProvider(tp)

    // Metric exporter
    metricExporter, err := otlpmetricgrpc.New(ctx,
        otlpmetricgrpc.WithEndpoint("otel-collector:4317"),
        otlpmetricgrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    // Meter provider: periodic reader pushes metrics on an interval.
    mp := metric.NewMeterProvider(
        metric.WithReader(metric.NewPeriodicReader(metricExporter)),
        metric.WithResource(res),
    )
    otel.SetMeterProvider(mp)

    // Propagator for distributed tracing: W3C Trace Context + Baggage.
    otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(
        propagation.TraceContext{},
        propagation.Baggage{},
    ))

    // Cleanup flushes buffered telemetry. Use a bounded context so a
    // hung collector cannot stall process shutdown, and surface any
    // shutdown errors through the global OTel error handler instead of
    // dropping them silently.
    return func() {
        shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
        if err := tp.Shutdown(shutdownCtx); err != nil {
            otel.Handle(err)
        }
        if err := mp.Shutdown(shutdownCtx); err != nil {
            otel.Handle(err)
        }
    }, nil
}

Collector

Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Scrape Prometheus-format metrics from annotated pods so legacy
  # instrumentation flows through the same pipeline.
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod

processors:
  # Group telemetry into batches before export (reduces outbound calls).
  batch:
    timeout: 1s
    send_batch_size: 1024

  # Refuse data before the collector OOMs; must run first in every pipeline.
  memory_limiter:
    check_interval: 1s
    limit_mib: 1000
    spike_limit_mib: 200

  # Stamp the deployment environment onto all telemetry.
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert

  # Drop noisy health-check spans entirely.
  filter:
    spans:
      exclude:
        match_type: strict
        attributes:
          - key: http.target
            value: /health

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true

  prometheus:
    endpoint: 0.0.0.0:8889

  otlp:
    endpoint: vendor-endpoint:443

service:
  pipelines:
    # Processor order matters: memory_limiter first, then drop/enrich,
    # and batch last so only data that will actually be exported is batched.
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, resource, batch]
      exporters: [jaeger, otlp]

    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheus, otlp]

Deployment

# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 2
  # selector is REQUIRED for apps/v1 Deployments and must match the
  # pod template labels — without it the API server rejects the object.
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          # Pin the image version — ":latest" makes rollbacks and
          # reproducible deployments impossible.
          image: otel/opentelemetry-collector-contrib:0.38.0
          args:
            - --config=/etc/otel/config.yaml
          ports:
            - containerPort: 4317  # OTLP gRPC
            - containerPort: 4318  # OTLP HTTP
            - containerPort: 8889  # Prometheus metrics
          volumeMounts:
            - name: config
              mountPath: /etc/otel
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 1000m
              memory: 1Gi
      volumes:
        - name: config
          configMap:
            name: otel-collector-config

Auto-Instrumentation

Java Agent

# Download agent (pin an explicit release version for reproducible builds)
curl -L -o opentelemetry-javaagent.jar \
  https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.17.0/opentelemetry-javaagent.jar

# Run with agent — no code changes required; the agent instruments
# supported frameworks at class-load time and exports OTLP to the collector.
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-service \
  -Dotel.exporter.otlp.endpoint=http://collector:4317 \
  -jar my-app.jar

Python

# pip install opentelemetry-distro opentelemetry-exporter-otlp
# opentelemetry-bootstrap -a install

from flask import Flask

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

# Setup: export spans to the collector in batches.
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://collector:4317"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

# The app must exist before it is instrumented — the original snippet
# referenced `app` without ever creating it.
app = Flask(__name__)

# Auto-instrument: incoming Flask requests become server spans, outbound
# `requests` calls become client spans with propagated trace context.
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

# Manual instrumentation where needed
tracer = trace.get_tracer(__name__)

@app.route('/process')
def process():
    with tracer.start_as_current_span("process_request") as span:
        span.set_attribute("custom.key", "value")
        # ... business logic
        # A Flask view must return a response; returning None raises
        # a TypeError at request time.
        return "ok"

Migration Strategy

Phased Approach

migration_phases:
  phase_1_collector:
    duration: 2-4 weeks
    actions:
      - Deploy OTel Collector
      - Receive existing formats (Jaeger, Prometheus)
      - Export to existing backends
    benefit: Centralized telemetry pipeline

  phase_2_new_services:
    duration: Ongoing
    actions:
      - New services use OTel SDK
      - Auto-instrumentation where possible
      - Standard semantic conventions
    benefit: Future-proof instrumentation

  phase_3_migrate_existing:
    duration: 3-6 months
    actions:
      - Prioritize critical services
      - Replace vendor SDKs with OTel
      - Remove old instrumentation
    benefit: Unified instrumentation

  phase_4_optimize:
    duration: Ongoing
    actions:
      - Tune sampling
      - Optimize collector pipeline
      - Add custom instrumentation
    benefit: Production-ready observability

Key Takeaways

OpenTelemetry is rapidly becoming the industry standard for telemetry collection. The time to adopt is now: start with the Collector as a central pipeline, instrument new services with the OTel SDK, and migrate existing services incrementally.