
Golang / GoLang System Architecture and Testing Interview Questions

1. Compare REST/JSON with gRPC/Protocol Buffers. When would you choose gRPC for a Go microservice?
2. How do you implement a gRPC server in Go, including error handling and interceptors?
3. How do you build a production-ready gRPC client in Go with connection reuse and resilience?
4. What microservice design patterns are most important to understand for Go interviews?
5. How do you implement distributed tracing and observability in a Go microservice system?
6. How do you implement event-driven communication between Go microservices using message queues?
7. How do you manage database connections and sharding in a high-scale Go service?
8. What caching strategies do you use in Go microservices and how do you prevent cache stampede?
9. What are table-driven tests in Go and why are they the standard testing pattern?
10. How do you write Go benchmarks and what does -benchmem tell you?
11. How do you find and fix memory allocation hotspots in a Go service using profiling?
12. How do you structure integration tests in Go that require real databases or external services?
13. Explain the difference between mocks, stubs, and fakes in Go testing. When do you use each?
14. How does Go's built-in fuzzing work and when should you use property-based testing?
15. How do you test concurrent Go code correctly, including data races and timing issues?
16. How do you decide where to draw service boundaries when decomposing a Go monolith into microservices?
17. How do you version gRPC APIs in Go without breaking existing clients?
18. How do you write unit and integration tests for gRPC services in Go?
19. How do you load test a Go microservice and interpret the results?
20. How does service discovery and client-side load balancing work in a Go microservice system?
21. How do you design a consistent error model across multiple Go microservices?
22. How do you implement the Saga pattern for distributed transactions in Go?
23. What testing.T methods do experienced Go engineers use to write cleaner tests?
24. How do you benchmark concurrent code with testing.B and what insights does it provide?
25. How do you manage dependency injection at scale in a large Go service: wire, dig, or manual?
26. How do you achieve zero-downtime deployments for a Go microservice in Kubernetes?
27. How do generics in Go 1.18+ enable better system design and what are the trade-offs?
28. How do you use test coverage meaningfully in Go, beyond just a percentage?
29. What are the best practices for designing Protocol Buffer schemas in Go microservices?
30. How do you implement safe retries in Go microservices?
31. What are golden file tests in Go and when should you use them?
32. How do you ensure data consistency across Go microservices without distributed transactions?
33. What is the API Gateway pattern and how does it complement Go microservices?
34. What memory leak patterns in Go are not goroutine leaks and how do you detect them?
35. How do CQRS and event sourcing apply to Go microservice architecture?
36. What is chaos engineering and how do Go teams apply it to test microservice resilience?
37. What is contract testing and how does it apply to Go microservices?
38. What makes a Go microservice horizontally scalable and what patterns break scaling?
39. How do you implement configuration hot-reloading in a Go service without restart?
40. How do you architect Go services for maximum testability at the package level?
41. How do you implement feature flags and canary deployments in a Go microservice?
42. How do you design a multi-tenant Go microservice?
43. What is mutation testing and how does it evaluate test suite quality beyond coverage?
44. How do you manage the full lifecycle of a Go microservice from startup to shutdown?
45. How do you test Go code that processes streaming data or works with channels?
46. Summarise the key principles for designing scalable Go microservices that senior engineers demonstrate.

1. Compare REST/JSON with gRPC/Protocol Buffers. When would you choose gRPC for a Go microservice?

REST and gRPC both let one service call another over the network, but they make very different trade-offs. Understanding these trade-offs is central to microservice architecture decisions.

REST/JSON vs gRPC/Protobuf
Aspect | REST / JSON | gRPC / Protobuf
Protocol | HTTP/1.1 or HTTP/2 | HTTP/2 (mandatory)
Serialisation | JSON: text, field names repeated in every message, human-readable | Protobuf: binary, a 1-2 byte tag per field instead of the name, often several times smaller (the gain is largest for numeric-heavy messages)
Schema | Optional (OpenAPI), not enforced at compile time | Required (.proto file), compile-time checked
Code generation | Optional | Required (protoc + language plugins)
Browser support | Native | Needs grpc-web proxy layer
Streaming | Workarounds: SSE, WebSockets | Built-in: unary, server, client, bidirectional
Latency | Higher (text parsing overhead) | Lower (binary + HTTP/2 multiplexing)
Best for | Public APIs, browser clients, simple integrations | Internal service-to-service, high-throughput, typed contracts
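// user.proto
syntax = "proto3";
package user;

// Required by protoc-gen-go; this import path is a placeholder.
option go_package = "example.com/yourapp/gen/pb";

service UserService {
    rpc GetUser(GetUserRequest) returns (User);           // unary
    rpc StreamUsers(Empty) returns (stream User);         // server streaming
    rpc BatchCreate(stream CreateUserRequest)             // client streaming
        returns (BatchCreateResponse);
    rpc Chat(stream Message) returns (stream Message);    // bidirectional
}

message GetUserRequest { int64 id = 1; }
message User {
    int64  id    = 1;
    string name  = 2;
    string email = 3;
}

// Illustrative definitions for the other messages referenced by the service;
// the fields are assumptions, chosen only so the file compiles.
message Empty {}
message CreateUserRequest { string name = 1; string email = 2; }
message BatchCreateResponse { int32 created = 1; }
message Message { string text = 1; }

// Generate Go code (needs the protoc-gen-go and protoc-gen-go-grpc plugins):
// protoc --go_out=. --go-grpc_out=. user.proto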

Why Go excels at gRPC: Go's goroutine model maps naturally to gRPC's concurrent streaming model. In grpc-go each RPC handler runs in its own goroutine, and the Go runtime multiplexes thousands of concurrent streams efficiently. Protobuf's generated Go code is idiomatic and integrates with Go's type system, and deadlines and cancellation travel through the standard context package.

Decision guide: choose gRPC for internal service-to-service calls where typed contracts, performance, and streaming matter. Use REST/JSON for public APIs consumed by browsers or third parties where human readability and broad tooling matter.

What serialisation format does gRPC use and what is its primary advantage over JSON?
Which gRPC streaming mode allows both the client and server to send multiple messages on a single connection?
2. How do you implement a gRPC server in Go, including error handling and interceptors?

Implementing a gRPC server follows a code-generation-first workflow: define the proto, generate Go stubs, implement the interface, and start the server. Interceptors (gRPC's equivalent of HTTP middleware) add cross-cutting concerns like logging and auth.

// Step 1: implement the generated server interface
type userServiceServer struct {
    pb.UnimplementedUserServiceServer // embed for forward compatibility
    repo UserRepository
    log  *slog.Logger
}

func (s *userServiceServer) GetUser(
    ctx context.Context,
    req *pb.GetUserRequest,
) (*pb.User, error) {
    if req.Id <= 0 {
        return nil, status.Errorf(codes.InvalidArgument,
            "id must be positive, got %d", req.Id)
    }
    user, err := s.repo.FindByID(ctx, int(req.Id))
    if err != nil {
        if errors.Is(err, ErrNotFound) {
            return nil, status.Errorf(codes.NotFound,
                "user %d not found", req.Id)
        }
        s.log.Error("repo error", "err", err)
        return nil, status.Errorf(codes.Internal, "internal error")
    }
    return &pb.User{Id: int64(user.ID), Name: user.Name, Email: user.Email}, nil
}

// Interceptor (middleware equivalent)
func loggingInterceptor(log *slog.Logger) grpc.UnaryServerInterceptor {
    return func(
        ctx context.Context,
        req any,
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (any, error) {
        start := time.Now()
        resp, err := handler(ctx, req)
        log.Info("RPC",
            "method", info.FullMethod,
            "duration", time.Since(start),
            "error", err,
        )
        return resp, err
    }
}

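// Step 2: start the server
// repo and secret are assumed to be built elsewhere (config loading, DB setup).
func main() {
    log := slog.New(slog.NewJSONHandler(os.Stdout, nil))

    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        log.Error("listen failed", "err", err)
        os.Exit(1)
    }
    srv := grpc.NewServer(
        // Interceptors run in the order listed; the first one is outermost.
        grpc.ChainUnaryInterceptor(
            loggingInterceptor(log),
            recoveryInterceptor(),
            authInterceptor(secret),
        ),
    )
    // Set log as well as repo: GetUser calls s.log.Error, which would
    // dereference a nil logger otherwise.
    pb.RegisterUserServiceServer(srv, &userServiceServer{repo: repo, log: log})
    reflection.Register(srv) // enables grpcurl inspection
    if err := srv.Serve(lis); err != nil {
        log.Error("serve failed", "err", err)
        os.Exit(1)
    }
}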

gRPC status codes: always return structured status errors using status.Error or status.Errorf with a codes.X value. Clients receive the code and message and can branch on them programmatically, for example with status.Code(err). When a gRPC API is also exposed over HTTP (for example through grpc-gateway), the conventional mapping is NotFound→404, InvalidArgument→400, Internal→500.

What does embedding 'pb.UnimplementedUserServiceServer' in your gRPC server struct provide?
Which gRPC status code should you return for a missing resource?
3. How do you build a production-ready gRPC client in Go with connection reuse and resilience?

A gRPC client wraps a ClientConn which manages a pool of HTTP/2 connections. Unlike HTTP/1.1, a single gRPC connection multiplexes many concurrent RPCs — connection reuse is critical.

// Production gRPC client setup
func newUserClient(addr string) (pb.UserServiceClient, func(), error) {
    conn, err := grpc.NewClient(addr,
        // TLS in production
        grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),

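        // Keep-alive: detect dead connections.
        // The server must permit pings this frequent via
        // keepalive.EnforcementPolicy (its default minimum interval is
        // 5 minutes); otherwise it closes the connection with
        // GOAWAY "too_many_pings".
        grpc.WithKeepaliveParams(keepalive.ClientParameters{
            Time:                10 * time.Second,
            Timeout:             5 * time.Second,
            PermitWithoutStream: true,
        }),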

        // Retry policy (built-in retry)
        grpc.WithDefaultServiceConfig(`{
            "methodConfig": [{
                "name": [{"service": "user.UserService"}],
                "retryPolicy": {
                    "maxAttempts": 3,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "1s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": ["UNAVAILABLE"]
                }
            }]
        }`),
    )
    if err != nil { return nil, nil, err }

    cleanup := func() { conn.Close() }
    return pb.NewUserServiceClient(conn), cleanup, nil
}

// Using the client — always pass context
func fetchUser(ctx context.Context, client pb.UserServiceClient, id int64) (*pb.User, error) {
    ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
    defer cancel()

    return client.GetUser(ctx, &pb.GetUserRequest{Id: id})
}

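// Client interceptors (middleware for outgoing calls).
// metadataInterceptor and retryInterceptor are application-defined. Use either
// a retry interceptor or the service-config retry policy above, not both.
// Insecure credentials are for local development only.
conn, err := grpc.NewClient(addr,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithChainUnaryInterceptor(
        metadataInterceptor, // attach trace ID to outgoing metadata
        retryInterceptor,
    ),
)
if err != nil {
    return fmt.Errorf("create gRPC client: %w", err)
}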

Connection sharing: share one ClientConn per target service across the entire application. HTTP/2 multiplexing lets many concurrent RPCs share a single TCP connection (bounded by the server's maximum concurrent streams setting), so creating a new connection per RPC pays for a fresh TCP and TLS handshake every time and gains nothing.

Why should a gRPC ClientConn be shared across the application rather than creating one per RPC call?
What does the gRPC keepalive parameter 'PermitWithoutStream: true' enable?
4. What microservice design patterns are most important to understand for Go interviews?

Senior Go interviews probe whether you can design systems that are resilient, observable, and maintainable at scale. These patterns recur across every production Go microservice.

Core Microservice Patterns
Pattern | Problem Solved | Go Implementation
Circuit Breaker | Prevent cascade failures when a downstream is slow/dead | sony/gobreaker or custom state machine
Bulkhead | Isolate failures: one slow dependency shouldn't affect others | Separate goroutine pools / semaphores per dependency
Retry + Backoff | Transient failures in distributed systems | grpc RetryPolicy or manual exponential backoff
Saga / Outbox | Distributed transactions without 2PC | Event sourcing + idempotent consumers
Sidecar | Cross-cutting concerns without modifying service | Envoy/Linkerd proxies, Dapr
Health Check Aggregator | K8s readiness = all dependencies healthy | Custom /readyz checking DB, cache, downstream
Strangler Fig | Gradual migration from monolith | Route by feature flag; run old+new in parallel
// Circuit breaker with sony/gobreaker
import "github.com/sony/gobreaker"

type UserClient struct {
    grpc pb.UserServiceClient
    cb   *gobreaker.CircuitBreaker
}

func NewUserClient(grpc pb.UserServiceClient) *UserClient {
    cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
        Name:        "user-service",
        MaxRequests: 3,                 // requests in half-open state
        Interval:    10 * time.Second,  // reset window for counting
        Timeout:     30 * time.Second,  // how long to stay open
        ReadyToTrip: func(c gobreaker.Counts) bool {
            return c.ConsecutiveFailures >= 5
        },
    })
    return &UserClient{grpc: grpc, cb: cb}
}

func (c *UserClient) GetUser(ctx context.Context, id int64) (*pb.User, error) {
    result, err := c.cb.Execute(func() (any, error) {
        return c.grpc.GetUser(ctx, &pb.GetUserRequest{Id: id})
    })
    if err != nil {
        if errors.Is(err, gobreaker.ErrOpenState) {
            return nil, fmt.Errorf("user service unavailable (circuit open): %w", err)
        }
        return nil, err
    }
    return result.(*pb.User), nil
}
What is the purpose of a circuit breaker in a microservice architecture?
What problem does the Bulkhead pattern solve?
5. How do you implement distributed tracing and observability in a Go microservice system?

Observability in distributed systems requires three pillars: metrics (what is happening?), logs (what happened?), and traces (why is a specific request slow?). OpenTelemetry is the standard SDK for Go.

// OpenTelemetry setup
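import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0" // use the version matching your SDK
)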

func initTracer(ctx context.Context) (func(), error) {
    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint("otel-collector:4317"),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil { return nil, err }

    tp := trace.NewTracerProvider(
        trace.WithBatcher(exporter),
        trace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("user-service"),
            semconv.ServiceVersionKey.String("1.0.0"),
        )),
        trace.WithSampler(trace.TraceIDRatioBased(0.1)), // sample 10%
    )
    otel.SetTracerProvider(tp)
    return func() { tp.Shutdown(context.Background()) }, nil
}

// Instrument a function
var tracer = otel.Tracer("user-service")

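// Import aliases assumed here, since both packages are named "codes":
//   codes     = google.golang.org/grpc/codes
//   otelcodes = go.opentelemetry.io/otel/codes
func (s *userServiceServer) GetUser(
    ctx context.Context, req *pb.GetUserRequest,
) (*pb.User, error) {
    ctx, span := tracer.Start(ctx, "UserService.GetUser")
    defer span.End()

    span.SetAttributes(
        attribute.Int64("user.id", req.Id),
    )

    user, err := s.repo.FindByID(ctx, int(req.Id))
    if err != nil {
        if errors.Is(err, ErrNotFound) {
            // An expected miss is not a span error.
            return nil, status.Errorf(codes.NotFound, "user %d not found", req.Id)
        }
        span.RecordError(err)
        span.SetStatus(otelcodes.Error, err.Error())
        return nil, status.Errorf(codes.Internal, "internal error")
    }
    return toProto(user), nil
}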

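// Propagate trace context in gRPC metadata.
// The otelgrpc stats handler does this automatically. It relies on the
// global propagator, so call
// otel.SetTextMapPropagator(propagation.TraceContext{}) during setup;
// the default global propagator is a no-op.
srv := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)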

Correlation IDs: every request entering the system gets a trace ID propagated through gRPC metadata, HTTP headers, and message queue headers. This enables you to see the full call tree of a single request across 10 services in a tool like Jaeger or Tempo.

What are the three pillars of observability in a distributed system?
What does span.RecordError(err) do in OpenTelemetry?
6. How do you implement event-driven communication between Go microservices using message queues?

Synchronous RPC (REST/gRPC) creates tight coupling — if service B is down, service A fails. Message queues (Kafka, NATS, RabbitMQ) decouple producers from consumers: A publishes an event and continues; B processes it when ready. This improves resilience and enables fan-out.

// NATS JetStream producer
import "github.com/nats-io/nats.go"

type EventPublisher struct {
    js nats.JetStreamContext
}

type UserCreatedEvent struct {
    UserID    int       `json:"user_id"`
    Email     string    `json:"email"`
    CreatedAt time.Time `json:"created_at"`
}

func (p *EventPublisher) PublishUserCreated(ctx context.Context, e UserCreatedEvent) error {
    data, err := json.Marshal(e)
    if err != nil {
        return fmt.Errorf("marshal event: %w", err)
    }
    msg := &nats.Msg{
        Subject: "users.created",
        Data:    data,
        Header:  make(nats.Header),
    }
    // Propagate trace context in headers
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(msg.Header))

    _, err = p.js.PublishMsg(msg)
    return err
}

// Consumer with idempotency
type EmailConsumer struct {
    email  EmailSender
    dedup  *DeduplicationCache // prevent double-sending
}

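func (c *EmailConsumer) HandleUserCreated(msg *nats.Msg) {
    var event UserCreatedEvent
    if err := json.Unmarshal(msg.Data, &event); err != nil {
        // A malformed payload will never parse, so requeueing it with
        // Nak() would redeliver it forever. Terminate it instead.
        msg.Term()
        return
    }

    // Idempotency check: skip events whose side effect already happened.
    // Keyed on the user ID because each user gets one welcome email; if the
    // payload carries a unique event ID, key on that instead.
    key := fmt.Sprintf("email:welcome:%d", event.UserID)
    if c.dedup.Has(key) {
        msg.Ack() // already processed: ack without re-sending
        return
    }

    if err := c.email.SendWelcome(event.Email); err != nil {
        msg.Nak() // transient failure: negative ack so the broker redelivers
        return
    }
    c.dedup.Set(key)
    msg.Ack()
}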

Delivery guarantees: most message queues provide at-least-once delivery, not exactly-once: a message may be redelivered after a failure or a missed ack. Consumers must therefore be idempotent, meaning that processing the same message twice produces the same result as processing it once. Use a deduplication store (for example Redis SET with NX and a TTL) keyed on the event ID.

What is the key advantage of event-driven (message queue) communication over synchronous gRPC for microservices?
Why must event consumers be idempotent in an at-least-once message queue?
7. How do you manage database connections and sharding in a high-scale Go service?

At scale, a single database becomes a bottleneck. Go services address this through connection pool tuning, read replicas, and horizontal sharding. The database/sql pool must be sized carefully — too few connections cause queuing, too many overwhelm the DB.

// Connection pool configuration
func openDB(dsn string) (*sql.DB, error) {
    db, err := sql.Open("postgres", dsn)
    if err != nil { return nil, err }

    // Tune the pool
    db.SetMaxOpenConns(25)                  // max concurrent connections
    db.SetMaxIdleConns(25)                  // keep idle connections warm
    db.SetConnMaxLifetime(5 * time.Minute)  // close and reopen periodically
    db.SetConnMaxIdleTime(1 * time.Minute)  // close long-idle connections

    return db, db.PingContext(context.Background())
}

// Read replica routing
type DBPool struct {
    primary  *sql.DB
    replicas []*sql.DB
    rr       uint64 // round-robin counter
}

func (p *DBPool) ReadDB() *sql.DB {
    if len(p.replicas) == 0 { return p.primary }
    idx := atomic.AddUint64(&p.rr, 1) % uint64(len(p.replicas))
    return p.replicas[idx]
}

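// Hash-based sharding (user ID → shard)
type ShardedDB struct {
    shards []*sql.DB
}

// fnv32 hashes the 8 bytes of the ID with FNV-1a (hash/fnv, encoding/binary).
func fnv32(id int64) uint32 {
    var b [8]byte
    binary.LittleEndian.PutUint64(b[:], uint64(id))
    h := fnv.New32a()
    h.Write(b[:])
    return h.Sum32()
}

func (s *ShardedDB) shardFor(userID int64) *sql.DB {
    // Modulo hashing: hash(userID) mod N shards. This is NOT consistent
    // hashing: changing N remaps most keys, so adding a shard means
    // migrating most of the data.
    h := fnv32(userID) % uint32(len(s.shards))
    return s.shards[h]
}

func (s *ShardedDB) GetUser(ctx context.Context, id int64) (*User, error) {
    db := s.shardFor(id)
    row := db.QueryRowContext(ctx,
        "SELECT id, name FROM users WHERE id = $1", id)
    var u User
    if err := row.Scan(&u.ID, &u.Name); err != nil {
        if errors.Is(err, sql.ErrNoRows) {
            return nil, ErrNotFound
        }
        return nil, err
    }
    return &u, nil
}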

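Pool sizing rule of thumb: start MaxOpenConns near the number of CPU cores on the DB server for CPU-bound queries, and never exceed the server's connection limit minus the connections used by other clients. Postgres defaults to max_connections = 100, so a service running 4 replicas should cap each at roughly 20 connections, which leaves headroom for migrations, admin sessions, and other services. The value of 25 in the example above would consume the entire default limit with 4 replicas.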

What problem does SetMaxIdleConns solve in database/sql connection pooling?
In hash-based database sharding, what property must the shard assignment function have to avoid data migration when adding shards?

8. What caching strategies do you use in Go microservices and how do you prevent cache stampede?

Caching reduces database load and improves latency. Common strategies in Go: in-memory (sync.Map, ristretto), distributed (Redis), and multi-level (L1 in-memory + L2 Redis). Cache stampede (thundering herd) is a classic distributed systems problem where many requests simultaneously miss a cold cache.

// Singleflight: collapse concurrent identical requests into one
import "golang.org/x/sync/singleflight"

type UserCache struct {
    mu     sync.RWMutex
    local  map[int64]*cachedUser
    redis  *redis.Client
    repo   UserRepository
    sf     singleflight.Group
}

func (c *UserCache) Get(ctx context.Context, id int64) (*User, error) {
    // L1: in-memory cache (no network)
    c.mu.RLock()
    if cu, ok := c.local[id]; ok && time.Now().Before(cu.expires) {
        c.mu.RUnlock()
        return cu.user, nil
    }
    c.mu.RUnlock()

    // Singleflight: if 100 goroutines miss at the same time,
    // only ONE goes to Redis/DB — the other 99 wait for the result
    key := fmt.Sprintf("user:%d", id)
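    // Note: ctx belongs to the first caller; if it is cancelled, every
    // waiter sharing this flight receives the error.
    result, err, _ := c.sf.Do(key, func() (any, error) {
        // L2: Redis cache. On a miss, a Redis error, or a corrupt entry,
        // fall through to the database.
        if data, err := c.redis.Get(ctx, key).Bytes(); err == nil {
            var u User
            if err := json.Unmarshal(data, &u); err == nil {
                c.storeLocal(id, &u)
                return &u, nil
            }
        }

        // L3: database
        user, err := c.repo.FindByID(ctx, int(id))
        if err != nil { return nil, err }

        // Populate caches (jitter TTL to avoid simultaneous expiry)
        jitter := time.Duration(rand.Intn(30)) * time.Second
        ttl := 5*time.Minute + jitter
        if data, err := json.Marshal(user); err == nil {
            c.redis.Set(ctx, key, data, ttl) // best effort: a failed cache write is not fatal
        }
        c.storeLocal(id, user)
        return user, nil
    })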
    if err != nil { return nil, err }
    return result.(*User), nil
}

Cache stampede prevention: TTL jitter keeps keys that were cached together from expiring together (avoiding a mass DB hit). Singleflight collapses concurrent requests for the same key within one process. Probabilistic early expiration (the XFetch algorithm) refreshes an entry before it expires, with a probability that rises as expiry approaches and as the cost of recomputing the value grows.

How does singleflight.Group prevent cache stampede?
Why should cache TTLs include a random jitter value?
9. What are table-driven tests in Go and why are they the standard testing pattern?

Table-driven tests define all test cases as a slice of structs, then iterate over them with a single test loop. This is Go's idiomatic testing pattern — adopted throughout the standard library. It eliminates duplication, makes adding new cases trivial, and produces clear failure output identifying exactly which case failed.

// Function under test
func validateEmail(email string) error {
    if email == "" {
        return errors.New("email is required")
    }
    if !strings.Contains(email, "@") {
        return fmt.Errorf("email %q has no @ symbol", email)
    }
    return nil
}

// Table-driven test
func TestValidateEmail(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        wantErr bool
        errMsg  string // optional: check error message substring
    }{
        {
            name:    "valid email",
            email:   "alice@example.com",
            wantErr: false,
        },
        {
            name:    "empty email",
            email:   "",
            wantErr: true,
            errMsg:  "required",
        },
        {
            name:    "missing @ symbol",
            email:   "notanemail",
            wantErr: true,
            errMsg:  "no @ symbol",
        },
        {
            name:    "@ symbol only",
            email:   "@",
            wantErr: false, // technically valid by our rule
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            err := validateEmail(tt.email)

            if (err != nil) != tt.wantErr {
                t.Errorf("validateEmail(%q): got err = %v, wantErr = %v",
                    tt.email, err, tt.wantErr)
                return
            }
            if tt.wantErr && tt.errMsg != "" {
                if !strings.Contains(err.Error(), tt.errMsg) {
                    t.Errorf("error %q does not contain %q",
                        err.Error(), tt.errMsg)
                }
            }
        })
    }
}

// Run only a specific subtest:
// go test -run TestValidateEmail/empty_email ./...

t.Run benefits: each subtest gets its own scope in test output. Failed subtests show the test name, making debugging immediate. You can run a specific subtest with -run TestFoo/case_name. Parallel subtests are supported with t.Parallel() inside the subtest function.

What is the main benefit of using t.Run() for each table-driven test case?
How do you run a specific table-driven subtest named 'empty email' in TestValidateEmail?
10. How do you write Go benchmarks and what does -benchmem tell you?

Go's testing package has built-in benchmark support. Benchmarks identify performance regressions and allocation hotspots before they reach production. The -benchmem flag reveals hidden allocations that cause GC pressure.

// Benchmark function: func BenchmarkXxx(b *testing.B)
func BenchmarkJSONMarshal(b *testing.B) {
    user := User{ID: 1, Name: "Alice", Email: "alice@example.com", Age: 30}

    b.ResetTimer() // start timing AFTER setup (exclude allocation of user)
    for i := 0; i < b.N; i++ { // b.N is calibrated by the framework
        _, err := json.Marshal(user)
        if err != nil { b.Fatal(err) }
    }
}

// Memory allocation benchmark
func BenchmarkStringConcat(b *testing.B) {
    words := []string{"hello", "world", "foo", "bar", "baz"}

    // ReportAllocs applies to the *testing.B it is called on, so call it
    // inside each sub-benchmark rather than on the parent.
    b.Run("plus operator", func(b *testing.B) {
        b.ReportAllocs() // same as -benchmem, for this benchmark only
        for i := 0; i < b.N; i++ {
            s := ""
            for _, w := range words { s += w } // new string per concatenation
            _ = s
        }
    })

    b.Run("strings.Builder", func(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            var sb strings.Builder
            sb.Grow(50) // one up-front allocation; the writes below reuse it
            for _, w := range words { sb.WriteString(w) }
            _ = sb.String()
        }
    })
}

// Run commands:
// go test -bench=. -benchmem ./...
// go test -bench=BenchmarkStringConcat -benchtime=5s -count=3

// Sample output (illustrative numbers; yours will differ):
// BenchmarkStringConcat/plus_operator-8   2345678   512 ns/op   256 B/op  5 allocs/op
// BenchmarkStringConcat/strings.Builder-8 9876543   123 ns/op    64 B/op  1 allocs/op
// Columns: name, iterations, ns per op, bytes per op, allocs per op

What -benchmem reports: 'B/op' is the average number of heap bytes allocated per operation. 'allocs/op' is the number of separate heap allocations per operation. Each allocation costs CPU time up front (typically tens of nanoseconds for small objects) and adds work for the garbage collector later. Zero allocations in a hot path is the ideal target.

Comparing benchmarks: use benchstat (golang.org/x/perf/cmd/benchstat) to compare before/after with statistical significance — it reports percent change and p-values.

What does the 'allocs/op' column in benchmark output represent?
What does b.ResetTimer() do in a benchmark and why is it important?
11. How do you find and fix memory allocation hotspots in a Go service using profiling?

Memory allocation hotspots cause GC pressure, latency spikes, and higher CPU usage. The workflow: benchmark to detect allocations, profile to find the source, fix (pre-allocate, use sync.Pool, reduce interface boxing), benchmark again to verify improvement.

// Step 1: identify hotspots with -benchmem
// go test -bench=BenchmarkProcessRequests -benchmem -memprofile=mem.out
// go tool pprof mem.out
// > top10 -cum
// > list processRequest

// Step 2: fix common allocation patterns

// PATTERN 1: pre-allocate slices to known capacity
// Reallocates and copies each time capacity runs out (O(log N) allocations):
func collectIDs(users []User) []int {
    var ids []int
    for _, u := range users { ids = append(ids, u.ID) }
    return ids
}
// One allocation, no regrowth:
func collectIDsFast(users []User) []int {
    ids := make([]int, 0, len(users)) // pre-allocate exact capacity
    for _, u := range users { ids = append(ids, u.ID) }
    return ids
}

// PATTERN 2: sync.Pool for frequently allocated/freed objects
var bufPool = sync.Pool{
    New: func() any { return &bytes.Buffer{} },
}

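func encodeResponse(v any) ([]byte, error) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)
    if err := json.NewEncoder(buf).Encode(v); err != nil {
        return nil, err
    }
    // Copy before returning: buf.Bytes() aliases the pooled buffer, which
    // the next caller of Get() will overwrite.
    return append([]byte(nil), buf.Bytes()...), nil
}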

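// PATTERN 3: avoid interface boxing in hot paths
// Storing a non-pointer value in an interface can force a heap allocation
// (the runtime skips it only in special cases such as constants and
// single-byte values):
func sumAny(vals []any) (total int) {
    for _, v := range vals { total += v.(int) }
    return total
}

// A concrete (or generic) signature keeps the values unboxed:
func sumInts(vals []int) (total int) {
    for _, v := range vals { total += v }
    return total
}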

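// PATTERN 4: strings.Builder instead of string concatenation
// (see BenchmarkStringConcat in the benchmarking question)

// Step 3: verify with benchmark comparison
// Before the change: go test -bench=BenchmarkCollectIDs -benchmem -count=5 > before.txt
// After the change:  go test -bench=BenchmarkCollectIDs -benchmem -count=5 > after.txt
// benchstat before.txt after.txt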
What is the most effective way to reduce allocations when building a slice of known final length?
When using sync.Pool, why must you call buf.Reset() after Get()?
12. How do you structure integration tests in Go that require real databases or external services?

Integration tests verify that your code works with real infrastructure. Go's testing tools make this clean: build tags separate unit from integration tests, TestMain handles setup/teardown, and testcontainers-go spins up real dependencies in Docker.

// integration_test.go
//go:build integration

package repository_test

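import (
    "context"
    "database/sql"
    "log"
    "os"
    "testing"

    _ "github.com/lib/pq" // assumed driver; registers "postgres" for sql.Open
    "github.com/testcontainers/testcontainers-go"
    // Aliased so it does not clash with the application's own postgres
    // package used in the test below.
    tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres"
)

var testDB *sql.DB

// TestMain: shared setup/teardown for the whole package
func TestMain(m *testing.M) {
    os.Exit(runTests(m))
}

// runTests exists so the deferred cleanup runs: os.Exit skips defers in
// the function that calls it.
func runTests(m *testing.M) int {
    ctx := context.Background()

    // Start a real Postgres container.
    // (Newer testcontainers-go releases deprecate RunContainer in favour of
    // Run; add a wait strategy if the database is not ready in time.)
    pgContainer, err := tcpostgres.RunContainer(ctx,
        testcontainers.WithImage("postgres:15"),
        tcpostgres.WithDatabase("testdb"),
        tcpostgres.WithUsername("test"),
        tcpostgres.WithPassword("test"),
    )
    if err != nil {
        log.Printf("container start: %v", err)
        return 1
    }
    defer pgContainer.Terminate(ctx)

    dsn, err := pgContainer.ConnectionString(ctx, "sslmode=disable")
    if err != nil {
        log.Printf("connection string: %v", err)
        return 1
    }
    testDB, err = sql.Open("postgres", dsn)
    if err != nil {
        log.Printf("open db: %v", err)
        return 1
    }
    defer testDB.Close()

    // Run migrations
    if err := runMigrations(testDB); err != nil {
        log.Printf("migrate: %v", err)
        return 1
    }

    return m.Run() // run all tests in the package
}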

// Individual integration test using shared testDB
func TestUserRepository_Save(t *testing.T) {
    repo := postgres.NewUserRepository(testDB)
    ctx := context.Background()

    t.Cleanup(func() {
        testDB.ExecContext(ctx, "DELETE FROM users WHERE email = $1",
            "test@example.com")
    })

    user := &User{Name: "Test", Email: "test@example.com"}
    if err := repo.Save(ctx, user); err != nil {
        t.Fatalf("Save: %v", err)
    }
    if user.ID == 0 { t.Error("expected ID to be set after save") }

    got, err := repo.FindByID(ctx, user.ID)
    if err != nil { t.Fatalf("FindByID: %v", err) }
    if got.Email != user.Email {
        t.Errorf("got email %q, want %q", got.Email, user.Email)
    }
}

// Run integration tests:
// go test -tags integration ./...
What is the purpose of TestMain(m *testing.M) in Go integration tests?
What does 't.Cleanup(fn)' do in a Go test?
13. Explain the difference between mocks, stubs, and fakes in Go testing. When do you use each?

These three terms are often used interchangeably, but they describe different test double patterns with different purposes. Go's implicit interfaces make all three easy to implement without a framework.

Test Double Types
Type | Purpose | Returns | Verifies calls?
Stub | Returns pre-programmed responses to specific calls | Fixed values | No
Fake | Working implementation with simplified logic (e.g., in-memory DB) | Realistic values | No
Mock | Records calls and verifies expected interactions | Pre-programmed or real | Yes: asserts call count, order, args
// Interface to test against
type UserRepository interface {
    FindByID(ctx context.Context, id int) (*User, error)
    Save(ctx context.Context, user *User) error
}

// STUB: returns fixed values, no logic
type stubUserRepo struct {
    user *User
    err  error
}
func (s *stubUserRepo) FindByID(_ context.Context, _ int) (*User, error) {
    return s.user, s.err
}
func (s *stubUserRepo) Save(_ context.Context, _ *User) error { return nil }

// FAKE: in-memory map — works like a real repo but without a DB
type fakeUserRepo struct {
    mu    sync.Mutex
    users map[int]*User
    nextID int
}
func (f *fakeUserRepo) FindByID(_ context.Context, id int) (*User, error) {
    f.mu.Lock(); defer f.mu.Unlock()
    u, ok := f.users[id]
    if !ok { return nil, ErrNotFound }
    return u, nil
}
func (f *fakeUserRepo) Save(_ context.Context, u *User) error {
    f.mu.Lock(); defer f.mu.Unlock()
    f.nextID++; u.ID = f.nextID
    f.users[u.ID] = u
    return nil
}

// MOCK: records calls for assertion
type mockUserRepo struct {
    FindByIDCalls []int
    SaveCalls     []*User
    stub          stubUserRepo
}
func (m *mockUserRepo) FindByID(ctx context.Context, id int) (*User, error) {
    m.FindByIDCalls = append(m.FindByIDCalls, id)
    return m.stub.FindByID(ctx, id)
}
func (m *mockUserRepo) Save(ctx context.Context, u *User) error {
    m.SaveCalls = append(m.SaveCalls, u)
    return m.stub.Save(ctx, u)
}

// Test using mock
func TestService_Register(t *testing.T) {
    mock := &mockUserRepo{stub: stubUserRepo{}}
    svc := NewUserService(mock)
    svc.Register(context.Background(), "alice@example.com")
    if len(mock.SaveCalls) != 1 {
        t.Errorf("expected 1 Save call, got %d", len(mock.SaveCalls))
    }
}
When would you prefer a fake over a mock in Go tests?
What is the key difference between a stub and a mock in testing terminology?
14. How does Go's built-in fuzzing work and when should you use property-based testing?

Go 1.18 added native fuzz testing via go test -fuzz. Fuzzing automatically generates inputs that exercise edge cases your hand-written tests miss — particularly effective for parsers, serialisers, and cryptographic code.

// Fuzz test — finds inputs that cause a panic or incorrect result
func FuzzParseURL(f *testing.F) {
    // Seed corpus: known interesting inputs
    f.Add("https://example.com/path?q=1")
    f.Add("http://user:pass@host:8080/")
    f.Add("")

    f.Fuzz(func(t *testing.T, s string) {
        // Property: parsing should never panic
        u, err := url.Parse(s)
        if err != nil { return } // error is ok, panic is not

        // Property: round-trip should be stable
        // parsing the string representation should give the same URL
        u2, err := url.Parse(u.String())
        if err != nil {
            t.Fatalf("round-trip parse failed: %v", err) // Fatalf: u2 is nil past this point
        }
        if u.String() != u2.String() {
            t.Errorf("round-trip changed URL: %q → %q",
                u.String(), u2.String())
        }
    })
}

// Another fuzz example: JSON encode/decode round-trip
func FuzzJSONRoundTrip(f *testing.F) {
    // Seed type must match the fuzz target's parameter type ([]byte)
    f.Add([]byte(`{"name":"Alice","age":30}`))

    f.Fuzz(func(t *testing.T, data []byte) {
        var v map[string]any
        if err := json.Unmarshal(data, &v); err != nil { return }

        encoded, err := json.Marshal(v)
        if err != nil {
            t.Errorf("marshal failed after successful unmarshal: %v", err)
        }
        var v2 map[string]any
        if err := json.Unmarshal(encoded, &v2); err != nil {
            t.Errorf("second unmarshal failed: %v", err)
        }
    })
}

// Run fuzzing:
// go test -fuzz=FuzzParseURL                    # fuzz until stopped
// go test -fuzz=FuzzParseURL -fuzztime=60s      # fuzz for 60 seconds
// go test                                        # replays corpus only (CI)

When to use fuzzing: parsers (JSON, YAML, protobuf), network protocol handlers, cryptographic code, regular expression engines, any function that accepts arbitrary byte/string input. Fuzzing found critical bugs in Go's own standard library.

What happens when 'go test -fuzz' discovers a failing input?
What is a 'property' in property-based testing and fuzzing?
15. How do you test concurrent Go code correctly — including data races and timing issues?

Concurrent code is notoriously difficult to test because bugs may only appear under specific goroutine interleavings. Three tools are essential: the built-in race detector (-race), goroutine leak detection (for example, the third-party go.uber.org/goleak package), and deterministic design.

import (
    "context"
    "sync"
    "testing"
    "time"

    "go.uber.org/goleak"
)

// Always run concurrent tests with -race
// go test -race ./...

// Test goroutine leak detection
func TestNoGoroutineLeak(t *testing.T) {
    defer goleak.VerifyNone(t) // fails if goroutines remain after test

    ctx, cancel := context.WithCancel(context.Background())
    worker := NewBackgroundWorker(ctx)
    worker.Start()

    // do some work...

    cancel() // signal worker to stop
    worker.Wait()
    // goleak checks that the worker goroutine actually exited
}

// Test shared state with concurrent access
func TestCounter_ConcurrentIncrement(t *testing.T) {
    c := NewAtomicCounter()
    const goroutines = 100
    const increments = 1000

    var wg sync.WaitGroup
    wg.Add(goroutines)
    for i := 0; i < goroutines; i++ {
        go func() {
            defer wg.Done()
            for j := 0; j < increments; j++ {
                c.Increment()
            }
        }()
    }
    wg.Wait()

    expected := goroutines * increments
    if got := c.Value(); got != expected {
        t.Errorf("got %d, want %d", got, expected)
    }
}

// Test channel-based pipelines with timeout
func TestPipeline_ProcessesAllItems(t *testing.T) {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    in := generateJobs([]string{"a", "b", "c"})
    out := processJobs(ctx, in, 3)

    results := collectResults(ctx, out)
    if len(results) != 3 {
        t.Errorf("expected 3 results, got %d", len(results))
    }
}

Deterministic test design: avoid time.Sleep in tests to wait for goroutines — use WaitGroup, channels, or context. Sleep-based synchronisation makes tests flaky on slow CI machines. Design concurrent components so their completion is signalled through channels or sync primitives.

What does running 'go test -race' detect?
Why should you avoid time.Sleep() in tests to wait for goroutines to complete?
16. How do you decide where to draw service boundaries when decomposing a Go monolith into microservices?

Service decomposition is one of the hardest architectural decisions. Decomposing too aggressively creates a 'distributed monolith' — all the complexity of microservices with none of the benefits. Decomposing too conservatively keeps the monolith's disadvantages.

Decomposition Principles
Principle | Description | Go Implication
Domain-Driven Design | Split by bounded context: User, Order, Inventory are separate domains | Each service owns its schema; no cross-schema JOINs
Single Responsibility | Each service does one thing well; change one thing without touching others | One binary per domain; small team ownership
Data Ownership | Service owns its data; others access via API, with no shared DB tables | Separate databases or schemas; eventual consistency via events
Deployability | Can each service be deployed independently? | Separate CI/CD pipelines; semantic versioning of gRPC APIs
Failure Isolation | Does one service's failure propagate? | Circuit breakers; timeout on every cross-service call
Strangler Fig | Migrate gradually: route traffic by feature flag | Proxy in Go; run old+new in parallel
// Anti-pattern: too-fine decomposition (nanoservices)
// UserNameService, UserEmailService, UserAgeService
// → Every user operation requires 3 network calls
// → Synchronous coupling worse than a monolith

// Better: domain-aligned decomposition
// UserService:   manage user profiles and authentication
// OrderService:  manage order lifecycle and payments
// NotifyService: send emails/SMS/push — consumes events from others

// Strangler Fig pattern in Go
func routeToNewService(cfg *Config) http.Handler {
    legacyHandler := newLegacyHandler()
    newHandler    := newModernHandler()

    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Gradually shift traffic using feature flags
        if cfg.Features.IsEnabled("new-user-service") &&
            strings.HasPrefix(r.URL.Path, "/users/") {
            newHandler.ServeHTTP(w, r)
            return
        }
        legacyHandler.ServeHTTP(w, r)
    })
}
What is the 'distributed monolith' anti-pattern in microservices?
According to the Data Ownership principle, how should one service access another service's data?
17. How do you version gRPC APIs in Go without breaking existing clients?

Breaking changes in gRPC are harder to recover from than REST — generated client code must be recompiled. The Protobuf wire format and Go's embedded Unimplemented* pattern provide the tools to evolve APIs safely.

// Protobuf field number rules (never change):
// - Field numbers 1-15 encode their tag in a single byte: reserve them for frequently sent fields
// - Never reuse a field number — remove fields by reserving them
// - Never rename fields if using JSON encoding

// Safe changes (backward compatible):
// 1. Add new optional fields with new field numbers
// 2. Add new RPC methods to the service
// 3. Add values to enums (with care)

// Breaking changes (require new major version):
// 1. Remove or rename fields that clients still use (renames break generated code and JSON)
// 2. Change field types
// 3. Remove RPC methods

// In user.proto — safe evolution:
message User {
    int64  id    = 1;
    string name  = 2;
    string email = 3;
    // Added in v1.1 — old clients ignore this field
    string avatar_url = 4;
    // Removed field: reserved to prevent accidental reuse
    reserved 5;
    reserved "phone_number";
}

// Major version bump when breaking changes required
// package user.v2;
// → separate proto package, separate Go package
// → clients opt-in by importing v2

// Server supporting both v1 and v2 simultaneously
func main() {
    grpcServer := grpc.NewServer()
    v1pb.RegisterUserServiceServer(grpcServer, &v1Handler{})
    v2pb.RegisterUserServiceServer(grpcServer, &v2Handler{})
    // gRPC uses fully qualified service name to route:
    // user.v1.UserService vs user.v2.UserService
}

Field reservation: when you remove a field, add its number and name to reserved. This prevents future schema changes from accidentally reusing the old field number and silently corrupting old clients that still send the deprecated field.

What must you do when removing a field from a Protobuf message to prevent future corruption?
Which Protobuf change is backward compatible and will not break existing clients?
18. How do you write unit and integration tests for gRPC services in Go?

gRPC services are tested at multiple levels: unit tests using the generated client/server with an in-process buffer connection, and integration tests using a real server. The bufconn package provides a lightweight in-memory network for fast unit tests.

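import (
    "context"
    "net"
    "testing"

    "google.golang.org/grpc"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/status"
    "google.golang.org/grpc/test/bufconn"
)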

const bufSize = 1024 * 1024

// Setup: start server in-process with bufconn
func setupTestServer(t *testing.T, repo UserRepository) pb.UserServiceClient {
    t.Helper()

    lis := bufconn.Listen(bufSize)
    grpcServer := grpc.NewServer()
    pb.RegisterUserServiceServer(grpcServer,
        &userServiceServer{repo: repo})

    go func() {
        if err := grpcServer.Serve(lis); err != nil {
            t.Logf("server error: %v", err)
        }
    }()
    t.Cleanup(func() { grpcServer.GracefulStop() })

    // Dial using the in-memory buffer
    conn, err := grpc.NewClient(
        "passthrough:///bufnet", // three slashes: empty authority, endpoint "bufnet"
        grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
            return lis.DialContext(ctx)
        }),
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    )
    if err != nil { t.Fatalf("dial: %v", err) }
    t.Cleanup(func() { conn.Close() })

    return pb.NewUserServiceClient(conn)
}

// Table-driven gRPC test
func TestGetUser(t *testing.T) {
    fakeRepo := &fakeUserRepo{
        users: map[int]*User{1: {ID: 1, Name: "Alice"}},
    }
    client := setupTestServer(t, fakeRepo)

    tests := []struct {
        name     string
        id       int64
        wantName string
        wantCode codes.Code
    }{
        {"existing user", 1, "Alice", codes.OK},
        {"missing user", 99, "", codes.NotFound},
        {"invalid id",   0, "", codes.InvalidArgument},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            resp, err := client.GetUser(context.Background(),
                &pb.GetUserRequest{Id: tt.id})

            if code := status.Code(err); code != tt.wantCode {
                t.Errorf("got code %v, want %v", code, tt.wantCode)
            }
            // GetName is nil-safe, so a failed call cannot panic here
            if tt.wantName != "" && resp.GetName() != tt.wantName {
                t.Errorf("got name %q, want %q", resp.GetName(), tt.wantName)
            }
        })
    }
}
What does bufconn provide for gRPC testing?
How do you extract the gRPC status code from an error returned by a gRPC client call?
19. How do you load test a Go microservice and interpret the results?

Load testing validates that a service meets performance requirements under expected and peak traffic. Go services are typically tested with k6, vegeta, or the Go-native go-wrk. The key metrics: throughput (RPS), latency percentiles (p50, p95, p99), and error rate.

// Vegeta: Go-native load testing library
import vegeta "github.com/tsenart/vegeta/v12/lib"

func TestLoadGetUser(t *testing.T) { // name must start with "Test" or go test will not run it
    if testing.Short() { t.Skip("skipping load test in short mode") }

    rate    := vegeta.Rate{Freq: 100, Per: time.Second} // 100 RPS
    duration := 30 * time.Second
    targeter := vegeta.NewStaticTargeter(vegeta.Target{
        Method: "GET",
        URL:    "http://localhost:8080/users/1",
    })

    attacker := vegeta.NewAttacker()
    var metrics vegeta.Metrics
    for res := range attacker.Attack(targeter, rate, duration, "load test") {
        metrics.Add(res)
    }
    metrics.Close()

    t.Logf("Requests:  %d", metrics.Requests)
    t.Logf("Success:   %.2f%%", metrics.Success*100)
    t.Logf("Throughput: %.2f rps", metrics.Throughput)
    t.Logf("Latency p50: %v", metrics.Latencies.P50)
    t.Logf("Latency p95: %v", metrics.Latencies.P95)
    t.Logf("Latency p99: %v", metrics.Latencies.P99)

    // Assertions
    if metrics.Success < 0.999 {
        t.Errorf("success rate %.2f%% below 99.9%%", metrics.Success*100)
    }
    if metrics.Latencies.P99 > 50*time.Millisecond {
        t.Errorf("p99 latency %v exceeds 50ms SLO", metrics.Latencies.P99)
    }
}

// Benchmark as a proxy for load test (lower overhead)
func BenchmarkHandlerThroughput(b *testing.B) {
    srv := httptest.NewServer(buildRouter())
    defer srv.Close()
    client := srv.Client()

    b.SetParallelism(10) // 10 goroutines × GOMAXPROCS concurrent
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
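            resp, err := client.Get(srv.URL + "/users/1")
            if err != nil {
                b.Error(err) // not Fatal: this runs on a worker goroutine
                continue
            }
            io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
            resp.Body.Close()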
        }
    })
    b.ReportMetric(float64(b.N)/b.Elapsed().Seconds(), "rps")
}
Why is the p99 latency more important than the average latency for a production SLO?
What does b.RunParallel() do in a Go benchmark?
20. How does service discovery and client-side load balancing work in a Go microservice system?

When service B needs to call service A, it must discover A's current addresses (since pods restart and scale). Go gRPC has built-in pluggable load balancing and name resolution for integrating with Consul, etcd, or Kubernetes DNS.

// Option 1: Kubernetes DNS + round-robin (simplest)
// k8s headless service: userservice.default.svc.cluster.local
// → resolves to ALL pod IPs, not just one VIP
conn, err := grpc.NewClient(
    "dns:///userservice.default.svc.cluster.local:9090",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)

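// Option 2: Consul service discovery
// A blank import registers a "consul" scheme with gRPC's resolver registry.
// mbobakov/grpc-consul-resolver is one such resolver; check its README for the
// query parameters supported by your version.
import _ "github.com/mbobakov/grpc-consul-resolver"

conn, err := grpc.NewClient(
    "consul://localhost:8500/user-service?healthy=true",
    grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)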

// Option 3: manual custom resolver for testing / dev
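type staticResolver struct{}

// Scheme makes targets of the form "static:///anything" use this builder
func (staticResolver) Scheme() string { return "static" }

func (staticResolver) Build(target resolver.Target,
    cc resolver.ClientConn, opts resolver.BuildOptions) (resolver.Resolver, error) {
    addrs := []resolver.Address{
        {Addr: "localhost:9090"},
        {Addr: "localhost:9091"},
    }
    if err := cc.UpdateState(resolver.State{Addresses: addrs}); err != nil {
        return nil, err
    }
    return staticRes{}, nil
}

// staticRes never re-resolves: the address list is fixed
type staticRes struct{}

func (staticRes) ResolveNow(resolver.ResolveNowOptions) {}
func (staticRes) Close()                                {}

// Register once at startup: resolver.Register(staticResolver{})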

// Load balancing policies in gRPC:
// round_robin:  cycle through all addresses
// pick_first:   always use first healthy address (default)
// grpclb:       server-side load balancing (deprecated)
// rls:          routing lookup service
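
As a rough mental model of round_robin (not gRPC's actual balancer, which also tracks connection state per address), the following sketch cycles through a fixed list of resolved addresses. roundRobin, newRoundRobin, and pick are hypothetical names, and the addresses are invented.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin hands out addresses in order, wrapping at the end.
// The atomic counter makes pick safe for concurrent callers.
type roundRobin struct {
	addrs []string
	next  atomic.Uint64
}

func newRoundRobin(addrs []string) *roundRobin {
	return &roundRobin{addrs: addrs}
}

// pick returns the next address, or "" when no addresses are known.
func (r *roundRobin) pick() string {
	if len(r.addrs) == 0 {
		return ""
	}
	n := r.next.Add(1) - 1
	return r.addrs[n%uint64(len(r.addrs))]
}

func main() {
	rr := newRoundRobin([]string{"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090"})
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick())
	}
}
```

pick_first, by contrast, would keep returning the same address for as long as its connection stays up, which is why a single ClusterIP address gives no per-request balancing for long-lived gRPC connections.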

Service mesh alternative: tools like Istio, Linkerd, and Cilium implement load balancing, circuit breaking, retries, and mTLS in a sidecar proxy — removing these concerns from application code entirely. The Go service makes plain gRPC calls; the mesh handles distribution transparently.

What is the advantage of a Kubernetes headless service for gRPC load balancing?
What problem does using a regular Kubernetes ClusterIP service create for gRPC load balancing?
21. How do you design a consistent error model across multiple Go microservices?

In a system with 10+ services, inconsistent error formats force every client to implement different error parsing. A shared error contract — carried in gRPC status details or HTTP Problem Details — enables uniform client-side handling.

// Shared proto for rich error details (google.rpc.Status)
// Add error_details.proto to your project

import (
    "context"
    "errors"

    "google.golang.org/genproto/googleapis/rpc/errdetails"
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// Return rich error from gRPC handler
func (s *userServiceServer) CreateUser(
    ctx context.Context, req *pb.CreateUserRequest,
) (*pb.User, error) {
    if req.Email == "" {
        // Rich validation error with field-level detail
        st := status.New(codes.InvalidArgument, "validation failed")
        detailed, err := st.WithDetails(&errdetails.BadRequest{
            FieldViolations: []*errdetails.BadRequest_FieldViolation{
                {Field: "email", Description: "email is required"},
            },
        })
        if err != nil {
            return nil, st.Err() // fall back to the plain status
        }
        return nil, detailed.Err()
    }

    user := &User{Email: req.Email}
    if err := s.repo.Save(ctx, user); err != nil { // Save sets user.ID
        if errors.Is(err, ErrDuplicate) {
            st := status.New(codes.AlreadyExists, "email already registered")
            detailed, derr := st.WithDetails(&errdetails.ErrorInfo{
                Reason: "EMAIL_ALREADY_EXISTS",
                Domain: "user.service",
            })
            if derr != nil {
                return nil, st.Err()
            }
            return nil, detailed.Err()
        }
        // Log err server-side; the client only sees a generic message
        return nil, status.Error(codes.Internal, "internal error")
    }
    return toProto(user), nil
}

// Client: extract rich error details
_, err := client.CreateUser(ctx, req)
if err != nil {
    st := status.Convert(err)
    for _, detail := range st.Details() {
        switch d := detail.(type) {
        case *errdetails.BadRequest:
            for _, v := range d.FieldViolations {
                log.Printf("field %s: %s", v.Field, v.Description)
            }
        }
    }
}
What does status.WithDetails() enable in gRPC error responses?
Why should you never expose internal error details (stack traces, DB queries) in gRPC error messages?
22. How do you implement the Saga pattern for distributed transactions in Go?

Distributed transactions that span multiple services cannot use traditional 2-phase commit without creating tight coupling and availability issues. The Saga pattern decomposes a transaction into a sequence of local transactions, each publishing an event. Failures trigger compensating transactions.

// Choreography-based saga: services react to events

// OrderService publishes OrderCreated
type OrderCreatedEvent struct {
    OrderID   string  `json:"order_id"`
    UserID    string  `json:"user_id"`
    Amount    float64 `json:"amount"`
    ProductID string  `json:"product_id"`
}

// PaymentService consumes OrderCreated, publishes PaymentProcessed or PaymentFailed
func (s *PaymentService) HandleOrderCreated(ctx context.Context,
    event OrderCreatedEvent) error {

    charged, err := s.stripe.Charge(ctx, event.UserID, event.Amount)
    if err != nil {
        // Publish compensating event for OrderService to cancel the order
        return s.publisher.Publish(ctx, PaymentFailedEvent{
            OrderID: event.OrderID,
            Reason:  err.Error(),
        })
    }
    return s.publisher.Publish(ctx, PaymentProcessedEvent{
        OrderID:   event.OrderID,
        ChargeID:  charged.ID,
    })
}

// InventoryService consumes PaymentProcessed, publishes StockReserved or StockUnavailable
// OrderService listens to StockReserved → fulfillment
// OrderService listens to StockUnavailable → refund (another compensating event)

// Outbox pattern: ensure event is published atomically with DB write
func (s *OrderService) CreateOrder(ctx context.Context, req OrderRequest) error {
    tx, err := s.db.BeginTx(ctx, nil)
    if err != nil { return err }
    defer tx.Rollback() // no-op after a successful Commit

    // Write order to DB
    orderID := uuid.New().String()
    if _, err := tx.ExecContext(ctx, "INSERT INTO orders ...", orderID, req.UserID); err != nil {
        return err
    }

    // Write event to outbox table in SAME transaction
    eventData, err := json.Marshal(OrderCreatedEvent{OrderID: orderID, UserID: req.UserID})
    if err != nil { return err }
    if _, err := tx.ExecContext(ctx,
        "INSERT INTO outbox (event_type, payload) VALUES ($1, $2)",
        "order.created", eventData); err != nil {
        return err
    }

    // Separate process reads outbox and publishes to message queue
    return tx.Commit()
}

Outbox pattern solves the dual-write problem: writing to the DB and publishing to a message queue are two separate operations, and either can fail. Writing the order and the event in the same DB transaction (with the event in an outbox table) makes them atomic; a relay process then publishes to the queue and deletes from the outbox. Because the relay can crash between publishing and deleting, delivery is at-least-once, so consumers must be idempotent.
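
The relay side can be sketched with an in-memory stand-in for the outbox table. This is an assumption-laden illustration: outbox, publisher, relayOnce, and flakyBroker are hypothetical names, and rows are marked as published rather than deleted so the retry behaviour is easy to observe.

```go
package main

import (
	"errors"
	"fmt"
)

type outboxRow struct {
	ID        int
	EventType string
	Payload   []byte
	Published bool
}

// outbox is an in-memory stand-in for the outbox table.
type outbox struct{ rows []*outboxRow }

// publisher abstracts the message queue client.
type publisher interface {
	Publish(eventType string, payload []byte) error
}

// relayOnce publishes unpublished rows in insertion order and marks each row
// only after the broker accepts it. It stops at the first failure so ordering
// is preserved, leaving the remaining rows for the next run.
func relayOnce(o *outbox, p publisher) (int, error) {
	sent := 0
	for _, row := range o.rows {
		if row.Published {
			continue
		}
		if err := p.Publish(row.EventType, row.Payload); err != nil {
			return sent, fmt.Errorf("publish row %d: %w", row.ID, err)
		}
		row.Published = true
		sent++
	}
	return sent, nil
}

// flakyBroker fails its first call, then accepts everything.
type flakyBroker struct {
	calls     int
	delivered []string
}

func (b *flakyBroker) Publish(eventType string, _ []byte) error {
	b.calls++
	if b.calls == 1 {
		return errors.New("broker unavailable")
	}
	b.delivered = append(b.delivered, eventType)
	return nil
}

func main() {
	o := &outbox{rows: []*outboxRow{
		{ID: 1, EventType: "order.created"},
		{ID: 2, EventType: "order.created"},
	}}
	b := &flakyBroker{}

	n, err := relayOnce(o, b) // broker is down: nothing is marked
	fmt.Println(n, err)
	n, err = relayOnce(o, b) // retry delivers both rows
	fmt.Println(n, err)
}
```

A production relay would poll the table (or tail the database's change log) and update rows inside a transaction; the retry-until-acknowledged loop is the part this sketch is meant to show.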

What is a compensating transaction in the Saga pattern?
What problem does the Outbox pattern solve in event-driven architecture?
23. What testing.T methods do experienced Go engineers use to write cleaner tests?

Beyond t.Error and t.Fatal, Go's testing package offers several methods that eliminate boilerplate and make test intent clearer. Knowing these marks a candidate as familiar with Go testing idioms.

// t.Helper() — marks current function as a test helper
// Error messages show the caller's line, not the helper's line
func requireNoError(t *testing.T, err error, msg string) {
    t.Helper() // CRITICAL: without this, failure shows helper's line
    if err != nil {
        t.Fatalf("%s: %v", msg, err)
    }
}

// t.Cleanup() — deferred cleanup, runs even if test panics
func TestDBOperation(t *testing.T) {
    db := openTestDB(t)
    t.Cleanup(func() { db.Close() }) // cleaner than defer for table tests

    user := createTestUser(t, db)
    t.Cleanup(func() {
        db.Exec("DELETE FROM users WHERE id = $1", user.ID)
    })
    // test code...
}

// t.Setenv() — sets env var for the test, auto-restores on cleanup
func TestLoadConfig(t *testing.T) {
    t.Setenv("DATABASE_URL", "postgres://test:test@localhost/testdb")
    t.Setenv("JWT_SECRET", "test-secret-at-least-32-chars-long")
    cfg, err := loadConfig()
    requireNoError(t, err, "loadConfig")
    if cfg.Database.URL == "" { t.Error("DATABASE_URL not loaded") }
}

// t.TempDir() — creates a temp directory that's auto-removed
func TestWriteFile(t *testing.T) {
    dir := t.TempDir() // automatically cleaned up after test
    path := filepath.Join(dir, "output.json")
    err := writeJSON(path, map[string]string{"ok": "true"})
    requireNoError(t, err, "writeJSON")
}

// t.Parallel() — run subtests concurrently for faster test suites
func TestConcurrentOperations(t *testing.T) {
    for _, tc := range testCases {
        tc := tc // capture pre-Go 1.22
        t.Run(tc.name, func(t *testing.T) {
            t.Parallel() // this subtest runs concurrently with others
            // ... test body
        })
    }
}
Why is t.Helper() important when writing test helper functions?
What does t.Setenv() do that manually calling os.Setenv does not?
24. How do you benchmark concurrent code with testing.B and what insights does it provide?

Serial benchmarks (for i := 0; i < b.N; i++) measure single-goroutine throughput. Parallel benchmarks reveal lock contention, cache coherence issues, and true concurrent throughput — critical for shared data structures and handlers.

// Serial benchmark: single goroutine throughput
func BenchmarkMapGet(b *testing.B) {
    m := map[string]int{"key": 1}
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = m["key"]
    }
}

// Parallel benchmark: concurrent throughput + contention
func BenchmarkSyncMapGet(b *testing.B) {
    var m sync.Map
    m.Store("key", 1)

    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() { // pb.Next() is goroutine-safe, replaces i < b.N
            m.Load("key")
        }
    })
}

// Compare mutex-protected map vs sync.Map under contention
type MutexMap struct {
    mu sync.RWMutex
    m  map[string]int
}

func BenchmarkMutexMapVsSyncMap(b *testing.B) {
    b.Run("mutex-map", func(b *testing.B) {
        mm := &MutexMap{m: map[string]int{"k": 1}}
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                mm.mu.RLock()
                _ = mm.m["k"]
                mm.mu.RUnlock()
            }
        })
    })

    b.Run("sync-map", func(b *testing.B) {
        var sm sync.Map
        sm.Store("k", 1)
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() { sm.Load("k") }
        })
    })
}

// Run with multiple parallelism levels:
// go test -bench=BenchmarkMutexMapVsSyncMap -cpu=1,4,8,16
// -cpu controls GOMAXPROCS; shows how performance scales with CPUs

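// Custom metric reporting (inside a benchmark function)
b.SetParallelism(10) // call before b.RunParallel: GOMAXPROCS * 10 goroutines
b.ReportMetric(float64(b.N)/b.Elapsed().Seconds(), "rps")
// contention is a counter the benchmark maintains itself (not shown)
b.ReportMetric(float64(contention)/float64(b.N), "contentions/op")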
What does the '-cpu=1,4,8,16' flag do when running Go benchmarks?
In b.RunParallel, what does pb.Next() do that 'i < b.N' does not?
25. How do you manage dependency injection at scale in a large Go service — wire, dig, or manual?

As a Go service grows beyond a few dependencies, main() becomes a complex wiring function. Three approaches: manual wiring (always readable), Google Wire (code generation), or Uber Dig (reflection-based runtime injection).

// APPROACH 1: Manual wiring in main() — clear, no magic, preferred for < 20 deps
func main() {
    cfg, err := config.Load()
    if err != nil { log.Fatal(err) }

    db, err := database.Open(cfg.Database)
    if err != nil { log.Fatal(err) }

    userRepo  := postgres.NewUserRepository(db)
    emailSvc  := smtp.NewEmailService(cfg.SMTP)
    cacheSvc  := redis.NewCache(cfg.Redis)
    userSvc   := service.NewUserService(userRepo, emailSvc, cacheSvc)
    grpcSrv   := grpc.NewUserServer(userSvc)
    httpSrv   := http.NewServer(cfg.HTTP, userSvc)

    // Start servers...
}

// APPROACH 2: Wire (google/wire) — compile-time code generation
// wire.go
//go:build wireinject
func InitializeApp(cfgPath string) (*App, func(), error) {
    wire.Build(
        config.Provider,      // func LoadConfig() (*Config, error)
        database.Provider,    // func Open(*Config) (*sql.DB, error)
        postgres.NewUserRepository,
        smtp.NewEmailService,
        service.NewUserService,
        NewApp,
    )
    return nil, nil, nil // wire generates the real implementation
}
// Run: wire gen ./...  → generates wire_gen.go

// APPROACH 3: Dig (uber-go/dig) — runtime reflection-based DI
container := dig.New()
container.Provide(config.Load)
container.Provide(database.Open)
container.Provide(postgres.NewUserRepository)
container.Provide(service.NewUserService)
err := container.Invoke(func(svc *service.UserService) {
    // svc is fully constructed with all deps injected
    startServer(svc)
})
if err != nil { log.Fatal(err) } // missing providers only surface here, at runtime

Trade-offs: Manual wiring is the most debuggable (stack traces show actual constructor calls). Wire generates readable code at compile time (fails fast, zero runtime overhead). Dig is the most concise but uses reflection, so a missing provider is a runtime error and the wiring is harder to trace. Many Go teams prefer manual wiring or Wire over Dig for production services.

What is the main advantage of Wire (google/wire) over runtime DI frameworks like Dig?
When does manual dependency injection in main() become problematic?
26. How do you achieve zero-downtime deployments for a Go microservice in Kubernetes?

Zero-downtime deployment means in-flight requests complete before old pods terminate, and new pods are ready before traffic is routed to them. This requires coordination between the Go service and Kubernetes lifecycle hooks.

// Key components:

// 1. Graceful shutdown in the Go service
func main() {
    srv := &http.Server{Addr: ":8080", Handler: router}

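    go func() {
        // ListenAndServe returns ErrServerClosed after Shutdown; anything else is fatal
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Fatalf("listen: %v", err)
        }
    }()

    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, os.Interrupt) // k8s sends SIGTERM; Ctrl+C locally
    <-quit

    // Give in-flight requests time to complete
    ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil { // stop accepting; drain active connections
        log.Printf("shutdown: %v", err)
    }
    db.Close()          // close DB connections cleanly (db is opened elsewhere at startup)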
    log.Println("shutdown complete")
}

// 2. Kubernetes deployment configuration
// deployment.yaml (relevant sections):
// spec.template.spec:
//   # Pod-level setting; must exceed preStop sleep (5s) + shutdown timeout (25s)
//   terminationGracePeriodSeconds: 35
//   containers:
//   - lifecycle:
//       preStop:
//         exec:
//           command: ["sleep", "5"]
//     # SIGTERM is sent AFTER preStop completes

// 3. Rolling update strategy
// strategy:
//   type: RollingUpdate
//   rollingUpdate:
//     maxSurge: 1        # start 1 new pod before killing old
//     maxUnavailable: 0  # never kill old until new is Ready

// 4. Readiness probe — k8s only sends traffic when this returns 200
// readinessProbe:
//   httpGet:
//     path: /readyz
//     port: 8080
//   initialDelaySeconds: 5   # wait for startup
//   periodSeconds: 5
//   failureThreshold: 3

preStop hook timing: when Kubernetes terminates a pod, it starts the container shutdown sequence and removes the pod from the Service endpoints in parallel. Endpoint removal takes a few seconds to propagate to kube-proxy and ingress controllers, so without a delay the pod can still receive new requests after it has begun shutting down. The preStop sleep 5 postpones SIGTERM so that graceful shutdown starts once traffic has largely stopped arriving.

Why is a preStop sleep hook needed in Kubernetes for zero-downtime deployments?
What does 'maxUnavailable: 0' in a Kubernetes RollingUpdate strategy ensure?
27. How do generics in Go 1.18+ enable better system design and what are the trade-offs?

Go generics allow writing type-safe, reusable data structures and algorithms without code duplication or losing type information through interfaces. The key use cases: generic data structures, result/option types, and typed collections.

// Generic Result type — eliminates panic-or-nil patterns
type Result[T any] struct {
    value T
    err   error
}

func Ok[T any](v T) Result[T]    { return Result[T]{value: v} }
func Err[T any](e error) Result[T] { return Result[T]{err: e} }

func (r Result[T]) Unwrap() (T, error) { return r.value, r.err }
func (r Result[T]) IsOk() bool         { return r.err == nil }

// Generic set (unordered, backed by a map)
type Set[T comparable] struct {
    items map[T]struct{}
}

func NewSet[T comparable]() *Set[T] {
    return &Set[T]{items: make(map[T]struct{})}
}
func (s *Set[T]) Add(v T) { s.items[v] = struct{}{} }
func (s *Set[T]) Has(v T) bool { _, ok := s.items[v]; return ok }
func (s *Set[T]) Len() int { return len(s.items) }

// Generic Map/Filter/Reduce — functional pipeline without reflect
func Map[T, U any](slice []T, fn func(T) U) []U {
    result := make([]U, len(slice))
    for i, v := range slice { result[i] = fn(v) }
    return result
}

func Filter[T any](slice []T, pred func(T) bool) []T {
    var result []T
    for _, v := range slice {
        if pred(v) { result = append(result, v) }
    }
    return result
}

// Usage
users := []User{{ID: 1, Active: true}, {ID: 2, Active: false}}
activeIDs := Map(
    Filter(users, func(u User) bool { return u.Active }),
    func(u User) int { return u.ID },
)
// [1]: type-safe, no reflection, no interface boxing (the result slices are still allocated)

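Trade-offs: generics remove duplicated code and interface{} casts, but they add compile time and can make signatures harder to read. The Go compiler uses GC shape stenciling rather than full C++-style monomorphisation: types with the same GC shape (for example, all pointer types) share one instantiation and receive a runtime dictionary. This keeps binaries smaller, but method calls on type parameters can be slower than hand-specialised code. Avoid generics for simple cases: if a plain interface satisfies the requirement, prefer it.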
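The pipeline comment above mentions Reduce, but only Map and Filter are shown. A possible Reduce in the same style might look like the sketch below; the signature is an assumption chosen to match the other two helpers.

```go
package main

import "fmt"

// Reduce folds slice into a single accumulator of type A,
// calling fn once per element in order.
func Reduce[T, A any](slice []T, init A, fn func(A, T) A) A {
	acc := init
	for _, v := range slice {
		acc = fn(acc, v)
	}
	return acc
}

func main() {
	// Same element and accumulator type.
	sum := Reduce([]int{1, 2, 3, 4}, 0, func(acc, v int) int { return acc + v })

	// Different element and accumulator types: total length of all strings.
	chars := Reduce([]string{"go", "rust"}, 0, func(n int, s string) int { return n + len(s) })

	fmt.Println(sum, chars)
}
```

Both type parameters are inferred from the arguments, so call sites stay as short as the Map and Filter calls.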

What is the primary benefit of the generic Map/Filter pattern over using reflect or interface{}?
What Go type constraint must T satisfy to be used as a map key in a generic data structure?
28. How do you use test coverage meaningfully in Go — beyond just a percentage?

Test coverage in Go (go test -cover) reports which statements were executed during tests. It does not report which combinations of conditions were exercised, so 80% coverage can still miss critical paths. Experienced engineers use coverage to find untested branches, not to chase a number.
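To make the gap between the percentage and real test quality concrete, here is a small sketch using a hypothetical `freeShipping` function. Two calls are enough to execute every statement, yet one of the two conditions is never the deciding factor.

```go
package main

import "fmt"

// freeShipping is a hypothetical business rule with a compound condition.
func freeShipping(isMember bool, orderTotal int) bool {
	if isMember || orderTotal > 50 {
		return true
	}
	return false
}

func main() {
	// These two calls alone execute every statement in freeShipping,
	// so a coverage report would show the function as fully covered.
	fmt.Println(freeShipping(true, 0))
	fmt.Println(freeShipping(false, 0))

	// The orderTotal condition and its boundary were never exercised above.
	// A typo such as ">= 50" or "> 500" would go unnoticed without these.
	fmt.Println(freeShipping(false, 50), freeShipping(false, 51))
}
```

The coverage report is a starting point for asking which inputs are missing, which is the role the table-driven test for parsePort plays further down.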

// Run coverage
// go test -cover ./...
// go test -coverprofile=cover.out ./...
// go tool cover -html=cover.out   # visual HTML report
// go tool cover -func=cover.out   # per-function percentages

// Useful: find UNCOVERED branches visually
// Red lines in cover -html are untested paths

// Go's cover tool has no pragma for excluding code from the report.
// Document intentionally untested paths instead of chasing them.
func panicOnInvariantViolation(cond bool, msg string) {
    if !cond {
        // Reaching this line requires a programming error,
        // so it is fine to leave it uncovered
        panic(msg)
    }
}

// Testing edge cases found in coverage report
func parsePort(s string) (int, error) {
    n, err := strconv.Atoi(s)
    if err != nil {
        return 0, fmt.Errorf("port %q is not a number: %w", s, err) // branch 1
    }
    if n < 1 || n > 65535 {
        return 0, fmt.Errorf("port %d out of range [1, 65535]", n)   // branch 2
    }
    return n, nil // branch 3
}

// Table-driven test to hit all branches
func TestParsePort(t *testing.T) {
    tests := []struct {
        input   string
        want    int
        wantErr bool
    }{
        {"8080", 8080, false},    // happy path (branch 3)
        {"abc",  0,    true},     // not a number (branch 1)
        {"0",    0,    true},     // too low (branch 2)
        {"65536", 0,   true},     // too high (branch 2)
        {"1",    1,    false},    // boundary minimum
        {"65535", 65535, false},  // boundary maximum
    }
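    for _, tt := range tests {
        got, err := parsePort(tt.input)
        if (err != nil) != tt.wantErr || got != tt.want {
            t.Errorf("parsePort(%q) = %d, %v; want %d, wantErr=%v",
                tt.input, got, err, tt.want, tt.wantErr)
        }
    }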
}

// CI enforcement (fail the build when total coverage is below 70%):
// go test -coverprofile=cover.out ./...
// go tool cover -func=cover.out | awk '/^total:/ { sub("%", "", $3); if ($3 + 0 < 70) exit 1 }'
What does 'go tool cover -html=cover.out' show that the percentage alone does not?
Why is achieving 100% test coverage not necessarily a good goal?
29. What are the best practices for designing Protocol Buffer schemas in Go microservices?

Protobuf schema design has long-term consequences — once published, breaking changes require coordinated version bumps across all consumers. Good schema design minimises future pain.

// Best practices:

// 1. Use well-known types for common data
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";
import "google/protobuf/wrappers.proto"; // for nullable primitives

message Order {
    string order_id = 1;  // UUIDs as strings, not int64
    google.protobuf.Timestamp created_at = 2; // not int64 unix
    google.protobuf.Duration  processing_time = 3;
    google.protobuf.StringValue discount_code = 4; // nullable string
    repeated OrderItem items = 5;
    OrderStatus status = 6;
}

// 2. Use enums with an UNSPECIFIED zero value
enum OrderStatus {
    ORDER_STATUS_UNSPECIFIED = 0; // default/unknown — must be 0
    ORDER_STATUS_PENDING     = 1;
    ORDER_STATUS_PAID        = 2;
    ORDER_STATUS_SHIPPED     = 3;
    ORDER_STATUS_CANCELLED   = 4;
}

// 3. OneOf for discriminated unions
message PaymentMethod {
    oneof method {
        CreditCard credit_card = 1;
        BankTransfer bank_transfer = 2;
        CryptoCurrency crypto = 3;
    }
}

// 4. Avoid nested types — prefer separate top-level messages
// Bad: message User { message Address { ... } address = 5; }
// Good: message Address { ... }  message User { Address address = 5; }

// 5. Name convention: snake_case fields, PascalCase messages
// 6. Namespace your protos: package company.service.v1;
// 7. One service per .proto file; one message per concern
Why should Protobuf enums always have a zero-value entry named *_UNSPECIFIED?
What is the purpose of 'google.protobuf.StringValue' instead of a plain string field?
30. How do you implement safe retries in Go microservices?

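Retries improve resilience but must be done correctly. Retrying non-idempotent operations (such as a POST that creates a record) without idempotency keys causes duplicate records. Retrying without backoff and jitter synchronises clients into a thundering herd against a recovering service. Retrying without an attempt limit amplifies load and turns a local failure into a cascading one.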

// Safe retry with exponential backoff and jitter
type RetryConfig struct {
    MaxAttempts int
    InitialWait time.Duration
    MaxWait     time.Duration
    Multiplier  float64
}

func Retry(ctx context.Context, cfg RetryConfig, fn func() error) error {
    wait := cfg.InitialWait
    for attempt := 1; ; attempt++ {
        err := fn()
        if err == nil { return nil }

        // Don't retry permanent errors
        if !isRetryable(err) { return err }

        if attempt >= cfg.MaxAttempts {
            return fmt.Errorf("after %d attempts: %w", attempt, err)
        }

        // Exponential backoff with jitter (rand.Int63n panics when n <= 0)
        sleep := wait
        if half := int64(wait / 2); half > 0 {
            sleep += time.Duration(rand.Int63n(half))
        }
        if sleep > cfg.MaxWait { sleep = cfg.MaxWait }

        select {
        case <-time.After(sleep):
        case <-ctx.Done():
            return fmt.Errorf("retry cancelled: %w", ctx.Err())
        }
        wait = time.Duration(float64(wait) * cfg.Multiplier)
    }
}

func isRetryable(err error) bool {
    // Only retry transient errors
    code := status.Code(err)
    return code == codes.Unavailable ||
        code == codes.DeadlineExceeded ||
        code == codes.ResourceExhausted
}

// Idempotency key for non-idempotent operations
func (s *userServiceServer) CreateUser(
    ctx context.Context, req *pb.CreateUserRequest,
) (*pb.User, error) {
    // Extract idempotency key from metadata
    md, _ := metadata.FromIncomingContext(ctx)
    idempKey := md.Get("idempotency-key")

    // Check if we've seen this key before
    if len(idempKey) > 0 {
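        if cached, err := s.cache.Get(ctx, "idemp:"+idempKey[0]); err == nil {
            var existing pb.User
            if err := proto.Unmarshal(cached, &existing); err != nil {
                return nil, status.Errorf(codes.Internal, "corrupt idempotency record: %v", err)
            }
            return &existing, nil // return cached response
        }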
    }
    // ... create user, cache result with idempotency key
}
Why should retries use exponential backoff with jitter instead of fixed intervals?
What is an idempotency key and why is it necessary for retries on mutating operations?
31. What are golden file tests in Go and when should you use them?

Golden file tests compare output against a saved reference file. They are ideal for testing complex output (JSON API responses, generated SQL, HTML) where manually writing the expected value in code is tedious and error-prone.

import "github.com/sebdah/goldie/v2"

// Golden file test — output compared against testdata/*.golden
func TestUserToJSON(t *testing.T) {
    // goldie registers its own -update flag: go test -update rewrites the fixtures
    g := goldie.New(t,
        goldie.WithFixtureDir("testdata"),
    )

    user := User{
        ID:        1,
        Name:      "Alice",
        CreatedAt: time.Date(2024, 1, 15, 0, 0, 0, 0, time.UTC),
    }
    data, err := json.MarshalIndent(user, "", "  ")
    if err != nil { t.Fatal(err) }

    // Compares with testdata/TestUserToJSON.golden
    g.Assert(t, "TestUserToJSON", data)
}

// Manual golden file implementation
func assertGolden(t *testing.T, name string, got []byte) {
    t.Helper()
    path := filepath.Join("testdata", name+".golden")

    if *update {
        // go test -run TestUserToJSON -args -update
        if err := os.MkdirAll("testdata", 0o755); err != nil { t.Fatal(err) }
        if err := os.WriteFile(path, got, 0o644); err != nil { t.Fatal(err) }
        return
    }

    want, err := os.ReadFile(path)
    if err != nil {
        t.Fatalf("golden file missing: %s (run with -args -update to create)", path)
    }
    if !bytes.Equal(got, want) {
        diff := diffStrings(string(want), string(got))
        t.Errorf("golden file mismatch:\n%s", diff)
    }
}

var update = flag.Bool("update", false, "update golden files")

// Commit testdata/*.golden files to version control
// Update when intentional output changes:
// go test -run TestUserToJSON -args -update
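The manual implementation above calls a diffStrings helper that is not shown. A very small stand-in could report only the first differing line, as in this sketch; `firstDiff` is a hypothetical name, and a real project would more likely use a diff library.

```go
package main

import (
	"fmt"
	"strings"
)

// firstDiff returns a description of the first line where want and got
// differ, or "" when the two strings are identical.
func firstDiff(want, got string) string {
	w := strings.Split(want, "\n")
	g := strings.Split(got, "\n")
	for i := 0; i < len(w) || i < len(g); i++ {
		// Quote real lines so a missing line cannot be confused with an empty one.
		wl, gl := "<no line>", "<no line>"
		if i < len(w) {
			wl = fmt.Sprintf("%q", w[i])
		}
		if i < len(g) {
			gl = fmt.Sprintf("%q", g[i])
		}
		if wl != gl {
			return fmt.Sprintf("line %d: want %s, got %s", i+1, wl, gl)
		}
	}
	return ""
}

func main() {
	want := "{\n  \"id\": 1,\n  \"name\": \"Alice\"\n}"
	got := "{\n  \"id\": 1,\n  \"name\": \"Bob\"\n}"
	fmt.Println(firstDiff(want, got))
}
```

Quoting the lines also makes trailing whitespace and a missing final newline visible, which are common causes of golden file mismatches.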
When are golden file tests most useful?
How do you update golden files when the output intentionally changes?
32. How do you ensure data consistency across Go microservices without distributed transactions?

Distributed transactions (2PC) are generally avoided in microservices — they couple services and create availability issues. The alternative is eventual consistency through careful design: idempotent consumers, compensating transactions, and the Outbox pattern.

// Pattern: optimistic locking with a version field (rejects stale writes instead of last-write-wins)
type User struct {
    ID      int
    Name    string
    Email   string
    Version int // incremented on each update
}

func (r *UserRepo) UpdateOptimistic(
    ctx context.Context, user *User,
) error {
    result, err := r.db.ExecContext(ctx,
        `UPDATE users
         SET name=$1, email=$2, version=version+1
         WHERE id=$3 AND version=$4`,
        user.Name, user.Email, user.ID, user.Version,
    )
    if err != nil { return err }
    n, err := result.RowsAffected()
    if err != nil { return err }
    if n == 0 {
        return ErrConflict // another update happened concurrently
    }
    user.Version++ // reflect new version
    return nil
}

// Pattern: idempotent event handler with deduplication
type EventHandler struct {
    db    *sql.DB
    dedup *DedupeCache
}

func (h *EventHandler) Handle(ctx context.Context, event Event) error {
    // Use event ID as idempotency key
    if already, _ := h.dedup.Check(ctx, event.ID); already {
        return nil // already processed — safe to ack
    }

    tx, err := h.db.BeginTx(ctx, nil)
    if err != nil { return err }
    defer tx.Rollback()

    // Process the event
    if err := h.applyEvent(ctx, tx, event); err != nil {
        return fmt.Errorf("apply event: %w", err)
    }

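    // Mark event as processed within the same transaction.
    // Assuming a unique constraint on id, a concurrent duplicate fails here and rolls back.
    if _, err := tx.ExecContext(ctx,
        "INSERT INTO processed_events (id) VALUES ($1)", event.ID); err != nil {
        return fmt.Errorf("mark processed: %w", err)
    }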

    if err := tx.Commit(); err != nil {
        return fmt.Errorf("commit: %w", err)
    }
    h.dedup.Set(ctx, event.ID) // populate cache
    return nil
}
What is optimistic locking and when does it fail?
How does storing processed event IDs in the same transaction as the event handler work ensure idempotency?
33. What is the API Gateway pattern and how does it complement Go microservices?

An API Gateway is a single entry point for all client requests. It handles cross-cutting concerns — auth, rate limiting, routing, SSL termination, response aggregation — so individual services don't need to implement them.

// Lightweight Go API gateway (simplified)
type Gateway struct {
    routes map[string]RouteConfig
    auth   *JWTValidator
    limit  *RateLimiter
    log    *slog.Logger
}

type RouteConfig struct {
    Target      string   // upstream URL
    RequireAuth bool
    RateLimit   int      // requests per second
    Methods     []string // allowed HTTP methods
    StripPrefix string
}

func (g *Gateway) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    route, ok := g.matchRoute(r.URL.Path)
    if !ok {
        http.Error(w, "not found", 404); return
    }

    // Auth (before rate limiting to save rate limit quota for auth users)
    if route.RequireAuth {
        if _, err := g.auth.Validate(r); err != nil {
            http.Error(w, "unauthorized", 401); return
        }
    }

    // Rate limiting
    if !g.limit.Allow(clientIP(r)) {
        http.Error(w, "rate limited", 429); return
    }

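    // Proxy to upstream (simplified: a real gateway builds one ReverseProxy
    // per route at startup and reuses it instead of allocating one per request)
    proxy := httputil.NewSingleHostReverseProxy(mustParseURL(route.Target))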
    if route.StripPrefix != "" {
        r.URL.Path = strings.TrimPrefix(r.URL.Path, route.StripPrefix)
    }
    proxy.ServeHTTP(w, r)
}

// In practice: use Kong, Envoy, or AWS API Gateway
// Go excels at writing custom gateway logic as plugins:
// - Kong plugins in Go (go-pdk)
// - Envoy WASM filters in Go (tetratelabs/proxy-wasm-go-sdk)
// - Custom gateway in Go using httputil.ReverseProxy
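The gateway above relies on a matchRoute method that is not shown. One simple way to implement it is longest-prefix matching, sketched here as a free function over the routes map. The cut-down `RouteConfig`, the upstream URLs, and the convention that prefixes end in "/" are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// RouteConfig is reduced to the one field this sketch needs.
type RouteConfig struct {
	Target string // upstream URL
}

// matchRoute returns the route whose prefix is the longest match for path.
func matchRoute(routes map[string]RouteConfig, path string) (RouteConfig, bool) {
	best, found := "", false
	for prefix := range routes {
		if strings.HasPrefix(path, prefix) && (!found || len(prefix) > len(best)) {
			best, found = prefix, true
		}
	}
	return routes[best], found
}

func main() {
	routes := map[string]RouteConfig{
		"/api/":       {Target: "http://monolith:8080"},
		"/api/users/": {Target: "http://users:8080"},
	}
	for _, p := range []string{"/api/users/1", "/api/orders/7", "/health"} {
		route, ok := matchRoute(routes, p)
		fmt.Println(p, ok, route.Target)
	}
}
```

Longest-prefix matching lets a specific service take over one path subtree while everything else under the shared prefix still goes to the default upstream.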

BFF (Backend for Frontend) pattern: a specialised API gateway for each type of client — mobile BFF, web BFF. Each BFF aggregates calls to multiple services and returns exactly the data its client needs (graph-like queries without GraphQL's complexity). Often written in Go for performance.

What is the primary benefit of an API Gateway in a microservices architecture?
What is the Backend for Frontend (BFF) pattern?
34. What memory leak patterns in Go are not goroutine leaks and how do you detect them?

Go has several memory leak patterns beyond leaked goroutines: long-lived caches without eviction, global maps that grow without bounds, slice backing arrays held by small sub-slices, and finalizers that delay GC.

// LEAK 1: growing global map without eviction
var requestMetrics = map[string]int64{} // grows forever with new paths
// Fix: use a bounded structure (LRU cache, TTL cache)
// cache.NewLRU stands in for any bounded cache implementation (for example an LRU library)
var metricsCache = cache.NewLRU[string, int64](1000) // bounded

// LEAK 2: slice sub-slice retaining large backing array
func getHeader(data []byte) []byte {
    return data[:8]  // BAD: keeps ALL of 'data' alive
}
func getHeaderSafe(data []byte) []byte {
    header := make([]byte, 8)
    copy(header, data[:8]) // GOOD: independent allocation
    return header
}

// LEAK 3: time.Ticker without Stop()
// Before Go 1.23 an unstopped Ticker was never garbage collected, and a
// goroutine ranging over ticker.C blocks forever. Always defer ticker.Stop().

// LEAK 4: cgo-allocated memory (outside GC's purview)
// Must manually call C.free() for C.CString(), C.malloc(), etc.

// DETECTION:
// A. runtime.ReadMemStats — watch HeapAlloc over time
func monitorHeap(ctx context.Context) {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done(): return
        case <-ticker.C:
            var stats runtime.MemStats
            runtime.ReadMemStats(&stats)
            slog.Info("heap",
                "alloc_mb",    stats.HeapAlloc / (1 << 20),
                "sys_mb",      stats.HeapSys / (1 << 20),
                "num_gc",      stats.NumGC,
                "goroutines",  runtime.NumGoroutine(),
            )
        }
    }
}

// B. pprof heap profile comparison
// curl http://localhost:6060/debug/pprof/heap > heap1.out
// # wait a while
// curl http://localhost:6060/debug/pprof/heap > heap2.out
// go tool pprof -base heap1.out heap2.out
// > top10  # shows objects that grew between snapshots
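The sub-slice leak (LEAK 2) is easy to observe through cap. In the sketch below, `headerView` and `headerCopy` are hypothetical names for the two variants shown earlier: the view still spans the whole buffer, while the copy owns only 8 bytes.

```go
package main

import "fmt"

// headerView returns a sub-slice that shares data's backing array.
func headerView(data []byte) []byte { return data[:8] }

// headerCopy returns an independent 8-byte slice.
func headerCopy(data []byte) []byte {
	h := make([]byte, 8)
	copy(h, data[:8])
	return h
}

func main() {
	data := make([]byte, 1<<20) // 1 MiB buffer

	view := headerView(data)
	cp := headerCopy(data)
	fmt.Println("view: len", len(view), "cap", cap(view))
	fmt.Println("copy: len", len(cp), "cap", cap(cp))

	// Writes through the original buffer show up in the view but not in the copy.
	data[0] = 0xFF
	fmt.Println(view[0], cp[0])
}
```

A large cap on a small slice is a hint that it pins a big allocation. A three-index slice such as data[:8:8] limits cap but still keeps the whole backing array alive, so copying is what actually releases the memory.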
How do you detect that a sub-slice is holding an unexpectedly large backing array in memory?
What is the correct way to prevent a small sub-slice from holding a large backing array alive?
35. How do CQRS and event sourcing apply to Go microservice architecture?

CQRS (Command Query Responsibility Segregation) separates reads and writes into separate models. Event Sourcing stores every state change as an event — the current state is derived by replaying events. Both patterns appear in high-scale Go systems.

// CQRS: separate command and query handlers
// Command: mutates state, returns error
type CreateOrderCommand struct {
    UserID    string
    ProductID string
    Quantity  int
}

type OrderCommandHandler struct{ eventStore EventStore }

func (h *OrderCommandHandler) Handle(
    ctx context.Context, cmd CreateOrderCommand,
) error {
    event := OrderCreatedEvent{
        OrderID:   uuid.New().String(),
        UserID:    cmd.UserID,
        ProductID: cmd.ProductID,
        Quantity:  cmd.Quantity,
        CreatedAt: time.Now(),
    }
    return h.eventStore.Append(ctx, event.OrderID, event)
}

// Query: reads from a projection (read model)
type OrderQueryHandler struct{ readDB *sql.DB }

func (h *OrderQueryHandler) GetOrder(
    ctx context.Context, orderID string,
) (*OrderView, error) {
    // Read from denormalised view optimised for queries
    row := h.readDB.QueryRowContext(ctx,
        "SELECT id, status, total, user_name FROM order_views WHERE id=$1",
        orderID)
    var view OrderView
    if err := row.Scan(&view.ID, &view.Status, &view.Total, &view.UserName); err != nil {
        return nil, err // sql.ErrNoRows when the order is unknown
    }
    return &view, nil
}

// Event Sourcing: replay events to reconstruct state
type Order struct {
    ID      string
    Status  string
    Items   []OrderItem
    version int
}

func ReplayOrder(ctx context.Context, store EventStore, id string) (*Order, error) {
    events, err := store.Load(ctx, id)
    if err != nil { return nil, err }

    order := &Order{ID: id}
    for _, event := range events {
        order.apply(event)   // deterministically mutate state
        order.version++
    }
    return order, nil
}
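ReplayOrder depends on an apply method that is not shown. A minimal sketch of how such a method might look follows, using a type switch over hypothetical event types and an in-memory event slice in place of the EventStore.

```go
package main

import "fmt"

// Hypothetical event types for illustration.
type OrderCreated struct{ OrderID string }
type OrderPaid struct{}
type OrderShipped struct{}

type Order struct {
	ID      string
	Status  string
	Version int
}

// apply mutates the order deterministically: the same events in the
// same sequence always produce the same state.
func (o *Order) apply(event any) {
	switch ev := event.(type) {
	case OrderCreated:
		o.ID, o.Status = ev.OrderID, "pending"
	case OrderPaid:
		o.Status = "paid"
	case OrderShipped:
		o.Status = "shipped"
	}
	o.Version++
}

// replay rebuilds an order from its full event history.
func replay(events []any) *Order {
	o := &Order{}
	for _, e := range events {
		o.apply(e)
	}
	return o
}

func main() {
	history := []any{OrderCreated{OrderID: "o-1"}, OrderPaid{}, OrderShipped{}}
	fmt.Printf("%+v\n", *replay(history))
	fmt.Printf("%+v\n", *replay(history[:2])) // state as of the second event
}
```

Replaying a prefix of the history gives the state at an earlier point in time, which is what makes audit and debugging queries cheap in an event-sourced design.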
What is the main benefit of event sourcing over traditional state storage?
In CQRS, why do the read and write models use different data stores or schemas?
36. What is chaos engineering and how do Go teams apply it to test microservice resilience?

Chaos engineering deliberately injects failures into a running system to discover weaknesses before they cause production outages. Go services are validated against: network failures, slow dependencies, pod restarts, and resource exhaustion.

// Chaos testing in Go: inject failures in tests

// Fault injection via interface
type FaultInjector struct {
    next     UserRepository
    failRate float64 // 0.0 to 1.0
    latency  time.Duration
}

func (f *FaultInjector) FindByID(ctx context.Context, id int) (*User, error) {
    // Inject artificial latency
    if f.latency > 0 {
        select {
        case <-time.After(f.latency):
        case <-ctx.Done(): return nil, ctx.Err()
        }
    }
    // Inject random failures
    if rand.Float64() < f.failRate {
        return nil, errors.New("injected fault: database unavailable")
    }
    return f.next.FindByID(ctx, id)
}

// Test service behaviour under 50% failure rate
func TestServiceUnderFaults(t *testing.T) {
    repo := &fakeUserRepo{users: testUsers}
    faulty := &FaultInjector{
        next:     repo,
        failRate: 0.5,
        latency:  100 * time.Millisecond,
    }
    svc := NewUserService(faulty)

    // Test that service handles partial failures gracefully
    successCount := 0
    for i := 0; i < 100; i++ {
        user, err := svc.GetUserWithFallback(context.Background(), 1)
        if err == nil && user != nil { successCount++ }
    }
    // With a 50% fault rate and a working fallback, expect at least 90 of 100 calls to succeed
    if successCount < 90 {
        t.Errorf("success rate %d%% too low with fallback", successCount)
    }
}

// Tools for chaos in production:
// - Chaos Monkey (Netflix): terminates random instances
// - Litmus (CNCF) — k8s-native chaos experiments
// - Gremlin — cloud chaos-as-a-service
// - k6 + fault injection scenarios
What is the primary goal of chaos engineering?
Why is testing with a FaultInjector in unit tests valuable before doing production chaos experiments?
37. What is contract testing and how does it apply to Go microservices?

Contract testing verifies that services honour their agreed API contracts without requiring a full integration test environment. In a microservices system, consumer-driven contract testing (Pact) lets service consumers define the API shape they expect, and providers verify they match.

// Pact consumer test (user-service-client)
// Defines what the consumer expects from the user service
func TestUserServiceContract_GetUser(t *testing.T) {
    // Define the expected interaction
    pact := dsl.Pact{
        Consumer: "order-service",
        Provider: "user-service",
    }
    defer pact.Teardown()

    pact.
        AddInteraction().
        Given("user 1 exists").
        UponReceiving("a request for user 1").
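        WithRequest(dsl.Request{
            Method:  "GET",
            Path:    dsl.String("/users/1"), // path and header values are matchers, not plain strings
            Headers: dsl.MapMatcher{"Accept": dsl.String("application/json")},
        }).
        WillRespondWith(dsl.Response{
            Status: 200,
            Body: dsl.Match(User{}), // matches structure, not values
        })

    if err := pact.Verify(func() error {
        // The mock server listens on a local port chosen by Pact
        client := NewUserClient(fmt.Sprintf("http://localhost:%d", pact.Server.Port))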
        user, err := client.GetUser(context.Background(), 1)
        if err != nil { return err }
        if user.ID != 1 { return fmt.Errorf("expected ID 1, got %d", user.ID) }
        return nil
    }); err != nil {
        t.Fatal(err)
    }
}

// Protobuf contracts are already self-documenting
// For gRPC: test that the proto file matches what clients use
// Use protoc-gen-validate for field-level validation in the schema
message CreateUserRequest {
    string email = 1 [(validate.rules).string.email = true];
    string name  = 2 [(validate.rules).string = {min_len: 1, max_len: 100}];
}

gRPC contract testing: Protobuf provides the contract itself. The pattern is different — ensure the generated Go code from the consumer's proto copy matches the provider's. Tools like buf breaking detect breaking changes in .proto files, acting as automated contract enforcement.

What is the key difference between contract testing and integration testing?
How does Protobuf help with contract testing in gRPC services?
38. What makes a Go microservice horizontally scalable and what patterns break scaling?

A horizontally scalable service can handle more load by adding replicas — each replica is identical and stateless. Go services are well-suited to horizontal scaling due to small memory footprint, but certain patterns break scalability.

Scalable vs Non-Scalable Patterns

| Pattern | Scalable? | Problem / Fix |
| --- | --- | --- |
| Global in-memory cache (map/slice) | NO | Each replica has its own cache, so reads are inconsistent. Fix: use Redis or Memcached |
| In-memory session store | NO | Sessions are lost on pod restart. Fix: Redis-backed sessions |
| Background goroutine per replica | Caution | N replicas run N workers, which duplicates work. Fix: use a job queue with exactly-one semantics |
| Stateless request handler | YES | Each request fully handled without shared state |
| Externally stored state (DB, cache) | YES | State survives restarts; any replica can handle any request |
| Distributed locks (Redis SETNX) | YES | Enables exactly-one execution across replicas for cron jobs |
| Idempotent API | YES | Clients safely retry on any replica |

// ANTI-PATTERN: in-memory state that breaks horizontal scaling
var sessionStore = map[string]Session{} // lost on pod restart!

// CORRECT: external session storage
type SessionStore struct{ redis *redis.Client }

func (s *SessionStore) Get(ctx context.Context, id string) (*Session, error) {
    data, err := s.redis.Get(ctx, "sess:"+id).Bytes()
    if err != nil { return nil, err }
    var sess Session
    if err := json.Unmarshal(data, &sess); err != nil { return nil, err }
    return &sess, nil
}

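// Distributed cron: only one replica should run a job
func runIfLeader(ctx context.Context, rdb *redis.Client, jobFn func()) {
    key := "job:lock:daily-report"
    token := uuid.New().String() // identifies this replica's lock
    acquired, err := rdb.SetNX(ctx, key, token, 5*time.Minute).Result()
    if err != nil || !acquired {
        return // another replica has the lock
    }
    // Release only if we still own the lock: if the job outlives the TTL,
    // a plain DEL would remove a lock acquired by another replica
    defer rdb.Eval(ctx,
        `if redis.call("GET", KEYS[1]) == ARGV[1] then return redis.call("DEL", KEYS[1]) end return 0`,
        []string{key}, token)
    jobFn()
}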
Why does storing session data in a Go service's in-memory map break horizontal scaling?
What pattern enables exactly-one execution of a cron job across multiple replicas?
39. How do you implement configuration hot-reloading in a Go service without restart?

Some configuration changes — feature flags, rate limits, log levels — should not require a pod restart. Hot-reloading reads new config from a file or config service and atomically swaps the configuration pointer.

// Atomic configuration pointer — reads and writes are goroutine-safe
type DynamicConfig struct {
    current atomic.Pointer[Config]
}

func NewDynamicConfig(initial *Config) *DynamicConfig {
    dc := &DynamicConfig{}
    dc.current.Store(initial)
    return dc
}

// Get returns the current config — zero allocation, lock-free
func (dc *DynamicConfig) Get() *Config {
    return dc.current.Load()
}

// Reload atomically swaps to a new config
func (dc *DynamicConfig) Reload(path string) error {
    data, err := os.ReadFile(path)
    if err != nil { return err }
    var cfg Config
    if err := yaml.Unmarshal(data, &cfg); err != nil { return err }
    if err := cfg.Validate(); err != nil { return err }
    dc.current.Store(&cfg) // atomic swap
    slog.Info("config reloaded", "path", path)
    return nil
}

// SIGHUP-triggered reload
func watchConfigSignal(dc *DynamicConfig, path string) {
    sighup := make(chan os.Signal, 1)
    signal.Notify(sighup, syscall.SIGHUP)
    for range sighup {
        if err := dc.Reload(path); err != nil {
            slog.Error("config reload failed", "err", err)
            // Keep running with old config
        }
    }
}

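// File watcher approach
func watchConfigFile(ctx context.Context, dc *DynamicConfig, path string) {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        slog.Error("config watcher unavailable", "err", err)
        return
    }
    defer watcher.Close()
    if err := watcher.Add(path); err != nil {
        slog.Error("cannot watch config", "path", path, "err", err)
        return
    }
    for {
        select {
        case <-ctx.Done(): return
        case event, ok := <-watcher.Events:
            if !ok { return } // watcher closed
            if event.Has(fsnotify.Write) {
                time.Sleep(100 * time.Millisecond) // wait for write to complete
                if err := dc.Reload(path); err != nil {
                    slog.Error("config reload failed", "err", err) // keep old config
                }
            }
        }
    }
}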
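The validate-then-swap behaviour can be exercised without files or signals. The sketch below is a cut-down variant of DynamicConfig that parses JSON from memory (an assumption made to keep it standard-library only; the original reads YAML from disk) and shows that a rejected config leaves the previous one in place. The `Config` fields and `ReloadBytes` are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync/atomic"
)

// Config fields are hypothetical examples of hot-reloadable settings.
type Config struct {
	RateLimit int    `json:"rate_limit"`
	LogLevel  string `json:"log_level"`
}

func (c *Config) Validate() error {
	if c.RateLimit <= 0 {
		return errors.New("rate_limit must be positive")
	}
	return nil
}

type DynamicConfig struct {
	current atomic.Pointer[Config]
}

func (dc *DynamicConfig) Get() *Config { return dc.current.Load() }

// ReloadBytes parses and validates first, and swaps the pointer only on success.
func (dc *DynamicConfig) ReloadBytes(data []byte) error {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return err
	}
	if err := cfg.Validate(); err != nil {
		return err
	}
	dc.current.Store(&cfg)
	return nil
}

func main() {
	dc := &DynamicConfig{}
	dc.current.Store(&Config{RateLimit: 100, LogLevel: "info"})

	err := dc.ReloadBytes([]byte(`{"rate_limit": 0}`))
	fmt.Println("bad reload:", err, "| still using", dc.Get().RateLimit)

	err = dc.ReloadBytes([]byte(`{"rate_limit": 250, "log_level": "debug"}`))
	fmt.Println("good reload:", err, "| now using", dc.Get().RateLimit)
}
```

Readers that called Get before the swap keep a pointer to the old Config, so each Config value should be treated as immutable once stored.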
Why is atomic.Pointer used for hot-reloading configuration instead of a sync.RWMutex?
Why should new configuration be validated before calling dc.current.Store()?
40. How do you architect Go services for maximum testability at the package level?

Testable architecture is not an afterthought — it follows from correct dependency direction. The key principle: business logic in inner packages depends on abstractions (interfaces), not on concrete infrastructure. This enables unit testing without a database or network.

// Layered architecture with interfaces at boundaries

// domain/user.go — inner layer, no infrastructure imports
package domain

type UserRepository interface {
    FindByID(ctx context.Context, id int) (*User, error)
    Save(ctx context.Context, u *User) error
}

type EmailSender interface {
    Send(ctx context.Context, to, subject, body string) error
}

type UserService struct {
    repo  UserRepository
    email EmailSender
}

// All business logic here — easily unit testable
func (s *UserService) Register(ctx context.Context, req RegisterRequest) (*User, error) {
    // validation, business rules — no DB/network calls directly
    if err := validateEmail(req.Email); err != nil {
        return nil, fmt.Errorf("invalid email: %w", err)
    }
    u := &User{Email: req.Email, Name: req.Name}
    if err := s.repo.Save(ctx, u); err != nil {
        return nil, fmt.Errorf("saving user: %w", err)
    }
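    // A failed welcome email must not fail registration; log and continue
    if err := s.email.Send(ctx, u.Email, "Welcome!", "Thanks for joining."); err != nil {
        slog.Warn("welcome email failed", "err", err)
    }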
    return u, nil
}

// domain/user_test.go — fast unit test, no DB or network
func TestUserService_Register(t *testing.T) {
    tests := []struct {
        name    string
        email   string
        repoErr error
        wantErr bool
    }{
        {"valid", "alice@example.com", nil, false},
        {"invalid email", "notanemail", nil, true},
        {"db error", "bob@example.com", errors.New("db down"), true},
    }
    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            svc := &UserService{
                repo:  &stubRepo{saveErr: tt.repoErr},
                email: &noopEmailer{},
            }
            _, err := svc.Register(context.Background(),
                RegisterRequest{Email: tt.email})
            if (err != nil) != tt.wantErr {
                t.Errorf("got err=%v, wantErr=%v", err, tt.wantErr)
            }
        })
    }
}
What is the dependency rule in clean/layered architecture?
Why does placing interfaces in the domain package (consumer) rather than the infrastructure package (producer) improve testability?
41. How do you implement feature flags and canary deployments in a Go microservice?

Feature flags decouple deployment from release — code is deployed to all servers but activated for a subset of users or traffic. Canary deployments route a small percentage of traffic to a new version, monitoring for errors before full rollout.

// Feature flag implementation
type FeatureFlags struct {
    mu    sync.RWMutex
    flags map[string]FlagConfig
}

type FlagConfig struct {
    Enabled    bool
    Percentage int    // 0-100: % of users who see the feature
    Allowlist  []string // specific user IDs
}

func (f *FeatureFlags) IsEnabled(flagName, userID string) bool {
    f.mu.RLock()
    cfg, ok := f.flags[flagName]
    f.mu.RUnlock()
    if !ok || !cfg.Enabled { return false }

    // Always-on for allowlisted users (internal testing)
    for _, id := range cfg.Allowlist {
        if id == userID { return true }
    }

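    // Percentage rollout: deterministic based on flag name + user ID.
    // The same user always gets the same experience for a given flag, and
    // including the flag name keeps different flags from targeting the same users.
    h := fnv32(flagName+":"+userID) % 100
    return int(h) < cfg.Percentage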
}

// In handler
func userHandler(w http.ResponseWriter, r *http.Request) {
    claims, _ := claimsFromCtx(r.Context())
    if flags.IsEnabled("new-profile-ui", claims.UserID) {
        renderNewProfile(w, r)
        return
    }
    renderLegacyProfile(w, r)
}

// Kubernetes canary via Argo Rollouts
// spec.strategy.canary:
//   steps:
//   - setWeight: 10    # 10% traffic to new version
//   - pause: {duration: 5m}
//   - analysis:        # check error rate
//       templates: [{templateName: error-rate-analysis}]
//   - setWeight: 50    # 50% if analysis passed
//   - pause: {duration: 10m}
//   - setWeight: 100   # full rollout
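The percentage rollout in IsEnabled depends on an fnv32 helper that is not shown. A minimal sketch using the standard hash/fnv package follows; `bucket` and `inRollout` are hypothetical names, and hashing the flag name together with the user ID is a design assumption so that different flags select different user subsets.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucket maps a flag/user pair to a stable value in [0, 100).
func bucket(flagName, userID string) int {
	h := fnv.New32a()
	h.Write([]byte(flagName + ":" + userID)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % 100)
}

// inRollout reports whether the user falls inside the rollout percentage.
func inRollout(flagName, userID string, percentage int) bool {
	return bucket(flagName, userID) < percentage
}

func main() {
	for _, user := range []string{"user-1", "user-2", "user-3"} {
		fmt.Println(user,
			"bucket:", bucket("new-profile-ui", user),
			"in 25% rollout:", inRollout("new-profile-ui", user, 25))
	}
}
```

Since each user's bucket is fixed, raising the percentage only adds users: anyone who saw the feature at 10% still sees it at 50%.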
Why is percentage rollout based on a hash of the user ID instead of a random number?
What is the difference between a feature flag and a canary deployment?
42. How do you design a multi-tenant Go microservice?

Multi-tenancy means one service instance serves multiple customers (tenants) with data isolation. Three common isolation models: shared database, schema per tenant, database per tenant. The choice depends on isolation requirements, scaling needs, and cost.

// Tenant context extracted from JWT or header
type TenantID string
type tenantKey struct{}

func withTenant(ctx context.Context, id TenantID) context.Context {
    return context.WithValue(ctx, tenantKey{}, id)
}

func tenantFromCtx(ctx context.Context) (TenantID, bool) {
    id, ok := ctx.Value(tenantKey{}).(TenantID)
    return id, ok
}

// Middleware: extract and validate tenant from JWT
func tenantMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
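        claims, err := validateJWT(r.Header.Get("Authorization"))
        if err != nil {
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        // Fail closed: a valid token without a tenant must not reach tenant-scoped handlers
        if claims.TenantID == "" {
            http.Error(w, "forbidden", http.StatusForbidden)
            return
        }

        ctx := withTenant(r.Context(), TenantID(claims.TenantID))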
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

// Shared database with tenant_id column (row-level isolation)
type TenantAwareRepo struct{ db *sql.DB }

func (r *TenantAwareRepo) FindUsers(ctx context.Context) ([]User, error) {
    tenantID, ok := tenantFromCtx(ctx)
    if !ok { return nil, errors.New("no tenant in context") }

    rows, err := r.db.QueryContext(ctx,
        // tenant_id filter on every query — NEVER omit this
        "SELECT id, name FROM users WHERE tenant_id = $1",
        tenantID)
    if err != nil { return nil, err }
    defer rows.Close()
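    // Scan rows; assumes User has ID and Name fields matching the selected columns
    var users []User
    for rows.Next() {
        var u User
        if err := rows.Scan(&u.ID, &u.Name); err != nil { return nil, err }
        users = append(users, u)
    }
    return users, rows.Err() // rows.Err reports an error that ended iteration early
}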

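// Postgres Row Level Security (RLS): the database itself enforces tenant isolation
// ALTER TABLE users ENABLE ROW LEVEL SECURITY;
// ALTER TABLE users FORCE ROW LEVEL SECURITY; -- otherwise the table owner bypasses policies
// CREATE POLICY tenant_isolation ON users
//     USING (tenant_id = current_setting('app.tenant_id')::uuid);
// BEGIN;
// SET LOCAL app.tenant_id = '123e4567-...'; -- transaction-scoped, reset at COMMIT/ROLLBACK
// COMMIT;
// A plain SET lasts for the whole session, so on a pooled connection it can
// leak into the next tenant's request. Prefer SET LOCAL inside a transaction.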
What is the main risk of the shared-database multi-tenant model if not implemented carefully?
What does PostgreSQL Row Level Security (RLS) provide in a multi-tenant system?
43. What is mutation testing and how does it evaluate test suite quality beyond coverage?

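Mutation testing evaluates whether your tests actually catch bugs. A tool automatically introduces small changes (mutations) to the source code, such as changing > to >=, and reruns the test suite against each mutant. If at least one test fails, the mutant is killed; if every test still passes, the mutant survives, which exposes a gap in the assertions that line coverage cannot show.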

// The function under test
func isEligible(age int, hasLicense bool) bool {
    return age >= 18 && hasLicense
}

// Weak test — 100% coverage but misses mutations
func TestIsEligible_Weak(t *testing.T) {
    if !isEligible(20, true) {
        t.Error("expected eligible")
    }
    if isEligible(15, true) {
        t.Error("expected ineligible")
    }
}
// Mutation: change age >= 18 to age > 18
// isEligible(18, true) should return true, but weak test doesn't check 18
// → mutation SURVIVES → test is weak at the boundary

// Strong test — catches boundary mutations
func TestIsEligible_Strong(t *testing.T) {
    tests := []struct {
        age     int
        license bool
        want    bool
    }{
        {20, true,  true},   // clearly eligible
        {17, true,  false},  // underage
        {18, true,  true},   // boundary: exactly eligible
        {18, false, false},  // boundary: no license
        {19, false, false},  // old enough but no license
        {0,  false, false},  // both missing
    }
    for _, tt := range tests {
        t.Run(fmt.Sprintf("age=%d,license=%v", tt.age, tt.license), func(t *testing.T) {
            if got := isEligible(tt.age, tt.license); got != tt.want {
                t.Errorf("isEligible(%d, %v) = %v, want %v",
                    tt.age, tt.license, got, tt.want)
            }
        })
    }
}

// Go mutation testing tools:
// - go-mutesting (zimmski/go-mutesting)
// - gremlins (go-gremlins/gremlins)
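To make "killed" and "survives" concrete, the sketch below applies the >= to > mutation by hand and checks both test suites against it. This only imitates what a mutation tool does for a single mutant; the names eligibility, original, mutant, passes, and killed are hypothetical.

```go
package main

import "fmt"

type eligibility func(age int, hasLicense bool) bool

// original mirrors the function under test.
func original(age int, hasLicense bool) bool { return age >= 18 && hasLicense }

// mutant is what a tool would generate by rewriting >= to >.
func mutant(age int, hasLicense bool) bool { return age > 18 && hasLicense }

type testCase struct {
	age     int
	license bool
	want    bool
}

// passes reports whether every case passes against the given implementation.
func passes(f eligibility, cases []testCase) bool {
	for _, c := range cases {
		if f(c.age, c.license) != c.want {
			return false
		}
	}
	return true
}

// killed reports whether the suite tells the mutant apart from the original:
// it must pass on the original and fail on the mutant.
func killed(cases []testCase) bool {
	return passes(original, cases) && !passes(mutant, cases)
}

func main() {
	weak := []testCase{{20, true, true}, {15, true, false}}
	strong := append([]testCase{{18, true, true}}, weak...)

	fmt.Println("weak suite kills mutant:", killed(weak))     // false: the mutant survives
	fmt.Println("strong suite kills mutant:", killed(strong)) // true: the age=18 case fails on the mutant
}
```

Both suites execute every line of the function, so coverage reports them as equivalent; only the boundary case distinguishes them.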
What does it mean when a mutation 'survives' in mutation testing?
Why should boundary values always be included in table-driven tests?
44. How do you manage the full lifecycle of a Go microservice from startup to shutdown?

A production Go service follows a structured lifecycle: configuration validation, dependency initialisation, readiness signalling, traffic serving, graceful shutdown on signal, and cleanup. Each phase must handle failures correctly.

func main() {
    // Phase 1: load and validate config — fail fast
    cfg, err := config.Load()
    if err != nil { log.Fatalf("invalid config: %v", err) }

    // Phase 2: initialise dependencies
    logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
        Level: cfg.LogLevel,
    }))
    slog.SetDefault(logger)

    db, err := database.Open(cfg.Database)
    if err != nil { log.Fatalf("database: %v", err) }
    defer db.Close()

    // Phase 3: build services
    userRepo := postgres.NewUserRepository(db)
    userSvc  := service.NewUserService(userRepo)
    router   := api.NewRouter(userSvc)

    // Phase 4: start server
    srv := &http.Server{
        Addr:         fmt.Sprintf(":%d", cfg.Port),
        Handler:      router,
        ReadTimeout:  5 * time.Second,
        WriteTimeout: 10 * time.Second,
    }

    errCh := make(chan error, 1)
    go func() {
        slog.Info("server starting", "port", cfg.Port)
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            errCh <- err
        }
    }()

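    // Phase 5: wait for shutdown or error
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    defer signal.Stop(sigCh)

    exitCode := 0
    select {
    case err := <-errCh:
        // Listener failed (for example, port already in use): clean up, then exit non-zero
        slog.Error("server error", "err", err)
        exitCode = 1
    case sig := <-sigCh:
        slog.Info("shutdown signal", "signal", sig)
    }

    // Phase 6: graceful shutdown (stop accepting connections, drain in-flight requests)
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        slog.Error("shutdown error", "err", err)
        exitCode = 1
    }

    if exitCode != 0 {
        db.Close() // os.Exit skips deferred calls, so close explicitly
        os.Exit(exitCode)
    }
    slog.Info("service stopped cleanly")
}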
Why must configuration loading and validation happen before any dependencies are initialised?
What should the service do if the HTTP server returns an error that is not http.ErrServerClosed?
45. How do you test Go code that processes streaming data or works with channels?

Testing channel-based pipelines requires careful synchronisation. Common patterns: bounded channels with timeout assertions, channel-based test doubles that feed input and capture output, and the fan-out test harness.

// Pipeline under test
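func processEvents(ctx context.Context, in <-chan Event) <-chan ProcessedEvent {
    out := make(chan ProcessedEvent)
    go func() {
        defer close(out)
        for {
            // Select on the receive as well as the send: a plain `range in`
            // would block forever on an idle input and never observe ctx.Done()
            select {
            case <-ctx.Done():
                return
            case event, ok := <-in:
                if !ok { return } // input closed: end of stream
                select {
                case out <- transform(event):
                case <-ctx.Done(): return
                }
            }
        }
    }()
    return out
}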

// Test helper: send N items and collect results with timeout
func drainChannel[T any](t *testing.T, ch <-chan T, timeout time.Duration) []T {
    t.Helper()
    var results []T
    timer := time.NewTimer(timeout)
    defer timer.Stop()
    for {
        select {
        case item, ok := <-ch:
            if !ok { return results } // channel closed
            results = append(results, item)
        case <-timer.C:
            // t.Fatalf stops the test goroutine, so no return is needed after it
            t.Fatalf("timeout: channel did not close after %v (got %d items)", timeout, len(results))
        }
    }
}

// Test the pipeline
func TestProcessEvents(t *testing.T) {
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    input := make(chan Event, 3)
    input <- Event{ID: 1, Type: "click"}
    input <- Event{ID: 2, Type: "view"}
    input <- Event{ID: 3, Type: "click"}
    close(input) // signal end of stream

    output := processEvents(ctx, input)
    results := drainChannel(t, output, 3*time.Second)

    if len(results) != 3 {
        t.Errorf("expected 3 results, got %d", len(results))
    }
    // Assert specific transformations
    for _, r := range results {
        if r.ProcessedAt.IsZero() {
            t.Errorf("ProcessedAt not set for event %d", r.EventID)
        }
    }
}

// Test cancellation: pipeline exits cleanly when ctx is cancelled
func TestProcessEvents_Cancellation(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())
    input := make(chan Event) // never closes
    output := processEvents(ctx, input)

    cancel() // cancel immediately

    // Output should close promptly after cancellation
    select {
    case _, ok := <-output:
        if ok { t.Error("expected channel to be closed after cancellation") }
    case <-time.After(time.Second):
        t.Error("channel did not close after cancellation")
    }
}
Why is it important to test that a pipeline stage closes its output channel after the input channel closes?
Why should pipeline tests always use a context timeout rather than blocking indefinitely?
46. Summarise the key principles for designing scalable Go microservices that senior engineers demonstrate.

This summary condenses the architectural and testing knowledge expected at senior/staff Go engineer level into a reference for interviews.

Architecture Principles Cheat Sheet

| Area | Key Principle |
|---|---|
| Service boundaries | Split by bounded context (DDD); each service owns its data |
| Communication | gRPC for internal (typed, efficient); REST for external (browser, partners) |
| Error model | Use gRPC status codes + errdetails; never expose internals to clients |
| Data consistency | Eventual consistency via events + idempotent consumers + Outbox pattern |
| Resilience | Circuit breaker, retry with backoff, timeout on every remote call |
| Observability | Metrics (Prometheus) + traces (OpenTelemetry) + structured logs (slog) |
| Scalability | Stateless services; external state (Redis, DB); distributed locks for singletons |
| Deployment | Graceful shutdown; readiness probe; rolling update; preStop sleep |
| API evolution | Never break field numbers; use reserved; major version for breaking changes |

Testing Principles Cheat Sheet

| Level | Tool/Pattern | Key Insight |
|---|---|---|
| Unit | Table-driven + t.Run | Test all branches including boundaries |
| Unit | Interface mocks (stubs/fakes) | No frameworks: plain structs satisfying interfaces |
| Benchmark | testing.B + -benchmem | Measure allocs/op; zero is the goal for hot paths |
| Integration | TestMain + testcontainers | Real DB in Docker; t.Cleanup for teardown |
| gRPC | bufconn + table-driven | In-memory server; assert status codes |
| Concurrency | -race flag always | Never use time.Sleep to sync; use WaitGroup/channels |
| Contract | Protobuf + buf breaking | Prevent breaking changes; client-side expectations |
| Load | vegeta / b.RunParallel | Assert p99 SLO; -cpu flag for scaling behaviour |

// Interview answer template for 'design a scalable Go service':

// 1. API layer: REST (public) or gRPC (internal)
// 2. Auth: JWT middleware, validate alg, store claims in context
// 3. Business logic: pure domain package, no infrastructure imports
// 4. Data: database/sql pool, context on every query, optimistic locking
// 5. Caching: L1 (in-memory, singleflight) + L2 (Redis, TTL jitter)
// 6. Events: message queue for async, Outbox pattern for atomicity
// 7. Resilience: circuit breaker, retry with backoff, timeout on all I/O
// 8. Observability: Prometheus metrics, OTEL traces, slog JSON logs
// 9. Deployment: graceful shutdown, /healthz + /readyz, rolling update
// 10. Testing: table-driven, -race, -benchmem, goleak, testcontainers
At a senior Go engineer level, what is the correct answer to 'how do you handle distributed transactions across services'?
What is the most important flag to always include when running Go tests in CI for concurrent code?