
Maven / GitOps Interview Questions

1. What is GitOps and what core principles does it define?
2. How does GitOps differ from traditional CI/CD pipelines?
3. What is the 'single source of truth' principle in GitOps?
4. What are the two GitOps deployment models: push-based vs pull-based?
5. What is a GitOps operator and what role does it play?
6. What is declarative infrastructure and why does GitOps require it?
7. How does GitOps improve security and auditability compared to script-based deployments?
8. What Git branching strategies are commonly used with GitOps?
9. What is drift detection and how does a GitOps operator handle drift?
10. What is the difference between GitOps and Infrastructure as Code (IaC)?
11. What is Argo CD and how does it implement GitOps?
12. How does Argo CD's sync process work — desired state vs live state?
13. What are Argo CD Applications and ApplicationSets?
14. How do you structure a GitOps repository — app-of-apps, environment folders, overlays?
15. What is Flux CD and how does it differ from Argo CD?
16. How do Flux's source-controller and kustomize-controller work together?
17. How do you manage secrets in a GitOps workflow — Sealed Secrets, SOPS, External Secrets Operator?
18. How do you handle multiple environments (dev/staging/prod) in a GitOps repo?
19. How does image automation work in Flux for continuous delivery?
20. What are Argo CD sync policies — automated vs manual — and sync waves?
21. How do you roll back a deployment using GitOps?
22. How do you integrate GitOps with a CI pipeline — separation of concerns?
23. What is progressive delivery and how does it relate to GitOps — Argo Rollouts, Flagger?
24. How do you handle Helm charts in a GitOps workflow?
25. How do you use Kustomize overlays in a GitOps repository?
26. How do you implement multi-cluster GitOps at scale?
27. How does Argo CD handle RBAC and multi-tenancy?
28. What are the Argo CD app-of-apps and ApplicationSet patterns and when do you use each?
29. How do you implement GitOps for infrastructure provisioning with Crossplane and Cluster API?
30. How do you observe and alert on GitOps sync failures in production?
31. How do you manage database schema migrations in a GitOps workflow?
32. How do you implement policy enforcement in a GitOps pipeline — OPA/Gatekeeper, Kyverno?
33. What are the limitations and anti-patterns of GitOps?
34. How do you migrate an existing deployment pipeline to GitOps?
35. How does GitOps fit into a platform engineering strategy?

1. What is GitOps and what core principles does it define?

GitOps is an operational framework that applies DevOps practices — version control, collaboration, compliance, and CI/CD automation — to infrastructure and application delivery. The term was coined by Alexis Richardson of Weaveworks in 2017. The central idea is that Git acts as both the mechanism for change (pull requests) and the immutable audit log (commit history) for every system state transition.

The OpenGitOps working group (CNCF) formalised four core principles:

  • Declarative: The entire desired system state is expressed declaratively — you describe what should exist, not the sequence of steps to create it. Kubernetes manifests, Helm values files, and Kustomize overlays all qualify.
  • Versioned and Immutable: Desired state is stored in a VCS (Git) that enforces immutability and retains full history. Every change is a commit — reviewable, reversible, and attributable to a specific author.
  • Pulled Automatically: Software agents — not humans or CI pipelines — pull desired state from Git and apply it to the target environment. This inverts the traditional push model and keeps cluster credentials inside the cluster, not in external CI systems.
  • Continuously Reconciled: Agents continuously compare actual cluster state against the Git-declared desired state. When drift is detected they either alert operators or automatically self-heal, converging the system back to what Git specifies.

These four principles create a closed-loop automation system where every deployment, rollback, or configuration change flows through a Git commit and review cycle.

Which organisation formalised the four core GitOps principles as an open standard?
Which GitOps principle is directly responsible for keeping cluster credentials inside the cluster rather than in a CI system?
2. How does GitOps differ from traditional CI/CD pipelines?

Traditional CI/CD pipelines are push-based: the CI system builds an artifact, and the CD stage runs kubectl apply or helm upgrade directly against the cluster. The pipeline holds a kubeconfig or service-account token with cluster-write access. There is no persistent desired-state record and no automatic drift correction — if someone manually deletes a deployment, the pipeline only redeploys on the next trigger.

GitOps is pull-based: the CI system builds the image and updates a config repository, but it never touches the cluster directly. A GitOps operator running inside the cluster watches the config repo and reconciles the live state to match what Git says, continuously. Rollback is a git revert rather than re-running a pipeline step.

GitOps vs Traditional CI/CD
Dimension           | Traditional CI/CD                         | GitOps
--------------------|-------------------------------------------|-------------------------------------------------------
Deployment trigger  | Pipeline push on build success            | Operator pull on Git commit
Cluster credentials | Stored in CI system secrets               | Kept inside the cluster only
Drift detection     | None — only corrects on next pipeline run | Continuous — operator reconciles on every poll cycle
Rollback mechanism  | Re-run old pipeline or manual kubectl     | git revert creates a new auditable commit
Audit trail         | Pipeline logs (often ephemeral)           | Git commit history (permanent, cryptographically ordered)

The separation also improves security posture: even if the CI system is compromised, an attacker cannot push arbitrary changes to the cluster without also compromising Git and passing branch-protection reviews.
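The CI half of this split can be sketched as a workflow job that bumps an image tag in the config repo and pushes a commit, leaving all cluster writes to the operator. This is a minimal sketch; the repository names, paths, and secret are hypothetical:

```yaml
# Hypothetical GitHub Actions job: CI updates the config repo, never the cluster.
update-config:
  runs-on: ubuntu-latest
  needs: build-and-push            # assumes an earlier job pushed org/my-app:${{ github.sha }}
  steps:
    - uses: actions/checkout@v4
      with:
        repository: org/gitops-config           # config repo, not the app repo
        token: ${{ secrets.CONFIG_REPO_TOKEN }}  # write token for the config repo only
    - name: Bump image tag in the prod overlay
      run: |
        cd apps/my-app/overlays/prod
        kustomize edit set image org/my-app=org/my-app:${{ github.sha }}
    - name: Commit the new desired state
      run: |
        git config user.name ci-bot
        git config user.email ci-bot@example.com
        git commit -am "deploy my-app ${{ github.sha }}"
        git push
```

Because the job's token can only write to Git, a compromised pipeline still faces branch protection and review before anything reaches the cluster.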

In a traditional push-based CD setup, where are cluster credentials typically stored?
What term describes a GitOps operator automatically fixing a cluster state that diverged from Git?
3. What is the 'single source of truth' principle in GitOps?

In GitOps, the Git repository is the system state. Every resource that should exist in the cluster — deployments, services, config maps, RBAC rules, network policies — is represented as a committed file. The repository is not a backup or documentation artifact; it is the authoritative record, and the cluster is just the materialised form of what Git contains.

Practical implications of this principle:

  • No out-of-band changes: Running kubectl apply -f directly, editing a ConfigMap in the Kubernetes dashboard, or scaling a deployment manually all create "drift" — a gap between what Git says and what the cluster is doing. A GitOps operator will detect and revert those changes on its next reconciliation cycle.
  • All changes via pull request: A developer who wants to change a replica count opens a PR, gets it reviewed and approved, and merges it. The operator then applies the change automatically. There is no separate "deployment approval" step because the PR is the approval.
  • History as an audit log: git log shows exactly who changed what, when, and why. This satisfies SOC 2, PCI-DSS, and similar compliance requirements without building a separate audit system.
  • Reproducibility: Because the entire desired state is in Git, spinning up a new environment is a matter of pointing a GitOps operator at the same repository and branch. There are no snowflake servers.

The single-source-of-truth principle breaks down if teams maintain parallel state (e.g., Helm releases applied manually alongside Argo CD-managed ones). Discipline about eliminating all write paths to the cluster other than the operator is essential.

What does 'single source of truth' mean in the context of GitOps?
What happens when someone runs kubectl apply directly on a GitOps-managed cluster?
4. What are the two GitOps deployment models: push-based vs pull-based?

Push-based deployment: a CI/CD pipeline (GitHub Actions, Jenkins, etc.) runs kubectl apply or helm upgrade directly against the target cluster after a build succeeds. The pipeline authenticates to the cluster using a kubeconfig or service-account token stored in the CI system's secret vault. Changes only reach the cluster when a pipeline is triggered; drift is not detected between runs.

Pull-based deployment: a GitOps operator running inside the cluster watches one or more Git repositories on a polling interval or via webhook. When it detects a difference between what Git declares and what the cluster is running, it applies the delta automatically. Cluster credentials never leave the cluster boundary.

Push-based vs Pull-based GitOps
Property            | Push-based                                 | Pull-based (true GitOps)
--------------------|--------------------------------------------|-----------------------------
Who applies changes | CI/CD pipeline                             | In-cluster operator
Credential exposure | kubeconfig in CI secrets                   | Credentials stay in cluster
Drift correction    | No automatic correction                    | Continuous reconciliation
Examples            | GitHub Actions + kubectl, Jenkins pipeline | Argo CD, Flux CD

A pull-based Argo CD Application looks like this — the operator handles all cluster writes:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: apps/my-app/overlays/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
In a pull-based GitOps model, which component is responsible for applying changes to the cluster?
What security advantage does pull-based deployment offer over push-based pipelines?
5. What is a GitOps operator and what role does it play?

A GitOps operator is a software agent that runs inside the target cluster and implements the pull-based reconciliation loop. It is the engine that turns a static Git repository into a live, self-correcting deployment system. Without an operator, GitOps is just a documentation convention — the operator is what closes the loop between intent (Git) and reality (cluster).

An operator performs four continuous activities:

  1. Source watching: Polls a Git repository (or OCI artifact registry) on a configurable interval — typically every 1–5 minutes — or listens for a webhook notification. It detects when the desired state has changed.
  2. Manifest rendering: Fetches the source and renders it into plain Kubernetes manifests. Depending on configuration, this may involve running kustomize build, rendering a Helm chart, or processing Jsonnet templates.
  3. Diff computation: Compares the rendered desired state against the live resources in the cluster (fetched from the Kubernetes API). Identifies resources that need to be created, updated, or deleted.
  4. Reconciliation: Applies the diff using kubectl apply (or server-side apply), bringing the cluster to the declared state. Updates the status of its own CRs to reflect sync health.

The two dominant GitOps operators in production today are Argo CD (CNCF graduated, rich UI, Application/AppProject model) and Flux CD (CNCF graduated, composable controllers, CLI-first). Both support Kubernetes, Helm, and Kustomize as manifest sources.

What does a GitOps operator compare during each reconciliation cycle?
Name a CNCF-graduated GitOps operator.
6. What is declarative infrastructure and why does GitOps require it?

Declarative infrastructure means specifying the desired end state of a system rather than the sequence of commands needed to reach it. A Kubernetes Deployment manifest is declarative — it says "I want 3 replicas of this container image running with these environment variables." The Kubernetes control plane figures out the steps (schedule pods, pull images, attach volumes) to achieve that state. By contrast, a shell script that runs kubectl scale, kubectl set image, and kubectl rollout in sequence is imperative — the outcome depends on the starting state.

GitOps requires declarative infrastructure for a specific technical reason: the reconciliation loop works by comparing two representations of system state — the declared state in Git and the live state reported by the Kubernetes API. Both sides of the comparison must be in the same format for the diff to be meaningful.

  • Imperative commands like kubectl run describe a one-time action, not a persistent state. You cannot store "kubectl scale to 5" in Git and compare it to the current replica count; you can store spec.replicas: 5 and compare it.
  • Idempotency follows naturally from declarative configs — applying the same manifest twice produces the same result, which means the operator can safely reconcile as often as it likes.
  • Drift detection only works if the desired state can be parsed, normalised, and diffed against the live API server response. Declarative YAML makes this straightforward.

Common declarative formats used with GitOps: Kubernetes YAML manifests, Helm values.yaml files (rendered to manifests by the operator), Kustomize overlay patches, and Crossplane Composite Resource Claims.
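As a concrete contrast, the imperative command kubectl scale deployment my-app --replicas=3 performs a one-time action, while the equivalent declarative manifest records the state itself. The names and image below are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3            # persistent desired state, diffable against live spec.replicas
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: org/my-app:1.4.2
```

Applying this manifest twice is a no-op the second time, which is exactly the idempotency the reconciliation loop depends on.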

What is the defining characteristic that makes a Kubernetes manifest 'declarative'?
Why can imperative kubectl commands not serve as the desired state in a GitOps reconciliation loop?
7. How does GitOps improve security and auditability compared to script-based deployments?

Script-based deployments scatter cluster-write credentials across CI systems, developer laptops, and shared servers. Any engineer with access to the deployment script or its secrets can push arbitrary changes to production with no mandatory review. Audit trails, when they exist at all, are CI job logs that expire after a few weeks.

GitOps addresses these weaknesses through several concrete mechanisms:

  • Immutable audit trail: Every change to cluster state must be a Git commit, attributed to an author and timestamped by the VCS. git log --follow shows the complete history of every resource: who changed it, when, and why (via the commit message). This satisfies SOC 2 Type II, PCI-DSS, and ISO 27001 change-management requirements without a separate audit tool.
  • Mandatory code review: Branch protection rules require pull-request review before any change merges to the deploy branch. This adds a human approval gate that script-based pipelines typically lack.
  • No human direct cluster access: In a mature GitOps setup, engineers do not need kubeconfig with write permissions. The operator holds the only write credential. Even if an engineer's laptop is compromised, the attacker cannot directly alter the cluster — they would need to also compromise Git and pass branch-protection checks.
  • Credentials stay in the cluster: Pull-based deployment means CI systems never hold cluster tokens. Reducing the number of places a kubeconfig exists reduces the attack surface.
  • Signed commits: GPG or SSH-signed commits cryptographically link every change to a verified identity, so the commit author cannot plausibly be forged after the fact.
  • Automated policy checks in CI: Tools like Conftest (OPA) or Kyverno CLI can validate manifests in the PR pipeline before they ever reach the cluster — shifting security left.
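The last bullet can be sketched as a pull-request gate that renders the manifests and evaluates them with the Conftest CLI before merge. The workflow structure and paths here are hypothetical:

```yaml
# Hypothetical PR check: block the merge if rendered manifests violate OPA policy.
name: policy-check
on: pull_request
jobs:
  conftest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Render the prod overlay to plain manifests
        run: kustomize build apps/my-app/overlays/prod > rendered.yaml
      - name: Evaluate Rego policies against the rendered manifests
        run: conftest test rendered.yaml --policy policy/
```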
Which GitOps practice directly eliminates the need to store a kubeconfig in a CI system?
Which compliance requirement is most directly satisfied by GitOps's Git-commit-based audit trail?

8. What Git branching strategies are commonly used with GitOps?

GitOps repositories have different branching needs than application source repos. The goal is to map Git branches or folders to deployment environments cleanly, while keeping promotion paths easy to reason about.

Three strategies are common in practice:

  1. Environment branches: Separate branches represent separate environments — main maps to production, staging to staging, develop to development. Promoting from dev to staging is a PR merge from develop to staging. Simple to understand but prone to long-lived divergence and merge conflicts as environments drift apart.
  2. Directory-per-environment on a single branch (trunk-based): One main branch holds folders like envs/dev/, envs/staging/, envs/prod/. Kustomize overlays sit in each folder. Promotion is a PR that copies or patches the relevant image tag into the next environment's overlay. This avoids branch divergence and is the approach most commonly recommended for Flux and Argo CD setups.
  3. Separate repos per environment: Dedicated repositories for dev, staging, and prod config. Provides the strictest separation (prod repo can have tighter branch-protection and access controls) but makes cross-environment changes (e.g., updating a shared base) more cumbersome.

For most teams, trunk-based development with Kustomize overlays is the pragmatic choice. A single PR flow touching overlays is easy to review and the Git history stays linear. The separation of the application source repo (where code lives) from the config repo (where deployment manifests live) is an orthogonal best practice that applies regardless of the branching strategy chosen.
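In that trunk-based layout, promotion to prod is a small, reviewable diff in the prod overlay's kustomization.yaml, for example (names and tag are illustrative):

```yaml
# apps/my-app/overlays/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
replicas:
  - name: my-app
    count: 5
images:
  - name: org/my-app
    newTag: "1.4.2"    # the promotion PR bumps this to the tag validated in staging
```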

In a trunk-based GitOps repository, how are multiple deployment environments typically represented?
What is a key disadvantage of using environment branches (e.g., main=prod, staging branch) in a GitOps config repo?
9. What is drift detection and how does a GitOps operator handle drift?

Drift is the condition where the actual state of the cluster diverges from the desired state declared in Git. Common causes: an engineer runs kubectl scale manually during an incident, a node failure causes a deployment controller to change replica counts temporarily, or a helm upgrade is run outside the GitOps workflow. Without drift detection, these deviations are invisible until something breaks.

A GitOps operator detects drift by continuously repeating the same comparison it uses for normal syncs: it fetches the current live resources from the Kubernetes API and diffs them against the rendered desired state from Git. There is no separate "drift check" mode — drift detection is simply what happens between the commit that caused the last sync and the next one.

Argo CD marks an Application as OutOfSync when it detects drift. If selfHeal: true is set in the sync policy, Argo CD re-applies the Git state immediately, reverting the manual change. If not set, the Application stays OutOfSync and shows the diff in the UI, requiring a human to click Sync.

Flux marks its Kustomization resource as not Ready when drift occurs. Because the kustomize-controller re-applies the rendered manifests on every reconciliation interval (and, with prune: true, also deletes resources removed from Git), drift is corrected on the next cycle. There is no explicit "selfHeal" toggle — Flux always reconciles on its interval.

Drift tolerance is configurable: you can tell Argo CD to ignore specific fields (like status fields managed by other controllers) using ignoreDifferences in the Application spec, preventing false-positive drift alerts from controllers that write back to managed resources.
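For example, an Application that lets a HorizontalPodAutoscaler own the replica count would exclude that field from the drift comparison (the application name is illustrative, and source/destination are omitted for brevity):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  # source, destination, and syncPolicy omitted for brevity
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas   # owned by the HPA, so not treated as drift
```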

What Argo CD Application sync status indicates the live cluster state differs from the Git-declared desired state?
Which Argo CD sync policy option causes Argo CD to automatically revert manual kubectl changes?
10. What is the difference between GitOps and Infrastructure as Code (IaC)?

IaC and GitOps address different layers of the stack and use different execution models, but they complement each other in modern cloud-native environments.

Infrastructure as Code (Terraform, Pulumi, CloudFormation): Provisions cloud resources — VMs, VPCs, managed databases, load balancers, Kubernetes clusters themselves. IaC is typically applied imperatively by a human or pipeline (terraform apply). State is tracked in a tfstate file (local or remote in S3/Terraform Cloud). It answers the question: "What cloud infrastructure should exist?"

GitOps: Deploys and manages application workloads and configuration on top of already-provisioned infrastructure. The GitOps operator runs continuously, reconciling the live cluster to match the Git-declared desired state. It answers the question: "What should be running on this Kubernetes cluster right now?"

IaC vs GitOps
Dimension        | IaC (Terraform)                                      | GitOps (Argo CD / Flux)
-----------------|------------------------------------------------------|-------------------------------------------
Target           | Cloud resources (EKS, RDS, VPC)                      | Kubernetes workloads and config
Execution model  | Imperative plan/apply triggered by human or pipeline | Continuous pull-based reconciliation loop
State storage    | tfstate file (S3, Terraform Cloud)                   | Git repository
Drift correction | Manual re-apply needed                               | Automatic on each reconciliation cycle

They are complementary, not competing: Terraform provisions the EKS cluster, GitOps manages what runs on it. The boundary blurs with tools like Crossplane and Cluster API, which represent cloud resources and Kubernetes clusters as Kubernetes CRDs — making it possible to manage infrastructure provisioning through a GitOps operator.
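With Crossplane, for instance, a database can be requested as a Kubernetes resource committed to the same config repo. A claim might look like this — note that the API group and claim kind are defined by a platform team's Composition, so the names below are illustrative placeholders:

```yaml
# Illustrative Crossplane claim: the apiVersion/kind come from a custom
# CompositeResourceDefinition, so these names are placeholders.
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: production
spec:
  parameters:
    storageGB: 20
  writeConnectionSecretToRef:
    name: orders-db-conn   # Crossplane writes DB credentials into this Secret
```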

Which tool manages the provisioning of cloud resources like EKS clusters and RDS instances using IaC?
Which tool bridges IaC and GitOps by representing cloud resources as Kubernetes CRDs manageable by a GitOps operator?
11. What is Argo CD and how does it implement GitOps?

Argo CD is a declarative GitOps continuous delivery tool for Kubernetes and a CNCF-graduated project. It implements the pull-based GitOps model: you define Application custom resources that map a Git source to a Kubernetes destination, and Argo CD's Application Controller continuously reconciles the two.

Argo CD's main components:

  • API Server: Exposes a gRPC and REST API consumed by the web UI, the argocd CLI, and CI webhooks. Handles authentication (OIDC via Dex or external providers) and authorisation (RBAC).
  • Repository Server: Clones Git repositories and renders manifests using whichever tool the Application source specifies — plain YAML, Kustomize, Helm, Jsonnet, or a custom config management plugin.
  • Application Controller: The reconciliation engine. Runs a control loop that fetches live Kubernetes resources, compares them to the rendered desired state from the Repository Server, and applies the diff. Emits sync status and health status.
  • Redis: Caches rendered manifests and application state to reduce Git and Kubernetes API load.
  • Dex (optional): An embedded OIDC identity provider for SSO against GitHub, LDAP, SAML, or other identity providers.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: HEAD
    path: apps/guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Which Argo CD component is responsible for cloning Git repos and rendering Helm and Kustomize manifests?
What Kubernetes resource type does Argo CD use to represent a single deployed application?
12. How does Argo CD's sync process work — desired state vs live state?

Argo CD's sync process is a six-step reconciliation cycle executed by the Application Controller on every polling interval (default: 3 minutes) or on webhook notification.

  1. Clone and render: The Repository Server clones (or updates from cache) the Git source at the specified targetRevision. It runs the appropriate tool — kustomize build, helm template, or reads plain YAML — to produce a set of Kubernetes resource manifests.
  2. Fetch live state: The Application Controller calls the Kubernetes API to list the current state of all resources in the Application's destination namespace that are labeled as managed by this Application.
  3. Compute diff: Argo CD performs a three-way diff between (a) the last-applied configuration, (b) the current live state, and (c) the desired state from Git. Resources present in Git but not in the cluster are marked as missing. Resources in the cluster but not in Git are marked as extra (candidates for pruning).
  4. Sync decision: If any resource differs, the Application status is set to OutOfSync. If syncPolicy.automated is configured, the sync proceeds immediately. Otherwise, a human must trigger it via UI, CLI, or webhook.
  5. Apply: Argo CD applies the desired manifests using server-side apply. Pre-sync and sync hooks execute in their designated phases.
  6. Update status: Application sync status becomes Synced or SyncFailed. Health status (Healthy / Degraded / Progressing / Missing) is evaluated by checking the readiness conditions of the deployed resources.
# Example: Application status after a partial sync failure
status:
  sync:
    status: OutOfSync
    revision: "abc1234def5678"
  health:
    status: Degraded
  conditions:
    - type: SyncError
      message: "Failed to apply deployment.apps/frontend: field is immutable"
      lastTransitionTime: "2026-04-22T10:00:00Z" 
What Argo CD sync status means every live resource exactly matches the Git-declared desired state?
Which Argo CD component executes the continuous reconciliation loop comparing desired and live Kubernetes state?
13. What are Argo CD Applications and ApplicationSets?

An Application is the fundamental Argo CD custom resource. It maps a single Git source (repository URL, path, revision) to a single Kubernetes destination (cluster API URL, namespace) and declares how to sync between them. One Application typically represents one microservice or one component of a larger system.

An ApplicationSet is a higher-level CR managed by the ApplicationSet controller. Instead of managing one application, it uses generators to programmatically produce multiple Application CRs from a template. As generators produce or remove parameters, the controller creates, updates, or deletes the corresponding Applications automatically.

Built-in ApplicationSet generators:

  • List generator: Explicit list of parameter sets — each item creates one Application.
  • Git directory generator: Scans a Git repo for directories matching a glob pattern; each directory becomes one Application.
  • Cluster generator: Creates one Application per registered Argo CD cluster, optionally filtered by cluster labels.
  • Matrix generator: Cross-product of two other generators — e.g., all apps × all clusters.
  • Pull-request generator: Creates a temporary Application for each open PR, enabling ephemeral preview environments.
# ApplicationSet using the git directory generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/org/gitops-config.git
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/gitops-config.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}" 
Which ApplicationSet generator creates one Application for each directory it discovers in a Git repository?
What does the ApplicationSet controller do when a generator stops producing a particular set of parameters?
14. How do you structure a GitOps repository — app-of-apps, environment folders, overlays?

The way you structure a GitOps config repo determines how easily you can promote changes between environments, onboard new applications, and keep per-environment differences small and reviewable. There is no single correct layout, but three patterns dominate.

1. Kustomize base + environment overlays (most common): Each application has a base/ directory with shared manifests and an overlays/ directory with one folder per environment. Each overlay folder contains a kustomization.yaml that references the base and adds environment-specific patches (image tag, replica count, config values).

gitops-config/
├── apps/
│   └── my-app/
│       ├── base/
│       │   ├── deployment.yaml
│       │   ├── service.yaml
│       │   └── kustomization.yaml
│       └── overlays/
│           ├── dev/
│           │   └── kustomization.yaml  # patches: image tag, replicas=1
│           ├── staging/
│           │   └── kustomization.yaml  # patches: image tag, replicas=2
│           └── prod/
│               └── kustomization.yaml  # patches: image tag, replicas=5
└── argocd/
    └── apps/
        ├── my-app-dev.yaml   # Application CR for dev overlay
        ├── my-app-staging.yaml
        └── my-app-prod.yaml

2. App-of-apps: A single root Argo CD Application points to the argocd/apps/ folder above. Argo CD syncs that folder, creating all the child Application CRs. This allows managing all Application registrations via Git without applying them manually.

3. Separate config repo per environment: Stricter separation — production config lives in a separate repo with tighter branch protection and access control. Suitable for regulated environments but adds overhead when updating shared base configs.

The app repo / config repo split (keeping source code and deployment manifests in separate repositories) is an orthogonal best practice that improves the separation of concerns between CI (build) and CD (deploy).

In a Kustomize-based GitOps repo, where do environment-specific patches such as replica counts and image tags live?
What does the app-of-apps pattern achieve in an Argo CD GitOps setup?
15. What is Flux CD and how does it differ from Argo CD?

Flux CD is a CNCF-graduated GitOps toolkit for Kubernetes. Unlike Argo CD — which ships as a relatively integrated platform with a built-in UI, Application model, and unified controller — Flux is composed of separate, independently deployable controllers, each managing a specific concern. You assemble the controllers you need and extend them without touching the others.

Flux CD vs Argo CD
Dimension         | Flux CD                                                                           | Argo CD
------------------|-----------------------------------------------------------------------------------|--------------------------------------------------------------------------
Architecture      | Composable controllers (source, kustomize, helm, notification, image-automation)  | Integrated platform with API server, repo server, application controller
UI                | CLI-first; Weave GitOps is a separate optional UI                                 | Built-in rich web UI with app graph and diff view
Application model | GitRepository + Kustomization CRs (composable)                                    | Application CR (opinionated single resource)
Image automation  | Built-in ImageRepository / ImagePolicy / ImageUpdateAutomation                    | Requires Argo CD Image Updater (separate tool)
Multi-tenancy     | Namespace-scoped CRs; each team manages their own Flux CRs                        | AppProject model with centralised RBAC policy
Helm support      | HelmRelease CR with native drift detection and remediation                        | Application source.helm with sync phases
CNCF status       | Graduated (2022)                                                                  | Graduated (2022)

In practice: teams that prioritise a strong visual UI and centralised management tend to choose Argo CD. Teams that prefer composable building blocks, want to avoid centralised control-plane complexity, or need built-in image automation often choose Flux. Both are production-ready and widely deployed at scale.
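As an illustration of the composable model, Flux's Helm support is just another CR: a HelmRelease reconciled by the helm-controller, referencing a chart from a HelmRepository source. Chart and resource names below are illustrative:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 10m              # drift detection and remediation run on this interval
  chart:
    spec:
      chart: podinfo
      version: "6.x"
      sourceRef:
        kind: HelmRepository  # a separate CR watched by source-controller
        name: podinfo
        namespace: flux-system
  values:
    replicaCount: 2
```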

What is the primary architectural difference between Flux CD and Argo CD?
Which capability is built into Flux CD but requires a separate add-on project with Argo CD?
16. How do Flux's source-controller and kustomize-controller work together?

Flux splits the responsibilities of fetching config and applying config into two separate controllers, connected through an intermediate Artifact object.

source-controller is responsible for watching external sources — Git repositories, Helm repositories, OCI registries, and S3-compatible buckets. When the source changes (new commit, new chart version, new image tag), the source-controller downloads the content, archives it as a compressed tarball, and stores a reference to it in a GitRepository (or HelmRepository, OCIRepository) status field as a versioned Artifact. Subsequent controllers consume this Artifact rather than fetching from the network themselves.

kustomize-controller watches Kustomization CRs. Each Kustomization references a source (typically a GitRepository) and specifies a path within it. When the source's Artifact revision changes, the kustomize-controller:

  1. Downloads the Artifact tarball from source-controller.
  2. Runs kustomize build on the specified path to render final manifests.
  3. Validates the manifests (optional: dry-run or server-side validation).
  4. Applies them to the cluster using server-side apply.
  5. Monitors the health of applied resources and updates the Kustomization Ready condition.
# source-controller watches this repo
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: gitops-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/org/gitops-config.git
  ref:
    branch: main
---
# kustomize-controller deploys from this repo/path
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-prod
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: gitops-config
  path: ./apps/overlays/prod
  prune: true
  wait: true
  timeout: 5m
What does Flux's source-controller produce after detecting a change in a GitRepository?
Which Flux CR instructs the kustomize-controller which Git path to render and apply to the cluster?
17. How do you manage secrets in a GitOps workflow — Sealed Secrets, SOPS, External Secrets Operator?

Plain Kubernetes Secrets must never be committed to a Git repository: base64 is encoding, not encryption, and the value is trivially decodable by anyone with repo access. Three patterns solve this, with different trust models and operational tradeoffs.

1. Sealed Secrets (Bitnami): The sealed-secrets-controller running in the cluster holds a private key. You use the kubeseal CLI to encrypt a regular Secret into a SealedSecret CR — encrypted with the cluster's public key. Only that cluster's controller can decrypt it. The SealedSecret YAML is safe to commit to Git. On sync, the controller decrypts it back into a regular Kubernetes Secret.

# SealedSecret — safe to commit to Git
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  encryptedData:
    password: AgBy8I5V2EqtcPmVTiIuEolW...(encrypted blob)

2. SOPS (Mozilla): Encrypts entire secret files (YAML, JSON, .env) using AWS KMS, GCP KMS, HashiCorp Vault, or age keys. The encrypted file is committed to Git. Flux natively supports SOPS decryption — configure a spec.decryption block in the Kustomization CR pointing to a Kubernetes Secret that holds the decryption key. Argo CD requires a custom config management plugin for SOPS.
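A minimal sketch of that decryption block in a Flux Kustomization, assuming an age key stored in a Secret named sops-age (the Secret name and path are illustrative):

```yaml
# Flux Kustomization with SOPS decryption enabled
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-prod
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: gitops-config
  path: ./apps/overlays/prod
  prune: true
  decryption:
    provider: sops
    secretRef:
      name: sops-age   # Secret holding the age private key
```

With this in place, the kustomize-controller decrypts SOPS-encrypted files in the path transparently before applying them.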

3. External Secrets Operator (ESO): An ExternalSecret CR in Git declares which key to fetch from an external store (AWS SSM Parameter Store, HashiCorp Vault, GCP Secret Manager, Azure Key Vault). The ESO controller fetches the secret value and creates a regular Kubernetes Secret. The actual secret value never lives in Git at all — only the reference does.

# ExternalSecret — references AWS SSM, no secret value in Git
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-ssm-store
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: /prod/db/password
Which secret management approach encrypts the secret file itself and commits the ciphertext directly to the Git repository?
Which approach ensures the actual secret value is never stored in Git — only a reference to it?
18. How do you handle multiple environments (dev/staging/prod) in a GitOps repo?

The most common and maintainable approach is Kustomize base + overlays: shared manifests live in a base/ directory, and each environment has an overlay directory that patches only what differs — typically the container image tag, replica count, resource limits, and environment-specific ConfigMap values.

# apps/my-app/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
commonLabels:
  env: production
images:
  - name: my-app
    newName: registry.example.com/my-app
    newTag: v1.5.2
patches:
  - patch: |-
      - op: replace
        path: /spec/replicas
        value: 5
    target:
      kind: Deployment
      name: my-app

Each environment has its own Argo CD Application or Flux Kustomization CR pointing to its overlay path. Promotion between environments is a PR that updates the image tag in the next environment's kustomization.yaml.

Promotion workflow example:

  1. CI builds image, tags it v1.5.2, pushes to registry.
  2. CI opens a PR updating overlays/dev/kustomization.yaml with tag v1.5.2.
  3. GitOps operator deploys to dev. Team verifies.
  4. Engineer opens a PR to update overlays/staging/kustomization.yaml to v1.5.2.
  5. After staging validation, a final PR promotes overlays/prod/kustomization.yaml to v1.5.2.

Alternative: Helm with per-environment values-dev.yaml, values-prod.yaml files. Flux HelmRelease CRs reference the appropriate values file per environment. Less DRY for structural differences but cleaner for apps already distributed as Helm charts.

In a Kustomize GitOps repo, what does an overlay kustomization.yaml typically change compared to the base?
How is a GitOps environment promotion from staging to prod typically implemented?
19. How does image automation work in Flux for continuous delivery?

Flux's image automation system closes the loop between a new container image being pushed to a registry and the updated tag being committed back to the GitOps config repository — without any CI pipeline involvement. It relies on three CRs working in sequence.

  1. ImageRepository: Tells Flux to scan a container registry on an interval and cache the available tags for a given image name.
  2. ImagePolicy: Selects the "best" tag from those discovered by the ImageRepository, using a policy rule — semver range, alphabetical ordering, or a regex filter.
  3. ImageUpdateAutomation: When the ImagePolicy resolves to a new tag, this CR commits the new tag value back to the GitOps Git repository. A setter marker comment in the manifest (e.g. # {"$imagepolicy": "flux-system:my-app"}) tells Flux exactly which field to update.
# Step 1: scan registry for available tags
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  image: registry.example.com/my-app
  interval: 5m
---
# Step 2: pick the latest semver tag in the 1.x range
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: ">=1.0.0 <2.0.0"
---
# Step 3: commit the new tag back to Git
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 30m
  sourceRef:
    kind: GitRepository
    name: gitops-config
  git:
    commit:
      author:
        email: fluxcdbot@example.com
        name: Flux
      messageTemplate: "chore: update image to {{range .Updated.Images}}{{.}}{{end}}"
    push:
      branch: main
  update:
    path: ./apps
    strategy: Setters
Which Flux CR commits the newly resolved image tag back to the GitOps config Git repository?
What does a Flux ImagePolicy resource define?
20. What are Argo CD sync policies — automated vs manual — and sync waves?

Manual sync (default): Argo CD detects drift and marks the Application OutOfSync, but applies nothing until a human explicitly triggers synchronisation via the UI, argocd app sync CLI, or an API call. Suitable for production environments where deployments need explicit human approval.

Automated sync: Configured via syncPolicy.automated. Argo CD syncs automatically as soon as it detects a Git change or live-state drift. Two sub-options refine the behaviour:

  • prune: true — Argo CD deletes Kubernetes resources that exist in the cluster but have been removed from Git. Without this, removed manifests leave orphaned resources.
  • selfHeal: true — Argo CD re-applies Git state when live state drifts (e.g., someone ran kubectl manually). Without this, automated sync only fires on Git changes, not on live drift.

Sync waves control the order in which resources are created within a single sync operation. Resources are annotated with a wave number; lower-numbered waves sync first. Argo CD waits for all resources in wave N to become healthy before starting wave N+1. This is essential for ordering CRD creation before CR creation, database migrations before application pods, or secrets before deployments.

# Automated sync policy with pruning and self-healing
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
# Sync wave: migration Job (wave 1) runs and completes before app (wave 2)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "2" 
Which Argo CD automated sync option deletes Kubernetes resources that have been removed from the Git repository?
What Kubernetes annotation on a resource controls its creation order within an Argo CD sync operation?
21. How do you roll back a deployment using GitOps?

In GitOps, a rollback is not a special out-of-band operation — it is just another change to the Git repository. The same reconciliation loop that applies new versions applies the rollback. This preserves the audit trail and keeps the single source of truth intact.

Method 1 — git revert (preferred): Create a new commit that reverses the problematic change. The commit appears in history, clearly documenting when the rollback happened and why. This is always safe on protected branches.

git revert abc1234   # creates a new "revert" commit
git push origin main  # triggers the GitOps operator to sync back to the previous state

Method 2 — git reset (emergency, use with caution): Hard-resets the branch to a known-good commit and force-pushes. Rewrites history. Use only when the problematic commit must be expunged from the record.

git reset --hard v1.4.9-tag
git push --force-with-lease origin main

Method 3 — Argo CD history rollback (without touching Git): Argo CD stores the last N sync revisions. You can roll back to any of them via UI (Application → History and Rollback) or CLI. This does not revert the Git branch — the Application will show OutOfSync against HEAD until you either reconcile or also revert Git.

# CLI rollback to history entry ID 3
argocd app rollback my-app 3

# Or pin the Application to a specific commit in its spec
spec:
  source:
    targetRevision: "abc1234def"  # pinned to known-good SHA
Which git operation is preferred for GitOps rollbacks because it preserves the commit history?
What is the side effect of using 'argocd app rollback' without also reverting the Git repository?
22. How do you integrate GitOps with a CI pipeline — separation of concerns?

The cleanest mental model is: CI builds artifacts, GitOps deploys them. The CI pipeline (GitHub Actions, GitLab CI, Jenkins) is responsible for everything up to and including pushing a container image to a registry. It is explicitly not responsible for applying changes to the cluster. That role belongs entirely to the GitOps operator.

The handoff between CI and GitOps is a commit to the config repository. CI updates the image tag reference in the relevant overlay or HelmRelease, commits it, and pushes. The GitOps operator detects the commit on its next poll and applies the new image tag to the cluster.

Config repo update step in a GitHub Actions CI workflow:

- name: Update image tag in GitOps config repo
  env:
    IMAGE_TAG: ${{ github.sha }}
  run: |
    git clone https://${{ secrets.GITOPS_TOKEN }}@github.com/org/gitops-config.git config
    cd config
    # Use kustomize to update the image tag in the prod overlay
    cd apps/my-app/overlays/prod
    kustomize edit set image my-app=registry.example.com/my-app:${IMAGE_TAG}
    git config user.email "ci-bot@example.com"
    git config user.name "CI Bot"
    git add kustomization.yaml
    git commit -m "chore(cd): promote my-app to ${IMAGE_TAG}"
    git push origin main

This separation has several benefits: the CI system never needs cluster credentials; rollbacks only require a Git revert (no re-running CI); and the config repo provides a clear record of every version deployed to every environment, independent of the CI system that triggered it.

Alternatives to manual kustomize edit set image: yq for YAML patching, sed -i (fragile but simple), or Flux's built-in image automation which removes the config-repo-update step from CI entirely.

In a GitOps CI/CD separation model, what is the CI pipeline's only responsibility regarding deployment?
Which command-line tool can update an image tag inside a Kustomize overlay during a CI pipeline step?
23. What is progressive delivery and how does it relate to GitOps — Argo Rollouts, Flagger?

Progressive delivery is the practice of releasing changes incrementally to a subset of users or traffic rather than all-at-once. Common strategies include canary releases (route 5% of traffic to the new version, ramp up if metrics look healthy), blue-green (maintain two identical environments and switch traffic between them), and A/B testing. The goal is to catch problems on a small blast radius before full rollout.

GitOps stores the desired final state. Progressive delivery tools manage the transition path to get there. A Git commit declaring a new image tag triggers the start; a separate tool decides how fast and safely to promote it.

Argo Rollouts: A Kubernetes controller that replaces standard Deployment resources with a Rollout CRD. The Rollout spec includes a strategy block defining canary steps or blue-green configuration. Argo CD syncs the Rollout manifest from Git; Argo Rollouts then manages the actual pod progression, traffic shifting (via Istio, NGINX, ALB), and Analysis template queries (Prometheus metrics, Datadog, Kayenta). If metrics fail, Rollouts automatically aborts and rolls back.
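The strategy block described above might look like the following hedged sketch (step weights and pause durations are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.5.2
  strategy:
    canary:
      steps:
        - setWeight: 10          # shift 10% of traffic to the new version
        - pause: {duration: 5m}  # hold while analysis checks metrics
        - setWeight: 50
        - pause: {duration: 5m}  # then promote fully if still healthy
```

Argo CD syncs this Rollout from Git like any other manifest; the Argo Rollouts controller executes the stepwise promotion.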

Flagger: Works alongside Flux CD. You keep deploying regular Kubernetes Deployments via Flux and add a Canary CR that references the target Deployment. Flagger then creates the primary/canary Deployment pair automatically, uses Prometheus or Datadog metrics to gate each traffic increment, and integrates with Istio, App Mesh, Linkerd, and NGINX for traffic management. The generated Deployments are fully managed by Flagger; you only maintain the target Deployment and its Canary CR in Git.

Neither tool conflicts with GitOps — the rollout strategy CR itself is stored in Git and managed by the GitOps operator. Git records the intent; Argo Rollouts or Flagger executes the progression safely.

Which tool works natively alongside Flux CD to automate metrics-gated canary promotions?
What Kubernetes resource type does Argo Rollouts introduce to replace standard Deployments for canary/blue-green control?
24. How do you handle Helm charts in a GitOps workflow?

Both Argo CD and Flux CD support Helm natively, but they integrate with it differently. The key GitOps principle is that Helm chart versions and values are declared in Git, not passed as CLI arguments — and the GitOps operator, not a human running helm upgrade, applies them.

Flux CD — HelmRelease CR: A HelmRepository CR registers a Helm chart repository or OCI registry. A HelmRelease CR declares which chart, which version, and which values to use. Flux's helm-controller renders the chart and applies the resulting manifests. Crucially, if someone runs helm upgrade directly, Flux detects the drift and reverts it on the next reconciliation cycle.

# Flux HelmRelease for nginx-ingress
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: nginx-ingress
  namespace: ingress-nginx
spec:
  interval: 5m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.9.x"          # minor-version floating; patch updates auto-apply
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  values:
    controller:
      replicaCount: 2
      service:
        type: LoadBalancer
  valuesFrom:
    - kind: ConfigMap
      name: ingress-extra-values   # additional values from a ConfigMap in Git

Argo CD — Application with Helm source: Set source.helm in the Application CR. You can specify valueFiles (paths to values files within the repo) or inline values overrides. Argo CD renders the chart and syncs the output manifests.
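A hedged sketch of that Application shape, assuming the chart lives in the config repo under charts/ingress-nginx (paths and file names are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-ingress
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: charts/ingress-nginx
    helm:
      valueFiles:
        - values-prod.yaml        # resolved relative to the chart path
      values: |                   # inline overrides, merged last
        controller:
          replicaCount: 2
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
```

Argo CD runs helm template internally and syncs the rendered manifests, so helm list will not show the release.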

Best practices: Always pin chart versions (avoid * or floating ranges in prod). Store values files alongside the HelmRelease in Git, not as inline YAML when values are large. Use valuesFrom in Flux to reference ConfigMaps or Secrets for values that change per environment, keeping the HelmRelease itself environment-agnostic.

In Flux CD, which CR specifies the Helm chart version, values, and upgrade strategy for a GitOps-managed release?
What does Flux CD do if someone runs helm upgrade directly on a Flux-managed release?
25. How do you use Kustomize overlays in a GitOps repository?

Kustomize is a template-free Kubernetes manifest customisation tool built into kubectl and natively supported by both Argo CD and Flux. The overlay pattern keeps shared resource definitions in a base/ directory and accumulates environment-specific differences in per-environment overlays/ directories, avoiding duplication while keeping diffs small and reviewable.

A base kustomization.yaml simply lists the resources it manages:

# apps/my-app/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml

A production overlay then references the base and applies patches:

# apps/my-app/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
commonLabels:
  env: production
images:
  - name: my-app
    newName: registry.example.com/my-app
    newTag: v2.1.0
patches:
  # Strategic merge patch: increase replicas for prod
  - path: replica-patch.yaml
  # JSON 6902 patch: set a specific env var
  - patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/env/0/value
        value: "production"
    target:
      kind: Deployment
      name: my-app

Useful Kustomize fields in overlays: namePrefix / nameSuffix to namespace resource names per environment; commonLabels / commonAnnotations to tag all resources; configMapGenerator and secretGenerator to create environment-specific ConfigMaps or Secrets. Argo CD and Flux both run kustomize build on these directories before applying, so the raw YAML you store in Git is never the final manifest — patches are applied at sync time.
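As a small sketch of the generator fields mentioned above (names and literals are illustrative), an overlay kustomization.yaml might add:

```yaml
# Fragment of an overlay kustomization.yaml
configMapGenerator:
  - name: my-app-config       # emitted with a content-hash suffix
    literals:
      - LOG_LEVEL=warn
      - CACHE_TTL=300
```

Because the generated ConfigMap name carries a content hash, any change to the literals produces a new name, which in turn rolls the Pods that reference it.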

What does a Kustomize strategic merge patch do compared to a full resource replacement?
Which kustomization.yaml field prepends a string to the names of all resources in a Kustomize overlay?
26. How do you implement multi-cluster GitOps at scale?

Multi-cluster GitOps adds the dimension of which cluster to deploy to on top of the standard single-cluster model. The two dominant approaches are hub-spoke (a central GitOps control plane manages all clusters) and decentralised (each cluster runs its own GitOps operator, all watching the same or related Git repos).

Argo CD hub-spoke: A single Argo CD instance (the "hub") in a management cluster registers all target clusters using argocd cluster add. Applications or ApplicationSets declare their destination.server as the registered cluster API URL. The hub's Application Controller communicates with each target cluster's API server directly. An ApplicationSet with the cluster generator creates one Application per registered cluster automatically.

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            tier: production
  template:
    metadata:
      name: "{{name}}-platform-addons"
    spec:
      project: platform
      source:
        repoURL: https://github.com/org/gitops-config.git
        targetRevision: main
        path: "clusters/{{metadata.labels.region}}/addons"
      destination:
        server: "{{server}}"
        namespace: platform-system

Flux decentralised: Each cluster runs its own Flux controllers, bootstrapped from a management cluster or a bootstrap script. A "management" cluster's Flux can push Flux configs to leaf clusters by managing their Flux Kustomization CRs via the Kubernetes API or GitOps itself.

Scaling considerations: Shard the Argo CD Application Controller by cluster to handle hundreds of clusters. Use AppProjects or Flux tenancy to enforce namespace-level isolation per team. Keep a separate Git directory structure per cluster (or cluster tier) to make blast-radius clear — a change to clusters/prod-us-east-1/ only affects that cluster.

Which Argo CD ApplicationSet generator automatically creates one Application per registered Argo CD cluster?
In a Flux multi-cluster setup, what typically runs the Flux controllers on each target workload cluster?
27. How does Argo CD handle RBAC and multi-tenancy?

Argo CD implements multi-tenancy through two complementary mechanisms: AppProjects (resource-level isolation) and RBAC policies (action-level access control).

AppProject is a CR that scopes what a group of Applications can do:

  • sourceRepos: limits which Git repositories are allowed as Application sources for this project.
  • destinations: limits which cluster/namespace combinations Applications in this project can deploy to.
  • clusterResourceWhitelist / namespaceResourceBlacklist: controls which Kubernetes resource kinds are permitted.

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-frontend
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/org/frontend-config.git
  destinations:
    - namespace: frontend-*
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota

RBAC is configured in the argocd-rbac-cm ConfigMap using Casbin policy syntax. Subjects (users, SSO groups, service accounts) are assigned roles that grant permissions on Argo CD resources.

# argocd-rbac-cm data.policy.csv
# p, <subject>, <resource>, <action>, <appproject>/<object>
p, role:frontend-dev, applications, get, team-frontend/*
p, role:frontend-dev, applications, sync, team-frontend/*
p, role:frontend-dev, applications, action/*, team-frontend/*

# g, <user or group>, <role>
g, engineering-frontend@example.com, role:frontend-dev

Built-in roles: role:readonly (view-only across all projects) and role:admin (full access). Custom roles can be scoped to a single AppProject. Combined, AppProjects and RBAC let you give a team full control over their own Applications without them being able to see or affect other teams' workloads.

Which Argo CD resource restricts which Git repositories and destination namespaces a team's Applications can use?
What policy engine does Argo CD use internally to evaluate RBAC rules?
28. What are the Argo CD app-of-apps and ApplicationSet patterns and when do you use each?

Both patterns solve the same problem — managing many Argo CD Applications without manually applying each one — but they use different mechanisms and suit different scales.

App-of-apps: A parent Argo CD Application has its source.path pointing to a Git directory that contains child Application YAML manifests. When Argo CD syncs the parent, it applies those Application CRs into the argocd namespace. Each child Application then manages its own workload independently. The child Application YAMLs are static files you write by hand and commit to Git.

# Parent Application (the "root app")
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: argocd/apps          # this folder holds child Application CRs
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true

ApplicationSet: A CR managed by the ApplicationSet controller. Instead of static Application files, you define a template and one or more generators. The controller creates, updates, and deletes Application CRs automatically as generators produce or remove parameter sets. No need to manually add/remove Application YAML files when a new service or cluster is onboarded.

When to use each:

  • App-of-apps: small, relatively static set of applications; you want explicit per-Application YAML for full control and diff visibility; or you need per-Application customisations that generators can't express concisely.
  • ApplicationSet: large number of applications; dynamic inputs (new directories in Git, new clusters registered); need to scale without manual YAML maintenance. The matrix and merge generators allow powerful combinatorial deployments.
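The dynamic behaviour of an ApplicationSet can be sketched with the Git directory generator, which creates one Application per matching directory (repo URL and paths are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/org/gitops-config.git
        revision: main
        directories:
          - path: apps/*          # one Application per subdirectory
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/org/gitops-config.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
```

Adding a new directory under apps/ in Git onboards a new Application automatically; deleting the directory removes it.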
How does an ApplicationSet differ from the app-of-apps pattern in terms of how Application CRs are created?
Which ApplicationSet generator creates Applications by combining the output of two other generators in a cross-product?
29. How do you implement GitOps for infrastructure provisioning with Crossplane and Cluster API?

Both Crossplane and Cluster API extend the Kubernetes API with CRDs that represent infrastructure resources. Because the desired state is expressed as Kubernetes objects, a standard GitOps operator (Argo CD or Flux) can manage those objects from Git — giving you continuous reconciliation and drift detection for cloud infrastructure, not just application workloads.

Crossplane manages cloud resources (RDS, S3, GKE clusters, IAM roles) through provider plugins. You define a CompositeResourceDefinition (XRD) that creates a new API type, and a Composition that maps it to provider-specific resources. Developers then create a Composite Resource Claim (XRC) stored in the GitOps repo — Crossplane provisions the cloud resource and injects connection secrets into the cluster.

# Developer commits this Claim to the GitOps config repo
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: app-db
  namespace: production
spec:
  parameters:
    storageGB: 20
    version: "14"
    region: eu-west-1
  compositionRef:
    name: postgresql-aws-rds
  writeConnectionSecretToRef:
    name: app-db-credentials

Cluster API (CAPI) manages the lifecycle of Kubernetes clusters themselves as Kubernetes objects on a management cluster. Cluster, MachineDeployment, and infrastructure-provider-specific CRs (AWSManagedControlPlane, AzureManagedMachinePool, etc.) are stored in Git. A GitOps operator syncs them to the management cluster; CAPI provisions, upgrades, and scales workload clusters accordingly.
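A hedged sketch of a CAPI worker-node pool committed to Git, assuming an AWS infrastructure provider (all names, versions, and template references are illustrative):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workload-1-md-0
  namespace: clusters
spec:
  clusterName: workload-1
  replicas: 3
  selector:
    matchLabels: {}
  template:
    spec:
      clusterName: workload-1
      version: v1.29.0
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: workload-1-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AWSMachineTemplate
        name: workload-1-md-0
```

The GitOps operator applies this to the management cluster; CAPI reconciles it into actual EC2 instances joined to the workload cluster.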

The combined pattern: Cluster API provisions workload clusters, Crossplane provisions cloud dependencies (databases, queues, buckets), and a second GitOps instance (Argo CD or Flux) on the new workload cluster deploys the application — all driven from a single Git repository hierarchy, each layer's CRs committed and reconciled continuously.

What does a developer commit to the GitOps config repo to request a cloud database via Crossplane?
Which Cluster API resource defines a pool of worker nodes for a Kubernetes workload cluster?
30. How do you observe and alert on GitOps sync failures in production?

GitOps operators expose rich telemetry specifically for monitoring sync and reconciliation health. The two main surfaces are Prometheus metrics and the operators' own notification controllers.

Argo CD observability:

  • Argo CD exposes Prometheus metrics on port 8082. Key metrics: argocd_app_info (labels include sync_status and health_status), argocd_app_sync_total{phase="Failed"}, and argocd_app_reconcile_duration_seconds.
  • The Argo CD Notifications Controller sends alerts based on trigger conditions (SyncFailed, AppOutOfSync, AppDegraded) to Slack, PagerDuty, email, or any webhook — configured via argocd-notifications-cm ConfigMap.
  • argocd app list and argocd app get <name> for CLI-based status checks.

Flux observability:

  • Each Flux controller exposes metrics. Key: gotk_reconcile_error_total (per controller, per object name), gotk_reconcile_duration_seconds, gotk_resource_info (with ready and suspended labels).
  • flux get all and flux events --watch for real-time status.
  • The Flux Notification Controller uses Provider and Alert CRs (stored in Git) to route events from any Flux resource type to any webhook or messaging service.
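The Provider/Alert pairing can be sketched as follows (API versions current as of recent Flux releases; the Secret and channel names are illustrative):

```yaml
# Route Flux reconciliation errors to Slack
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: gitops-alerts
  secretRef:
    name: slack-webhook-url   # Secret holding the webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: on-reconcile-error
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error        # only forward failures, not info events
  eventSources:
    - kind: Kustomization
      name: "*"
    - kind: HelmRelease
      name: "*"
```

Both CRs live in the GitOps repo themselves, so alerting configuration is versioned and reconciled like everything else.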

Example Prometheus alert rule for Argo CD:

- alert: ArgoCDAppSyncFailed
  expr: increase(argocd_app_sync_total{phase="Failed"}[5m]) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Argo CD sync failed for app {{ $labels.name }}"
Which Prometheus metric from Argo CD indicates that a sync operation ended in failure?
Which Flux component is responsible for routing reconciliation event notifications to Slack or PagerDuty?
31. How do you manage database schema migrations in a GitOps workflow?

Database schema migrations are inherently imperative and ordered — they run once, in sequence, and must complete successfully before the application starts. This sits awkwardly with GitOps's declarative, continuously-reconciled model. Three patterns handle this well.

1. Argo CD PreSync hook with a Kubernetes Job: Annotate a migration Job with argocd.argoproj.io/hook: PreSync. Argo CD runs the Job before applying any other resources in the sync. If the Job fails, the sync aborts and the application is not updated. Set hook-delete-policy: BeforeHookCreation to clean up the previous Job before creating a new one.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/my-app:v1.5.2
          command: ["./migrate", "--run"]
      restartPolicy: Never

2. Sync waves: If you don't want a PreSync hook, use argocd.argoproj.io/sync-wave: "1" on the migration Job and sync-wave: "2" on the Deployment. Argo CD waits for wave 1 to succeed before starting wave 2. The Job must complete (not just start) for the wave transition to occur.

3. Flyway / Liquibase as an init container: Embed migration logic directly in the application Pod as an init container. The init container runs flyway migrate or liquibase update against the database before the main container starts. Migration SQL files are packaged inside the application image. No separate Job needed — migrations run atomically with Pod startup. Downside: the application image must always include all historical migrations.
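A hedged sketch of the init-container pattern, assuming the official Flyway image and a db-credentials Secret carrying connection settings (all names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      initContainers:
        - name: migrate
          image: flyway/flyway:10        # runs and must exit 0 first
          args: ["migrate"]
          envFrom:
            - secretRef:
                name: db-credentials     # e.g. FLYWAY_URL, FLYWAY_USER
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.5.2
```

Kubernetes only starts the main container after the init container succeeds, so every Pod replica re-verifies the schema is current before serving traffic.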

Which Argo CD hook type executes a Kubernetes Job before the main sync phase begins?
Which annotation-based mechanism in Argo CD ensures a migration Job completes successfully before an application Deployment is applied?
32. How do you implement policy enforcement in a GitOps pipeline — OPA/Gatekeeper, Kyverno?

Policy enforcement in GitOps works on two levels: shift-left checks in the CI pipeline (before changes reach the cluster) and runtime admission controls in the cluster (enforced on every resource creation or update, including GitOps operator applies).

OPA/Gatekeeper: The gatekeeper-controller registers as a Kubernetes admission webhook. Policies are written in Rego and packaged as ConstraintTemplate CRDs, with Constraint CRs instantiating them with parameters. You store both in the GitOps config repo and sync them via Argo CD or Flux — policies are themselves managed as GitOps-first resources. In CI, use conftest to run the same Rego policies against manifests before they are committed, catching violations before merge.

Kyverno: Policy-as-code using Kubernetes-native YAML syntax — no Rego required. A ClusterPolicy CR can validate, mutate, or generate resources. Store ClusterPolicy CRs in Git; your GitOps operator applies them. Kyverno also supports a CLI (kyverno apply) for CI-phase pre-validation.

# Kyverno ClusterPolicy: every Pod must declare CPU and memory limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Resource limits (cpu and memory) are required for all containers."
        pattern:
          spec:
            containers:
              - name: "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*" 

The GitOps policy-as-code flow: write ClusterPolicy/ConstraintTemplate → commit to config repo → CI runs kyverno apply or conftest test to validate other manifests in the PR → GitOps operator syncs policies to cluster → admission webhook enforces on all future resource applies, including those from the operator itself. This means every deployment the operator makes is validated against the current policy set.
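A CI step wiring the shift-left half of this flow into a PR pipeline might look like this (GitHub Actions syntax; the repository layout and tool installation are assumptions):

```yaml
# Example CI step: validate PR manifests against the repo's own policies
- name: Policy checks
  run: |
    # Kyverno CLI: apply every policy in policies/ to the manifests under apps/
    kyverno apply policies/ --resource apps/
    # Conftest: evaluate the same Rego that Gatekeeper enforces at admission time
    conftest test apps/ --policy policy/rego
```

Running the identical policy sources in CI and at admission is what makes the guardrail consistent — a manifest that passes review cannot later be rejected by the webhook for the same rule.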

Which policy engine uses Rego language to define admission control constraints in Kubernetes?
Which CLI tool is used in CI pipelines to validate Kubernetes manifests against OPA/Rego policies before they are committed?
33. What are the limitations and anti-patterns of GitOps?

GitOps is powerful but not a universal solution. Being aware of its genuine limitations prevents teams from forcing it into contexts where it creates more friction than value.

Genuine limitations:

  • Secret management adds complexity: Every team member must understand at least one secret management tool (SOPS, Sealed Secrets, ESO). There is no "just commit the secret" escape hatch. Onboarding engineers to this mental model takes real effort.
  • Stateful workloads need careful design: Databases, message brokers, and anything with persistent state require ordered operations (migrations, backup/restore, leader election) that don't map naturally to declarative reconciliation. GitOps doesn't replace operational runbooks for stateful services.
  • Slow emergency response path: When a production incident requires an immediate replica-count change, the fastest GitOps response is still: edit file → commit → push → wait for operator poll cycle. Teams must define "break-glass" procedures (e.g., temporarily pausing the operator's selfHeal to allow direct kubectl) without abandoning the GitOps principle entirely.
  • Git history pollution: Automated image-tag updates generate many tiny commits. Over time the commit log becomes noisy and hard to scan. Mitigate with squashing, dedicated automation branches/commits, or Flux's ImageUpdateAutomation commit message template (spec.git.commit.messageTemplate).
  • Learning curve: Teams unfamiliar with Kubernetes CRDs find GitOps reconciliation errors hard to debug — "why is my Kustomization not Ready?" is a different mental model from "why is my pipeline failing?"
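A break-glass procedure for the emergency-response limitation above can be as simple as temporarily switching off automated sync on the affected Argo CD Application (CLI commands are one possible form; the app name and change are illustrative):

```shell
# Break-glass: stop Argo CD from reverting manual changes
argocd app set my-app --sync-policy none

# Perform the emergency change directly
kubectl scale deployment/my-app --replicas=10

# Afterwards: backport the change to Git, then re-enable automation
argocd app set my-app --sync-policy automated --self-heal
```

The crucial discipline is the backport step — if the change never reaches Git, re-enabling self-heal silently undoes the fix.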

Anti-patterns to avoid:

  • Storing unencrypted Kubernetes Secrets in Git (even base64 is cleartext).
  • Running kubectl apply directly on a GitOps-managed cluster — the operator will revert it, causing confusion and potential incidents.
  • Using mutable image tags (:latest) — breaks the traceability between Git commit and deployed artefact.
  • Keeping source code and deployment manifests in the same repository — code-only commits trigger unnecessary deployments, config-only commits trigger unnecessary CI rebuilds, and the two histories become entangled.
  • Not pinning Helm chart versions — an upstream chart update can unexpectedly change a production deployment.
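For the last anti-pattern, pinning is a one-line fix in the Argo CD Application source (the chart and repo URL are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ingress-nginx
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kubernetes.github.io/ingress-nginx
    chart: ingress-nginx
    targetRevision: 4.10.1  # pinned exact version — never "*" or a floating range
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
```

Upgrades then happen as reviewed PRs that bump targetRevision, with a clean Git diff to roll back to.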
Which anti-pattern causes a GitOps operator to aggressively overwrite an intentional out-of-band cluster change?
Why is using mutable image tags like ':latest' an anti-pattern in GitOps?
34. How do you migrate an existing deployment pipeline to GitOps?

Migrating to GitOps is not a big-bang cutover — it works best as a phased process where the old pipeline and the GitOps operator run in parallel until confidence is established, then the old pipeline is decommissioned.

Step-by-step migration:

  1. Capture current state as declarative YAML: Export live Kubernetes resources with kubectl get all,configmap,secret,ingress -o yaml. Strip ephemeral, server-managed fields (status, resourceVersion, uid, creationTimestamp, managedFields). This becomes your starting commit.
  2. Create the config repository: Organise manifests into an app folder structure with base + overlays (or Helm chart + values files). Commit the cleaned-up YAML. Add secrets using SOPS or Sealed Secrets before committing.
  3. Install the GitOps operator: Install Argo CD or Flux in the cluster. Keep the operator in dry-run or manual-sync mode initially.
  4. Connect operator to config repo: Create an Application or Kustomization CR pointing to the new config repo.
  5. Validate with dry-run: Confirm the operator produces manifests identical to what is currently running. Fix discrepancies in the config repo.
  6. Enable sync — manual first, then automated: Start with manual sync, run a few syncs from the UI or CLI, and observe the results. Then enable automated sync, keeping prune: false initially so nothing is deleted yet.
  7. Update CI to write to the config repo instead of kubectl:
# Replace: kubectl set image deployment/my-app my-app=${IMAGE}
# With: update the image tag in the config repo and push
- name: Promote image to config repo
  run: |
    cd gitops-config
    kustomize edit set image my-app=registry.example.com/my-app:${IMAGE_TAG}
    git add apps/my-app/overlays/prod/kustomization.yaml
    git commit -m "chore(cd): promote my-app ${IMAGE_TAG} to prod"
    git push origin main
  8. Revoke direct cluster write access: Remove kubeconfig from the CI system. Update RBAC to deny write access to any service account not owned by the GitOps operator.
  9. Enable prune: Once the team is comfortable, enable prune: true so deleted manifests are removed from the cluster.
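Steps 1 and 5 above can be sketched with standard tooling (the yq filter is one way to strip the ephemeral fields; kubectl-neat is a popular alternative, and the namespace and paths are illustrative):

```shell
# Step 1: export live resources and strip ephemeral/server-managed fields
kubectl get deploy,svc,cm,ing -n my-app -o yaml \
  | yq 'del(.items[].status,
            .items[].metadata.resourceVersion,
            .items[].metadata.uid,
            .items[].metadata.creationTimestamp,
            .items[].metadata.managedFields)' \
  > apps/my-app/base/resources.yaml

# Step 5: confirm the operator would change nothing before enabling sync
argocd app diff my-app
```

An empty diff in step 5 is the signal that the repo truly describes what is running; only then is it safe to hand over control.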
What is the recommended first step when migrating an existing Kubernetes workload to GitOps?
After the GitOps operator is managing deployments, what must be removed from the CI/CD system to fully commit to the GitOps model?
35. How does GitOps fit into a platform engineering strategy?

Platform engineering is the discipline of building Internal Developer Platforms (IDPs) that give application teams self-service access to infrastructure and deployment capabilities through well-defined abstractions. GitOps is the delivery mechanism underneath the platform — it ensures that everything the IDP provisions or configures is continuously reconciled to its declared state, without platform engineers manually applying changes.

Key integration points:

  • Golden path templates: Platform teams define standardised GitOps repository templates (via Backstage Software Templates, Cookiecutter, or custom scaffolding) that new services bootstrap from. The template pre-wires the CI pipeline, the Kustomize overlay structure, the Argo CD Application CR, and the required RBAC/AppProject — so a developer starts with a working GitOps setup from day one.
  • Self-service via PR: Rather than filing a Jira ticket to get a new namespace or a database, a developer fills in a form in Backstage, which opens a PR to the config repo. An Argo CD ApplicationSet or Crossplane Composition is triggered by that PR merge, provisioning the requested resources automatically. The platform team approves the template once; individual team requests need no manual platform-team intervention.
  • Crossplane + GitOps for infrastructure: Developers request cloud resources (databases, queues, storage) by committing Crossplane Claim CRs to the config repo. The GitOps operator applies them; Crossplane provisions the cloud resources. Connection secrets are injected via External Secrets Operator. The developer only interacts with Git and a Kubernetes-style API — never with cloud consoles.
  • Policy-as-code as a platform service: Platform teams manage Kyverno ClusterPolicies and OPA/Gatekeeper ConstraintTemplates in Git. Every team's workload automatically inherits these guardrails because the GitOps operator applies the policies to all clusters. Compliance is built in, not bolted on.
  • Observability configuration as GitOps: Prometheus, Grafana, Loki, and alerting rules are themselves deployed and managed via GitOps, ensuring every cluster has a consistent observability stack that can be audited and version-controlled like any other workload.
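A developer-facing Crossplane Claim as described above might look like this (the PostgreSQLInstance kind, API group, and fields are illustrative — they depend on the XRDs and Compositions the platform team has published):

```yaml
# Committed by a developer to the config repo; the GitOps operator applies it
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: team-orders
spec:
  parameters:
    storageGB: 20
  compositionSelector:
    matchLabels:
      provider: aws
  writeConnectionSecretToRef:
    name: orders-db-conn  # surfaced to the app via External Secrets Operator
```

The developer never sees the cloud account — the Claim is the entire interface, and the PR that adds it is the entire approval workflow.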

The net result: developers interact with Git and a friendly UI, not Kubernetes directly. Platform teams own the templates and policies, not the individual deployments. GitOps closes the loop between expressed intent and running reality for every layer of the platform.

In a GitOps-backed Internal Developer Platform, what does a developer submit to request a new service deployment or cloud resource?
Which tool is commonly used to scaffold Golden Path GitOps repository templates for new services in a platform engineering setup?