Maven / GitOps Interview Questions
GitOps is an operational framework that applies DevOps practices — version control, collaboration, compliance, and CI/CD automation — to infrastructure and application delivery. The term was coined by Alexis Richardson of Weaveworks in 2017. The central idea is that Git acts as both the mechanism for change (pull requests) and the immutable audit log (commit history) for every system state transition.
The OpenGitOps working group (CNCF) formalised four core principles:
- Declarative: The entire desired system state is expressed declaratively — you describe what should exist, not the sequence of steps to create it. Kubernetes manifests, Helm values files, and Kustomize overlays all qualify.
- Versioned and Immutable: Desired state is stored in a VCS (Git) that enforces immutability and retains full history. Every change is a commit — reviewable, reversible, and attributable to a specific author.
- Pulled Automatically: Software agents — not humans or CI pipelines — pull desired state from Git and apply it to the target environment. This inverts the traditional push model and keeps cluster credentials inside the cluster, not in external CI systems.
- Continuously Reconciled: Agents continuously compare actual cluster state against the Git-declared desired state. When drift is detected they either alert operators or automatically self-heal, converging the system back to what Git specifies.
These four principles create a closed-loop automation system where every deployment, rollback, or configuration change flows through a Git commit and review cycle.
Traditional CI/CD pipelines are push-based: the CI system builds an artifact, and the CD stage runs kubectl apply or helm upgrade directly against the cluster. The pipeline holds a kubeconfig or service-account token with cluster-write access. There is no persistent desired-state record and no automatic drift correction — if someone manually deletes a deployment, the pipeline only redeploys on the next trigger.
GitOps is pull-based: the CI system builds the image and updates a config repository, but it never touches the cluster directly. A GitOps operator running inside the cluster watches the config repo and reconciles the live state to match what Git says, continuously. Rollback is a git revert rather than re-running a pipeline step.
| Dimension | Traditional CI/CD | GitOps |
|---|---|---|
| Deployment trigger | Pipeline push on build success | Operator pull on Git commit |
| Cluster credentials | Stored in CI system secrets | Kept inside the cluster only |
| Drift detection | None — only corrects on next pipeline run | Continuous — operator reconciles on every poll cycle |
| Rollback mechanism | Re-run old pipeline or manual kubectl | git revert creates a new auditable commit |
| Audit trail | Pipeline logs (often ephemeral) | Git commit history (permanent, cryptographically chained) |
The separation also improves security posture: even if the CI system is compromised, an attacker cannot push arbitrary changes to the cluster without also compromising Git and passing branch-protection reviews.
In GitOps, the Git repository is the system state. Every resource that should exist in the cluster — deployments, services, config maps, RBAC rules, network policies — is represented as a committed file. The repository is not a backup or documentation artifact; it is the authoritative record, and the cluster is just the materialised form of what Git contains.
Practical implications of this principle:
- No out-of-band changes: Running kubectl apply -f directly, editing a ConfigMap in the Kubernetes dashboard, or scaling a deployment manually all create "drift" — a gap between what Git says and what the cluster is doing. A GitOps operator will detect and revert those changes on its next reconciliation cycle.
- All changes via pull request: A developer who wants to change a replica count opens a PR, gets it reviewed and approved, and merges it. The operator then applies the change automatically. There is no separate "deployment approval" step because the PR is the approval.
- History as an audit log: git log shows exactly who changed what, when, and why. This satisfies SOC 2, PCI-DSS, and similar compliance requirements without building a separate audit system.
- Reproducibility: Because the entire desired state is in Git, spinning up a new environment is a matter of pointing a GitOps operator at the same repository and branch. There are no snowflake servers.
The single-source-of-truth principle breaks down if teams maintain parallel state (e.g., Helm releases applied manually alongside Argo CD-managed ones). Discipline about eliminating all write paths to the cluster other than the operator is essential.
Push-based deployment: a CI/CD pipeline (GitHub Actions, Jenkins, etc.) runs kubectl apply or helm upgrade directly against the target cluster after a build succeeds. The pipeline authenticates to the cluster using a kubeconfig or service-account token stored in the CI system's secret vault. Changes only reach the cluster when a pipeline is triggered; drift is not detected between runs.
Pull-based deployment: a GitOps operator running inside the cluster watches one or more Git repositories on a polling interval or via webhook. When it detects a difference between what Git declares and what the cluster is running, it applies the delta automatically. Cluster credentials never leave the cluster boundary.
| Property | Push-based | Pull-based (true GitOps) |
|---|---|---|
| Who applies changes | CI/CD pipeline | In-cluster operator |
| Credential exposure | kubeconfig in CI secrets | Credentials stay in cluster |
| Drift correction | No automatic correction | Continuous reconciliation |
| Examples | GitHub Actions + kubectl, Jenkins pipeline | Argo CD, Flux CD |
A pull-based Argo CD Application looks like this — the operator handles all cluster writes:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/gitops-config.git
targetRevision: main
path: apps/my-app/overlays/prod
destination:
server: https://kubernetes.default.svc
namespace: production
syncPolicy:
automated:
prune: true
selfHeal: true
A GitOps operator is a software agent that runs inside the target cluster and implements the pull-based reconciliation loop. It is the engine that turns a static Git repository into a live, self-correcting deployment system. Without an operator, GitOps is just a documentation convention — the operator is what closes the loop between intent (Git) and reality (cluster).
An operator performs four continuous activities:
- Source watching: Polls a Git repository (or OCI artifact registry) on a configurable interval — typically every 1–5 minutes — or listens for a webhook notification. It detects when the desired state has changed.
- Manifest rendering: Fetches the source and renders it into plain Kubernetes manifests. Depending on configuration, this may involve running kustomize build, rendering a Helm chart, or processing Jsonnet templates.
- Diff computation: Compares the rendered desired state against the live resources in the cluster (fetched from the Kubernetes API). Identifies resources that need to be created, updated, or deleted.
- Reconciliation: Applies the diff using kubectl apply (or server-side apply), bringing the cluster to the declared state. Updates the status of its own CRs to reflect sync health.
The two dominant GitOps operators in production today are Argo CD (CNCF graduated, rich UI, Application/AppProject model) and Flux CD (CNCF graduated, composable controllers, CLI-first). Both support plain Kubernetes YAML, Helm charts, and Kustomize as manifest sources.
Declarative infrastructure means specifying the desired end state of a system rather than the sequence of commands needed to reach it. A Kubernetes Deployment manifest is declarative — it says "I want 3 replicas of this container image running with these environment variables." The Kubernetes control plane figures out the steps (schedule pods, pull images, attach volumes) to achieve that state. By contrast, a shell script that runs kubectl scale, kubectl set image, and kubectl rollout in sequence is imperative — the outcome depends on the starting state.
GitOps requires declarative infrastructure for a specific technical reason: the reconciliation loop works by comparing two representations of system state — the declared state in Git and the live state reported by the Kubernetes API. Both sides of the comparison must be in the same format for the diff to be meaningful.
- Imperative commands like kubectl run describe a one-time action, not a persistent state. You cannot store "kubectl scale to 5" in Git and compare it to the current replica count; you can store spec.replicas: 5 and compare it.
- Idempotency follows naturally from declarative configs — applying the same manifest twice produces the same result, which means the operator can safely reconcile as often as it likes.
- Drift detection only works if the desired state can be parsed, normalised, and diffed against the live API server response. Declarative YAML makes this straightforward.
Common declarative formats used with GitOps: Kubernetes YAML manifests, Helm values.yaml files (rendered to manifests by the operator), Kustomize overlay patches, and Crossplane Composite Resource Claims.
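The contrast is easiest to see in a minimal Deployment manifest — a declarative statement of desired state that both the control plane and a GitOps operator can diff against reality. Names and image below are illustrative:

```yaml
# Declarative: "three replicas of this image should exist."
# The control plane works out the steps; applying this twice is a no-op.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.0.0
          env:
            - name: LOG_LEVEL
              value: info
```

Because the manifest is the state, the operator can diff spec.replicas: 3 against the live object on every reconciliation cycle.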
Script-based deployments scatter cluster-write credentials across CI systems, developer laptops, and shared servers. Any engineer with access to the deployment script or its secrets can push arbitrary changes to production with no mandatory review. Audit trails, when they exist at all, are CI job logs that expire after a few weeks.
GitOps addresses these weaknesses through several concrete mechanisms:
- Immutable audit trail: Every change to cluster state must be a Git commit, attributed to its author and timestamped by the VCS. git log --follow shows the complete history of every resource, who changed it, when, and why (via commit message). This satisfies SOC 2 Type II, PCI-DSS, and ISO 27001 change-management requirements without a separate audit tool.
- Mandatory code review: Branch protection rules require pull-request review before any change merges to the deploy branch. This adds a human approval gate that script-based pipelines typically lack.
- No human direct cluster access: In a mature GitOps setup, engineers do not need kubeconfig with write permissions. The operator holds the only write credential. Even if an engineer's laptop is compromised, the attacker cannot directly alter the cluster — they would need to also compromise Git and pass branch-protection checks.
- Credentials stay in the cluster: Pull-based deployment means CI systems never hold cluster tokens. Reducing the number of places a kubeconfig exists reduces the attack surface.
- Signed commits: GPG or SSH-signed commits cryptographically link every change to a verified identity, making it very difficult to forge the commit author after the fact.
- Automated policy checks in CI: Tools like Conftest (OPA) or Kyverno CLI can validate manifests in the PR pipeline before they ever reach the cluster — shifting security left.
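As an illustration of shifting policy left, a Kyverno ClusterPolicy like the following (policy and rule names are illustrative) can be run against rendered manifests in the PR pipeline with the Kyverno CLI before anything reaches the cluster — a sketch, not a production policy:

```yaml
# Illustrative Kyverno policy: reject Deployments whose containers
# omit CPU/memory limits. Validated in CI against rendered manifests.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "All containers must declare CPU and memory limits."
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"
```

The same policy file can later be enforced in-cluster by the Kyverno admission controller, so CI and runtime share one source of policy.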
GitOps repositories have different branching needs than application source repos. The goal is to map Git branches or folders to deployment environments cleanly, while keeping promotion paths easy to reason about.
Three strategies are common in practice:
- Environment branches: Separate branches represent separate environments — main maps to production, staging to staging, develop to development. Promoting from dev to staging is a PR merge from develop to staging. Simple to understand but prone to long-lived divergence and merge conflicts as environments drift apart.
- Directory-per-environment on a single branch (trunk-based): One main branch holds folders like envs/dev/, envs/staging/, envs/prod/. Kustomize overlays sit in each folder. Promotion is a PR that copies or patches the relevant image tag into the next environment's overlay. This avoids branch divergence and is the approach most commonly recommended for Flux and Argo CD setups.
- Separate repos per environment: Dedicated repositories for dev, staging, and prod config. Provides the strictest separation (prod repo can have tighter branch-protection and access controls) but makes cross-environment changes (e.g., updating a shared base) more cumbersome.
For most teams, trunk-based development with Kustomize overlays is the pragmatic choice. A single PR flow touching overlays is easy to review and the Git history stays linear. The separation of the application source repo (where code lives) from the config repo (where deployment manifests live) is an orthogonal best practice that applies regardless of the branching strategy chosen.
Drift is the condition where the actual state of the cluster diverges from the desired state declared in Git. Common causes: an engineer runs kubectl scale manually during an incident, a node failure causes a deployment controller to change replica counts temporarily, or a helm upgrade is run outside the GitOps workflow. Without drift detection, these deviations are invisible until something breaks.
A GitOps operator detects drift by continuously repeating the same comparison it uses for normal syncs: it fetches the current live resources from the Kubernetes API and diffs them against the rendered desired state from Git. There is no separate "drift check" mode — drift detection is simply what happens between the commit that caused the last sync and the next one.
Argo CD marks an Application as OutOfSync when it detects drift. If selfHeal: true is set in the sync policy, Argo CD re-applies the Git state immediately, reverting the manual change. If not set, the Application stays OutOfSync and shows the diff in the UI, requiring a human to click Sync.
Flux marks its Kustomization resource as not Ready when drift occurs. With prune: true set and a reconciliation interval configured, Flux re-applies the declared state on the next interval cycle. There is no explicit "selfHeal" toggle — Flux always reconciles on its interval.
Drift tolerance is configurable: you can tell Argo CD to ignore specific fields (like status fields managed by other controllers) using ignoreDifferences in the Application spec, preventing false-positive drift alerts from controllers that write back to managed resources.
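For example, an Application whose replica count is managed by a HorizontalPodAutoscaler can ignore that one field so autoscaling does not register as drift — a sketch of the relevant fragment of the Application spec:

```yaml
# Ignore spec.replicas on Deployments so an HPA scaling the workload
# does not leave the Application perpetually OutOfSync.
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
```

With selfHeal enabled, this exclusion is important: without it, Argo CD and the HPA would fight over the replica count on every reconciliation.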
IaC and GitOps address different layers of the stack and use different execution models, but they complement each other in modern cloud-native environments.
Infrastructure as Code (Terraform, Pulumi, CloudFormation): Provisions cloud resources — VMs, VPCs, managed databases, load balancers, Kubernetes clusters themselves. IaC is typically applied imperatively by a human or pipeline (terraform apply). State is tracked in a tfstate file (local or remote in S3/Terraform Cloud). It answers the question: "What cloud infrastructure should exist?"
GitOps: Deploys and manages application workloads and configuration on top of already-provisioned infrastructure. The GitOps operator runs continuously, reconciling the live cluster to match the Git-declared desired state. It answers the question: "What should be running on this Kubernetes cluster right now?"
| Dimension | IaC (Terraform) | GitOps (Argo CD / Flux) |
|---|---|---|
| Target | Cloud resources (EKS, RDS, VPC) | Kubernetes workloads and config |
| Execution model | Imperative plan/apply triggered by human or pipeline | Continuous pull-based reconciliation loop |
| State storage | tfstate file (S3, Terraform Cloud) | Git repository |
| Drift correction | Manual re-apply needed | Automatic on each reconciliation cycle |
They are complementary, not competing: Terraform provisions the EKS cluster, GitOps manages what runs on it. The boundary blurs with tools like Crossplane and Cluster API, which represent cloud resources and Kubernetes clusters as Kubernetes CRDs — making it possible to manage infrastructure provisioning through a GitOps operator.
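To illustrate the blurring boundary: a Crossplane claim is itself just a Kubernetes manifest, so it can live in the same GitOps repo as application workloads. The group and kind below come from a hypothetical platform-team composition — the pattern, not the names, is the point:

```yaml
# Hypothetical Crossplane claim: requests a managed Postgres instance
# through a composition defined by the platform team. Committed to Git
# and reconciled by the GitOps operator like any other manifest.
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: production
spec:
  parameters:
    storageGB: 20
  compositionSelector:
    matchLabels:
      provider: aws
```

Merging this manifest causes the operator to apply it, and Crossplane's controllers then provision the actual cloud database — infrastructure provisioning driven entirely by a Git commit.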
Argo CD is a declarative GitOps continuous delivery tool for Kubernetes and a CNCF-graduated project. It implements the pull-based GitOps model: you define Application custom resources that map a Git source to a Kubernetes destination, and Argo CD's Application Controller continuously reconciles the two.
Argo CD's main components:
- API Server: Exposes a gRPC and REST API consumed by the web UI, the argocd CLI, and CI webhooks. Handles authentication (OIDC via Dex or external providers) and authorisation (RBAC).
- Repository Server: Clones Git repositories and renders manifests using whichever tool the Application source specifies — plain YAML, Kustomize, Helm, Jsonnet, or a custom config management plugin.
- Application Controller: The reconciliation engine. Runs a control loop that fetches live Kubernetes resources, compares them to the rendered desired state from the Repository Server, and applies the diff. Emits sync status and health status.
- Redis: Caches rendered manifests and application state to reduce Git and Kubernetes API load.
- Dex (optional): An embedded OIDC identity provider for SSO against GitHub, LDAP, SAML, or other identity providers.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: guestbook
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/org/gitops-config.git
targetRevision: HEAD
path: apps/guestbook
destination:
server: https://kubernetes.default.svc
namespace: guestbook
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
Argo CD's sync process is a six-step reconciliation cycle executed by the Application Controller on every polling interval (default: 3 minutes) or on webhook notification.
- Clone and render: The Repository Server clones (or updates from cache) the Git source at the specified targetRevision. It runs the appropriate tool — kustomize build, helm template — or reads plain YAML, producing a set of Kubernetes resource manifests.
- Fetch live state: The Application Controller calls the Kubernetes API to list the current state of all resources in the Application's destination namespace that are labeled as managed by this Application.
- Compute diff: Argo CD performs a three-way diff between (a) the last-applied configuration, (b) the current live state, and (c) the desired state from Git. Resources present in Git but not in the cluster are marked as missing. Resources in the cluster but not in Git are marked as extra (candidates for pruning).
- Sync decision: If any resource differs, the Application status is set to OutOfSync. If syncPolicy.automated is configured, the sync proceeds immediately. Otherwise, a human must trigger it via UI, CLI, or webhook.
- Apply: Argo CD applies the desired manifests using server-side apply. Pre-sync and sync hooks execute in their designated phases.
- Update status: Application sync status becomes Synced or SyncFailed. Health status (Healthy / Degraded / Progressing / Missing) is evaluated by checking the readiness conditions of the deployed resources.
# Example: Application status after a partial sync failure
status:
sync:
status: OutOfSync
revision: "abc1234def5678"
health:
status: Degraded
conditions:
- type: SyncError
message: "Failed to apply deployment.apps/frontend: field is immutable"
lastTransitionTime: "2026-04-22T10:00:00Z"
An Application is the fundamental Argo CD custom resource. It maps a single Git source (repository URL, path, revision) to a single Kubernetes destination (cluster API URL, namespace) and declares how to sync between them. One Application typically represents one microservice or one component of a larger system.
An ApplicationSet is a higher-level CR managed by the ApplicationSet controller. Instead of managing one application, it uses generators to programmatically produce multiple Application CRs from a template. As generators produce or remove parameters, the controller creates, updates, or deletes the corresponding Applications automatically.
Built-in ApplicationSet generators:
- List generator: Explicit list of parameter sets — each item creates one Application.
- Git directory generator: Scans a Git repo for directories matching a glob pattern; each directory becomes one Application.
- Cluster generator: Creates one Application per registered Argo CD cluster, optionally filtered by cluster labels.
- Matrix generator: Cross-product of two other generators — e.g., all apps × all clusters.
- Pull-request generator: Creates a temporary Application for each open PR, enabling ephemeral preview environments.
# ApplicationSet using the git directory generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: all-apps
namespace: argocd
spec:
generators:
- git:
repoURL: https://github.com/org/gitops-config.git
revision: main
directories:
- path: apps/*
template:
metadata:
name: "{{path.basename}}"
spec:
project: default
source:
repoURL: https://github.com/org/gitops-config.git
targetRevision: main
path: "{{path}}"
destination:
server: https://kubernetes.default.svc
namespace: "{{path.basename}}"
The way you structure a GitOps config repo determines how easily you can promote changes between environments, onboard new applications, and keep per-environment differences small and reviewable. There is no single correct layout, but three patterns dominate.
1. Kustomize base + environment overlays (most common): Each application has a base/ directory with shared manifests and an overlays/ directory with one folder per environment. Each overlay folder contains a kustomization.yaml that references the base and adds environment-specific patches (image tag, replica count, config values).
gitops-config/
├── apps/
│ └── my-app/
│ ├── base/
│ │ ├── deployment.yaml
│ │ ├── service.yaml
│ │ └── kustomization.yaml
│ └── overlays/
│ ├── dev/
│ │ └── kustomization.yaml # patches: image tag, replicas=1
│ ├── staging/
│ │ └── kustomization.yaml # patches: image tag, replicas=2
│ └── prod/
│ └── kustomization.yaml # patches: image tag, replicas=5
└── argocd/
└── apps/
        ├── my-app-dev.yaml       # Application CR for dev overlay
        ├── my-app-staging.yaml
        └── my-app-prod.yaml

2. App-of-apps: A single root Argo CD Application points to the argocd/apps/ folder above. Argo CD syncs that folder, creating all the child Application CRs. This allows managing all Application registrations via Git without applying them manually.
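A root Application for this pattern might look like the following, with the repo URL and path taken from the layout above:

```yaml
# App-of-apps root: syncing this one Application creates every child
# Application CR found under argocd/apps/.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: argocd/apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

Note the destination is the argocd namespace itself: the root Application deploys Application CRs, and each child then deploys the actual workloads to its own destination.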
3. Separate config repo per environment: Stricter separation — production config lives in a separate repo with tighter branch protection and access control. Suitable for regulated environments but adds overhead when updating shared base configs.
The app repo / config repo split (keeping source code and deployment manifests in separate repositories) is an orthogonal best practice that improves the separation of concerns between CI (build) and CD (deploy).
Flux CD is a CNCF-graduated GitOps toolkit for Kubernetes. Unlike Argo CD — which ships as a relatively integrated platform with a built-in UI, Application model, and unified controller — Flux is composed of separate, independently deployable controllers, each managing a specific concern. You assemble the controllers you need and extend them without touching the others.
| Dimension | Flux CD | Argo CD |
|---|---|---|
| Architecture | Composable controllers (source, kustomize, helm, notification, image-automation) | Integrated platform with API server, repo server, application controller |
| UI | CLI-first; Weave GitOps is a separate optional UI | Built-in rich web UI with app graph and diff view |
| Application model | GitRepository + Kustomization CRs (composable) | Application CR (opinionated single resource) |
| Image automation | Built-in ImageRepository / ImagePolicy / ImageUpdateAutomation | Requires Argo CD Image Updater (separate tool) |
| Multi-tenancy | Namespace-scoped CRs; each team manages their own Flux CRs | AppProject model with centralised RBAC policy |
| Helm support | HelmRelease CR with native drift detection and remediation | Application source.helm with sync phases |
| CNCF status | Graduated (2022) | Graduated (2022) |
In practice: teams that prioritise a strong visual UI and centralised management tend to choose Argo CD. Teams that prefer composable building blocks, want to avoid centralised control-plane complexity, or need built-in image automation often choose Flux. Both are production-ready and widely deployed at scale.
Flux splits the responsibilities of fetching config and applying config into two separate controllers, connected through an intermediate Artifact object.
source-controller is responsible for watching external sources — Git repositories, Helm repositories, OCI registries, and S3-compatible buckets. When the source changes (new commit, new chart version, new image tag), the source-controller downloads the content, archives it as a compressed tarball, and stores a reference to it in a GitRepository (or HelmRepository, OCIRepository) status field as a versioned Artifact. Subsequent controllers consume this Artifact rather than fetching from the network themselves.
kustomize-controller watches Kustomization CRs. Each Kustomization references a source (typically a GitRepository) and specifies a path within it. When the source's Artifact revision changes, the kustomize-controller:
- Downloads the Artifact tarball from source-controller.
- Runs kustomize build on the specified path to render final manifests.
- Validates the manifests (optional: dry-run or server-side validation).
- Applies them to the cluster using server-side apply.
- Monitors the health of applied resources and updates the Kustomization Ready condition.
# source-controller watches this repo
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: gitops-config
namespace: flux-system
spec:
interval: 1m
url: https://github.com/org/gitops-config.git
ref:
branch: main
---
# kustomize-controller deploys from this repo/path
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: apps-prod
namespace: flux-system
spec:
interval: 10m
sourceRef:
kind: GitRepository
name: gitops-config
path: ./apps/overlays/prod
prune: true
wait: true
timeout: 5m
Plain Kubernetes Secrets must never be committed to a Git repository — base64 is an encoding, not encryption, and the value is trivially decodable by anyone with repo access. Three patterns solve this, with different trust models and operational tradeoffs.
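To see why, consider what a plain Secret manifest actually contains (values illustrative):

```yaml
# NEVER commit this: "cGFzc3dvcmQ=" is just base64 for "password".
# Anyone with repo read access can decode it in one command.
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: production
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64 of "password", not encryption
```

The three patterns below all keep either an encrypted blob or only a reference in Git, never the decodable value itself.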
1. Sealed Secrets (Bitnami): The sealed-secrets-controller running in the cluster holds a private key. You use the kubeseal CLI to encrypt a regular Secret into a SealedSecret CR — encrypted with the cluster's public key. Only that cluster's controller can decrypt it. The SealedSecret YAML is safe to commit to Git. On sync, the controller decrypts it back into a regular Kubernetes Secret.
# SealedSecret — safe to commit to Git
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
name: db-credentials
namespace: production
spec:
encryptedData:
    password: AgBy8I5V2EqtcPmVTiIuEolW...(encrypted blob)

2. SOPS (Mozilla): Encrypts entire secret files (YAML, JSON, .env) using AWS KMS, GCP KMS, HashiCorp Vault, or age keys. The encrypted file is committed to Git. Flux natively supports SOPS decryption — configure a spec.decryption block in the Kustomization CR pointing to a Kubernetes Secret that holds the decryption key. Argo CD requires a custom config management plugin for SOPS.
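A Flux Kustomization wired for SOPS might carry a decryption block like this — the secret name is an assumption; it would hold an age or GPG private key:

```yaml
# kustomize-controller decrypts SOPS-encrypted files under this path
# before applying, using the key stored in the sops-age Secret.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps-prod
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: gitops-config
  path: ./apps/overlays/prod
  prune: true
  decryption:
    provider: sops
    secretRef:
      name: sops-age   # assumed Secret holding the age private key
```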
3. External Secrets Operator (ESO): An ExternalSecret CR in Git declares which key to fetch from an external store (AWS SSM Parameter Store, HashiCorp Vault, GCP Secret Manager, Azure Key Vault). The ESO controller fetches the secret value and creates a regular Kubernetes Secret. The actual secret value never lives in Git at all — only the reference does.
# ExternalSecret — references AWS SSM, no secret value in Git
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: db-credentials
namespace: production
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-ssm-store
kind: ClusterSecretStore
target:
name: db-credentials
data:
- secretKey: password
remoteRef:
key: /prod/db/password
The most common and maintainable approach is Kustomize base + overlays: shared manifests live in a base/ directory, and each environment has an overlay directory that patches only what differs — typically the container image tag, replica count, resource limits, and environment-specific ConfigMap values.
# apps/my-app/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namePrefix: prod-
commonLabels:
env: production
images:
- name: my-app
newName: registry.example.com/my-app
newTag: v1.5.2
patches:
- patch: |-
- op: replace
path: /spec/replicas
value: 5
target:
kind: Deployment
      name: my-app

Each environment has its own Argo CD Application or Flux Kustomization CR pointing to its overlay path. Promotion between environments is a PR that updates the image tag in the next environment's kustomization.yaml.
Promotion workflow example:
- CI builds image, tags it v1.5.2, pushes to registry.
- CI opens a PR updating overlays/dev/kustomization.yaml with tag v1.5.2.
- GitOps operator deploys to dev. Team verifies.
- Engineer opens a PR to update overlays/staging/kustomization.yaml to v1.5.2.
- After staging validation, a final PR promotes overlays/prod/kustomization.yaml to v1.5.2.
Alternative: Helm with per-environment values-dev.yaml, values-prod.yaml files. Flux HelmRelease CRs reference the appropriate values file per environment. Less DRY for structural differences but cleaner for apps already distributed as Helm charts.
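A Flux HelmRelease for the prod environment might select its values files like this — chart path, source name, and release name are assumptions for illustration:

```yaml
# HelmRelease rendering a chart from the config repo with
# prod-specific values layered over the defaults.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: my-app
  namespace: production
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/my-app        # chart stored in the Git source
      sourceRef:
        kind: GitRepository
        name: gitops-config
        namespace: flux-system
      valuesFiles:
        - values.yaml               # shared defaults
        - values-prod.yaml          # prod overrides win on conflict
```

Promotion then means editing values-prod.yaml (or the chart version) in a PR, exactly as with the Kustomize overlay flow.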
Flux's image automation system closes the loop between a new container image being pushed to a registry and the updated tag being committed back to the GitOps config repository — without any CI pipeline involvement. It relies on three CRs working in sequence.
- ImageRepository: Tells Flux to scan a container registry on an interval and cache the available tags for a given image name.
- ImagePolicy: Selects the "best" tag from those discovered by the ImageRepository, using a policy rule — semver range, alphabetical ordering, or a regex filter.
- ImageUpdateAutomation: When the ImagePolicy resolves to a new tag, this CR commits the new tag value back to the GitOps Git repository. An image setter marker in the manifest YAML (a comment annotation) tells Flux exactly which field to update.
# Step 1: scan registry for available tags
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
name: my-app
namespace: flux-system
spec:
image: registry.example.com/my-app
interval: 5m
---
# Step 2: pick the latest semver tag in the 1.x range
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
name: my-app
namespace: flux-system
spec:
imageRepositoryRef:
name: my-app
policy:
semver:
range: ">=1.0.0 <2.0.0"
---
# Step 3: commit the new tag back to Git
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
name: flux-system
namespace: flux-system
spec:
interval: 30m
sourceRef:
kind: GitRepository
name: gitops-config
git:
commit:
author:
email: fluxcdbot@example.com
name: Flux
messageTemplate: "chore: update image to {{range .Updated.Images}}{{.}}{{end}}"
push:
branch: main
update:
path: ./apps
strategy: Setters
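The Setters strategy depends on a marker comment placed next to the image field in the workload manifest; the marker references the ImagePolicy above by namespace and name:

```yaml
# Deployment fragment with a Flux image-setter marker. The trailing
# comment tells ImageUpdateAutomation which field to rewrite when the
# ImagePolicy resolves to a new tag.
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0 # {"$imagepolicy": "flux-system:my-app"}
```

When the policy picks, say, 1.6.0, Flux rewrites the image line and pushes a commit, which the normal reconciliation loop then deploys.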
Manual sync (default): Argo CD detects drift and marks the Application OutOfSync, but applies nothing until a human explicitly triggers synchronisation via the UI, argocd app sync CLI, or an API call. Suitable for production environments where deployments need explicit human approval.
Automated sync: Configured via syncPolicy.automated. Argo CD syncs automatically as soon as it detects a Git change or live-state drift. Two sub-options refine the behaviour:
- prune: true — Argo CD deletes Kubernetes resources that exist in the cluster but have been removed from Git. Without this, removed manifests leave orphaned resources.
- selfHeal: true — Argo CD re-applies Git state when live state drifts (e.g., someone ran kubectl manually). Without this, automated sync only fires on Git changes, not on live drift.
Sync waves control the order in which resources are created within a single sync operation. Resources are annotated with a wave number; lower-numbered waves sync first. Argo CD waits for all resources in wave N to become healthy before starting wave N+1. This is essential for ordering CRD creation before CR creation, database migrations before application pods, or secrets before deployments.
# Automated sync policy with pruning and self-healing
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
---
# Sync wave: migration Job (wave 1) runs and completes before app (wave 2)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "2"
In GitOps, a rollback is not a special out-of-band operation — it is just another change to the Git repository. The same reconciliation loop that applies new versions applies the rollback. This preserves the audit trail and keeps the single source of truth intact.
Method 1 — git revert (preferred): Create a new commit that reverses the problematic change. The commit appears in history, clearly documenting when the rollback happened and why. This is always safe on protected branches.
git revert abc1234 # creates a new "revert" commit
git push origin main # triggers the GitOps operator to sync back to the previous state
Method 2 — git reset (emergency, use with caution): Hard-resets the branch to a known-good commit and force-pushes. Rewrites history. Use only when the problematic commit must be expunged from the record.
git reset --hard v1.4.9-tag
git push --force-with-lease origin main
Method 3 — Argo CD history rollback (without touching Git): Argo CD stores the last N sync revisions. You can roll back to any of them via UI (Application → History and Rollback) or CLI. This does not revert the Git branch — the Application will show OutOfSync against HEAD until you either reconcile or also revert Git.
# CLI rollback to history entry ID 3
argocd app rollback my-app 3

# Or pin the Application to a specific commit in its spec
spec:
  source:
    targetRevision: "abc1234def" # pinned to known-good SHA
The cleanest mental model is: CI builds artifacts, GitOps deploys them. The CI pipeline (GitHub Actions, GitLab CI, Jenkins) is responsible for everything up to and including pushing a container image to a registry. It is explicitly not responsible for applying changes to the cluster. That role belongs entirely to the GitOps operator.
The handoff between CI and GitOps is a commit to the config repository. CI updates the image tag reference in the relevant overlay or HelmRelease, commits it, and pushes. The GitOps operator detects the commit on its next poll and applies the new image tag to the cluster.
Config repo update step in a GitHub Actions CI workflow:
- name: Update image tag in GitOps config repo
  env:
    IMAGE_TAG: ${{ github.sha }}
  run: |
    git clone https://${{ secrets.GITOPS_TOKEN }}@github.com/org/gitops-config.git config
    cd config
    # Use kustomize to update the image tag in the prod overlay
    cd apps/my-app/overlays/prod
    kustomize edit set image my-app=registry.example.com/my-app:${IMAGE_TAG}
    git config user.email "ci-bot@example.com"
    git config user.name "CI Bot"
    git add kustomization.yaml
    git commit -m "chore(cd): promote my-app to ${IMAGE_TAG}"
    git push origin main

This separation has several benefits: the CI system never needs cluster credentials; rollbacks only require a Git revert (no re-running CI); and the config repo provides a clear record of every version deployed to every environment, independent of the CI system that triggered it.
Alternatives to manual kustomize edit set image: yq for YAML patching, sed -i (fragile but simple), or Flux's built-in image automation which removes the config-repo-update step from CI entirely.
Progressive delivery is the practice of releasing changes incrementally to a subset of users or traffic rather than all-at-once. Common strategies include canary releases (route 5% of traffic to the new version, ramp up if metrics look healthy), blue-green (maintain two identical environments and switch traffic between them), and A/B testing. The goal is to catch problems on a small blast radius before full rollout.
GitOps stores the desired final state. Progressive delivery tools manage the transition path to get there. A Git commit declaring a new image tag triggers the start; a separate tool decides how fast and safely to promote it.
Argo Rollouts: A Kubernetes controller that replaces standard Deployment resources with a Rollout CRD. The Rollout spec includes a strategy block defining canary steps or blue-green configuration. Argo CD syncs the Rollout manifest from Git; Argo Rollouts then manages the actual pod progression, traffic shifting (via Istio, NGINX, ALB), and Analysis template queries (Prometheus metrics, Datadog, Kayenta). If metrics fail, Rollouts automatically aborts and rolls back.
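A minimal sketch of a Rollout with a canary strategy, assuming an illustrative app name, image, and an analysis template called success-rate:

```yaml
# Hypothetical canary Rollout: weights, durations, and names are examples
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:v2.1.0
  strategy:
    canary:
      steps:
        - setWeight: 5            # shift 5% of traffic to the canary
        - pause: {duration: 10m}
        - analysis:               # gate promotion on a metric query
            templates:
              - templateName: success-rate
        - setWeight: 50
        - pause: {duration: 10m}
```

A new image tag committed to Git changes spec.template; Argo Rollouts then walks the steps, aborting and rolling back if the analysis fails.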
Flagger: Works alongside Flux CD. You keep deploying regular Kubernetes Deployments via Flux; Flagger watches them and creates primary/canary Deployment pairs automatically. It uses Prometheus or Datadog metrics to gate each traffic increment and integrates with Istio, App Mesh, Linkerd, and Nginx for traffic management. Canary objects are fully managed by Flagger — you only maintain the primary Deployment in Git.
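By contrast, a Flagger setup keeps the Deployment untouched and adds a Canary CR next to it; this sketch assumes illustrative names and a Prometheus-backed success-rate metric:

```yaml
# Hypothetical Flagger Canary wrapping an existing Deployment
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: my-app
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # the Deployment managed in Git by Flux
  service:
    port: 80
  analysis:
    interval: 1m
    threshold: 5            # abort after 5 failed metric checks
    stepWeight: 10          # increase canary traffic 10% per interval
    maxWeight: 50
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99           # require >= 99% successful requests
        interval: 1m
```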
Neither tool conflicts with GitOps — the rollout strategy CR itself is stored in Git and managed by the GitOps operator. Git records the intent; Argo Rollouts or Flagger executes the progression safely.
Both Argo CD and Flux CD support Helm natively, but they integrate with it differently. The key GitOps principle is that Helm chart versions and values are declared in Git, not passed as CLI arguments — and the GitOps operator, not a human running helm upgrade, applies them.
Flux CD — HelmRelease CR: A HelmRepository CR registers a Helm chart repository or OCI registry. A HelmRelease CR declares which chart, which version, and which values to use. Flux's helm-controller renders the chart and applies the resulting manifests. Crucially, if someone runs helm upgrade directly, Flux detects the drift and reverts it on the next reconciliation cycle.
# Flux HelmRelease for nginx-ingress
apiVersion: helm.toolkit.fluxcd.io/v2beta2
kind: HelmRelease
metadata:
  name: nginx-ingress
  namespace: ingress-nginx
spec:
  interval: 5m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.9.x" # minor-version floating; patch updates auto-apply
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  values:
    controller:
      replicaCount: 2
      service:
        type: LoadBalancer
  valuesFrom:
    - kind: ConfigMap
      name: ingress-extra-values # additional values from a ConfigMap in Git

Argo CD — Application with Helm source: Set source.helm in the Application CR. You can specify valueFiles (paths to values files within the repo) or inline values overrides. Argo CD renders the chart and syncs the output manifests.
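A sketch of the Argo CD equivalent, assuming an illustrative repo URL and chart path:

```yaml
# Hypothetical Argo CD Application with a Helm source
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx-ingress
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: charts/ingress-nginx
    helm:
      valueFiles:
        - values-prod.yaml      # path relative to source.path
      values: |                 # inline overrides take precedence
        controller:
          replicaCount: 2
  destination:
    server: https://kubernetes.default.svc
    namespace: ingress-nginx
```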
Best practices: Always pin chart versions (avoid * or floating ranges in prod). Store values files alongside the HelmRelease in Git, not as inline YAML when values are large. Use valuesFrom in Flux to reference ConfigMaps or Secrets for values that change per environment, keeping the HelmRelease itself environment-agnostic.
Kustomize is a template-free Kubernetes manifest customisation tool built into kubectl and natively supported by both Argo CD and Flux. The overlay pattern keeps shared resource definitions in a base/ directory and accumulates environment-specific differences in per-environment overlays/ directories, avoiding duplication while keeping diffs small and reviewable.
A base kustomization.yaml simply lists the resources it manages:
# apps/my-app/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml

A production overlay then references the base and applies patches:
# apps/my-app/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
commonLabels:
  env: production
images:
  - name: my-app
    newName: registry.example.com/my-app
    newTag: v2.1.0
patches:
  # Strategic merge patch: increase replicas for prod
  - path: replica-patch.yaml
  # JSON 6902 patch: set a specific env var
  - patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/env/0/value
        value: "production"
    target:
      kind: Deployment
      name: my-app

Useful Kustomize fields in overlays: namePrefix / nameSuffix to namespace resource names per environment; commonLabels / commonAnnotations to tag all resources; configMapGenerator and secretGenerator to create environment-specific ConfigMaps or Secrets. Argo CD and Flux both run kustomize build on these directories before applying, so the raw YAML you store in Git is never the final manifest — patches are applied at sync time.
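A sketch of the generator fields in the same overlay, with illustrative keys and file names; Kustomize appends a content-hash suffix to generated names, so pods roll automatically when the data changes:

```yaml
# Hypothetical generators added to overlays/prod/kustomization.yaml
configMapGenerator:
  - name: my-app-config
    literals:
      - LOG_LEVEL=warn
      - FEATURE_FLAG=checkout-v2
secretGenerator:
  - name: my-app-tls
    files:
      - tls.crt        # encrypt or seal before committing in practice
      - tls.key
```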
Multi-cluster GitOps adds the dimension of which cluster to deploy to on top of the standard single-cluster model. The two dominant approaches are hub-spoke (a central GitOps control plane manages all clusters) and decentralised (each cluster runs its own GitOps operator, all watching the same or related Git repos).
Argo CD hub-spoke: A single Argo CD instance (the "hub") in a management cluster registers all target clusters using argocd cluster add. Applications or ApplicationSets declare their destination.server as the registered cluster API URL. The hub's Application Controller communicates with each target cluster's API server directly. An ApplicationSet with the cluster generator creates one Application per registered cluster automatically.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: platform-addons
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            tier: production
  template:
    metadata:
      name: "{{name}}-platform-addons"
    spec:
      project: platform
      source:
        repoURL: https://github.com/org/gitops-config.git
        targetRevision: main
        path: "clusters/{{metadata.labels.region}}/addons"
      destination:
        server: "{{server}}"
        namespace: platform-system

Flux decentralised: Each cluster runs its own Flux controllers, bootstrapped from a management cluster or a bootstrap script. A "management" cluster's Flux can push Flux configs to leaf clusters by managing their Flux Kustomization CRs via the Kubernetes API or GitOps itself.
Scaling considerations: Shard the Argo CD Application Controller by cluster to handle hundreds of clusters. Use AppProjects or Flux tenancy to enforce namespace-level isolation per team. Keep a separate Git directory structure per cluster (or cluster tier) to make blast-radius clear — a change to clusters/prod-us-east-1/ only affects that cluster.
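In the decentralised model, each leaf cluster's Flux watches only its own directory. A sketch of the per-cluster Kustomization CR, assuming the clusters/<name>/ directory layout described above:

```yaml
# Hypothetical Flux Kustomization run by the prod-us-east-1 cluster itself
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-addons
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: gitops-config
  path: ./clusters/prod-us-east-1/addons  # this cluster's own directory
  prune: true                             # remove resources deleted from Git
```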
Argo CD implements multi-tenancy through two complementary mechanisms: AppProjects (resource-level isolation) and RBAC policies (action-level access control).
AppProject is a CR that scopes what a group of Applications can do:
- sourceRepos: limits which Git repositories are allowed as Application sources for this project.
- destinations: limits which cluster/namespace combinations Applications in this project can deploy to.
- clusterResourceWhitelist / namespaceResourceBlacklist: controls which Kubernetes resource kinds are permitted.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-frontend
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/org/frontend-config.git
  destinations:
    - namespace: frontend-*
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota

RBAC is configured in the argocd-rbac-cm ConfigMap using Casbin policy syntax. Subjects (users, SSO groups, service accounts) are assigned roles that grant permissions on Argo CD resources.
# argocd-rbac-cm data.policy.csv
# p, <subject>, <resource>, <action>, <appproject>/<object>
p, role:frontend-dev, applications, get, team-frontend/*
p, role:frontend-dev, applications, sync, team-frontend/*
p, role:frontend-dev, applications, action/*, team-frontend/*
# g, <user or group>, <role>
g, engineering-frontend@example.com, role:frontend-dev

Built-in roles: role:readonly (view-only across all projects) and role:admin (full access). Custom roles can be scoped to a single AppProject. Combined, AppProjects and RBAC let you give a team full control over their own Applications without them being able to see or affect other teams' workloads.
Both patterns solve the same problem — managing many Argo CD Applications without manually applying each one — but they use different mechanisms and suit different scales.
App-of-apps: A parent Argo CD Application has its source.path pointing to a Git directory that contains child Application YAML manifests. When Argo CD syncs the parent, it applies those Application CRs into the argocd namespace. Each child Application then manages its own workload independently. The child Application YAMLs are static files you write by hand and commit to Git.
# Parent Application (the "root app")
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: argocd/apps # this folder holds child Application CRs
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true

ApplicationSet: A CR managed by the ApplicationSet controller. Instead of static Application files, you define a template and one or more generators. The controller creates, updates, and deletes Application CRs automatically as generators produce or remove parameter sets. No need to manually add/remove Application YAML files when a new service or cluster is onboarded.
When to use each:
- App-of-apps: small, relatively static set of applications; you want explicit per-Application YAML for full control and diff visibility; or you need per-Application customisations that generators can't express concisely.
- ApplicationSet: large number of applications; dynamic inputs (new directories in Git, new clusters registered); need to scale without manual YAML maintenance. The matrix and merge generators allow powerful combinatorial deployments.
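A sketch of the dynamic case using the git directory generator, with an illustrative repo URL and layout: every subdirectory of apps/ becomes its own Application automatically.

```yaml
# Hypothetical ApplicationSet: one Application per apps/* directory
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/org/gitops-config.git
        revision: main
        directories:
          - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}"      # e.g. apps/my-app -> my-app
    spec:
      project: default
      source:
        repoURL: https://github.com/org/gitops-config.git
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
```

Adding a new service is then just a new directory in Git; no Application YAML is written by hand.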
Both Crossplane and Cluster API extend the Kubernetes API with CRDs that represent infrastructure resources. Because the desired state is expressed as Kubernetes objects, a standard GitOps operator (Argo CD or Flux) can manage those objects from Git — giving you continuous reconciliation and drift detection for cloud infrastructure, not just application workloads.
Crossplane manages cloud resources (RDS, S3, GKE clusters, IAM roles) through provider plugins. You define a CompositeResourceDefinition (XRD) that creates a new API type, and a Composition that maps it to provider-specific resources. Developers then create a Composite Resource Claim (XRC) stored in the GitOps repo — Crossplane provisions the cloud resource and injects connection secrets into the cluster.
# Developer commits this Claim to the GitOps config repo
apiVersion: database.example.org/v1alpha1
kind: PostgreSQLInstance
metadata:
name: app-db
namespace: production
spec:
parameters:
storageGB: 20
version: "14"
region: eu-west-1
compositionRef:
name: postgresql-aws-rds
writeConnectionSecretToRef:
name: app-db-credentialsCluster API (CAPI) manages the lifecycle of Kubernetes clusters themselves as Kubernetes objects on a management cluster. Cluster, MachineDeployment, and infrastructure-provider-specific CRs (AWSManagedControlPlane, AzureManagedMachinePool, etc.) are stored in Git. A GitOps operator syncs them to the management cluster; CAPI provisions, upgrades, and scales workload clusters accordingly.
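A minimal CAPI sketch as it might sit in Git and be synced to the management cluster; the cluster name is illustrative and the provider-specific kinds and API versions vary by CAPA release:

```yaml
# Hypothetical Cluster CR referencing AWS-provider objects
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: prod-us-east-1
  namespace: capi-clusters
spec:
  controlPlaneRef:                  # managed EKS control plane (CAPA)
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: prod-us-east-1-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
    kind: AWSManagedCluster
    name: prod-us-east-1
```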
The combined pattern: Cluster API provisions workload clusters, Crossplane provisions cloud dependencies (databases, queues, buckets), and a second GitOps instance (Argo CD or Flux) on the new workload cluster deploys the application — all driven from a single Git repository hierarchy, each layer's CRs committed and reconciled continuously.
GitOps operators expose rich telemetry specifically for monitoring sync and reconciliation health. The two main surfaces are Prometheus metrics and the operators' own notification controllers.
Argo CD observability:
- Argo CD exposes Prometheus metrics on port 8082. Key metrics: argocd_app_info (labels include sync_status and health_status), argocd_app_sync_total{phase="Failed"}, and argocd_app_reconcile_duration_seconds.
- The Argo CD Notifications Controller sends alerts based on trigger conditions (SyncFailed, AppOutOfSync, AppDegraded) to Slack, PagerDuty, email, or any webhook — configured via the argocd-notifications-cm ConfigMap.
- argocd app list and argocd app get <name> for CLI-based status checks.
Flux observability:
- Each Flux controller exposes metrics. Key: gotk_reconcile_error_total (per controller, per object name), gotk_reconcile_duration_seconds, gotk_resource_info (with ready and suspended labels).
- flux get all and flux events --watch for real-time status.
- The Flux Notification Controller uses Provider and Alert CRs (stored in Git) to route events from any Flux resource type to any webhook or messaging service.
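A sketch of that Flux notification wiring; the channel and the name of the Secret holding the webhook URL are assumptions:

```yaml
# Hypothetical Provider + Alert pair committed to the config repo
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: deployments
  secretRef:
    name: slack-webhook-url   # Secret containing the webhook address
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: on-reconcile-failure
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: error        # only failed reconciliations
  eventSources:
    - kind: Kustomization
      name: "*"
    - kind: HelmRelease
      name: "*"
```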
Example Prometheus alert rule for Argo CD:
- alert: ArgoCDAppSyncFailed
  expr: increase(argocd_app_sync_total{phase="Failed"}[5m]) > 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Argo CD sync failed for app {{ $labels.name }}"
Database schema migrations are inherently imperative and ordered — they run once, in sequence, and must complete successfully before the application starts. This sits awkwardly with GitOps's declarative, continuously-reconciled model. Three patterns handle this well.
1. Argo CD PreSync hook with a Kubernetes Job: Annotate a migration Job with argocd.argoproj.io/hook: PreSync. Argo CD runs the Job before applying any other resources in the sync. If the Job fails, the sync aborts and the application is not updated. Set hook-delete-policy: BeforeHookCreation to clean up the previous Job before creating a new one.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: registry.example.com/my-app:v1.5.2
          command: ["./migrate", "--run"]
      restartPolicy: Never
2. Sync waves: If you don't want a PreSync hook, use argocd.argoproj.io/sync-wave: "1" on the migration Job and sync-wave: "2" on the Deployment. Argo CD waits for wave 1 to succeed before starting wave 2. The Job must complete (not just start) for the wave transition to occur.
3. Flyway / Liquibase as an init container: Embed migration logic directly in the application Pod as an init container. The init container runs flyway migrate or liquibase update against the database before the main container starts. Migration SQL files are packaged inside the application image. No separate Job needed — migrations run atomically with Pod startup. Downside: the application image must always include all historical migrations.
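A sketch of the init-container pattern; the image name, and the assumption that the Flyway binary and SQL files are baked into the application image, are illustrative:

```yaml
# Hypothetical Deployment with a Flyway init container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      initContainers:
        - name: flyway-migrate
          image: registry.example.com/my-app:v1.5.2  # SQL migrations inside
          command: ["flyway", "migrate"]
          envFrom:
            - secretRef:
                name: db-credentials  # FLYWAY_URL, FLYWAY_USER, FLYWAY_PASSWORD
      containers:
        - name: my-app
          image: registry.example.com/my-app:v1.5.2
```

If the init container fails, the Pod never becomes ready and the rollout stalls rather than serving traffic against an unmigrated schema.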
Policy enforcement in GitOps works on two levels: shift-left checks in the CI pipeline (before changes reach the cluster) and runtime admission controls in the cluster (enforced on every resource creation or update, including GitOps operator applies).
OPA/Gatekeeper: The gatekeeper-controller registers as a Kubernetes admission webhook. Policies are written in Rego and packaged as ConstraintTemplate CRDs, with Constraint CRs instantiating them with parameters. You store both in the GitOps config repo and sync them via Argo CD or Flux — policies are themselves managed as GitOps-first resources. In CI, use conftest to run the same Rego policies against manifests before they are committed, catching violations before merge.
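A sketch of the Gatekeeper pair, based on the widely used required-labels example; the template name and the enforced label are illustrative:

```yaml
# Hypothetical ConstraintTemplate: Rego that flags missing labels
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {l | input.review.object.metadata.labels[l]}
          required := {l | l := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("missing required labels: %v", [missing])
        }
---
# Constraint instantiating the template: every Namespace needs a team label
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]
```

Both CRs live in the config repo, so the same Rego can be exercised in CI with conftest before merge.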
Kyverno: Policy-as-code using Kubernetes-native YAML syntax — no Rego required. A ClusterPolicy CR can validate, mutate, or generate resources. Store ClusterPolicy CRs in Git; your GitOps operator applies them. Kyverno also supports a CLI (kyverno apply) for CI-phase pre-validation.
# Kyverno ClusterPolicy: every Pod must declare CPU and memory limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Resource limits (cpu and memory) are required for all containers."
        pattern:
          spec:
            containers:
              - name: "*"
                resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

The GitOps policy-as-code flow: write ClusterPolicy/ConstraintTemplate → commit to config repo → CI runs kyverno apply or conftest test to validate other manifests in the PR → GitOps operator syncs policies to cluster → admission webhook enforces on all future resource applies, including those from the operator itself. This means every deployment the operator makes is validated against the current policy set.
GitOps is powerful but not a universal solution. Being aware of its genuine limitations prevents teams from forcing it into contexts where it creates more friction than value.
Genuine limitations:
- Secret management adds complexity: Every team member must understand at least one secret management tool (SOPS, Sealed Secrets, ESO). There is no "just commit the secret" escape hatch. Onboarding engineers to this mental model takes real effort.
- Stateful workloads need careful design: Databases, message brokers, and anything with persistent state require ordered operations (migrations, backup/restore, leader election) that don't map naturally to declarative reconciliation. GitOps doesn't replace operational runbooks for stateful services.
- Slow emergency response path: When a production incident requires an immediate replica-count change, the fastest GitOps response is still: edit file → commit → push → wait for operator poll cycle. Teams must define "break-glass" procedures (e.g., temporarily pausing the operator's selfHeal to allow direct kubectl) without abandoning the GitOps principle entirely.
- Git history pollution: Automated image-tag updates generate many tiny commits. Over time the commit log becomes noisy and hard to scan. Mitigate with squashing, dedicated automation branches, or Flux's messageTemplate commit-message customisation.
- Learning curve: Teams unfamiliar with Kubernetes CRDs find GitOps reconciliation errors hard to debug — "why is my Kustomization not Ready?" is a different mental model from "why is my pipeline failing?"
Anti-patterns to avoid:
- Storing unencrypted Kubernetes Secrets in Git (even base64 is cleartext).
- Running kubectl apply directly on a GitOps-managed cluster — the operator will revert it, causing confusion and potential incidents.
- Using mutable image tags (:latest) — breaks the traceability between Git commit and deployed artefact.
- Keeping source code and deployment manifests in the same repository — CI rebuilds trigger on code changes and GitOps changes become entangled.
- Not pinning Helm chart versions — an upstream chart update can unexpectedly change a production deployment.
Migrating to GitOps is not a big-bang cutover — it works best as a phased process where the old pipeline and the GitOps operator run in parallel until confidence is established, then the old pipeline is decommissioned.
Step-by-step migration:
- Capture current state as declarative YAML: Export live Kubernetes resources with kubectl get all,configmap,secret,ingress -o yaml. Strip ephemeral fields (status, resourceVersion, uid, managed fields). This becomes your starting commit.
- Create the config repository: Organise manifests into an app folder structure with base + overlays (or Helm chart + values files). Commit the cleaned-up YAML. Add secrets using SOPS or Sealed Secrets before committing.
- Install the GitOps operator: Install Argo CD or Flux in the cluster. Keep the operator in dry-run or manual-sync mode initially.
- Connect operator to config repo: Create an Application or Kustomization CR pointing to the new config repo.
- Validate with dry-run: Confirm the operator produces manifests identical to what is currently running. Fix discrepancies in the config repo.
- Enable sync — manual first, then automated: Switch to manual sync. Run a few syncs from the UI or CLI. Observe. Then enable automated sync with prune: false initially (no deletions yet).
- Update CI to write to the config repo instead of kubectl:
# Replace: kubectl set image deployment/my-app my-app=${IMAGE}
# With: update the image tag in the config repo and push
- name: Promote image to config repo
  run: |
    cd gitops-config/apps/my-app/overlays/prod
    kustomize edit set image my-app=registry.example.com/my-app:${IMAGE_TAG}
    git add kustomization.yaml
    git commit -m "chore(cd): promote my-app ${IMAGE_TAG} to prod"
    git push origin main

- Revoke direct cluster write access: Remove kubeconfig from the CI system. Update RBAC to deny write access to any service account not owned by the GitOps operator.
- Enable prune: Once the team is comfortable, enable prune: true so deleted manifests are removed from the cluster.
Platform engineering is the discipline of building Internal Developer Platforms (IDPs) that give application teams self-service access to infrastructure and deployment capabilities through well-defined abstractions. GitOps is the delivery mechanism underneath the platform — it ensures that everything the IDP provisions or configures is continuously reconciled to its declared state, without platform engineers manually applying changes.
Key integration points:
- Golden path templates: Platform teams define standardised GitOps repository templates (via Backstage Software Templates, Cookiecutter, or custom scaffolding) that new services bootstrap from. The template pre-wires the CI pipeline, the Kustomize overlay structure, the Argo CD Application CR, and the required RBAC/AppProject — so a developer starts with a working GitOps setup from day one.
- Self-service via PR: Rather than filing a Jira ticket to get a new namespace or a database, a developer fills in a form in Backstage, which opens a PR to the config repo. An Argo CD ApplicationSet or Crossplane Composition is triggered by that PR merge, provisioning the requested resources automatically. The platform team approves the template once; individual team requests need no manual platform-team intervention.
- Crossplane + GitOps for infrastructure: Developers request cloud resources (databases, queues, storage) by committing Crossplane Claim CRs to the config repo. The GitOps operator applies them; Crossplane provisions the cloud resources. Connection secrets are injected via External Secrets Operator. The developer only interacts with Git and a Kubernetes-style API — never with cloud consoles.
- Policy-as-code as a platform service: Platform teams manage Kyverno ClusterPolicies and OPA/Gatekeeper ConstraintTemplates in Git. Every team's workload automatically inherits these guardrails because the GitOps operator applies the policies to all clusters. Compliance is built in, not bolted on.
- Observability configuration as GitOps: Prometheus, Grafana, Loki, and alerting rules are themselves deployed and managed via GitOps, ensuring every cluster has a consistent observability stack that can be audited and version-controlled like any other workload.
The net result: developers interact with Git and a friendly UI, not Kubernetes directly. Platform teams own the templates and policies, not the individual deployments. GitOps closes the loop between expressed intent and running reality for every layer of the platform.
