
ArgoCD App of Apps Pattern: Scale Multi-Cluster GitOps

Reviewed by Opsio Engineering Team
Johan Carlsson

Country Manager, Sweden

AI, DevOps, Security, and Cloud Solutioning. 12+ years leading enterprise cloud transformation across Scandinavia

Why the App of Apps Pattern Exists

As Kubernetes adoption matures inside an organisation, the number of workloads an operations team must manage grows non-linearly. A single ArgoCD Application manifest works well for one service. It breaks down operationally when you reach dozens of services spread across development, staging, and production clusters — especially when those clusters span multiple cloud providers or geographic regions.

The App of Apps pattern resolves this by treating ArgoCD Application objects themselves as managed resources inside Git. A single parent Application — the "root app" — points to a Git directory that contains the manifests of every child Application. ArgoCD reconciles the parent first, discovers the child manifests, and then independently reconciles each child. The result is a fully declarative, hierarchical deployment graph that can scale to hundreds of applications across any number of clusters without manual intervention.
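Concretely, a root Application is an ordinary ArgoCD manifest whose source path contains the child Application manifests. The sketch below is illustrative only: the repository URL, paths, and names are placeholders, not part of any standard.

```yaml
# Root ("app of apps") Application. ArgoCD reconciles this object first,
# discovers the child Application manifests under apps/prod/, and then
# reconciles each child independently.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/gitops.git   # placeholder
    targetRevision: main
    path: apps/prod          # directory of child Application manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd        # children are themselves created in the argocd namespace
  # No automated sync policy here: the root app is synced manually so a
  # single bad commit cannot cascade through every environment at once.
```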

This model underpins cluster bootstrapping, environment promotion, and multi-tenant platform engineering. Understanding its mechanics, limits, and operational requirements is a prerequisite for any team running GitOps at enterprise scale.

Core Architecture and Repository Structure

A well-structured App of Apps implementation relies on a deliberate separation of concerns inside the Git repository. A flat structure that works at ten applications becomes unmaintainable at one hundred. The following layout is representative of production deployments that handle multi-cluster, multi-environment topologies:

  • apps/ — contains the parent Application manifest (the root app) and a subdirectory per environment or cluster.
  • apps/dev/, apps/staging/, apps/prod/ — each contains child Application manifests pointing to the Helm charts or Kustomize overlays for that environment.
  • charts/ or manifests/ — the actual workload definitions, versioned independently of the application wiring.
  • clusters/ — cluster-level configuration, including RBAC, network policies, and namespace definitions.

The parent Application is typically set with automated sync disabled or with a manual promotion gate. This prevents a single bad commit from cascading through every environment simultaneously. Child applications can carry their own sync policies, allowing per-service autonomy while retaining central visibility through the ArgoCD UI and API.

Kustomize overlays are the most common templating layer at this level. A base Application manifest is patched per environment to set the correct target cluster, namespace, and Helm value overrides. Helm-of-Helm approaches (wrapping child Application manifests inside a Helm chart) are also valid and allow parameterised generation of child manifests, which is particularly useful when onboarding new services or tenants programmatically.
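As a sketch of the Kustomize approach, assuming a hypothetical payments-api child Application defined under apps/base/, a production overlay can patch the target cluster and pin the revision:

```yaml
# apps/prod/kustomization.yaml (paths, names, and values are illustrative).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base                       # un-parameterised child Application manifests
patches:
  - target:
      kind: Application
      name: payments-api          # hypothetical child Application
    patch: |-
      - op: replace
        path: /spec/destination/server
        value: https://prod-cluster.example.com:6443
      - op: replace
        path: /spec/source/targetRevision
        value: v2.3.1             # production pinned to a release tag
```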


App of Apps vs. ApplicationSet: Choosing the Right Tool

A frequent architectural question is when to use the App of Apps pattern versus ArgoCD's native ApplicationSet controller. The two are not mutually exclusive, and production environments often combine them. The table below summarises the key differences:

| Dimension | App of Apps | ApplicationSet |
| --- | --- | --- |
| Manifest authorship | Hand-authored or templated via Kustomize / Helm | Generated dynamically by ApplicationSet generators |
| Multi-cluster targeting | Manual per child Application | Native via Cluster or Matrix generator |
| Dynamic fleet discovery | Requires manual manifest addition | Automatic via Git directory or cluster list generators |
| Deletion behaviour | Cascade delete requires explicit configuration (ArgoCD 3.2+) | Controlled via preserveResourcesOnDeletion policy |
| Complexity ceiling | Scales well to ~200 apps with disciplined repo structure | Scales to 1,000+ apps; generator logic can become opaque |
| Audit trail | Full Git history of every child Application manifest | Generator template in Git; rendered manifests ephemeral |

For teams prioritising auditability and explicit change control — common requirements in ISO 27001-aligned environments — the App of Apps pattern offers a cleaner audit trail because every child Application manifest is a first-class Git object. ApplicationSet is preferable when the cluster fleet is large and dynamic, such as a managed Kubernetes-as-a-service platform serving multiple internal product teams.
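For comparison, here is a sketch of an ApplicationSet using the Git directory generator, which creates one Application per subdirectory it discovers. The repository URL and paths are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: prod-services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example-org/gitops.git   # placeholder
        revision: main
        directories:
          - path: apps/prod/*      # one Application per matched directory
  template:
    metadata:
      name: '{{path.basename}}'    # Application named after the directory
    spec:
      project: default
      source:
        repoURL: https://github.com/example-org/gitops.git
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
```

Note that the rendered Application objects never live in Git, which is exactly the audit-trail trade-off described above.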

Scaling to Multi-Cluster Environments

Scaling the App of Apps pattern beyond a single cluster introduces several operational considerations that are not immediately obvious from the basic pattern documentation.

Cluster Registration and Secret Management

ArgoCD manages remote clusters via Kubernetes secrets stored in the ArgoCD namespace. At scale, these cluster secrets must themselves be managed declaratively — typically via Terraform or External Secrets Operator pulling credentials from AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault. Manually rotating kubeconfig credentials across twenty clusters is error-prone and not acceptable in a production environment with a 99.9% uptime expectation.
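A remote cluster is registered by creating a Secret labelled argocd.argoproj.io/secret-type: cluster in the argocd namespace. A declarative sketch follows, with the credential fields left as placeholders to be injected by External Secrets Operator rather than committed to Git:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-eu-west-1                       # illustrative cluster name
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster  # marks this Secret as a cluster registration
type: Opaque
stringData:
  name: prod-eu-west-1
  server: https://prod-eu.example.com:6443   # placeholder API server URL
  config: |
    {
      "bearerToken": "<injected by External Secrets Operator>",
      "tlsClientConfig": { "caData": "<base64 CA bundle>" }
    }
```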

Sync Wave Ordering

When the root app syncs, ArgoCD deploys child applications in parallel by default. In a cluster bootstrap scenario, this causes problems: the cert-manager Application must be healthy before any application that depends on Certificate CRDs is synced. Sync waves — defined via the argocd.argoproj.io/sync-wave annotation — enforce ordering. A typical production wave sequence is: CRD installation → namespace provisioning → platform services (cert-manager, external-dns, Velero) → application workloads.
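The annotation takes an integer; lower waves sync first, and ArgoCD waits for each wave to become healthy before starting the next. A sketch of a platform-service child pinned to the platform wave (chart version and cluster URL are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cert-manager
  namespace: argocd
  annotations:
    # Wave sequence in this setup: -1 = CRDs, 0 = namespaces,
    # 1 = platform services, 2 = application workloads.
    argocd.argoproj.io/sync-wave: "1"
spec:
  project: platform
  source:
    repoURL: https://charts.jetstack.io
    chart: cert-manager
    targetRevision: v1.14.4        # illustrative version
    helm:
      values: |
        installCRDs: true
  destination:
    server: https://prod-cluster.example.com:6443
    namespace: cert-manager
```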

Health Checks and Custom Resource Definitions

ArgoCD's default health checks do not cover every CRD out of the box. Custom Lua health check scripts must be registered for operators such as Velero backup schedules, Prometheus rules, and Istio VirtualService objects. Without these, ArgoCD will report child applications as healthy when their backing CRDs are actually in a degraded state, which defeats the purpose of continuous reconciliation.
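Custom health checks are registered in the argocd-cm ConfigMap under resource.customizations.health.<group>_<kind>. A sketch for Velero Schedule objects, assuming a status.phase of "Enabled" indicates health:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.velero.io_Schedule: |
    -- Lua health check: report Healthy only once Velero enables the schedule.
    hs = {}
    if obj.status ~= nil and obj.status.phase == "Enabled" then
      hs.status = "Healthy"
      hs.message = "Backup schedule is enabled"
      return hs
    end
    hs.status = "Progressing"
    hs.message = "Waiting for Velero to report schedule status"
    return hs
```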

RBAC at Scale

In multi-tenant environments, different teams should have sync and override permissions only for their own child applications. ArgoCD's project-based RBAC model maps cleanly onto this requirement: each team's child applications are placed inside a dedicated ArgoCD AppProject that restricts source repositories, destination clusters, and permitted resource kinds. This prevents a misconfigured child application from deploying into the wrong namespace or cluster, which is a meaningful security control in environments subject to compliance review.
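A sketch of such an AppProject for a hypothetical payments team, restricting source repository, destination, and permitted resource kinds:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-payments
  namespace: argocd
spec:
  description: Payments team workloads
  sourceRepos:
    - https://github.com/example-org/payments-gitops.git   # placeholder
  destinations:
    - server: https://prod-cluster.example.com:6443        # only this cluster
      namespace: payments-*                                # only these namespaces
  clusterResourceWhitelist: []        # no cluster-scoped resources allowed
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota             # teams cannot alter their own quotas
```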

Common Pitfalls and How to Avoid Them

Teams that attempt to implement the App of Apps pattern without prior GitOps experience consistently encounter the same failure modes. The following are the most operationally significant:

  • Cascade deletion surprises. Deleting the root application in ArgoCD versions prior to 3.2 had inconsistent behaviour with respect to child applications and their managed resources. From ArgoCD 3.2 onwards, cascade deletion behaviour is explicit and consistent. Confirm your ArgoCD version before running any deletion operation in production, and always test deletion paths in a non-production cluster first.
  • Repository sprawl. Splitting application manifests across too many repositories increases the cognitive overhead of tracing a deployment failure. A mono-repo or a disciplined multi-repo strategy with clear ownership boundaries is preferable to ad hoc repository proliferation.
  • Ignoring drift detection. Auto-sync is not equivalent to drift prevention. Resources mutated directly via kubectl — bypassing Git — will be reverted by ArgoCD, which can be surprising to teams not fully committed to GitOps discipline. Establish clear policies about out-of-band changes and enforce them through 24/7 monitoring.
  • Insufficient resource quotas on the ArgoCD controller. At scale, the application controller's memory footprint grows with the number of managed resources. Organisations running 200+ child applications without tuning the controller's cache sharding and resource limits will encounter OOMKill events and reconciliation delays.
  • No promotion gate between environments. Allowing child applications in production to auto-sync from the main branch without a promotion workflow introduces unacceptable deployment risk. Use branch-based or tag-based promotion strategies, or integrate ArgoCD with a CI pipeline (GitHub Actions, GitLab CI, Tekton) to enforce approval gates before production sync.
  • Missing backup and disaster recovery. ArgoCD's own configuration — projects, applications, RBAC policies — is stateful. Back up the ArgoCD namespace using Velero on a schedule aligned with your RTO/RPO requirements. Without this, recovering from a corrupted ArgoCD installation at 2:00 AM requires reconstructing manifests from Git, which is slower than a restore.
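For the backup point above, a minimal Velero Schedule sketch covering the argocd namespace; the cron expression and retention are illustrative and should be set to match your RTO/RPO:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: argocd-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"        # nightly at 02:00
  template:
    includedNamespaces:
      - argocd                 # projects, applications, RBAC config
    ttl: 720h                  # retain 30 days of backups
```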

Observability and Security Integration

Operating the App of Apps pattern at enterprise scale requires observability beyond the ArgoCD UI. Key integrations include:

Metrics and alerting: ArgoCD exposes a Prometheus metrics endpoint. Critical alerts include argocd_app_sync_total for sync failure rates, argocd_app_info for health status degradations, and reconciliation queue depth. Route these alerts into PagerDuty or OpsGenie for 24/7 on-call coverage.
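As a sketch, the health and sync alerts above can be expressed as a PrometheusRule; thresholds and severities are illustrative:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoAppUnhealthy
          # argocd_app_info carries health_status and sync_status labels
          expr: argocd_app_info{health_status!~"Healthy|Progressing"} == 1
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: 'Application {{ $labels.name }} is {{ $labels.health_status }}'
        - alert: ArgoAppOutOfSync
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 30m
          labels:
            severity: warning
          annotations:
            summary: 'Application {{ $labels.name }} out of sync for 30 minutes'
```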

Audit logging: All ArgoCD API interactions — syncs, rollbacks, permission changes — should be shipped to a SIEM. In AWS environments, this typically means CloudWatch Logs forwarded to a security analytics platform. In Azure environments, Microsoft Sentinel is the natural destination. Audit log retention must meet the requirements of your compliance framework; for ISO 27001-aligned deployments, a minimum of one year is standard.

Policy enforcement: Integrate OPA Gatekeeper or Kyverno as admission controllers on every target cluster. This ensures that even if a child Application attempts to deploy a non-compliant manifest — missing resource limits, disallowed image registries, absent security contexts — the deployment is blocked at the Kubernetes API server level rather than relying solely on GitOps process controls.

Cloud-native security: In AWS multi-cluster deployments, enable Amazon GuardDuty with EKS Runtime Monitoring on every cluster managed by ArgoCD. Anomalous process execution inside a pod — potentially indicating a compromised workload that was deployed via a compromised GitOps pipeline — will surface as a GuardDuty finding before it can propagate laterally.

How Opsio Implements App of Apps for Enterprise Clients

Opsio operates from its headquarters in Karlstad, Sweden, and its delivery centre in Bangalore, India, serving mid-market and enterprise clients across Nordic and global markets. Our engineering team holds CKA and CKAD certifications and has delivered more than 3,000 infrastructure projects since 2022. The following is how we operationalise the App of Apps pattern for clients with real compliance and availability requirements.

As an AWS Advanced Tier Services Partner with AWS Migration Competency, a Microsoft Partner, and a Google Cloud Partner, Opsio designs multi-cluster ArgoCD architectures that are cloud-agnostic at the workload layer but leverage native services — GuardDuty, Azure Policy, GKE Config Connector — at the security and compliance layer. This avoids vendor lock-in for application teams while preserving access to the most effective cloud-native security controls.

Our Bangalore delivery centre holds ISO 27001 certification, which directly informs how we structure GitOps repositories: secrets never appear in Git, all cluster credentials are managed via External Secrets Operator with audit-logged access, and every production ArgoCD project is scoped with least-privilege RBAC. Our 24/7 NOC, staffed by more than 50 certified engineers, monitors ArgoCD sync health, reconciliation lag, and cluster-level security findings continuously — providing clients with a 99.9% uptime SLA on managed Kubernetes platforms.

For clients migrating from manual Helm releases or legacy CI/CD pipelines, Opsio delivers a structured App of Apps implementation that includes:

  • Repository structure design aligned with the client's existing branching strategy and team topology.
  • Sync wave sequencing for platform dependencies — cert-manager, external-dns, Velero, ingress controllers — before application workloads.
  • Terraform-managed cluster registration and credential rotation, eliminating manual kubeconfig management.
  • Custom Lua health check scripts for all non-standard CRDs in the client's environment.
  • Prometheus alerting and SIEM integration for audit log forwarding, designed to satisfy ISO 27001 control requirements.
  • Velero backup schedules for ArgoCD state with tested restore runbooks, ensuring recovery time objectives are met without heroic manual effort.

The App of Apps pattern is not complex in theory. It is demanding in practice — particularly when the cluster fleet grows, compliance requirements are non-negotiable, and the operations team cannot afford to let a reconciliation failure go undetected at 3:00 AM on a Saturday. That operational gap is precisely where Opsio's combination of certified engineering depth, cloud partner status, and round-the-clock NOC coverage delivers measurable value to enterprise GitOps programmes.

About the Author

Johan Carlsson

Country Manager, Sweden at Opsio

AI, DevOps, Security, and Cloud Solutioning. 12+ years leading enterprise cloud transformation across Scandinavia

Editorial standards: This article was written by a certified practitioner and peer-reviewed by our engineering team. We update content quarterly to ensure technical accuracy. Opsio maintains editorial independence — we recommend solutions based on technical merit, not commercial relationships.