
Applying a GitOps approach to building infrastructure across the entire organization, including integrations with Kubernetes operators, public clouds, and external services.

  • Writer: Petr Stukalov
  • Feb 26
  • 23 min read

 

Basic CI/CD Pipeline with GitHub, GitHub Actions, Helm, and Deployment on AWS EKS

This scenario assumes you have a code repository on GitHub that contains:

  • The microservice’s source code (managed by developers).

  • A Helm chart, deployment configurations, and related tools (managed by the DevOps engineer).


Workflow Overview

  1. Developer Changes: When a developer makes changes to the code and pushes them (or opens a Pull Request) in the main repository, GitHub Actions automatically triggers a sequence of steps.

  2. Build and Test

    • GitHub Actions retrieves the code from the repository.

    • It builds the application (for example, into a Docker image).

    • It runs unit tests and checks code quality (linting).

    • If all tests pass, the code is considered ready for deployment.

  3. Image Publishing

    • If the build succeeds, the Docker image is pushed to a container registry (e.g., Amazon ECR or Docker Hub).

    • DevOps can configure details (image version, tags, etc.) in the Helm chart’s values.

  4. Helm Chart and Configuration Updates

    • The repository contains a Helm chart, which DevOps sets up for the specific service and environment.

    • Any changes to the Helm chart (like updating the image version, replicas, or resource limits) happen via a Pull Request. DevOps reviews the parameters, while developers can see how it affects their service.

  5. Deployment to AWS EKS

    • GitHub Actions connects to the AWS EKS cluster (using kubectl/Helm plugins).

    • It performs an install or upgrade of the Helm chart using a command like helm upgrade --install.

    • The application is deployed or updated to the new version, referencing the Docker image from the previous step.
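The five steps above can be sketched as a single GitHub Actions workflow. Everything below is illustrative: the registry variable, cluster name, chart path, and the assumption that the image contains a `make test` target are placeholders, not taken from a real project.

```yaml
# Sketch of .github/workflows/deploy.yml (names are placeholders)
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    env:
      IMAGE: 123456789012.dkr.ecr.us-east-1.amazonaws.com/myservice
      EKS_CLUSTER: my-dev-cluster
    steps:
      - uses: actions/checkout@v4

      # Step 2: build the image and run tests/linting
      - name: Build image
        run: docker build -t "$IMAGE:${{ github.sha }}" .
      - name: Run unit tests
        run: docker run --rm "$IMAGE:${{ github.sha }}" make test

      # Step 3: publish the image (registry credentials configured beforehand)
      - name: Push image
        run: docker push "$IMAGE:${{ github.sha }}"

      # Step 5: install/upgrade the Helm chart on EKS, overriding the tag
      - name: Deploy to EKS
        run: |
          aws eks update-kubeconfig --name "$EKS_CLUSTER"
          helm upgrade --install myservice ./chart \
            --set image.tag="${{ github.sha }}"
```

In practice the Helm values (step 4) would pin the default tag, and this workflow would only override it for the environment being deployed.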

Transparent and Conflict-Free Process

  • Because both the application code and the Helm chart (with all parameters) are versioned in Git, changes are tracked via commits and Pull Requests.

  • DevOps sees required infrastructure changes; developers maintain control of their application code. There is no “gap” between code and infrastructure—everything remains in sync.

Basic Quality Control

  • After the deployment, you can add basic availability checks (health checks) and set up logging and metrics.

  • If everything is running well (Pods in “Running” state and no errors in logs), the release is deemed successful.

In short, there’s little friction merging Helm chart changes with the application’s code: everything is in one repository, fully version-controlled. Pull Requests let both developers and DevOps specialists coordinate changes in code and in Helm configurations. This is a good setup for small projects.


 

More Complex CI/CD with Multiple Microservices, and the Problems That Arise at This Stage


What if we have many repositories and microservices? How can we be sure that changes in one repository are consistent with other repositories?

Different Repositories

Each microservice has its own code and its own Helm chart. For example, one microservice might require a specific set of Kubernetes resources (Ingress, StorageClass, ConfigMap, Secrets), while another needs a different set. If a new resource (like a new S3 bucket, an additional Security Group, an environment variable, etc.) is added or changed in one place, you have to account for it in several places at once.

Without centralized control, it’s easy to end up in a situation where one repository is updated with the new resources, but another one isn’t. This leads to discrepancies in what’s actually deployed in the cluster.



Three Environments (DEV, PREPROD, PROD)

Each environment has its own configurations and requirements. For example, DEV might have simplified settings (minimal resources), while PROD might have high resource limits and additional security policies.

You need to promote changes through all environments in sync: if you introduced a new variable/secret/security group setting in DEV, it must be properly transferred to PREPROD and PROD. There’s a risk that some services will switch to the new configuration in DEV while others remain on the old settings, or that something “gets forgotten” during the deployment to PROD.



Lack of a “Single Source of Truth”

When Helm charts and configurations for microservices are scattered across different repositories, there is no common place that clearly describes the current state of all services in all environments. A developer or a DevOps engineer might build a new Docker image, but someone else failed to include the necessary environment variables; or they updated a Terraform module but forgot to adjust the Helm chart.

As a result, the DevOps engineer (even if it’s just one person) ends up “juggling” multiple files and versions, tracking them in their head or in additional documentation. The slightest mistake leads to one environment running a service with new resources, while another still runs the old ones.



Complexity of Synchronizing Changes

Some microservices may depend on shared resources (for example, a shared S3 bucket, a shared Redis instance, an IAM policy, etc.). If you fix the settings for these shared resources in one repository, you need to make the corresponding changes in the others, too. It often happens that the DevOps engineer “forgot” or “didn’t have time” to synchronize all repositories, and as a result, conflicts arise during deployment: one service expects updated settings, while others do not.



Human Factor and the Increasing Risk of Errors

Even if there are only a few repositories and three environments, every update involves many small actions: changing an environment variable somewhere, rebuilding a chart in another place, adding a new AWS resource somewhere else. Without a centralized mechanism that shows the complete picture, it’s easy to overlook a detail. For example, a service might be deployed to DEV but never updated in PREPROD, while PROD might still be using an older Helm chart.

All this leads to unpredictable failures, time spent on manual checks, and last-minute “firefighting” patches.



Ultimately, with a more complex architecture (multiple microservices, three environments, scattered resources), the main problem is the absence of a centralized way to manage versions and configurations. You can’t tell at a glance which version of each service is currently deployed, or with what parameters. As a result, the DevOps engineer has to rely more on manual steps and remember too many details, which inevitably leads to errors and delays.


 

Introduction of a Central Repository as a Single Source of Truth

To simplify managing configurations and application versions across multiple environments, it is practical to use a central repository that serves as the single source of truth. In this repository, we store:

  1. A Helm chart – defining the structure and rules for deploying the application (or a set of applications).

  2. YAML files – containing parameters and configuration data required for templating (service versions, environment variables, references to additional resources, etc.).

As a result, all infrastructure logic and its related settings are consolidated in one place.


How It Works

  1. State Updates: When a new microservice version becomes available or configuration changes are needed (e.g., updating environment variables, adding resources), the relevant parameters are added to the YAML files in the central repository.

  2. Git Commit: All adjustments are made via commits or Pull Requests, ensuring transparency, allowing reviews, and enabling quick rollbacks if necessary.

  3. CI Pipeline Trigger: A CI process linked to the central repository watches for new commits. When they appear, it generates the final manifests based on the Helm chart and the updated YAML files.

  4. Deployment to the Target Environment: The generated manifests are automatically applied to the target environment (DEV, PREPROD, or PROD). Thus, any update reflected in the repository is promptly synchronized with the cluster.
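A state update (step 1) is often a one-line change to a values file in the central repository. The structure below is purely illustrative; service and key names are invented:

```yaml
# values/dev.yaml (illustrative structure of the central repository's state)
services:
  payments:
    image_tag: v1.4.2   # bumped from v1.4.1 by this commit
    replicas: 2
  orders:
    image_tag: v2.0.7
    replicas: 3
```

Committing this change (step 2) is what triggers the CI pipeline (step 3) to re-template the manifests and deploy them (step 4).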



Advantages

  • Single Source of Truth All information about microservices, image versions, and key parameters is stored in one location, increasing transparency and preventing synchronization issues among various teams or repositories.

  • Simplified Management Because all configurations are stored centrally, it is straightforward to locate and modify the required parameters. There is no need to change multiple repositories in parallel.

  • Automation and Error Reduction CI pipelines within the central repository handle manifest creation and deployment, reducing human error and manual operations.

  • Complete Change History Every update is tracked in Git, making it easy to see who made a change and when, and to revert to a previous stable version if necessary.

Drawbacks

  1. Push-Based Deployment With manual or CI-initiated deployment (e.g., using helm upgrade), it can be difficult to track which updates have been applied, especially if an error occurs during the process. There is no automated mechanism to continually verify that the cluster matches the declared configuration.

  2. Limited Helm Templating As infrastructures scale, Helm charts may become large and unwieldy, and working with Go templates can introduce complexity. This can make more advanced data structures or logic difficult to manage.

  3. No Built-In In-Cluster State Management In the current approach, configurations are pushed to the cluster. There is no built-in mechanism to continuously track and reconcile the actual state of the cluster with the repository in real time.

Challenges in Scaling to Multiple Teams


 As the number of services, environments, and contributors (developers, DevOps, QA, etc.) increases, coordinating changes can become more complex. This may lead to merge conflicts and communication issues if processes are not carefully managed.



 

Overall change management scheme and how changes are promoted through environments (ArgoCD + Jsonnet)

To move away from the "push" approach and the complexities of Helm, we are transitioning to ArgoCD and Jsonnet, where the cluster itself "pulls" the required configuration from the Git repository. This enables us to centrally manage changes and safely promote new versions across multiple environments.


We have a central repository with our desired state (template).

Blue arrows show the work of the CI system: when we build a new Docker image or update secrets, all changes are sent to the DEV branch and the DEV environment.

Red arrows show the movement of changes between environments. If we have an error in the template code, it will be found in the DEV environment. The compatibility of the template with the environment is tested when moving DEV->PREPROD.

Black arrows show that ArgoCD continuously reads changes from the branch matching each environment and continuously applies them.


  1. Central Repository and Three Branches: A central repository is created with branches for DEV, PREPROD, and PROD, each corresponding to its respective environment. Instead of Helm charts, we use Jsonnet, which collects all necessary data (microservice versions, environment settings, etc.) and generates the final Kubernetes manifests.

  2. "Control Center" (YAML Files): These are YAML files structured to address the most common needs of developers and the entire company: service versions, key parameters, flags for enabling/disabling specific options, etc. The core idea is that the "Control Center" provides a clear schema tailored to the developers' level of understanding. They don’t need to dive into the intricacies of the infrastructure or rely on DevOps for every issue; they can independently update versions or parameters through standard operations. All changes are initially applied to the DEV branch. If an error is introduced, it can be caught by checks in the DEV or PREPROD stages.

  3. File(s) with Docker Tags: A separate YAML file (or multiple files) is maintained to map each microservice "version" to its corresponding Docker tag. When the CI system (after building a microservice) generates a new tag (e.g., myservice:0.3.11), it commits the updated data to the DEV branch of the central repository.

  4.  ArgoCD and the "Pull" Approach

    • Deploy to DEV: ArgoCD, which monitors the DEV branch, detects changes in the "Control Center" and tag files, assembles the final manifests using Jsonnet, and applies them to the DEV cluster. This process occurs without manual "push" commands; ArgoCD automatically "pulls" updates from Git.

    • Promotion of Changes: After validation in DEV, changes are submitted via a Pull Request (or merged) into the PREPROD branch. ArgoCD, monitoring the PREPROD branch, picks up the updated configurations and deploys them to the PREPROD cluster. Similarly for PROD: after successful testing in PREPROD, the changes are moved to the PROD branch, and the PROD cluster is automatically updated.

  5. Other Data Sources: The final "state" of an environment may include additional components such as secrets, environment variables, etc. How these are integrated into the process (e.g., via SealedSecrets, External Secrets, or additional parameters) will be covered in subsequent sections.
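The "pull" side of this scheme is configured in ArgoCD with one Application per environment branch. The sketch below shows what the DEV Application might look like; the repo URL, path, and namespaces are placeholders, and the `directory.jsonnet` option assumes manifests are rendered with ArgoCD's built-in Jsonnet support:

```yaml
# Sketch of an ArgoCD Application tracking the DEV branch (names are placeholders)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/central-repo.git
    targetRevision: dev        # PREPROD/PROD Applications point at their own branches
    path: environments
    directory:
      jsonnet: {}              # render manifests with Jsonnet
  destination:
    server: https://kubernetes.default.svc
    namespace: apps
  syncPolicy:
    automated:
      prune: true              # delete resources removed from Git
      selfHeal: true           # revert manual drift in the cluster
```

With `automated` sync enabled, merging into a branch is the only action needed to update the corresponding cluster.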


Thus, we have outlined the fundamental principles of the GitOps approach, where all changes are stored in Git, and the ArgoCD system automatically aligns the clusters with the repository state. Further details and nuances will be explored in additional materials.


 

First Infrastructure Layer Using Terraform/Terragrunt

This layer is responsible for laying the foundation of our infrastructure. It not only describes the network topology and core resources (VPC, subnets, IAM roles) but also deploys the Kubernetes cluster along with key operators. All of this is necessary so that subsequent stages can build GitOps processes and other high-level mechanisms (CI/CD, automation, etc.). Changes to this layer are made very rarely—on average, just a few times a year.


Why This Layer Is Needed

Core Resources

  • Creation of the network (VPC, subnets, routing), IAM role policies, security settings, and so on.

  • Without this, the subsequent levels (CI/CD, GitOps) cannot function.

Kubernetes Cluster and Operators

  • Automatic installation of ArgoCD, a Load Balancer, Argo Workflows, and Karpenter, so that from the very start, you have a “live” platform for deploying applications.

  • This enables immediate access to autoscaling, GitOps pipelines, and other capabilities.

Consistency Across Environments

  • By using Terraform modules and Terragrunt, we can recreate DEV, PREPROD, and PROD environments “identically,” changing only key variables.

  • Updates to this layer are rare (a couple of times a year) because the fundamental infrastructure does not require frequent changes.



Terraform + Terragrunt: Brief Description and Advantages

  • Terraform is the primary tool for declaratively describing infrastructure, with a broad range of supported providers.

  • Terragrunt is a wrapper around Terraform that simplifies configurations, helps avoid copy-paste, and manages dependencies between modules.

Advantages of Terragrunt

  • DRY (Don’t Repeat Yourself): Encourages storing shared Terraform code in reusable modules and referencing them in minimal Terragrunt configs for different environments, reducing duplication.

  • Dependency Management: Automates the order in which resources should be deployed, ensuring prerequisites are in place before dependent components.

  • Centralized Configuration: Allows you to define environment-specific variables (e.g., region, instance types) in Terragrunt files while reusing the same module code across multiple environments.



Process of Obtaining Environment Parameters

During the deployment of the Kubernetes cluster and the associated resources (NAT gateways, IAM roles, OIDC configuration for IRSA, databases, etc.), Terraform/Terragrunt generates “output” values, such as:

  • NAT addresses (if outbound traffic goes through NAT),

  • Resource ARNs (IAM roles, security groups, etc.),

  • OIDC parameters (linking Kubernetes service accounts to IAM policies—IRSA),

  • Endpoints for databases and other services needed in the cluster.



Storing Parameters and Their Role in the Next Layer

To enable subsequent layers to use the results of the first layer, we store these parameters in a central repository. For each environment (DEV, PREPROD, PROD), there is a dedicated YAML file:

  • env-dev.yaml — parameters for DEV,

  • env-preprod.yaml — for PREPROD,

  • env-prod.yaml — for PROD.

These YAML files do not “travel” through the branches (DEV → PREPROD → PROD) during merges; instead, we write the data for each environment directly to the corresponding branch. This prevents merge conflicts and avoids mixing up different environments.

The second layer, which uses ArgoCD, relies on the data from these YAML files during templating. That is, when ArgoCD “pulls” the configuration, it needs NAT addresses, ARNs, database endpoints, etc., in order to deploy services or configure the environment correctly. Therefore, the parameters generated by Terraform/Terragrunt in the first layer become one of the key input sources for subsequent stages (CI/CD, GitOps).
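The shape of such an env file might look like the following; every value here is invented purely for illustration:

```yaml
# env-dev.yaml -- Terraform/Terragrunt outputs recorded for the DEV branch
nat_addresses:
  - 203.0.113.10
oidc_provider_arn: arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE
irsa_roles:
  external_secrets: arn:aws:iam::123456789012:role/dev-external-secrets
db_endpoints:
  main_postgres: dev-db.cluster-xyz.us-east-1.rds.amazonaws.com:5432
```

The second layer reads these keys during templating, so their names form a small contract between the Terraform outputs and the Jsonnet/Helm templates.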

This is the only case in which changes are written directly to branches other than DEV.


First layer summary

Thus, the first layer provides a consistent and reliable foundation: the network, a Kubernetes cluster with operators (ArgoCD, LB, Argo Workflows, Karpenter), and a set of key environment parameters stored in the central repository and required by the second layer (ArgoCD).


 

Second Infrastructure Layer Using ArgoCD

If the first layer builds the foundation (Kubernetes cluster, basic operators, network resources) and provides “output” parameters (NAT addresses, ARNs, endpoints), then the second layer is the most critical from the standpoint of day-to-day operations. In this layer, we use ArgoCD to:

  1. Pull data from multiple sources (the *.yaml files with environment parameters, CI build results, Docker image versions, application configurations, etc.).

  2. Template the final configuration (using Helm, Jsonnet, or other tools).

  3. Create/update all necessary resources in the cluster.

Changes in this layer can occur many times a day, because every update to an application or configuration must be committed to Git and then picked up by ArgoCD.


Main Tasks of the Second Layer

GitOps Automation

ArgoCD continuously monitors a branch (DEV, PREPROD, or PROD) in Git. When changes are introduced (via PR or direct commit), ArgoCD detects the new commit and synchronizes the cluster. Promoting changes from one environment to another relies on standard Git mechanisms (merge/PR), and ArgoCD automatically applies the new configurations.

Templating and Data Consolidation

Environment parameters (from env-*.yaml), image versions (from CI), application settings, and other data sources (e.g., SealedSecrets, JSON/YAML configs) are combined in templates (Helm, Jsonnet, etc.). As a result, ArgoCD generates the final Kubernetes manifests, which are then applied to the cluster.

Deploying Applications

This second layer contains specific manifest templates for each microservice and resource (Ingress, ConfigMap, Service, Deployment, etc.).



Advantages and Capabilities of ArgoCD

Real-Time Logs and Debugging: ArgoCD provides a convenient web interface where you can view real-time logs. This is useful for both DevOps and developers to understand what went wrong during a deployment. The interface shows the relationships between resources and allows you to restart pods or re-sync an application if needed.

Security: Secrets Are Not Displayed. ArgoCD does not reveal the contents of secrets, enabling DevOps and developers to safely use the interface without risking leakage of confidential information.

Uniformity and Transparency: Every merge/PR into the environment branch leaves a clear record of who changed what and when. ArgoCD automatically aligns the cluster with Git, eliminating situations in which a new version is "forgotten" or "skipped."



Synchronization Process

  1. Commit to Git: A developer or DevOps engineer modifies the configuration (for example, updates Helm values or Docker tags), and after review, these changes are merged into the corresponding branch (DEV, PREPROD, or PROD).

  2. ArgoCD Detects New Commits: ArgoCD periodically checks the repository. Upon discovering new changes, it initiates a synchronization.

  3. Deployment: ArgoCD aggregates all necessary data (from env-*.yaml, CI, secrets, etc.), applies templates (Helm/Jsonnet), forms the final manifests, and deploys them to the cluster. In case of errors, ArgoCD displays them in the interface, provides logs, and allows debugging.



The Ultimate Value of the Second Layer

  • Centralized management of all resources — from operators to business applications.

  • Clear traceability: every deployment is tied to a specific commit, making it easy to see who changed what.

  • High iteration speed: changes to code and configurations are introduced continuously, which is particularly important when updates occur multiple times a day.

Thus, the second layer provides GitOps automation, taking inputs from the first layer (plus information on image versions and other configurations) and generating/applying the final resources. It is here that the bulk of DevOps and developer operational work takes place, with frequent updates, log viewing, and resource management through ArgoCD’s convenient interface.


 

Infrastructure Layer “2.5” Using the TFC Operator

After we have created the Kubernetes cluster and basic operators in the “first layer,” and implemented a GitOps approach with ArgoCD in the “second layer,” any cloud or other external resources that a microservice requires are created in this layer. For example, a developer can describe the desired configuration in the “control center” YAML, and ArgoCD automatically generates a resource for the TFC Operator, which then launches the corresponding Terraform module. The result of the module’s work might be a ConfigMap or Secret in Kubernetes, used by microservices. In this way, “layer 2.5” makes it possible to dynamically manage external resources (AWS, GCP, Azure, etc.) directly from the cluster while preserving the benefits of Terraform.


The Essence and Purpose of Layer 2.5

  1. Managing External Resources via GitOps: Instead of running Terraform manually or through separate pipelines, we want changes in Git to automatically trigger a Terraform run on Terraform Cloud. The status and results of the Terraform apply are returned via a Secret created by the deployed module.

  2. Supplement to ArgoCD: ArgoCD continues to handle the synchronization of "native" Kubernetes resources and applications within the cluster. However, if we need a new S3 bucket or other external resource, we describe it in the form of a Custom Resource recognized by the TFC Operator. The operator launches the Terraform plan/apply in Terraform Cloud, and data about the resulting resources is reflected in the CR’s status.

  3. A “Half-Layer” Between the Foundational Infrastructure and Applications

    • The “first layer” (Terraform/Terragrunt) rarely changes and creates the fundamental infrastructure.

    • The “second layer” (ArgoCD) manages configuration and services inside the cluster many times a day.

    • “Layer 2.5” enables the dynamic creation or modification of external Terraform resources as needed (for example, when a new microservice needs an additional AWS resource).



How the TFC Operator Works

  1. Installation and CRD: The TFC Operator itself is deployed in the cluster, adding Custom Resource Definitions (CRDs). When such an object (CR) appears, the TFC Operator understands that Terraform code needs to be executed.

  2. Connection to Terraform Cloud: The operator knows about tokens, the workspace, and the settings in Terraform Cloud (or Terraform Enterprise). When we create a CR specifying "use this Terraform module with these variables," the operator sends the job to Terraform Cloud, monitors the plan/apply process, and updates the CR’s status.

  3. Synchronization with Git: ArgoCD sees the YAML files describing the CR for the TFC Operator. When we change them in Git (through a Pull Request), ArgoCD applies the changes in the cluster. The TFC Operator "reads" the new CR, launches the corresponding Terraform processes in the cloud, and the result is reflected back in the resource’s status or in a dedicated Secret or ConfigMap.



Advantages of This Approach

  1. Dynamic Resource Creation: There’s no need to touch the "first layer" again: if a project needs a new S3 bucket or SQS queue, we simply add a CR, and Terraform runs from inside the cluster. This is especially useful for short-lived (ephemeral) or experimental resources.

  2. Unified GitOps Process: All descriptions, including external Terraform resources, are stored in Git and go through ArgoCD. There is no need for a separate pipeline or manual terraform apply. The logic of "push commit → ArgoCD → CR → TFC Operator → Terraform Cloud → resource is ready" simplifies version control and rollbacks.

  3. Separation of Responsibilities

    • “Layer 1” remains stable (changes very rarely).

    • “Layer 2.5” allows DevOps engineers (or even developers themselves) to deliver new capabilities more quickly without touching the foundational infrastructure.



Constraints and Nuances

  • Dependency on Terraform Cloud: A Terraform Cloud (or Terraform Enterprise) account is required, which means there’s some "lock-in" to that service.



Layer “2.5” Conclusion

“Layer 2.5” with the TFC Operator brings the flexibility of Terraform into our GitOps environment:

  • ArgoCD still manages the YAML descriptions,

  • The TFC Operator takes those descriptions, triggers the plan/apply in Terraform Cloud, and returns the result to the cluster.

This provides a dynamic, decomposed solution where the foundational infrastructure (layer 1) is almost untouched, and new external resources can be created or changed “on the fly” through familiar Git processes.


 

Control Center and State Components

In most cases, DevOps and infrastructure specialists strive to minimize manual configuration changes. However, developers often modify microservice versions, resource parameters, and other settings. To avoid having them constantly turn to DevOps, we use a “Control Center”—YAML files specifically adapted to the needs and workflow of the company and its developers.

In some situations, developers do not need to change anything at all—if all parameters have already been specified in advance or if the deployment is fully automated. But when the need arises (changing a version, adjusting limits, enabling features, etc.), the Control Center provides a convenient way for them to make edits without delving deeply into the infrastructure.


What Is the “Control Center”?

The Control Center is one of the data sources used during templating (along with Terraform outputs, CI information, etc.). It describes key parameters such as:

  • Microservice/application versions: these might be Git revisions (tags, branches, commit hashes).

  • Environment parameters (URLs of external services, etc., if manual management is needed).

  • Flags and settings specific to the project (memory limits, replicas, feature toggles, etc.).



Example Structure

Below is a shortened example of a YAML file for dev, preprod, and prod environments. It contains versions, resources, and other settings. Instead of tag, you could, for example, use git_rev to specify a branch or commit hash:

# Example (shortened)
dev:
  elk_endpoint: https://search-example-dev-...
  service_common:
    git_rev: v1.0.86  # Could be a tag, branch, or commit
  service_rpps:
    git_rev: v1.0.150
    memory_min: 2Gi
    memory_max: 4Gi
    replicas: 1
  ...

preprod:
  elk_endpoint: https://search-example-preprod-...
  service_rpps:
    git_rev: v1.0.146
    memory_min: 2Gi
    memory_max: 4Gi
  ...

prod:
  elk_endpoint: https://search-example-prod-...
  service_rpps:
    git_rev: v1.0.146
    memory_min: 2Gi
    memory_max: 4Gi
  ...

Why It’s Needed

  • Developers can manage things themselves. There is no need to bother DevOps every time; simply modify the YAML file.

  • Integration with GitOps. Any edits in the Control Center are stored in Git. ArgoCD (or another operator) picks up the new values and applies them to the environment.

  • Flexibility and versatility. We can specify a Git tag, a branch, or a commit hash, as well as memory and replica settings. Each environment has its own block.



How Changes Are Made

All edits to the Control Center’s YAML file are first committed to the DEV branch, even if they affect the production section. Thus, changes go through the usual promotion process (merge or PR) and do not reach the real PROD environment until they are approved and merged into the corresponding branch. This ensures strict control and auditing: it’s easy to see who initiated the PROD update and when, and to roll back quickly if necessary.

Once changes are made:

  • ArgoCD reads the YAML file (for example, env.yaml), selects the environment section (DEV, PREPROD, PROD, etc.), and inserts the necessary parameters during templating (Helm, Jsonnet).

  • Developers can change the Git revision (tag, branch, commit hash), resource limits, or flags if they want to quickly customize the configuration without DevOps involvement.



Advantages

  • Low entry threshold: basic knowledge of YAML structure is enough; no deep DevOps skills are required.

  • Traceability and rollbacks: any commit shows who changed the configuration and when. If an error is discovered, it’s easy to revert the changes.

  • Scalability: as the project expands, you can add new blocks and environments without breaking the existing structure.



Control Center conclusion

The “Control Center” is a developer-friendly set of YAML configurations that allows them to manage Git revisions, flags, resources, and environment parameters. All changes are stored in Git and applied to the cluster automatically (via ArgoCD or other tools). As a result, DevOps can focus on more complex tasks, while deployments remain transparent, flexible, and secure.


 

Integrating CI Pipelines with the State

Each microservice has its own repository and its own CI process. When a build and tests are completed, the CI updates a special YAML file—such as tags.yaml—which lists microservices by name and the mappings between their revisions (Git tags, branches, commits) and Docker tags. This makes it easier to track the current versions ready for deployment.


Example of tags.yaml

Below is a shortened example of the contents of tags.yaml. For each microservice (for example, service1 and service2), different revisions (branches, tags, etc.) and their corresponding Docker images are specified:

service1:
  test: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service1:17
  1.0.1: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service1:18
  ...
  v1.0.86: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service1:59

service2:
  test: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service2:25
  1.0.2: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service2:25
  ...
  v1.0.52: 123456789012.dkr.ecr.us-east-1.amazonaws.com/service2:78

Thus, when the CI process builds a new version “v1.2.3” of a particular microservice, it can automatically add the mapping v1.2.3 → <docker_image> to tags.yaml and commit the changes to the DEV branch, creating a record of the change.



Typical Scenario

  1. Build and Tests: The CI (e.g., GitHub Actions, GitLab CI, Jenkins) retrieves the code, compiles it, and runs tests, ultimately producing a new version (say, v1.2.3).

  2. Modification: Using a script, the CI updates the YAML file, adding (or updating) the entry for the microservice to indicate that v1.2.3 corresponds to a specific Docker image.

  3. Commit to the DEV Branch: After editing tags.yaml, the CI pushes the changes directly to the DEV branch. A log is kept of who made the edit and when.

  4. GitOps (ArgoCD): ArgoCD “watches” the repository and notices the update. If the configuration specifies deploying the new revision (for instance, via the “Control Center”), ArgoCD applies the changes to the cluster, pulling in the fresh Docker image.
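The “Modification” step above can be sketched in a few lines of Python. This is a hypothetical CI helper (the file path, service name, and image URL are illustrative), assuming PyYAML is available in the CI image:

```python
# Hypothetical CI helper: record that a new revision of a microservice maps to
# a freshly built Docker image in tags.yaml. All names here are illustrative.
import yaml


def update_tag(path, service, revision, image):
    """Add or update the revision -> image mapping for one service."""
    with open(path) as f:
        tags = yaml.safe_load(f) or {}
    tags.setdefault(service, {})[revision] = image
    with open(path, "w") as f:
        yaml.safe_dump(tags, f, default_flow_style=False)
    return tags


# In CI this call would be followed by `git commit` and `git push` to DEV:
# update_tag("tags.yaml", "service1", "v1.2.3",
#            "123456789012.dkr.ecr.us-east-1.amazonaws.com/service1:60")
```

Because the helper only touches one service’s subsection, concurrent builds of different microservices do not overwrite each other’s entries.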



Advantages

  • Automation: No need to manually edit YAML files after each build.

  • Transparency: Every commit in Git shows why and when the version was updated.

  • Easy Rollbacks: Since all changes are logged in the repository, reverting to a previous version in case of an error is straightforward.



CI Pipelines conclusion

Integrating CI with the tags.yaml file streamlines the “code → build → version update → deploy” pipeline. Any new microservice version is automatically added to the YAML, and ArgoCD detects it and brings the environment up to date. As a result, releases happen faster and require fewer manual steps.
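The ArgoCD “watching” described above is configured with an Application resource pointing at the state repository. A minimal sketch (the repository URL, path, and namespaces are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: service1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/state.git   # hypothetical state repo
    targetRevision: DEV                             # environment branch
    path: services/service1
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With `automated` sync enabled, the commit made by CI to tags.yaml is enough to roll the new image out; no one runs kubectl or helm by hand.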


 

Integrating Secret Management (SealedSecrets and External Secrets) with the State

General GitOps Principles and Secrets

In a GitOps approach, we aim to store all configuration in Git so that the cluster automatically pulls changes from the repository (via ArgoCD). However, we cannot place sensitive data (passwords, keys, credentials) in plain text. To keep the convenience of GitOps while protecting confidential information, we use dedicated tools:

  • SealedSecrets: secrets are encrypted and stored in the repository as encrypted objects.

  • External Secrets: sensitive data remains in an external vault (Vault, AWS Secrets Manager, etc.), and only references are kept in Git.

Both approaches integrate well with the GitOps model, where the repository remains the “single source of truth” (our “State”), and ArgoCD automatically applies the changes.


SealedSecrets

SealedSecrets: Encrypt and Store in Git

How It Works

  • Controller in the Cluster: Install the SealedSecrets Controller, which decrypts SealedSecret objects and converts them into ordinary Secret objects in Kubernetes.

  • Private Repositories by Environment: For each environment (DEV, PREPROD, PROD), there is a private repository containing unencrypted YAML files with secrets. Access to these repositories is limited to the CTO and DevOps personnel (and, in DEV, possibly developers as well).

  • CI, Encryption, and the Central Repository: When a secret in a private repository changes, CI is triggered, finds the “raw” secrets, and encrypts them using the kubeseal utility. The resulting SealedSecrets are combined into a single secrets.yaml (with sections for DEV, PREPROD, PROD) and committed to the central repository.

  • ArgoCD: ArgoCD pulls the fresh secrets.yaml. The SealedSecrets Controller decrypts each record and creates the actual Secret resources. Applications access the secrets as usual.

Example Flow

  • A DevOps engineer or developer (in DEV) edits myapp-secret.yaml in a private repository.

  • CI encrypts the changes as a SealedSecret.

  • The resulting SealedSecret is added to secrets.yaml in the appropriate section (for example, prod) and committed to the DEV branch of the central repo.

  • ArgoCD notices the new commit and applies the changes.

  • The SealedSecrets Controller decrypts the data in the cluster and creates a Secret.

  • The application gets the updated confidential data.

Important: Even if this is a secret for PROD, the changes first go to the DEV branch, but in the prod section. After review and testing, they are merged into PREPROD and PROD.
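The object that ends up in the central repository looks roughly like this. A sketch only: the name, namespace, and ciphertext are illustrative, with the ciphertext produced by kubeseal from the raw Secret:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: myapp-secret
  namespace: prod
spec:
  encryptedData:
    # Ciphertext generated by: kubeseal --format yaml < myapp-secret.yaml
    password: AgBy8hC...
  template:
    metadata:
      name: myapp-secret
      namespace: prod
```

Only the controller in the target cluster holds the private key, so the ciphertext is safe to commit; it cannot be decrypted from the repository alone.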

Advantages and Disadvantages

  • Pros:

    • No external secrets service needed.

    • ArgoCD integrates easily with SealedSecrets.

    • Every secret change goes through Git.

  • Cons:

    • Secrets (even though encrypted) are still stored in Git.

    • Auditors sometimes require that no copies of secrets be kept, even in encrypted form.



External Secrets

External Secrets: Secrets Outside Git

Main Idea: Instead of storing encrypted secrets in the repository, we use an external service (Vault, AWS Secrets Manager, AWS SSM, Google Secret Manager, etc.). In Git, we store only the ExternalSecret object, which describes where to find the real data. When ArgoCD applies the manifest, the External Secrets operator retrieves the secrets from the vault and creates a normal Secret in the cluster.
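For illustration, an ExternalSecret that pulls a password from AWS Secrets Manager might look like this (the store name and key path are assumptions; a SecretStore or ClusterSecretStore with vault credentials is defined separately):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secret
spec:
  refreshInterval: 1h            # how often the operator re-reads the vault
  secretStoreRef:
    name: aws-secrets-manager    # hypothetical ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: myapp-secret           # the ordinary Secret created in the cluster
  data:
    - secretKey: password
      remoteRef:
        key: prod/myapp/password # path to the value in AWS Secrets Manager
```

Note that nothing sensitive appears in this manifest: only the reference to where the value lives, which is exactly what makes it safe to keep in Git.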

  • Pros:

    • Secrets never end up in Git (even in encrypted form).

    • It’s easier to comply with strict regulations (PCI DSS, SOC2, HIPAA, etc.).

  • Cons:

    • You need to deploy and maintain an external service.



“Control Center” and the State

In our project, the “State” is formed from various sources: Terraform outputs (network, IAM, databases, etc.), CI tags, the “Control Center” YAML files, and other data.

  • SealedSecrets: The “Control Center” or the template can include references to secrets in secrets.yaml.

  • External Secrets: In the “Control Center,” we specify a reference to the secret and its version. The template generates an ExternalSecret. ArgoCD applies it.



Promoting Changes

Promotion of changes occurs as usual: between environments, you can propagate either an encrypted secret or the updated path to the secret in the vault.



Rotation and Backups

  • SealedSecrets: Generate and commit new encrypted values manually.

  • External Secrets: Change the password in Vault/SSM, and the operator picks it up automatically.



Conclusion and Tips

  • SealedSecrets: Good for scenarios without strict requirements, where you don’t want to set up Vault.

  • External Secrets: Slightly more complex, but it completely avoids storing secrets in Git.

Both options adhere to the GitOps approach: all changes (configurations, links, or encrypted objects) go through Git, and ArgoCD automatically synchronizes the cluster.


 

Emergency Rollback Mechanism to a Previous Consistent State

In the context of GitOps, where all changes are stored and promoted through Git, the command git revert HEAD provides a simple and quick way to return to a previous stable state without disrupting the versioning logic. ArgoCD automatically pulls the rollback commit into the cluster and aligns all resources with this “reverse” commit, ensuring configuration consistency. This rollback preserves transparency (the entire history is visible in Git), avoids manual “patching,” and offers convenience: with literally one operation, we restore the system using the same GitOps processes as in a normal deployment.
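The whole mechanism can be reproduced as a sketch in a throwaway repository (file names and commit messages are illustrative; in the real flow the revert commit is pushed to the environment branch, from which ArgoCD syncs the cluster):

```shell
# Illustrative emergency rollback in a throwaway Git repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email ci@example.com
git config user.name ci

echo "image: service1:59" > values.yaml
git add values.yaml
git commit -qm "deploy v1.0.86"

echo "image: service1:60" > values.yaml
git commit -qam "deploy v1.0.87 (broken)"

# One command creates a NEW commit that undoes the last one, so history
# stays intact and ArgoCD syncs the cluster back to the previous state:
git revert --no-edit HEAD

cat values.yaml
```

For rolling back several commits at once, `git revert --no-commit <oldest-bad>..HEAD` followed by a single commit produces one rollback commit covering the whole range.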


 

Final summary

  1. Single Source of Truth (GitOps): Everything is stored in Git: code versions, infrastructure manifests, and “Control Center” data (YAML parameters, service versions, flags). This ensures complete transparency: each commit changes the configuration, making it easy to track and revert.

  2. Developer Convenience: With the “Control Center,” developers can change microservice versions and key parameters without diving into intricate infrastructure details. ArgoCD also provides a web interface showing which applications are deployed, how they are connected, and their status.

  3. Automation and Consistency with ArgoCD: Any change in Git (whether Helm/Jsonnet updates, SealedSecrets, or External Secrets) is automatically “pulled” and applied to the cluster by ArgoCD. This simplifies deploying new versions and minimizes manual interventions.

  4. Scalability and Structure: The approach scales easily across multiple environments (DEV, PREPROD, PROD). By strictly separating branches in Git and using a unified management scheme (CI artifacts, the “Control Center,” Helm/Jsonnet), consistency is maintained even as the number of services grows.

  5. Dynamic Resource Management via TFC Operator: If you need an S3 bucket or any other cloud integration, everything is described as a CR (Custom Resource), and Terraform Cloud creates/updates the resource. This lets you manage external resources directly in the GitOps workflow without a manual terraform apply.

  6. Fast Recovery and Rollback: If issues arise in production, you can revert to a previous stable commit in Git. ArgoCD automatically restores the cluster to the previous configuration. Everything is transparent, with no on-the-fly “patches.”

  7. Change Promotion via Pull Requests: Before each step (DEV → PREPROD → PROD), all changes go through a Pull Request and code review. This ensures configurations are verified and approved before reaching the next environment, minimizing the risk of accidental errors and providing collective responsibility for the quality of changes.


Security and Protection Against “Fumbling Hands”

Since any updates can only be made through Git, and environment branches are protected (requiring review, approvals, etc.), the chance of accidentally making a fatal direct change in production is practically zero. Additionally, using SealedSecrets/External Secrets prevents leaks of confidential data, and properly managed roles and access rights in Git and ArgoCD protect the system from unauthorized actions. Configurations are checked at least twice (DEV and PREPROD), and all variables and settings are stored in code, ensuring consistency and minimizing the likelihood of conflicts or errors during change promotion.

Conclusion

A GitOps approach using ArgoCD, a “Control Center,” and the TFC Operator provides a transparent, manageable deployment scheme, allowing developers and DevOps to make changes in a single pipeline (through Git). Full automation, easy rollbacks, and the ability to dynamically manage external resources are significant advantages, especially in large-scale projects.


 
 
 
