Cloud Infrastructure

Multi-Cloud Infrastructure as Code with Terraform: Lessons Learned

April 20, 2025
14 min read
By Infrastructure Team

Best practices for managing infrastructure across AWS, GCP, and Azure using Terraform, including state management, modules, and CI/CD integration.

Introduction: Why Multi-Cloud with Terraform?

Managing infrastructure across AWS, Azure, and GCP manually is a recipe for disaster. Terraform enables consistent, repeatable infrastructure deployment across all clouds.

Why Terraform:

  • Single tool, multiple clouds
  • Infrastructure as Code (version controlled, reviewable)
  • State management and drift detection
  • Modular and reusable components
  • Plan before apply (no surprises)

Common multi-cloud drivers:

  1. Disaster Recovery: Primary in AWS, failover in Azure
  2. Vendor Diversification: Avoid single-vendor lock-in
  3. Cost Optimization: Use the cheapest region/service for each workload
  4. Regulatory Compliance: Data residency requirements
  5. Best-of-Breed: Use the best service from each cloud

Maturity levels:

  • Level 1: Manual state, no modules (1-2 engineers)
  • Level 2: Remote state, basic modules (5-10 engineers)
  • Level 3: Workspaces, CI/CD, governance (10-50 engineers)
  • Level 4: Platform team, custom modules, policy as code (50+ engineers)

Production infrastructure managed: 2,000+ resources across 3 clouds, 15+ regions.

State Management

Terraform state is critical: it tracks your infrastructure and enables collaboration.

Remote State Best Practices:

  • S3 for state storage (versioned, encrypted)
  • DynamoDB for state locking (prevents concurrent modifications)
  • Enable server-side encryption (SSE-S3 or KMS)
  • Encrypt sensitive values (passwords, keys) with SOPS
  • Separate state files per environment (dev/staging/prod)
  • Use workspaces or separate backends
  • Never share state across unrelated infrastructure
  • Enable S3 versioning (rollback capability)
  • Periodic state backups to separate location
  • Test state recovery process
  • Restrict state access (IAM policies)
  • Separate read/write permissions
  • Audit state access (CloudTrail)
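
When a single configuration serves several environments via workspaces, the built-in terraform.workspace value can key environment-specific settings. A minimal sketch (setting names illustrative):

Terraform
# Vary settings by workspace (dev/staging/prod)
locals {
  env = terraform.workspace

  # Scale resources up only in production
  instance_size = local.env == "prod" ? "large" : "small"
  backup_days   = local.env == "prod" ? 30 : 7
}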

Common State Issues:

Problem: State drift (Terraform state != actual infrastructure)
Solution: Run terraform plan -refresh-only regularly to surface drift (terraform refresh is deprecated)

Problem: State corruption
Solution: Use state locking, enable versioning, keep backups

Problem: Secrets in state
Solution: Use AWS Secrets Manager/Vault, not Terraform variables
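
One way to keep secrets out of Terraform variables is to read them at apply time from a secret store. A sketch (secret name and resource illustrative); note that data-source results are still persisted to state, so state encryption and access controls remain essential:

Terraform
# Read the secret at apply time instead of passing it as a variable
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password"  # illustrative name
}

resource "aws_db_instance" "app" {
  # ... engine, instance_class, etc.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# Caveat: the fetched value is still written to state, so encrypt
# the state backend and restrict who can read it.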

Terraform
# Backend configuration for multi-cloud state management

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/multi-cloud/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

    # Server-side encryption with KMS
    kms_key_id     = "arn:aws:kms:us-east-1:123456789:key/..."

    # Note: versioning is not a backend argument; enable it on the
    # state bucket itself (see aws_s3_bucket_versioning below)
  }
}

# State locking with DynamoDB
resource "aws_dynamodb_table" "terraform_lock" {
  name           = "terraform-state-lock"
  billing_mode   = "PAY_PER_REQUEST"
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = {
    Name        = "Terraform State Lock"
    Environment = "production"
  }
}

# State bucket with versioning and encryption
resource "aws_s3_bucket" "terraform_state" {
  bucket = "company-terraform-state"

  lifecycle {
    prevent_destroy = true  # Protect state bucket
  }

  tags = {
    Name        = "Terraform State"
    Environment = "production"
  }
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# KMS key for state encryption
resource "aws_kms_key" "terraform_state" {
  description         = "KMS key for Terraform state encryption"
  enable_key_rotation = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.terraform_state.arn
    }
  }
}

# Block public access
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# IAM policy for state access
resource "aws_iam_policy" "terraform_state_access" {
  name        = "TerraformStateAccess"
  description = "Policy for Terraform state operations"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket",
          "s3:GetObject",
          "s3:PutObject"
        ]
        Resource = [
          aws_s3_bucket.terraform_state.arn,
          "${aws_s3_bucket.terraform_state.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "dynamodb:GetItem",
          "dynamodb:PutItem",
          "dynamodb:DeleteItem"
        ]
        Resource = aws_dynamodb_table.terraform_lock.arn
      }
    ]
  })
}

Reusable Modules for Multi-Cloud

Modules are the key to maintainable multi-cloud infrastructure. Build once, deploy everywhere.

Module Structure Best Practices:

  • Define common interface across clouds
  • Hide cloud-specific details
  • Use consistent naming conventions
  • Validate inputs with variable validation blocks
  • Provide sensible defaults
  • Document all variables
  • Return consistent outputs (IDs, endpoints, etc.)
  • Include all necessary information for dependent modules
  • Use descriptive output names
  • Pin module versions in production
  • Use semantic versioning
  • Test upgrades in lower environments
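
Version pinning from the list above looks like this in practice (registry source and version constraint illustrative):

Terraform
# Pin to a major version; allow minor/patch updates within it
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  # module inputs...
}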

Module Organization:

modules/
├── compute/
│   ├── aws/          (AWS EC2 implementation)
│   ├── azure/        (Azure VM implementation)
│   ├── gcp/          (GCP Compute Engine implementation)
│   └── interface.tf  (common interface)
├── database/
│   ├── aws/          (RDS)
│   ├── azure/        (Azure SQL)
│   └── gcp/          (Cloud SQL)
└── networking/
    ├── aws/          (VPC)
    ├── azure/        (VNet)
    └── gcp/          (VPC)

Example: Multi-Cloud Compute Module

Terraform
# modules/compute/interface.tf
# Common interface for compute resources across clouds
# (shared into each provider directory, e.g. via symlink, so every
# implementation accepts the same variables)

variable "cloud_provider" {
  description = "Cloud provider (aws, azure, gcp)"
  type        = string

  validation {
    condition     = contains(["aws", "azure", "gcp"], var.cloud_provider)
    error_message = "Provider must be aws, azure, or gcp"
  }
}

variable "instance_size" {
  description = "Instance size (small, medium, large)"
  type        = string
  default     = "medium"

  validation {
    condition     = contains(["small", "medium", "large"], var.instance_size)
    error_message = "Size must be small, medium, or large"
  }
}

variable "environment" {
  description = "Environment (dev, staging, prod)"
  type        = string
}

# modules/compute/aws/main.tf
# AWS-specific implementation

# Latest Ubuntu 22.04 LTS AMI published by Canonical
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

locals {
  instance_types = {
    small  = "t3.medium"
    medium = "t3.large"
    large  = "t3.xlarge"
  }
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = local.instance_types[var.instance_size]

  tags = {
    Name        = "${var.environment}-app-server"
    Environment = var.environment
    ManagedBy   = "terraform"
  }

  root_block_device {
    volume_type = "gp3"
    volume_size = 50
    encrypted   = true
  }

  metadata_options {
    http_tokens = "required"  # IMDSv2
  }
}

output "instance_id" {
  value = aws_instance.app.id
}

output "public_ip" {
  value = aws_instance.app.public_ip
}

# modules/compute/azure/main.tf
# Azure-specific implementation

locals {
  vm_sizes = {
    small  = "Standard_B2s"
    medium = "Standard_D2s_v3"
    large  = "Standard_D4s_v3"
  }
}

resource "azurerm_linux_virtual_machine" "app" {
  name                = "${var.environment}-app-vm"
  resource_group_name = var.resource_group_name
  location            = var.location
  size                = local.vm_sizes[var.instance_size]

  # Required: NIC created elsewhere in the module (variable name illustrative)
  network_interface_ids = [var.network_interface_id]

  admin_username = "adminuser"

  admin_ssh_key {
    username   = "adminuser"
    public_key = var.ssh_public_key
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Premium_LRS"
    disk_size_gb         = 50
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-focal"  # Ubuntu 20.04 LTS
    sku       = "20_04-lts"
    version   = "latest"
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

output "instance_id" {
  value = azurerm_linux_virtual_machine.app.id
}

output "public_ip" {
  value = azurerm_linux_virtual_machine.app.public_ip_address
}

# Root module usage - environment/prod/main.tf
module "compute_aws" {
  source = "../../modules/compute/aws"

  cloud_provider = "aws"
  instance_size  = "large"
  environment    = "production"
}

module "compute_azure" {
  source = "../../modules/compute/azure"

  cloud_provider     = "azure"
  instance_size      = "large"
  environment        = "production"
  resource_group_name = azurerm_resource_group.prod.name
  location           = "eastus"
}
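
Because both provider modules return the same output names (instance_id, public_ip), the root module can consume them uniformly, for example:

Terraform
# Aggregate identically-named outputs from each cloud's module
output "app_public_ips" {
  value = {
    aws   = module.compute_aws.public_ip
    azure = module.compute_azure.public_ip
  }
}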

CI/CD Pipeline for Terraform

Automated Terraform workflows with GitHub Actions ensure consistent, safe deployments.

CI/CD Best Practices:

On every pull request:

  • terraform fmt check (code formatting)
  • terraform validate (syntax validation)
  • Security scan (Checkov, tfsec)
  • Cost estimation (Infracost)
  • terraform plan (preview changes)
  • Comment plan output on the PR

On merge to main:

  • Require PR approval (2+ reviewers)
  • Re-run terraform plan
  • Manual approval for apply
  • terraform apply on approval
  • Notify on Slack/Teams

Security and cost gates:

  • Checkov: policy-as-code validation
  • tfsec: security best practices
  • Prevent merges on critical findings
  • Infracost: estimate costs before apply
  • Alert on >20% cost increase
  • Require approval for >$1K/month changes

Production Pipeline Example:

YAML
# .github/workflows/terraform.yml
name: 'Terraform CI/CD'

on:
  pull_request:
    paths:
      - 'terraform/**'
      - '.github/workflows/terraform.yml'
  push:
    branches:
      - main
    paths:
      - 'terraform/**'

env:
  TF_VERSION: '1.6.0'
  AWS_REGION: 'us-east-1'

jobs:
  terraform-validate:
    name: 'Validate and Plan'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform/production

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_TERRAFORM_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Terraform Format Check
        id: fmt
        run: terraform fmt -check -recursive
        continue-on-error: true

      - name: Terraform Init
        id: init
        run: terraform init

      - name: Terraform Validate
        id: validate
        run: terraform validate -no-color

      - name: Run Checkov Security Scan
        id: checkov
        uses: bridgecrewio/checkov-action@master
        with:
          directory: terraform/production
          framework: terraform
          output_format: cli
          soft_fail: false  # Fail on security issues

      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          working_directory: terraform/production

      - name: Terraform Plan
        id: plan
        run: |
          terraform plan -no-color -out=tfplan
          terraform show -no-color tfplan > tfplan.txt
        continue-on-error: true

      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate Cost Estimate
        id: cost
        run: |
          terraform show -json tfplan > plan.json
          infracost breakdown --path plan.json --format json --out-file /tmp/cost.json
          infracost output --path /tmp/cost.json --format github-comment --out-file /tmp/cost_comment.md

      - name: Comment PR with Plan
        uses: actions/github-script@v6
        if: github.event_name == 'pull_request'
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('terraform/production/tfplan.txt', 'utf8');
            const cost = fs.readFileSync('/tmp/cost_comment.md', 'utf8');
            
            const output = `#### Terraform Format and Style 🖌\`${{ steps.fmt.outcome }}\`
            #### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`

            <details><summary>Show Plan</summary>

            \`\`\`terraform
            ${plan}
            \`\`\`

            </details>

            ${cost}

            *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Workflow: \`${{ github.workflow }}\`*`;

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

  terraform-apply:
    name: 'Apply Changes'
    runs-on: ubuntu-latest
    needs: terraform-validate
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment: production  # Requires manual approval in GitHub
    defaults:
      run:
        working-directory: terraform/production

    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: ${{ secrets.AWS_TERRAFORM_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        id: apply
        run: |
          set -o pipefail  # fail the step if apply fails, despite the pipe to tee
          terraform apply -auto-approve -no-color | tee apply.log
          echo "APPLY_OUTPUT<<EOF" >> $GITHUB_ENV
          cat apply.log >> $GITHUB_ENV
          echo "EOF" >> $GITHUB_ENV

      - name: Notify Slack on Success
        if: success()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "✅ Terraform apply succeeded in production",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Terraform Apply Successful*\n\nEnvironment: Production\nCommit: ${{ github.sha }}\nActor: @${{ github.actor }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

      - name: Notify Slack on Failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "❌ Terraform apply failed in production",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*Terraform Apply Failed*\n\nEnvironment: Production\nCommit: ${{ github.sha }}\nActor: @${{ github.actor }}\n\nCheck workflow: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Multi-Cloud Networking and Security

Consistent networking and security policies across AWS, Azure, and GCP.

Network Architecture Patterns:

Hub-and-spoke topology:

  • Central hub VPC/VNet for shared services
  • Spoke VPCs/VNets for applications
  • Transit Gateway (AWS) / Virtual WAN (Azure) / Network Connectivity Center (GCP)

Network zones:

  • Public zone: internet-facing resources
  • Private zone: application tier
  • Data zone: databases, sensitive data
  • Management zone: admin access, monitoring

Cross-cloud connectivity:

  • VPN tunnels for site-to-site
  • Direct Connect / ExpressRoute / Interconnect for dedicated connectivity
  • Cloud Router for BGP peering
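
On AWS, the hub-and-spoke pattern maps to a Transit Gateway with one attachment per spoke VPC. A sketch (variable shape illustrative):

Terraform
# Central hub: one Transit Gateway, one attachment per spoke VPC
resource "aws_ec2_transit_gateway" "hub" {
  description = "Hub for spoke VPC attachments"
}

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke" {
  for_each = var.spoke_vpcs  # map of { vpc_id, subnet_ids }, illustrative

  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = each.value.vpc_id
  subnet_ids         = each.value.subnet_ids
}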

Security Best Practices:

Network segmentation:

  • Separate subnets per tier (web, app, data)
  • Security groups / NSGs / firewall rules
  • Zero-trust architecture

Encryption:

  • TLS for data in transit
  • KMS / Key Vault / Cloud KMS for data at rest
  • Rotate keys automatically

Identity and access:

  • IAM roles (least privilege)
  • Service accounts (no long-lived credentials)
  • MFA enforcement for human access

Logging and audit:

  • Flow logs for all networks
  • CloudTrail / Activity Log / Cloud Audit Logs
  • SIEM integration (Splunk, Datadog, ELK)

Our production footprint:

  • 15 regions across AWS, Azure, GCP
  • 50+ VPCs/VNets globally
  • 99.99% uptime SLA
  • Sub-50ms latency between clouds
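
For the flow-log item above, enabling VPC flow logs on AWS looks roughly like this (names and the IAM role variable illustrative):

Terraform
# Send VPC flow logs to a CloudWatch log group
resource "aws_cloudwatch_log_group" "flow_logs" {
  name              = "/vpc/flow-logs"
  retention_in_days = 90
}

resource "aws_flow_log" "vpc" {
  vpc_id               = var.vpc_id
  traffic_type         = "ALL"
  log_destination_type = "cloud-watch-logs"
  log_destination      = aws_cloudwatch_log_group.flow_logs.arn
  iam_role_arn         = var.flow_logs_role_arn  # role allowing delivery to CloudWatch
}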
