Cloud Security During Rapid Scaling

April 27, 2020

Rapid scaling creates security risk. When you’re racing to add capacity, security reviews get skipped. When everyone is working remotely, attack surface expands. When new tools are adopted quickly, configurations get missed.

Here’s how to maintain security posture during rapid growth.

The Risk Landscape

Scaling Pressures

What happens during rapid scaling:

Normal pace:
  Change → Review → Test → Deploy → Monitor

Rapid scaling:
  Change → Deploy → (maybe review later)

Common shortcuts:

Expanded Attack Surface

Rapid growth expands exposure:

Before:
- 50 EC2 instances
- 5 S3 buckets
- 2 databases
- Centralized office network

After:
- 200 EC2 instances
- 20 S3 buckets
- 8 databases
- 500 home networks
- New SaaS tools

More resources = more potential misconfigurations.
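To put rough numbers on this, suppose (purely as an assumption) that each cloud resource independently has a 1% chance of being misconfigured. Counting only the EC2 instances, S3 buckets, and databases from the lists above, both the expected number of misconfigurations and the chance of at least one grow with the resource count:

```python
# Hypothetical per-resource misconfiguration rate; real rates vary widely.
MISCONFIG_RATE = 0.01

def expected_misconfigs(n_resources: int, rate: float = MISCONFIG_RATE) -> float:
    """Expected number of misconfigured resources: n * p."""
    return n_resources * rate

def p_at_least_one(n_resources: int, rate: float = MISCONFIG_RATE) -> float:
    """Chance that at least one resource is misconfigured: 1 - (1 - p)^n."""
    return 1 - (1 - rate) ** n_resources

before = 50 + 5 + 2      # EC2 instances + S3 buckets + databases
after = 200 + 20 + 8

for label, n in (("before", before), ("after", after)):
    print(f"{label}: {n} resources, "
          f"expected {expected_misconfigs(n):.2f}, "
          f"P(>=1) = {p_at_least_one(n):.0%}")
```

Under these assumptions the chance of at least one misconfiguration rises from roughly 44% to roughly 90%, and the model does not even count the home networks or new SaaS tools.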

Preventive Controls

IAM Boundaries

Limit the blast radius:

// Permissions boundary - applied to all roles
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2"]
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": [
        "iam:CreateUser",
        "iam:CreateAccessKey",
        "organizations:*",
        "account:*"
      ],
      "Resource": "*"
    }
  ]
}

Even if someone creates an overly permissive role, the boundary limits what it can do.
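A permissions boundary grants nothing on its own: setting aside resource-based policies and SCPs, a request succeeds only when both the role's identity policy and the boundary allow it, and an explicit Deny in either one wins. The toy evaluator below sketches that intersection. It is a simplification, not AWS's policy engine: it models only actions, wildcards, and a hypothetical `regions` field standing in for the `aws:RequestedRegion` condition.

```python
from fnmatch import fnmatchcase

def action_matches(patterns, action):
    # IAM action names are case-insensitive; supports "*" and "s3:*" wildcards
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def policy_allows(statements, action, region):
    """Simplified single-policy evaluation: explicit Deny wins, else any Allow."""
    allowed = False
    for stmt in statements:
        # "regions" is a stand-in for the aws:RequestedRegion condition
        regions = stmt.get("regions")
        if regions is not None and region not in regions:
            continue
        if action_matches(stmt["actions"], action):
            if stmt["effect"] == "Deny":
                return False
            allowed = True
    return allowed

def effective_allow(identity_policy, boundary, action, region):
    # The request must be allowed by BOTH the identity policy and the boundary
    return (policy_allows(identity_policy, action, region)
            and policy_allows(boundary, action, region))

# Mirrors the permissions boundary above
BOUNDARY = [
    {"effect": "Allow", "actions": ["*"], "regions": ["us-east-1", "us-west-2"]},
    {"effect": "Deny", "actions": ["iam:CreateUser", "iam:CreateAccessKey",
                                   "organizations:*", "account:*"]},
]
# An overly permissive role created in a hurry
ADMIN_ROLE = [{"effect": "Allow", "actions": ["*"]}]

print(effective_allow(ADMIN_ROLE, BOUNDARY, "s3:GetObject", "us-east-1"))
print(effective_allow(ADMIN_ROLE, BOUNDARY, "iam:CreateAccessKey", "us-east-1"))
print(effective_allow(ADMIN_ROLE, BOUNDARY, "ec2:RunInstances", "eu-west-1"))
```

The admin role can still read S3 in an approved region, but it cannot mint access keys or operate in `eu-west-1`, because the boundary caps it.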

Service Control Policies

Organization-level guardrails:

// SCP: Prevent disabling CloudTrail
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail"
      ],
      "Resource": "*"
    }
  ]
}

// SCP: Require encryption
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "s3:PutObject",
      "Resource": "*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
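The `Null` operator in the second SCP is easy to misread: `"s3:x-amz-server-side-encryption": "true"` matches when the key is absent from the request, so the Deny fires for any `PutObject` that omits the encryption header. The sketch below models only that one condition over a dict of request context keys; it is an illustration, not a full SCP evaluator.

```python
def null_condition(request_context: dict, key: str, expect_null: bool) -> bool:
    """IAM "Null" operator: "true" matches when the key is absent, "false" when present."""
    is_absent = key not in request_context
    return is_absent == expect_null

def scp_denies_put_object(request_context: dict) -> bool:
    # Mirrors the "Require encryption" SCP: deny s3:PutObject when the
    # s3:x-amz-server-side-encryption key is null (not supplied by the caller)
    return null_condition(request_context, "s3:x-amz-server-side-encryption", True)

print(scp_denies_put_object({}))                                              # header omitted
print(scp_denies_put_object({"s3:x-amz-server-side-encryption": "aws:kms"}))  # header set
```

One consequence: clients that rely on a bucket's default encryption instead of sending the header explicitly will have their uploads denied, so SDK and CLI callers need to set it.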

Network Isolation

Default deny, explicit allow:

# Terraform: Private by default
resource "aws_db_instance" "main" {
  publicly_accessible = false  # Always

  vpc_security_group_ids = [
    aws_security_group.database.id
  ]
}

resource "aws_security_group" "database" {
  vpc_id = aws_vpc.main.id

  # Only from application security group
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.application.id]
  }

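  # No egress rules. Security groups are stateful, so replies to allowed
  # inbound connections still flow; this only blocks connections the
  # database initiates. The explicit empty list also removes AWS's
  # default allow-all egress rule.
  egress = []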
}
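Security groups are default-deny: traffic is admitted only if some rule explicitly matches it. The sketch below models the database group's single ingress rule as data and checks candidate connections against it. The group names are hypothetical stand-ins for the Terraform resources, and the model ignores CIDR sources and return traffic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IngressRule:
    protocol: str
    from_port: int
    to_port: int
    source_group: str  # hypothetical name of the referenced security group

# Mirrors aws_security_group.database: PostgreSQL only from the app tier
DATABASE_INGRESS = [IngressRule("tcp", 5432, 5432, "application")]

def ingress_allowed(rules, source_group, protocol, port):
    """Default deny: allow only if an explicit rule matches source, protocol, and port."""
    return any(
        r.source_group == source_group
        and r.protocol == protocol
        and r.from_port <= port <= r.to_port
        for r in rules
    )

print(ingress_allowed(DATABASE_INGRESS, "application", "tcp", 5432))
print(ingress_allowed(DATABASE_INGRESS, "bastion", "tcp", 5432))
print(ingress_allowed(DATABASE_INGRESS, "application", "tcp", 22))
```

Only the first connection is allowed; anything not named by a rule, including SSH from the app tier, falls through to the implicit deny.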

Detection Controls

Config Rules

Continuous compliance checking:

# AWS Config rules for common issues
rules:
  - s3-bucket-public-read-prohibited
  - s3-bucket-public-write-prohibited
  - s3-bucket-ssl-requests-only
  - encrypted-volumes
  - rds-storage-encrypted
  - iam-password-policy
  - root-account-mfa-enabled
  - access-keys-rotated
  - cloudtrail-enabled
  - vpc-flow-logs-enabled
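One way to roll these out is the AWS Config `PutConfigRule` API. The sketch below maps each rule name to its managed-rule source identifier (note that `cloudtrail-enabled` uses `CLOUD_TRAIL_ENABLED`, so a mechanical upper-casing would be wrong) and takes the client as a parameter so it can be exercised without AWS. It assumes a configuration recorder is already running and that each rule's default parameters are acceptable; check each rule's documentation before relying on the defaults.

```python
# Managed-rule source identifiers for the rules listed above
MANAGED_RULES = {
    "s3-bucket-public-read-prohibited": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
    "s3-bucket-public-write-prohibited": "S3_BUCKET_PUBLIC_WRITE_PROHIBITED",
    "s3-bucket-ssl-requests-only": "S3_BUCKET_SSL_REQUESTS_ONLY",
    "encrypted-volumes": "ENCRYPTED_VOLUMES",
    "rds-storage-encrypted": "RDS_STORAGE_ENCRYPTED",
    "iam-password-policy": "IAM_PASSWORD_POLICY",
    "root-account-mfa-enabled": "ROOT_ACCOUNT_MFA_ENABLED",
    "access-keys-rotated": "ACCESS_KEYS_ROTATED",
    "cloudtrail-enabled": "CLOUD_TRAIL_ENABLED",
    "vpc-flow-logs-enabled": "VPC_FLOW_LOGS_ENABLED",
}

def build_rule(name: str) -> dict:
    """ConfigRule payload for an AWS-managed rule using its default parameters."""
    return {
        "ConfigRuleName": name,
        "Source": {"Owner": "AWS", "SourceIdentifier": MANAGED_RULES[name]},
    }

def deploy_rules(config_client) -> list:
    # config_client is expected to be boto3.client("config")
    deployed = []
    for name in MANAGED_RULES:
        config_client.put_config_rule(ConfigRule=build_rule(name))
        deployed.append(name)
    return deployed
```

Usage would be `deploy_rules(boto3.client("config"))`, run once per account and region you want covered.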

CloudTrail Analysis

Monitor for suspicious activity:

# CloudTrail alert patterns
suspicious_patterns = [
    # IAM escalation attempts
    {
        "eventName": ["CreateAccessKey", "AttachUserPolicy", "CreateLoginProfile"],
        "condition": "new user or role creating credentials"
    },
    # S3 exposure
    {
        "eventName": ["PutBucketPolicy", "PutBucketAcl"],
        "condition": "allows public access"
    },
    # Security group changes
    {
        "eventName": ["AuthorizeSecurityGroupIngress"],
        "condition": "0.0.0.0/0 on sensitive ports"
    },
    # Unusual regions
    {
        "awsRegion": "not in [us-east-1, us-west-2]",
        "condition": "any API call"
    }
]
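The patterns above are descriptive. As a rough sketch, several of them can be checked directly against CloudTrail records: world-open ingress on sensitive ports, calls outside the approved regions, and credential-creating IAM calls. The record layout used (`requestParameters.ipPermissions.items[].ipRanges.items[].cidrIp`) reflects how CloudTrail logs EC2 security group calls, but the sensitive-port list is an assumption, and credential events are flagged for review rather than filtered to "new" principals, which would need identity history not modeled here.

```python
APPROVED_REGIONS = {"us-east-1", "us-west-2"}
SENSITIVE_PORTS = {22, 3389, 5432}  # assumption: SSH, RDP, PostgreSQL
CREDENTIAL_EVENTS = {"CreateAccessKey", "AttachUserPolicy", "CreateLoginProfile"}

def world_open_sensitive(record: dict) -> bool:
    """AuthorizeSecurityGroupIngress that opens a sensitive port to 0.0.0.0/0."""
    if record.get("eventName") != "AuthorizeSecurityGroupIngress":
        return False
    params = record.get("requestParameters") or {}
    for perm in (params.get("ipPermissions") or {}).get("items", []):
        ranges = (perm.get("ipRanges") or {}).get("items", [])
        if "0.0.0.0/0" not in {r.get("cidrIp") for r in ranges}:
            continue
        if perm.get("ipProtocol") == "-1":
            return True  # all protocols and ports
        lo, hi = perm.get("fromPort", 0), perm.get("toPort", 65535)
        if any(lo <= port <= hi for port in SENSITIVE_PORTS):
            return True
    return False

def alerts_for(record: dict) -> list:
    """Return alert labels for a single CloudTrail record."""
    alerts = []
    if record.get("awsRegion") not in APPROVED_REGIONS:
        alerts.append("unusual-region")
    if record.get("eventName") in CREDENTIAL_EVENTS:
        alerts.append("credential-change")
    if world_open_sensitive(record):
        alerts.append("world-open-sensitive-port")
    return alerts
```

CloudTrail log files delivered to S3 contain a top-level `Records` array; each entry can be passed to `alerts_for`.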

GuardDuty

Enable threat detection:

resource "aws_guardduty_detector" "main" {
  enable = true

  datasources {
    s3_logs {
      enable = true
    }
    kubernetes {
      audit_logs {
        enable = true
      }
    }
  }
}

GuardDuty catches:

Response Capabilities

Automated Remediation

Fix issues before they’re exploited:

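# Lambda: Auto-remediate public S3 buckets
# Invoked by an EventBridge rule on CloudTrail S3 API events (e.g. PutBucketAcl)
# Note: only ACL grants are checked; bucket-policy exposure needs separate handling
import os
import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')

# ACL grantees that expose a bucket outside the account
PUBLIC_GRANTEES = {
    'http://acs.amazonaws.com/groups/global/AllUsers',
    'http://acs.amazonaws.com/groups/global/AuthenticatedUsers',  # any AWS account
}

def notify_security_team(bucket, message):
    # Assumes an SNS topic ARN in the SECURITY_TOPIC_ARN environment variable
    sns.publish(TopicArn=os.environ['SECURITY_TOPIC_ARN'],
                Subject='S3 auto-remediation',
                Message=f'{bucket}: {message}')

def lambda_handler(event, context):
    bucket = event['detail']['requestParameters']['bucketName']

    # Check the bucket ACL for public grants
    try:
        acl = s3.get_bucket_acl(Bucket=bucket)
        for grant in acl['Grants']:
            if grant['Grantee'].get('URI') in PUBLIC_GRANTEES:
                # Replace the ACL with the owner-only canned ACL
                s3.put_bucket_acl(Bucket=bucket, ACL='private')
                notify_security_team(bucket, 'Public bucket auto-remediated')
                return
    except Exception as e:
        notify_security_team(bucket, f'Remediation failed: {e}')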
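The handler reads `event['detail']['requestParameters']['bucketName']`, which assumes it is invoked by an EventBridge rule matching S3 API calls recorded by CloudTrail. A sketch of that wiring follows. The rule name is hypothetical, and it assumes a trail is already logging management events.

```python
import json

# EventBridge pattern for S3 bucket ACL changes recorded by CloudTrail
S3_ACL_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["CreateBucket", "PutBucketAcl"],
    },
}

def create_trigger(events_client, lambda_arn: str,
                   rule_name: str = "s3-public-bucket-remediation") -> None:
    # events_client is expected to be boto3.client("events");
    # the default rule_name is hypothetical
    events_client.put_rule(Name=rule_name,
                           EventPattern=json.dumps(S3_ACL_PATTERN),
                           State="ENABLED")
    events_client.put_targets(Rule=rule_name,
                              Targets=[{"Id": "remediate", "Arn": lambda_arn}])
```

The function also needs a resource-based permission allowing `events.amazonaws.com` to invoke it (for example through Lambda's `AddPermission` API), which is omitted here.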

Incident Runbooks

Pre-defined response procedures:

## Runbook: Compromised AWS Credentials

### Detection:
- GuardDuty alert for anomalous API calls
- CloudTrail showing activity from unusual location/IP
- AWS abuse notification

### Immediate actions (< 5 minutes):
1. Disable the compromised credentials

aws iam update-access-key --access-key-id AKIA… --status Inactive --user-name compromised-user

2. Block active sessions (an explicit deny also applies to temporary credentials obtained with this user's keys)

aws iam put-user-policy --user-name compromised-user --policy-name DenyAll --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'


### Investigation (< 30 minutes):
1. Query CloudTrail for all actions by credential
2. Identify created/modified resources
3. Check for persistence mechanisms (new users, roles, keys)

### Remediation:
1. Delete any malicious resources
2. Rotate affected credentials
3. Review and harden affected systems

### Post-incident:
1. Document timeline and impact
2. Update detection rules if needed
3. Conduct lessons learned
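The immediate actions and the persistence check can be scripted ahead of time so responders are not typing JSON under pressure. A minimal boto3 sketch, under these assumptions: the clients are passed in (`boto3.client("iam")` and `boto3.client("cloudtrail")`), CloudTrail `LookupEvents` covers the window you need (it returns only the last 90 days of management events, for the client's region, so run it per region), and the persistence event list is illustrative rather than complete. Actions taken with temporary credentials derived from the key carry a different access key ID, so any `AssumeRole` or `GetSessionToken` calls in the results deserve their own follow-up.

```python
import json

DENY_ALL = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
})

# Illustrative, not exhaustive: API calls that commonly establish persistence
PERSISTENCE_EVENTS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile",
    "CreateRole", "UpdateAssumeRolePolicy",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "PutRolePolicy",
}

def contain(iam, user_name: str, access_key_id: str) -> None:
    """Immediate actions: deactivate the key, then deny everything for the user."""
    iam.update_access_key(UserName=user_name, AccessKeyId=access_key_id,
                          Status="Inactive")
    iam.put_user_policy(UserName=user_name, PolicyName="DenyAll",
                        PolicyDocument=DENY_ALL)

def events_for_key(cloudtrail, access_key_id: str) -> list:
    """Investigation: every recorded management event made with the key."""
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(LookupAttributes=[
        {"AttributeKey": "AccessKeyId", "AttributeValue": access_key_id}])
    return [event for page in pages for event in page["Events"]]

def persistence_findings(events: list) -> list:
    """Investigation: (time, name) of events that suggest a persistence mechanism."""
    return [(e["EventTime"], e["EventName"])
            for e in events if e.get("EventName") in PERSISTENCE_EVENTS]
```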

Secure Defaults

Terraform Modules

Encode security into infrastructure templates:

# Secure RDS module
module "secure_rds" {
  source = "./modules/secure-rds"

  # Required parameters
  name            = "myapp-db"
  engine          = "postgres"
  instance_class  = "db.r5.large"

  # Security baked in
  # - private subnet only
  # - encryption at rest
  # - encryption in transit
  # - automated backups
  # - deletion protection
  # - audit logging
}

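# The module enforces:
resource "aws_db_instance" "this" {
  # ... user params ...

  # Security defaults - not configurable
  publicly_accessible       = false
  storage_encrypted         = true
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.name}-final"  # required when skip_final_snapshot = false
  backup_retention_period   = 7                    # automated backups; 7 days is illustrative

  # Encryption in transit: attach a parameter group that sets rds.force_ssl = 1

  # Logging
  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
}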

Policy as Code

Prevent insecure resources:

# OPA policy for Terraform
package terraform

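deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket"
    # An unset block may appear in plan JSON as an empty list rather than a
    # missing key; count() over object.get() with a [] default catches both
    count(object.get(resource.change.after, "server_side_encryption_configuration", [])) == 0
    msg := sprintf("S3 bucket %s must have encryption enabled", [resource.address])
}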

deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_db_instance"
    resource.change.after.publicly_accessible == true
    msg := sprintf("RDS instance %s must not be publicly accessible", [resource.address])
}

deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_security_group_rule"
    resource.change.after.type == "ingress"
    resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
    resource.change.after.from_port <= 22
    resource.change.after.to_port >= 22
    msg := sprintf("SSH must not be open to the world (%s)", [resource.address])
}
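Policies like these run against the JSON form of a plan (`terraform show -json plan.out`), where `resource_changes[].change.after` holds each resource's planned attributes. To make that input shape concrete, the same three checks are approximated in Python below; this is an illustration, not a replacement for running the Rego with OPA.

```python
import json
import sys

def violations(plan: dict) -> list:
    """Approximation of the three deny rules over `terraform show -json` output."""
    msgs = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if after is None:
            continue  # resource is being destroyed; no planned state to check
        addr, rtype = rc.get("address", "?"), rc.get("type")
        if rtype == "aws_s3_bucket" and not after.get("server_side_encryption_configuration"):
            msgs.append(f"S3 bucket {addr} must have encryption enabled")
        if rtype == "aws_db_instance" and after.get("publicly_accessible") is True:
            msgs.append(f"RDS instance {addr} must not be publicly accessible")
        if (rtype == "aws_security_group_rule"
                and after.get("type") == "ingress"
                and "0.0.0.0/0" in (after.get("cidr_blocks") or [])
                and after.get("from_port", 0) <= 22 <= after.get("to_port", 0)):
            msgs.append(f"SSH must not be open to the world ({addr})")
    return msgs

if __name__ == "__main__":
    # e.g. terraform show -json plan.out > plan.json; python check.py plan.json
    with open(sys.argv[1]) as f:
        problems = violations(json.load(f))
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

A non-zero exit code lets a CI job fail the pipeline before `terraform apply` runs.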

Remote Work Security

Endpoint Security

Home devices need protection:

endpoint_requirements:
  - Full disk encryption
  - EDR/antivirus installed and updated
  - Automatic screen lock (5 minutes)
  - Password manager required
  - No local admin rights (or admin use is logged)

access_controls:
  - VPN required for internal resources
  - MFA on all accounts
  - SSO for applications
  - Device certificate for sensitive systems

Zero Trust Principles

Don’t trust the network:

Traditional (perimeter):
  Inside network → trusted
  Outside network → untrusted

Zero trust:
  Every request → verify identity, device, context
  Never trust → always verify
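In this model each request is evaluated on identity, device, and context signals, and network location plays no part. The sketch below is hypothetical throughout: the `Request` fields are invented for illustration, with device posture borrowed from the endpoint requirements above. Real deployments delegate this decision to an identity provider or access proxy.

```python
from dataclasses import dataclass

@dataclass
class Request:
    # Hypothetical signals an access proxy might collect per request
    user_authenticated: bool
    mfa_passed: bool
    disk_encrypted: bool
    edr_running: bool
    has_device_cert: bool
    resource_sensitive: bool

def authorize(req: Request) -> tuple:
    """Check identity, then device, then context; network location is never consulted."""
    if not (req.user_authenticated and req.mfa_passed):
        return False, "identity: SSO login with MFA required"
    if not (req.disk_encrypted and req.edr_running):
        return False, "device: endpoint requirements not met"
    if req.resource_sensitive and not req.has_device_cert:
        return False, "context: device certificate required for sensitive systems"
    return True, "allowed"
```

The returned reason string makes denials explainable to users and auditable in logs, which matters when access decisions are no longer a simple "on the VPN or not".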

Implementation:

Checklist for Rapid Scaling

Before Scaling

During Scaling

After Scaling

Key Takeaways

Security during rapid growth requires automation and guardrails. You can’t review everything manually, but you can ensure bad patterns are prevented or detected quickly.