Managing your AWS Sandbox Account Budget with Amazon EventBridge

A serverless way to automatically clean up your AWS Sandbox Account and reduce the likelihood of bill-shock from forgotten experiments

Posted by Steven Tan on 30th March 2023

A person seeing his AWS bill and spitting out coffee

I'm sure everyone working with AWS has a similar story to tell. You spin up a few resources, go away for lunch or for the weekend and then completely forget about them. The only reminder of your forgotten experiment is the bill you receive in the first few days of the following month. What was supposed to be a small couple-dollar exercise has now turned into a double-digit or triple-digit bill. This story is especially common in sandbox environments, where users are encouraged to experiment with AWS services.

Because a sandbox account is a non-production environment designed for experimentation, testing, and development, efficiently managing resources and cost is critical to ensuring that the sandbox budget isn't going to waste. One generally recommended solution is to monitor your usage via AWS Cost Anomaly Detection or a billing alarm using Amazon CloudWatch. I try to avoid relying solely on these, as I consider them to be my last line of defence against forgotten resources.

Once my bill has reached the stage where my alarm is triggered, I consider that too late. So what can I do to introduce another layer of defence to protect against unexpected AWS bills?

Automate Your Resource Deletion

By using the following approach, I can automate the management of my experiments to ensure that they are short-lived and reproducible:

  1. Maintain AWS Resources via infrastructure as code (AWS CDK is my preferred option)
  2. Configure an Amazon EventBridge rule to monitor resource creation and deletion
  3. Configure an Amazon EventBridge Scheduler to automatically clean up these resources after 2 days

With AWS CDK and CloudFormation, I can quickly create, update and delete AWS resources using a single AWS service. If I need to come back to the experiment later, I can get it up and running again within a matter of minutes.

Building the Automation

AWS Architecture Diagram

As seen in the architecture diagram above, the solution works as follows:

The Amazon EventBridge rule invokes an AWS Lambda function whenever a CloudFormation stack changes its state. This Lambda function creates an EventBridge Scheduler schedule to delete the newly created CloudFormation stack at a specified time. When that time arrives, the schedule invokes a second Lambda function that deletes the CloudFormation stack, automatically cleaning up after the engineer.

First, create an AWS IAM role (referenced later as cfn-stack-deletion-scheduler-role) that allows EventBridge Scheduler to invoke the Lambda function that deletes stacks (which I will later name cfn-stack-deleter). Its permissions policy should look similar to this. When adapting the policies in this post, please remember to change the region (ap-southeast-2 here) and account ID 0123456789 to match your environment:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:lambda:ap-southeast-2:0123456789:function:cfn-stack-deleter"
        }
    ]
}
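
The policy above only covers what the role may do. For EventBridge Scheduler to use the role at all, the role also needs a trust relationship with the scheduler.amazonaws.com service principal. Here is a minimal sketch of that trust policy, built in Python so it can be printed and pasted into the IAM console. The helper name scheduler_trust_policy is my own, and the aws:SourceAccount condition is an optional hardening step rather than something the setup in this post depends on:

```python
import json


def scheduler_trust_policy(account_id=None):
    """Build a trust policy that lets EventBridge Scheduler assume the role.

    account_id is optional; when given, the role can only be assumed on
    behalf of schedules in that account (a confused-deputy guard).
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"Service": "scheduler.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }
    if account_id:
        statement["Condition"] = {
            "StringEquals": {"aws:SourceAccount": account_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 0123456789 is the placeholder account ID used throughout this post.
    print(json.dumps(scheduler_trust_policy("0123456789"), indent=4))
```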

We will need to reference the AWS IAM role ARN so you will want to keep a copy of it in a text document temporarily.

We will then create the AWS Lambda function responsible for deleting the AWS CloudFormation stack. I will call it cfn-stack-deleter.

Creating a Lambda Function

The AWS Lambda function will also come with a default execution role. Modify this role to add permissions to delete AWS CloudFormation stacks.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:0123456789:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:us-east-1:0123456789:log-group:/aws/lambda/cfn-stack-deleter:*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:DeleteStack"
            ],
            "Resource": "*"
        }
    ]
}

Replace the hello-world code with this:

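import boto3

cfn_client = boto3.client("cloudformation")

def lambda_handler(event, context):
    # The schedule passes the stack ARN as {"stack_arn": "..."}
    stack_arn = event["stack_arn"]
    # delete_stack accepts either a stack name or the unique stack ID (ARN)
    cfn_client.delete_stack(
        StackName=stack_arn
    )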

This Lambda function receives the stack_arn from the EventBridge schedule and deletes the stack. You may run into permission issues depending on whether a role was used to create the CloudFormation stack (CDK uses one by default). In that case, you can modify the delete_stack call to pass a role via the RoleARN parameter. This can be as simple as creating a role with a trust relationship for CloudFormation and the "AdministratorAccess" AWS managed policy attached.
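
As a rough sketch of the RoleARN variation, the handler could read the role from an environment variable and only pass it when it is set. The variable name DELETION_ROLE_ARN and the helper build_delete_kwargs are assumptions of mine, not part of the original function; the Lambda execution role would also need iam:PassRole on whichever role you choose:

```python
import os


def build_delete_kwargs(event, env=os.environ):
    """Assemble the keyword arguments for CloudFormation's delete_stack call."""
    kwargs = {"StackName": event["stack_arn"]}
    # DELETION_ROLE_ARN is a hypothetical variable name; leave it unset to
    # delete the stack with the Lambda function's own execution role.
    role_arn = env.get("DELETION_ROLE_ARN")
    if role_arn:
        kwargs["RoleARN"] = role_arn
    return kwargs


def lambda_handler(event, context):
    # Imported here so build_delete_kwargs can be exercised without AWS access.
    import boto3

    boto3.client("cloudformation").delete_stack(**build_delete_kwargs(event))
```

Keeping the argument-building separate from the AWS call makes it easy to check locally what the function would send before deploying it.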

Next, we will create another Lambda function called cfn-stack-deletion-scheduler with the following IAM permissions and code. The EventBridge rule will invoke it whenever an AWS CloudFormation stack is created.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole",
                "scheduler:CreateSchedule",
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:us-east-1:0123456789:*",
                "arn:aws:iam::0123456789:role/cfn-stack-deletion-scheduler-role",
                "arn:aws:scheduler:*:0123456789:schedule/*/cfn-deletion-*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:us-east-1:0123456789:log-group:/aws/lambda/cfn-stack-deletion-scheduler:*"
        }
    ]
}
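
from datetime import datetime, timezone, timedelta
import json

import boto3

scheduler_client = boto3.client("scheduler")

def lambda_handler(event, context):
    # Stack ARN: arn:aws:cloudformation:<region>:<account>:stack/<name>/<uuid>
    stack_id = event['detail']['stack-id']

    flex_window = { "Mode": "OFF" }
    lambda_target = {
        "RoleArn": "<IAM_ROLE_ARN>",
        "Arn": "<DELETER_LAMBDA_ARN>",
        # Input must be a JSON string; it becomes the deleter's event payload
        "Input": json.dumps({"stack_arn": stack_id})
    }

    # One-time schedule, 48 hours from now (UTC)
    dt = datetime.now(timezone.utc) + timedelta(hours=48)

    scheduler_client.create_schedule(
        # The last ARN segment is the stack's UUID. The "cfn-deletion-" prefix
        # matches the schedule resource pattern in the IAM policy above.
        Name=f"cfn-deletion-{stack_id.split('/')[-1]}",
        ScheduleExpression=f"at({dt.strftime('%Y-%m-%dT%H:%M:%S')})",
        Target=lambda_target,
        ScheduleExpressionTimezone="Etc/UTC",
        FlexibleTimeWindow=flex_window
    )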

Replace the values for <IAM_ROLE_ARN> and <DELETER_LAMBDA_ARN> with the:

  1. ARN of the IAM role you initially created for the Amazon EventBridge Scheduler
  2. ARN of the AWS Lambda function that deletes the CloudFormation stacks (cfn-stack-deleter)

Finally, create an Amazon EventBridge rule named cfn-stack-deleter-rule with an event pattern matching AWS CloudFormation stack status changes where the status is CREATE_COMPLETE:

{
    "source": [
        "aws.cloudformation"
    ],
    "detail-type": [
        "CloudFormation Stack Status Change"
    ],
    "detail": {
        "status-details": {
            "status": [
                "CREATE_COMPLETE"
            ]
        }
    }
}
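
To get a feel for which events this pattern lets through, here is a small local sketch. It implements only a simplified subset of EventBridge matching (nested objects and lists of exact values), so treat it as an illustration rather than a replacement for testing the rule in the console. The sample event contains only the fields used in this post, and its stack ID is made up:

```python
PATTERN = {
    "source": ["aws.cloudformation"],
    "detail-type": ["CloudFormation Stack Status Change"],
    "detail": {"status-details": {"status": ["CREATE_COMPLETE"]}},
}


def matches(pattern, event):
    """Return True if every key in pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must equal one of the listed values.
            return False
    return True


def sample_event(status):
    """Build a trimmed-down stack status event with a placeholder stack ID."""
    return {
        "source": "aws.cloudformation",
        "detail-type": "CloudFormation Stack Status Change",
        "detail": {
            "stack-id": "arn:aws:cloudformation:us-east-1:0123456789:"
                        "stack/demo-stack/00000000-0000-0000-0000-000000000000",
            "status-details": {"status": status},
        },
    }


if __name__ == "__main__":
    for status in ("CREATE_IN_PROGRESS", "CREATE_COMPLETE", "DELETE_COMPLETE"):
        print(status, matches(PATTERN, sample_event(status)))
```

Only the CREATE_COMPLETE event passes, so the scheduler function is not invoked again when the stack is updated or when the cleanup itself deletes the stack.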

Ensure that this targets the Lambda Function we created earlier called cfn-stack-deletion-scheduler and you should be all done.

To test the solution, temporarily shorten the delay in the cfn-stack-deletion-scheduler Lambda function (change timedelta(hours=48) to timedelta(minutes=5)). Then create a CloudFormation stack from a small template such as the one below and watch it get deleted.

AWSTemplateFormatVersion: '2010-09-09'
Description: Demo stack, creates one SSM parameter and gets deleted after 5 minutes.
Resources:
  DemoParameter:
    Type: "AWS::SSM::Parameter"
    Properties:
      Type: "String"
      Value: "date"
      Description: "A temporary SSM parameter."
      AllowedPattern: "^[a-zA-Z]{1,10}$"
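
Editing the function for every test run is easy to forget to revert. One option, sketched here, is to read the delay from an environment variable on the scheduler function and fall back to 48 hours. DELETE_AFTER_MINUTES and both helper names are hypothetical; nothing in the setup above defines them:

```python
import os
from datetime import datetime, timedelta, timezone

DEFAULT_DELAY_MINUTES = 48 * 60  # the 2-day default used in this post


def deletion_delay(env=os.environ):
    """Read the delay from DELETE_AFTER_MINUTES, falling back to 48 hours."""
    try:
        minutes = int(env.get("DELETE_AFTER_MINUTES", ""))
    except ValueError:
        minutes = 0
    if minutes <= 0:
        # Missing, non-numeric, or non-positive values use the default.
        minutes = DEFAULT_DELAY_MINUTES
    return timedelta(minutes=minutes)


def schedule_expression(now, delay):
    """Format a one-time at() expression for EventBridge Scheduler."""
    return f"at({(now + delay).strftime('%Y-%m-%dT%H:%M:%S')})"


if __name__ == "__main__":
    print(schedule_expression(datetime.now(timezone.utc), deletion_delay()))
```

Setting the variable to 5 for a test run and removing it afterwards restores the 2-day behaviour without touching the code.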

Conclusion

In conclusion, combining Amazon EventBridge and AWS Lambda functions to automatically delete CloudFormation stacks gives you an easy and efficient way to manage sandbox resources and avoid unnecessary costs from idle stacks. With this automation in place, stacks are removed after a fixed period, which lowers the likelihood of a wasted budget.