Steven Tan - Managing Your Multi-Tenant AWS Environments with Event-Driven Serverless Architecture

Introduction

Whether you are running a Software as a Service (SaaS) platform or simply trying to manage a mixture of AWS pre-defined workloads for your organisation, there is always that lingering question of how to manage these at scale. You might need to apply per-environment parameters when onboarding them onto your platform services. Or you might find that not all environments have the same subscription to your services.

Managing these various environments and requirements via a single pane of glass is a challenge that needs to be solved when designing a multi-tenant solution.

In this blog post, I will outline a minimum viable design for a multi-tenant solution with an event-driven architecture, taking full advantage of AWS and its scalable serverless offerings. An event-driven architecture can help scale operations to hundreds or thousands of environments while keeping costs and management overhead as low as possible.

Setting the Scene

Let's pretend to be a fictional gaming company providing on-demand servers for players to enjoy a multiplayer sandbox world with their friends. Each player environment is configured with the following parameters:

Expected player count
A world map random seed to generate unique but reproducible environments
The difficulty level that they wish to experience
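These three parameters map naturally onto a small configuration record. The sketch below is a hypothetical model: the class name, the set of allowed difficulty levels and the maximum player count are assumptions (only "peaceful" appears in the sample data later in this post). It rejects invalid values before anything is written to DynamoDB.

```python
from dataclasses import dataclass

# Hypothetical difficulty levels; replace with the ones your game supports
ALLOWED_DIFFICULTIES = {"peaceful", "easy", "normal", "hard"}
# Hypothetical upper bound on players per server
MAX_PLAYERS = 20


@dataclass(frozen=True)
class GameServerConfig:
    """Per-environment parameters a player chooses for their server."""

    player_count: int  # expected number of players
    seed: str  # world map seed; the same seed reproduces the same world
    difficulty: str  # difficulty level the players wish to experience

    def __post_init__(self) -> None:
        # Reject invalid values early, before anything reaches DynamoDB
        if not 1 <= self.player_count <= MAX_PLAYERS:
            raise ValueError(f"player_count must be between 1 and {MAX_PLAYERS}")
        if not self.seed:
            raise ValueError("seed must not be empty")
        if self.difficulty not in ALLOWED_DIFFICULTIES:
            raise ValueError(f"unknown difficulty: {self.difficulty!r}")

    def to_dict(self) -> dict[str, object]:
        """Plain dictionary suitable for the item's config attribute."""
        return {
            "difficulty": self.difficulty,
            "player_count": self.player_count,
            "seed": self.seed,
        }
```

For example, `GameServerConfig(player_count=5, seed="bailao9SheeSi1i", difficulty="peaceful").to_dict()` produces the same three values used in the test item further down.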

My Architecture

[Figure: architecture diagram]

Amazon DynamoDB Table Schema

[Figure: DynamoDB table schema]

How does it work?

In this architecture, you will notice that no action in the SaaS environment is triggered directly by the user; every action is instead triggered by the DynamoDB stream. With this design, the AWS Lambda functions the user interacts with only validate requests and perform Create, Read, Update or Delete (CRUD) operations in Amazon DynamoDB, which keeps their logic simple. It also avoids duplicating logic when changes arrive from different sources, such as end-user self-service, automation or administrators.
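To make the split concrete, here is a minimal sketch of the user-facing side. It assumes an API Gateway proxy-style event with a JSON body, a hypothetical table named `gameservers`, and a hypothetical key schema of `username` plus `gameserver_id`; none of these come from the original design. The function only validates the request and writes or deletes the item; provisioning is left entirely to the stream consumer.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Protocol

TABLE_NAME = "gameservers"  # hypothetical table name
REQUIRED_CONFIG_KEYS = {"difficulty", "player_count", "seed"}


class Table(Protocol):
    """The two methods of a boto3 DynamoDB Table resource used below."""

    def put_item(self, **kwargs: Any) -> Any: ...

    def delete_item(self, **kwargs: Any) -> Any: ...


def _response(status: int, payload: dict[str, Any]) -> dict[str, Any]:
    return {"statusCode": status, "body": json.dumps(payload)}


def handle_request(method: str, body: dict[str, Any], table: Table) -> dict[str, Any]:
    """Validate the request and perform the matching DynamoDB write, nothing more."""
    # Hypothetical: in practice the username would come from the authoriser
    if "username" not in body:
        return _response(400, {"error": "username is required"})

    if method == "POST":
        config = body.get("config")
        if not isinstance(config, dict) or not REQUIRED_CONFIG_KEYS.issubset(config):
            return _response(400, {"error": f"config needs {sorted(REQUIRED_CONFIG_KEYS)}"})
        item = {
            "username": body["username"],
            "gameserver_id": f"gs{uuid.uuid4().hex[:8]}",  # hypothetical ID format
            "config": {key: config[key] for key in REQUIRED_CONFIG_KEYS},
            "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        # This INSERT is what later triggers provisioning via the stream
        table.put_item(Item=item)
        return _response(201, {"gameserver_id": item["gameserver_id"]})

    if method == "DELETE" and "gameserver_id" in body:
        # This REMOVE is what later triggers stack deletion via the stream
        table.delete_item(
            Key={"username": body["username"], "gameserver_id": body["gameserver_id"]}
        )
        return _response(200, {"deleted": body["gameserver_id"]})

    return _response(400, {"error": "unsupported request"})


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    import boto3  # imported lazily so handle_request can be tested without AWS

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    return handle_request(event["httpMethod"], json.loads(event.get("body") or "{}"), table)
```

Passing the table in as a parameter keeps `handle_request` testable with a stand-in object that records `put_item` and `delete_item` calls.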

DynamoDB Stream

All updates made to the Amazon DynamoDB data are streamed via DynamoDB Streams to the next set of AWS Lambda functions, which are the brains of the operation. By integrating the DynamoDB stream as an AWS Lambda trigger, we can ensure that:

Changes to the Amazon DynamoDB data are captured by the Lambda function and processed as they occur
Failed invocations are retried using the built-in retry mechanism of the DynamoDB stream Lambda trigger
Complicated logic based on the stream data can be further split out by making use of Lambda event filtering
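The stream and its trigger can be wired up with two boto3 calls: `update_table` to enable the stream and `create_event_source_mapping` to attach the function, with retry behaviour and the INSERT/REMOVE filter pattern from the next paragraph set on the mapping. This is a sketch under assumptions: the table name, function name, batch size and retry limits are hypothetical, and the table is assumed not to have a stream yet. Note that the `NEW_AND_OLD_IMAGES` view type matters because the handler reads `OldImage` for REMOVE records.

```python
import json
from typing import Any

TABLE_NAME = "gameservers"  # hypothetical table name
FUNCTION_NAME = "gameserver-controller"  # hypothetical function name

# The handler reads NewImage for INSERT and OldImage for REMOVE records, so the
# stream must include both images (NEW_IMAGE alone leaves REMOVE without OldImage)
STREAM_SPECIFICATION = {"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"}


def event_source_mapping_args(stream_arn: str) -> dict[str, Any]:
    """Arguments for lambda.create_event_source_mapping on a DynamoDB stream."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": FUNCTION_NAME,
        "StartingPosition": "LATEST",  # only process changes made from now on
        "BatchSize": 10,
        # Retry a failing batch a bounded number of times instead of until the
        # records expire from the stream (the default behaviour)
        "MaximumRetryAttempts": 3,
        # Split a failing batch in half on error to isolate the bad record
        "BisectBatchOnFunctionError": True,
        # Only invoke the function for item creations and deletions
        "FilterCriteria": {
            "Filters": [{"Pattern": json.dumps({"eventName": ["INSERT", "REMOVE"]})}]
        },
    }


def wire_up_trigger() -> str:
    """Enable the stream on the table, attach the function, return the mapping UUID."""
    import boto3  # imported lazily so the argument builder can be tested offline

    table = boto3.client("dynamodb").update_table(
        TableName=TABLE_NAME, StreamSpecification=STREAM_SPECIFICATION
    )
    stream_arn = table["TableDescription"]["LatestStreamArn"]
    mapping = boto3.client("lambda").create_event_source_mapping(
        **event_source_mapping_args(stream_arn)
    )
    return mapping["UUID"]
```

The same settings can equally be expressed in CloudFormation or another infrastructure-as-code tool; the boto3 form simply shows which knobs exist.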

To extract the information needed to create and delete game servers, apply a Lambda event filter so that the function is only invoked when an INSERT or REMOVE event occurs on the Amazon DynamoDB table:

{
    "eventName": ["INSERT", "REMOVE"]
}
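Lambda evaluates this pattern against each stream record before invoking the function, so MODIFY records (for example, a player changing their difficulty) never reach it. The helper below is a simplified local approximation for checking which test records would pass; it only handles exact-match lists on top-level keys and is not Lambda's actual filtering implementation, which also supports nested keys and operators such as prefix or numeric matching.

```python
from typing import Any


def matches(pattern: dict[str, list[Any]], record: dict[str, Any]) -> bool:
    """Simplified filter check: every pattern key must be present in the record
    and its value must be one of the listed values."""
    return all(record.get(key) in allowed for key, allowed in pattern.items())


PATTERN = {"eventName": ["INSERT", "REMOVE"]}

records = [{"eventName": name} for name in ("INSERT", "MODIFY", "REMOVE")]
print([r["eventName"] for r in records if matches(PATTERN, r)])  # prints ['INSERT', 'REMOVE']
```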

Lambda Function

These AWS Lambda functions handle all of the management of game servers running in the SaaS AWS account. They have two roles:

Creation of a game server:

Performing a cross-account role assumption
Deploying a pre-defined AWS CloudFormation Template to the AWS account, passing the configuration options as stack parameters

Deletion of a game server:

Performing a cross-account role assumption
Deleting the AWS CloudFormation stack from the AWS account

A sample code snippet to perform this can be as simple as:

from typing import Any

import boto3
from boto3.dynamodb.types import TypeDeserializer

# Replace these values with your own
REMOTE_ACCOUNT_ID = "123456789012"
CFN_TEMPLATE_URL = "https://s3.amazonaws.com/gameserver-templates/v1.json"

def deserialise_dynamodb(item: dict[str, Any]) -> dict[str, Any]:
    """
    Deserialises a raw DynamoDB stream image into a Python dictionary.
    """
    return TypeDeserializer().deserialize({"M": item})

def lambda_handler(event: dict[str, Any], context: Any) -> None:
    """
    Takes the INSERT and REMOVE records from the DynamoDB Stream and applies the
    corresponding changes to the remote AWS account.
    """

    # Perform an STS AssumeRole to get temporary credentials for the remote account
    sts = boto3.client("sts")
    assumed_role_object = sts.assume_role(
        RoleArn=f"arn:aws:iam::{REMOTE_ACCOUNT_ID}:role/deployment-role",
        RoleSessionName="gameserver-controller",
    )
    credentials = assumed_role_object["Credentials"]

    # CloudFormation client that acts inside the remote account
    cloudformation = boto3.client(
        "cloudformation",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )

    # Loop through the DynamoDB Stream records
    for record in event.get("Records", []):
        event_name = record["eventName"]
        # Valid events are: INSERT, MODIFY, REMOVE. A REMOVE record only carries
        # the item as it was before deletion (OldImage).
        object_key = "OldImage" if event_name == "REMOVE" else "NewImage"

        row = deserialise_dynamodb(record["dynamodb"][object_key])
        stack_name = f"gameserver-{row['gameserver_id']}"

        if event_name == "INSERT":
            # The config keys must match the parameter names declared in the
            # template. Parameter values must be strings, but DynamoDB numbers
            # are deserialised as Decimal, so convert every value explicitly.
            stack_params = [
                {"ParameterKey": k, "ParameterValue": str(v)}
                for k, v in row["config"].items()
            ]
            cloudformation.create_stack(
                StackName=stack_name,
                TemplateURL=CFN_TEMPLATE_URL,
                Parameters=stack_params,
            )
        elif event_name == "REMOVE":
            cloudformation.delete_stack(
                StackName=stack_name,
            )
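Because the stream trigger retries failed batches, a record can be delivered more than once, and a second `create_stack` call with the same stack name fails with `AlreadyExistsException`. One possible way to make creation idempotent is sketched below; the helper name is hypothetical, and treating an existing stack as success assumes it was created by an earlier attempt for the same record.

```python
from typing import Any


def create_stack_once(cloudformation: Any, **kwargs: Any) -> bool:
    """Create the stack, treating an existing stack with the same name as success.

    Returns True if a new stack was requested, False if it already existed.
    """
    try:
        cloudformation.create_stack(**kwargs)
    except cloudformation.exceptions.AlreadyExistsException:
        # A retried batch re-delivered an INSERT that was already handled
        return False
    return True
```

In the handler, `cloudformation.create_stack(...)` would become `create_stack_once(cloudformation, StackName=stack_name, TemplateURL=CFN_TEMPLATE_URL, Parameters=stack_params)`.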

Testing

To test the solution without deploying anything, you can replace the CloudFormation boto3 calls with some print statements in the code sample above and insert some dummy data into your DynamoDB table:

{
    "username": {
        "S": "player@example.com"
    },
    "gameserver_id": {
        "S": "gs1234"
    },
    "config": {
        "M": {
            "difficulty": {
                "S": "peaceful"
            },
            "player_count": {
                "N": "5"
            },
            "seed": {
                "S": "bailao9SheeSi1i"
            }
        }
    },
    "created_at": {
        "S": "2023-05-13T14:06:33Z"
    }
}
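Rather than going through the table, you can also call the handler locally with a hand-built stream event. The helper below is a hypothetical sketch that wraps an item like the one above in the minimal record shape the handler reads (`eventName` plus `dynamodb.NewImage` or `dynamodb.OldImage`). Real stream records carry more fields, such as `Keys` and `SequenceNumber`, which are omitted here, and the email address is a placeholder.

```python
from typing import Any


def make_stream_event(event_name: str, item: dict[str, Any]) -> dict[str, Any]:
    """Wrap a DynamoDB-JSON item in a minimal DynamoDB Streams event."""
    # REMOVE records carry the deleted item as OldImage; others use NewImage
    image_key = "OldImage" if event_name == "REMOVE" else "NewImage"
    return {
        "Records": [
            {
                "eventName": event_name,
                "eventSource": "aws:dynamodb",
                "dynamodb": {image_key: item},
            }
        ]
    }


dummy_item = {
    "username": {"S": "player@example.com"},  # placeholder address
    "gameserver_id": {"S": "gs1234"},
    "config": {
        "M": {
            "difficulty": {"S": "peaceful"},
            "player_count": {"N": "5"},
            "seed": {"S": "bailao9SheeSi1i"},
        }
    },
    "created_at": {"S": "2023-05-13T14:06:33Z"},
}

insert_event = make_stream_event("INSERT", dummy_item)
remove_event = make_stream_event("REMOVE", dummy_item)
```

Passing `insert_event` and then `remove_event` to `lambda_handler(event, None)`, with the CloudFormation calls swapped for print statements, exercises both branches. The STS call still needs valid AWS credentials unless you stub it out as well.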

Conclusion

In this blog post, I have outlined what a multi-tenant environment operated by a gaming company might look like. By leveraging AWS serverless offerings such as AWS Lambda and Amazon DynamoDB and then integrating that into an event-driven solution, we have lowered the cost and complexity of seamlessly managing our resources across multiple tenants.

Though this blog post mainly focuses on the requirements of a gaming company, the design is flexible enough to cater for anything from opinionated SaaS application deployments to entire AWS environments with unique configurations. I hope this has provided some insight into how multiple environments can be managed at scale.
