compute!

"Big Cloud"

I recently came across a take-home exercise for a Product Security role at a large software company. During the exercise, I quickly discovered interesting challenges that forced me to think "from source".

Before diving into the details, I'll cover the core requirements of the exercise. The exercise focused on securing a new public-facing web application using AWS. The key goals were to:

Infra

During infrastructure deployment, I used Terraform to define and manage all AWS resources (except the AWS WAF, more on this later) as code. For the compute environment, I opted to deploy the application on an ECS cluster backed by EC2 instances, giving me full control over scaling configuration while keeping the deployment straightforward. I exposed the application publicly through an AWS Application Load Balancer (ALB), which provided a reliable entry point to route incoming traffic to the EC2 instances.


AWS CDK & Race Conditions

WAF WebACLs?

Before going any further: a Web Application Firewall (WAF) Web Access Control List (WebACL) in AWS is a set of rules that protects web applications by filtering and monitoring HTTP(S) requests. It helps block common attack patterns like SQL injection, cross-site scripting (XSS), and other malicious traffic before they reach the application. In AWS, a WebACL can be associated with resources such as CloudFront distributions or Application Load Balancers (ALB) to enforce these security rules at the edge.

The next step was to develop a reusable AWS WAF WebACL module using the AWS CDK in Go.

Design Choices (side-note)?

Now, you may be wondering: why not just keep using Terraform? Lately, I’ve been leaning more towards Go over Python. I’ve really enjoyed exploring the CLI framework ecosystem in Go, and this felt like a practical opportunity to build with Go.

A crucial part of the implementation was associating the WebACL with the front-door resource—either a CloudFront distribution or an Application Load Balancer (ALB). This required passing the resource ARN during deployment to properly link the WebACL.

contextValue := app.Node().TryGetContext(jsii.String("resource_arn"))

// Check if the value exists and is of the expected type (string).
resourceArnStr, ok := contextValue.(string)
if !ok || resourceArnStr == "" {
	log.Fatalf("Context value 'resource_arn' not provided or not a string. Please use `cdk deploy -c resource_arn=\"arn:aws:elasticloadbalancing:...\"`")
}
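Since the ARN can point at either a CloudFront distribution or an ALB, it can help to classify it before building the WebACL, because AWS WAF uses the CLOUDFRONT scope for distributions and the REGIONAL scope for load balancers. The following is a minimal sketch of that check using only the standard library; the function name `wafScope` and the accepted resource prefixes are assumptions for illustration, not part of the module described here.

```go
package main

import (
	"fmt"
	"strings"
)

// wafScope guesses the AWS WAF scope for a front-door ARN.
// The name and the accepted prefixes are illustrative assumptions.
func wafScope(arn string) (string, error) {
	// ARN layout: arn:partition:service:region:account-id:resource
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" {
		return "", fmt.Errorf("not an ARN: %q", arn)
	}
	service, resource := parts[2], parts[5]
	switch {
	case service == "cloudfront" && strings.HasPrefix(resource, "distribution/"):
		return "CLOUDFRONT", nil
	case service == "elasticloadbalancing" && strings.HasPrefix(resource, "loadbalancer/app/"):
		return "REGIONAL", nil
	}
	return "", fmt.Errorf("unsupported front-door resource: %q", arn)
}
```

Failing fast on an unexpected ARN at this point gives a clearer error than a failed deployment later on.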

One of the trickiest challenges I faced was a race condition during deployment. Because the WebACL resource had to be created and then associated with an existing front-door resource, the deployment sometimes failed or behaved inconsistently if the association happened before the resource was fully available.

app.Synth() ?

In AWS CDK, the app.Synth() function is responsible for generating the CloudFormation templates from your application’s constructs. This step translates your code into actual deployable infrastructure definitions.

In TypeScript and JavaScript, the CDK can call app.synth() implicitly before the program exits when the app runs under the CDK CLI, so calling it explicitly typically has no additional effect.

In Go, as with the other jsii-based bindings such as Python, that implicit behavior isn't guaranteed, so the app is expected to call app.Synth() itself, especially in more complex setups or when customizing the synthesis phase.

To resolve the issue, I explicitly forced the synthesis step using:

app.Synth(&awscdk.StageSynthesisOptions{
	Force: jsii.Bool(true),
})

As a result, I was able to include and deploy AWS-managed rule groups and a few custom rules as part of the WAF module without race conditions or partial deployments.

WAF Testing

I forgot to mention this earlier: the deployed application is vulnerable by default and widely used for educational purposes. This allows me to test known vulnerable endpoints with malicious payloads. Below is an example of a SQL injection attempt.

[Screenshot: terminal output of the SQL injection request]

As the terminal output shows, the request returned a 200 response. Passing the parameter

qwert')) UNION SELECT id, email, password, '4', '5', '6', '7', '8', '9' FROM Users--

to a specific endpoint allows us to verify that the vulnerability exists before associating the WAF with the Application Load Balancer. Once the association has been completed, the same request returns a 403.
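The before/after check can also be scripted rather than run by hand. The sketch below sends a payload as a query parameter and reports the status code. The `probe` helper, the `/search` path, and the `q` parameter are placeholders, and `fakeFrontDoor` is only a local stand-in that mimics a blocking rule so the example runs without touching AWS.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// probe sends payload as the query parameter param and returns the HTTP
// status code. endpoint and param are placeholders for the real target.
func probe(client *http.Client, endpoint, param, payload string) (int, error) {
	q := url.Values{}
	q.Set(param, payload) // url.Values takes care of percent-encoding
	resp, err := client.Get(endpoint + "?" + q.Encode())
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// fakeFrontDoor is a local stand-in for "ALB + WAF": it answers 403 when the
// q parameter contains UNION SELECT and 200 otherwise. Real WAF rules are
// far richer than this single string match.
func fakeFrontDoor() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(strings.ToUpper(r.URL.Query().Get("q")), "UNION SELECT") {
			w.WriteHeader(http.StatusForbidden)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}
```

Pointing `probe` at the real load balancer before and after the association should show the status code move from 200 to 403 for the payload above, assuming the relevant rule group is active.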

Rapid Updates

The last major component of the exercise involved creating a script for rapid updates to AWS WAF configurations in response to emerging threats. This allows infra/security teams to quickly block suspicious IPs, user-agent patterns, or malicious payloads without waiting for full infrastructure re-deployments.

I chose to build a CLI application using the Cobra framework in Go, which made it easy to create a structured command-line interface with support for subcommands and flags. The tool supports multiple types of rules, such as:

--ip: Block or allow specific IPs or CIDR blocks

--regex: Match suspicious patterns in headers, query strings, or bodies

--file: Load bulk entries (e.g., IPs or regex rules) from a file


While testing the functionality of the application, I constantly ran into database update latency issues. I realized that, whether on-premises or in the "Cloud", systems are systems.

Distributed Systems & The Illusion of Instant Consistency

In modern cloud environments, we often assume things should “just work”: infrastructure should be created instantly, API calls should be atomic, and deployments should complete without surprises. But when you're dealing with distributed systems, these assumptions can be dangerous.

This exercise exposed that tension clearly. I ran into race conditions and non-deterministic behavior, all symptoms of a deeper truth: the "Cloud" may feel fast, but it is not always consistent. If you design applications without accounting for consistency models, state propagation, or failure modes, you end up with automation that is hard to debug and prone to silent errors.

For example, associating a WAF WebACL with a Load Balancer before the resource is fully available seems harmless — until your deployment fails or silently skips a step. These issues aren’t bugs in code; they’re bugs in assumptions.

Theoretical concepts (e.g., distributed systems, eventual consistency, and state transitions) aren't just academic. They’re foundational to building robust cloud-native systems.

P.S. All ARNs shown in this post are obsolete. Well... they should be.