Daily Devops Tips

AWS VPC Design: A Practical Approach For Beginners

Bibin Wilson
November 27, 2024

👋 Hi! I’m Bibin Wilson. In each edition, I share practical tips, guides, and the latest trends in DevOps and MLOps to make your day-to-day DevOps tasks more efficient.

In this blog, we will look at:

Understanding VPC Requirements
Understanding application requirements
Choosing a CIDR for VPC
Avoiding IP Address Conflicts (Best Practice)
Subnet Design
VPC & Subnet Documentation (Best Practice)
AWS VPC Topology
VPC Endpoints

This guide focuses only on the AWS cloud environment.

I am not taking a hybrid environment into consideration. We will touch on a few concepts related to hybrid cloud environments, but the key focus is on AWS VPC.

Understand VPC Requirements

As a DevOps engineer, you need to understand the VPC requirements by asking the relevant teams the right questions.

In real projects, the following questions will help you understand the VPC requirements better.

Identifying Your Hosting Needs: What do you want to host?
Meeting Compliance Standards: What are its compliance requirements?
Handling Sensitive Information: Does it have applications dealing with PCI/PII data?
Public vs. Private Accessibility: Are the applications internet-facing?
Connecting to On-Premise Systems: Does the VPC require hybrid connectivity to an on-premise environment? If yes, is it DNS or IP-based connectivity?
User Accessibility to VPC Services: How are users going to connect to the services hosted in the VPC?
VPC to VPC Connectivity: Does it need access to services hosted on other VPCs that are part of the organization's network?

It is always best to document these requirements.

Note: Organizations typically keep a questionnaire to understand the VPC requirements from network, security, and compliance perspectives.

Deployment Architecture

Before designing a VPC, it is essential to understand the infrastructure requirements of the application.

This guide will walk you through designing a VPC network using an example application and its specific requirements.

The architecture consists of four categories of applications:

Web Application (Java-based)
Automation Tools (App/Infra CI/CD)
Platform Tools (e.g., Prometheus, Grafana, Consul)
Managed Services (RDS, CloudWatch, etc.)

Below is the application's high-level deployment architecture.

The DevOps team and Cloud Architects design a high-level architecture based on the application architecture provided by the Application teams. Sometimes, the Application teams may also recommend specific infrastructure services based on their proof of concept (POC) findings.

This high-level architecture serves as a blueprint for the infrastructure and deployment strategy. It is then presented to the Application teams to ensure alignment, gather feedback, and refine the approach as needed.

VPC Network Design

Ideally, in most organizations, the VPC is created and managed by a dedicated network team. However, DevOps engineers working with the application team need to define the VPC requirements to ensure it can host all the required applications.

How to Choose a CIDR for VPC?

The CIDR block for a VPC depends on the number of servers planned for deployment. This includes both self-hosted and AWS-managed services.

We consider not only the immediate requirements but also future expansion. While we may start with 15 servers, the infrastructure should be able to scale to 1,000+ servers in the future.

A /16 CIDR block provides 65,536 IPs, but it's often too large.
A /20 CIDR block (4,096 IPs) may be too small if we scale beyond 1,000 servers, because the range is split across many subnets and is also shared with AWS-managed services.
A /18 CIDR block (16,384 IPs) is a balanced choice, ensuring scalability.
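To double-check the block sizes quoted above, here is a minimal sketch using Python's standard ipaddress module. The 10.0.0.0 base address is only illustrative; the block size depends on the prefix length alone.

```python
import ipaddress

# Candidate VPC CIDR blocks (the 10.0.0.0 base is an illustrative assumption).
CANDIDATES = ["10.0.0.0/16", "10.0.0.0/18", "10.0.0.0/20"]


def block_size(cidr: str) -> int:
    """Return the total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


for cidr in CANDIDATES:
    print(f"{cidr}: {block_size(cidr):,} addresses")
```

Each step down in prefix length doubles the address space, so a /18 is four times the size of a /20 and a quarter of a /16.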

VPC CIDR Block: 10.0.0.0/18 (16,384 IPs)

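Total IP Addresses: 16,384 (including reserved AWS IPs)

Usable IPs: Fewer than 16,384. AWS reserves 5 IPs in every subnet, so the usable count depends on how many subnets you create (for example, 15 subnets reserve 75 IPs).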

Note: In actual project environments, VPC ranges are decided solely based on requirements. Typically, the Application, DevOps, and Network teams discuss and determine the required ranges to prevent both over-allocation and under-allocation of IP addresses.

Subnet Design

Based on our application architecture and components, we need the following public and private subnets.

3 Public Subnets – To deploy internet-facing load balancers for the Java app autoscaling group
3 Application Subnets (Private) – To deploy the Java app autoscaling group
3 Database Subnets (Private) – To deploy the RDS MySQL instance
3 Management Subnets (Private) – Dedicated to CI/CD tools and automation services.
3 Platform Subnets (Private) – For platform tools such as Prometheus, Grafana, and Consul used for monitoring and service discovery.

Total: 15 subnets (one per Availability Zone for each subnet type)

These subnets should be carved out from the 10.0.0.0/18 CIDR.

Starting IP Address: 10.0.0.0
Last IP Address: 10.0.63.255

Each subnet must have enough IPs to support scaling needs while maintaining AWS best practices.

For example,

Subnet Type	AZ1 CIDR Block	AZ2 CIDR Block	AZ3 CIDR Block	Total Usable IPs
Public Subnets (3)	10.0.0.0/24	10.0.1.0/24	10.0.2.0/24	753 (251 each)
Application Subnets (3)	10.0.4.0/23	10.0.6.0/23	10.0.8.0/23	1,521 (507 each)
Database Subnets (3)	10.0.12.0/24	10.0.13.0/24	10.0.14.0/24	753 (251 each)
Management Subnets (3)	10.0.16.0/24	10.0.17.0/24	10.0.18.0/24	753 (251 each)
Platform Subnets (3)	10.0.20.0/24	10.0.21.0/24	10.0.22.0/24	753 (251 each)
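A subnet plan like this is easy to get wrong by hand, so it can help to validate it with a short script. The following is a minimal sketch using Python's standard ipaddress module. It checks that every subnet sits inside the VPC range and that no two subnets overlap, and it computes usable IPs by subtracting the 5 addresses AWS reserves in each subnet. The PLAN dictionary and function names are my own, chosen only for illustration.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/18")
AWS_RESERVED_PER_SUBNET = 5  # AWS reserves 5 addresses in every subnet

# Subnet plan from the table above (dictionary layout is illustrative).
PLAN = {
    "public": ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"],
    "application": ["10.0.4.0/23", "10.0.6.0/23", "10.0.8.0/23"],
    "database": ["10.0.12.0/24", "10.0.13.0/24", "10.0.14.0/24"],
    "management": ["10.0.16.0/24", "10.0.17.0/24", "10.0.18.0/24"],
    "platform": ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"],
}


def usable_ips(cidr: str) -> int:
    """Addresses left in a subnet after the AWS-reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def validate(plan: dict, vpc: ipaddress.IPv4Network) -> list:
    """Return a list of problems; an empty list means the plan is consistent."""
    nets = [ipaddress.ip_network(c) for cidrs in plan.values() for c in cidrs]
    problems = [f"{n} is outside {vpc}" for n in nets if not n.subnet_of(vpc)]
    problems += [f"{a} overlaps {b}" for a, b in combinations(nets, 2) if a.overlaps(b)]
    return problems


print(validate(PLAN, VPC))
for tier, cidrs in PLAN.items():
    print(tier, [usable_ips(c) for c in cidrs])
```

A /24 leaves 251 usable addresses and a /23 leaves 507. Note that ip_network also raises a ValueError for a misaligned block such as 10.0.5.0/23, which catches another common planning mistake.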

Avoiding IP Address Conflicts

Let’s consider a scenario where the 10.0.0.0/16 range is already allocated to a project in an on-prem environment.

Even if there is no hybrid cloud connectivity to on-prem, we should not reuse 10.0.0.0/16, or any range inside it, for the VPC. If hybrid connectivity is set up in the future, the overlap could lead to IP conflicts.

Network teams in organizations prevent IP range conflicts by keeping track of the private IP ranges reserved for each project.

Typically, they use IP Address Management (IPAM) tools to track IP address allocation. These tools provide a centralized view of the IP address space used within the organization.
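At its core, the check an IPAM tool performs is an overlap test between a candidate range and the ranges already allocated. Here is a minimal sketch of that idea. The ALLOCATED registry and its single entry are hypothetical stand-ins for a real IPAM database.

```python
import ipaddress

# Hypothetical registry of ranges already allocated in the organization.
ALLOCATED = {
    "on-prem-project": "10.0.0.0/16",
}


def find_conflicts(candidate: str, allocated: dict) -> list:
    """Return the names of allocations that overlap the candidate CIDR."""
    cand = ipaddress.ip_network(candidate)
    return [
        name
        for name, cidr in allocated.items()
        if cand.overlaps(ipaddress.ip_network(cidr))
    ]


print(find_conflicts("10.0.0.0/18", ALLOCATED))  # overlaps the on-prem range
print(find_conflicts("10.1.0.0/18", ALLOCATED))  # no overlap
```

In this scenario, the 10.0.0.0/18 range used as the example in this guide would itself be flagged, so a range such as 10.1.0.0/18 would be the safer pick.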

The following image shows an example dashboard of an open-source IPAM tool called NetBox.

Note: If overlapping ranges cannot be avoided, an AWS private NAT gateway lets networks with the same CIDR ranges communicate by translating traffic to non-overlapping addresses.

Private Subnet Access

Since we have private subnets, DevOps engineers & developers need access to the servers on private subnets.

Most organizations set up a VPN connection to the AWS cloud to access the servers deployed in VPC.

The following are the native options for connecting to instances in AWS VPC private subnets.

EC2 Instance Connect (with an EC2 Instance Connect Endpoint): Lets you connect securely to instances in a private subnet without a public IP. It is an identity-aware proxy that uses IAM permissions to authorize connections. One instance can also serve as a jump server to reach other instances in the VPC. (Cheapest solution.)
AWS Client VPN (client-to-site VPN): Allows remote workers to access AWS resources securely. Ideal for a distributed team that needs to use AWS services. (Gets expensive with more users.)
Site-to-Site VPN: Connects the on-premises network to the AWS Virtual Private Cloud (VPC). This is the ideal solution for organizations that want a secure, private connection between their on-prem network and AWS. Requires an on-premises VPN device. Setup can be expensive.
AWS Direct Connect: Creates a direct, private link between the on-prem network and AWS. It is ideal for businesses that need a fast, reliable connection to AWS without using the public internet. It comes with higher upfront costs.

Note: The type of access depends on the project requirements, compliance requirements, and budget.

Internet Access For Subnets

Both private and public subnet servers require internet access.

Public Subnet: A subnet is public when its route table has a route to an Internet Gateway (IGW). This allows instances with a public IP to receive inbound traffic directly from the internet.
Private Subnet: Subnets without a route to an internet gateway are private. However, instances in private subnets still need outbound internet access for tasks like reaching third-party services or package repositories.

To enable outbound internet access for private subnets, deploy a NAT Gateway in a public subnet and add a default route (0.0.0.0/0) to it in the private subnets' route tables. This ensures that private subnet instances can access external resources while remaining inaccessible from the internet.

For cost optimization, a single NAT Gateway can serve all AZs. For high availability, deploy one per AZ (note: this increases costs).

Egress Traffic Filtering

Most organizations use a forward proxy to manage all outbound internet requests from both private and public subnets. This means that even with a NAT Gateway, outbound traffic passes through a firewall service for filtering and control.

AWS offers a service called AWS Network Firewall, which can be integrated with a NAT gateway for egress traffic filtering. You can restrict or filter HTTP and HTTPS traffic using domain names.

Some organizations deploy Squid proxies for domain-based filtering and traffic control.

Large enterprises often use advanced security solutions like Check Point for both ingress (incoming) and egress (outgoing) traffic filtering.

Here is how the outbound traffic flows:

All outgoing requests first go through the proxy server.
The proxy applies security policies and filters traffic.
Approved requests are then forwarded through the NAT Gateway to reach the internet.
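The three steps above can be sketched as a small decision function. This is only a toy model of a domain allowlist, not how Squid or AWS Network Firewall is actually configured. The ALLOWED_DOMAINS set and the function names are hypothetical.

```python
# Hypothetical allowlist of domains that workloads may reach.
ALLOWED_DOMAINS = {"github.com", "pypi.org"}


def is_allowed(hostname: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow a host if it equals an allowed domain or is a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


def route_egress(hostname: str) -> str:
    """Model the flow: proxy policy check first, then the NAT Gateway."""
    if not is_allowed(hostname):
        return "blocked by proxy"
    return "forwarded via NAT Gateway"


print(route_egress("api.github.com"))
print(route_egress("example.org"))
```

Matching on the "." boundary matters: a plain suffix check would wrongly let a lookalike such as evilgithub.com through.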

VPC & Subnet Documentation

One of the key things in VPC design is documentation. All VPC configurations should be documented to ensure the VPC stays compliant over time.

You can use any documentation method you prefer. It could be an Excel sheet, Confluence documentation, or GitHub Markdown documentation.

Here is example documentation for the public subnets.

Subnet Name	Availability Zone	CIDR Block	Type
Prod-Web-Public-2a	us-west-2a	10.0.0.0/24	Public
Prod-Web-Public-2b	us-west-2b	10.0.1.0/24	Public
Prod-Web-Public-2c	us-west-2c	10.0.2.0/24	Public

Document all the other subnets in the same way.
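If the subnet plan already lives in code, the documentation rows can be generated instead of typed by hand, which keeps the two from drifting apart. Here is a minimal sketch that produces rows in the same tab-separated layout for the application subnets. The Prod-App-Private naming is my assumption, modeled on the Prod-Web-Public pattern above.

```python
AZS = ["us-west-2a", "us-west-2b", "us-west-2c"]
HEADER = "Subnet Name\tAvailability Zone\tCIDR Block\tType"


def doc_rows(env: str, tier: str, visibility: str, cidrs: list, azs=AZS) -> list:
    """Build tab-separated documentation rows, one per Availability Zone."""
    rows = [HEADER]
    for az, cidr in zip(azs, cidrs):
        suffix = az.split("-")[-1]  # "us-west-2a" -> "2a"
        name = f"{env}-{tier}-{visibility}-{suffix}"
        rows.append(f"{name}\t{az}\t{cidr}\t{visibility}")
    return rows


# Application subnet CIDRs from the subnet design table.
rows = doc_rows("Prod", "App", "Private", ["10.0.4.0/23", "10.0.6.0/23", "10.0.8.0/23"])
print("\n".join(rows))
```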

Route Table Design

Route tables are important for directing traffic (e.g., public subnets to IGW, private subnets to NAT Gateway).

For each subnet group, we will create a custom route table and assign rules required for the specific subnets.

For example, all three public subnets will share the same public-subnet route table.

Subnet	Destination CIDR	Target
Public	0.0.0.0/0	Internet Gateway
App	0.0.0.0/0	NAT Gateway
DB	0.0.0.0/0	NAT Gateway
Management	0.0.0.0/0	NAT Gateway
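To see how these rules are applied, remember that every VPC route table also contains a local route for the VPC CIDR, and that the most specific matching route wins. The following is a minimal sketch of that longest-prefix lookup. The target labels and table names are illustrative, not real AWS resource IDs.

```python
import ipaddress

# Each table holds (destination CIDR, target). The "local" route for the
# VPC CIDR is present in every VPC route table.
PUBLIC_RT = [("10.0.0.0/18", "local"), ("0.0.0.0/0", "internet-gateway")]
PRIVATE_RT = [("10.0.0.0/18", "local"), ("0.0.0.0/0", "nat-gateway")]


def resolve(route_table: list, destination_ip: str):
    """Return the target of the most specific route matching the destination."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(resolve(PRIVATE_RT, "10.0.12.5"))  # traffic to the DB subnet stays local
print(resolve(PRIVATE_RT, "8.8.8.8"))    # internet-bound traffic uses the NAT
print(resolve(PUBLIC_RT, "8.8.8.8"))     # public subnets use the IGW
```

This is why the 0.0.0.0/0 route does not send traffic between subnets out to the internet: the /18 local route is more specific and always wins for in-VPC destinations.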

AWS VPC Topology

The following diagram shows the high-level VPC topology for our design.

Note: The NAT gateway (NAT-GW) is deployed in a public subnet, while the internet gateway (IGW) is attached to the VPC itself rather than to a subnet.

Network ACLs

A network access control list (NACL) is a native VPC feature that controls inbound and outbound traffic at the subnet level.

In our architecture, the connection to the DB subnet should be allowed only from the App subnet and management subnet. The public subnet should not have direct access to the DB subnet.
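NACL rules are evaluated in order of rule number, the first match decides, and anything unmatched is denied. Here is a minimal sketch of inbound rules for the DB subnets under that model. The rule numbers and the MySQL port 3306 are my assumptions, and real NACL rules carry more fields (protocol, port ranges) than this toy Rule class.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int
    source: str  # source CIDR
    port: int
    action: str  # "allow" or "deny"


# Source ranges taken from the subnet design table.
APP_SUBNETS = ["10.0.4.0/23", "10.0.6.0/23", "10.0.8.0/23"]
MGMT_SUBNETS = ["10.0.16.0/24", "10.0.17.0/24", "10.0.18.0/24"]

# Hypothetical inbound rules for the DB subnets, numbered 100, 110, ...
DB_INBOUND = [
    Rule(100 + 10 * i, cidr, 3306, "allow")
    for i, cidr in enumerate(APP_SUBNETS + MGMT_SUBNETS)
]


def evaluate(rules: list, source_ip: str, port: int) -> str:
    """Apply rules in ascending rule-number order; first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip in ipaddress.ip_network(rule.source):
            return rule.action
    return "deny"  # implicit deny when nothing matches


print(evaluate(DB_INBOUND, "10.0.4.10", 3306))  # from an app subnet
print(evaluate(DB_INBOUND, "10.0.0.10", 3306))  # from a public subnet
```

Keep in mind that NACLs are stateless, so a real setup also needs matching outbound rules for the return traffic.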

VPC Endpoints

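VPC endpoints let you connect privately to AWS managed services like S3, Secrets Manager, and CloudWatch without sending traffic over the internet. Interface endpoints are powered by AWS PrivateLink, while gateway endpoints (available for S3 and DynamoDB) work through route table entries.

As per our application architecture, we use the S3, Secrets Manager, and CloudWatch services.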
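When you create an endpoint, you refer to the service by a regional service name. The following is a minimal sketch that maps our three services to an endpoint type and service name. The short names and the us-west-2 region are assumptions based on the usual com.amazonaws.<region>.<service> pattern, so verify them against the AWS documentation for your region before using them.

```python
REGION = "us-west-2"  # assumed region, matching the subnet documentation example

# service -> (endpoint type, short service name); verify before use.
ENDPOINTS = {
    "S3": ("Gateway", "s3"),
    "Secrets Manager": ("Interface", "secretsmanager"),
    "CloudWatch": ("Interface", "monitoring"),
}


def service_name(short_name: str, region: str = REGION) -> str:
    """Build the regional endpoint service name."""
    return f"com.amazonaws.{region}.{short_name}"


for service, (endpoint_type, short_name) in ENDPOINTS.items():
    print(f"{service}: {endpoint_type} endpoint -> {service_name(short_name)}")
```

S3 is listed as a gateway endpoint here because gateway endpoints carry no hourly charge, though S3 also supports interface endpoints.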

Here is an AWS official image for reference.

Automating VPC Management

Now that we have all the requirements for the VPC documented, we can use an IaC tool to provision and manage the VPC resources and configurations.

Note: If you are a beginner, first create the entire stack manually to understand the components better. Then move on to automating the stack.

You can use Terraform or CloudFormation to automate and manage a VPC.

Follow the Terraform AWS VPC blog to automate AWS VPC creation.
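To give a feel for what the automated version looks like, here is a minimal sketch that builds a CloudFormation template for the VPC and its public subnets as a Python dictionary and prints it as JSON. The logical IDs (Vpc, PublicSubnetA, and so on) and the function name are hypothetical. A real template would also need the gateways, route tables, and remaining subnets.

```python
import json


def vpc_template(vpc_cidr: str, subnets: dict) -> dict:
    """Build a CloudFormation template with one VPC and the given subnets.

    subnets maps a logical ID to a (CIDR, Availability Zone) pair.
    """
    resources = {
        "Vpc": {"Type": "AWS::EC2::VPC", "Properties": {"CidrBlock": vpc_cidr}}
    }
    for logical_id, (cidr, az) in subnets.items():
        resources[logical_id] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "Vpc"},  # reference the VPC defined above
                "CidrBlock": cidr,
                "AvailabilityZone": az,
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = vpc_template(
    "10.0.0.0/18",
    {
        "PublicSubnetA": ("10.0.0.0/24", "us-west-2a"),
        "PublicSubnetB": ("10.0.1.0/24", "us-west-2b"),
        "PublicSubnetC": ("10.0.2.0/24", "us-west-2c"),
    },
)
print(json.dumps(template, indent=2))
```

Generating the template from the same data used for the documentation keeps the deployed VPC and its documentation in sync.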
