Introduction
In any data pipeline, data sources, particularly databases, are the backbone. To simulate a realistic pipeline, I needed a secure, reliable database environment to serve as the source of truth for downstream ETL jobs.
Rather than provisioning this manually, I chose to automate everything using Terraform, in line with modern data engineering and DevOps best practices. This not only saved time but also ensured the environment could be easily recreated, scaled, or destroyed with a single command, just like in production. If you're working on the AWS Free Tier, this matters even more: automation lets you clean up everything without forgetting a resource that might generate costs.
Prerequisites
To follow along with this project, you'll need the following tools and setup:
- Terraform installed: https://developer.hashicorp.com/terraform/install
- AWS CLI & IAM setup
  - Install the AWS CLI
  - Create an IAM user with programmatic access:
    - Attach the AdministratorAccess policy (or create a custom policy with restricted permissions to create all of the resources included)
    - Download the Access Key ID and Secret Access Key
  - Configure the AWS CLI
- An AWS Key Pair
  Required to SSH into the bastion host. You can create one in the AWS Console under EC2 > Key Pairs.
- A Unix-based environment (Linux/macOS, or WSL for Windows)
  This ensures compatibility with the shell script and Terraform commands.
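Before running anything, a quick preflight check can confirm that the tools from the list above are on your PATH and that the AWS CLI credentials work. This is a minimal sketch under my own assumptions: `require_tools`, `check_aws_identity`, and `create_key_pair` are hypothetical helpers (not part of the project repository), and the key pair name you pass in is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the original repository.

# Report each missing command and return non-zero if any are absent.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Confirm the configured AWS credentials resolve to an IAM identity.
check_aws_identity() {
  aws sts get-caller-identity --query Arn --output text
}

# Create an EC2 key pair and save the private key with owner-only permissions.
# $1: key pair name (placeholder, e.g. my-bastion-key)
create_key_pair() {
  local name="$1"
  aws ec2 create-key-pair --key-name "$name" \
    --query KeyMaterial --output text > "$name.pem"
  chmod 400 "$name.pem"
}
```

For example, `require_tools terraform aws ssh && check_aws_identity` prints the ARN of the IAM user if everything is in place; if you create the key pair this way, use the same name later for `TF_VAR_key_name`.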
Getting Started: What We’re Building
Let’s walk through how to build a secure, automated AWS database setup using Terraform.
Infrastructure Overview
This project provisions a complete, production-style AWS environment using Terraform. The following resources will be created:
Networking
- A custom VPC with a CIDR block (10.0.0.0/16)
- Two private subnets in different Availability Zones (for the RDS instance)
- One public subnet (for the bastion host)
- Internet Gateway and Route Tables for public subnet routing
- A DB Subnet Group for a multi-AZ-capable RDS deployment
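This layout only works if every subnet CIDR falls inside the VPC's 10.0.0.0/16 block. As a minimal sketch in plain Bash (no extra tools assumed), the hypothetical helper `cidr_contains` below checks that containment by comparing the network bits selected by the outer block's prefix length. The subnet ranges used in the example afterwards (10.0.1.0/24 and so on) are illustrative assumptions, not values taken from the project.

```bash
#!/usr/bin/env bash
# Convert a dotted IPv4 address (e.g. 10.0.1.0) to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if CIDR $2 lies entirely inside CIDR $1 (e.g. a subnet inside the VPC).
cidr_contains() {
  local outer_ip=${1%/*} outer_len=${1#*/}
  local inner_ip=${2%/*} inner_len=${2#*/}
  # A subnet cannot be larger (shorter prefix) than the block containing it.
  (( inner_len >= outer_len )) || return 1
  # Network mask for the outer prefix, kept within 32 bits.
  local mask=$(( outer_len == 0 ? 0 : (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$outer_ip") & mask) == ($(ip_to_int "$inner_ip") & mask) ))
}
```

For example, `for s in 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24; do cidr_contains 10.0.0.0/16 "$s" && echo "$s fits"; done` would confirm that the three hypothetical subnets sit inside the VPC range.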
Compute
- A bastion EC2 instance in the public subnet
  - Used to SSH into the private subnets and access the database securely
  - Provisioned with a custom security group allowing only port 22 (SSH) access
Database
- A MySQL RDS instance
  - Deployed in private subnets (not accessible from the public internet)
  - Configured with a dedicated security group that allows access only from the bastion host
Safety
- Security groups:
  - Bastion SG: allows inbound SSH (port 22) from your IP
  - RDS SG: allows inbound MySQL (port 3306) from the bastion’s SG
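"From your IP" in CIDR terms means your public address with a /32 suffix, a range that covers exactly one host. The following is a minimal sketch that assumes you look up your address through AWS's checkip.amazonaws.com endpoint. The helper names are hypothetical, and the snippet does not show how the original configuration receives this value.

```bash
#!/usr/bin/env bash
# Validate a dotted IPv4 address and print it as a single-host /32 CIDR.
to_host_cidr() {
  local ip="$1" octet
  local -a octets
  IFS=. read -r -a octets <<< "$ip"
  [[ ${#octets[@]} -eq 4 ]] || return 1
  for octet in "${octets[@]}"; do
    # Each octet must be 1-3 digits and no larger than 255.
    [[ $octet =~ ^[0-9]{1,3}$ ]] && (( 10#$octet <= 255 )) || return 1
  done
  echo "$ip/32"
}

# Look up this machine's public IPv4 address and print it as a /32 CIDR.
my_ip_cidr() {
  local ip
  ip=$(curl -fsS https://checkip.amazonaws.com) || return 1
  to_host_cidr "$(echo "$ip" | tr -d '[:space:]')"
}
```

Running `my_ip_cidr` prints your current public address followed by `/32`, which is the form a security group ingress rule expects for a single host.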
Automation
- A setup script (setup.sh) that exports Terraform variables (TF_VAR_*)
Modular Design With Terraform
I broke the infrastructure into modules such as network, bastion, and rds. This lets me reuse, scale, and test the components independently.
The following diagram illustrates how Terraform understands and structures the dependencies between the components of the infrastructure, where each node represents a resource or module.
This visualization helps verify that:
- Resources are properly connected (e.g., the RDS instance depends on the private subnets),
- Modules are isolated yet interoperable (e.g., network, bastion, and rds),
- There are no circular dependencies.
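You can generate this graph yourself with `terraform graph`, which prints the dependency graph in DOT format (render it with Graphviz, e.g. `terraform graph | dot -Tsvg > graph.svg`). To review the module-level wiring as text, a small filter can reduce the DOT output to module-to-module edges. This sketch assumes that edge lines reference resources as `module.<name>.<resource>` and that an edge points from a node to the node it depends on. Node labels vary between Terraform versions, and `module_edges` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Read DOT text on stdin and print each distinct "from -> to" pair of
# different modules, e.g. "rds -> network" (rds depends on network).
module_edges() {
  grep -- '->' \
    | sed -nE 's/.*module\.([A-Za-z0-9_-]+)\..*->.*module\.([A-Za-z0-9_-]+)\..*/\1 -> \2/p' \
    | awk '$1 != $3' \
    | sort -u
}
```

Inside terraform/rds-bastion, `terraform graph | module_edges` should list pairs such as `rds -> network` and `rds -> bastion` for the wiring in main.tf. An empty result for a pair you expected is a hint that an output is not being passed through.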
To maintain this modular configuration, I structured the project accordingly. The tree below explains the role of each component within the setup.
.
├── data
│   └── mysqlsampledatabase.sql   # Sample dataset to be imported into the RDS database
├── scripts
│   └── setup.sh                  # Bash script to export environment variables (TF_VAR_*), fetch dynamic values, and upload Glue scripts (optional)
└── terraform
    ├── modules                   # Reusable infrastructure modules
    │   ├── bastion
    │   │   ├── compute.tf        # Defines the EC2 instance configuration for the Bastion host
    │   │   ├── network.tf        # Uses data sources to reference the existing public subnet and VPC (does not create them)
    │   │   ├── outputs.tf        # Outputs the Bastion host public IP address
    │   │   └── variables.tf      # Input variables required by the Bastion module (AMI ID, key pair name, etc.)
    │   ├── network
    │   │   ├── network.tf        # Provisions the VPC, public/private subnets, Internet gateway, and route tables
    │   │   ├── outputs.tf        # Exposes VPC ID, subnet IDs, and route table IDs for downstream modules
    │   │   └── variables.tf      # Input variables such as CIDR blocks and availability zones
    │   └── rds
    │       ├── network.tf        # Defines the DB subnet group using the private subnet IDs
    │       ├── outputs.tf        # Outputs the RDS endpoint and security group for other modules to consume
    │       ├── rds.tf            # Provisions a MySQL RDS instance inside the private subnets
    │       └── variables.tf      # Input variables such as DB name, username, password, and instance size
    └── rds-bastion               # Root Terraform configuration
        ├── backend.tf            # Configures the Terraform backend (e.g., local or remote state file location)
        ├── main.tf               # Top-level orchestrator file that connects and wires up all modules
        ├── outputs.tf            # Consolidates and re-exports outputs from the modules (e.g., Bastion IP, DB endpoint)
        ├── provider.tf           # Defines the AWS provider and required version
        └── variables.tf          # Project-wide variables passed to modules and referenced across files
With the modular structure in place, the main.tf file in the rds-bastion directory acts as the orchestrator. It ties together the core components: the network, the RDS database, and the bastion host. Each module is invoked with its required inputs, most of which are defined in variables.tf or passed via environment variables (TF_VAR_*).
module "network" {
  source                = "../modules/network"
  region                = var.region
  project_name          = var.project_name
  availability_zone_1   = var.availability_zone_1
  availability_zone_2   = var.availability_zone_2
  vpc_cidr              = var.vpc_cidr
  public_subnet_cidr    = var.public_subnet_cidr
  private_subnet_cidr_1 = var.private_subnet_cidr_1
  private_subnet_cidr_2 = var.private_subnet_cidr_2
}
module "bastion" {
  source              = "../modules/bastion"
  region              = var.region
  vpc_id              = module.network.vpc_id
  public_subnet_1     = module.network.public_subnet_id
  availability_zone_1 = var.availability_zone_1
  project_name        = var.project_name
  instance_type       = var.instance_type
  key_name            = var.key_name
  ami_id              = var.ami_id
}
module "rds" {
  source              = "../modules/rds"
  region              = var.region
  project_name        = var.project_name
  vpc_id              = module.network.vpc_id
  private_subnet_1    = module.network.private_subnet_id_1
  private_subnet_2    = module.network.private_subnet_id_2
  availability_zone_1 = var.availability_zone_1
  availability_zone_2 = var.availability_zone_2
  db_name             = var.db_name
  db_username         = var.db_username
  db_password         = var.db_password
  bastion_sg_id       = module.bastion.bastion_sg_id
}
In this modular setup, each infrastructure component is loosely coupled but connected through well-defined inputs and outputs.
For example, after provisioning the VPC and subnets in the network module, I retrieve their IDs from its outputs and pass them as input variables to other modules such as rds and bastion. This avoids hardcoding and lets Terraform resolve dependencies dynamically and build its dependency graph internally.
In some cases (such as within the bastion module), I also use data sources to reference existing resources created by earlier modules instead of recreating or duplicating them.
Dependencies between modules rely on correctly defining and exposing outputs from previously created modules. These outputs are then passed as input variables to dependent modules, which lets Terraform build an internal dependency graph and orchestrate the correct creation order.
For example, the network module exposes the VPC ID and subnet IDs in its outputs.tf. These values are then consumed by downstream modules such as rds and bastion through the main.tf file of the root configuration.
Here is how this works in practice:
Inside modules/network/outputs.tf:
output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}
Inside modules/bastion/variables.tf:
variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}
Inside modules/bastion/network.tf:
data "aws_vpc" "main" {
  id = var.vpc_id
}
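What flows through this chain is just a string, the VPC ID. Once the stack is applied, you can read it from the root module with `terraform output -raw vpc_id` and sanity-check its shape before relying on it elsewhere. This minimal sketch assumes the AWS ID format of a `vpc-` prefix followed by 8 or 17 lowercase hexadecimal characters. Both helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Succeed if $1 looks like an AWS VPC ID: "vpc-" plus 8 or 17 lowercase hex characters.
is_vpc_id() {
  local re='^vpc-([0-9a-f]{8}|[0-9a-f]{17})$'
  [[ $1 =~ $re ]]
}

# Read the vpc_id output from the root module (run inside terraform/rds-bastion).
current_vpc_id() {
  terraform output -raw vpc_id
}
```

For example, `vpc=$(current_vpc_id) && is_vpc_id "$vpc" && aws ec2 describe-vpcs --vpc-ids "$vpc"` confirms that the ID exported by the network module refers to a real VPC.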
To provision the RDS instance, I created two private subnets in different Availability Zones, because AWS requires a DB subnet group to include subnets in at least two separate AZs.
Although the configuration meets this requirement, I disabled Multi-AZ deployment when creating the RDS instance to stay within the AWS Free Tier limits and avoid additional costs. The setup still simulates a production-grade network layout while remaining cost-effective for development and testing.
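The two-AZ requirement is easy to check before `terraform plan` runs, because the zones come from `TF_VAR_availability_zone_1` and `TF_VAR_availability_zone_2`. This is a minimal sketch with hypothetical helpers. The commented `aws rds describe-db-instances` call shows one way to confirm after the apply that Multi-AZ is off.

```bash
#!/usr/bin/env bash
# Print how many distinct, non-empty Availability Zones were passed as arguments.
distinct_az_count() {
  printf '%s\n' "$@" | grep -v '^$' | sort -u | wc -l | tr -d ' '
}

# Fail unless the DB subnet group would span at least two different AZs.
check_db_subnet_azs() {
  local count
  count=$(distinct_az_count "$@")
  if (( count < 2 )); then
    echo "DB subnet group needs subnets in at least 2 AZs, got $count" >&2
    return 1
  fi
}

# After apply, print the MultiAZ flag of each RDS instance in the region:
#   aws rds describe-db-instances --query 'DBInstances[].MultiAZ' --output text
```

For example: `check_db_subnet_azs "$TF_VAR_availability_zone_1" "$TF_VAR_availability_zone_2"`.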
Deployment Workflow
With all the modules wired together through inputs and outputs, and the infrastructure logic encapsulated in reusable blocks, the next step is to automate the provisioning process. Instead of passing variables manually each time, a helper script, setup.sh, exports the necessary environment variables (TF_VAR_*).
Once the setup script is sourced, deploying the infrastructure is as simple as running a few Terraform commands.
source scripts/setup.sh
cd terraform/rds-bastion
terraform init
terraform plan
terraform apply
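Because setup.sh starts with empty placeholders, it is worth confirming that every TF_VAR_* value is set before running `terraform plan`. Otherwise Terraform either prompts for the missing values or receives empty strings. This is a minimal sketch that assumes the variable list from setup.sh. `check_tf_vars` and `deploy` are hypothetical helpers, and saving the plan with `-out` so that `apply` runs exactly the reviewed plan is an optional variation on the commands above.

```bash
#!/usr/bin/env bash
# Report every named variable that is unset or empty; fail if any are.
check_tf_vars() {
  local name missing=0
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Variable names taken from setup.sh.
REQUIRED_TF_VARS=(
  TF_VAR_project_name TF_VAR_region
  TF_VAR_availability_zone_1 TF_VAR_availability_zone_2
  TF_VAR_ami_id TF_VAR_key_name
  TF_VAR_db_username TF_VAR_db_password TF_VAR_db_name
)

# Check inputs, save the plan to a file, then apply exactly that plan.
deploy() {
  check_tf_vars "${REQUIRED_TF_VARS[@]}" || return 1
  terraform init &&
    terraform plan -out=tfplan &&
    terraform apply tfplan
}
```

Usage: `cd terraform/rds-bastion && deploy`.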
To streamline the Terraform deployment process, I wrote this helper script (setup.sh) so that it exports the required environment variables using the TF_VAR_ naming convention. Terraform automatically picks up environment variables prefixed with TF_VAR_, so this approach avoids hardcoding values in .tf files or typing them in manually each time.
#!/bin/bash
set -e
export de_project=""
export AWS_DEFAULT_REGION=""
# Define the variables to manage (associative arrays require Bash 4+)
declare -A TF_VARS=(
  ["TF_VAR_project_name"]="$de_project"
  ["TF_VAR_region"]="$AWS_DEFAULT_REGION"
  ["TF_VAR_availability_zone_1"]="us-east-1a"
  ["TF_VAR_availability_zone_2"]="us-east-1b"
  ["TF_VAR_ami_id"]=""
  ["TF_VAR_key_name"]=""
  ["TF_VAR_db_username"]=""
  ["TF_VAR_db_password"]=""
  ["TF_VAR_db_name"]=""
)
# Append each variable to ~/.bashrc, or update it in place if it already exists
# (sed -i as written is GNU sed; on macOS use sed -i '')
for var in "${!TF_VARS[@]}"; do
  value="${TF_VARS[$var]}"
  if grep -q "^export $var=" "$HOME/.bashrc"; then
    sed -i "s|^export $var=.*|export $var=\"$value\"|" "$HOME/.bashrc"
  else
    echo "export $var=\"$value\"" >> "$HOME/.bashrc"
  fi
done
# Source the updated .bashrc so the changes are available immediately in this shell
source "$HOME/.bashrc"
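One trade-off of this approach is that `TF_VAR_db_password` is stored in plain text in ~/.bashrc. If you would rather not persist the password, a minimal alternative is to remove that entry from the script and prompt for it in the current shell only. This sketch relies on Bash's `read -s` (silent input). `prompt_db_password` is a hypothetical helper, and the exported variable lasts only for the current shell session.

```bash
#!/usr/bin/env bash
# Read the DB password without echoing it and export it for Terraform,
# for this shell session only (nothing is written to ~/.bashrc).
prompt_db_password() {
  local pw
  read -r -s -p "RDS master password: " pw
  echo >&2  # move to a new line after the hidden input
  if [[ -z "$pw" ]]; then
    echo "password must not be empty" >&2
    return 1
  fi
  export TF_VAR_db_password="$pw"
}
```

Usage: `source scripts/setup.sh && prompt_db_password`, then run the Terraform commands in the same terminal.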
After you run terraform apply, Terraform provisions all the defined resources: the VPC, subnets, route tables, RDS instance, and Bastion host. Once the process completes successfully, you’ll see output values similar to the following:
Apply complete! Resources: 12 added, 0 changed, 0 destroyed.
Outputs:
bastion_public_ip = ""
bastion_sg_id = ""
db_endpoint = ":3306"
instance_public_dns = ""
rds_db_name = ""
vpc_id = ""
vpc_name = ""
These outputs are defined in the outputs.tf files of your modules and re-exported in the root module (rds-bastion/outputs.tf). They’re essential for:
- SSH-ing into the Bastion Host
- Connecting securely to the private RDS instance
- Validating resource creation
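Rather than copying these values by hand, you can read them with `terraform output -raw <name>`. Note that db_endpoint includes the port, as the `:3306` suffix above shows, while `mysql -h` expects only the hostname. This is a minimal sketch with hypothetical helper names, assuming the output names shown above.

```bash
#!/usr/bin/env bash
# Strip a trailing ":port" from an RDS endpoint, leaving only the hostname.
db_host_from_endpoint() {
  echo "${1%:*}"
}

# Collect connection details from the root module outputs
# (run inside terraform/rds-bastion after terraform apply).
load_connection_info() {
  local endpoint
  BASTION_IP=$(terraform output -raw bastion_public_ip) || return 1
  endpoint=$(terraform output -raw db_endpoint) || return 1
  DB_HOST=$(db_host_from_endpoint "$endpoint")
  export BASTION_IP DB_HOST
}
```

For example, `load_connection_info && echo "$DB_HOST"` prints the RDS hostname to pass to `mysql -h` on the bastion.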
Connecting to RDS via the Bastion Host and Seeding the Database
Now that the infrastructure is provisioned, the next step is to seed the MySQL database hosted on the RDS instance. Since the database sits in a private subnet, we cannot reach it directly from our local machine. Instead, we’ll use the Bastion EC2 instance as a jump host to:
- Transfer the sample dataset (mysqlsampledatabase.sql) to the Bastion.
- Connect from the Bastion to the RDS instance.
- Import the SQL data to initialize the database.
From the Terraform root configuration directory (terraform/rds-bastion), move two directories up to the project root, then stream the local SQL file from the data directory to the remote EC2 (Bastion) instance over SSH:
cd ../..
cat data/mysqlsampledatabase.sql | ssh -i your-key.pem ec2-user@ 'cat > ~/mysqlsampledatabase.sql'
Once the dataset is copied to the Bastion EC2 instance, the next step is to SSH into the remote machine:
ssh -i ~/.ssh/new-key.pem ec2-user@
After connecting, you can use the MySQL client (already installed if you used mariadb105 in your EC2 setup) to import the SQL file into your RDS database:
mysql -h -P 3306 -u -p < mysqlsampledatabase.sql
Enter the password when prompted.
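Instead of copying the SQL file onto the bastion, you can forward a local port through the bastion with SSH local port forwarding (`ssh -L`) and run the MySQL client on your own machine. This is a swapped-in technique, not the workflow used above. It assumes a MySQL client installed locally, and the helper names and the default local port 3307 are hypothetical choices.

```bash
#!/usr/bin/env bash
# Build an ssh -L forwarding spec: local_port:db_host:3306 (local port defaults to 3307).
forward_spec() {
  local db_host="$1" local_port="${2:-3307}"
  echo "${local_port}:${db_host}:3306"
}

# Open a background tunnel through the bastion; -N runs no remote command.
# $1: private key file  $2: bastion public IP  $3: RDS hostname  $4: optional local port
open_tunnel() {
  ssh -f -N -i "$1" -L "$(forward_spec "$3" "${4:-}")" "ec2-user@$2"
}
```

With the tunnel open, `mysql -h 127.0.0.1 -P 3307 -u "$TF_VAR_db_username" -p < data/mysqlsampledatabase.sql` imports the file from your laptop. Use `127.0.0.1` rather than `localhost`, because the MySQL client treats `localhost` as a local socket connection.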
Once the import is complete, connect to the RDS MySQL database again to verify that the database and its tables were created successfully.
Run the following command from within the Bastion host:
mysql -h -P 3306 -u -p
After entering your password, you can list the available databases and tables:


To make sure the dataset was properly imported into the RDS instance, I ran a simple query:

This returned a row from the customers table, confirming that:
- The database and tables were created successfully
- The sample dataset was seeded into the RDS instance
- The Bastion host and private RDS setup are working as intended
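The same checks can be run non-interactively with `mysql -e`. This is a minimal sketch that assumes the dump is the MySQL Tutorial sample, which creates a `classicmodels` database with a `customers` table. `verification_sql` is a hypothetical helper, so adjust the names if your dump differs.

```bash
#!/usr/bin/env bash
# Print SQL that lists databases, lists the tables in $1, and fetches one row from table $2.
verification_sql() {
  local db="$1" table="$2"
  printf 'SHOW DATABASES;\nSHOW TABLES FROM `%s`;\nSELECT * FROM `%s`.`%s` LIMIT 1;\n' \
    "$db" "$db" "$table"
}
```

On the bastion, with RDS_HOST set to the endpoint hostname: `mysql -h "$RDS_HOST" -P 3306 -u "$TF_VAR_db_username" -p -e "$(verification_sql classicmodels customers)"`.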
This completes the infrastructure setup and data import process.
Destroying the Infrastructure
Once you’re done testing or demonstrating your setup, destroy the AWS resources to avoid unnecessary costs.
Since everything was provisioned with Terraform, tearing down the entire infrastructure takes a single command, run from your root configuration directory:
cd terraform/rds-bastion
terraform destroy
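To confirm the teardown removed everything Terraform tracked, you can check that `terraform state list` prints nothing afterwards. This is a minimal sketch with a hypothetical `state_is_empty` filter. It only covers resources recorded in Terraform's state, not anything created by hand in the console.

```bash
#!/usr/bin/env bash
# Succeed only if stdin has no non-blank lines, i.e. the state listing is empty.
state_is_empty() {
  local remaining
  remaining=$(grep -c '[^[:space:]]')
  if (( remaining > 0 )); then
    echo "$remaining resource(s) still in state" >&2
    return 1
  fi
}
```

Usage: `terraform destroy && terraform state list | state_is_empty && echo "state is empty"`.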
Conclusion
In this project, I demonstrated how to provision a secure, production-like database infrastructure on AWS using Terraform. Rather than exposing the database to the public internet, I followed best practices by placing the RDS instance in private subnets, accessible only via a bastion host in a public subnet.
By structuring the project as modular Terraform configurations, I kept each component (network, database, and bastion host) loosely coupled, reusable, and easy to manage. I also showed how Terraform’s internal dependency graph handles the orchestration and ordering of resource creation.
Thanks to infrastructure as code (IaC), the entire environment can be brought up or torn down with a single command, which makes it well suited to ETL prototyping, data engineering practice, or proof-of-concept pipelines. Most importantly, this automation helps avoid unexpected costs by letting you destroy all resources cleanly once you’re done.
You can find the complete source code, Terraform configuration, and setup scripts in my GitHub repository:
https://github.com/YagmurGULEC/rds-ec2-terraform.git
Feel free to explore the code, clone the repo, and adapt it to your own AWS projects. Contributions, feedback, and stars are always welcome!
What’s Next?
You can extend this setup by:
- Connecting an AWS Glue job to the RDS instance for ETL processing
- Adding monitoring for your RDS database and EC2 instance