For most organizations using a multi-account strategy, AWS Transit Gateway is an integral component of the network routing infrastructure. AWS Transit Gateway connects Amazon Virtual Private Clouds (VPCs), Virtual Private Networks (VPNs) and on-premises network connections in the same region to a central hub. The central hub provides simplified and centralised network management, increased scalability and improved security.
I recently worked for an organization with several vendors providing AWS services. My organization was made responsible for managing ‘the network’. However, so many vendors with administration rights were making Transit Gateway changes that it was causing network downtime.
My organization implemented a pipeline to manage the customer’s Transit Gateway changes. Unfortunately, the other vendors continued to make Transit Gateway changes manually via the console. We would only notice these manual changes when our pipeline compared the current state with the desired future state, or when we pulled changes from a feature branch into the main branch. As manual changes are not reflected in the Terraform state, a Terraform import operation is required to re-align the state with the current infrastructure. Catching up with the other vendors’ Transit Gateway changes in this way was delaying our delivery.
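As a rough illustration of that re-alignment step, Terraform 1.5 and later accept an `import` block alongside the matching resource definition. The sketch below assumes a manually created blackhole route; the resource name `manual_blackhole` and the route table ID are hypothetical placeholders, not values from our environment.

```hcl
# Hypothetical example: adopt a manually created TGW route into Terraform state.
# The import block requires Terraform 1.5 or later.
import {
  to = aws_ec2_transit_gateway_route.manual_blackhole
  # Import ID format: <route-table-id>_<destination-cidr>
  id = "tgw-rtb-0123456789abcdef0_192.168.0.0/16"
}

# The resource definition must describe the route as it exists in AWS.
resource "aws_ec2_transit_gateway_route" "manual_blackhole" {
  transit_gateway_route_table_id = "tgw-rtb-0123456789abcdef0" # placeholder ID
  destination_cidr_block         = "192.168.0.0/16"
  blackhole                      = true
}
```

With a block like this in place, `terraform plan` reports the pending import and `terraform apply` records the route in state, after which the `import` block can be removed.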
Our team had neither a mandate to be ‘network routing policemen’ nor the bandwidth to monitor CloudTrail for Transit Gateway changes. My company’s motto is ‘to leave things in a better state’. We could have asked the customer to update everybody else’s permissions so that others could not make Transit Gateway changes. However, we preferred a system where all vendors could submit changes to a pipeline, to be peer reviewed and approved by my team before implementation.
How did we get the other vendors (and the customer) to change their behavior? We put in place a Transit Gateway change notification system that everybody could subscribe to. We used EventBridge rules to match specific patterns in EventBridge events and forward matching events to an SNS topic. Once the notification system was implemented, it only took a couple of days for the first SNS email to be sent to subscribers, indicating that a manual change had been made. Below is an example of the alert, which is in JSON format (with data obscured/changed for data protection reasons).
{
  "version": "0",
  "id": "14f77b8e-f2ad-5636-707b-ad9c80a3ab75",
  "detail-type": "AWS API Call via CloudTrail",
  "source": "aws.ec2",
  "account": "##removed##",
  "time": "2023-03-19T07:32:05Z",
  "region": "ap-southeast-2",
  "resources": [],
  "detail": {
    "eventVersion": "1.08",
    "userIdentity": {
      "type": "AssumedRole",
      "principalId": "##removed##:bad.actor@othervendor.com",
      "arn": "arn:aws:sts::##removed##:assumed-role/AWSReservedSSO_AdministratorAccess_##removed##/bad.actor@othervendor.com",
      "accountId": "##removed##",
      "accessKeyId": "##removed##",
      "sessionContext": {
        "sessionIssuer": {
          "type": "Role",
          "principalId": "##removed##",
          "arn": "arn:aws:iam::##removed##:role/aws-reserved/sso.amazonaws.com/ap-southeast-2/AWSReservedSSO_AdministratorAccess_##removed##",
          "accountId": "##removed##",
          "userName": "AWSReservedSSO_AdministratorAccess_##removed##"
        },
        "webIdFederationData": {},
        "attributes": {
          "creationDate": "2023-03-19T07:31:04Z",
          "mfaAuthenticated": "false"
        }
      }
    },
    "eventTime": "2023-03-19T07:32:05Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "CreateTransitGatewayRoute",
    "awsRegion": "ap-southeast-2",
    "sourceIPAddress": "##removed##",
    "userAgent": "AWS Internal",
    "requestParameters": {
      "CreateTransitGatewayRouteRequest": {
        "TransitGatewayRouteTableId": "tgw-rtb-03e03888c28034a2c",
        "Blackhole": true,
        "DestinationCidrBlock": "192.168.0.0/16"
      }
    },
    "responseElements": {
      "CreateTransitGatewayRouteResponse": {
        "xmlns": "http://ec2.amazonaws.com/doc/2016-11-15/",
        "route": {
          "destinationCidrBlock": "192.168.0.0/16",
          "state": "blackhole",
          "type": "static"
        },
        "requestId": "3ece7287-53a3-4fcc-82b8-600f40b28600"
      }
    },
    "requestID": "3ece7287-53a3-4fcc-82b8-600f40b28600",
    "eventID": "7e63ebcd-a506-40b8-a70d-92ca761a4c76",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "##removed##",
    "eventCategory": "Management",
    "sessionCredentialFromConsole": "true"
  }
}
We can see the following in the example above:
- Principal ID: “##removed##:bad.actor@othervendor.com”, i.e. not our OIDC pipeline role
- New route created: destination 192.168.0.0/16 as a blackhole
- At: “2023-03-19T07:32:05Z”
We expected Transit Gateway changes to be implemented by our pipeline role and if they were not, it was a clear indication that the expected process was not being followed.
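The three fields picked out of the alert could also be surfaced directly in the email rather than read from the raw JSON. The following is a minimal sketch of one way to do that with an EventBridge input transformer; it is not part of the repository code, and the message wording and the path key names (`principal`, `event`, `time`) are illustrative assumptions.

```hcl
# Sketch only: this would replace the plain target for the create-route rule,
# not sit alongside it (two targets on one rule would send two notifications).
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_createroute" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_createroute.name
  arn  = aws_sns_topic.tgw_sns_topic.arn

  input_transformer {
    # JSON paths into the CloudTrail event delivered by EventBridge
    input_paths = {
      principal = "$.detail.userIdentity.principalId"
      event     = "$.detail.eventName"
      time      = "$.detail.eventTime"
    }
    # A string template must itself be wrapped in escaped double quotes
    input_template = "\"<event> was called by <principal> at <time>\""
  }
}
```

The trade-off is that subscribers then see only the summary line, so the full event would have to be looked up in CloudTrail if more detail is needed.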
The Transit Gateway change notification system immediately changed the behaviour of the other vendors with regard to making networking changes on the Transit Gateway. The other vendors got out of the habit of making manual changes and into the habit of submitting pipeline pull requests that required review and approval before proceeding to production.
The overall infrastructure and workflow is shown below.
Note that a pipeline pull request would also trigger the notification; however, the principal ID would match the OIDC pipeline role. Later we ran a Transit Gateway Immersion Day so that the customer (and other vendors) could get a better understanding of how the Transit Gateway operated.
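If notifications for the pipeline's own changes are not wanted, one possible variation is to exclude the pipeline role in the event pattern using EventBridge's `anything-but` matching. This is a sketch under assumptions rather than what we deployed: the role name `tgw-pipeline-oidc-role` and the resource names are hypothetical.

```hcl
# Sketch only: notify on CreateTransitGatewayRoute unless the call was made
# through the (hypothetically named) pipeline role.
resource "aws_cloudwatch_event_rule" "tgw_manual_createroute" {
  name        = "tgw-change-detection-routecreate-manual"
  description = "Detect CreateTransitGatewayRoute calls not made by the pipeline role"

  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventName = ["CreateTransitGatewayRoute"]
      userIdentity = {
        sessionContext = {
          sessionIssuer = {
            # For assumed roles, userName holds the role name
            userName = [{ "anything-but" = ["tgw-pipeline-oidc-role"] }]
          }
        }
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "sns_tgw_manual_createroute" {
  rule = aws_cloudwatch_event_rule.tgw_manual_createroute.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
```

One caveat to verify before relying on this: a call made by an identity with no session issuer (an IAM user, for example) may not match the pattern at all, so the unfiltered rules are the safer default.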
Our notification system looks for the following changes on the Transit Gateway:
- Creating a Transit Gateway route
- Deleting a Transit Gateway route
- Enabling Route Table propagation
- Disabling Route Table propagation
- Associating an attachment with a route table
- Disassociating an attachment from a route table
- Creating a Transit Gateway route table
- Deleting a Transit Gateway route table
- Creating a Transit Gateway VPC attachment
- Deleting a Transit Gateway VPC attachment
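The ten changes listed above map onto ten CloudTrail event names, and the repository defines one rule per event. Because an EventBridge pattern array matches any of its values, a more compact alternative would be a single rule, sketched below. The local `tgw_event_names` and the resource names are illustrative and do not appear in the repository.

```hcl
locals {
  # CloudTrail event names for the Transit Gateway changes being watched
  tgw_event_names = [
    "CreateTransitGatewayRoute",
    "DeleteTransitGatewayRoute",
    "EnableTransitGatewayRouteTablePropagation",
    "DisableTransitGatewayRouteTablePropagation",
    "AssociateTransitGatewayRouteTable",
    "DisassociateTransitGatewayRouteTable",
    "CreateTransitGatewayRouteTable",
    "DeleteTransitGatewayRouteTable",
    "CreateTransitGatewayVpcAttachment",
    "DeleteTransitGatewayVpcAttachment",
  ]
}

# One rule that matches any of the event names above
resource "aws_cloudwatch_event_rule" "tgw_change_detection_all" {
  name        = "tgw-change-detection-all"
  description = "Event rule to detect changes to the transit gateway"

  event_pattern = jsonencode({
    source        = ["aws.ec2"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventName = local.tgw_event_names
    }
  })
}

resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_all" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_all.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
```

Separate rules remain easier to enable, disable or monitor individually, which is a reasonable argument for the per-event layout.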
The GitHub repository has the code for the Transit Gateway change notifications. A summary of the files follows:
cloudwatch.tf – EventBridge rules
#--------------------------------------------------------------------------
# Create/delete TGW routes
#--------------------------------------------------------------------------
resource "aws_cloudwatch_event_rule" "tgw_change_detection_createroute" {
  name        = "tgw-change-detection-routecreate"
  description = "Event rule to detect changes in transit gateway route tables - createroute"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["CreateTransitGatewayRoute"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_createroute" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_createroute.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_deleteroute" {
  name        = "tgw-change-detection-routedelete"
  description = "Event rule to detect changes in transit gateway route tables - deleteroute"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["DeleteTransitGatewayRoute"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_deleteroute" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_deleteroute.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_enablepropagation" {
  name        = "tgw-change-detection-routepropagation-enable"
  description = "Event rule to detect changes in transit gateway route tables - enablepropagation"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["EnableTransitGatewayRouteTablePropagation"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_enablepropagation" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_enablepropagation.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_disablepropagation" {
  name        = "tgw-change-detection-routepropagation-disable"
  description = "Event rule to detect changes in transit gateway route tables - disablepropagation"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["DisableTransitGatewayRouteTablePropagation"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_disablepropagation" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_disablepropagation.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_disassociatetable" {
  name        = "tgw-change-detection-routetabledisassociate"
  description = "Event rule to detect changes in transit gateway route tables - disassociatetable"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["DisassociateTransitGatewayRouteTable"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_disassociatetable" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_disassociatetable.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_associatetable" {
  name        = "tgw-change-detection-routetableassociate"
  description = "Event rule to detect changes in transit gateway route tables - associatetable"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["AssociateTransitGatewayRouteTable"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_associatetable" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_associatetable.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_createtable" {
  name        = "tgw-change-detection-routetablecreate"
  description = "Event rule to detect changes in transit gateway route tables - createtable"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["CreateTransitGatewayRouteTable"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_createtable" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_createtable.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_deletetable" {
  name        = "tgw-change-detection-routetabledelete"
  description = "Event rule to detect changes in transit gateway route tables - deletetable"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["DeleteTransitGatewayRouteTable"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_deletetable" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_deletetable.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
#--------------------------------------------------------------------------
# Create/delete VPC attachments
#--------------------------------------------------------------------------
resource "aws_cloudwatch_event_rule" "tgw_change_detection_createvpcattachment" {
  name        = "tgw-change-detection-vpcattachmentcreate"
  description = "Event rule to detect changes in transit gateway VPC attachments - createvpcattachment"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["CreateTransitGatewayVpcAttachment"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_createvpcattachment" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_createvpcattachment.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
resource "aws_cloudwatch_event_rule" "tgw_change_detection_deletevpcattachment" {
  name        = "tgw-change-detection-vpcattachmentdelete"
  description = "Event rule to detect changes in transit gateway VPC attachments - deletevpcattachment"
  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": ["DeleteTransitGatewayVpcAttachment"]
  }
}
PATTERN
}
resource "aws_cloudwatch_event_target" "sns_tgw_change_detection_deletevpcattachment" {
  rule = aws_cloudwatch_event_rule.tgw_change_detection_deletevpcattachment.name
  arn  = aws_sns_topic.tgw_sns_topic.arn
}
data.tf – retrieves the account ID for automation and for building IAM policy documents using variables
data "aws_caller_identity" "current" {}
locals.tf – local variable definitions
locals {
  region = "ap-southeast-2"
  tags_generic = {
    appname     = var.app_name
    environment = var.environment
    costcentre  = "TBC"
    ManagedBy   = var.ManagedByLocation
  }
}
provider.tf – default Terraform file
provider "aws" {
  region = var.region
}
sns.tf – SNS topic for EventBridge rule targets
resource "aws_sns_topic" "tgw_sns_topic" {
  #checkov:skip=CKV_AWS_26: "Ensure all data stored in the SNS topic is encrypted"
  name = "tgw-alerts"
}
resource "aws_sns_topic_policy" "sns_tgw_policy" {
  arn    = aws_sns_topic.tgw_sns_topic.arn
  policy = data.aws_iam_policy_document.tgw_sns_topic_policy.json
}
data "aws_iam_policy_document" "tgw_sns_topic_policy" {
  statement {
    effect  = "Allow"
    actions = ["SNS:Publish"]
    principals {
      type = "Service"
      identifiers = [
        "events.amazonaws.com"
      ]
    }
    resources = [aws_sns_topic.tgw_sns_topic.arn]
  }
}
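The topic above has no subscribers defined in code. As a hedged sketch of how email subscribers might be added in Terraform, the fragment below introduces a hypothetical `alert_emails` variable and subscription resource, neither of which is in the repository.

```hcl
# Hypothetical variable: addresses that should receive TGW change alerts
variable "alert_emails" {
  description = "Email addresses subscribed to the TGW alerts topic"
  type        = list(string)
  default     = []
}

# One email subscription per address
resource "aws_sns_topic_subscription" "tgw_alert_email" {
  for_each  = toset(var.alert_emails)
  topic_arn = aws_sns_topic.tgw_sns_topic.arn
  protocol  = "email"
  endpoint  = each.value
}
```

Email subscriptions stay pending until each recipient clicks the confirmation link that SNS sends, so no alerts are delivered to an address before that happens.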
terraform.tfvars – sets variable values
environment = "prod"
app_name = "TGW-monitoring-demo"
tgw.tf – Transit Gateway
module "tgw" {
  source                                = "terraform-aws-modules/transit-gateway/aws"
  version                               = "2.9.0"
  name                                  = "demo-tgw"
  description                           = "TGW shared with other AWS accounts"
  amazon_side_asn                       = 64532
  enable_auto_accept_shared_attachments = true
  ram_allow_external_principals         = true
  tags                                  = local.tags_generic
}
variables.tf – variable definitions
variable "region" {
  description = "AWS Region"
  default     = "ap-southeast-2"
  type        = string
}
variable "environment" {
  description = "AWS environment name"
  type        = string
}
variable "app_name" {
  description = "Application Name"
  type        = string
}
variable "ManagedByLocation" {
  description = "IaC location"
  default     = "https://github.com/arinzl"
  type        = string
}
