Monitoring AWS Windows EC2 instances with CloudWatch – part 1

This multi-part blog post focuses on monitoring Microsoft Windows EC2 instances running on AWS. AWS's native tool for monitoring and observability is CloudWatch, so we will be using it in this blog series.

AWS CloudWatch allows you to monitor and alert on the following elements:

  • Server performance metrics (e.g. CPU, memory and disk statistics)
  • Windows service status (e.g. the Windows Print Spooler)
  • Windows event logs (e.g. Application log entries)
  • Windows file logs (e.g. IIS logs)

CloudWatch includes two areas of data collection:

  • Metrics
  • Logs

Under metrics, we gather server performance data and some Windows process data (or service data, in Windows terminology). The CloudWatch Logs capability can sync Windows event log entries (e.g. the System, Security and Application event logs) and generic log files such as IIS logs into CloudWatch log groups. The first part of this blog series provides an overview of CloudWatch Windows data collection. Part 2 will focus on metric and log collection in more detail.

The same infrastructure will be used for both parts of the blog series and can be found in the GitHub repository here.

You may have noticed that the code used in the ‘Checkov Security Linting’ blog has been extended and reused in this blog series, so hopefully some aspects of the code are familiar.

Note: it is possible to deploy the VPC infrastructure to a single Availability Zone (AZ) only, to save costs.

Why do we need so many components for monitoring and alerting? Some of the components, such as the KMS key, are required to meet best practice principles, i.e. encrypting data at rest (on the EC2 instance volumes and in the log groups). The SNS topic allows users to subscribe to CloudWatch alarm notifications. Parameter Store is used to store the CloudWatch agent configuration centrally so that it can be reused on other application servers in the same solution.
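As a minimal sketch of how a user might subscribe to the alarm notifications, an email subscription could be added next to the SNS topic. The resource name demo_monitoring_email and the address ops@example.com are placeholders that are not part of the repository; the topic ARN output is the one the alarms in cloudwatch.tf already reference.

```hcl
# Hypothetical addition to sns.tf (not in the repository):
# subscribe an email address to the alarm topic.
resource "aws_sns_topic_subscription" "demo_monitoring_email" {
  topic_arn = module.demo_monitoring_sns_topic.sns_topic_arn
  protocol  = "email"
  endpoint  = "ops@example.com" # placeholder address, replace with a real mailbox
}
```

Email subscriptions stay pending until the recipient confirms them from the message SNS sends, so alarm notifications only start arriving after that confirmation.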

Monitoring is set up using the CloudWatch agent configuration file in the GitHub repository. Most of the configuration is done in EC2 user data, where the CloudWatch agent is downloaded from the internet and configured using the settings stored in SSM Parameter Store. This all happens during the initial boot of the EC2 instance. The overall agent configuration, and how alerts are sent to an SNS topic, is shown below.

It is possible to see some metrics from a hypervisor perspective without using the CloudWatch agent. Windows server admins typically start with Resource Monitor when users complain about poor application performance. From Resource Monitor, server admins can inspect CPU utilisation, memory utilisation and free disk space. Some of these metrics are not available via the hypervisor, so we need the operating system to supply this data via the CloudWatch agent. For Windows Server, these metrics are retrievable using Performance Monitor (aka Perfmon), which is the purple dotted box in the diagram above. To configure the CloudWatch agent to gather OS data, we tell it which Perfmon counters to monitor. The extract from the CloudWatch config file shows the Perfmon counter names in red boxes below.

Note: there is no Perfmon counter for memory utilisation, so we need to do a small calculation using the “Available Mbytes” counter. For example, a Windows server admin may want to generate an alert if memory utilisation is greater than 80%. This is the same as generating an alert when the available memory is less than 20% of the total memory.
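The calculation above can be sketched in Terraform as follows. This is only an illustration: the local names are hypothetical and do not appear in the repository, and the total memory figure assumes a t3.medium instance with 4 GiB of RAM.

```hcl
locals {
  # Assumption: total RAM of the instance in MB (a t3.medium has 4 GiB).
  total_memory_mb = 4096

  # Alert when memory utilisation goes above this percentage.
  max_memory_utilisation_percent = 80

  # Equivalent threshold for the "Available Mbytes" counter:
  # 4096 * (100 - 80) / 100 = 819.2, rounded down to 819.
  memory_available_threshold_mb = floor(
    local.total_memory_mb * (100 - local.max_memory_utilisation_percent) / 100
  )
}

output "memory_available_threshold_mb" {
  value = local.memory_available_threshold_mb
}
```

The resulting value could then be supplied as demo_monitoring_memory_available_Mbytes_threshold, so that the "less than threshold" memory alarm fires at roughly 80% utilisation.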

A summary of the files in the GitHub repository:

app_cw_agent_config.json – CloudWatch agent config file defining the metrics and logs to send to CloudWatch

{
        "logs": {
                "logs_collected": {
                        "files": {
                                "collect_list": [
                                        {
                                                "file_path": "C:\\inetpub\\logs\\LogFiles\\W3SVC1\\*.log",
                                                "log_group_name": "/demo-monitoring-iis-logs",
                                                "log_stream_name": "{instance_id}",
                                                "retention_in_days": 3
                                        }
                                ]
                        },
                        "windows_events": {
                                "collect_list": [
                                        {
                                                "event_format": "text",
                                                "event_levels": [
                                                        "VERBOSE",
                                                        "INFORMATION",
                                                        "WARNING",
                                                        "ERROR",
                                                        "CRITICAL"
                                                ],
                                                "event_name": "application",
                                                "log_group_name": "/demo-monitoring-application-logs",
                                                "log_stream_name": "{instance_id}",
                                                "retention_in_days": 7
                                        }
                                ]
                        }
                }
        },

        "metrics": {
                "namespace": "demo-monitoring",
                "aggregation_dimensions": [
                        [
                                "InstanceId"
                        ]
                ],
                "append_dimensions": {
                        "InstanceId": "${aws:InstanceId}"
                },
                "metrics_collected": {
                        "LogicalDisk": {
                                "measurement": [
                                        "% Free Space"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "*"
                                ]
                        },
                        "Memory": {
                                "measurement": [
                                        "Available Mbytes"
                                ],
                                "metrics_collection_interval": 60
                        },
                        "Processor": {
                                "measurement": [
                                        "% Processor Time"
                                ],
                                "metrics_collection_interval": 60,
                                "resources": [
                                        "_Total"
                                ]
                        },
                        "procstat" : [
                                {
                                        "exe" : "spoolsv",
                                        "measurement" : [
                                                "memory_rss",
                                                "memory_vms",
                                                "cpu_time_system",
                                                "cpu_time_user"
                                        ]
                                }

                        ]
                }
        }
}

cloudwatch.tf – CloudWatch alarms, log group configuration and log group metric filter configuration

#------------------------------------------------------------------------------
# EC2 alarms
#------------------------------------------------------------------------------


resource "aws_cloudwatch_metric_alarm" "demo_monitoring_os_cpu_utilization" {
  count = (var.demo_monitoring_cpu_utilization_threshold > 0 && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-ec2-cpu-utilization"
  alarm_description   = "Average CPU utilisation alarm activated when CPU utilisation is over the threshold."
  namespace           = var.demo_monitoring_namespace
  metric_name         = "Processor % Processor Time"
  statistic           = "Average"
  datapoints_to_alarm = 3
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = var.demo_monitoring_cpu_utilization_threshold
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  dimensions = {
    InstanceId = module.demo_monitoring_server01.id
    instance   = "_Total"
    objectname = "Processor"
  }
  treat_missing_data = "breaching"

  tags = merge(local.tags_generic)
}

resource "aws_cloudwatch_metric_alarm" "demo_monitoring_os_mem_free" {
  count = (var.demo_monitoring_memory_available_Mbytes_threshold > 0 && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-ec2-memory-free"
  alarm_description   = "Average memory free (MBytes) alarm activated when available memory is less than the threshold."
  namespace           = var.demo_monitoring_namespace
  metric_name         = "Memory Available Mbytes"
  statistic           = "Average"
  datapoints_to_alarm = 3
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = var.demo_monitoring_memory_available_Mbytes_threshold
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  dimensions = {
    InstanceId = module.demo_monitoring_server01.id
  }
  treat_missing_data = "breaching"

  tags = merge(local.tags_generic)
}

resource "aws_cloudwatch_metric_alarm" "demo_monitoring_os_logicaldrive_c_percent_freespace" {
  count = (var.demo_monitoring_disk_free_threshold > 0 && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-ec2-drive-free-space-c"
  alarm_description   = "Free disk space alarm is activated when disk space is less than threshold (percentage)."
  namespace           = var.demo_monitoring_namespace
  metric_name         = "LogicalDisk % Free Space"
  statistic           = "Average"
  datapoints_to_alarm = 3
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = var.demo_monitoring_disk_free_threshold
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  dimensions = {
    InstanceId = module.demo_monitoring_server01.id
    instance   = "C:"
    objectname = "LogicalDisk"
  }
  treat_missing_data = "breaching"

  tags = merge(local.tags_generic)
}


resource "aws_cloudwatch_metric_alarm" "demo_monitoring_os_logicaldrive_d_percent_freespace" {
  count = (var.demo_monitoring_disk_free_threshold > 0 && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-ec2-drive-free-space-d"
  alarm_description   = "Free disk space alarm is activated when disk space is less than threshold (percentage)."
  namespace           = var.demo_monitoring_namespace
  metric_name         = "LogicalDisk % Free Space"
  statistic           = "Average"
  datapoints_to_alarm = 3
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "LessThanOrEqualToThreshold"
  threshold           = var.demo_monitoring_disk_free_threshold
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  dimensions = {
    InstanceId = module.demo_monitoring_server01.id
    instance   = "D:"
    objectname = "LogicalDisk"
  }
  treat_missing_data = "breaching"

  tags = merge(local.tags_generic)
}


resource "aws_cloudwatch_metric_alarm" "demo_monitoring_os_spooler_service" {
  count = (var.demo_monitoring_service_sploorer_alarm_enabled && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-service-spooler"
  alarm_description   = "Spooler service"
  namespace           = var.demo_monitoring_namespace
  metric_name         = "procstat memory_rss"
  statistic           = "Minimum"
  datapoints_to_alarm = 2
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 123
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  dimensions = {
    InstanceId   = module.demo_monitoring_server01.id
    exe          = "spoolsv"
    process_name = "spoolsv.exe"
  }
  treat_missing_data = "breaching"

  tags = merge(local.tags_generic)
}


resource "aws_cloudwatch_metric_alarm" "demo_monitoring_windows_events_Error_demo" {
  count = (var.demo_monitoring_events_application_alarm_enabled && var.demo_monitoring_enabled) ? 1 : 0

  alarm_name          = "${var.app_name}-windows-events-application-log-error-demo"
  alarm_description   = "Alert when the word demo with severity Error is found in the application log 3 or more times a minute"
  namespace           = var.demo_monitoring_namespace
  metric_name         = "ERROR-demo-occurances"
  statistic           = "Sum"
  datapoints_to_alarm = 1
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 2
  alarm_actions       = [module.demo_monitoring_sns_topic.sns_topic_arn]
  ok_actions          = [module.demo_monitoring_sns_topic.sns_topic_arn]
  treat_missing_data  = "notBreaching"

  tags = merge(local.tags_generic)
}

#------------------------------------------------------------------------------
# Log groups 
#------------------------------------------------------------------------------

resource "aws_cloudwatch_log_group" "demo_monitoring_iis_logs" {
  name              = var.demo_monitoring_windows_iis_log_group
  retention_in_days = 3
  kms_key_id        = aws_kms_key.demo_monitoring_kms_key.arn

  tags = merge(local.tags_generic)
}

resource "aws_cloudwatch_log_group" "demo_monitoring_applicaiton_logs" {
  name              = var.demo_monitoring_windows_event_application_log_group
  retention_in_days = 7
  kms_key_id        = aws_kms_key.demo_monitoring_kms_key.arn

  tags = merge(local.tags_generic)
}

#------------------------------------------------------------------------------
# Log group filter 
#------------------------------------------------------------------------------

resource "aws_cloudwatch_log_metric_filter" "demo_monitoring_applicaiton_filter" {
  name           = "demo-monitoring-application-filter"
  pattern        = " \"[ERROR]\" \"[567]\" demo"
  log_group_name = aws_cloudwatch_log_group.demo_monitoring_applicaiton_logs.name

  metric_transformation {
    name      = "ERROR-demo-occurances"
    # Use the same namespace variable as the alarm that watches this metric
    namespace = var.demo_monitoring_namespace
    value     = "1"
  }
}


data.tf – used to get account ID for automation and build IAM policy documents using variables

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "vpc_flow_logging_boundary_role_doc" {
  statement {
    sid    = "ServiceBoundaries"
    effect = "Allow"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogStreams"
    ]
    resources = ["arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:/aws/vpc-flow-log/${module.demo_monitoring_vpc.vpc_id}:*"]
  }
}

data "aws_iam_policy_document" "demo_monitoring_ec2_assumable_doc" {
  statement {
    sid    = "readssmparameter"
    effect = "Allow"
    actions = [
      "ssm:GetParameter"
    ]
    resources = [aws_ssm_parameter.cw_agent.arn]
  }


  statement {
    sid    = "test"
    effect = "Allow"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogStreams"
    ]
    # Log group ARNs take the form arn:aws:logs:<region>:<account>:log-group:<name>:*
    resources = ["arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:${var.demo_monitoring_windows_event_application_log_group}:*",
    "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:${var.demo_monitoring_windows_iis_log_group}:*"]
  }

  statement {
    sid    = "KMSkeySSMParameter"
    effect = "Allow"
    actions = [
      "kms:Decrypt",
      "kms:*"
    ]
    resources = [aws_kms_key.demo_monitoring_kms_key.arn]
  }

}


data "aws_ami" "windows-server-2022" {
  most_recent = true
  owners      = ["amazon"]
  name_regex  = "Windows_Server-2022-English-Full-Base*"
}


ec2.tf – configuration of the EC2 instance

#-------------------------------------------------------------------
# Demo Monitoring Server Configuration
#-------------------------------------------------------------------
module "demo_monitoring_server01" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "3.5.0"

  #checkov:skip=CKV_AWS_8: "Ensure all data stored in the Launch configuration or instance Elastic Blocks Store is securely encrypted"
  #checkov:skip=CKV_AWS_126: "Ensure that detailed monitoring is enabled for EC2 instances"
  #checkov:skip=CKV_AWS_79: "Ensure Instance Metadata Service Version 1 is not enabled"

  depends_on = [aws_ssm_parameter.cw_agent]

  name = "${var.app_name}-${var.environment}-01"

  ami                         = data.aws_ami.windows-server-2022.id
  instance_type               = "t3.medium"
  subnet_id                   = module.demo_monitoring_vpc.private_subnets[2]
  availability_zone           = module.demo_monitoring_vpc.azs[2]
  associate_public_ip_address = false
  vpc_security_group_ids      = [module.demo_server_sg.security_group_id]
  iam_instance_profile        = module.demo_monitoring_ec2_assumable_role.iam_instance_profile_id
  user_data_base64            = base64encode(local.user_data_prod)
  private_ip                  = var.demo_server_private_ip
  disable_api_termination     = false


  #  CKV_AWS_79: "Ensure Instance Metadata Service Version 1 is not enabled"
  metadata_options = {
    http_endpoint = "enabled"
    http_tokens   = "required"
  }


  enable_volume_tags = false
  root_block_device = [
    {
      volume_type = "gp3"
      volume_size = 80
      encrypted   = true
      kms_key_id  = aws_kms_key.demo_monitoring_kms_key.arn

      tags = {
        Name = "Demo-Monitoring-C-Drive"
      }
    },
  ]

  tags = merge(local.tags_generic)


}

resource "aws_ebs_volume" "demo_monitoring_d_drive" {

  size              = 30
  type              = "gp3"
  availability_zone = module.demo_monitoring_vpc.azs[2]
  encrypted         = true
  kms_key_id        = aws_kms_key.demo_monitoring_kms_key.arn

  tags = {
    Name = "Demo-Monitoring-D-Drive"
  }

}

resource "aws_volume_attachment" "demo_monitoring_d_drive_attachment" {

  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.demo_monitoring_d_drive.id
  instance_id = module.demo_monitoring_server01.id

}


iam.tf – configuration of IAM roles and policies

#------------------------------------------------------------------------------
# Unrestricted access for vpc flow log role to all logs - Check: CKV_AWS_111
#------------------------------------------------------------------------------
resource "aws_iam_policy" "vpc_flow_logging_boundary_role_policy" {
  name   = "vpc-flow-logging-boundary-policy"
  path   = "/"
  policy = data.aws_iam_policy_document.vpc_flow_logging_boundary_role_doc.json

  tags = local.tags_generic
}

#--------------------------------------------------------------------------
# SSM EC2  assumable role 
#--------------------------------------------------------------------------
resource "random_id" "random_id" {
  byte_length = 5

}

module "demo_monitoring_ec2_assumable_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version = "4.17.1"

  trusted_role_services = [
    "ec2.amazonaws.com"
  ]

  role_requires_mfa       = false
  create_role             = true
  create_instance_profile = true

  role_name = "${var.app_name}-ec2-assumable-role-${random_id.random_id.hex}"

  custom_role_policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy" #required for cw agent
  ]

  tags = local.tags_generic
}

resource "aws_iam_policy" "demo_monitoring_ec2_assumable_policy" {
  name   = "${var.app_name}-ec2-assumable-role-policy"
  path   = "/"
  policy = data.aws_iam_policy_document.demo_monitoring_ec2_assumable_doc.json

  tags = local.tags_generic
}

resource "aws_iam_role_policy_attachment" "demo_monitoring_ec2_assumable_role_attachement" {
  role       = module.demo_monitoring_ec2_assumable_role.iam_role_name
  policy_arn = aws_iam_policy.demo_monitoring_ec2_assumable_policy.arn
}


kms.tf – configuration of the KMS key and policies

resource "aws_kms_key" "demo_monitoring_kms_key" {
  description         = "KMS Keys for Demo-Monitoring EBS Encryption"
  is_enabled          = true
  enable_key_rotation = true

  tags = merge(local.tags_generic)



  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow cloudwatch to encrypt log groups with kms",
            "Effect": "Allow",
            "Principal": {
                "Service": "logs.${var.region}.amazonaws.com"
            },
            "Action": [
                "kms:Encrypt*",
                "kms:Decrypt*",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
            },
            "Action": [
                "kms:Create*",
                "kms:Describe*",
                "kms:Enable*",
                "kms:List*",
                "kms:Put*",
                "kms:Update*",
                "kms:Revoke*",
                "kms:Disable*",
                "kms:Get*",
                "kms:Delete*",
                "kms:TagResource",
                "kms:UntagResource",
                "kms:ScheduleKeyDeletion",
                "kms:CancelKeyDeletion"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                 "AWS": [
                    "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
                ]
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
                ]
            },
            "Action": [
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:RevokeGrant"
            ],
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "kms:GrantIsForAWSResource": "true"
                }
            }
        }
    ]
}
EOF
}

resource "aws_kms_alias" "demo_monitoring_kms_alias" {
  target_key_id = aws_kms_key.demo_monitoring_kms_key.key_id
  name          = "alias/${var.app_name}-${var.environment}"
}


locals.tf – user data and tagging schema

locals {
  region = "ap-southeast-2"

  tags_generic = {
    appname     = var.app_name
    environment = var.environment
    costcentre  = "TBC"
    ManagedBy   = var.ManagedByLocation
  }

  tags_ssm_ssm = {
    Name = "myvpc-vpce-interface-ssm-ssm"
  }

  tags_ssm_ssmmessages = {
    Name = "myvpc-vpce-interface-ssm-ssmmessages"
  }

  tags_ssm_ec2messages = {
    Name = "myvpc-vpce-interface-ssm-ec2messages"
  }


  user_data_prod = <<EOT
<powershell>
  Set-TimeZone -Name "New Zealand Standard Time"
  New-Item -Path "c:\temp" -Name "logfiles" -ItemType "directory"

  Install-WindowsFeature -name Web-Server -IncludeManagementTools
  Start-Sleep -Seconds 120
  New-Item -Path C:\inetpub\wwwroot\index.html -ItemType File -Value "Welcome to the Monitoring Demo IIS Webserver Home Page " -Force
  Set-Service -name W3SVC -startupType Automatic

  Invoke-WebRequest -Uri "http://localhost" -UseBasicParsing
  Invoke-WebRequest -Uri "http://localhost/fail" -UseBasicParsing

  # Get-Disk | Where partitionstyle -eq 'raw' | Initialize-Disk -PartitionStyle GPT -PassThru | New-Partition -AssignDriveLetter -UseMaximumSize | Format-Volume -FileSystem NTFS -NewFileSystemLabel "disk2" -Confirm:$false
  # Start-Sleep -Seconds 30

  ## CW Agent install
  c:
  cd \temp
  Invoke-WebRequest -Uri https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi -OutFile c:\temp\amazon-cloudwatch-agent.msi
  & msiexec /i "c:\temp\amazon-cloudwatch-agent.msi" /l*v "cw_agent_install.log"
  Start-Sleep -Seconds 60
  & $env:ProgramFiles\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -c ssm:/demo-monitoring/cloudwatch-agent/config -s
  
  Start-Sleep -Seconds 120

  # New-EventLog -LogName "Application" -Source "My Demo"
  # Write-EventLog -LogName "Application" -Source "My Demo" -EntryType "Error" -EventID 567 -Message "Test message for monitoring demo filter pattern."

  # Invoke-WebRequest -Uri "http://localhost" -UseBasicParsing
  # Invoke-WebRequest -Uri "http://localhost/fail" -UseBasicParsing

</powershell>
  EOT

}


provider.tf – default Terraform file

provider "aws" {
  region = var.region

}


security-groups.tf – security group configuration for VPC endpoints and EC2 instance

#------------------------------------------------------------------------------
# Security Groups - SSM
#------------------------------------------------------------------------------
module "https_443_security_group" {
  source  = "terraform-aws-modules/security-group/aws//modules/https-443"
  version = "4.16.2"
  # Ignoring Checkov secret_name false positive detection
  #checkov:skip=CKV2_AWS_5: "Ensure that Security Groups are attached to another resource"

  name        = "https-443-sg"
  description = "Allow https 443"
  vpc_id      = module.demo_monitoring_vpc.vpc_id

  # Allow ingress rules to be accessed only within current VPC
  ingress_cidr_blocks = [module.demo_monitoring_vpc.vpc_cidr_block]

  # Allow HTTPS (443) egress only
  egress_rules = ["https-443-tcp"]

  tags = local.tags_generic
}

#------------------------------------------------------------------------------
# Restrict default VPC Security Group - Check: CKV2_AWS_12
#------------------------------------------------------------------------------

resource "aws_default_security_group" "default" {
  depends_on = [module.demo_monitoring_vpc]

  vpc_id = module.demo_monitoring_vpc.vpc_id

  ingress = []
  egress  = []

  tags = local.tags_generic
}


module "demo_server_sg" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "4.9.0"

  name        = "${var.environment}-${var.app_name}-sg"
  description = "Security group for ${var.environment} ${var.app_name} Server"
  vpc_id      = module.demo_monitoring_vpc.vpc_id

  ingress_cidr_blocks = ["0.0.0.0/0"]
  ingress_rules       = ["https-443-tcp"]

  ingress_with_cidr_blocks = [
    {
      from_port   = 3389
      to_port     = 3389
      protocol    = "tcp"
      description = "RDP access from VPC"
      cidr_blocks = var.vpc_cidr_range
    },

  ]
  egress_cidr_blocks = ["0.0.0.0/0"]
  egress_rules       = ["https-443-tcp", "http-80-tcp"]


  egress_with_cidr_blocks = [
    {
      rule        = "all-tcp"
      cidr_blocks = var.vpc_cidr_range
      description = "VPC Access"
    },
  ]

  tags = merge(local.tags_generic)
}


sns.tf – SNS topic configuration

module "demo_monitoring_sns_topic" {
  source  = "terraform-aws-modules/sns/aws"
  version = "3.3.0"

  name = "${var.app_name}-alerts"

  tags = merge(local.tags_generic)
}


ssm.tf – SSM endpoint configuration for remote access

#------------------------------------------------------------------------------
# VPC - SSM Endpoints
#------------------------------------------------------------------------------
module "vpc_ssm_endpoint" {

  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "3.13.0"

  vpc_id             = module.demo_monitoring_vpc.vpc_id
  security_group_ids = [module.https_443_security_group.security_group_id]

  endpoints = {
    ssm = {
      service             = "ssm"
      private_dns_enabled = true
      subnet_ids          = module.demo_monitoring_vpc.private_subnets
      tags                = merge(local.tags_generic, local.tags_ssm_ssm)
    },
    ssmmessages = {
      service             = "ssmmessages"
      private_dns_enabled = true,
      subnet_ids          = module.demo_monitoring_vpc.private_subnets
      tags                = merge(local.tags_generic, local.tags_ssm_ssmmessages)
    },
    ec2messages = {
      service             = "ec2messages",
      private_dns_enabled = true,
      subnet_ids          = module.demo_monitoring_vpc.private_subnets
      tags                = merge(local.tags_generic, local.tags_ssm_ec2messages)
    }
  }
}


ssm_parameter.tf – SSM parameter store configuration

resource "aws_ssm_parameter" "cw_agent" {
  description = "CloudWatch agent config to configure server metrics and logs"
  name        = "/demo-monitoring/cloudwatch-agent/config"
  type        = "String"
  value       = file("app_cw_agent_config.json")

  #checkov:skip=CKV2_AWS_34: "AWS SSM Parameter should be Encrypted"

  tags = merge(local.tags_generic)
}

terraform.tfvars – default Terraform file with variable configuration settings

environment          = "test"
vpc_cidr_range       = "172.17.0.0/20"
private_subnets_list = ["172.17.0.0/24", "172.17.1.0/24", "172.17.2.0/24"]
public_subnets_list  = ["172.17.3.0/24", "172.17.4.0/24", "172.17.5.0/24"]
app_name             = "demo-monitoring"

#------------------------------------------------------------------------------
# EC2
#------------------------------------------------------------------------------

demo_server_private_ip = "172.17.2.34"

#------------------------------------------------------------------------------
#  Monitoring
#------------------------------------------------------------------------------

demo_monitoring_namespace                           = "demo-monitoring"
demo_monitoring_enabled                             = true
demo_monitoring_service_sploorer_alarm_enabled      = true
demo_monitoring_events_application_alarm_enabled    = true
demo_monitoring_log_iis_alarm_enabled               = true
demo_monitoring_cpu_utilization_threshold           = 80  # percent
demo_monitoring_disk_free_threshold                 = 10  # percent
demo_monitoring_memory_available_Mbytes_threshold   = 200 # MBytes; for an 80% utilisation alert use total RAM (MB) * 0.2, e.g. 32768*0.2
demo_monitoring_windows_iis_log_group               = "/demo-monitoring-iis-logs"
demo_monitoring_windows_event_application_log_group = "/demo-monitoring-application-logs"


variables.tf – default Terraform file

variable "region" {
  description = "AWS Region"
  default     = "ap-southeast-2"
  type        = string

}

variable "environment" {
  description = "AWS environment name"
  type        = string

}

variable "app_name" {
  description = "Application Name"
  type        = string

}

#------------------------------------------------------------------------------
# VPC
#------------------------------------------------------------------------------

variable "vpc_cidr_range" {
  description = "CIDR range for the VPC"
  type        = string

}

variable "private_subnets_list" {
  description = "Private subnet list for infrastructure"
  type        = list(string)

}

variable "public_subnets_list" {
  description = "Public subnet list for infrastructure"
  type        = list(string)

}

#------------------------------------------------------------------------------
# Other
#------------------------------------------------------------------------------

variable "ManagedByLocation" {
  description = "IaC location"
  default     = "https://github.com/"
  type        = string
}



#------------------------------------------------------------------------------
#  EC2
#------------------------------------------------------------------------------

variable "demo_server_private_ip" {
  description = "Private IP address for Server"
  type        = string

}

#------------------------------------------------------------------------------
#  Monitoring
#------------------------------------------------------------------------------

variable "demo_monitoring_namespace" {
  description = "Name space for metrics and alerts"
  type        = string
}

variable "demo_monitoring_enabled" {
  description = "Switch to turn on/off monitoring"
  type        = bool
}



variable "demo_monitoring_service_sploorer_alarm_enabled" {
  description = "Switch to turn on/off Windows service monitoring"
  type        = bool
}

variable "demo_monitoring_events_application_alarm_enabled" {
  description = "Switch to turn on/off Windows application event log monitoring"
  type        = bool
}

variable "demo_monitoring_log_iis_alarm_enabled" {
  description = "Switch to turn on/off Windows IIS file log monitoring"
  type        = bool
}




variable "demo_monitoring_cpu_utilization_threshold" {
  description = "Maximum CPU utilisation (%) before the Cloudwatch alarm triggers"
  type        = number
}

variable "demo_monitoring_disk_free_threshold" {
  description = "Minimum free disk space (%) before the Cloudwatch alarm triggers"
  type        = number
  default     = 90
}


variable "demo_monitoring_memory_available_Mbytes_threshold" {
  description = "Minimum available memory (MB) before the Cloudwatch alarm triggers"
  type        = number
  default     = 80
}



variable "demo_monitoring_windows_iis_log_group" {
  description = "Log group name in AWS for Windows EC2 IIS log file data"
  type        = string
}

variable "demo_monitoring_windows_event_application_log_group" {
  description = "Log group name in AWS for Windows EC2 Application log captured events"
  type        = string
}
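
The percentage thresholds above accept any number, so a typo in terraform.tfvars (for example 800 instead of 80) would only show up later as an alarm that never fires. As a minimal sketch, and not part of the original repository, a Terraform validation block could reject out-of-range values at plan time. The 1 to 100 range and the error message wording are assumptions; the block would replace the existing declaration of the same variable rather than sit alongside it.

```hcl
# Hypothetical variant of the existing variable with an added validation block.
variable "demo_monitoring_cpu_utilization_threshold" {
  description = "Maximum CPU utilisation (%) before the Cloudwatch alarm triggers"
  type        = number

  validation {
    # Assumed range: a percentage threshold only makes sense between 1 and 100.
    condition = (
      var.demo_monitoring_cpu_utilization_threshold > 0 &&
      var.demo_monitoring_cpu_utilization_threshold <= 100
    )
    error_message = "The CPU utilization threshold must be a percentage between 1 and 100."
  }
}
```

The same pattern could be applied to demo_monitoring_disk_free_threshold, which is also expressed as a percentage.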




vpc.tf – VPC configuration with NAT gateway, Internet gateway, Subnets etc

#------------------------------------------------------------------------------
# VPC Module
#------------------------------------------------------------------------------
module "demo_monitoring_vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.18.1"
  # Checkov skips: mitigated by the permissions boundary added below; Checkov cannot detect that flow logging is enabled inside the module
  #checkov:skip=CKV_AWS_111: "Ensure IAM policies does not allow write access without constraints"
  #checkov:skip=CKV2_AWS_11: "Ensure VPC flow logging is enabled in all VPCs"
  #checkov:skip=CKV2_AWS_19: "Ensure that all EIP addresses allocated to a VPC are attached to EC2 instances"
  #checkov:skip=CKV2_AWS_12: "Ensure the default security group of every VPC restricts all traffic"
  #checkov:skip=CKV_AWS_130: "Ensure VPC subnets do not assign public IP by default"

  name = "${var.app_name}-${var.environment}-vpc"
  cidr = var.vpc_cidr_range

  azs             = ["${var.region}a", "${var.region}b", "${var.region}c"]
  private_subnets = var.private_subnets_list
  public_subnets  = var.public_subnets_list


  enable_flow_log                      = true
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_cloudwatch_iam_role  = true
  vpc_flow_log_permissions_boundary    = aws_iam_policy.vpc_flow_logging_boundary_role_policy.arn
  flow_log_max_aggregation_interval    = 60

  create_igw         = true
  enable_nat_gateway = true
  enable_ipv6        = false

  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = local.tags_generic

}
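
With enable_nat_gateway set and no other NAT options, the VPC module creates one NAT gateway per private subnet, which is usually the largest cost in a small demo environment. As a rough sketch, assuming cost matters more than AZ-level resilience for this demo, the module's single_nat_gateway input could be added so that all private subnets share one gateway. This is a suggested variation, not what the repository deploys.

```hcl
module "demo_monitoring_vpc" {
  # All existing arguments from vpc.tf stay unchanged; only the NAT
  # settings are shown here.
  enable_nat_gateway = true

  # Assumption: a single shared NAT gateway is acceptable for a demo.
  # Losing the AZ that hosts it cuts outbound internet access for every
  # private subnet, so this is not recommended for production.
  single_nat_gateway = true
}
```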

After you deploy the code, wait about 10 minutes for the Cloudwatch agent to be configured at the OS level and to start sending data to AWS. A couple of commands are then needed to get alerts working.

I will end the post here and continue with a new post (part 2) here.
