Shared Services Account – Slack Integration

Slack integration makes it easy for all of your applications to surface alerts.

If you have not yet read my previous post on Enterprise Eventing Strategy, I recommend reading it first, as this post builds on it.

With our home-run event management in place, it's simple to add a new feature that is available across all accounts as a simple event.  In this case we will create the ability to fire off a Slack notification from any account.

Using the CloudFormation template below:

SharedServices-slack.yaml

you can fire off a Slack message as easily as:

import boto3

# Detail must be a JSON string, so the inner quotes and the newline are escaped.
payload1 = [{
  "Source": "com.mycompany.slacker",
  "Detail": "{ \"accountname\": \"web-prod\", \"severity\": \"warning\", \"title\": \"title\", \"text\": \"This is a test message:\\nbody content\" }",
  "DetailType": "slacker"
}]

events_client = boto3.client('events')
events_client.put_events(Entries=payload1)
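
Hand-escaping the JSON inside Detail is easy to get wrong. As a minimal sketch, the same entry can be built with json.dumps. The helper name build_slack_entry is hypothetical; the field names are taken from the example above.

```python
import json


def build_slack_entry(accountname, severity, title, text):
    """Build one put_events entry for the slacker event (hypothetical helper)."""
    detail = {
        "accountname": accountname,
        "severity": severity,
        "title": title,
        "text": text,
    }
    return {
        "Source": "com.mycompany.slacker",
        # Detail must be a JSON string; json.dumps handles quotes and newlines.
        "Detail": json.dumps(detail),
        "DetailType": "slacker",
    }


entry = build_slack_entry(
    "web-prod", "warning", "title", "This is a test message:\nbody content"
)
print(entry["Detail"])
```

The resulting entry can be sent with events_client.put_events(Entries=[entry]).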

The explosion of the digital economy

The tech industry has never seen this level of investment [in datacenters]. The investment we’re seeing in cloud capacity really has no precedent, save perhaps Henry Ford’s manic factory building for his Model T, the US government’s armaments efforts in WWII, and Foxconn’s manufacturing support for smartphones. As Ford’s efforts presaged the boom growth of the industrial economy, so too do (cloud) investments augur the explosion of the digital economy.   — Bernard Golden

Landing Zone Accounts – Health Events

Possibly one of the most important things to communicate to your enterprise teams is upcoming health events.  These can indicate that a resource has gone bad and will be replaced automatically, but they can also give notice of many other kinds of bad news that your teams need time to react to.

The recommended best practice is to catch these events and send them directly to your teams.  This CloudFormation script implements a simple event handler that catches health events and pushes them to Slack.  You could easily modify it to send this information via another mechanism.
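
As a rough sketch of what such a handler might do, the function below turns a health event into a slacker event entry. It assumes the event follows the shape AWS Health events have in CloudWatch Events (detail.service, detail.eventTypeCode, detail.eventTypeCategory, detail.eventDescription, detail.affectedEntities). The function name, the severity mapping, and the sample event are assumptions for illustration, not the contents of the actual script.

```python
import json


def health_event_to_slack_entry(event, accountname):
    """Map an AWS Health event to a slacker put_events entry (hypothetical helper)."""
    detail = event.get("detail", {})

    # Use the first description, if any, as the message body.
    descriptions = detail.get("eventDescription", [])
    text = descriptions[0].get("latestDescription", "") if descriptions else ""

    # List the affected resources so the team knows what to look at.
    entities = [e["entityValue"] for e in detail.get("affectedEntities", [])]
    if entities:
        text += "\nAffected: " + ", ".join(entities)

    # Assumed mapping: open issues are critical, everything else is a warning.
    severity = "critical" if detail.get("eventTypeCategory") == "issue" else "warning"

    slack_detail = {
        "accountname": accountname,
        "severity": severity,
        "title": "{} {}".format(
            detail.get("service", "AWS"), detail.get("eventTypeCode", "")
        ),
        "text": text,
    }
    return {
        "Source": "com.mycompany.slacker",
        "Detail": json.dumps(slack_detail),
        "DetailType": "slacker",
    }


# Made-up sample event for illustration only.
sample_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {
        "service": "EC2",
        "eventTypeCode": "AWS_EC2_INSTANCE_RETIREMENT_SCHEDULED",
        "eventTypeCategory": "scheduledChange",
        "eventDescription": [
            {"language": "en_US", "latestDescription": "An instance is scheduled for retirement."}
        ],
        "affectedEntities": [{"entityValue": "i-abcd1111"}],
    },
}

entry = health_event_to_slack_entry(sample_event, "web-prod")
print(entry["Detail"])
```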

Example Health Notification

Enterprise Tagging Strategy and Enforcement

Who owns that Instance?  Can we delete this snapshot?  Should this bucket be public?

Nothing becomes as troublesome in an enterprise AWS environment as knowing who owns what resources.  Tagging gives you the metadata to operate responsibly in a highly complex environment.

Tagging Basics

  • Tags are key-value pairs.
  • There is a default tag for most resources (“Name”).
  • There is a maximum tag key length of 128 characters and a maximum value length of 256 characters.
  • There are 50 tags available per resource.
  • Amazon reserves tags beginning with ‘aws:’ for its own use; these tags do not count against the 50 available.
  • Tags are case sensitive. Keys and values should be all lowercase. Use underscore as a separator.
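
The rules above are easy to check before tags are applied. Below is a minimal sketch of such a check; the function name validate_tags is hypothetical, and exempting the ‘Name’ key from the lowercase convention is an assumption made so the default tag still passes.

```python
MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256
RESERVED_PREFIX = "aws:"


def validate_tags(tags):
    """Return a list of problems found in a {key: value} tag dict."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append("too many tags: {} > {}".format(len(tags), MAX_TAGS))
    for key, value in tags.items():
        if key.lower().startswith(RESERVED_PREFIX):
            problems.append("reserved prefix: {}".format(key))
        if len(key) > MAX_KEY_LENGTH:
            problems.append("key too long: {}".format(key))
        if len(value) > MAX_VALUE_LENGTH:
            problems.append("value too long for key: {}".format(key))
        # Assumption: 'Name' is exempt from the lowercase convention.
        if key != "Name" and key != key.lower():
            problems.append("key not lowercase: {}".format(key))
    return problems


print(validate_tags({"environment": "prod", "Owner": "jane@example.com"}))
```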

Tagging Your Resources

You can define tags from the EC2 console by selecting the relevant instance and selecting the “Tags” tab.  Alternatively, you can select the instance, click “Actions” and select “Add/Edit Tags.”

In S3, you can apply tags at the object or bucket level. Note that S3 also has object “Metadata”, which is a separate feature from tags.

Tags can also be managed via the AWS API. See Amazon’s documentation for more information and helpful code samples.
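
As a minimal sketch of tagging through the API, the helper below applies a plain dict of tags to an EC2 instance using the create_tags call of a boto3 EC2 client. The helper names and the instance ID are hypothetical; the client is passed in so the conversion can be tried without AWS credentials.

```python
def to_aws_tags(tags):
    """Convert {'key': 'value'} into the [{'Key': ..., 'Value': ...}] list the API expects."""
    return [{"Key": key, "Value": value} for key, value in sorted(tags.items())]


def tag_instance(ec2_client, instance_id, tags):
    """Apply tags to one instance; ec2_client is a boto3 EC2 client."""
    return ec2_client.create_tags(Resources=[instance_id], Tags=to_aws_tags(tags))


print(to_aws_tags({"environment": "prod", "owner": "jane@example.com"}))

# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   tag_instance(boto3.client("ec2"), "i-0123456789abcdef0", {"environment": "prod"})
```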

Recommended Tags:

 

Key | Value | Description
Name | not required, but recommended | A user-defined name. This should NOT be the type of resource being tagged; nothing is less useful than finding an instance named ‘Instance’.
environment | dev, qa, staging, beta, or prod | This can be confusing: what defines prod, and what defines dev? Often the settings in dev accounts can be considered prod, since losing them would cause an internal outage.
division | engineering | The corporate division responsible for these assets. Examples include Engineering, Marketing, and Sales.
product | SuperLaserMonkey | Ideally this comes from a curated list of products maintained by the organization; it serves as a rollup for billing purposes.
component | LaserMonkeyMap | The subcomponent of the above product.
owner | Must be an email address | A member of the engineering staff, ideally not executive leadership: someone with day-to-day knowledge of the product and/or asset.
classification | public, internal, or confidential | Should map to your company’s information security policy.

AWS Documentation

 

  • Amazon Elastic Block Store (Amazon EBS): Tagging Your Resources in the Amazon Elastic Compute Cloud User Guide.
  • Amazon ElastiCache (ElastiCache): Using Cost Allocation Tags in ElastiCache in the Amazon ElastiCache User Guide.
  • Amazon Elastic Compute Cloud (Amazon EC2): Tagging Your Resources in the Amazon Elastic Compute Cloud User Guide.
  • Elastic Load Balancing: Add or Remove Tags in the Elastic Load Balancing Developer Guide.
  • Amazon EMR: Tagging Amazon EMR Clusters in the Amazon EMR Developer Guide.
  • Amazon Glacier: Tagging Your Amazon Glacier Resources in the Amazon Glacier Developer Guide.
  • Amazon Kinesis: Tagging Your Amazon Kinesis Streams in the Amazon Kinesis Developer Guide.
  • Amazon Redshift: Tagging Resources in Amazon Redshift in the Amazon Redshift Cluster Management Guide.
  • Amazon Relational Database Service (Amazon RDS): Tagging Amazon RDS Resources in the Amazon Relational Database Service User Guide.
  • Amazon Route 53: Tagging Amazon Route 53 Resources in the Amazon Route 53 Developer Guide.
  • Amazon Simple Storage Service (Amazon S3): Billing and Reporting of Buckets in the Amazon Simple Storage Service Developer Guide.
  • Amazon Virtual Private Cloud (Amazon VPC): Amazon VPC and Amazon EC2 resources that can be tagged are listed in Tagging Your Resources in the Amazon Elastic Compute Cloud User Guide.
  • Auto Scaling: Tagging Auto Scaling Groups and Amazon EC2 Instances in the Auto Scaling Developer Guide.
  • AWS CloudFormation: Tagging Your Member Resources in the AWS CloudFormation User Guide. Tagging your CloudFormation stack doesn’t guarantee all resources will have tags.
  • AWS Elastic Beanstalk: Tagging Your Environments and Applications in the AWS Elastic Beanstalk Developer Guide.

Account Setup – Account Alias

Most people never give any consideration to the account alias, but it helps provide context.  Getting status messages about account numbers is frustrating, and your account alias can save you from that frustration by giving you just a little bit of metadata that helps identify your account.

I recommend standardizing these names as much as possible. If your accounts are split up by project and environment, name your accounts accordingly: ‘project-dev’, ‘project-prod’, etc.

Your developers will be able to pull this name via the AWS CLI, or via an SDK such as Boto3.  This can make alerts sent out to places such as Slack much more informative.

Example of setting Account Alias:

aws iam create-account-alias \
   --account-alias "web-prod"

Example of getting Account Alias:

import boto3

iam_client = boto3.client('iam')

# .title() capitalizes the alias for display ('web-prod' becomes 'Web-Prod').
account_alias = iam_client.list_account_aliases()['AccountAliases'][0].title()
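
Indexing the first alias raises an IndexError on an account that has no alias set. A minimal sketch of a safer lookup follows; it falls back to the account ID from the STS get_caller_identity call. The function name account_label is hypothetical, and the clients are passed in so the logic can be exercised without credentials.

```python
def account_label(iam_client, sts_client):
    """Return the account alias, or the account ID when no alias is set."""
    aliases = iam_client.list_account_aliases()["AccountAliases"]
    if aliases:
        return aliases[0]
    # No alias configured: fall back to the 12-digit account ID.
    return sts_client.get_caller_identity()["Account"]


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   print(account_label(boto3.client("iam"), boto3.client("sts")))
```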

Account Setup – Route 53

If you have not yet read my post on Enterprise Account Structure you may want to read that first so you understand my definition of the following account types.

Route53 is a wonderfully reliable service that takes a lot of the work out of managing zones.   The question to be answered is: how do I set it up in an enterprise so that everyone who needs to manage a zone can, while the overall structure stays well understood?

Shared Services Account

You're going to need to host your root domain someplace, and your shared services account makes all the sense in the world.  Your systems administrators already have access to the shared services account, so configuring new subdomains and diagnosing issues should be easy.

Your shared services Route53 domain is going to be set up a single time, so you can either configure this manually as a big empty zone, or use a CloudFormation resource:

DNS:
   Type: "AWS::Route53::HostedZone"
   Properties:
      HostedZoneConfig:
         Comment: "My hosted zone for mydomain.com"
      Name: "mydomain.com"
      # No VPCs property: associating VPCs would create a private hosted zone.

 

Landing Zone Accounts

Configuring your new landing zone DNS involves creating a new hosted zone for your subdomain, which may be something like ‘web-prod.mydomain.com’, then copying the four name servers AWS assigns to it up into your shared services account, as an NS delegation record for the subdomain in the zone hosted there named ‘mydomain.com’.  When it comes time for someone to look up web-prod.mydomain.com, the resolver first finds the name servers for mydomain.com, which is the Route53 zone in your shared services account, and asks where to find web-prod.  That zone points it to the four name servers that handle your web-prod domain, which answer the question based on the entries in your web-prod Route53 zone.

Bash example of how to create a zone for your new landing zone and delegate it. Notice this script operates on two accounts: your landing zone and the shared services account.

#!/usr/bin/env bash
set -euo pipefail

SUB_DOMAIN="web-prod.mydomain.com"
ROOT_DOMAIN="mydomain.com"

# Create the hosted zone for the subdomain in the landing zone account.
aws route53 create-hosted-zone \
   --name "${SUB_DOMAIN}" \
   --caller-reference "${SUB_DOMAIN}-$(date +%s)" \
   --profile LandingZone

# Find the ID of the new zone (zone names are stored with a trailing dot).
HOSTED_ZONE_ID=$(aws route53 list-hosted-zones \
   --query "HostedZones[?Name=='${SUB_DOMAIN}.'].Id" \
   --profile LandingZone \
   --output text)

# Read the name servers AWS assigned to the new zone.
NAME_SERVERS=$(aws route53 list-resource-record-sets \
   --hosted-zone-id "${HOSTED_ZONE_ID}" \
   --query "ResourceRecordSets[?Type=='NS'].ResourceRecords[].Value" \
   --profile LandingZone \
   --output text)

# Build the JSON list of NS values, then strip the trailing comma.
RESOURCE_RECORD=""
for r in ${NAME_SERVERS}; do
   RESOURCE_RECORD+="{\"Value\": \"${r}\"},"
done
RESOURCE_RECORD=${RESOURCE_RECORD%,}

cat <<EOF > dns-delegate.json
{
   "Changes": [
      {
         "Action": "CREATE",
         "ResourceRecordSet": {
            "Name": "${SUB_DOMAIN}",
            "ResourceRecords": [
               ${RESOURCE_RECORD}
            ],
            "TTL": 300,
            "Type": "NS"
         }
      }
   ],
   "Comment": "Creating Subdomain delegation."
}
EOF

# Find the root zone in the shared services account.
PARENT_ZONE_ID=$(aws route53 list-hosted-zones \
   --query "HostedZones[?Name=='${ROOT_DOMAIN}.'].Id" \
   --profile SharedServices \
   --output text)

# Add the NS delegation record to the root zone.
aws route53 change-resource-record-sets \
   --hosted-zone-id "${PARENT_ZONE_ID}" \
   --change-batch file://dns-delegate.json \
   --profile SharedServices \
   --output text

Enterprise Eventing Strategy

A key question for any large multi-region deployment is how you handle events.  Do you snowfort and deploy code into every account, or do you centralize so that one bit of code handles all accounts?

Events are the heart and soul of AWS, and at the scale of an enterprise they can become daunting to handle.   Consider an environment that has 50 accounts and runs across 3 regions: that is 150 different sets of events to handle.  These events can be super important, like ‘I found your access keys on a public website’, or as innocuous as ‘someone successfully listed the contents of a bucket’.   Either way you will want to process these events, looking for issues you can catch and resolve in real time.

Rather than deploy monitoring into 50 accounts x 3 regions, it's easier to home-run all events back to your ‘Shared Services’ account in your ‘primary region’ for processing centrally.

Putting together something like this is not especially challenging, but it can be hard to keep clear where each component needs to be deployed.

Step-by-step

Shared Services Account – Primary Region

  1. Log in with proper access
  2. Click on ‘CloudWatch’
  3. Click on ‘Event Buses’
  4. Ensure the ‘default’ event bus is selected
  5. Click ‘Add permission’ and enter the landing zone account ID.
  6. Wash, Rinse, Repeat step #5 until all landing zones are added.
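
The console steps above can also be scripted. Below is a minimal sketch using the put_permission call of the boto3 events client, run with Shared Services credentials in the primary region. The function name, the StatementId format, and the account IDs are hypothetical.

```python
def grant_put_events(events_client, account_ids):
    """Allow each landing zone account to put events on the default event bus."""
    for account_id in account_ids:
        events_client.put_permission(
            Action="events:PutEvents",
            Principal=account_id,
            # Each statement needs its own ID; this naming scheme is an assumption.
            StatementId="landing-zone-{}".format(account_id),
        )


# Usage against a real account (requires boto3 and credentials):
#   import boto3
#   grant_put_events(boto3.client("events"), ["111111111111", "222222222222"])
```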

Landing Zone Accounts – Primary Region

For all Landing Zone accounts in the same region as your Shared Services account, all that is required to get these events over to Shared Services is a simple event hook that redirects the event to the shared services event bus.

SameRegion_BusForwarder

AWSTemplateFormatVersion: '2010-09-09'
Description: >
   This stack deploys cloudwatch rules that forward events back to Shared Services

Resources:
   SlackerEventRule:
      Type: AWS::Events::Rule
      Properties:
         Description: EventRule
         EventPattern:
           source:
             - aws.autoscaling
             - aws.ec2
             - forward.aws.autoscaling
             - forward.aws.ec2
         State: ENABLED
         Targets:
           - Arn: arn:aws:events:us-west-2:012345678:event-bus/default
             Id: SharedServices

Also in our landing zone primary region is some code to catch events forwarded from the same landing zone account in other regions.  This code catches the event, but it can't forward it untouched: for security purposes, AWS does not let you send events that look identical to events authored by AWS itself, so you can receive ‘aws.ec2’ but you can't send it.   To resolve this, our listener takes every event it catches and prefixes the source with ‘forward.’, so it receives ‘aws.ec2’ and sends ‘forward.aws.ec2’.  That event gets picked up by the SameRegion_BusForwarder above and dispatched to the Shared Services event bus, meaning we can listen for it over in the SharedServices account.

OtherRegion_BusListener

import json
from pprint import pprint

import boto3

events_client = boto3.client('events')


def lambda_handler(event, context):
   for record in event['Records']:
      # Each SNS record carries one CloudWatch event as a JSON string.
      payload = json.loads(record['Sns']['Message'])

      # Keep the originating region inside the detail, since the re-sent
      # event will be stamped with this region instead.
      detail = payload['detail']
      detail['region'] = payload['region']

      # AWS will not accept custom events whose source matches its own,
      # so prefix the source with 'forward.'. Fields such as account, id,
      # time and version are set by AWS on the new event and are dropped.
      entry = {
         'Source': "{}.{}".format("forward", payload['source']),
         'DetailType': payload['detail-type'],
         'Detail': json.dumps(detail),
         'Resources': payload['resources'],
      }

      response = events_client.put_events(Entries=[entry])

      # put_events reports per-entry failures in the response.
      if response.get('FailedEntryCount', 0) > 0:
         print("FailedEntryCount: {}".format(response['FailedEntryCount']))
         for error in response['Entries']:
            pprint(error)

Landing Zone Accounts – Secondary Regions

For other regions in your landing zone accounts, you're going to capture events with a CloudWatch Event rule and forward them to an SNS topic that can be read by the OtherRegion_BusListener in your landing zone primary region.

OtherRegion_EventForwarder

AWSTemplateFormatVersion: '2010-09-09'
Description: >
   This stack deploys cloudwatch rules that forward events back to Shared Services

Resources:
   EventConvoyTopic:
      Type: "AWS::SNS::Topic"
      Properties:
         Subscription:
           - Endpoint: !Join [ '', [ 'arn:aws:lambda:us-west-2:', !Ref 'AWS::AccountId', ':function:SharedServices-Event-Listener']]
             Protocol: "lambda"
         DisplayName: "sharedservices-event-convoy"
         TopicName: "sharedservices-event-convoy"
   EventConvoyPolicy:
      Type: "AWS::SNS::TopicPolicy"
      Properties:
         PolicyDocument:
           Id: EventConvoyPolicy
           Version: '2012-10-17'
           Statement:
           - Sid: AllowPublishEvents
             Effect: Allow
             Principal:
               Service: "events.amazonaws.com"
             Action:
              - 'sns:Publish'
             Resource: !Ref 'EventConvoyTopic'
           - Sid: AllowSharedServices
             Effect: Allow
             Principal:
               AWS: !Ref 'AWS::AccountId'
             Action:
              - 'sns:Subscribe'
              - 'sns:ListSubscriptionsByTopic'
              - 'sns:Receive'
             Resource: "*"
         Topics:
           - !Ref 'EventConvoyTopic'
   CloudWatchEventRule1:
      Type: AWS::Events::Rule
      Properties:
         Description: Majority Events for Event-Handler
         EventPattern:
           source:
            - aws.autoscaling
            - aws.ec2
         State: ENABLED
         Targets:
         - Arn: !Ref 'EventConvoyTopic'
           Id: EventTopic