Amazon EKS Workers
Overview
This service contains Terraform and Packer code to deploy a production-grade EC2 server cluster as workers for Elastic Kubernetes Service (EKS) on AWS.
EKS architecture
Features
- Deploy self-managed worker nodes in an Auto Scaling Group 
- Deploy managed worker nodes in a Managed Node Group
- Zero-downtime, rolling deployment for updating worker nodes 
- Auto scaling and auto healing 
- For Nodes:
  - Server-hardening with fail2ban, ip-lockdown, auto-update, and more
  - Manage SSH access through IAM groups using ssh-grunt
  - CloudWatch log aggregation
  - CloudWatch metrics and alerts
 
Learn
Note: This repo is part of the Gruntwork Service Catalog, a collection of reusable, battle-tested, production-ready infrastructure code. If you’ve never used the Service Catalog before, make sure to read How to use the Gruntwork Service Catalog!
Under the hood, this is all implemented using Terraform modules from the Gruntwork terraform-aws-eks repo. If you are a subscriber and don’t have access to this repo, email support@gruntwork.io.
Core concepts
To understand core concepts such as what Kubernetes is, the different worker types, how to authenticate to Kubernetes, and more, see the documentation in the terraform-aws-eks repo.
Repo organization
- modules: The main implementation code for this repo, broken down into multiple standalone, orthogonal submodules.
- examples: This folder contains working examples of how to use the submodules.
- test: Automated tests for the modules and examples.
Deploy
Non-production deployment (quick start for learning)
If you just want to try this repo out for experimenting and learning, check out the following resources:
- examples/for-learning-and-testing folder: The examples/for-learning-and-testing folder contains standalone sample code optimized for learning, experimenting, and testing (but not direct production usage).
Production deployment
If you want to deploy this repo in production, check out the following resources:
- examples/for-production folder: The examples/for-production folder contains sample code optimized for direct usage in production. This is code from the Gruntwork Reference Architecture, and it shows you how we build an end-to-end, integrated tech stack on top of the Gruntwork Service Catalog.
- How to deploy a production-grade Kubernetes cluster on AWS: A step-by-step guide for deploying a production-grade EKS cluster on AWS using the code in this repo. 
Manage
For information on registering the worker IAM role to the EKS control plane, refer to the IAM Roles and Kubernetes API Access section of the documentation.
For information on how to perform a blue-green deployment of the worker pools, refer to the How do I perform a blue green release to roll out new versions of the module section of the documentation.
For information on how to manage your EKS cluster, including how to deploy Pods on Fargate, how to associate IAM roles with Pods, how to upgrade your EKS cluster, and more, see the documentation in the terraform-aws-eks repo.
Reference
- Inputs
- Outputs
Required
autoscaling_group_configurations (any)
Configure one or more self-managed Auto Scaling Groups (ASGs) to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure self-managed ASGs.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
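As a rough sketch of what a value for the self-managed ASG input might look like: the asg_default_* descriptions later on this page name several fields of the autoscaling_group_configurations map (min_size, max_size, asg_instance_type, tags), and the example below uses only those. It assumes the map is keyed by an ASG name of your choosing. The key "asg1" and all values are hypothetical, and the real schema may require fields not shown here, so treat variables.tf as the authority.

```hcl
# Hypothetical example; field names are taken from the asg_default_* descriptions
# on this page. Check variables.tf for the full (and required) set of fields.
autoscaling_group_configurations = {
  "asg1" = {
    min_size          = 1
    max_size          = 3
    asg_instance_type = "t3.medium"

    # Same shape as the asg_default_tags type shown on this page.
    tags = [
      {
        key                 = "team"
        value               = "platform"
        propagate_at_launch = true
      },
    ]
  }
}
```

Any field omitted from an entry falls back to the matching asg_default_* value described in the Optional inputs.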
cluster_instance_ami (string)
The AMI to run on each instance in the EKS cluster. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. Only used if cluster_instance_ami_filters is null. Set to null if cluster_instance_ami_filters is set.

cluster_instance_ami_filters (object(…))
Properties on the AMI that can be used to lookup a prebuilt AMI for use with self managed workers. You can build the AMI using the Packer template eks-node-al2.json. One of cluster_instance_ami or cluster_instance_ami_filters is required. If both are defined, cluster_instance_ami_filters will be used. Set to null if cluster_instance_ami is set.
object({
    # List of owners to limit the search. Set to null if you do not wish to limit the search by AMI owners.
    owners = list(string)
    # Name/Value pairs to filter the AMI off of. There are several valid keys, for a full reference, check out the
    # documentation for describe-images in the AWS CLI reference
    # (https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-images.html).
    filters = list(object({
      name   = string
      values = list(string)
    }))
  })
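To make the object type above concrete, here is a minimal sketch of looking up the AMI by filters instead of passing an explicit ID. The filter keys name and virtualization-type are standard describe-images filters. The owner value and the AMI name pattern are placeholders, not values defined by this module.

```hcl
# Look up the worker AMI by filters; leave the explicit AMI ID unset.
cluster_instance_ami = null

cluster_instance_ami_filters = {
  # Restrict the search to AMIs owned by the current account.
  owners = ["self"]

  filters = [
    {
      name   = "name"
      values = ["eks-workers-*"] # hypothetical AMI name pattern
    },
    {
      name   = "virtualization-type"
      values = ["hvm"]
    },
  ]
}
```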
eks_cluster_name (string)
The name of the EKS cluster. The cluster must exist/already be deployed.
managed_node_group_configurations (any)
Configure one or more Node Groups to manage the EC2 instances in this cluster. Set to empty object ({}) if you do not wish to configure managed node groups.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
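A possible shape for the Managed Node Group input, sketched from the field names that the node group default descriptions on this page mention (capacity_type, desired_size, min_size, max_size, instance_types, subnet_ids, labels). It assumes the managed_node_group_configurations map is keyed by a node group name of your choosing. The key "ngroup1", the subnet ID, and the label are placeholders; variables.tf remains the reference for the actual schema.

```hcl
# Hypothetical example; field names are taken from the node group default
# descriptions on this page. Check variables.tf before relying on them.
managed_node_group_configurations = {
  "ngroup1" = {
    capacity_type  = "ON_DEMAND"
    desired_size   = 2
    min_size       = 1
    max_size       = 3
    instance_types = ["t3.medium"]
    subnet_ids     = ["subnet-0123456789abcdef0"] # placeholder subnet ID

    labels = {
      workload = "general" # placeholder label
    }
  }
}
```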
Optional
additional_security_groups_for_workers (list(string))
A list of additional security group IDs to be attached on worker groups.
Default: []

alarms_sns_topic_arn (list(string))
The ARNs of SNS topics where CloudWatch alarms (e.g., for CPU, memory, and disk space usage) should send notifications.
Default: []

allow_inbound_ssh_from_cidr_blocks (list(string))
The list of CIDR blocks to allow inbound SSH access to the worker groups.
Default: []

allow_inbound_ssh_from_security_groups (list(string))
The list of security group IDs to allow inbound SSH access to the worker groups.
Default: []

asg_custom_iam_role_name (string)
Custom name for the IAM role for the self-managed workers. When null, a default name based on worker_name_prefix will be used. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
Default: null

Default value for the enable_detailed_monitoring field of autoscaling_group_configurations.
Default: true

Default value for the asg_instance_root_volume_encryption field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_encryption will use this value.
Default: true

Default value for the asg_instance_root_volume_iops field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_iops will use this value.
Default: null

Default value for the asg_instance_root_volume_size field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_size will use this value.
Default: 40

Default value for the asg_instance_root_volume_throughput field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_throughput will use this value.
Default: null

Default value for the asg_instance_root_volume_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_root_volume_type will use this value.
Default: "standard"

Default value for the asg_instance_type field of autoscaling_group_configurations. Any map entry that does not specify asg_instance_type will use this value.
Default: "t3.medium"

Default value for the max_pods_allowed field of autoscaling_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
Default: null

asg_default_max_size (number)
Default value for the max_size field of autoscaling_group_configurations. Any map entry that does not specify max_size will use this value.
Default: 2

asg_default_min_size (number)
Default value for the min_size field of autoscaling_group_configurations. Any map entry that does not specify min_size will use this value.
Default: 1

Default value for the multi_instance_overrides field of autoscaling_group_configurations. Any map entry that does not specify multi_instance_overrides will use this value.
Any types represent complex values of variable type. For details, please consult `variables.tf` in the source repo.
Default: []

Default value for the on_demand_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify on_demand_allocation_strategy will use this value.
Default: null

Default value for the on_demand_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_base_capacity will use this value.
Default: null

Default value for the on_demand_percentage_above_base_capacity field of autoscaling_group_configurations. Any map entry that does not specify on_demand_percentage_above_base_capacity will use this value.
Default: null

Default value for the spot_allocation_strategy field of autoscaling_group_configurations. Any map entry that does not specify spot_allocation_strategy will use this value.
Default: null

Default value for the spot_instance_pools field of autoscaling_group_configurations. Any map entry that does not specify spot_instance_pools will use this value.
Default: null

Default value for the spot_max_price field of autoscaling_group_configurations. Any map entry that does not specify spot_max_price will use this value. Set to empty string (default) to mean on-demand price.
Default: null

asg_default_tags (list(object(…)))
Default value for the tags field of autoscaling_group_configurations. Any map entry that does not specify tags will use this value.
list(object({
    key                 = string
    value               = string
    propagate_at_launch = bool
  }))
Default: []

Default value for the use_multi_instances_policy field of autoscaling_group_configurations. Any map entry that does not specify use_multi_instances_policy will use this value.
Default: false

Custom name for the IAM instance profile for the self-managed workers. When null, the IAM role name will be used. If asg_use_resource_name_prefix is true, this will be used as a name prefix.
Default: null

Whether or not the IAM role used for the self-managed workers already exists. When false, this module will create a new IAM role.
Default: false

asg_iam_role_arn (string)
ARN of the IAM role to use if asg_iam_role_already_exists = true. When null, uses asg_custom_iam_role_name to look up the ARN. One of asg_custom_iam_role_name and asg_iam_role_arn is required (must be non-null) if asg_iam_role_already_exists is true.
Default: null

asg_security_group_tags (map(string))
A map of tags to apply to the Security Group of the ASG for the self-managed worker pool. The key is the tag name and the value is the tag value.
Default: {}

When true, all the relevant resources for self-managed workers will be set to use the name_prefix attribute so that unique names are generated for them. This allows those resources to support recreation through create_before_destroy lifecycle rules. Set to false if you were using any version before 0.65.0 and wish to avoid recreating the entire worker pool on your cluster.
Default: true

Adds additional tags to each ASG that allow a cluster autoscaler to auto-discover them. Only used for self-managed workers.
Default: true

Namespace where the AWS Auth Merger is deployed. If configured, the worker IAM role will be mapped to the Kubernetes RBAC group for Nodes using a ConfigMap in the auth merger namespace.
Default: null

cloud_init_parts (map(object(…)))
Cloud init scripts to run on the EKS worker nodes when they are booting. See the part blocks in https://www.terraform.io/docs/providers/template/d/cloudinit_config.html for syntax. To override the default boot script installed as part of the module, use the key default.
map(object({
    # A filename to report in the header for the part. Should be unique across all cloud-init parts.
    filename = string
    # A MIME-style content type to report in the header for the part. For example, use "text/x-shellscript" for a shell
    # script.
    content_type = string
    # The contents of the boot script to be called. This should be the full text of the script as a raw string.
    content = string
  }))
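For illustration, a value matching the cloud_init_parts type above could look like the following sketch. The map key, filename, and script body are hypothetical. Per the description of this input, using the key default instead would override the module's built-in boot script.

```hcl
# Adds one extra shell script to the worker boot sequence.
cloud_init_parts = {
  "extra-setup" = {
    filename     = "extra-setup.sh" # should be unique across all parts
    content_type = "text/x-shellscript"
    content      = <<-EOF
      #!/bin/bash
      # Placeholder script body; replace with your own boot logic.
      echo "extra worker setup ran" >> /var/log/extra-setup.log
    EOF
  }
}
```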
Default: {}

Whether or not to associate a public IP address to the instances of the self managed ASGs. Will only work if the instances are launched in a public subnet.
Default: false

The name of the Key Pair that can be used to SSH to each instance in the EKS cluster.
Default: null

custom_egress_security_group_rules (map(object(…)))
A map of unique identifiers to egress security group rules to attach to the worker groups.
map(object({
    # The network ports and protocol (tcp, udp, all) for which the security group rule applies to.
    from_port = number
    to_port   = number
    protocol  = string
    # The target of the traffic. Only one of the following can be defined; the others must be configured to null.
    target_security_group_id = string       # The ID of the security group to which the traffic goes to.
    cidr_blocks              = list(string) # The list of IP CIDR blocks to which the traffic goes to.
  }))
Default: {}

custom_ingress_security_group_rules (map(object(…)))
A map of unique identifiers to ingress security group rules to attach to the worker groups.
map(object({
    # The network ports and protocol (tcp, udp, all) for which the security group rule applies to.
    from_port = number
    to_port   = number
    protocol  = string
    # The source of the traffic. Only one of the following can be defined; the others must be configured to null.
    source_security_group_id = string       # The ID of the security group from which the traffic originates from.
    cidr_blocks              = list(string) # The list of IP CIDR blocks from which the traffic originates from.
  }))
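A short sketch of one rule in the shape of the custom_ingress_security_group_rules type above: it allows HTTPS from a CIDR range and sets the unused source to null, as the type comments require. The rule identifier and CIDR block are placeholders. Egress rules follow the same pattern with target_security_group_id in place of source_security_group_id.

```hcl
custom_ingress_security_group_rules = {
  # The map key is any unique identifier for the rule (hypothetical name).
  "allow-https-from-vpc" = {
    from_port = 443
    to_port   = 443
    protocol  = "tcp"

    # Only one source may be defined; the other must be null.
    source_security_group_id = null
    cidr_blocks              = ["10.0.0.0/16"] # placeholder CIDR
  }
}
```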
Default: {}

Parameters for the worker CPU usage widget to output for use in a CloudWatch dashboard.
object({
    # The period in seconds for metrics to sample across.
    period = number
    # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
    # space.
    width  = number
    height = number
  })
Default: {
  height = 6,
  period = 60,
  width = 8
}
Parameters for the worker disk usage widget to output for use in a CloudWatch dashboard.
object({
    # The period in seconds for metrics to sample across.
    period = number
    # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
    # space.
    width  = number
    height = number
  })
Default: {
  height = 6,
  period = 60,
  width = 8
}
Parameters for the worker memory usage widget to output for use in a CloudWatch dashboard.
object({
    # The period in seconds for metrics to sample across.
    period = number
    # The width and height of the widget in grid units in a 24 column grid. E.g., a value of 12 will take up half the
    # space.
    width  = number
    height = number
  })
Default: {
  height = 6,
  period = 60,
  width = 8
}
Set to true to enable several basic CloudWatch alarms around CPU usage, memory usage, and disk space usage. If set to true, make sure to specify SNS topics to send notifications to using alarms_sns_topic_arn.
Default: true

Set to true to add IAM permissions to send custom metrics to CloudWatch. This is useful in combination with https://github.com/gruntwork-io/terraform-aws-monitoring/tree/master/modules/agents/cloudwatch-agent to get memory and disk metrics in CloudWatch for your Bastion host.
Default: true

enable_fail2ban (bool)
Enable fail2ban to block brute force login attempts. Defaults to true.
Default: true

If you are using ssh-grunt and your IAM users / groups are defined in a separate AWS account, you can use this variable to specify the ARN of an IAM role that ssh-grunt can assume to retrieve IAM group and public SSH key info from that account. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
Default: ""

Custom name for the IAM role for the Managed Node Groups. When null, a default name based on worker_name_prefix will be used. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
Default: null

Whether or not the IAM role used for the Managed Node Group workers already exists. When false, this module will create a new IAM role.
Default: false

ARN of the IAM role to use if managed_node_group_iam_role_already_exists = true. When null, uses managed_node_group_custom_iam_role_name to look up the ARN. One of managed_node_group_custom_iam_role_name and managed_node_group_iam_role_arn is required (must be non-null) if managed_node_group_iam_role_already_exists is true.
Default: null

Default value for capacity_type field of managed_node_group_configurations.
Default: "ON_DEMAND"

Default value for desired_size field of managed_node_group_configurations.
Default: 1

Default value for enable_detailed_monitoring field of managed_node_group_configurations.
Default: true

Default value for the instance_root_volume_encryption field of managed_node_group_configurations.
Default: true

Default value for the instance_root_volume_size field of managed_node_group_configurations.
Default: 40

Default value for the instance_root_volume_type field of managed_node_group_configurations.
Default: "gp3"

node_group_default_instance_types (list(string))
Default value for instance_types field of managed_node_group_configurations.
Default: null

node_group_default_labels (map(string))
Default value for labels field of managed_node_group_configurations. Unlike common_labels which will always be merged in, these labels are only used if the labels field is omitted from the configuration.
Default: {}

Default value for the max_pods_allowed field of managed_node_group_configurations. Any map entry that does not specify max_pods_allowed will use this value.
Default: null

Default value for max_size field of managed_node_group_configurations.
Default: 1

Default value for min_size field of managed_node_group_configurations.
Default: 1

node_group_default_subnet_ids (list(string))
Default value for subnet_ids field of managed_node_group_configurations.
Default: null

node_group_default_tags (map(string))
Default value for tags field of managed_node_group_configurations. Unlike common_tags which will always be merged in, these tags are only used if the tags field is omitted from the configuration.
Default: {}

The instance type to configure in the launch template. This value will be used when the instance_types field is set to null (NOT omitted, in which case node_group_default_instance_types will be used).
Default: null

node_group_names (list(string))
The names of the node groups. When null, this value is automatically calculated from the managed_node_group_configurations map. This variable must be set if any of the values of the managed_node_group_configurations map depends on a resource that is not available at plan time to work around terraform limitations with for_each.
Default: null

node_group_security_group_tags (map(string))
A map of tags to apply to the Security Group of the ASG for the managed node group pool. The key is the tag name and the value is the tag value.
Default: {}

ssh_grunt_iam_group (string)
If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
Default: "ssh-grunt-users"

ssh_grunt_iam_group_sudo (string)
If you are using ssh-grunt, this is the name of the IAM group from which users will be allowed to SSH to the EKS workers with sudo permissions. To omit this variable, set it to an empty string (do NOT use null, or Terraform will complain).
Default: "ssh-grunt-sudo-users"

tenancy (string)
The tenancy of the servers in the self-managed worker ASG. Must be one of: default, dedicated, or host.
Default: "default"

If this variable is set to true, then use an exec-based plugin to authenticate and fetch tokens for EKS. This is useful because EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy', and since the native Kubernetes provider in Terraform doesn't have a way to fetch up-to-date tokens, we recommend using an exec-based provider as a workaround. Use the use_kubergrunt_to_fetch_token input variable to control whether kubergrunt or aws is used to fetch tokens.
Default: true

EKS clusters use short-lived authentication tokens that can expire in the middle of an 'apply' or 'destroy'. To avoid this issue, we use an exec-based plugin to fetch an up-to-date token. If this variable is set to true, we'll use kubergrunt to fetch the token (in which case, kubergrunt must be installed and on PATH); if this variable is set to false, we'll use the aws CLI to fetch the token (in which case, aws must be installed and on PATH). Note this functionality is only enabled if use_exec_plugin_for_auth is set to true.
Default: true

When true, all IAM policies will be managed as dedicated policies rather than inline policies attached to the IAM roles. Dedicated managed policies are friendlier to automated policy checkers, which may scan a single resource for findings. As such, it is important to avoid inline policies when targeting compliance with various security standards.
Default: true

When true, assumes prefix delegation mode is in use for the AWS VPC CNI component of the EKS cluster when computing max pods allowed on the node. In prefix delegation mode, each ENI will be allocated 16 IP addresses (/28) instead of 1, allowing you to pack more Pods per node.
Default: false

Name of the IAM role to Kubernetes RBAC group mapping ConfigMap. Only used if aws_auth_merger_namespace is not null.
Default: "eks-cluster-worker-iam-mapping"

worker_name_prefix (string)
Prefix EKS worker resource names with this string. When you have multiple worker groups for the cluster, you can use this to namespace the resources. Defaults to empty string so that resource names are not excessively long by default.
Default: ""

Outputs
Map of Node Group names to ARNs of the created EKS Node Groups.
The ARN of the IAM role associated with the Managed Node Group EKS workers.
The name of the IAM role associated with the Managed Node Group EKS workers.
Map of Node Group names to Auto Scaling Group security group IDs. Empty if cluster_instance_keypair_name is not set.
The ID of the common AWS Security Group associated with all the managed EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the Managed Node Group EKS workers.
A CloudWatch Dashboard widget that graphs CPU usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs disk usage (percentage) of the self-managed EKS workers.
A CloudWatch Dashboard widget that graphs memory usage (percentage) of the self-managed EKS workers.
The ARN of the IAM role associated with the self-managed EKS workers.
The name of the IAM role associated with the self-managed EKS workers.
The ID of the AWS Security Group associated with the self-managed EKS workers.
The list of names of the ASGs that were deployed to act as EKS workers.