AWS Solutions Architect Associate: Domain 1 - Design Resilient Architectures
Complete guide to AWS Domain 1: designing resilient architectures for high availability and fault tolerance. Covers multi-AZ deployments, Auto Scaling, and architectural patterns.
Introduction
Domain 1 of the AWS Solutions Architect Associate certification focuses on designing resilient architectures that can withstand failures and maintain availability. In the SAA-C02 exam guide this domain accounts for 30% of the exam; in SAA-C03 the same material appears as Domain 2 with a 26% weighting.
Multi-AZ Deployment Strategy
Understanding Availability Zones
AWS regions are divided into multiple Availability Zones (AZs). Each AZ:
- Consists of one or more discrete data centers
- Has independent power, cooling, and networking
- Is connected to the other AZs in the region via redundant, low-latency network links
- Is isolated from failures in other AZs
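The benefit of AZ isolation can be put into rough numbers. The sketch below assumes AZ failures are statistically independent, which is an idealization, and uses a made-up 99% per-AZ availability rather than any AWS figure; `combined_availability` is a hypothetical helper.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Availability of a service that stays up while at least one AZ is up.

    Assumes AZ failures are independent; real failures can be correlated.
    """
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("per_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The service is down only when every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


if __name__ == "__main__":
    # 0.99 is an illustrative value, not an AWS SLA.
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6f}")
```

Under these assumptions each additional AZ multiplies the chance of a full outage by the per-AZ failure probability, which is why two or three AZs are usually enough for a regional workload.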
Multi-AZ RDS Deployment
    DBInstance:
      Engine: postgres
      MultiAZ: true
      BackupRetentionPeriod: 30
      PreferredBackupWindow: "03:00-04:00"
      PreferredMaintenanceWindow: "sun:04:00-sun:05:00"
      # Failover to the standby is automatic and typically
      # completes in 1-2 minutes.

Multi-AZ Application Architecture
    Region (e.g., us-east-1)
    ├── AZ-1 (us-east-1a)
    │   ├── EC2 instance (app server)
    │   └── RDS read replica
    ├── AZ-2 (us-east-1b)
    │   ├── EC2 instance (app server)
    │   └── RDS read replica
    └── AZ-3 (us-east-1c)
        ├── EC2 instance (app server)
        └── RDS primary with Multi-AZ

Elastic Load Balancing
Application Load Balancer (ALB)
Best for HTTP/HTTPS traffic with advanced routing.
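"Advanced routing" here refers to content-based listener rules, such as routing by URL path. A minimal sketch of how such rules could be evaluated, assuming rules are checked in ascending priority order with a default action as fallback; the priorities, path patterns, and target group names are hypothetical, and `fnmatchcase` only approximates ALB wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules: (priority, path pattern, target group name).
RULES = [
    (10, "/api/*", "api-servers"),
    (20, "/static/*", "static-servers"),
]
DEFAULT_TARGET_GROUP = "web-servers"


def route(path: str) -> str:
    """Return the target group for a request path.

    Rules are evaluated in ascending priority order; the first match wins,
    and the default target group applies when nothing matches.
    """
    for _priority, pattern, target_group in sorted(RULES):
        if fnmatchcase(path, pattern):
            return target_group
    return DEFAULT_TARGET_GROUP


if __name__ == "__main__":
    for p in ("/api/users", "/static/css/site.css", "/index.html"):
        print(p, "->", route(p))
```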
    LoadBalancer:
      Type: application
      Scheme: internet-facing
      Listeners:
        - Port: 80
          Protocol: HTTP
          DefaultAction: redirect-to-https
        - Port: 443
          Protocol: HTTPS
          DefaultAction: forward-to-target-group
    TargetGroup:
      Port: 8080
      Protocol: HTTP
      HealthCheck:
        Path: /health
        Interval: 30            # seconds
        HealthyThreshold: 2
        UnhealthyThreshold: 3

Network Load Balancer (NLB)
Operates at Layer 4 (TCP, UDP, TLS) and is built for extreme performance: millions of requests per second at very low latency.
    aws elbv2 create-load-balancer \
      --name my-nlb \
      --type network \
      --scheme internet-facing \
      --subnets subnet-1 subnet-2

Classic Load Balancer (ELB)
Legacy load balancer (not recommended for new applications).
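The guidance in this section can be condensed into a small decision helper. This is only a sketch: `choose_load_balancer` is a hypothetical function, and it considers the listener protocol alone, whereas a real choice also weighs features such as static IPs, WebSocket support, and cost.

```python
def choose_load_balancer(protocol: str) -> str:
    """Map a listener protocol to a load balancer type.

    Returns the value used by the `--type` option of
    `aws elbv2 create-load-balancer`.
    """
    protocol = protocol.upper()
    if protocol in {"HTTP", "HTTPS"}:
        return "application"  # Layer 7: content-based routing
    if protocol in {"TCP", "UDP", "TLS"}:
        return "network"      # Layer 4: very high throughput, low latency
    raise ValueError(f"unsupported protocol: {protocol}")


if __name__ == "__main__":
    print(choose_load_balancer("https"))
    print(choose_load_balancer("tcp"))
```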
Auto Scaling
Launch Templates and Configurations
    LaunchTemplate:
      ImageId: ami-12345
      InstanceType: t3.medium
      SecurityGroupIds:
        - sg-12345
      TagSpecifications:
        - ResourceType: instance
          Tags:
            - Key: Name
              Value: app-server

Auto Scaling Group Configuration
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name app-asg \
      --launch-template LaunchTemplateName=app-lt \
      --min-size 2 \
      --max-size 10 \
      --desired-capacity 4 \
      --vpc-zone-identifier "subnet-1,subnet-2,subnet-3"

Scaling Policies
Target Tracking Scaling
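Target tracking keeps a metric near a target value by adding or removing capacity. AWS does not publish the exact algorithm, so the following is only a proportional approximation of the idea, using the 70% CPU target from this section and the group bounds of 2 and 10 from the Auto Scaling group above; `estimate_desired_capacity` is a hypothetical helper.

```python
import math


def estimate_desired_capacity(current_capacity: int, current_metric: float,
                              target_value: float, min_size: int,
                              max_size: int) -> int:
    """Proportional estimate of the capacity needed to reach the target metric."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Example: 4 instances at 90% CPU with a 70% target need
    # ceil(4 * 90 / 70) = 6 instances, which would average 60% CPU.
    raw = math.ceil(current_capacity * current_metric / target_value)
    # The group never leaves its configured min/max bounds.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    print(estimate_desired_capacity(4, 90.0, 70.0, min_size=2, max_size=10))
```

The policy configuration itself only declares the metric and the target; the service decides how many instances to add or remove.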
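    {
      "TargetTrackingConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "DisableScaleIn": false
      }
    }

Note: EC2 Auto Scaling target tracking policies do not accept ScaleOutCooldown or ScaleInCooldown; those fields belong to Application Auto Scaling. For an Auto Scaling group, the instance warmup time serves a similar purpose.

Step Scaling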
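Step scaling maps the size of an alarm breach to a capacity change: the further the metric is past the alarm threshold, the larger the adjustment. A minimal sketch of that lookup for the scale-out case, assuming a ChangeInCapacity adjustment type; the step bounds and adjustments are illustrative values, and `step_adjustment` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (lower bound, upper bound, capacity change). Bounds are offsets from the
# alarm threshold; None means unbounded. All values are illustrative.
Step = Tuple[Optional[float], Optional[float], int]

STEPS: List[Step] = [
    (0.0, 10.0, 1),   # breach of 0 up to 10  -> add 1 instance
    (10.0, 20.0, 2),  # breach of 10 up to 20 -> add 2 instances
    (20.0, None, 4),  # breach of 20 or more  -> add 4 instances
]


def step_adjustment(metric: float, threshold: float,
                    steps: List[Step] = STEPS) -> int:
    """Return the capacity change for a metric value, or 0 if no step applies."""
    breach = metric - threshold
    for lower, upper, change in steps:
        # Scale-out convention: lower bound inclusive, upper bound exclusive.
        above_lower = lower is None or breach >= lower
        below_upper = upper is None or breach < upper
        if above_lower and below_upper:
            return change
    return 0


if __name__ == "__main__":
    for cpu in (60, 75, 85, 95):
        print(cpu, step_adjustment(cpu, threshold=70))
```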
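    # Illustrative step: add 2 instances once the alarm threshold is breached.
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name app-asg \
      --policy-name scale-up \
      --policy-type StepScaling \
      --adjustment-type ChangeInCapacity \
      --metric-aggregation-type Average \
      --step-adjustments MetricIntervalLowerBound=0,ScalingAdjustment=2

RTO and RPO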
Definitions
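- RTO (Recovery Time Objective): Maximum acceptable time to restore service after a failure
- RPO (Recovery Point Objective): Maximum acceptable data loss, measured as the time between the last recoverable copy and the failure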
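These two definitions translate directly into a check on a design. The sketch below assumes that worst-case data loss equals the interval between backups and that recovery time comes from a measured drill; `meets_objectives` and the sample figures are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, measured_recovery: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """Check a design against its recovery objectives.

    Data written since the last backup is lost in a failure, so the backup
    interval must not exceed the RPO. The recovery time measured in a drill
    must not exceed the RTO.
    """
    return backup_interval <= rpo and measured_recovery <= rto


if __name__ == "__main__":
    ok = meets_objectives(
        backup_interval=timedelta(minutes=5),
        measured_recovery=timedelta(minutes=45),
        rto=timedelta(hours=1),
        rpo=timedelta(minutes=15),
    )
    print("meets objectives:", ok)
```

A nightly backup, for example, cannot satisfy a 15-minute RPO no matter how fast the restore is.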
    Tier 1 - Critical:
      RTO: 1 hour
      RPO: 15 minutes
      Strategy: Multi-region active-active
    Tier 2 - Important:
      RTO: 4 hours
      RPO: 1 hour
      Strategy: Multi-region active-passive
    Tier 3 - Standard:
      RTO: 24 hours
      RPO: 24 hours
      Strategy: Single region with backups

Backup and Recovery Strategies
AWS Backup
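The commands in this section read `backup-plan.json` and `backup-selection.json`, which this guide does not show. A minimal sketch of generating them follows. The plan name, schedule, retention, tag, and role ARN (including the account ID) are illustrative assumptions, and the vault name assumes the account's default vault exists.

```python
import json
from pathlib import Path


def build_backup_plan() -> dict:
    """Backup plan document: one daily rule with 30-day retention."""
    return {
        "BackupPlanName": "daily-backups",  # hypothetical name
        "Rules": [
            {
                "RuleName": "daily-0300-utc",
                "TargetBackupVaultName": "Default",  # assumes the default vault
                "ScheduleExpression": "cron(0 3 * * ? *)",  # daily at 03:00 UTC
                "Lifecycle": {"DeleteAfterDays": 30},
            }
        ],
    }


def build_backup_selection() -> dict:
    """Backup selection document: every resource tagged backup=daily."""
    return {
        "SelectionName": "tagged-resources",
        # Placeholder account ID and role name; replace with a real role ARN.
        "IamRoleArn": "arn:aws:iam::123456789012:role/BackupRole",
        "ListOfTags": [
            {
                "ConditionType": "STRINGEQUALS",
                "ConditionKey": "backup",
                "ConditionValue": "daily",
            }
        ],
    }


def write_json(path: Path, document: dict) -> None:
    """Write a document as pretty-printed JSON."""
    path.write_text(json.dumps(document, indent=2))


if __name__ == "__main__":
    write_json(Path("backup-plan.json"), build_backup_plan())
    write_json(Path("backup-selection.json"), build_backup_selection())
```

Selecting resources by tag rather than by ARN means newly created resources are picked up as soon as they carry the tag.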
    # Create backup plan
    aws backup create-backup-plan \
      --backup-plan file://backup-plan.json

    # Assign resources
    aws backup create-backup-selection \
      --backup-plan-id plan-123 \
      --backup-selection file://backup-selection.json

Cross-Region Backup
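    # Replicate automated RDS backups to another region.
    # Run against the destination region. The account ID and the
    # 7-day retention are placeholders.
    aws rds start-db-instance-automated-backups-replication \
      --region us-west-2 \
      --source-db-instance-arn arn:aws:rds:us-east-1:123456789012:db:prod-db \
      --backup-retention-period 7

Fault-Tolerant Architecture Pattern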
    ┌─────────────────────────────────────┐
    │           Route 53 (DNS)            │
    │       Failover Routing Policy       │
    └──────────────────┬──────────────────┘
                       │
            ┌──────────┴──────────┐
            │                     │
    ┌───────▼───────┐     ┌───────▼───────┐
    │   Region 1    │     │   Region 2    │
    │   (Active)    │     │   (Standby)   │
    │   us-east-1   │     │   us-west-2   │
    │               │     │               │
    │ ┌───────────┐ │     │ ┌───────────┐ │
    │ │    ALB    │ │     │ │    ALB    │ │
    │ └─────┬─────┘ │     │ └─────┬─────┘ │
    │       │       │     │       │       │
    │ ┌─────▼─────┐ │     │ ┌─────▼─────┐ │
    │ │    ASG    │ │     │ │    ASG    │ │
    │ │  (2-10)   │ │     │ │   (0-5)   │ │
    │ └───────────┘ │     │ └───────────┘ │
    │               │     │               │
    │ ┌───────────┐ │     │ ┌───────────┐ │
    │ │ Multi-AZ  │ │     │ │ Multi-AZ  │ │
    │ │    RDS    │ │     │ │    RDS    │ │
    │ │ (Primary) │ │     │ │ (Replica) │ │
    │ └───────────┘ │     │ └───────────┘ │
    └───────────────┘     └───────────────┘

Health Checks
ELB Health Check Configuration
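A load balancer changes a target's state only after a run of consecutive check results, which is what the two threshold settings control. The sketch below simulates that behavior with the values used in this section (HealthyThreshold 2, UnhealthyThreshold 3). `track_health` is a hypothetical helper, and it assumes the target starts out healthy.

```python
from typing import Iterable, List


def track_health(results: Iterable[bool], healthy_threshold: int = 2,
                 unhealthy_threshold: int = 3,
                 initially_healthy: bool = True) -> List[bool]:
    """Return the target's state (True = healthy) after each check result."""
    healthy = initially_healthy
    streak = 0  # consecutive results that disagree with the current state
    states = []
    for passed in results:
        if passed == healthy:
            streak = 0  # result agrees with the current state; reset the run
        else:
            streak += 1
            needed = healthy_threshold if passed else unhealthy_threshold
            if streak >= needed:
                healthy = passed  # enough consecutive results: flip the state
                streak = 0
        states.append(healthy)
    return states


if __name__ == "__main__":
    checks = [True, False, False, False, True, True]
    print(track_health(checks))
```

With a 30-second interval and an unhealthy threshold of 3, a failed target keeps receiving traffic for roughly 90 seconds before it is taken out of rotation, so the thresholds trade detection speed against tolerance for transient errors.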
    HealthCheck:
      Target: HTTP:8080/health
      Interval: 30 seconds
      Timeout: 5 seconds
      HealthyThreshold: 2
      UnhealthyThreshold: 3

Custom Health Checks
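    from flask import Flask, jsonify
    import psutil

    app = Flask(__name__)

    def check_database():
        # Placeholder: replace with a real connectivity test (e.g. SELECT 1).
        return True

    def check_disk_space():
        # Assumed threshold: healthy while the root volume is under 90% full.
        return psutil.disk_usage('/').percent < 90

    def check_memory():
        # Assumed threshold: healthy while memory use is under 90%.
        return psutil.virtual_memory().percent < 90

    @app.route('/health', methods=['GET'])
    def health_check():
        checks = {
            'database': check_database(),
            'disk_space': check_disk_space(),
            'memory': check_memory()
        }
        healthy = all(checks.values())
        status = 'healthy' if healthy else 'unhealthy'
        # Return 503 when unhealthy so the load balancer sees a failed check.
        return jsonify({'status': status, 'checks': checks}), (200 if healthy else 503)

Common Exam Questions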
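Q: How long does a Multi-AZ RDS failover take?
A: Automatic failover typically completes in 1-2 minutes. This is a typical duration, not a guaranteed maximum.
Q: Can you use the same security group across AZs?
A: Yes. A security group belongs to a VPC, and a VPC spans every AZ in its region, so one group can be attached to resources in any of those AZs.
Q: What happens during an ASG scale-in event?
A: Instances are terminated according to the group's termination policy. The default policy first picks the AZ with the most instances, then prefers instances using the oldest launch configuration or launch template, then the instance closest to its next billing hour.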
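The scale-in answer can be illustrated with a simplified selection routine. This is a sketch, not the actual service logic: it ignores scale-in protection, Spot allocation strategies, and billing-hour proximity, and it breaks ties by oldest launch time, which is an assumption. `Instance` and `pick_instance_to_terminate` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Instance:
    instance_id: str
    az: str
    template_version: int  # lower number = older launch template version
    launch_time: datetime


def pick_instance_to_terminate(instances: List[Instance]) -> Instance:
    """Pick from the fullest AZ, preferring the oldest template, then oldest launch."""
    if not instances:
        raise ValueError("no instances to choose from")
    per_az = Counter(i.az for i in instances)
    # Sorting AZ names first makes ties between equally full AZs deterministic.
    fullest_az = max(sorted(per_az), key=lambda az: per_az[az])
    candidates = [i for i in instances if i.az == fullest_az]
    return min(candidates, key=lambda i: (i.template_version, i.launch_time))


if __name__ == "__main__":
    fleet = [
        Instance("i-a", "us-east-1a", 2, datetime(2024, 1, 1)),
        Instance("i-b", "us-east-1a", 1, datetime(2024, 2, 1)),
        Instance("i-c", "us-east-1b", 1, datetime(2023, 1, 1)),
    ]
    print(pick_instance_to_terminate(fleet).instance_id)
```

Choosing from the fullest AZ first keeps the group balanced across zones, so a scale-in event does not erode the multi-AZ resilience the group was built for.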
Key Takeaways
- Design for multiple AZs within a region
- Use Auto Scaling for elasticity
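- Implement health checks that reflect real application health and fail when a dependency is down
- Define RTO/RPO for each application tier
- Choose the load balancer type that matches the traffic (ALB for HTTP/HTTPS, NLB for high-throughput Layer 4)
- Back up regularly and test recovery procedures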