AWS Solutions Architect Associate: Domain 1 - Design Resilient Architectures

Complete guide to AWS Domain 1: designing resilient architectures for high availability and fault tolerance. Covers multi-AZ deployments, Auto Scaling, and architectural patterns.

Posted Nov 5, 2025


Introduction

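Domain 1 of the AWS Solutions Architect Associate certification focuses on designing resilient architectures that can withstand failures and maintain availability. This numbering and the roughly 30% weighting come from the older SAA-C02 exam guide. In the current SAA-C03 guide, Design Resilient Architectures is Domain 2 and accounts for 26% of the scored content.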

Multi-AZ Deployment Strategy

Understanding Availability Zones

Each AWS Region contains multiple Availability Zones (AZs). Each AZ:

Consists of one or more discrete data centers
Has independent power, cooling, and networking
Is connected to the other AZs in the Region through redundant, low-latency links
Is isolated from failures in other AZs
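To see why independent failure domains matter, here is a back-of-the-envelope calculation. It assumes, as a simplification, that AZ outages are independent, that each AZ is available a fraction `p` of the time (the 99% value below is hypothetical, not an AWS SLA figure), and that the workload keeps serving as long as at least one AZ is up. Under those assumptions availability is 1 - (1 - p)^n for n AZs.

```python
def multi_az_availability(p: float, n_azs: int) -> float:
    """Probability that at least one of n independent AZs is up.

    Assumes AZ failures are independent and that any single surviving
    AZ can carry the workload (a deliberate simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_azs < 1:
        raise ValueError("need at least one AZ")
    # All n AZs must fail at the same time for the workload to go down.
    return 1.0 - (1.0 - p) ** n_azs


if __name__ == "__main__":
    p = 0.99  # hypothetical per-AZ availability
    for n in (1, 2, 3):
        print(f"{n} AZ(s): {multi_az_availability(p, n):.6f}")
```

With p = 0.99 this prints 0.990000, 0.999900 and 0.999999. Real AZ failures are not perfectly independent, so treat the numbers as an upper bound on the benefit.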

Multi-AZ RDS Deployment

  
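```yaml
# CloudFormation resource for a Multi-AZ PostgreSQL instance.
# Instance class, storage size, and username are example values.
DBInstance:
  Type: AWS::RDS::DBInstance
  Properties:
    Engine: postgres
    DBInstanceClass: db.m5.large
    AllocatedStorage: "100"
    MasterUsername: appadmin
    ManageMasterUserPassword: true   # password kept in Secrets Manager
    MultiAZ: true                    # synchronous standby in a second AZ
    BackupRetentionPeriod: 30        # days
    PreferredBackupWindow: "03:00-04:00"               # UTC
    PreferredMaintenanceWindow: "sun:04:00-sun:05:00"  # UTC, must not overlap backups
# Failover to the standby is automatic (there is no property to set);
# it typically completes in 60-120 seconds.
```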
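RDS requires the preferred backup window to be at least 30 minutes long and not to conflict with the preferred maintenance window (both in UTC). A small pre-deployment check can catch a conflict before the stack update fails. This is a sketch: the function name is hypothetical, and treating windows that merely touch (03:00-04:00 and 04:00-05:00) as non-overlapping is an assumption about how RDS compares them.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_WEEK = 7 * MINUTES_PER_DAY
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def _minutes(hh_mm: str) -> int:
    """'03:30' -> minutes since midnight."""
    hours, minutes = hh_mm.split(":")
    return int(hours) * 60 + int(minutes)


def _week_minutes(ddd_hh_mm: str) -> int:
    """'sun:04:00' -> minutes since Monday 00:00."""
    day, hh_mm = ddd_hh_mm.split(":", 1)
    return DAYS.index(day.lower()) * MINUTES_PER_DAY + _minutes(hh_mm)


def windows_conflict(backup: str, maintenance: str) -> bool:
    """True if a daily backup window ('hh:mi-hh:mi') overlaps a weekly
    maintenance window ('ddd:hh:mi-ddd:hh:mi'). Both are UTC."""
    b_start, b_end = (_minutes(t) for t in backup.split("-"))
    if b_end <= b_start:  # backup window crosses midnight
        b_end += MINUTES_PER_DAY
    if b_end - b_start < 30:
        raise ValueError("backup window must be at least 30 minutes")

    m_start, m_end = (_week_minutes(t) for t in maintenance.split("-"))
    if m_end <= m_start:  # maintenance window crosses Sunday -> Monday
        m_end += MINUTES_PER_WEEK

    # The backup window recurs every day; test each day's occurrence, also
    # shifted by one week so wrap-around windows are compared correctly.
    for day in range(7):
        for shift in (-MINUTES_PER_WEEK, 0, MINUTES_PER_WEEK):
            start = b_start + day * MINUTES_PER_DAY + shift
            end = b_end + day * MINUTES_PER_DAY + shift
            # Half-open intervals: windows that only touch do not conflict.
            if start < m_end and m_start < end:
                return True
    return False


if __name__ == "__main__":
    print(windows_conflict("03:00-04:00", "sun:04:00-sun:05:00"))
    print(windows_conflict("03:00-04:00", "sun:03:30-sun:04:30"))
```

The first call checks the windows from the template and returns False; the second, with a maintenance window starting at 03:30, returns True.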

Multi-AZ Application Architecture

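```text
Region (e.g., us-east-1)
├── AZ-1 (us-east-1a)
│   ├── EC2 instance (app server)
│   └── RDS read replica
├── AZ-2 (us-east-1b)
│   ├── EC2 instance (app server)
│   └── RDS read replica
└── AZ-3 (us-east-1c)
    ├── EC2 instance (app server)
    └── RDS primary (Multi-AZ: its synchronous standby runs in another AZ)
```

The read replicas serve read traffic. The Multi-AZ standby of a DB instance is not readable; it exists only for failover.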
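Spreading servers across AZs only helps if the surviving AZs can absorb the load when one fails. A common sizing rule, often discussed under the name static stability, is to provision enough capacity that losing any single AZ still leaves the required number of instances. The sketch below assumes an even spread across AZs; `required` is a hypothetical peak instance count, and the function name is illustrative.

```python
import math


def instances_for_az_loss(required: int, n_azs: int) -> tuple[int, int]:
    """Return (per_az, total) so that `required` instances remain after
    any single AZ fails, assuming instances are spread evenly."""
    if n_azs < 2:
        raise ValueError("surviving an AZ failure needs at least two AZs")
    # The n_azs - 1 surviving AZs must together hold `required` instances.
    per_az = math.ceil(required / (n_azs - 1))
    return per_az, per_az * n_azs


if __name__ == "__main__":
    for azs in (2, 3):
        per_az, total = instances_for_az_loss(6, azs)
        print(f"{azs} AZs: {per_az} per AZ, {total} total")
```

For a workload needing 6 instances, two AZs require 12 in total while three AZs require only 9, which is one reason three-AZ layouts like the one above are common.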

Elastic Load Balancing

Application Load Balancer (ALB)

Layer 7 load balancer for HTTP/HTTPS traffic, with content-based routing on path, host name, headers, and query strings.

  
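```yaml
# Schematic ALB setup (field names simplified, not a deployable template)
LoadBalancer:
  Type: application
  Scheme: internet-facing
  Listeners:
    - Port: 80
      Protocol: HTTP
      DefaultAction: redirect-to-https
    - Port: 443
      Protocol: HTTPS      # requires a server certificate (e.g., from ACM)
      DefaultAction: forward-to-target-group
TargetGroup:
  Port: 8080
  Protocol: HTTP
  HealthCheck:
    Path: /health
    IntervalSeconds: 30
    HealthyThresholdCount: 2
    UnhealthyThresholdCount: 3
```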
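ALB's advanced routing comes from listener rules. Each rule has a priority and conditions such as a path pattern; rules are evaluated from the lowest priority number upward, and the listener's default action applies when nothing matches. Path patterns are case-sensitive and support the `*` (any run of characters) and `?` (exactly one character) wildcards. The sketch below models only path-pattern conditions; the rules and target group names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    priority: int
    path_pattern: str   # ALB-style: '*' = any run of chars, '?' = one char
    target_group: str


def _pattern_to_regex(pattern: str) -> re.Pattern:
    # Escape everything except the two wildcards ALB supports.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def route(path: str, rules: list[Rule], default_target: str) -> str:
    """Return the target group the first matching rule forwards to."""
    for rule in sorted(rules, key=lambda r: r.priority):
        # fullmatch is case-sensitive, like ALB path patterns.
        if _pattern_to_regex(rule.path_pattern).fullmatch(path):
            return rule.target_group
    return default_target


if __name__ == "__main__":
    rules = [
        Rule(10, "/api/*", "api-tg"),
        Rule(20, "/images/*", "static-tg"),
    ]
    for p in ("/api/orders", "/images/logo.png", "/", "/API/orders"):
        print(p, "->", route(p, rules, "web-tg"))
```

Note that `/API/orders` falls through to the default target group because matching is case-sensitive.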

Network Load Balancer (NLB)

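Layer 4 (TCP, UDP, TLS) load balancer built for extreme performance: it can handle millions of requests per second at very low latency and supports a static IP address per AZ.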

  
```bash
# subnet-1 and subnet-2 are placeholders; use subnets in different AZs
aws elbv2 create-load-balancer \
  --name my-nlb \
  --type network \
  --scheme internet-facing \
  --subnets subnet-1 subnet-2
```

Classic Load Balancer (CLB)

Previous-generation load balancer in the Elastic Load Balancing (ELB) family; not recommended for new applications.

Auto Scaling

Launch Templates and Configurations

  
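AWS recommends launch templates for new Auto Scaling groups; launch configurations are the legacy mechanism.

```yaml
AppLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: app-lt
    LaunchTemplateData:
      ImageId: ami-12345        # placeholder AMI ID
      InstanceType: t3.medium
      SecurityGroupIds:
        - sg-12345              # placeholder security group ID
      TagSpecifications:
        - ResourceType: instance
          Tags:
            - Key: Name
              Value: app-server
```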

Auto Scaling Group Configuration

  
```bash
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name app-asg \
  --launch-template LaunchTemplateName=app-lt \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 4 \
  --vpc-zone-identifier "subnet-1,subnet-2,subnet-3"
```
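With three subnets in different AZs and a desired capacity of 4, EC2 Auto Scaling tries to keep the number of instances in each AZ balanced, so one AZ ends up with one extra instance. The sketch below reproduces only that even-spread arithmetic; which AZ actually receives the extra instance is decided by Auto Scaling and is not modeled, and the subnet names are the placeholders from the command above.

```python
def balanced_spread(desired: int, subnets: list[str]) -> dict[str, int]:
    """Spread `desired` instances as evenly as possible across subnets.

    Every subnet gets desired // n instances; the first desired % n
    subnets get one extra (a real ASG picks where extras go itself).
    """
    if not subnets:
        raise ValueError("at least one subnet is required")
    base, extra = divmod(desired, len(subnets))
    return {s: base + (1 if i < extra else 0) for i, s in enumerate(subnets)}


if __name__ == "__main__":
    print(balanced_spread(4, ["subnet-1", "subnet-2", "subnet-3"]))
```

This prints `{'subnet-1': 2, 'subnet-2': 1, 'subnet-3': 1}`: no AZ differs from another by more than one instance.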

Scaling Policies

Target Tracking Scaling

  
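Keep the group's average CPU utilization near 70%. EC2 Auto Scaling target tracking policies have no ScaleOutCooldown/ScaleInCooldown settings (those belong to Application Auto Scaling); the instance warm-up controls how soon new instances count toward the metric.

```json
{
  "AutoScalingGroupName": "app-asg",
  "PolicyType": "TargetTrackingScaling",
  "EstimatedInstanceWarmup": 300,
  "TargetTrackingConfiguration": {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
  }
}
```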
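Target tracking behaves like a thermostat: when the group's average CPU drifts away from the 70% target, Auto Scaling adds or removes capacity to pull the metric back. AWS does not publish the exact algorithm, so the sketch below uses a simple proportional approximation (capacity scaled by metric / target, rounded up) clamped to the group's minimum and maximum size. Treat it as an intuition aid under that assumption, not as AWS's implementation; the function name is hypothetical.

```python
import math


def proportional_capacity(current: int, metric: float, target: float,
                          min_size: int, max_size: int) -> int:
    """Approximate the capacity that brings `metric` back to `target`.

    Assumes the metric scales inversely with capacity (e.g. average CPU
    across identical instances). Rounds up so the target is not exceeded,
    then clamps to the Auto Scaling group's bounds.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))


if __name__ == "__main__":
    # Group from the earlier example: min 2, max 10, currently 4 instances.
    print(proportional_capacity(4, 90.0, 70.0, 2, 10))  # CPU running hot
    print(proportional_capacity(4, 30.0, 70.0, 2, 10))  # CPU mostly idle
```

At 90% CPU the estimate grows the group from 4 to 6 instances; at 30% it shrinks toward 2, the group's minimum.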

Step Scaling

  
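```bash
# Attach this policy to a CloudWatch alarm (e.g. average CPU >= 70%). Bounds are
# offsets from the alarm threshold: +1 instance for 0-20 above, +2 for 20+ above.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name app-asg \
  --policy-name scale-up \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --metric-aggregation-type Average \
  --step-adjustments MetricIntervalLowerBound=0,MetricIntervalUpperBound=20,ScalingAdjustment=1 \
    MetricIntervalLowerBound=20,ScalingAdjustment=2
```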
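Step scaling sizes the response to how far the metric has breached the CloudWatch alarm threshold, with step bounds expressed as offsets from that threshold. The sketch below assumes a hypothetical 70% CPU threshold and two hypothetical steps (+1 instance for a breach of 0 to 20 points, +2 beyond that) using the ChangeInCapacity adjustment type. It treats lower bounds as inclusive and upper bounds as exclusive, the convention for breaches above the threshold; the `Step` class and function name are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    lower: float            # offset above the alarm threshold (inclusive)
    upper: Optional[float]  # offset above the threshold (exclusive); None = no limit
    adjustment: int         # instances to add (ChangeInCapacity)


def step_adjustment(metric: float, threshold: float, steps: list[Step]) -> int:
    """Return the capacity change for a scale-out step scaling policy."""
    breach = metric - threshold
    if breach < 0:
        return 0  # below the threshold the alarm would not be firing
    for step in steps:
        upper = math.inf if step.upper is None else step.upper
        if step.lower <= breach < upper:
            return step.adjustment
    return 0


if __name__ == "__main__":
    # Hypothetical steps: +1 for 70-90% CPU, +2 at 90% and above.
    steps = [Step(0, 20, 1), Step(20, None, 2)]
    for cpu in (65.0, 80.0, 90.0, 97.0):
        print(cpu, "->", step_adjustment(cpu, 70.0, steps))
```

Unlike target tracking, the response is fixed per band, so a larger breach produces a larger jump in capacity.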

RTO and RPO

Definitions

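RTO (Recovery Time Objective): the maximum acceptable time between a disruption and the restoration of service
RPO (Recovery Point Objective): the maximum acceptable data loss, measured as the time since the last recoverable data point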

  
```yaml
Tier 1 - Critical:
  RTO: 1 hour
  RPO: 15 minutes
  Strategy: Multi-region active-active
Tier 2 - Important:
  RTO: 4 hours
  RPO: 1 hour
  Strategy: Multi-region active-passive
Tier 3 - Standard:
  RTO: 24 hours
  RPO: 24 hours
  Strategy: Single region with backups
```
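RPO translates directly into how often data must be captured. With periodic backups, the worst case is a failure just before the next backup runs, so the potential data loss is roughly one full backup interval plus any replication or copy lag. The check below applies that reasoning to the tiers above; the RPO values come from the tiers, while the capture intervals and the 10-minute lag are hypothetical inputs.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         replication_lag: timedelta = timedelta(0)) -> timedelta:
    """Data written just after one backup is lost if failure hits just
    before the next, so exposure is about one interval plus any lag."""
    return backup_interval + replication_lag


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              replication_lag: timedelta = timedelta(0)) -> bool:
    return worst_case_data_loss(backup_interval, replication_lag) <= rpo


if __name__ == "__main__":
    # Tier 3: daily backups against a 24-hour RPO.
    print(meets_rpo(timedelta(hours=24), timedelta(hours=24)))
    # Tier 2: hourly backups copied to another Region with ~10 minutes
    # of lag (hypothetical figures) against a 1-hour RPO.
    print(meets_rpo(timedelta(hours=1), timedelta(hours=1), timedelta(minutes=10)))
    # Tier 1: a hypothetical 5-minute capture interval against a 15-minute RPO.
    print(meets_rpo(timedelta(minutes=5), timedelta(minutes=15)))
```

The Tier 2 case fails: hourly backups alone meet a 1-hour RPO only if the copy to the second Region adds no lag.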

Backup and Recovery Strategies

AWS Backup

  
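```bash
# Create a backup plan from a JSON definition
aws backup create-backup-plan \
  --backup-plan file://backup-plan.json

# Assign resources to the plan (plan-123 is a placeholder; use the
# BackupPlanId returned by create-backup-plan)
aws backup create-backup-selection \
  --backup-plan-id plan-123 \
  --backup-selection file://backup-selection.json
```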
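The `backup-plan.json` file passed to `create-backup-plan` holds the plan name and its rules. The sketch below builds one with a single daily rule and prints it as JSON, ready to redirect into that file. The plan and rule names, the 05:00 UTC cron schedule, the 30-day retention, and the use of a vault named `Default` (assumed to exist) are illustrative assumptions. The optional `CopyActions` entry, pointing at a hypothetical vault ARN in a second Region, is what makes the plan copy each recovery point across Regions.

```python
import json
from typing import Optional

# Hypothetical destination vault in a second Region (placeholder account ID).
DR_VAULT_ARN = "arn:aws:backup:us-west-2:123456789012:backup-vault:dr-vault"


def build_backup_plan(name: str, retention_days: int,
                      copy_to: Optional[str] = None) -> dict:
    rule = {
        "RuleName": "daily",
        "TargetBackupVaultName": "Default",         # assumed to exist
        "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
        "StartWindowMinutes": 60,
        "CompletionWindowMinutes": 180,
        "Lifecycle": {"DeleteAfterDays": retention_days},
    }
    if copy_to:
        # Copy each recovery point to another vault (e.g. in another Region).
        rule["CopyActions"] = [{
            "DestinationBackupVaultArn": copy_to,
            "Lifecycle": {"DeleteAfterDays": retention_days},
        }]
    return {"BackupPlanName": name, "Rules": [rule]}


if __name__ == "__main__":
    plan = build_backup_plan("prod-daily", 30, copy_to=DR_VAULT_ARN)
    print(json.dumps(plan, indent=2))
```

Generating the file from code keeps retention consistent between the primary and copy lifecycles; the destination vault and the IAM permissions for copying must already be in place.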

Cross-Region Backup

  
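```bash
# Enable cross-Region replication for RDS by creating a read replica of
# prod-db in us-west-2. Run it against the destination Region and pass the
# source as an ARN (123456789012 is a placeholder account ID).
aws rds create-db-instance-read-replica \
  --db-instance-identifier prod-db-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:prod-db \
  --region us-west-2
```

A read replica provides continuous, asynchronous replication, not point-in-time backups. To keep restorable backups in a second Region as well, copy snapshots or recovery points there, for example with an AWS Backup copy action.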

Fault-Tolerant Architecture Pattern

Route 53 failover routing sends traffic to the active Region while its health check passes and shifts it to the standby Region when the check fails. (Geolocation routing chooses a Region by the user's location and does not by itself provide active/standby failover.)

```text
    ┌─────────────────────────────┐
    │       Route 53 (DNS)        │
    │   Failover routing policy   │
    └──────────────┬──────────────┘
                   │
       ┌───────────┴───────────┐
       │                       │
┌──────▼───────┐        ┌──────▼───────┐
│   Region 1   │        │   Region 2   │
│   (Active)   │        │  (Standby)   │
│  us-east-1   │        │  us-west-2   │
│              │        │              │
│   ┌──────┐   │        │   ┌──────┐   │
│   │ ALB  │   │        │   │ ALB  │   │
│   └──┬───┘   │        │   └──┬───┘   │
│   ┌──▼───┐   │        │   ┌──▼───┐   │
│   │ ASG  │   │        │   │ ASG  │   │
│   │(2-10)│   │        │   │(0-5) │   │
│   └──────┘   │        │   └──────┘   │
│              │        │              │
│ ┌──────────┐ │        │ ┌──────────┐ │
│ │ Multi-AZ │ │        │ │ Multi-AZ │ │
│ │ RDS      │ │        │ │ RDS      │ │
│ │(Primary) │ │        │ │(Replica) │ │
│ └──────────┘ │        │ └──────────┘ │
└──────────────┘        └──────────────┘
```
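For an active/standby pair of Regions like this one, Route 53's failover routing policy answers DNS queries with the primary record while its health check passes and with the secondary record when it fails. The sketch below captures that decision; it deliberately returns `None` when both endpoints are unhealthy rather than modeling Route 53's behavior in that case, and the `Endpoint` class and DNS names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str      # hypothetical ALB DNS name for one Region
    healthy: bool  # latest result of the Route 53 health check


def failover_answer(primary: Endpoint, secondary: Endpoint) -> Optional[str]:
    """Pick the endpoint a failover record pair would return."""
    if primary.healthy:
        return primary.name
    if secondary.healthy:
        return secondary.name
    return None  # both unhealthy: not modeled in this sketch


if __name__ == "__main__":
    east = Endpoint("alb-us-east-1.example.com", healthy=True)
    west = Endpoint("alb-us-west-2.example.com", healthy=True)
    print(failover_answer(east, west))  # primary serves while healthy
    east.healthy = False
    print(failover_answer(east, west))  # standby takes over
```

Clients keep using a cached answer until the record's TTL expires, so a short TTL shortens effective failover time. A standby Auto Scaling group that runs at or near zero instances also has to scale out after failover, which adds to the achieved RTO.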

Health Checks

ELB Health Check Configuration

  
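```yaml
# Classic Load Balancer syntax. ALB/NLB target groups express the same
# settings as HealthCheckPath, HealthCheckIntervalSeconds, and so on.
HealthCheck:
  Target: HTTP:8080/health
  Interval: 30        # seconds
  Timeout: 5          # seconds
  HealthyThreshold: 2
  UnhealthyThreshold: 3
```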
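The thresholds mean a target's state flips only after consecutive results agree. With the values above, a failing target is marked unhealthy after 3 consecutive failed checks (about 3 x 30 s = 90 s), and a recovered target returns to service after 2 consecutive successes (about 60 s). The class below simulates that counter logic. It is a sketch of the threshold semantics, not the load balancer's code; the class name is hypothetical and starting targets in the healthy state is an assumption.

```python
class TargetHealth:
    """Consecutive-result thresholds, as in the health check settings above."""

    def __init__(self, healthy_threshold: int = 2, unhealthy_threshold: int = 3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True   # assumption: targets start in service
        self._streak = 0      # consecutive results contradicting current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health-check result; return the (possibly new) state."""
        if check_passed == self.healthy:
            self._streak = 0  # result agrees with current state: reset
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


def seconds_to_mark_unhealthy(interval: int = 30, unhealthy_threshold: int = 3) -> int:
    # Approximate: ignores the per-check timeout and when the first failure lands.
    return interval * unhealthy_threshold


if __name__ == "__main__":
    target = TargetHealth()
    print([target.record(r) for r in [False, False, False, True, True]])
```

A single failed check between successes never removes a target, which is why thresholds trade detection speed against flapping.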

Custom Health Checks

  
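```python
from flask import Flask, jsonify
import psutil

app = Flask(__name__)


def check_database():
    # Placeholder: run a cheap query against your database (e.g. SELECT 1)
    # and return False on any error.
    return True


def check_disk_space(path="/", max_used_percent=90.0):  # illustrative threshold
    return psutil.disk_usage(path).percent < max_used_percent


def check_memory(max_used_percent=90.0):  # illustrative threshold
    return psutil.virtual_memory().percent < max_used_percent


@app.route("/health", methods=["GET"])
def health_check():
    checks = {
        "database": check_database(),
        "disk_space": check_disk_space(),
        "memory": check_memory(),
    }
    healthy = all(checks.values())
    body = {"status": "healthy" if healthy else "unhealthy", "checks": checks}
    # The load balancer judges the HTTP status code (200 by default), not the
    # body, so an unhealthy result must return a non-2xx code such as 503.
    return jsonify(body), (200 if healthy else 503)
```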

Common Exam Questions

Q: How long does an automatic Multi-AZ RDS failover take? A: Typically 60-120 seconds. The DB endpoint name stays the same; its DNS record is repointed to the promoted standby.

Q: Can you use the same security group across AZs? A: Yes. A security group belongs to a VPC, and a VPC spans every AZ in its Region, so instances in any AZ of that VPC can use it.

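Q: What happens during an ASG scale-in event? A: Instances are terminated according to the group's termination policies. The default policy first picks the AZ with the most instances (to keep AZs balanced), then prefers instances launched from the oldest launch template or launch configuration; other policies such as OldestInstance or OldestLaunchConfiguration can be chosen instead. Instances with scale-in protection are skipped.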
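The default termination order can be sketched as a filter pipeline. This is a simplified model under stated assumptions: it skips instances with scale-in protection, looks only at the AZ(s) with the most instances, prefers the oldest launch template (represented by a hypothetical integer `template_version`, lowest = oldest), and breaks remaining ties randomly. It omits the Spot/On-Demand allocation and billing-hour steps of the real policy, and all names are illustrative.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    az: str
    template_version: int    # hypothetical: lower = older launch template
    protected: bool = False  # scale-in protection


def pick_instance_to_terminate(instances: list[Instance],
                               rng: Optional[random.Random] = None) -> Optional[Instance]:
    rng = rng or random.Random()
    candidates = [i for i in instances if not i.protected]
    if not candidates:
        return None
    # 1. Balance AZs: only AZs with the most instances (counting all of
    #    them) that still contain an unprotected instance are considered.
    per_az = Counter(i.az for i in instances)
    busiest = max(per_az[i.az] for i in candidates)
    candidates = [i for i in candidates if per_az[i.az] == busiest]
    # 2. Prefer instances launched from the oldest launch template version.
    oldest = min(i.template_version for i in candidates)
    candidates = [i for i in candidates if i.template_version == oldest]
    # 3. Break remaining ties randomly (billing-hour step omitted).
    return rng.choice(candidates)


if __name__ == "__main__":
    fleet = [
        Instance("i-1", "us-east-1a", template_version=2),
        Instance("i-2", "us-east-1a", template_version=1),
        Instance("i-3", "us-east-1b", template_version=1),
    ]
    print(pick_instance_to_terminate(fleet).instance_id)
```

Here `i-2` is chosen: us-east-1a has the most instances, and within it `i-2` uses the older template version.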

Key Takeaways

Design for multiple AZs within a region
Use Auto Scaling for elasticity
Configure health checks that reflect real application health
Define RTO/RPO for each application tier
Choose the load balancer type (ALB or NLB) that fits the traffic
Back up regularly and test recovery procedures


This post is licensed under CC BY 4.0 by the author.