
Cloud Platform Deployment Script Generator

Updated Nov 3, 2025

This prompt is designed for DevOps engineers. Given a target cloud platform and deployment requirements, it generates professional, accurate baseline deployment scripts. It covers core DevOps practices such as infrastructure as code and continuous integration/continuous delivery, and ensures each script has a clear structure, a precise technical implementation, and security best practices. Through a systematic analysis workflow, it produces tailored automated deployment solutions for mainstream cloud platforms such as AWS, Azure, and GCP, significantly improving deployment efficiency and quality.

Deployment Script Overview

This solution deploys a production-grade serverless container baseline on AWS using an ECS on Fargate + ALB reference architecture with the following characteristics:

  • Three-tier VPC network: public/private subnets with NAT egress (production-ready)
  • ECS cluster (Fargate containers): runs in private subnets with least-privilege IAM and CloudWatch logging
  • Application Load Balancer (public): HTTPS listener with automatic HTTP-to-HTTPS redirect
  • Auto scaling: target-tracking on average CPU (configurable)
  • Security hardening: minimal security-group exposure; only ALB ports 80/443 are open, and ECS accepts the container port only from the ALB
  • Basic observability: ECS/ALB health checks and CloudWatch logs; the deployment automatically verifies service stability and HTTP connectivity
  • IaC and state management: Terraform (S3 remote state + DynamoDB locking)
  • No hardcoded credentials: loaded via the AWS CLI/environment variables, with optional Secrets Manager support

Intended use: publishing a containerized service publicly in a production environment, as a baseline template for later microservice expansion and pipeline integration.

Prerequisites

  • AWS account and permissions
    • Ability to create: VPC/ALB/ECS/Fargate/IAM/CloudWatch/S3/DynamoDB/ACM
  • Tools
    • Terraform ≥ 1.5
    • AWS CLI ≥ 2.9
    • jq ≥ 1.6
    • bash (Linux/macOS)
  • Certificate
    • An ACM certificate ARN in the same region, with domain ownership validation completed
  • Container image
    • A pullable image must already exist (ECR or a public image), e.g. 123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:prod or public.ecr.aws/nginx:latest (see the push example right after this list)
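
If the image is not yet in ECR, a typical push flow looks like the sketch below. The repository name app, the region, and the tag are illustrative placeholders; adjust them to your environment.

# Authenticate Docker to ECR, create the repository if missing, then build, tag and push.
AWS_REGION=eu-west-1
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin "$REGISTRY"
aws ecr describe-repositories --repository-names app --region "$AWS_REGION" >/dev/null 2>&1 \
  || aws ecr create-repository --repository-name app --region "$AWS_REGION"

docker build -t "${REGISTRY}/app:prod" .
docker push "${REGISTRY}/app:prod"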

Core Script Code

Below are the complete, executable deployment script and Terraform configuration. Save all files into the same directory and run deploy.sh to deploy and verify.

  1. deploy.sh (deployment and verification, with error handling and remote-state bootstrap)
#!/usr/bin/env bash
set -euo pipefail

# -------- Configurable Environment --------
: "${AWS_REGION:=eu-west-1}"       # default region
: "${PROJECT_NAME:=myapp}"         # project name
: "${ENVIRONMENT:=production}"     # environment name (set to production)
: "${TF_STATE_BUCKET:=}"           # may be left empty; the script derives a compliant name
: "${TF_STATE_DDB_TABLE:=}"        # may be left empty; the script derives a name
: "${DOMAIN_CERT_ARN:=}"           # required: ACM certificate ARN
: "${CONTAINER_IMAGE:=}"           # required: container image
: "${ALLOWED_CIDRS:=0.0.0.0/0}"    # restrict source IP CIDRs; comma-separated for multiple

# -------- Helpers --------
log() { printf "[%s] %s\n" "$(date '+%F %T')" "$*"; }
err() { printf "[%s] ERROR: %s\n" "$(date '+%F %T')" "$*" >&2; }
trap 'err "Deployment failed. See the log output above."' ERR

# -------- Preflight --------
command -v aws >/dev/null || { err "aws CLI not found"; exit 1; }
command -v terraform >/dev/null || { err "terraform not found"; exit 1; }
command -v jq >/dev/null || { err "jq not found"; exit 1; }

[ -n "${DOMAIN_CERT_ARN}" ] || { err "DOMAIN_CERT_ARN (ACM certificate ARN for the ALB HTTPS listener) must be set"; exit 1; }
[ -n "${CONTAINER_IMAGE}" ] || { err "CONTAINER_IMAGE (container image) must be set"; exit 1; }

aws sts get-caller-identity >/dev/null
AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
REGION="${AWS_REGION}"

# -------- Remote state (S3 + DynamoDB) --------
if [ -z "${TF_STATE_BUCKET}" ]; then
  TF_STATE_BUCKET="${PROJECT_NAME}-${ENVIRONMENT}-tfstate-${AWS_ACCOUNT_ID}-${REGION}"
fi
if [ -z "${TF_STATE_DDB_TABLE}" ]; then
  TF_STATE_DDB_TABLE="${PROJECT_NAME}-${ENVIRONMENT}-tf-lock"
fi

log "准备远程状态:S3=${TF_STATE_BUCKET}, DDB=${TF_STATE_DDB_TABLE}"
if ! aws s3api head-bucket --bucket "${TF_STATE_BUCKET}" 2>/dev/null; then
  log "创建 S3 bucket 用于 Terraform 状态"
  aws s3api create-bucket \
    --bucket "${TF_STATE_BUCKET}" \
    --region "${REGION}" \
    --create-bucket-configuration LocationConstraint="${REGION}"
  aws s3api put-bucket-versioning --bucket "${TF_STATE_BUCKET}" --versioning-configuration Status=Enabled
  aws s3api put-bucket-encryption --bucket "${TF_STATE_BUCKET}" --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
  aws s3api put-public-access-block --bucket "${TF_STATE_BUCKET}" --public-access-block-configuration '{
    "BlockPublicAcls": true, "IgnorePublicAcls": true, "BlockPublicPolicy": true, "RestrictPublicBuckets": true
  }'
fi

if ! aws dynamodb describe-table --table-name "${TF_STATE_DDB_TABLE}" >/dev/null 2>&1; then
  log "创建 DynamoDB 表用于 Terraform 锁"
  aws dynamodb create-table \
    --table-name "${TF_STATE_DDB_TABLE}" \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
  aws dynamodb wait table-exists --table-name "${TF_STATE_DDB_TABLE}"
fi

# -------- Terraform Init/Plan/Apply --------
log "Terraform 初始化"
terraform init \
  -backend-config="bucket=${TF_STATE_BUCKET}" \
  -backend-config="key=${PROJECT_NAME}/${ENVIRONMENT}.tfstate" \
  -backend-config="region=${REGION}" \
  -backend-config="dynamodb_table=${TF_STATE_DDB_TABLE}"

log "Terraform 格式化与校验"
terraform fmt -recursive
terraform validate

log "Terraform 规划"
terraform plan \
  -var="region=${REGION}" \
  -var="project_name=${PROJECT_NAME}" \
  -var="environment=${ENVIRONMENT}" \
  -var="certificate_arn=${DOMAIN_CERT_ARN}" \
  -var="container_image=${CONTAINER_IMAGE}" \
  -var="allowed_source_cidrs=${ALLOWED_CIDRS}" \
  -out=tfplan

log "Terraform 部署"
terraform apply -auto-approve tfplan

log "读取输出"
OUT_JSON="$(terraform output -json)"
ALB_DNS=$(echo "$OUT_JSON" | jq -r '.alb_dns_name.value')
CLUSTER=$(echo "$OUT_JSON" | jq -r '.ecs_cluster_name.value')
SERVICE=$(echo "$OUT_JSON" | jq -r '.ecs_service_name.value')
TG_ARN=$(echo "$OUT_JSON" | jq -r '.target_group_arn.value')

log "等待 ECS 服务稳定"
aws ecs wait services-stable --cluster "$CLUSTER" --services "$SERVICE"

log "等待 ALB Target 组健康"
for i in {1..30}; do
  UNHEALTHY=$(aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
    --query "TargetHealthDescriptions[?TargetHealth.State!='healthy'] | length(@)" --output text)
  if [ "$UNHEALTHY" = "0" ]; then
    log "所有 Target 已健康"
    break
  fi
  log "Target 未全部健康,重试 $i/30 ..."
  sleep 10
done

log "基本连通性测试(HTTPS)"
set +e
HTTP_CODE=$(curl -k -s -o /dev/null -w "%{http_code}" "https://${ALB_DNS}/")
set -e
log "ALB: https://${ALB_DNS} 返回状态码: ${HTTP_CODE}"

log "部署完成"
echo "ECS Cluster: ${CLUSTER}"
echo "ECS Service: ${SERVICE}"
echo "ALB DNS: https://${ALB_DNS}"
  2. versions.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.57"
    }
  }
}

provider "aws" {
  region = var.region
}
  3. backend.tf (placeholder; the actual values are injected by deploy.sh)
terraform {
  backend "s3" {}
}
  4. variables.tf
variable "region" {
  type        = string
  description = "AWS 区域"
}

variable "project_name" {
  type        = string
  description = "项目名称"
}

variable "environment" {
  type        = string
  description = "环境标识"
  default     = "production"
}

variable "vpc_cidr" {
  type        = string
  default     = "10.20.0.0/16"
  description = "VPC CIDR"
}

variable "public_subnets" {
  type        = list(string)
  description = "公有子网 CIDRs(2个)"
  default     = ["10.20.0.0/24", "10.20.1.0/24"]
}

variable "private_subnets" {
  type        = list(string)
  description = "私有子网 CIDRs(2个)"
  default     = ["10.20.10.0/24", "10.20.11.0/24"]
}

variable "container_image" {
  type        = string
  description = "容器镜像 URI(ECR 或公共仓库),必须存在可拉取的 tag"
}

variable "container_port" {
  type        = number
  default     = 8080
  description = "应用容器端口"
}

variable "certificate_arn" {
  type        = string
  description = "ALB HTTPS 监听所用的 ACM 证书 ARN"
}

variable "cpu" {
  type        = number
  default     = 512
  description = "Fargate Task CPU(256/512/1024/2048/4096)"
}

variable "memory" {
  type        = number
  default     = 1024
  description = "Fargate Task 内存(匹配 CPU 要求)"
}

variable "desired_count" {
  type        = number
  default     = 2
  description = "ECS 服务期望副本数(生产建议>=2)"
}

variable "min_capacity" {
  type        = number
  default     = 2
  description = "自动伸缩最小副本数"
}

variable "max_capacity" {
  type        = number
  default     = 10
  description = "自动伸缩最大副本数"
}

variable "health_check_path" {
  type        = string
  default     = "/health"
  description = "ALB 健康检查 HTTP Path"
}

variable "log_retention_days" {
  type        = number
  default     = 30
  description = "CloudWatch 日志保留天数"
}

variable "allowed_source_cidrs" {
  type        = string
  default     = "0.0.0.0/0"
  description = "允许访问 ALB 的来源 CIDR(逗号分隔)"
}

variable "task_secrets" {
  description = "注入容器的 Secret 列表(来自 Secrets Manager 或 SSM Parameter Store)"
  type = list(object({
    name      = string
    valueFrom = string
  }))
  default = []
}

variable "tags" {
  type        = map(string)
  default     = {}
  description = "额外资源标签"
}
  5. main.tf
locals {
  name_prefix = "${var.project_name}-${var.environment}"
  tags = merge({
    Project     = var.project_name
    Environment = var.environment
    ManagedBy   = "Terraform"
  }, var.tags)
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# ---------------- VPC & Networking ----------------
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = merge(local.tags, { Name = "${local.name_prefix}-vpc" })
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
  tags   = merge(local.tags, { Name = "${local.name_prefix}-igw" })
}

# Two availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "public" {
  for_each = {
    az1 = { cidr = var.public_subnets[0], az = data.aws_availability_zones.available.names[0] }
    az2 = { cidr = var.public_subnets[1], az = data.aws_availability_zones.available.names[1] }
  }
  vpc_id                  = aws_vpc.main.id
  cidr_block              = each.value.cidr
  availability_zone       = each.value.az
  map_public_ip_on_launch = true
  tags = merge(local.tags, { Name = "${local.name_prefix}-public-${each.key}", Tier = "public" })
}

resource "aws_subnet" "private" {
  for_each = {
    az1 = { cidr = var.private_subnets[0], az = data.aws_availability_zones.available.names[0] }
    az2 = { cidr = var.private_subnets[1], az = data.aws_availability_zones.available.names[1] }
  }
  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az
  tags = merge(local.tags, { Name = "${local.name_prefix}-private-${each.key}", Tier = "private" })
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  tags   = merge(local.tags, { Name = "${local.name_prefix}-public-rt" })
}

resource "aws_route" "public_igw" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.igw.id
}

resource "aws_route_table_association" "public_assoc" {
  for_each       = aws_subnet.public
  subnet_id      = each.value.id
  route_table_id = aws_route_table.public.id
}

# Single NAT gateway (cost-friendly; can be extended to one per AZ if needed)
resource "aws_eip" "nat" {
  domain = "vpc"
  tags   = merge(local.tags, { Name = "${local.name_prefix}-nat-eip" })
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public["az1"].id
  tags          = merge(local.tags, { Name = "${local.name_prefix}-nat" })
  depends_on    = [aws_internet_gateway.igw]
}

resource "aws_route_table" "private" {
  vpc_id = aws_vpc.main.id
  tags   = merge(local.tags, { Name = "${local.name_prefix}-private-rt" })
}

resource "aws_route" "private_nat" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.nat.id
}

resource "aws_route_table_association" "private_assoc" {
  for_each       = aws_subnet.private
  subnet_id      = each.value.id
  route_table_id = aws_route_table.private.id
}

# ---------------- Security Groups ----------------
resource "aws_security_group" "alb" {
  name        = "${local.name_prefix}-alb-sg"
  description = "Allow HTTP/HTTPS from allowed sources"
  vpc_id      = aws_vpc.main.id

  dynamic "ingress" {
    for_each = split(",", replace(var.allowed_source_cidrs, " ", ""))
    content {
      description = "HTTP from ${ingress.value}"
      from_port   = 80
      to_port     = 80
      protocol    = "tcp"
      cidr_blocks = [ingress.value]
    }
  }
  dynamic "ingress" {
    for_each = split(",", replace(var.allowed_source_cidrs, " ", ""))
    content {
      description = "HTTPS from ${ingress.value}"
      from_port   = 443
      to_port     = 443
      protocol    = "tcp"
      cidr_blocks = [ingress.value]
    }
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.tags, { Name = "${local.name_prefix}-alb-sg" })
}

resource "aws_security_group" "ecs_tasks" {
  name        = "${local.name_prefix}-ecs-sg"
  description = "Allow ALB to reach ECS tasks"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "App traffic from ALB"
    from_port       = var.container_port
    to_port         = var.container_port
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.tags, { Name = "${local.name_prefix}-ecs-sg" })
}

# ---------------- ALB ----------------
resource "aws_lb" "app" {
  name               = substr("${local.name_prefix}-alb", 0, 32)
  load_balancer_type = "application"
  internal           = false
  security_groups    = [aws_security_group.alb.id]
  subnets            = [for s in aws_subnet.public : s.id]
  tags               = merge(local.tags, { Name = "${local.name_prefix}-alb" })
}

resource "aws_lb_target_group" "app" {
  name        = substr("${local.name_prefix}-tg", 0, 32)
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"

  health_check {
    enabled             = true
    path                = var.health_check_path
    matcher             = "200-399"
    interval            = 30
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
  }

  tags = merge(local.tags, { Name = "${local.name_prefix}-tg" })
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.app.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = var.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

# ---------------- IAM ----------------
data "aws_iam_policy_document" "task_execution_trust" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role" "task_execution" {
  name               = "${local.name_prefix}-task-exec"
  assume_role_policy = data.aws_iam_policy_document.task_execution_trust.json
  tags               = local.tags
}

resource "aws_iam_role_policy_attachment" "task_execution_attach" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role" "task" {
  name               = "${local.name_prefix}-task"
  assume_role_policy = data.aws_iam_policy_document.task_execution_trust.json
  tags               = local.tags
}

# Grant read access only when task_secrets is configured
data "aws_iam_policy_document" "task_inline" {
  dynamic "statement" {
    for_each = length(var.task_secrets) > 0 ? [1] : []
    content {
      effect = "Allow"
      actions = [
        "secretsmanager:GetSecretValue",
        "ssm:GetParameter",
        "ssm:GetParameters"
      ]
      resources = ["*"]
    }
  }
}

resource "aws_iam_role_policy" "task_inline" {
  count  = length(var.task_secrets) > 0 ? 1 : 0
  name   = "${local.name_prefix}-task-inline"
  role   = aws_iam_role.task.id
  policy = data.aws_iam_policy_document.task_inline.json
}

# ---------------- Logs ----------------
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${local.name_prefix}"
  retention_in_days = var.log_retention_days
  tags              = local.tags
}

# ---------------- ECS Cluster / Task / Service ----------------
resource "aws_ecs_cluster" "this" {
  name = "${local.name_prefix}-cluster"
  configuration {
    execute_command_configuration {
      logging = "DEFAULT"
    }
  }
  tags = local.tags
}

resource "aws_ecs_cluster_capacity_providers" "cp" {
  cluster_name       = aws_ecs_cluster.this.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}

locals {
  container_def = [{
    name      = "${var.project_name}"
    image     = var.container_image
    essential = true
    portMappings = [{
      containerPort = var.container_port
      hostPort      = var.container_port
      protocol      = "tcp"
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        awslogs-group         = aws_cloudwatch_log_group.app.name
        awslogs-region        = var.region
        awslogs-stream-prefix = "ecs"
      }
    }
    environment = []
    secrets     = var.task_secrets
  }]
}

resource "aws_ecs_task_definition" "app" {
  family                   = "${local.name_prefix}-taskdef"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = tostring(var.cpu)
  memory                   = tostring(var.memory)
  execution_role_arn       = aws_iam_role.task_execution.arn
  task_role_arn            = aws_iam_role.task.arn
  runtime_platform {
    operating_system_family = "LINUX"
    cpu_architecture        = "X86_64"
  }
  container_definitions = jsonencode(local.container_def)
  tags                  = local.tags
}

resource "aws_ecs_service" "app" {
  name            = "${local.name_prefix}-svc"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  enable_execute_command = true
  deployment_minimum_healthy_percent = 50
  deployment_maximum_percent         = 200

  network_configuration {
    subnets         = [for s in aws_subnet.private : s.id]
    security_groups = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.project_name
    container_port   = var.container_port
  }

  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    weight            = 1
  }

  lifecycle {
    ignore_changes = [desired_count] # allows external autoscaling to adjust the count
  }

  depends_on = [aws_lb_listener.https]
  tags       = local.tags
}

# ---------------- Application Auto Scaling ----------------
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = var.max_capacity
  min_capacity       = var.min_capacity
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu_target" {
  name               = "${local.name_prefix}-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 60
    scale_in_cooldown  = 60
    scale_out_cooldown = 60
  }
}

# ---------------- Outputs ----------------
  6. outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "alb_dns_name" {
  value = aws_lb.app.dns_name
}

output "target_group_arn" {
  value = aws_lb_target_group.app.arn
}

output "ecs_cluster_name" {
  value = aws_ecs_cluster.this.name
}

output "ecs_service_name" {
  value = aws_ecs_service.app.name
}

Configuration Notes

  • region: AWS region, e.g. eu-west-1
  • project_name: project name, used for resource naming and tagging
  • environment: environment identifier, defaults to production, used for resource isolation
  • vpc_cidr/public_subnets/private_subnets: network layout; production should span at least two AZs
  • container_image: container image, which must already exist (ECR or a public registry)
  • container_port: port the container listens on, matching the ALB target group
  • certificate_arn: ACM certificate ARN, which must be in the same region as the ALB
  • desired_count/min_capacity/max_capacity: service replica count and auto scaling bounds
  • health_check_path: ALB health check path (the application should return 2xx/3xx)
  • log_retention_days: log retention in days
  • allowed_source_cidrs: source IP ranges allowed to reach the ALB; in production, restrict to office egress ranges or put a WAF in front
  • task_secrets: secrets injected into the container, e.g. the entry below (see the secret-creation example after this list)
    • [{ name="DB_PASSWORD", valueFrom="arn:aws:secretsmanager:...:secret:db-pass-xxxxx" }]

Security highlights:

  • The ECS task security group only opens the container port to the ALB security group
  • The ALB exposes only 80/443, and port 80 returns a 301 redirect to 443
  • Terraform remote state uses S3 versioning and encryption plus a DynamoDB lock against concurrent runs
  • No hardcoded credentials; credentials are supplied via the AWS CLI/environment variables

Deployment Steps

  1. Save the files
  • Put deploy.sh and the *.tf files above in the same directory and make deploy.sh executable: chmod +x deploy.sh
  2. Set environment variables
  • Required:
    • export DOMAIN_CERT_ARN="arn:aws:acm:eu-west-1:123456789012:certificate/xxxx-xxxx"
    • export CONTAINER_IMAGE="123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:prod"
  • Optional:
    • export AWS_REGION="eu-west-1"
    • export PROJECT_NAME="myapp"
    • export ALLOWED_CIDRS="203.0.113.0/24" to allow only your office egress range
    • export TF_STATE_BUCKET / TF_STATE_DDB_TABLE to customize the remote-state resource names
  3. Run the deployment
  • ./deploy.sh
  4. After the first deployment
  • If you use your own registry, make sure the image can be pulled; to change the port or health check path, edit variables.tf or pass -var at plan time
  5. Subsequent updates
  • After changing the image tag or other parameters, re-run ./deploy.sh to perform a rolling update (a consolidated example follows this list)
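
Put together, a first deployment and a later rolling update might look like the following sketch; all values are placeholders for illustration:

# One-shot example: required variables, optional overrides, then the deployment itself.
export DOMAIN_CERT_ARN="arn:aws:acm:eu-west-1:123456789012:certificate/xxxx-xxxx"
export CONTAINER_IMAGE="123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:prod"
export AWS_REGION="eu-west-1"
export PROJECT_NAME="myapp"
export ALLOWED_CIDRS="203.0.113.0/24"

chmod +x deploy.sh
./deploy.sh

# Rolling update later: point CONTAINER_IMAGE at a new tag and re-run.
export CONTAINER_IMAGE="123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:prod-v2"
./deploy.sh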

Verification

  • Automatic checks:
    • deploy.sh waits for the ECS service to stabilize and the target group to become healthy, then probes https://<ALB_DNS> and prints the status code
  • Manual checks:
    • ALB connectivity: curl -I https://<ALB_DNS>
    • Target health: aws elbv2 describe-target-health --target-group-arn <TG_ARN>
    • Service status: aws ecs describe-services --cluster <cluster_name> --services <service_name>
    • Logs: inspect the /ecs/<project_name>-<environment> log group in CloudWatch Logs (see the aws logs tail example after this list)
    • Metrics: watch CloudWatch metrics such as ECSServiceAverageCPUUtilization and ALB 5xx counts
  • Troubleshooting:
    • 403/5xx: check the container logs in CloudWatch, security group rules, and the health check path
    • Image pull failures: confirm the image exists and that ECR permissions and the execution role policy (AmazonECSTaskExecutionRolePolicy) are in place
    • Certificate errors: confirm the certificate ARN is correct, the domain is validated, and the region matches
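
For log inspection from the terminal, aws logs tail (AWS CLI v2) can follow the ECS log group created above; the group name below assumes the default PROJECT_NAME and ENVIRONMENT values:

# Follow the last 30 minutes of container logs for the service.
aws logs tail /ecs/myapp-production --since 30m --follow --region eu-west-1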

Notes and recommendations:

  • NAT gateway: a single NAT is used; for production high availability, switch to one NAT gateway per AZ
  • WAF: attaching AWS WAF to the ALB is recommended in production for stronger L7 protection (see the CLI example after this list)
  • Secrets: prefer Secrets Manager/SSM and inject values via task_secrets
  • Access restriction: tighten allowed_source_cidrs in production, or front the service with API Gateway/PrivateLink depending on business needs
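
If you attach AWS WAF outside of Terraform, a minimal sketch of associating an existing regional Web ACL with the ALB via the CLI looks like this; both ARNs are placeholders, and a Terraform aws_wafv2_web_acl_association resource is the IaC-native alternative:

# Associate a REGIONAL WAFv2 Web ACL with the ALB created by this stack.
ALB_ARN=$(aws elbv2 describe-load-balancers --names myapp-production-alb \
  --query 'LoadBalancers[0].LoadBalancerArn' --output text)
aws wafv2 associate-web-acl \
  --web-acl-arn arn:aws:wafv2:eu-west-1:123456789012:regional/webacl/example/1111-2222 \
  --resource-arn "$ALB_ARN" \
  --region eu-west-1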

Deployment Script Overview

This solution uses the Azure CLI + Bicep to create, as infrastructure as code, a staging-ready Web Application (App Service on Linux) on Azure with the following characteristics:

  • Infrastructure automation: resource group, App Service Plan, Web App, staging deployment slot, Application Insights, Log Analytics, Key Vault
  • Security hardening: HTTPS-only, TLS 1.2, FTPS-only, plaintext deployment credentials disabled, system-assigned managed identity + Key Vault RBAC
  • Observability: application/HTTP/console/platform logs and metrics sent to Log Analytics, with Application Insights connected
  • Health checks: a uniform healthCheckPath plus post-deployment health verification
  • Deployment flow: supports publishing an application package (zip) to the staging slot, with optional manual slot-swap guidance
  • Complete error handling and logging: the Bash script has strict error handling, retries, and structured log output

Intended use: building a standardized staging environment for a web application on Azure with a repeatable, secure, well-defined automated deployment flow.

Prerequisites

  • Operating system: Linux or macOS (Windows via WSL)
  • Required tools:
    • Azure CLI 2.50.0 or later (with built-in Bicep support)
    • curl, zip, unzip
    • Optional: jq (for friendlier JSON handling; the script does not strictly depend on it)
  • Azure permissions:
    • Subscription scope: sufficient rights on the target subscription (Contributor + User Access Administrator recommended)
    • Ability to create/modify Microsoft.Web, Microsoft.Insights, Microsoft.OperationalInsights, Microsoft.KeyVault, and related resources in the resource group
  • Sign-in:
    • Interactive: az login
    • Or non-interactive with a service principal: export AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID and let the script handle login

Core Script Code

The following single executable script contains:

  • Argument parsing and validation
  • Generation and deployment of the Bicep template (resource creation)
  • Optional: deployment of an application zip package to the staging slot
  • Health check and verification

Save the content as deploy_azure_staging.sh and make it executable: chmod +x deploy_azure_staging.sh
#!/usr/bin/env bash
# One-shot deployment script for an Azure Web App (Linux) staging environment
# - Creates the RG, App Service Plan, Web App, staging slot, Application Insights, Log Analytics, Key Vault
# - Configures security policies, diagnostic logs, and health checks
# - Optionally deploys a zip package to the staging slot and runs health verification

set -Eeuo pipefail

#####################################
# Logging and error handling
#####################################
LOG_TS() { date -u +"%Y-%m-%dT%H:%M:%SZ"; }
log()   { echo "[$(LOG_TS)] [INFO]    $*"; }
warn()  { echo "[$(LOG_TS)] [WARN]    $*" >&2; }
error() { echo "[$(LOG_TS)] [ERROR]   $*" >&2; }
fatal() { echo "[$(LOG_TS)] [FATAL]   $*" >&2; exit 1; }

cleanup() {
  if [[ -n "${TMP_DIR:-}" && -d "${TMP_DIR:-}" ]]; then
    rm -rf "$TMP_DIR" || true
  fi
}
trap 'rc=$?; if [[ $rc -ne 0 ]]; then error "Script exited abnormally (exit=$rc)"; fi; cleanup' EXIT

retry() {
  # retry <times> <sleep> <cmd...>
  local -r times="$1"; shift
  local -r sleep_s="$1"; shift
  local n=0
  until "$@"; do
    n=$((n+1))
    if [[ $n -ge $times ]]; then
      return 1
    fi
    warn "命令失败,第 $n 次重试后等待 ${sleep_s}s: $*"
    sleep "$sleep_s"
  done
}

#####################################
# Parameters and defaults
#####################################
SUBSCRIPTION_ID="${SUBSCRIPTION_ID:-}"     # required, or the current subscription is used
LOCATION="${LOCATION:-eastasia}"           # Azure region
NAME_PREFIX="${NAME_PREFIX:-demoapp}"      # resource name prefix (a uniqueString suffix is appended for uniqueness)
RUNTIME_STACK="${RUNTIME_STACK:-node|18-lts}" # runtime stack, e.g. node|18-lts, node|20-lts, DOTNETCORE|8.0, python|3.11, JAVA|11-java11
HEALTHCHECK_PATH="${HEALTHCHECK_PATH:-/}"  # health check path: make sure the app implements it, or keep /
ARTIFACT_PATH="${ARTIFACT_PATH:-}"         # optional: zip package path (if set, it is pushed to the staging slot)
RESOURCE_GROUP="${RESOURCE_GROUP:-}"       # optional: resource group name (derived from the prefix if unset)
# Optional: non-interactive service principal login
# AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID

usage() {
  cat <<EOF
Usage:
  SUBSCRIPTION_ID=<id> LOCATION=<region> NAME_PREFIX=<prefix> \\
  RUNTIME_STACK="<stack>" HEALTHCHECK_PATH="<path>" ARTIFACT_PATH="<zip>" \\
  RESOURCE_GROUP="<rg>" ./deploy_azure_staging.sh

Example:
  SUBSCRIPTION_ID=00000000-0000-0000-0000-000000000000 \\
  LOCATION=eastasia NAME_PREFIX=myweb RUNTIME_STACK="node|20-lts" \\
  HEALTHCHECK_PATH="/healthz" \\
  ARTIFACT_PATH="build/app.zip" \\
  ./deploy_azure_staging.sh
EOF
}

#####################################
# Prerequisite checks
#####################################
command -v az >/dev/null 2>&1 || fatal "Azure CLI is not installed"
command -v curl >/dev/null 2>&1 || fatal "curl is not installed"

AZ_VER="$(az version --query '"azure-cli"' -o tsv 2>/dev/null || echo 0)"
log "Detected Azure CLI version: ${AZ_VER}"

#####################################
# Login and subscription
#####################################
if az account show >/dev/null 2>&1; then
  log "An existing Azure session was detected"
else
  if [[ -n "${AZURE_CLIENT_ID:-}" && -n "${AZURE_CLIENT_SECRET:-}" && -n "${AZURE_TENANT_ID:-}" ]]; then
    log "Logging in non-interactively with a service principal"
    az login --service-principal -u "$AZURE_CLIENT_ID" -p "$AZURE_CLIENT_SECRET" --tenant "$AZURE_TENANT_ID" >/dev/null
  else
    log "Performing interactive login (set the AZURE_* environment variables for non-interactive use)"
    az login >/dev/null
  fi
fi

if [[ -n "$SUBSCRIPTION_ID" ]]; then
  az account set --subscription "$SUBSCRIPTION_ID"
else
  SUBSCRIPTION_ID="$(az account show --query id -o tsv)"
fi
log "Using subscription: $SUBSCRIPTION_ID"

#####################################
# Resource group and names
#####################################
if [[ -z "$RESOURCE_GROUP" ]]; then
  # Derive the RG name from the prefix (no special characters)
  SANITIZED_PREFIX="$(echo "$NAME_PREFIX" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-')"
  RESOURCE_GROUP="${SANITIZED_PREFIX}-rg"
fi
log "Target resource group: ${RESOURCE_GROUP}, region: ${LOCATION}"

if ! az group show -n "$RESOURCE_GROUP" >/dev/null 2>&1; then
  log "Creating resource group: ${RESOURCE_GROUP}"
  az group create -n "$RESOURCE_GROUP" -l "$LOCATION" >/dev/null
else
  log "Resource group already exists; skipping creation"
fi

#####################################
# Temp directory and Bicep template
#####################################
TMP_DIR="$(mktemp -d)"
BICEP_FILE="${TMP_DIR}/main.bicep"

cat > "$BICEP_FILE" <<'BICEP'
targetScope = 'resourceGroup'

@description('Resource name prefix (combined with uniqueString for global uniqueness)')
param namePrefix string

@description('Deployment location')
param location string = resourceGroup().location

@description('Linux WebApp runtime stack, e.g. node|18-lts, node|20-lts, DOTNETCORE|8.0, python|3.11, JAVA|11-java11')
param linuxFxVersion string = 'node|18-lts'

@description('Application health check path')
param healthCheckPath string = '/'

var baseName = toLower(replace('${namePrefix}-${uniqueString(resourceGroup().id)}','_','-'))

// Log Analytics Workspace
resource law 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: '${baseName}-law'
  location: location
  sku: {
    name: 'PerGB2018'
  }
  retentionInDays: 30
}

// Application Insights (linked to Log Analytics)
resource appi 'Microsoft.Insights/components@2020-02-02' = {
  name: '${baseName}-appi'
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: law.id
  }
}

// App Service Plan (Linux)
resource asp 'Microsoft.Web/serverfarms@2023-12-01' = {
  name: '${baseName}-asp'
  location: location
  sku: {
    tier: 'PremiumV3'
    name: 'P1v3'
    size: 'P1v3'
    capacity: 1
  }
  properties: {
    reserved: true // Linux
  }
}

// Web App (production slot)
resource site 'Microsoft.Web/sites@2023-12-01' = {
  name: '${baseName}-web'
  location: location
  kind: 'app,linux'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    serverFarmId: asp.id
    httpsOnly: true
    siteConfig: {
      ftpsState: 'FtpsOnly'
      minTlsVersion: '1.2'
      http20Enabled: true
      alwaysOn: true
      linuxFxVersion: linuxFxVersion
      healthCheckPath: healthCheckPath
    }
  }
}

// Web App staging slot
resource slot 'Microsoft.Web/sites/slots@2023-12-01' = {
  name: '${site.name}/staging'
  location: location
  kind: 'app,linux'
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    serverFarmId: asp.id
    httpsOnly: true
    siteConfig: {
      ftpsState: 'FtpsOnly'
      minTlsVersion: '1.2'
      http20Enabled: true
      alwaysOn: true
      linuxFxVersion: linuxFxVersion
      healthCheckPath: healthCheckPath
    }
  }
}

// App settings (production)
resource siteAppSettings 'Microsoft.Web/sites/config@2023-12-01' = {
  name: '${site.name}/appsettings'
  properties: {
    'WEBSITE_RUN_FROM_PACKAGE': '0'
    'APPINSIGHTS_INSTRUMENTATIONKEY': appi.properties.InstrumentationKey
    'APPLICATIONINSIGHTS_CONNECTION_STRING': appi.properties.ConnectionString
    'AZURE_WEBAPP_ENVIRONMENT': 'production'
    // Additional settings (e.g. Key Vault references) can be written later by the deployment pipeline
  }
}

// App settings (staging slot)
resource slotAppSettings 'Microsoft.Web/sites/slots/config@2023-12-01' = {
  name: '${site.name}/staging/appsettings'
  properties: {
    'WEBSITE_RUN_FROM_PACKAGE': '0'
    'APPINSIGHTS_INSTRUMENTATIONKEY': appi.properties.InstrumentationKey
    'APPLICATIONINSIGHTS_CONNECTION_STRING': appi.properties.ConnectionString
    'AZURE_WEBAPP_ENVIRONMENT': 'staging'
    // Slot-specific variables should go here (or use "slot settings")
  }
}

// Diagnostic settings (production)
resource siteDiag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diag'
  scope: site
  properties: {
    workspaceId: law.id
    logs: [
      { category: 'AppServiceAppLogs', enabled: true }
      { category: 'AppServiceHTTPLogs', enabled: true }
      { category: 'AppServiceConsoleLogs', enabled: true }
      { category: 'AppServiceFileAuditLogs', enabled: true }
      { category: 'AppServiceAuditLogs', enabled: true }
      { category: 'AppServicePlatformLogs', enabled: true }
    ]
    metrics: [
      { category: 'AllMetrics', enabled: true }
    ]
  }
}

// Diagnostic settings (staging slot)
resource slotDiag 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: 'diag'
  scope: slot
  properties: {
    workspaceId: law.id
    logs: [
      { category: 'AppServiceAppLogs', enabled: true }
      { category: 'AppServiceHTTPLogs', enabled: true }
      { category: 'AppServiceConsoleLogs', enabled: true }
      { category: 'AppServiceFileAuditLogs', enabled: true }
      { category: 'AppServiceAuditLogs', enabled: true }
      { category: 'AppServicePlatformLogs', enabled: true }
    ]
    metrics: [
      { category: 'AllMetrics', enabled: true }
    ]
  }
}

// Key Vault (RBAC-based data-plane authorization enabled)
resource kv 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: '${baseName}-kv'
  location: location
  properties: {
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    sku: {
      name: 'standard'
      family: 'A'
    }
    publicNetworkAccess: 'Enabled'
    softDeleteRetentionInDays: 7
    purgeProtectionEnabled: true
    networkAcls: {
      defaultAction: 'Allow'
      bypass: 'AzureServices'
    }
  }
}

// Grant the Web App and staging slot managed identities permission to read secrets (Key Vault Secrets User)
resource kvRoleSite 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(kv.id, site.identity.principalId, 'kv-secrets-user')
  scope: kv
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633458b-17de-408a-b874-0445c86b69e6')
    principalId: site.identity.principalId
    principalType: 'ServicePrincipal'
  }
}

resource kvRoleSlot 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(kv.id, slot.identity.principalId, 'kv-secrets-user')
  scope: kv
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '4633458b-17de-408a-b874-0445c86b69e6')
    principalId: slot.identity.principalId
    principalType: 'ServicePrincipal'
  }
}

output webAppName string = site.name
output webAppHostname string = site.properties.defaultHostName
output stagingHostname string = slot.properties.hostNames[0]
output keyVaultName string = kv.name
output appInsightsName string = appi.name
output logAnalyticsWorkspaceId string = law.id
BICEP

log "Bicep 模板已写入: ${BICEP_FILE}"

#####################################
# 部署 Bicep
#####################################
log "开始部署资源(Bicep)..."
DEPLOYMENT_NAME="webapp-staging-$(date +%Y%m%d%H%M%S)"
retry 3 5 az deployment group create \
  --resource-group "$RESOURCE_GROUP" \
  --mode Incremental \
  --name "$DEPLOYMENT_NAME" \
  --template-file "$BICEP_FILE" \
  --parameters namePrefix="$NAME_PREFIX" location="$LOCATION" linuxFxVersion="$RUNTIME_STACK" healthCheckPath="$HEALTHCHECK_PATH" >/dev/null

# Read typed outputs from the named deployment (more robust than text parsing)
WEBAPP_NAME="$(az deployment group show -g "$RESOURCE_GROUP" -n "$DEPLOYMENT_NAME" --query 'properties.outputs.webAppName.value' -o tsv)"
WEBAPP_HOSTNAME="$(az deployment group show -g "$RESOURCE_GROUP" -n "$DEPLOYMENT_NAME" --query 'properties.outputs.webAppHostname.value' -o tsv)"
STAGING_HOSTNAME="$(az deployment group show -g "$RESOURCE_GROUP" -n "$DEPLOYMENT_NAME" --query 'properties.outputs.stagingHostname.value' -o tsv)"

if [[ -z "$WEBAPP_NAME" || -z "$STAGING_HOSTNAME" ]]; then
  warn "Failed to read the deployment outputs; falling back to name inference"
  WEBAPP_NAME="$(az resource list -g "$RESOURCE_GROUP" --resource-type Microsoft.Web/sites --query '[0].name' -o tsv)"
  STAGING_HOSTNAME="${WEBAPP_NAME}-staging.azurewebsites.net"
  WEBAPP_HOSTNAME="${WEBAPP_NAME}.azurewebsites.net"
fi

log "Deployment finished: WebApp=${WEBAPP_NAME} ProductionURL=https://${WEBAPP_HOSTNAME} StagingURL=https://${STAGING_HOSTNAME}"

#####################################
# Optional: deploy the application package to the staging slot
#####################################
if [[ -n "$ARTIFACT_PATH" ]]; then
  if [[ ! -f "$ARTIFACT_PATH" ]]; then
    fatal "The specified ARTIFACT_PATH does not exist: $ARTIFACT_PATH"
  fi
  log "Deploying the application package to the staging slot: $ARTIFACT_PATH"
  retry 3 5 az webapp deployment source config-zip \
    --resource-group "$RESOURCE_GROUP" \
    --name "$WEBAPP_NAME" \
    --slot staging \
    --src "$ARTIFACT_PATH" >/dev/null
  log "Application package deployed to staging"
else
  warn "ARTIFACT_PATH not provided; skipping application code deployment (infrastructure only)"
fi

#####################################
# Health check and verification (staging slot)
#####################################
STAGING_URL="https://${STAGING_HOSTNAME}${HEALTHCHECK_PATH}"
log "Waiting for the staging health check to become ready: ${STAGING_URL}"

ATTEMPTS=30
SLEEP_SECS=5
OK=0
for ((i=1; i<=ATTEMPTS; i++)); do
  set +e
  HTTP_CODE="$(curl -sS -o /dev/null -w "%{http_code}" "$STAGING_URL")"
  CURL_RC=$?
  set -e
  if [[ $CURL_RC -eq 0 && "$HTTP_CODE" == "200" ]]; then
    OK=1
    log "staging 健康检查响应 200,验证通过"
    break
  else
    log "第 $i/$ATTEMPTS 次健康检查:HTTP $HTTP_CODE(${SLEEP_SECS}s 后重试)"
    sleep "$SLEEP_SECS"
  fi
done
if [[ $OK -ne 1 ]]; then
  warn "健康检查未达预期(非 200)。请确认应用实现了 ${HEALTHCHECK_PATH} 或查看应用日志。"
fi

#####################################
# Output key information
#####################################
cat <<EOF

Deployment complete:
- Resource group: ${RESOURCE_GROUP}
- Web App: ${WEBAPP_NAME}
- Production URL: https://${WEBAPP_HOSTNAME}
- Staging URL: https://${STAGING_HOSTNAME}
- Health check path: ${HEALTHCHECK_PATH}

Next step (optional go-live):
- Manual slot swap (blue/green release):
  az webapp deployment slot swap -g "${RESOURCE_GROUP}" -n "${WEBAPP_NAME}" --slot staging --target-slot production

Logs and troubleshooting:
- App Service logs and metrics are sent to Log Analytics and can be analyzed in the Azure portal or with Kusto queries
- Application Insights is connected for requests, failures, dependencies, and distributed tracing
EOF

Configuration Notes

  • SUBSCRIPTION_ID: target subscription ID (the current az subscription is used if unset)
  • LOCATION: deployment region, e.g. eastasia, southeastasia, japaneast, westeurope, eastus2
  • NAME_PREFIX: resource name prefix; final names are namePrefix + uniqueString so they stay globally unique across subscriptions/regions and avoid naming collisions
  • RUNTIME_STACK: runtime stack (Linux); common examples:
    • node|18-lts, node|20-lts
    • DOTNETCORE|8.0
    • python|3.11
    • JAVA|11-java11
  • HEALTHCHECK_PATH: App Service health check path (the application must return 200), default /
  • ARTIFACT_PATH: optional path to the zip package to deploy to the staging slot
  • RESOURCE_GROUP: optional target resource group name; derived from NAME_PREFIX if unset

Security and compliance highlights (already reflected in the template):

  • httpsOnly: true, minTlsVersion: 1.2, ftpsState: FtpsOnly
  • App Service runs with alwaysOn and HTTP/2 enabled
  • A system-assigned managed identity is used; no credentials are hardcoded
  • Key Vault uses data-plane RBAC (no access policies); the Web App and slot are granted the Key Vault Secrets User role
  • Diagnostic logs and metrics are sent to Log Analytics
  • Strict Bash error handling (set -Eeuo pipefail) plus retries

Deployment Steps

  1. Prepare the environment

    • Install the Azure CLI and sign in: az login
    • For non-interactive CI: export AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID before running the script
  2. Run the script (examples)

    • Infrastructure only: SUBSCRIPTION_ID=<your_sub_id> LOCATION=eastasia NAME_PREFIX=myweb RUNTIME_STACK="node|20-lts" HEALTHCHECK_PATH="/healthz" ./deploy_azure_staging.sh
    • Infrastructure plus pushing a zip package to staging: SUBSCRIPTION_ID=<your_sub_id> LOCATION=eastasia NAME_PREFIX=myweb RUNTIME_STACK="node|20-lts" HEALTHCHECK_PATH="/healthz" ARTIFACT_PATH="build/app.zip" ./deploy_azure_staging.sh
  3. Verify the resources

    • The script prints the production and staging URLs
    • If ARTIFACT_PATH was provided, the staging health check endpoint is polled for verification
  4. Go live (optional)

    • Once staging passes verification, swap the slots: az webapp deployment slot swap -g <resource_group> -n <webapp_name> --slot staging --target-slot production
  5. Clean up (optional)

    • Delete the resource group (removes all resources): az group delete -n <resource_group> --yes --no-wait

Verification

  • Health check

    • Visit https://<webapp_name>-staging.azurewebsites.net<HEALTHCHECK_PATH> and confirm it returns 200
    • Before promoting to production, run broader end-to-end and regression tests
  • Logs and metrics

    • In the Azure portal, check:
      • Application Insights: failure rate, dependencies, requests, performance, Live Metrics
      • Log Analytics Workspace: use "Logs" to write KQL queries, such as:
        • AppServiceHTTPLogs
        • AppServiceAppLogs
        • traces, requests, exceptions (Application Insights tables)
  • Security configuration review

    • az webapp show -g <resource_group> -n <webapp_name> --query "httpsOnly"
    • az webapp config show -g <resource_group> -n <webapp_name> --query "{minTls: minTlsVersion, ftps: ftpsState, http2: http20Enabled}"

Notes

  • Make sure the application implements HEALTHCHECK_PATH (e.g. /healthz) and returns 200 so verification passes.
  • To load configuration from Key Vault, use the managed identity with Key Vault references (the @Microsoft.KeyVault syntax in App Settings) instead of plaintext credentials; see the example after this list.
  • For further network hardening (private endpoints/VNet integration/IP restrictions), extend the Bicep template and script as needed.
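
A hedged example of wiring a Key Vault secret into the staging slot with a Key Vault reference; the vault, app, and secret names are placeholders, and the slot's managed identity already holds the Key Vault Secrets User role from the template:

# Store the secret, then reference it from App Settings instead of embedding the value.
az keyvault secret set --vault-name <keyvault_name> --name db-password --value '<secret-value>'

az webapp config appsettings set \
  --resource-group <resource_group> \
  --name <webapp_name> \
  --slot staging \
  --settings 'DB_PASSWORD=@Microsoft.KeyVault(VaultName=<keyvault_name>;SecretName=db-password)'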

Deployment Script Overview

This solution deploys microservices on Cloud Run in a GCP development environment, together with basic networking and security configuration. The script has the following characteristics:

  • Microservices are described with declarative YAML configuration for repeatability and auditability
  • Base resources are created automatically: VPC, subnet, Serverless VPC Connector, Artifact Registry, and an execution service account
  • Secure defaults: authentication enforced (switchable), least-privilege IAM bindings, Secret Manager integration
  • Full error handling, logging, and post-deployment health verification (including authenticated calls)

Intended use:

  • Startup teams or projects building out a microservice foundation in a GCP development environment
  • Deploying several Cloud Run services quickly with uniform environment variable and secret injection
  • Standardizing a development environment without introducing heavier orchestration (such as GKE)

Prerequisites

  • A GCP project with billing enabled
  • Tools:
    • gcloud CLI ≥ 420.0.0
    • jq ≥ 1.6
    • yq ≥ 4.x (for reading YAML)
    • curl ≥ 7.x
  • Identity and permissions (the user or service account running the script needs the following roles):
    • roles/run.admin (manage Cloud Run)
    • roles/compute.networkAdmin (manage the VPC/subnet/VPC Connector)
    • roles/iam.serviceAccountAdmin and roles/iam.serviceAccountUser (create and use service accounts)
    • roles/artifactregistry.admin (create the Artifact Registry repository)
    • roles/secretmanager.admin (manage secrets)
    • roles/serviceusage.serviceUsageAdmin (enable APIs)

Note: never hardcode credentials or secret values in the script or configuration.


Core Script Code

Below is the complete executable base deployment script deploy.sh (recommended at the project root). It depends on the configuration file microservices.yaml; see the example later in this section.

#!/usr/bin/env bash
set -euo pipefail

# === Simple logging ===
LOG_FILE="${LOG_FILE:-deploy_$(date +%Y%m%d_%H%M%S).log}"
exec > >(tee -a "$LOG_FILE") 2>&1
info()  { printf "[INFO] %s\n" "$*"; }
warn()  { printf "[WARN] %s\n" "$*"; }
error() { printf "[ERROR] %s\n" "$*" >&2; }

trap 'error "Deployment failed. See the log: $LOG_FILE"; exit 1' ERR

# === Parameters and environment ===
PROJECT_ID="${PROJECT_ID:-}"
REGION="${REGION:-us-central1}"
ALLOW_UNAUTH="${ALLOW_UNAUTH:-false}"  # "true" allows anonymous access; disabled by default
CONFIG_FILE="${CONFIG_FILE:-microservices.yaml}"

NETWORK_NAME="${NETWORK_NAME:-dev-mesh-vpc}"
SUBNET_NAME="${SUBNET_NAME:-dev-subnet}"
SUBNET_RANGE="${SUBNET_RANGE:-10.10.0.0/24}"
CONNECTOR_NAME="${CONNECTOR_NAME:-dev-vpc-connector}"
REPO_ID="${REPO_ID:-dev-services}"
RUN_SA_NAME="${RUN_SA_NAME:-run-sa}" # execution service account name (without domain)
RUN_SA_EMAIL=""

# === Dependency checks ===
need_cmd() { command -v "$1" >/dev/null 2>&1 || { error "Missing dependency: $1"; exit 1; }; }
need_cmd gcloud
need_cmd jq
need_cmd yq
need_cmd curl

# === Parameter validation ===
if [[ -z "$PROJECT_ID" ]]; then
  error "PROJECT_ID is not set. Export it first: export PROJECT_ID=your-gcp-project-id"
  exit 1
fi
info "Using project: $PROJECT_ID, region: $REGION, anonymous access allowed: $ALLOW_UNAUTH"

gcloud config set project "$PROJECT_ID" >/dev/null

# === Helper functions ===
# Idempotent: skip creation when the resource already exists
ensure_api() {
  local api="$1"
  if ! gcloud services list --enabled --format="value(config.name)" | grep -q "^${api}$"; then
    info "Enabling API: $api"
    gcloud services enable "$api"
  else
    info "API already enabled: $api"
  fi
}

ensure_network() {
  if ! gcloud compute networks describe "$NETWORK_NAME" --format="value(name)" >/dev/null 2>&1; then
    info "创建 VPC: $NETWORK_NAME"
    gcloud compute networks create "$NETWORK_NAME" --subnet-mode=custom
  else
    info "VPC 已存在: $NETWORK_NAME"
  fi
}

ensure_subnet() {
  if ! gcloud compute networks subnets describe "$SUBNET_NAME" --region "$REGION" --format="value(name)" >/dev/null 2>&1; then
    info "创建子网: $SUBNET_NAME ($SUBNET_RANGE)"
    gcloud compute networks subnets create "$SUBNET_NAME" \
      --network "$NETWORK_NAME" \
      --region "$REGION" \
      --range "$SUBNET_RANGE"
  else
    info "子网已存在: $SUBNET_NAME"
  fi
}

ensure_connector() {
  if ! gcloud compute networks vpc-access connectors describe "$CONNECTOR_NAME" --region "$REGION" --format="value(name)" >/dev/null 2>&1; then
    info "创建 Serverless VPC Connector: $CONNECTOR_NAME"
    gcloud compute networks vpc-access connectors create "$CONNECTOR_NAME" \
      --network "$NETWORK_NAME" \
      --region "$REGION" \
      --subnet "$SUBNET_NAME" \
      --min-instances 2 \
      --max-instances 10
  else
    info "VPC Connector 已存在: $CONNECTOR_NAME"
  fi
}

ensure_repo() {
  if ! gcloud artifacts repositories describe "$REPO_ID" --location "$REGION" --format="value(name)" >/dev/null 2>&1; then
    info "创建 Artifact Registry 仓库: $REPO_ID"
    gcloud artifacts repositories create "$REPO_ID" \
      --repository-format=docker \
      --location="$REGION" \
      --description="Dev microservices registry"
  else
    info "Artifact Registry 仓库已存在: $REPO_ID"
  fi
}

ensure_service_account() {
  RUN_SA_EMAIL="${RUN_SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
  if ! gcloud iam service-accounts describe "$RUN_SA_EMAIL" --format="value(email)" >/dev/null 2>&1; then
    info "创建服务账号: $RUN_SA_EMAIL"
    gcloud iam service-accounts create "$RUN_SA_NAME" \
      --display-name="Cloud Run execution SA (dev)"
  else
    info "服务账号已存在: $RUN_SA_EMAIL"
  fi
}

ensure_iam_bindings() {
  info "为服务账号绑定最小权限角色"
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${RUN_SA_EMAIL}" \
    --role "roles/secretmanager.secretAccessor" >/dev/null
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${RUN_SA_EMAIL}" \
    --role "roles/logging.logWriter" >/dev/null
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${RUN_SA_EMAIL}" \
    --role "roles/monitoring.metricWriter" >/dev/null
}

check_secret_exists() {
  local name="$1"
  if ! gcloud secrets describe "$name" --format="value(name)" >/dev/null 2>&1; then
    error "Secret 不存在: $name。请先创建并写入版本:gcloud secrets create $name && gcloud secrets versions add $name --data-file=..."
    exit 1
  fi
}

deploy_service() {
  local name="$1"
  local image="$2"
  local port="$3"
  local cpu="$4"
  local memory="$5"
  local max_instances="$6"
  local allow_unauth="$7"
  local labels="$8"
  local env_kv="$9"
  local secrets_spec="${10}"

  info "部署服务: ${name}"

  local auth_flag="--no-allow-unauthenticated"
  [[ "$allow_unauth" == "true" ]] && auth_flag="--allow-unauthenticated"

  local set_env_flag=()
  [[ -n "$env_kv" ]] && set_env_flag=(--set-env-vars "$env_kv")

  local set_secrets_flag=()
  if [[ -n "$secrets_spec" ]]; then
    # Verify each referenced secret exists
    IFS=',' read -r -a pairs <<< "$secrets_spec"
    for p in "${pairs[@]}"; do
      # Format: ENV=secret:version
      local pair="$p"
      local secret_name="$(echo "$pair" | cut -d'=' -f2 | cut -d':' -f1)"
      check_secret_exists "$secret_name"
    done
    set_secrets_flag=(--set-secrets "$secrets_spec")
  fi

  gcloud run deploy "$name" \
    --image="$image" \
    --region="$REGION" \
    --project="$PROJECT_ID" \
    --platform=managed \
    --ingress=all \
    $auth_flag \
    --port="$port" \
    --cpu="$cpu" \
    --memory="$memory" \
    --min-instances=0 \
    --max-instances="$max_instances" \
    --execution-environment=gen2 \
    --service-account="$RUN_SA_EMAIL" \
    --vpc-connector="$CONNECTOR_NAME" \
    --labels "$labels" \
    "${set_env_flag[@]}" \
    "${set_secrets_flag[@]}]"
}

service_url() {
  local name="$1"
  gcloud run services describe "$name" --region "$REGION" --format="value(uri)"
}

validate_service() {
  local name="$1"
  local url
  url=$(service_url "$name")
  if [[ -z "$url" ]]; then
    error "无法获取服务 URL: $name"
    return 1
  fi
  info "验证服务: $name (${url})"

  # 若启用匿名则直接访问,否则获取身份令牌
  local code
  if [[ "$ALLOW_UNAUTH" == "true" ]]; then
    code=$(curl -sS -o /dev/null -w "%{http_code}" "${url}/healthz")
    [[ "$code" == "404" ]] && code=$(curl -sS -o /dev/null -w "%{http_code}" "${url}/")
  else
    local token
    # Plain user-credential identity tokens are accepted by Cloud Run for developer testing
    token=$(gcloud auth print-identity-token)
    code=$(curl -sS -o /dev/null -w "%{http_code}" -H "Authorization: Bearer ${token}" "${url}/healthz")
    [[ "$code" == "404" ]] && code=$(curl -sS -o /dev/null -w "%{http_code}" -H "Authorization: Bearer ${token}" "${url}/")
  fi

  if [[ "$code" -ge 200 && "$code" -lt 400 ]]; then
    info "服务健康检查通过: $name (HTTP $code)"
  else
    warn "服务健康检查失败: $name (HTTP $code)。请检查 Cloud Logging 日志。"
    gcloud run services logs read "$name" --region "$REGION" --limit 50 || true
    return 1
  fi
}

# === Main flow ===
info "Enabling required APIs"
for api in run.googleapis.com artifactregistry.googleapis.com secretmanager.googleapis.com cloudbuild.googleapis.com compute.googleapis.com iam.googleapis.com logging.googleapis.com monitoring.googleapis.com; do
  ensure_api "$api"
done

ensure_network
ensure_subnet
ensure_connector
ensure_repo
ensure_service_account
ensure_iam_bindings

# Parse the YAML and deploy
if [[ ! -f "$CONFIG_FILE" ]]; then
  error "Configuration file not found: $CONFIG_FILE"
  exit 1
fi

services_count=$(yq '.services | length' "$CONFIG_FILE")
if [[ "$services_count" -eq 0 ]]; then
  error "No services defined in the configuration file (.services)"
  exit 1
fi

info "Found ${services_count} services; starting deployment..."
for i in $(seq 0 $((services_count - 1))); do
  name=$(yq ".services[$i].name" "$CONFIG_FILE")
  image=$(yq ".services[$i].image" "$CONFIG_FILE")
  port=$(yq ".services[$i].port // 8080" "$CONFIG_FILE")
  cpu=$(yq ".services[$i].cpu // \"1\"" "$CONFIG_FILE")
  memory=$(yq ".services[$i].memory // \"512Mi\"" "$CONFIG_FILE")
  max_instances=$(yq ".services[$i].maxInstances // 3" "$CONFIG_FILE")

  # Convert env to KEY=VALUE,KEY2=VALUE2
  env_kv=$(yq -o=json ".services[$i].env" "$CONFIG_FILE" | jq -r 'to_entries | map("\(.key)=\(.value)") | join(",")' )
  [[ "$env_kv" == "null" ]] && env_kv=""

  # Convert secrets to ENV=secret:version,ENV2=secret2:latest
  secrets_spec=$(yq -o=json ".services[$i].secrets" "$CONFIG_FILE" | jq -r 'if .==null then "" else map("\(.env)=\(.name):\(.version)") | join(",") end')
  [[ "$secrets_spec" == "null" ]] && secrets_spec=""

  labels="env=dev,service=${name}"

  deploy_service "$name" "$image" "$port" "$cpu" "$memory" "$max_instances" "$ALLOW_UNAUTH" "$labels" "$env_kv" "$secrets_spec"
done

info "全部服务已部署,开始验证..."
all_ok=true
for i in $(seq 0 $((services_count - 1))); do
  name=$(yq ".services[$i].name" "$CONFIG_FILE")
  validate_service "$name" || all_ok=false
done

if [[ "$all_ok" == "true" ]]; then
  info "部署与验证全部成功。"
else
  warn "部分服务验证失败,请参考日志与 Cloud Logging 进一步排查。"
fi

info "完成。部署日志保存在: $LOG_FILE"

Configuration Notes

The configuration file microservices.yaml describes the microservices to deploy. Example:

services:
  - name: svc-a
    image: us-central1-docker.pkg.dev/your-project-id/dev-services/svc-a:dev
    port: 8080
    cpu: "1"
    memory: "512Mi"
    maxInstances: 3
    env:
      APP_ENV: "development"
      LOG_LEVEL: "info"
    secrets:
      - env: "DB_PASSWORD"
        name: "dev-db-password"
        version: "latest"
  - name: svc-b
    image: us-central1-docker.pkg.dev/your-project-id/dev-services/svc-b:dev
    port: 8080
    cpu: "1"
    memory: "512Mi"
    maxInstances: 3
    env:
      APP_ENV: "development"
      LOG_LEVEL: "debug"
    secrets: []

Key fields:

  • name: Cloud Run service name (unique)
  • image: container image URI (Artifact Registry recommended: {region}-docker.pkg.dev/{project}/{repo}/{image}:{tag})
  • port: port the container listens on (default 8080)
  • cpu/memory: instance resources (e.g. "1"/"512Mi", within Cloud Run limits)
  • maxInstances: maximum instance count (3 to 10 suggested for development)
  • env: plain environment variable key/value pairs
  • secrets: sensitive variables injected from Secret Manager (env is the in-container variable name; name/version are the secret name and version); see the flattened flag example below
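
For reference, deploy.sh flattens the secrets entries above into the --set-secrets flag of gcloud run deploy; the svc-a example would effectively produce the command below (the other flags set by deploy_service are omitted for brevity):

# Equivalent flag generated for svc-a (ENV_VAR=SECRET_NAME:VERSION, comma-separated for multiples).
gcloud run deploy svc-a \
  --image us-central1-docker.pkg.dev/your-project-id/dev-services/svc-a:dev \
  --region us-central1 \
  --set-secrets "DB_PASSWORD=dev-db-password:latest"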

Security recommendations:

  • Never put sensitive data in env; inject all sensitive values through secrets
  • Keep anonymous access disabled by default (ALLOW_UNAUTH=false); enable it only temporarily for development debugging

Deployment Steps

  1. Prepare the environment and permissions
    • Install gcloud, jq, yq, and curl
    • Set the project and sign in: gcloud auth login && gcloud config set project your-project-id
    • Export environment variables:
      • export PROJECT_ID=your-project-id
      • export REGION=us-central1
      • export ALLOW_UNAUTH=false
  2. Prepare images and secrets
    • Push the microservice images to Artifact Registry (the script creates the dev-services repository)
    • Create the required secrets and add versions (see the stdin example after this list):
      • gcloud secrets create dev-db-password
      • gcloud secrets versions add dev-db-password --data-file=-
  3. Prepare the configuration file
    • Create microservices.yaml following the example above and replace your-project-id in the image URIs
  4. Run the deployment
    • chmod +x deploy.sh
    • ./deploy.sh
  5. Review logs and results
    • The script writes its log to deploy_YYYYMMDD_HHMMSS.log
    • If verification fails, the most recent service logs are printed automatically to help troubleshooting
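
Step 2's --data-file=- reads the secret value from stdin; a sketch of adding a version non-interactively without leaving the value in shell history, assuming the value already sits in the hypothetical variable DB_PASSWORD_VALUE:

# Create the secret once, then pipe the value from an existing variable as a new version.
gcloud secrets create dev-db-password --replication-policy=automatic
printf '%s' "$DB_PASSWORD_VALUE" | gcloud secrets versions add dev-db-password --data-file=-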

Verification

The script already includes a verification flow covering:

  • Resource existence and API enablement checks
  • Cloud Run service URI retrieval
  • Health check calls (preferring /healthz, falling back to the root path / if that route does not exist)
  • Authentication:
    • Anonymous access disabled: an identity token from gcloud auth print-identity-token is attached to the call
    • Anonymous access enabled: the service URL is called directly

Additional external checks:

  • Manually verify the services are reachable (see the token example after this list)
  • Inspect Cloud Logging
    • gcloud run services logs read svc-a --region $REGION --limit 100
  • Inspect service details and traffic configuration
    • gcloud run services describe svc-a --region $REGION
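
When anonymous access is disabled, manual checks need an identity token; a minimal sketch using svc-a as an example (the service URL is read from gcloud run services describe):

# Call the authenticated service directly with the caller's identity token.
URL=$(gcloud run services describe svc-a --region "$REGION" --format='value(status.url)')
curl -sS -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "${URL}/healthz"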

Notes and Best Practices

  • Secure defaults: anonymous access is disabled (unless ALLOW_UNAUTH=true is set explicitly), and secrets are read by a least-privilege service account
  • Networking: the Serverless VPC Connector lets services reach private resources (databases, internal APIs, etc.)
  • Configuration as code: the microservice manifest is described in YAML to avoid manual configuration drift
  • Extensible: the script can be wired into Cloud Build/GitHub Actions to build images and deploy on every commit
  • No hardcoding: the script contains no secret values or risky fixed configuration; all sensitive data is managed in Secret Manager

To evolve toward Terraform-managed infrastructure (stronger IaC capabilities), declare the VPC, Connector, Artifact Registry, and Cloud Run services as Terraform resources while keeping the current YAML as a variable source. The script can also be invoked from CI/CD to form a standardized development pipeline; for example:
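
As a hedged starting point for CI integration, an image can be built remotely with Cloud Build and pushed to the Artifact Registry repository created by the script before deploy.sh runs; the source path ./services/svc-a and the tag are hypothetical:

# Build the image with Cloud Build, push it to Artifact Registry, then deploy.
gcloud builds submit ./services/svc-a \
  --tag "us-central1-docker.pkg.dev/${PROJECT_ID}/dev-services/svc-a:dev"
./deploy.sh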

Example Details

Problems Solved

Produce ready-to-run cloud deployment scripts and onboarding guides with less communication overhead and a shorter delivery cycle. Around the three inputs of platform type, deployment target, and environment configuration, the prompt generates standardized scripts and step-by-step instructions for AWS/Azure/GCP, covering permissions and network policy, error handling, logging and health checks, and rollback and verification. The goals are to: 1) compress environment setup from days to hours; 2) reduce manual configuration mistakes and environment drift; 3) make multi-environment, multi-region setups reproducible; 4) be secure by default and reduce compliance risk; 5) establish a reusable baseline for continuous integration and delivery.

Intended Users

DevOps engineers

Standardize multi-cloud scripting and quickly produce project-ready deployment scripts with built-in logging, rollback, and health checks, cutting down on firefighting.

Startup technical leads

Stand up a production-grade environment from a ready-made blueprint, going from account setup to a live service within a day while controlling cost and keeping room to scale.

Backend engineers

Without dedicated ops support, generate service deployment scripts in one step; once self-testing passes they are ready to hand over, shortening release cycles from days to hours.

Feature Summary

  • Generates deployment scripts tailored to the target cloud from the given platform and goal, removing manual trial and error.
  • Adds logging and error handling automatically so failed deployments can be diagnosed immediately, reducing rollback time and coordination cost.
  • Includes security and permission policy recommendations so deployments are secure by default, avoiding credential leaks and unnecessary exposure.
  • Provides complete steps from dependencies to verification, so even newcomers can ship reliably and review afterwards.
  • Ships templates for mainstream AWS/Azure/GCP scenarios; common services work out of the box and heterogeneous environments are unified quickly.
  • Plans resources and network structure according to business scale, saving cost while preserving performance and availability.
  • Integrates cleanly with continuous delivery pipelines, shortening the path from commit to release.
  • Generates readable configuration notes and parameter comments, easing team growth, handover, and audits.
  • Offers preset approaches for microservices, containerization, and cloud migration, lowering adoption barriers and raising release success rates.
  • Tracks industry best practices to avoid outdated configuration and reduce technical debt and maintenance risk.

How to Use the Purchased Prompt Template

1. Use it directly in an external chat application

Copy the prompt generated from the template into your preferred chat application (such as ChatGPT or Claude) and use it in conversation directly, with no extra development. Suitable for quick personal trials and lightweight use.

2. Publish it as an API

Turn the prompt template into an API; your program can adjust the template parameters and call it through the interface for automation and batch processing. Suitable for developer integration and embedding into business systems.

3. Configure it in an MCP client

Configure the corresponding server address in your MCP client so your AI application can invoke the prompt template automatically. Suitable for advanced users and team collaboration, letting the prompt move seamlessly between AI tools.

Prompt Price
¥20.00
Try before you buy; pay only if it works for you.

What you get after purchase

The complete prompt template
- 585 tokens in total
- 3 adjustable parameters
{ platform type } { deployment target } { environment configuration }
Access to community-contributed content
- Curated community examples to help you get started quickly