S3 Cross-Bucket Copy: A Complete Guide

Author: RJ.Wang
Email: wangrenjun@gmail.com
Created: 2025-01-08
Updated: 2025-01-08


🎯 Overview

Amazon S3 offers several API-based ways to copy objects between buckets, supporting same-region, cross-region, and cross-account copies.

graph TB
    Source[📦 Source bucket<br/>source-bucket]
    Target[📦 Target bucket<br/>target-bucket]

    subgraph Methods["🔧 Copy methods"]
        API1[🔄 CopyObject API]
        API2[📋 Batch Operations]
        API3[🔄 Cross-Region Replication]
        CLI[💻 AWS CLI]
        SDK[🛠️ AWS SDK]
    end

    Source --> API1 --> Target
    Source --> API2 --> Target
    Source --> API3 --> Target
    Source --> CLI --> Target
    Source --> SDK --> Target

    classDef bucket fill:#e3f2fd,stroke:#1976d2
    classDef method fill:#e8f5e8,stroke:#388e3c

    class Source,Target bucket
    class API1,API2,API3,CLI,SDK method

🔧 Core API Methods

1. CopyObject API (recommended)

REST API call

PUT /target-key HTTP/1.1
Host: target-bucket.s3.amazonaws.com
x-amz-copy-source: /source-bucket/source-key
Authorization: AWS4-HMAC-SHA256 Credential=...
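
One easy-to-miss detail of this call is that the object key inside `x-amz-copy-source` must be URL-encoded. A minimal sketch of building that header value with only the Python standard library (the helper name is ours, not part of any AWS SDK):

```python
from urllib.parse import quote

def copy_source_header(source_bucket: str, source_key: str) -> str:
    """Build the x-amz-copy-source header value; the key must be URL-encoded."""
    # safe="/" keeps path separators literal but encodes spaces, etc.
    return f"/{source_bucket}/{quote(source_key, safe='/')}"

print(copy_source_header("source-bucket", "docs/annual report.pdf"))
# /source-bucket/docs/annual%20report.pdf
```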

AWS CLI commands

# Basic copy
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt

# Cross-region copy
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt \
  --source-region us-east-1 \
  --region us-west-2

# Cross-account copy (requires permissions on both buckets)
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt \
  --profile source-account \
  --cli-write-timeout 0

2. Bulk Copy

# Copy an entire prefix ("directory")
aws s3 sync s3://source-bucket/folder/ s3://target-bucket/folder/

# Recursively copy all files
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive

# Copy with filters (filters are applied in order; later ones win)
aws s3 cp s3://source-bucket/ s3://target-bucket/ \
  --recursive \
  --exclude "*" \
  --include "*.jpg"
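
The CLI evaluates `--exclude`/`--include` filters in order, and the last filter that matches a key decides its fate. A rough stand-in for that rule (not the CLI's actual implementation) using `fnmatch`:

```python
from fnmatch import fnmatch

def is_included(key: str, filters: list[tuple[str, str]]) -> bool:
    """Last matching filter wins; the default (no match) is include."""
    included = True
    for kind, pattern in filters:  # kind is "include" or "exclude"
        if fnmatch(key, pattern):
            included = (kind == "include")
    return included

filters = [("exclude", "*"), ("include", "*.jpg")]
print(is_included("photos/cat.jpg", filters))  # True
print(is_included("notes.txt", filters))       # False
```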

🛠️ SDK Examples

Python (Boto3)

import boto3
from botocore.exceptions import ClientError

def copy_s3_object(source_bucket, source_key, target_bucket, target_key,
                   region='us-east-1'):
    """
    Copy an S3 object across buckets.

    Args:
        source_bucket: source bucket name
        source_key: source object key
        target_bucket: target bucket name
        target_key: target object key
        region: region of the target bucket (the copy runs server-side,
            so the source region does not need to be specified)
    """
    try:
        # Create an S3 client in the target bucket's region
        s3_client = boto3.client('s3', region_name=region)

        # Build the copy source; only Bucket, Key and optionally VersionId
        # are valid fields here ('Region' is not accepted)
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }

        # Perform the copy
        s3_client.copy_object(
            CopySource=copy_source,
            Bucket=target_bucket,
            Key=target_key,
            MetadataDirective='COPY'  # keep the original metadata
        )

        print(f"✅ Copied: s3://{source_bucket}/{source_key} -> s3://{target_bucket}/{target_key}")
        return True

    except ClientError as e:
        print(f"❌ Copy failed: {e}")
        return False

# Usage example
copy_s3_object(
    source_bucket='my-source-bucket',
    source_key='documents/report.pdf',
    target_bucket='my-target-bucket', 
    target_key='backup/report.pdf'
)

Multipart Copy for Large Files

def copy_large_object(source_bucket, source_key, target_bucket, target_key):
    """Multipart copy for large objects (>5GB)"""
    s3_client = boto3.client('s3')

    # Look up the size of the source object
    response = s3_client.head_object(Bucket=source_bucket, Key=source_key)
    object_size = response['ContentLength']

    copy_source = {'Bucket': source_bucket, 'Key': source_key}

    if object_size > 5 * 1024 * 1024 * 1024:  # 5GB: hard limit for a single CopyObject
        # Start a multipart upload on the target
        multipart_upload = s3_client.create_multipart_upload(
            Bucket=target_bucket,
            Key=target_key
        )

        upload_id = multipart_upload['UploadId']
        parts = []

        # Part size (100MB)
        part_size = 100 * 1024 * 1024
        part_number = 1

        for start in range(0, object_size, part_size):
            end = min(start + part_size - 1, object_size - 1)

            # Copy one part
            part_response = s3_client.upload_part_copy(
                Bucket=target_bucket,
                Key=target_key,
                PartNumber=part_number,
                UploadId=upload_id,
                CopySource=copy_source,
                CopySourceRange=f'bytes={start}-{end}'
            )

            parts.append({
                'ETag': part_response['CopyPartResult']['ETag'],
                'PartNumber': part_number
            })

            part_number += 1

        # Complete the multipart upload
        s3_client.complete_multipart_upload(
            Bucket=target_bucket,
            Key=target_key,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
    else:
        # At or below 5GB, a single CopyObject call is enough
        s3_client.copy_object(
            CopySource=copy_source,
            Bucket=target_bucket,
            Key=target_key
        )
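
The byte-range arithmetic in the loop above is easy to get off by one, so it helps to check it in isolation. `part_ranges` is a hypothetical helper that produces the same `CopySourceRange` strings as the loop:

```python
def part_ranges(object_size: int, part_size: int) -> list[str]:
    """Byte ranges for CopySourceRange, e.g. 'bytes=0-104857599' (inclusive ends)."""
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size - 1, object_size - 1)
        ranges.append(f"bytes={start}-{end}")
    return ranges

mb = 1024 * 1024
ranges = part_ranges(250 * mb, 100 * mb)  # 250MB object, 100MB parts
print(len(ranges))   # 3
print(ranges[0])     # bytes=0-104857599
print(ranges[-1])    # bytes=209715200-262143999
```

Note that the last range is shorter than `part_size`, and the end offsets are inclusive, matching the HTTP `Range` header convention S3 uses.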

Node.js (AWS SDK v3)

import { S3Client, CopyObjectCommand } from "@aws-sdk/client-s3";

async function copyS3Object(sourceBucket, sourceKey, targetBucket, targetKey) {
    const s3Client = new S3Client({ region: "us-east-1" });

    const copyParams = {
        Bucket: targetBucket,
        Key: targetKey,
        CopySource: `${sourceBucket}/${encodeURIComponent(sourceKey)}`,
        MetadataDirective: "COPY"
    };

    try {
        const command = new CopyObjectCommand(copyParams);
        const response = await s3Client.send(command);

        console.log(`✅ Copy succeeded: ${response.CopyObjectResult.ETag}`);
        return response;
    } catch (error) {
        console.error(`❌ Copy failed:`, error);
        throw error;
    }
}

// Usage example
await copyS3Object(
    'source-bucket',
    'path/to/file.txt', 
    'target-bucket',
    'backup/file.txt'
);

🔐 Permission Setup

Source bucket policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCopyFrom",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::TARGET-ACCOUNT:user/copy-user"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::source-bucket/*"
        }
    ]
}

Target bucket policy

{
    "Version": "2012-10-17", 
    "Statement": [
        {
            "Sid": "AllowCopyTo",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::SOURCE-ACCOUNT:user/copy-user"
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::target-bucket/*"
        }
    ]
}

IAM user permissions

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::source-bucket/*",
                "arn:aws:s3:::target-bucket/*"
            ]
        }
    ]
}

🌍 Cross-Region Replication Setup

Automatic replication rule

{
    "Role": "arn:aws:iam::ACCOUNT:role/replication-role",
    "Rules": [
        {
            "ID": "CrossRegionReplication",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {
                "Prefix": "documents/"
            },
            "Destination": {
                "Bucket": "arn:aws:s3:::target-bucket",
                "StorageClass": "STANDARD_IA"
            }
        }
    ]
}

Configuring replication via the CLI

# Create the replication configuration file
cat > replication-config.json << 'EOF'
{
    "Role": "arn:aws:iam::123456789012:role/replication-role",
    "Rules": [
        {
            "ID": "ReplicateToBackup",
            "Status": "Enabled", 
            "Priority": 1,
            "Filter": {"Prefix": ""},
            "Destination": {
                "Bucket": "arn:aws:s3:::backup-bucket",
                "StorageClass": "GLACIER"
            }
        }
    ]
}
EOF

# Apply the replication configuration
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication-config.json

📊 Performance Optimization

Copy method comparison

graph TB
    subgraph Performance["📈 Method comparison"]
        Single[🔄 Single-object copy<br/>best for: <100MB]
        Multi[📋 Multipart copy<br/>best for: >5GB]
        Batch[🚀 Batch Operations<br/>best for: many small files]
        Sync[🔄 Sync copy<br/>best for: directory sync]
    end

    subgraph Metrics["📊 Indicative figures"]
        Speed[⚡ Throughput: ~100MB/s]
        Cost[💰 Cost: ~$0.005/1000 COPY requests]
        Limit[⚠️ Limit: 5GB per CopyObject call]
    end

    Single --> Speed
    Multi --> Cost
    Batch --> Limit

    classDef method fill:#e8f5e8,stroke:#388e3c
    classDef metric fill:#fff3e0,stroke:#f57c00

    class Single,Multi,Batch,Sync method
    class Speed,Cost,Limit metric

Optimization Tips

  1. Small files (<100MB): use CopyObject directly
  2. Large files (>5GB): use multipart copy
  3. Many files: use S3 Batch Operations
  4. Real-time sync: configure Cross-Region Replication
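
These rules of thumb can be folded into a small selection helper. The function and the 1,000-object cutoff for "many files" are illustrative choices of ours; the 5GB figure is S3's hard limit for a single CopyObject call:

```python
def pick_copy_method(object_size_bytes: int, object_count: int = 1) -> str:
    """Pick a copy method from the rules of thumb above."""
    gb = 1024 ** 3
    if object_count > 1000:            # illustrative cutoff for "many files"
        return "S3 Batch Operations"
    if object_size_bytes > 5 * gb:     # S3's hard limit for one CopyObject
        return "multipart copy (upload_part_copy)"
    return "CopyObject"

print(pick_copy_method(50 * 1024 ** 2))           # CopyObject
print(pick_copy_method(6 * 1024 ** 3))            # multipart copy (upload_part_copy)
print(pick_copy_method(1024, object_count=5000))  # S3 Batch Operations
```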

🚨 Common Problems

Error handling

def robust_copy_with_retry(source_bucket, source_key, target_bucket, target_key, max_retries=3):
    """Copy with retries; calls copy_object directly so ClientError reaches this handler"""
    import time

    s3_client = boto3.client('s3')

    for attempt in range(max_retries):
        try:
            s3_client.copy_object(
                CopySource={'Bucket': source_bucket, 'Key': source_key},
                Bucket=target_bucket,
                Key=target_key
            )
            return True
        except ClientError as e:
            error_code = e.response['Error']['Code']

            if error_code == 'NoSuchBucket':
                print(f"❌ Bucket does not exist: {e}")
                return False
            elif error_code == 'AccessDenied':
                print(f"❌ Access denied: {e}")
                return False
            elif error_code == 'SlowDown':
                # Throttled: back off exponentially, then retry
                wait_time = 2 ** attempt
                print(f"⏳ Throttled, retrying in {wait_time} s...")
                time.sleep(wait_time)
            else:
                print(f"❌ Unexpected error: {e}")
                if attempt == max_retries - 1:
                    return False
                time.sleep(1)

    return False

Common Error Codes

  • NoSuchBucket: bucket does not exist
  • NoSuchKey: object does not exist
  • AccessDenied: insufficient permissions
  • InvalidRequest: invalid request parameters
  • SlowDown: request rate too high
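
The retry logic above boils down to classifying these codes as retryable or fatal. A minimal sketch of that split (`InternalError` and `ServiceUnavailable` are additional transient S3 codes added here by assumption; they are not in the list above):

```python
RETRYABLE = {"SlowDown", "InternalError", "ServiceUnavailable"}
FATAL = {"NoSuchBucket", "NoSuchKey", "AccessDenied", "InvalidRequest"}

def should_retry(error_code: str) -> bool:
    """Retry only throttling/transient errors; fail fast on everything else."""
    return error_code in RETRYABLE

print(should_retry("SlowDown"))      # True
print(should_retry("AccessDenied"))  # False
```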

💰 Cost Calculation

Components of copy cost

graph TB
    Cost[💰 Total cost] --> Request[📋 Request charges<br/>~$0.005/1000 COPY requests]
    Cost --> Transfer[🔄 Transfer charges<br/>cross-region: ~$0.02/GB]
    Cost --> Storage[💾 Storage charges<br/>target-bucket storage]

    Transfer --> Same[🏠 Same region: free]
    Transfer --> Cross[🌍 Cross region: charged]

    classDef cost fill:#ffebee,stroke:#d32f2f
    classDef free fill:#e8f5e8,stroke:#388e3c
    classDef paid fill:#fff3e0,stroke:#f57c00

    class Cost cost
    class Same free
    class Cross,Request,Transfer,Storage paid

Cost Optimization Tips

  1. Same-region copies: no data transfer charges (request charges still apply)
  2. Batch operations: fewer API requests
  3. Storage class: pick an appropriate storage class
  4. Lifecycle rules: configure automatic deletion or archiving
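
Using the figures above (roughly $0.005 per 1,000 COPY requests and $0.02/GB for cross-region transfer; both are indicative list prices, so check the current S3 pricing page for your region), a back-of-the-envelope estimate looks like:

```python
def estimate_copy_cost(num_objects: int, total_gb: float, cross_region: bool,
                       copy_req_per_1000: float = 0.005,
                       transfer_per_gb: float = 0.02) -> float:
    """Rough copy-cost estimate in USD; storage charges are excluded."""
    request_cost = num_objects / 1000 * copy_req_per_1000
    transfer_cost = total_gb * transfer_per_gb if cross_region else 0.0
    return round(request_cost + transfer_cost, 4)

# 10,000 objects, 500 GB total, cross-region
print(estimate_copy_cost(10_000, 500, cross_region=True))   # 10.05
# Same workload within one region: only request charges remain
print(estimate_copy_cost(10_000, 500, cross_region=False))  # 0.05
```

The transfer term dominates for cross-region copies, which is why the "same-region copies" tip above matters most for large datasets.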

🔍 Monitoring and Logging

CloudTrail event monitoring

{
    "eventSource": "s3.amazonaws.com",
    "eventName": "CopyObject",
    "sourceIPAddress": "192.168.1.100",
    "requestParameters": {
        "bucketName": "target-bucket",
        "key": "backup/file.txt",
        "x-amz-copy-source": "source-bucket/file.txt"
    }
}

CloudWatch Metrics

  • NumberOfObjects: change in object count
  • BucketSizeBytes: change in bucket size
  • AllRequests: total request count
  • 4xxErrors: client-side errors
  • 5xxErrors: server-side errors

📋 Best Practices

Security Tips

  1. Least privilege: grant only the permissions the copy needs
  2. Encrypted transport: use HTTPS
  3. Access logging: enable S3 access logs
  4. Versioning: enable object versioning

Performance Tips

  1. Concurrency control: avoid excessive parallel requests
  2. Retries: implement exponential backoff
  3. Multipart: use multipart copy for large files
  4. Region choice: pick the nearest region to reduce latency
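
Exponential backoff (tip 2) is usually combined with jitter so that many concurrent clients do not retry in lockstep. A sketch of the "full jitter" variant, where the delay is drawn uniformly from zero up to the exponential ceiling:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

for attempt in range(4):
    ceiling = min(60.0, 2 ** attempt)
    print(f"attempt {attempt}: sleep up to {ceiling:.0f}s "
          f"(e.g. {backoff_delay(attempt):.2f}s)")
```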

Cost Tips

  1. Storage class: choose by access frequency
  2. Lifecycle rules: configure automatic transitions and deletion
  3. Compression: compress files before copying
  4. Deduplication: avoid copying the same file twice

🎯 Summary

S3 provides powerful, flexible cross-bucket copying through:

  • REST API: direct HTTP calls
  • AWS CLI: command-line tooling
  • SDKs: programming interfaces for many languages
  • Batch Operations: large-scale file processing
  • Automatic replication: real-time synchronization

The right method depends on object size, copy frequency, performance requirements, and cost.


Last updated: 2025-01-08