Author: RJ.Wang
Email: wangrenjun@gmail.com
Created: 2025-01-08
Updated: 2025-01-08
🎯 Overview
Amazon S3 offers several API-based ways to copy objects between buckets, covering same-region, cross-region, and cross-account copy operations.
graph TB
Source[📦 Source bucket<br/>source-bucket]
Target[📦 Target bucket<br/>target-bucket]
subgraph Methods["🔧 Copy methods"]
API1[🔄 CopyObject API]
API2[📋 Batch Operations]
API3[🔄 Cross-Region Replication]
CLI[💻 AWS CLI]
SDK[🛠️ AWS SDK]
end
Source --> API1 --> Target
Source --> API2 --> Target
Source --> API3 --> Target
Source --> CLI --> Target
Source --> SDK --> Target
classDef bucket fill:#e3f2fd,stroke:#1976d2
classDef method fill:#e8f5e8,stroke:#388e3c
class Source,Target bucket
class API1,API2,API3,CLI,SDK method
🔧 Core API Methods
1. CopyObject API (recommended)
REST API call
PUT /target-key HTTP/1.1
Host: target-bucket.s3.amazonaws.com
x-amz-copy-source: /source-bucket/source-key
Authorization: AWS4-HMAC-SHA256 Credential=...
AWS CLI commands
# Basic copy
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt
# Cross-region copy
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt \
    --source-region us-east-1 \
    --region us-west-2
# Cross-account copy (requires permissions on both buckets)
aws s3 cp s3://source-bucket/file.txt s3://target-bucket/file.txt \
    --profile source-account \
    --cli-write-timeout 0
2. Bulk copying
# Copy an entire directory (prefix)
aws s3 sync s3://source-bucket/folder/ s3://target-bucket/folder/
# Recursively copy all files
aws s3 cp s3://source-bucket/ s3://target-bucket/ --recursive
# Copy with include/exclude filters
aws s3 cp s3://source-bucket/ s3://target-bucket/ \
    --recursive \
    --exclude "*" \
    --include "*.jpg"
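The same bulk pattern can be expressed programmatically with Boto3 — a minimal sketch that lists every object under a prefix and issues a server-side CopyObject call for each; bucket and prefix names are placeholders, and the caller is assumed to have s3:ListBucket on the source:
import boto3

def copy_prefix(source_bucket, target_bucket, prefix=''):
    """Copy every object under a prefix using server-side CopyObject calls."""
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=source_bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            s3.copy_object(
                Bucket=target_bucket,
                Key=obj['Key'],
                CopySource={'Bucket': source_bucket, 'Key': obj['Key']}
            )

# Hypothetical usage
copy_prefix('my-source-bucket', 'my-target-bucket', prefix='folder/')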
🛠️ SDK Implementation Examples
Python (Boto3)
import boto3
from botocore.exceptions import ClientError

def copy_s3_object(source_bucket, source_key, target_bucket, target_key,
                   target_region='us-east-1'):
    """
    Copy an S3 object across buckets.

    Args:
        source_bucket: source bucket name
        source_key: source object key
        target_bucket: target bucket name
        target_key: target object key
        target_region: region of the target bucket
    """
    try:
        # Create the S3 client in the target region
        s3_client = boto3.client('s3', region_name=target_region)
        # Build the copy source. CopySource accepts only Bucket, Key, and
        # an optional VersionId; the copy runs server-side inside S3, so
        # no source region is passed here.
        copy_source = {
            'Bucket': source_bucket,
            'Key': source_key
        }
        # Perform the copy
        s3_client.copy_object(
            CopySource=copy_source,
            Bucket=target_bucket,
            Key=target_key,
            MetadataDirective='COPY'  # preserve the original metadata
        )
        print(f"✅ Copy succeeded: s3://{source_bucket}/{source_key} -> s3://{target_bucket}/{target_key}")
        return True
    except ClientError as e:
        print(f"❌ Copy failed: {e}")
        return False

# Usage example
copy_s3_object(
    source_bucket='my-source-bucket',
    source_key='documents/report.pdf',
    target_bucket='my-target-bucket',
    target_key='backup/report.pdf'
)
Multipart copy for large files
def copy_large_object(source_bucket, source_key, target_bucket, target_key):
    """Multipart copy for large objects (CopyObject is limited to 5 GB)."""
    s3_client = boto3.client('s3')
    # Get the size of the source object
    response = s3_client.head_object(Bucket=source_bucket, Key=source_key)
    object_size = response['ContentLength']
    copy_source = {'Bucket': source_bucket, 'Key': source_key}
    if object_size > 5 * 1024 * 1024 * 1024:  # 5 GB
        # Over the CopyObject limit: copy the object in ranged parts
        multipart_upload = s3_client.create_multipart_upload(
            Bucket=target_bucket,
            Key=target_key
        )
        upload_id = multipart_upload['UploadId']
        parts = []
        # Part size (100 MB)
        part_size = 100 * 1024 * 1024
        part_number = 1
        for start in range(0, object_size, part_size):
            end = min(start + part_size - 1, object_size - 1)
            # Copy one part
            part_response = s3_client.upload_part_copy(
                Bucket=target_bucket,
                Key=target_key,
                PartNumber=part_number,
                UploadId=upload_id,
                CopySource=copy_source,
                CopySourceRange=f'bytes={start}-{end}'
            )
            parts.append({
                'ETag': part_response['CopyPartResult']['ETag'],
                'PartNumber': part_number
            })
            part_number += 1
        # Complete the multipart upload
        s3_client.complete_multipart_upload(
            Bucket=target_bucket,
            Key=target_key,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
    else:
        # Small enough for a single server-side copy
        s3_client.copy_object(
            CopySource=copy_source,
            Bucket=target_bucket,
            Key=target_key
        )
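A hypothetical invocation, mirroring the usage example above (bucket and key names are placeholders):
copy_large_object('my-source-bucket', 'videos/raw.mp4', 'my-target-bucket', 'backup/raw.mp4')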
Node.js (AWS SDK v3)
import { S3Client, CopyObjectCommand } from "@aws-sdk/client-s3";

async function copyS3Object(sourceBucket, sourceKey, targetBucket, targetKey) {
  const s3Client = new S3Client({ region: "us-east-1" });
  const copyParams = {
    Bucket: targetBucket,
    Key: targetKey,
    // Note: keys containing special characters must be URL-encoded here
    CopySource: `${sourceBucket}/${sourceKey}`,
    MetadataDirective: "COPY"
  };
  try {
    const command = new CopyObjectCommand(copyParams);
    const response = await s3Client.send(command);
    console.log(`✅ Copy succeeded: ${response.CopyObjectResult.ETag}`);
    return response;
  } catch (error) {
    console.error(`❌ Copy failed:`, error);
    throw error;
  }
}

// Usage example
await copyS3Object(
  'source-bucket',
  'path/to/file.txt',
  'target-bucket',
  'backup/file.txt'
);
🔐 Permission Configuration
Source bucket policy
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCopyFrom",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::TARGET-ACCOUNT:user/copy-user"
},
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::source-bucket/*"
}
]
}
Target bucket policy
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowCopyTo",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::SOURCE-ACCOUNT:user/copy-user"
},
"Action": [
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": "arn:aws:s3:::target-bucket/*"
}
]
}
IAM user permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": [
"arn:aws:s3:::source-bucket/*",
"arn:aws:s3:::target-bucket/*"
]
}
]
}
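To attach a bucket policy programmatically, a minimal Boto3 sketch — the policy document is assumed to be one of the JSON documents above, saved to a hypothetical local file named policy.json:
import boto3

# Load the policy JSON prepared above (filename is an assumption)
with open('policy.json') as f:
    policy = f.read()

s3 = boto3.client('s3')
s3.put_bucket_policy(Bucket='source-bucket', Policy=policy)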
🌍 Cross-Region Replication Configuration
Automatic replication rule
Note: replication requires versioning to be enabled on both the source and destination buckets, and a rule that uses Filter must also specify DeleteMarkerReplication.
{
  "Role": "arn:aws:iam::ACCOUNT:role/replication-role",
  "Rules": [
    {
      "ID": "CrossRegionReplication",
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Filter": {
        "Prefix": "documents/"
      },
      "Destination": {
        "Bucket": "arn:aws:s3:::target-bucket",
        "StorageClass": "STANDARD_IA"
      }
    }
  ]
}
Configuring a replication rule via the CLI
# Create the replication configuration file
cat > replication-config.json << 'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/replication-role",
  "Rules": [
    {
      "ID": "ReplicateToBackup",
      "Status": "Enabled",
      "Priority": 1,
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Filter": {"Prefix": ""},
      "Destination": {
        "Bucket": "arn:aws:s3:::backup-bucket",
        "StorageClass": "GLACIER"
      }
    }
  ]
}
EOF
# Apply the replication configuration
aws s3api put-bucket-replication \
    --bucket source-bucket \
    --replication-configuration file://replication-config.json
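As noted above, replication only works with versioning enabled on both buckets. A minimal Boto3 sketch to turn it on (bucket names are placeholders matching the configuration above):
import boto3

s3 = boto3.client('s3')
# Enable versioning on both ends of the replication rule
for bucket in ('source-bucket', 'backup-bucket'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'}
    )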
📊 Performance Optimization
Copy performance comparison
graph TB
subgraph Performance["📈 Performance comparison"]
Single[🔄 Single-object copy<br/>Best for: <100MB]
Multi[📋 Multipart copy<br/>Best for: >5GB]
Batch[🚀 Batch operations<br/>Best for: many small files]
Sync[🔄 Sync copy<br/>Best for: directory sync]
end
subgraph Metrics["📊 Performance metrics"]
Speed[⚡ Speed: ~100MB/s]
Cost[💰 Cost: $0.0004/1000 requests]
Limit[⚠️ Limit: 5GB per CopyObject]
end
Single --> Speed
Multi --> Cost
Batch --> Limit
classDef method fill:#e8f5e8,stroke:#388e3c
classDef metric fill:#fff3e0,stroke:#f57c00
class Single,Multi,Batch,Sync method
class Speed,Cost,Limit metric
Optimization tips
- Small files (<100MB): use CopyObject directly (see the dispatch sketch below)
- Large files (>5GB): use multipart copy
- Many files: use S3 Batch Operations
- Real-time sync: configure Cross-Region Replication
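A minimal sketch of that size-based dispatch, reusing the copy_s3_object and copy_large_object helpers defined earlier (boto3 is assumed to be imported); the 5 GB threshold comes from the CopyObject limit:
def copy_auto(source_bucket, source_key, target_bucket, target_key):
    """Pick a copy strategy based on the source object's size."""
    s3 = boto3.client('s3')
    size = s3.head_object(Bucket=source_bucket, Key=source_key)['ContentLength']
    if size > 5 * 1024 * 1024 * 1024:
        # Over the CopyObject limit: fall back to multipart copy
        copy_large_object(source_bucket, source_key, target_bucket, target_key)
    else:
        copy_s3_object(source_bucket, source_key, target_bucket, target_key)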
🚨 Common Issues
Error handling
def robust_copy_with_retry(source_bucket, source_key, target_bucket, target_key, max_retries=3):
    """Copy with a retry mechanism. Calls copy_object directly rather than
    copy_s3_object, which swallows ClientError internally and would never
    reach the handlers below."""
    import time
    s3_client = boto3.client('s3')
    for attempt in range(max_retries):
        try:
            s3_client.copy_object(
                CopySource={'Bucket': source_bucket, 'Key': source_key},
                Bucket=target_bucket,
                Key=target_key
            )
            return True
        except ClientError as e:
            error_code = e.response['Error']['Code']
            if error_code == 'NoSuchBucket':
                print(f"❌ Bucket does not exist: {e}")
                return False
            elif error_code == 'AccessDenied':
                print(f"❌ Access denied: {e}")
                return False
            elif error_code == 'SlowDown':
                # Throttled; back off exponentially and retry
                wait_time = 2 ** attempt
                print(f"⏳ Throttled, retrying in {wait_time} seconds...")
                time.sleep(wait_time)
                continue
            else:
                print(f"❌ Unexpected error: {e}")
                if attempt == max_retries - 1:
                    return False
                time.sleep(1)
    return False
Common error codes
- NoSuchBucket: the bucket does not exist
- NoSuchKey: the object does not exist
- AccessDenied: insufficient permissions
- InvalidRequest: invalid request parameters
- SlowDown: request rate too high
💰 Cost Calculation
Copy cost components
graph TB
Cost[💰 Total cost] --> Request[📋 Request charges<br/>$0.0004/1000 requests]
Cost --> Transfer[🔄 Transfer charges<br/>cross-region: $0.02/GB]
Cost --> Storage[💾 Storage charges<br/>storage cost in the target bucket]
Request --> Same[🏠 Same region: free]
Request --> Cross[🌍 Cross region: charged]
Transfer --> Internal[🔒 Internal transfer: free]
Transfer --> External[🌐 External transfer: charged]
classDef cost fill:#ffebee,stroke:#d32f2f
classDef free fill:#e8f5e8,stroke:#388e3c
classDef paid fill:#fff3e0,stroke:#f57c00
class Cost cost
class Same,Internal free
class Cross,External,Request,Transfer,Storage paid
Cost optimization tips
- Same-region copies: no data transfer charges (see the rough estimator below)
- Batch operations: fewer API requests
- Storage class: choose an appropriate storage class
- Lifecycle: configure automatic deletion or archiving
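A rough, illustrative estimator using the figures from the diagram above ($0.0004 per 1000 requests, $0.02/GB cross-region transfer); actual AWS prices vary by region and change over time, so treat these constants as assumptions:
def estimate_copy_cost(num_objects, total_gb, cross_region=False):
    """Back-of-the-envelope copy cost; prices are illustrative assumptions."""
    request_cost = num_objects / 1000 * 0.0004
    transfer_cost = total_gb * 0.02 if cross_region else 0.0
    return request_cost + transfer_cost

# e.g. 10,000 objects totalling 50 GB, copied cross-region
print(f"${estimate_copy_cost(10_000, 50, cross_region=True):.2f}")  # ≈ $1.00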
🔍 Monitoring and Logging
CloudTrail event monitoring
{
"eventSource": "s3.amazonaws.com",
"eventName": "CopyObject",
"sourceIPAddress": "192.168.1.100",
"requestParameters": {
"bucketName": "target-bucket",
"key": "backup/file.txt",
"x-amz-copy-source": "source-bucket/file.txt"
}
}
CloudWatch metrics
- NumberOfObjects: change in object count
- BucketSizeBytes: change in bucket size (fetched in the sketch below)
- AllRequests: total request count
- 4xxErrors: client-side errors
- 5xxErrors: server-side errors
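A minimal Boto3 sketch reading the daily BucketSizeBytes metric; S3 storage metrics are published once per day under the AWS/S3 namespace, and the StorageType dimension shown assumes the Standard storage class:
import boto3
from datetime import datetime, timedelta

cw = boto3.client('cloudwatch')
resp = cw.get_metric_statistics(
    Namespace='AWS/S3',
    MetricName='BucketSizeBytes',
    Dimensions=[
        {'Name': 'BucketName', 'Value': 'target-bucket'},
        {'Name': 'StorageType', 'Value': 'StandardStorage'},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,            # daily datapoints
    Statistics=['Average'],
)
for point in resp['Datapoints']:
    print(point['Timestamp'], point['Average'])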
📋 Best Practices
Security tips
- Least privilege: grant only the permissions copying requires
- Encryption in transit: use HTTPS
- Access logging: enable S3 access logs
- Versioning: enable object versioning
Performance tips
- Concurrency control: avoid excessive concurrent requests
- Retry mechanism: implement exponential backoff
- Multipart upload: use multipart copy for large files
- Region choice: pick nearby regions to reduce latency
Cost tips
- Storage class: choose by access frequency
- Lifecycle: configure automatic transitions and deletion
- Compression: compress files before copying
- Deduplication: avoid re-copying identical files
🎯 Summary
S3 provides powerful, flexible cross-bucket copy capabilities:
- ✅ REST API: direct HTTP calls
- ✅ AWS CLI: command-line tooling
- ✅ SDK: programming interfaces in many languages
- ✅ Batch operations: large-scale file processing
- ✅ Automatic replication: real-time sync
The right method depends on file size, copy frequency, performance requirements, and cost.
Last updated: 2025-01-08