Simple Storage Service (S3)
Basic Facts
- Object-based storage.
- Objects are stored as key/value pairs (key = filename, value = data).
- Not to be used for operating systems or databases.
- Objects are stored in buckets.
- Objects are accessed over HTTP(S); a successful write returns an HTTP 200 status code.
Use Cases
Bulk Storage
S3 is often used to store backup archives, log files or disaster recovery images.
Analytics
Big Data at rest can be stored in S3 for analytics processing.
Web Hosting
Static websites can be hosted from S3.
Design Considerations
Universal namespace. Buckets must have a universally unique name.
Buckets are stored in a region.
Unlimited total storage. A single PUT can upload an object of up to 5 GB; larger objects must use multipart upload. Scales automatically.
S3 is a regional construct and lives outside the VPC.
Use S3 Batch Operations when performing one-time or recurring actions at scale. For example, a batch job can tag objects so that lifecycle rules can then reclassify them.
Use prefixes to imitate folder structures.
Tag data appropriately. Add metadata where possible. Create lifecycle policies immediately.
Use CloudWatch for request metrics.
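The prefix idea above can be illustrated without touching AWS at all. The following is a minimal sketch, assuming a flat list of object keys and "/" as the delimiter; the function name list_folder and the sample keys are hypothetical, and the grouping only imitates what a delimiter-based listing request returns.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Group flat object keys the way a delimiter-based listing would.

    Returns (objects, common_prefixes): keys that sit directly under
    `prefix`, and the "sub-folders" one level below it.
    """
    objects = []
    common_prefixes = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lives under a different prefix
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter acts as a folder name.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(common_prefixes)


# Hypothetical keys: the bucket itself is flat, only the names contain "/".
keys = [
    "logs/2024/01/app.log",
    "logs/2024/02/app.log",
    "logs/readme.txt",
    "index.html",
]

print(list_folder(keys))           # top level of the bucket
print(list_folder(keys, "logs/"))  # contents of the "logs/" pseudo-folder
```

There are no real directories here: "logs/" exists only because some keys start with that string.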
Data Transfer
- AWS Direct Connect
- Kinesis
- Snowball – Snowball Edge – Snowmobile
- Use multipart uploads to increase data transfer performance. Use byte-range fetches for faster downloading.
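Both multipart uploads and byte-range fetches rely on splitting one object into contiguous byte ranges that can be transferred in parallel. The sketch below only computes those ranges and the matching HTTP Range header values; it makes no AWS calls. The function names and the 8 MiB part size are assumptions chosen for illustration.

```python
def byte_ranges(total_size, part_size):
    """Return inclusive (start, end) byte offsets that cover total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    ranges = []
    for start in range(0, total_size, part_size):
        # The last part may be shorter than part_size.
        end = min(start + part_size, total_size) - 1
        ranges.append((start, end))
    return ranges


def range_header(start, end):
    """Format an HTTP Range header value for one inclusive byte range."""
    return f"bytes={start}-{end}"


MIB = 1024 * 1024
part_size = 8 * MIB    # assumed part size, not a recommendation
object_size = 20 * MIB  # hypothetical object

for start, end in byte_ranges(object_size, part_size):
    print(range_header(start, end))
```

Each header value could be passed as the Range of a separate GET request, and each range could equally serve as one part of a multipart upload.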
Redundancy Considerations
Durability is 99.999999999% (11 nines).
Replication can be configured to another bucket in the same region or different region.
Delete markers are not replicated by default.
Versioning must be enabled for replication.
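To make the 11 nines durability figure concrete, the arithmetic below estimates expected object loss per year. It is a rough sketch that assumes the figure can be read as an independent, per-object, per-year survival probability; the object count of 10,000,000 is just an example input.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, avoiding floating-point rounding.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the assumed model."""
    return object_count * (1 - DURABILITY)


losses = expected_annual_losses(10_000_000)
years_per_lost_object = 1 / losses

print(f"expected losses per year: {float(losses)}")
print(f"years per lost object:    {int(years_per_lost_object)}")
```

Under these assumptions, storing ten million objects corresponds to losing about one object every ten thousand years.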
S3 Classes
Class | AZ | Use Case |
S3 Standard | ≥3 AZ | Active, frequently accessed data. |
S3 Standard-Infrequent Access (Standard-IA) | ≥3 AZ | Infrequently accessed data. |
S3 One Zone-Infrequent Access (One Zone-IA) | 1 AZ | Recreatable, less frequently accessed data. |
S3 Intelligent-Tiering | ≥3 AZ | Data with changing access patterns. |
S3 Glacier | ≥3 AZ | Archive; retrieval takes minutes to hours. |
S3 Glacier Deep Archive | ≥3 AZ | Archive; retrieval within 12 hours. |
Infrequent = accessed roughly once per month.
S3 Storage Class Analysis
Monitors access patterns and classifies data as frequently or infrequently accessed.
Lifecycle Policies
Automate transitions between storage classes (and object expiration) based on policy.
Can leverage object tags for more granular policies, e.g. move all objects tagged with Project XYZ to Glacier.
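A tag-filtered lifecycle rule such as the Project XYZ example can be written down as a plain configuration document. The sketch below builds one in the shape accepted by the S3 lifecycle configuration API (for instance as the LifecycleConfiguration argument of boto3's put_bucket_lifecycle_configuration). The rule ID, the tag key "Project", and the 30-day threshold are assumptions for illustration; nothing is sent to AWS.

```python
import json


def tag_transition_rule(rule_id, tag_key, tag_value, days, storage_class="GLACIER"):
    """Build one lifecycle rule that transitions objects carrying a given tag."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Only objects carrying this exact tag are affected by the rule.
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        # Move matching objects once they are `days` days old.
        "Transitions": [{"Days": days, "StorageClass": storage_class}],
    }


# Hypothetical values: rule name, tag key, and age threshold are illustrative.
lifecycle = {
    "Rules": [
        tag_transition_rule("archive-project-xyz", "Project", "XYZ", days=30),
    ]
}

print(json.dumps(lifecycle, indent=2))
```

Keeping the rule as data makes it easy to review or version-control before applying it to a bucket.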
Security
Use resource-based policies: object ACLs, bucket ACLs, and bucket policies.
Block public access: at account or bucket level
Encrypt data by default (SSE-S3 and SSE-KMS)
Bucket permission checks
AWS Trusted Advisor
Presigned Amazon S3 URLs grant temporary, time-limited access to a specific object, for example to let a client write an object to a bucket without its own AWS credentials.
Object Lock protects objects from deletion or overwrite using Compliance mode, Governance mode, or a legal hold.
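Two of the settings above, blocking public access and default encryption, are plain configuration documents. The sketch below builds both in the shapes used by the S3 API (for example the PublicAccessBlockConfiguration and ServerSideEncryptionConfiguration arguments in boto3). The helper names and the KMS key alias are hypothetical, and no request is made.

```python
def public_access_block():
    """Configuration that turns on all four public-access protections."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def default_encryption(kms_key_id=None):
    """Default-encryption rule: SSE-S3 when no key is given, else SSE-KMS."""
    if kms_key_id is None:
        rule = {"SSEAlgorithm": "AES256"}  # SSE-S3
    else:
        rule = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}  # SSE-KMS
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": rule}]}


sse_s3 = default_encryption()
# "alias/example-key" is a placeholder, not a real key.
sse_kms = default_encryption("alias/example-key")

print(public_access_block())
print(sse_s3)
print(sse_kms)
```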
Versioning
Not enabled by default.
Objects are given a version ID when versioning is activated. Versioning keeps multiple versions of an object. Only the latest version is returned by default.
Once versioning is enabled, it cannot be disabled. It may be suspended.
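A delete without a version ID adds a delete marker as the current version of the object. A subsequent GET then returns a 404 Not Found error. Removing the delete marker restores the object, which makes rollback possible.
A GET request returns the most recent version. Alternatively, specify the version ID to retrieve a particular version.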
Specific versions of objects can be deleted. Lifecycle policies can also do this.
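The versioning rules above (new version per write, delete markers, rollback by removing a marker) can be traced with a small in-memory model. This is a toy sketch, not an AWS client: the class VersionedBucket, its method names, and the "v1", "v2" style version IDs are all hypothetical, and a missing object is signalled with KeyError rather than an HTTP 404.

```python
import itertools

_DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), newest last
        self._ids = itertools.count(1)

    def _append(self, key, data):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key, data):
        """Every write creates a new version and returns its ID."""
        return self._append(key, data)

    def delete(self, key, version_id=None):
        """Without a version ID, add a delete marker; with one, remove that version."""
        if version_id is None:
            return self._append(key, _DELETE_MARKER)
        self._versions[key] = [
            v for v in self._versions.get(key, []) if v[0] != version_id
        ]
        return version_id

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            versions = [v for v in versions if v[0] == version_id]
        if not versions or versions[-1][1] is _DELETE_MARKER:
            raise KeyError(key)  # stands in for a "not found" response
        return versions[-1][1]


bucket = VersionedBucket()
v1 = bucket.put("report.txt", b"draft")
v2 = bucket.put("report.txt", b"final")
marker = bucket.delete("report.txt")   # object now appears deleted
bucket.delete("report.txt", marker)    # remove the marker to roll back
print(bucket.get("report.txt"))        # latest version is visible again
print(bucket.get("report.txt", v1))    # older version is still retrievable
```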
Useful Links
https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html