Dissecting the S3 SLA

Why S3?

  • It is a relatively simple service (S3 literally stands for Simple Storage Service)
  • It is from one of the world’s largest cloud providers
  • It is an internal dependency of other popular AWS services such as EC2 and Lambda, so its availability may affect their SLAs
  • It has a relatively high SLA (99.9% availability) and a very good track record in the 16 years it’s been live
  • It hasn’t changed much since Jeff Bezos asked for a malloc for the Internet (malloc is the standard memory allocation function in C programs). (source)

Uptime

  • Latency: the S3 SLA makes no commitment about latency, so a GET or PUT request that takes so long it times out is not covered
  • Traffic: the S3 SLA doesn’t commit to how much simultaneous load a single S3 bucket or object in a given region can take; however, if that load leads to errors, those errors are covered (see below)
  • Errors: the percentage of good requests (e.g. requests for a valid object) that return an error. Note that instead of relying on a synthetic ping, the S3 SLA measures your actual requests, which is more realistic
  • Saturation: as a managed service with no theoretical limit on storage, this is not critical for S3
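To make the Errors signal concrete, here is a minimal Python sketch of the error rate for a single measurement interval. It is based on our reading of the SLA text, so check the current version. That reading counts only server-side failures: HTTP 500 (InternalError) and 503 (ServiceUnavailable). The function name `interval_error_rate` is hypothetical.

```python
# Assumption: our reading of the S3 SLA counts only "InternalError" (HTTP 500)
# and "ServiceUnavailable" (HTTP 503) responses as errors.
SLA_ERROR_STATUSES = {500, 503}


def interval_error_rate(statuses: list[int]) -> float:
    """Fraction of one interval's requests that failed on the server side.

    Client errors such as 404 (missing key) or 403 (access denied) still count
    toward the total, but not as errors: the service answered correctly.
    """
    if not statuses:
        return 0.0  # no requests in this interval, so nothing could fail
    errors = sum(1 for status in statuses if status in SLA_ERROR_STATUSES)
    return errors / len(statuses)


print(interval_error_rate([200, 200, 404, 503]))  # 0.25
```

The 404 for a missing object is not counted against S3. This is why the SLA talks about good requests.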

Only when you use the service
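The SLA only sees the requests you actually make. Here is a minimal sketch of the monthly calculation, again based on our reading of the SLA text. The month is split into five-minute intervals, and each interval's error rate is errors divided by requests. An interval with no requests counts as 0% errors. Monthly Uptime Percentage is 100% minus the average of those rates. The function name and the traffic numbers are hypothetical.

```python
def monthly_uptime_percentage(intervals: list[tuple[int, int]]) -> float:
    """100% minus the average per-interval error rate.

    Each item is (server_errors, total_requests) for one five-minute interval.
    Intervals with no requests count as 0% errors.
    """
    rates = [errors / total if total else 0.0 for errors, total in intervals]
    return 100.0 - 100.0 * sum(rates) / len(rates)


INTERVALS = 30 * 24 * 12  # five-minute intervals in a 30-day month
OUTAGE = 12               # a hypothetical one-hour outage

# Same outage, two hypothetical traffic patterns.
busy = [(10, 10)] * OUTAGE + [(0, 100)] * (INTERVALS - OUTAGE)  # you were sending requests
idle = [(0, 0)] * OUTAGE + [(0, 100)] * (INTERVALS - OUTAGE)    # your app was idle

print(round(monthly_uptime_percentage(busy), 3))  # 99.861
print(monthly_uptime_percentage(idle))            # 100.0
```

The two cases show the same one-hour outage. If you were sending traffic at the time, it drops you below 99.9%. If your application happened to be idle, it does not register at all.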

No rolling error budget

  • Calendar month: if the service was down for the last 43 minutes of December 31, it earns a new error budget on January 1 and can immediately be down for another 43 minutes, making 86 minutes of continuous downtime. Yet as far as AWS is concerned, both December and January are in the green.
  • Rolling month: calculate the error budget over the last 30 days. This closes the loophole above, but doesn’t map neatly onto how billing periods are set up.
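The December/January example is easy to check in code. This sketch expands each outage into down minutes, then counts them two ways: per calendar month, and over the worst 30-day rolling window. The 43-minute budget is the rounded 99.9% figure. The outage timestamps and function names are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter
from datetime import datetime, timedelta

BUDGET_MINUTES = 43  # roughly 0.1% of a month, i.e. a 99.9% SLA


def down_minutes(outages: list[tuple[datetime, timedelta]]) -> list[datetime]:
    """Expand (start, duration) outages into a sorted list of down minutes."""
    minutes = set()
    for start, duration in outages:
        for i in range(int(duration.total_seconds() // 60)):
            minutes.add(start + timedelta(minutes=i))
    return sorted(minutes)


def per_calendar_month(minutes: list[datetime]) -> Counter:
    """Down minutes per (year, month), the way a calendar-month SLA sees them."""
    return Counter((m.year, m.month) for m in minutes)


def worst_rolling_window(minutes: list[datetime], window=timedelta(days=30)) -> int:
    """Largest number of down minutes inside any 30-day window ending at a down minute."""
    worst = 0
    for i, end in enumerate(minutes):
        first = bisect_right(minutes, end - window)  # first minute inside (end - window, end]
        worst = max(worst, i + 1 - first)
    return worst


# Hypothetical outages: the last 43 minutes of Dec 31 and the first 43 of Jan 1.
outages = [
    (datetime(2022, 12, 31, 23, 17), timedelta(minutes=43)),
    (datetime(2023, 1, 1, 0, 0), timedelta(minutes=43)),
]
minutes = down_minutes(outages)
print(per_calendar_month(minutes))    # calendar view
print(worst_rolling_window(minutes))  # rolling view
```

Each calendar month stays exactly within the 43-minute budget. The worst rolling 30-day window, however, contains all 86 minutes.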

10% for <99.9%
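Here is a sketch of how an uptime figure maps to a credit percentage. Only the 10% tier for uptime below 99.9% comes from this heading. The 25% (below 99.0%) and 100% (below 95.0%) tiers are our reading of the S3 Standard SLA and should be verified. `monthly_charge` is a hypothetical input: what you paid for S3 in the affected region that month.

```python
# (upper threshold, credit %) checked from the lowest threshold up.
# Assumption: only the 10% tier comes from the article; 25% and 100% are our
# reading of the S3 Standard SLA and should be verified.
CREDIT_TIERS = [(95.0, 100), (99.0, 25), (99.9, 10)]


def credit_percentage(monthly_uptime: float) -> int:
    """Service credit percentage for a given Monthly Uptime Percentage."""
    for threshold, credit in CREDIT_TIERS:
        if monthly_uptime < threshold:
            return credit
    return 0


def credit_amount(monthly_uptime: float, monthly_charge: float) -> float:
    """Credit toward future bills; monthly_charge is a hypothetical input."""
    return monthly_charge * credit_percentage(monthly_uptime) / 100


print(credit_percentage(99.95))      # 0
print(credit_amount(99.861, 200.0))  # 20.0
```

Uptime of exactly 99.9% earns nothing, because the comparison is strictly "less than".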

You don’t get it automatically

You are responsible for the evidence
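Since the burden of proof is on you, you need your own record of failed requests per interval. This minimal sketch buckets client-side request logs into five-minute intervals. The record shape, (timestamp, HTTP status) pairs, is hypothetical. In practice the records would come from whatever logging or metrics your application already emits. Counting only 500 and 503 as errors is our reading of the SLA text.

```python
from collections import defaultdict
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=5)
SLA_ERROR_STATUSES = {500, 503}  # assumption: InternalError / ServiceUnavailable


def interval_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute interval."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // INTERVAL) * INTERVAL


def summarize(records: list[tuple[datetime, int]]) -> dict[datetime, tuple[int, int]]:
    """Map each interval start to (server errors, total requests)."""
    counts = defaultdict(lambda: [0, 0])
    for ts, status in records:
        bucket = counts[interval_start(ts)]
        bucket[1] += 1
        if status in SLA_ERROR_STATUSES:
            bucket[0] += 1
    return {start: (errors, total) for start, (errors, total) in sorted(counts.items())}


logs = [
    (datetime(2022, 3, 1, 10, 1, 12), 200),
    (datetime(2022, 3, 1, 10, 4, 59), 503),
    (datetime(2022, 3, 1, 10, 7, 30), 404),
]
print(summarize(logs))
```

The per-interval (errors, total) pairs are exactly what you would need to back up a claim for a given billing month.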

You have to apply within a deadline
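Here is a sketch of the claim deadline. It assumes the rule is "by the end of the second billing cycle after the incident" and that billing cycles are calendar months. Both assumptions are our reading of the SLA text, so check the current wording before relying on them. `claim_deadline` is a hypothetical helper.

```python
import calendar
from datetime import date


def claim_deadline(incident: date) -> date:
    """Last day of the second billing month after the incident's month.

    Assumptions: billing cycles are calendar months, and the claim must be
    received by the end of the second cycle after the incident.
    """
    months = incident.year * 12 + (incident.month - 1) + 2
    year, month_index = divmod(months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(claim_deadline(date(2022, 3, 15)))   # 2022-05-31
print(claim_deadline(date(2022, 12, 31)))  # 2023-02-28
```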

Credit is not refund

There’s a lower bound

Other services have a lower SLA (98%)

  • 99.9% allows for 43m 49s downtime per month
  • 99% allows for 7h 18m 17s downtime per month
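The figures above appear to use the average Gregorian month (365.2425 / 12 days, or 2,629,746 seconds). This small sketch reproduces them, truncating to whole seconds:

```python
SECONDS_PER_AVG_MONTH = 31_556_952 // 12  # 365.2425 days * 86,400 s / 12 = 2,629,746 s


def allowed_downtime(sla_percent: float) -> tuple[int, int, int]:
    """Allowed downtime per average month as (hours, minutes, seconds), truncated."""
    seconds = int(SECONDS_PER_AVG_MONTH * (100 - sla_percent) / 100)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


for sla in (99.9, 99.0, 98.0):
    h, m, s = allowed_downtime(sla)
    print(f"{sla}%: {h}h {m}m {s}s")
```

The same function gives 14h 36m 34s for a 98% SLA.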

No disaster recovery

Conclusion

  1. What they promise
  2. How they measure
  3. When/how credits are paid

Sr. Staff Engineer, Knowledge Worker, MSc Systems Engineering, Tech Lead, Web Developer

Alex Ewerlöf
