Dissecting the S3 SLA

Why S3?

I’m going to pick S3 because:

  • It is from one of the world’s largest cloud providers
  • It is an internal dependency of other popular AWS services such as EC2 and Lambda, so its availability may affect their SLAs
  • It has a relatively high SLA (99.9% availability) and a very good track record in the 16 years it’s been live
  • It hasn’t changed much since Jeff Bezos wanted malloc (a key memory allocation function for C programs) for the Internet. (source)

Uptime

There are many performance metrics for even a simple service like S3, but Amazon has decided to guarantee only uptime. It is admittedly the most important golden signal. Here’s the full list:

  • Traffic: the S3 SLA doesn’t commit to how much simultaneous load a single S3 bucket or object in a specific region can take. However, if that load leads to errors, it’s covered (see below)
  • Errors: the percentage of good requests (e.g. to a valid object) that return an error. Note that instead of relying on a synthetic ping, the S3 SLA is measured against your own requests, which is more realistic
  • Saturation: as a managed service with no theoretical limit on storage, this is not critical for S3
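
To make the error-based measurement concrete, here is a minimal sketch of how an uptime percentage could be derived from your own request logs. It assumes the error rate is measured per fixed interval and then averaged over the billing month, and that an interval with no requests counts as error-free. The function name and the input shape are hypothetical; the SLA text itself defines the interval length and which errors count.

```python
from typing import Iterable, Tuple


def monthly_uptime_percentage(intervals: Iterable[Tuple[int, int]]) -> float:
    """Return the uptime percentage for one billing month.

    `intervals` holds one (error_count, request_count) pair per
    measurement interval. Both the shape and the name are illustrative.
    """
    rates = []
    for errors, requests in intervals:
        # Assumption: an interval without requests has a 0% error rate.
        rates.append(errors / requests if requests else 0.0)
    if not rates:
        # No intervals at all: nothing failed, so report full uptime.
        return 100.0
    average_error_rate = sum(rates) / len(rates)
    return 100.0 * (1.0 - average_error_rate)


if __name__ == "__main__":
    sample = [(0, 100), (1, 100), (0, 0)]
    print(f"{monthly_uptime_percentage(sample):.3f}%")
```

Averaging per-interval rates means a quiet interval weighs as much as a busy one, so the result can differ from a plain errors-over-requests ratio for the whole month.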

Only when you use the service

No rolling error budget

  • Rolling month: calculate the error budget for the last 30 days. This prevents the caveat above but doesn’t exactly map to how the billing periods are set up.
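
A rolling error budget of this kind can be sketched as follows. This is an illustration, not how AWS measures anything: the function name, the daily (errors, requests) input, and the 99.9% target used as the default are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def rolling_error_budget(
    daily: Iterable[Tuple[int, int]],
    slo: float = 0.999,
    window_days: int = 30,
) -> Iterator[float]:
    """Yield the fraction of error budget left after each day.

    The budget is computed over the trailing `window_days` days.
    1.0 means nothing has been spent; a negative value means the
    budget for the window is exhausted.
    """
    # deque(maxlen=...) drops the oldest day automatically.
    window: Deque[Tuple[int, int]] = deque(maxlen=window_days)
    for errors, requests in daily:
        window.append((errors, requests))
        total_errors = sum(e for e, _ in window)
        total_requests = sum(r for _, r in window)
        allowed_errors = (1.0 - slo) * total_requests
        if allowed_errors == 0:
            # No traffic in the window: nothing could have been spent.
            yield 1.0
        else:
            yield 1.0 - total_errors / allowed_errors


if __name__ == "__main__":
    days = [(0, 1000), (1, 1000), (3, 1000)]
    for day, left in enumerate(rolling_error_budget(days), start=1):
        print(f"day {day}: {left:.0%} of the budget left")
```

Because the window slides one day at a time, a bad day keeps counting against the budget for the next 30 days regardless of where the billing cycle starts.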

10% for <99.9%

The SLA and the credit are clearly laid out in a table:

You don’t get it automatically

You are responsible for the evidence

When contacting support to claim the credit, you need evidence that the SLA was breached:

You have to apply within a deadline

Credit is not refund

Instead of getting a refund, you get credit, which basically acts as a discount coupon towards your next billing cycle. You do, however, pay AWS actual money. By giving you credit instead of actual money, AWS ensures that you stay a customer even if the service level didn’t match expectations.

There’s a lower bound

Other services have a lower SLA (98%)

Not everything that goes under the S3 brand enjoys the same SLA:

  • 99% allows for 7h 18m 17s downtime per month
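
The downtime figure above can be reproduced with a few lines of arithmetic. This sketch assumes an average month of 365.2425 / 12 days (about 30.44 days); the helper names are hypothetical, and a different month length gives a slightly different number.

```python
from datetime import timedelta

# Assumption: an "average" month, i.e. one twelfth of a Gregorian year.
AVERAGE_MONTH = timedelta(days=365.2425 / 12)


def allowed_downtime(availability_percent: float,
                     period: timedelta = AVERAGE_MONTH) -> timedelta:
    """Downtime that still satisfies the given availability over `period`."""
    return period * (1.0 - availability_percent / 100.0)


def format_duration(duration: timedelta) -> str:
    """Render a duration as 'Hh Mm Ss', dropping fractional seconds."""
    total_seconds = int(duration.total_seconds())
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes}m {seconds}s"


if __name__ == "__main__":
    for target in (99.9, 99.0):
        print(f"{target}% -> {format_duration(allowed_downtime(target))}")
    # 99.0% prints 7h 18m 17s, matching the figure above.
```

The same calculation for 99.9% gives roughly 43 minutes per month, which shows how much looser a 99% commitment is.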

No disaster recovery

They don’t cover any issues:

Conclusion

AWS is one of the oldest and most popular cloud providers on the internet, and S3 is one of its first services. Its SLA is a good example of what to look for when evaluating the SLA of a vendor. There are 3 main concerns:

  1. How do they measure
  2. When/how credits are paid
