Calculating composite SLA

Tip

A single service

Serial dependency

  • 99.5% availability for System A
  • 99.6% availability for System B
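
When System A calls System B in series, both must be up for a request to succeed. Assuming the two fail independently, the composite availability is the product of the individual figures. A minimal Python sketch of that arithmetic (the function name `serial_availability` is illustrative, not taken from any library):

```python
from math import prod


def serial_availability(*availabilities: float) -> float:
    """Composite availability of components that must ALL be up.

    Assumes independent failures; inputs are fractions (0.995 == 99.5%).
    """
    return prod(availabilities)


composite = serial_availability(0.995, 0.996)
print(f"{composite:.3%}")  # about 99.102%
```

The composite figure is lower than either input: every serial dependency can only reduce availability.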

A bit more realistic

  • Application: unhandled exceptions, time sync issues, security issues, etc.
  • Runtime: cloud provider hiccups, lack of space/memory, etc.

Multiple dependencies

  • 99.5% availability for System A
  • 99.6% availability for System B (auth)
  • 99.1% availability for System C (database)
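
The same multiplication extends to any number of serial dependencies. The sketch below assumes System A needs both auth and the database on every request and that failures are independent. It also converts the result into expected downtime per year. The helper names and the 8760-hour year are assumptions made for illustration:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def composite_serial(availabilities):
    """Multiply the availabilities of components that are all required."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def yearly_downtime_hours(availability):
    """Expected hours of unavailability per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


dependencies = {
    "System A": 0.995,
    "System B (auth)": 0.996,
    "System C (database)": 0.991,
}

composite = composite_serial(dependencies.values())
print(f"{composite:.2%}")  # about 98.21%
print(f"{yearly_downtime_hours(composite):.1f} h/year")  # about 156.8 h/year
```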

Parallel failover

A fault-tolerant system experiences no service interruption but costs significantly more, while a highly available system accepts a minimal service interruption.

  • 99.5% availability for replica 1
  • 99.6% availability for replica 2
  • 0.5% unavailability for replica 1
  • 0.4% unavailability for replica 2
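
With parallel replicas, the service is down only when every replica is down at the same time. Assuming independent failures and an instantaneous, perfectly reliable failover (both simplifications), the composite availability is one minus the product of the unavailabilities. A short Python sketch, with an illustrative function name:

```python
from math import prod


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant replicas where ANY one suffices.

    Assumes independent failures and a failover mechanism that never fails.
    """
    all_down = prod(1.0 - a for a in availabilities)
    return 1.0 - all_down


composite = parallel_availability(0.995, 0.996)
print(f"{composite:.3%}")  # about 99.998%
```

Here 0.5% × 0.4% = 0.002% combined unavailability, so two replicas, each with fewer than three nines, together exceed four nines.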

Fallback

  • Failover: Perform the activity against identical copies of the system (either switch to another copy when one fails, or send the request to all copies and return the quickest response).
  • Fallback: Use a different mechanism to achieve the same result.
  • 99.5% availability for System A
  • 99.1% availability for System B (database)
  • 99.8% availability for System C (queue)
  • 0.9% unavailability for System B (database)
  • 0.2% unavailability for System C (queue)
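
A fallback combines the serial and parallel rules. The sketch below assumes System A writes to the database (System B) and falls back to the queue (System C) only when the database is unavailable, with independent failures throughout. The storage step then fails only if both B and C are down, and A itself remains a serial dependency. Function and variable names are illustrative:

```python
def with_fallback(primary: float, fallback: float) -> float:
    """Availability of a step that succeeds if the primary OR the fallback is up."""
    return 1.0 - (1.0 - primary) * (1.0 - fallback)


system_a = 0.995   # application
database = 0.991   # System B, primary mechanism
queue = 0.998      # System C, fallback mechanism

storage = with_fallback(database, queue)  # 1 - 0.009 * 0.002
composite = system_a * storage            # A is still required in series

print(f"{composite:.2%}")            # about 99.50%
print(f"{system_a * database:.2%}")  # about 98.60% without the fallback
```

With the fallback in place, the composite is limited almost entirely by System A itself.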

Conclusion

  • Reliability is not free
  • SLA is tied to system architecture
