What makes a good error message?
Typically, the bulk of software development time goes into solving the problems that make the software work in the happy scenario. Many developers go the extra mile and use techniques like static code analysis or automated tests (unit tests, integration tests, smoke tests, etc.) to detect potential errors. But in reality there are too many variables that can break the software: users, dependencies, hardware and network failures, to name a few.
No one enjoys receiving error messages, but software that fails silently or proceeds with the wrong assumptions is even worse.
Expected errors are usually mitigated by the program itself and reported as warnings. When mitigation is not possible, the best option is to show the error to the user for manual intervention.
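As a minimal sketch of that distinction, the Python snippet below recovers from a missing optional setting with a warning, and stops with an error when a required setting is missing. The function name `read_settings`, the setting names and the default value are hypothetical, chosen only for illustration.

```python
import warnings

# Hypothetical default used when the optional setting is absent.
DEFAULT_TIMEOUT_SECONDS = 30


def read_settings(raw: dict) -> dict:
    """Return usable settings, warning on recoverable gaps and failing on the rest."""
    settings = dict(raw)

    # Expected and recoverable: mitigate it and tell the user what was assumed.
    if "timeout" not in settings:
        warnings.warn(
            f"No 'timeout' configured; falling back to {DEFAULT_TIMEOUT_SECONDS} seconds."
        )
        settings["timeout"] = DEFAULT_TIMEOUT_SECONDS

    # Not recoverable: the program cannot guess this value, so ask the user to step in.
    if "api_url" not in settings:
        raise ValueError(
            "Cannot connect: the required setting 'api_url' is missing. "
            "Add it to the configuration and try again."
        )

    return settings
```

The warning states the assumption the program made, so the user can still correct it, while the error is reserved for the case that needs manual intervention.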
Errors are encountered by real users with real goals. The error message is a means of communication that helps the user continue toward their goal.
The user in this context can be the end user or the developer trying to test, fix, maintain or add features to the software.
Although the majority of errors happen while developing the software, many find their way to production and some surface to the users. These “runtime” errors are inevitable. We’ve all seen them.
3 questions for every error
A good error message is actually very helpful: it can prevent disastrous and expensive consequences, or at least shorten the time required to fix the problem.
First of all, treat errors as a communication medium that leads the user to a solution. Good leadership starts with why, as beautifully demonstrated by Simon Sinek in his TEDx talk.
A good error message should indicate:
- WHY the program could not proceed?
Context about the goal of the process being executed and why the situation is an error. If the error is due to user input or data from other systems, be sure to include that input in the error message. Sometimes the user doesn’t provide the input in the right format, or the system picks or parses the wrong input. State the assumptions the program made when it raised the error.
- HOW the error was detected?
A map for finding the root cause and inspecting further. A stack trace, an error code or even a state dump is very useful.
- WHAT can be done to resolve it?
Suggestions for recovering or taking an alternative path to the goal. If this is a typical error, mention one or more alternatives that usually help solve the issue. If you have support channels, make sure to include them as well.
- WHERE did the error happen?
This provides useful context information for the developers trying to trace and/or reproduce the error.
- WHEN did the error happen?
Usually a form of timestamp. This information can help correlate the error to failures in other systems while inspecting the root cause.
- WHO was trying to achieve the goal?
Usually some form of identification like a user id, a session id or even a user name. This data might be subject to the GDPR; see my other articles introducing GDPR and pseudonymization techniques.
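To make the list above concrete, here is a minimal Python sketch of an exception that carries each of those pieces of context. The class `DescriptiveError`, the function `parse_age` and all field names are hypothetical, invented for this illustration rather than taken from any library, and the user id is assumed to be pseudonymized already.

```python
from datetime import datetime, timezone


class DescriptiveError(Exception):
    """Hypothetical exception that answers why, how, what, where, when and who."""

    def __init__(self, why, how, what, where, who, when=None):
        self.why = why      # why the program could not proceed
        self.how = how      # how the error was detected
        self.what = what    # what can be done to resolve it
        self.where = where  # where in the software it happened
        self.who = who      # who was trying to reach the goal (assumed pseudonymized)
        self.when = when or datetime.now(timezone.utc)
        super().__init__(self.render())

    def render(self) -> str:
        """Format the context as a multi-line message for users and logs."""
        return "\n".join(
            [
                f"WHY:   {self.why}",
                f"HOW:   {self.how}",
                f"WHAT:  {self.what}",
                f"WHERE: {self.where}",
                f"WHEN:  {self.when.isoformat()}",
                f"WHO:   {self.who}",
            ]
        )


def parse_age(raw: str, user_id: str) -> int:
    """Parse an age typed by the user, raising a descriptive error on bad input."""
    try:
        return int(raw)
    except ValueError as exc:
        # Chain the original exception so the stack trace stays available.
        raise DescriptiveError(
            why=f"Cannot save the profile: age {raw!r} is not a whole number.",
            how=f"int() rejected the input ({exc}).",
            what="Enter the age using digits only, for example 42.",
            where="parse_age",
            who=user_id,
        ) from exc
```

Calling `parse_age("forty", "user-1234")` raises a `DescriptiveError` whose message echoes the offending input and suggests a fix, while `raise ... from exc` keeps the original stack trace for the developer who has to trace the problem.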