GDPR pseudonymization techniques
This article builds on GDPR in Plain English.
Your personal data is any information that can be used to directly or indirectly identify you. For example: your name, home address, photo, email address, bank details, posts on social networking websites, medical information, or a computer or mobile IP address.
An IP address is considered personal data, just like an email address, phone number or social security number.
Before we start, a quick disclaimer: I don’t represent my current/previous employers on my personal blog. The information provided here is purely based on my own research. I am lucky to work at an international European company that takes GDPR seriously, and I had the benefit of being able to ask a privacy lawyer questions, but this post doesn’t necessarily reflect my company’s policies, strategy or implementation of GDPR.
According to GDPR, personal data should be pseudonymized in such a way that it can no longer be linked (or “attributed”) to a single data subject (user) without the use of additional data.
This additional data should be kept separate from the pseudonymized data and subject to technical and organisational measures to make it hard to link a piece of data to someone’s identity (non-attribution).
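To make the idea of “additional data kept separately” concrete, here is a minimal sketch that uses a keyed hash (HMAC-SHA256) to turn an email address into a stable pseudonym. This is only one possible approach and not something the GDPR prescribes. The function name `pseudonymize`, the `SECRET_KEY` value and the sample record are all made up for illustration. The secret key plays the role of the additional data: without it, the pseudonym cannot be recomputed from a known email address.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for `value` using HMAC-SHA256."""
    # Normalize first so "Alice@Example.com " and "alice@example.com" match
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key: in practice it would live in a separate,
# access-controlled store, not next to the pseudonymized data
SECRET_KEY = b"example-key-kept-outside-the-analytics-database"

# The stored record carries the pseudonym instead of the email address
record = {"user": pseudonymize("alice@example.com", SECRET_KEY), "page_views": 42}
```

Anyone who holds the key can recompute the pseudonym for a known email address and link the record back to a person, which is why the data counts as pseudonymized and not anonymized.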
Anonymization vs pseudonymization
Pseudonymous data can still be re-identified, that is, linked (attributed) to an individual again.
Anonymous data, by contrast, cannot be re-identified.
There are many ways to pseudonymize data, and the right choice depends on the privacy impact assessment.
- Scrambling techniques involve mixing or obfuscating letters. The process can sometimes be reversible. For example: “Ewerlöf” could become “Ölfeewr”
- Encryption renders the original data unintelligible, and the process cannot be reversed without access to the correct decryption key. The GDPR requires the additional information (such as the decryption key) to be kept separately from the pseudonymized data.
- A masking technique hides an important/unique part of the data with random characters or other data. For example: the credit card number “5500 0000 0000 0004” can be stored as “XXXX XXXX XXXX 0004”. The advantage of masking is that a record can still be recognized (here by its last four digits) without exposing the full value.
- Tokenization is a non-mathematical approach to protecting data at rest that replaces sensitive data with non-sensitive substitutes, referred to as tokens. The tokens have no extrinsic or exploitable meaning or value. Tokenization does not alter the type or length of data, which means it can be processed by legacy systems such as databases that may be sensitive to data length and type. That is achieved by keeping specific data fully or partially visible for processing and analytics while sensitive information is kept hidden.
- Data blurring uses an approximation of data values to obscure their exact meaning and/or make it impossible to identify individuals. A typical example is a blurred face in an image (though it’s not a very good idea).
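Three of the techniques in the list above (masking, scrambling and tokenization) are easy to sketch in a few lines of Python. The names `mask_card_number`, `scramble` and `TokenVault` are invented for this example, and the in-memory vault is a toy: a real token vault would be a separate, protected service.

```python
import random
import secrets


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Replace every digit except the last `visible` ones with 'X'."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            digits_seen += 1
            masked.append(ch if digits_seen > total_digits - visible else "X")
        else:
            masked.append(ch)  # keep spaces and dashes so the layout survives
    return "".join(masked)


def scramble(text: str, seed: int) -> str:
    """Shuffle the letters of `text`; the same seed gives the same order."""
    chars = list(text)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


class TokenVault:
    """Toy token vault: swaps digits for random digits and remembers the mapping."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        if not any(ch.isdigit() for ch in value):
            raise ValueError("this toy vault only tokenizes values containing digits")
        while True:
            # Same length and same layout, so legacy columns still accept it
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch for ch in value
            )
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


print(mask_card_number("5500 0000 0000 0004"))  # XXXX XXXX XXXX 0004
```

The token has no mathematical relation to the original value. The only way back is the lookup inside the vault, so the vault itself becomes the “additional data” that has to be kept separately.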
Hashing vs Encryption
As opposed to hashing, encryption is a two-way transformation, meaning that with a key the data can be decrypted and transformed back to its original form. A super simple encryption algorithm could, for example, add a certain number to the ASCII value of each character in a string.
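Here is a minimal sketch of such a shift cipher, written for this rewrite and not necessarily the exact code the post originally showed. The function names `shift_encrypt` and `shift_decrypt` are made up, the key is just an integer, and the input is assumed to be plain ASCII. It is trivially breakable and only meant to show that encryption can be undone with the key.

```python
def shift_encrypt(plaintext: str, key: int) -> str:
    """Add `key` to the ASCII code of every character (wrapping at 128)."""
    if not plaintext.isascii():
        raise ValueError("this toy cipher only handles ASCII text")
    return "".join(chr((ord(ch) + key) % 128) for ch in plaintext)


def shift_decrypt(ciphertext: str, key: int) -> str:
    """Reverse shift_encrypt by subtracting the same `key`."""
    if not ciphertext.isascii():
        raise ValueError("this toy cipher only handles ASCII text")
    return "".join(chr((ord(ch) - key) % 128) for ch in ciphertext)


secret = shift_encrypt("GDPR", 1)
print(secret)                    # HEQS
print(shift_decrypt(secret, 1))  # GDPR
```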
Encryption is a reversible action and its reverse is called decryption.
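If the same key is used to encrypt and decrypt the data, it is called “symmetric encryption”. It is not unsafe in itself, but the shared key has to be distributed and stored securely, because anyone who holds it can decrypt the data. If a pair of different keys is used (a public key to encrypt and a private key to decrypt), it is called “asymmetric encryption”.
In an Event Sourcing database, instead of CRUD operations you have CRAB (create, read, add, burn). Essentially you don’t delete or modify anything (similar to blockchain).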
GDPR compliance for event sourcing databases is straightforward using crypto-shredding (sometimes called crypto-trashing): the personal data of each data subject is encrypted with its own key before it is stored. When we throw away that key, we effectively make it impossible to go back to the original data. Therefore the data is anonymized and falls outside the GDPR.
There is no feasible way to get back to the original data; it is now just random data.
The main problem with implementing GDPR compliance for event sourcing arises when it is an afterthought. It can get really ugly if it is not foreseen from the start.
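The idea can be sketched with a tiny append-only event store that keeps one key per user. Everything here is an assumption made for illustration: the `EventStore` class, its method names, and especially the `_keystream_xor` helper, which is a toy stream cipher built from SHA-256 so the example runs with the standard library only. A real system would use a vetted cipher such as AES-GCM and keep the keys in a separate key store.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR data with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class EventStore:
    def __init__(self):
        self.events = []       # append-only log of (user_id, nonce, ciphertext)
        self.keys = {}         # per-user keys; would live in a separate key store
        self.shredded = set()  # users whose key has been thrown away

    def append(self, user_id: str, payload: str) -> None:
        if user_id in self.shredded:
            raise ValueError("this user has been forgotten")
        key = self.keys.setdefault(user_id, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        ciphertext = _keystream_xor(key, nonce, payload.encode("utf-8"))
        self.events.append((user_id, nonce, ciphertext))

    def read(self, user_id: str) -> list:
        key = self.keys.get(user_id)
        if key is None:
            return []  # no key, so the ciphertext is just random bytes
        return [
            _keystream_xor(key, nonce, ciphertext).decode("utf-8")
            for uid, nonce, ciphertext in self.events
            if uid == user_id
        ]

    def forget(self, user_id: str) -> None:
        """Crypto-shredding: delete the key, leave the event log untouched."""
        self.keys.pop(user_id, None)
        self.shredded.add(user_id)
```

Calling `forget` never modifies or deletes an event, so the log stays append-only. Note that the `user_id` stored next to each event would itself need to be an opaque identifier and not, say, an email address.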
If you want to learn more about GDPR implications for Blockchain technology, I got you covered.