
November 10, 2020

Differences between Anonymization and Pseudonymization

Anonymization and pseudonymization both fall under the umbrella of data protection methods, and the two terms are often used interchangeably. With the introduction of laws such as the GDPR, however, it has become necessary to distinguish clearly between the two techniques, because anonymized data and pseudonymized data fall under different categories of the regulation. Knowing the difference also helps organizations make an informed choice when selecting data protection methods.

So, let’s break it down. Anonymization is the permanent replacement of sensitive data with unrelated characters, which means that data, once anonymized, cannot be re-identified. This is where the two methods differ: in pseudonymization, the sensitive data is replaced in such a way that it can be re-identified with the help of an identifier (additional information). In short, while anonymization eliminates direct re-identification risk, pseudonymization substitutes the identifiable data with a reversible, consistent value.
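The contrast can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any product's implementation: the `Pseudonymizer` class, the `anonymize` function, and the `ID-` token format are hypothetical names chosen for this example. The lookup table held by the pseudonymizer plays the role of the "additional information" that makes re-identification possible.

```python
import secrets


class Pseudonymizer:
    """Hypothetical helper: swaps values for consistent tokens and keeps a lookup table."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value (the "additional information")

    def pseudonymize(self, value: str) -> str:
        # The same input always yields the same token (consistent).
        if value not in self._forward:
            token = "ID-" + secrets.token_hex(4)
            while token in self._reverse:  # regenerate on the rare collision
                token = "ID-" + secrets.token_hex(4)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token: str) -> str:
        # Only possible for whoever holds the lookup table (reversible).
        return self._reverse[token]


def anonymize(value: str) -> str:
    """Replace the value with unrelated random characters; no mapping is stored."""
    return secrets.token_hex(4)


p = Pseudonymizer()
token = p.pseudonymize("Alex")
print(token, p.reidentify(token))  # the token maps back to "Alex"
print(anonymize("Alex"))           # random characters, nothing to map back
```

Because `anonymize` keeps no mapping, the original value cannot be recovered from its output, whereas anyone with access to the pseudonymizer's table can reverse a token.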

However, it is essential to note that anonymization may sometimes carry the risk of indirect re-identification. For example, let’s say you picked up a copy of the short story The Open Window. The author’s name on the cover is Saki, but this is a pen name. If you were to pick up another book of his, The Chronicles of Clovis, you would notice that he used his real name there, H. H. Munro, and that the writing style was similar. Hence, even though nothing told you that the two names belong to the same person, you could put two and two together from the style of writing and conclude that this book is also by Saki.

The same applies to a shopping experience: you may not know the name of the customer who made a purchase, but you may be able to work out who it is if the customer’s buying behavior is consistent. Say that every day for the past year, Alex has visited the Starbucks at 1500 Broadway at 10:10 am and ordered the same Tall Mocha Frappuccino. Even if his personally identifiable information, such as name and address, has been anonymized or eliminated, his buying behavior still allows you to re-identify him. Organizations should therefore be meticulous when they anonymize sensitive data, taking care to hide any additional information that might aid re-identification.
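One way to look for this kind of risk is to count how many records share each combination of the remaining attributes. The sketch below is a simple k-anonymity-style check, a technique the article itself does not name; the function `risky_combinations`, the field names, and the sample records are all made up for illustration.

```python
from collections import Counter


def risky_combinations(records, quasi_identifiers, k=2):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Customer profiles with names and addresses already removed (hypothetical data).
records = [
    {"store": "1500 Broadway", "time": "10:10", "order": "Tall Mocha Frappuccino"},
    {"store": "1500 Broadway", "time": "08:30", "order": "Latte"},
    {"store": "1500 Broadway", "time": "08:30", "order": "Latte"},
]

risky = risky_combinations(records, ["store", "time", "order"])
print(risky)
```

Here the Tall Mocha Frappuccino profile is the only one of its kind, so it is flagged as risky, while the two identical latte profiles hide each other. A flagged combination would need to be generalized or suppressed before the data could reasonably be treated as anonymized.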

A variety of methods are available to anonymize data, such as directory replacement (modifying the individual’s name while maintaining consistency between values), scrambling (obfuscation; the process can sometimes be reversible), masking (hiding a part of the data with random characters; for example, pseudonymization with identities), personalized anonymization (custom anonymization) and blurring (making the meaning of data values obsolete or the re-identification of data values impossible). Pseudonymization methods include data encryption (changing the original data into a ciphertext, which can be reversed with a decryption key) and data masking (masking data while maintaining its usability for different functions). Organizations can select one or more techniques depending on the degree of risk and the intended use of the data.
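Two of these methods, directory replacement and masking, can be sketched with the Python standard library alone. This is an illustrative sketch under assumptions, not how any particular product works: the names `directory_replace`, `mask`, `REPLACEMENT_NAMES`, and the demo key are hypothetical, the replacement is chosen with a keyed hash (HMAC) so that it stays consistent, and the mask uses a fixed character rather than random ones for readability.

```python
import hashlib
import hmac

# Hypothetical directory of substitute names.
REPLACEMENT_NAMES = ["Jordan Lee", "Sam Rivera", "Casey Morgan", "Taylor Brooks"]


def directory_replace(name: str, secret_key: bytes) -> str:
    """Pick a substitute name; the same input and key always give the same result."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]


def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


key = b"demo-key-change-me"  # assumption: a real key would come from a secrets store
print(directory_replace("Alex Smith", key))
print(mask("4111111111111111"))
```

With a directory this small, different people will often receive the same substitute name; a realistic directory would need to be far larger if distinct individuals must remain distinguishable.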

Mage approaches anonymization and pseudonymization with its leading-edge solutions, named Customers’ Choice 2020 by Gartner Peer Insights. To read more, visit iScramble™ for Static Data Masking and iMask™ for Dynamic Data Masking.