
January 6, 2021

Difference between Encryption, Tokenization, and Masking

We’re all aware that adopting the right technology is necessary for keeping our data safe, and today we have a plethora of techniques to help us protect sensitive information. However, each of these techniques can seem just as tempting as the others, which is where most of us hit a snag in the decision-making process: “Which technique should I select that best suits the company’s needs and brings the most benefit, without risking security?”

Now that’s definitely not a simple question. And if you’re asking yourself the same thing, you know that you can’t find a solution offhand; you’re going to need all the information you can get. Making the right choice requires knowledge of the various features of each technique, which is probably why you’re here.

So, without further ado, let’s discuss the three most popular data security techniques – encryption, tokenization and masking. Let’s break down each technique in detail, from how it functions, to its use cases, down to its risks if any, so you can make the right decision without hesitation.

What is Encryption?

Encryption works by encoding the original plaintext data into unreadable ciphertext through the use of sophisticated algorithms. A decryption key is needed to revert the ciphertext to a readable format. To enable higher performance, an encrypted mapping table can also be used to decrypt the data. Format-Preserving Encryption (FPE) is also possible, but it is generally considered less secure than tokenization and masking.
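To make the plaintext, ciphertext, and key relationship concrete, here is a minimal sketch using only the Python standard library. It assumes a one-time pad (XOR with a random key as long as the message), chosen purely because it needs no third-party packages; the function names are hypothetical, and a real deployment would use a vetted algorithm such as AES from an established cryptography library instead.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return a random key as long as the message (a one-time pad requirement)."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def encrypt(plaintext: str, key: bytes) -> bytes:
    """Encode readable text into unreadable ciphertext."""
    return xor_bytes(plaintext.encode("utf-8"), key)


def decrypt(ciphertext: bytes, key: bytes) -> str:
    """Revert ciphertext to readable text; only works with the same key."""
    return xor_bytes(ciphertext, key).decode("utf-8")


# Illustrative value only, not a real card number.
message = "4111 1111 1111 1111"
key = generate_key(len(message.encode("utf-8")))
ciphertext = encrypt(message, key)
recovered = decrypt(ciphertext, key)
```

Anyone holding `key` can recover the original value, which is what makes encrypted data easy to share with a trusted third party. A one-time pad key must never be reused, which is one reason production systems rely on standard ciphers with short, reusable keys.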

Encryption is best suited for unstructured fields (though it also supports structured ones), for databases that aren’t spread across multiple systems, and for protecting files. It is widely used, though not exclusively, to protect sensitive data in transit. For this reason, encryption is suitable for exchanging data with third parties, who can access the original data with a decryption key if needed. It is also used to protect sensitive data such as payment card industry (PCI) data, personally identifiable information (PII), financial account numbers, and more. Encryption also scales well, since only a small encryption key has to be managed regardless of how much data is protected.

What is Tokenization?

Tokenization is similar to encryption. The main difference is that a randomly generated alphanumeric value, called a token, replaces the original value, whereas in encryption an algorithm is applied to the plaintext to create ciphertext. In tokenization, the token server stores the relationship between each original value and its token. When a user application needs the original data, the tokenization system looks up the token in the token database to retrieve it. This technique always preserves the format of the data while maintaining high security.
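The lookup flow described above can be sketched as a small in-memory token vault. This is an illustration under stated assumptions, not a production design: the `TokenVault` class and its method names are hypothetical, the "token database" is a pair of Python dictionaries, and format preservation is approximated by swapping each digit or letter for a random one of the same kind.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory stand-in for a token server and its database."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    @staticmethod
    def _random_like(ch: str) -> str:
        # Replace a character with a random one of the same class so the
        # token keeps the original format; separators pass through unchanged.
        if ch in string.digits:
            return secrets.choice(string.digits)
        if ch in string.ascii_uppercase:
            return secrets.choice(string.ascii_uppercase)
        if ch in string.ascii_lowercase:
            return secrets.choice(string.ascii_lowercase)
        return ch

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint and store a new one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        for _ in range(100):  # bounded retries in case of collisions
            token = "".join(self._random_like(ch) for ch in value)
            if token not in self._token_to_value:
                self._token_to_value[token] = value
                self._value_to_token[value] = token
                return token
        raise RuntimeError("could not generate a unique token")

    def detokenize(self, token: str) -> str:
        """Look the token up in the vault; raises KeyError if it is unknown."""
        return self._token_to_value[token]


vault = TokenVault()
card = "4111-1111-1111-1111"  # illustrative value only
token = vault.tokenize(card)
original = vault.detokenize(token)
```

Because the token is random, it has no mathematical relationship to the original value; recovering the original requires access to the vault itself. The two dictionaries also grow with every new value, which mirrors the token database growth that affects scaling.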

Tokenization is best suited for structured data fields (though it also supports unstructured ones), and hence is mostly used, though not exclusively, to protect sensitive data in payment processing systems, such as credit card information or social security numbers. Exchanging data is harder here than with encryption, because the recipient needs access to the token database to recover the original values. Unlike encryption, tokenization does not scale well, because the token database grows with the volume of data. On the other hand, it needs less computing power to process.

What is Masking?

Masking approaches range from simple to complex, depending on the organization’s use case. A simple method is to replace the real data with null or constant values. A slightly more sophisticated approach is to mask the data in a way that retains the characteristics of the original data and so preserves its analytical value. This approach allows masked data to be used efficiently for analysis without the fear of leaking private information.
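The simple and slightly more sophisticated approaches above might look like the following sketch. The function names and the choice to keep the last four characters are assumptions made for illustration; real masking tools offer many more strategies.

```python
from typing import Optional


def mask_null(value: str) -> Optional[str]:
    """Simplest approach: drop the value entirely."""
    return None


def mask_constant(value: str, constant: str = "REDACTED") -> str:
    """Replace every value with the same constant."""
    return constant


def mask_preserve_format(value: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Mask letters and digits but keep separators and the last few characters,
    so the result still has the shape of the original value."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


# Illustrative values only.
masked_card = mask_preserve_format("4111-1111-1111-1234")
masked_email = mask_preserve_format("alice@example.com", keep_last=0)
```

Here `masked_card` is `XXXX-XXXX-XXXX-1234` and `masked_email` is `XXXXX@XXXXXXX.XXX`. Unlike the null and constant approaches, the format-preserving variant keeps the length and separators, so downstream systems that validate the shape of a field continue to work.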

Broadly speaking, there are two types of masking methods. Static Data Masking (SDM) permanently scrambles the data, so the original cannot be retrieved once it is masked. Masking can also be used to control access to sensitive data based on who the user is. This method, known as Dynamic Data Masking (DDM), allows only authorized users to view the original data, while unauthorized users are shown the masked data. Masking is used to secure structured and unstructured fields, in both non-production and production environments, to support testing or quality assurance requirements and user-based access without the risk of disclosing sensitive data. Masking typically preserves the format, but it carries some risk of reidentification.
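As a rough illustration of the dynamic case, the sketch below leaves the stored record untouched and decides at read time what each user sees. The role names, field names, and helper functions are all hypothetical; in practice DDM is usually enforced by the database or a proxy layer rather than in application code.

```python
# Hypothetical policy: which roles may see originals, which fields are sensitive.
AUTHORIZED_ROLES = {"auditor", "billing_admin"}
SENSITIVE_FIELDS = {"ssn", "card_number"}


def mask_value(value: str, keep_last: int = 4, mask_char: str = "X") -> str:
    """Mask letters and digits except the last few, keeping separators."""
    to_mask = max(sum(ch.isalnum() for ch in value) - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def view_record(record: dict, role: str) -> dict:
    """Return a copy of the record as the given role is allowed to see it."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {
        field: mask_value(value) if field in SENSITIVE_FIELDS else value
        for field, value in record.items()
    }


# Illustrative record only.
stored = {"name": "Alice", "ssn": "123-45-6789"}
auditor_view = view_record(stored, "auditor")
intern_view = view_record(stored, "intern")
```

The auditor sees `123-45-6789`, the intern sees `XXX-XX-6789`, and `stored` itself is never modified. With static masking, by contrast, the masked value would be written back and the original discarded.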

For detailed information on how to anonymize data effectively, ensuring data functionality and security while eliminating reidentification risk, you can read the following two-part article on our blog (which is also posted on Forbes Tech Council): Reidentification Risk of Masked Datasets

Generally, data once masked cannot be unmasked, even though reversible masking methods exist. As a result, it is very easy to exchange masked data with third parties, since they cannot view the original data (unless you want them to, as in DDM). Unlike encryption and tokenization, which are better suited to unstructured and structured fields respectively, masking places more importance on the structure of the data itself than on the actual values. It also scales better than tokenization, since there’s nothing additional to store.

Which solution should you opt for to achieve comprehensive data security?

Clearly, we can see that each technique has its own benefits. But if I were forced to favour one, just a little bit more than the others, I would say masking any day. Let me explain why.

While encryption and tokenization are used to secure data at rest and data in motion, masking is especially beneficial for data in use. When data is continually used for business purposes such as testing and development, encryption or tokenization becomes a complicated process. This is because a key or a token lookup is needed each time the real data must be retrieved, and every retrieval has to be handled carefully to avoid disclosing sensitive information. Masking addresses this issue because the masked data retains the characteristics of the original data: it resembles the original but is still fictitious. Hence, it is functional for business use cases without compromising sensitive data. Moreover, masking is a broader technique and may even encompass tokenization and encryption, provided it is implemented as a comprehensive, well-thought-out solution.

However, at the end of the day, the choice of technology depends on the organization’s needs and resources. Depending on these factors, a combination of all three technologies can also end up being the best way to go about it.

Finally, one should also note that unless you first find all your sensitive data, selecting one or more techniques to protect it becomes moot.

To learn more about data discovery that works, and how to go about it, you can read the following article on Forbes Tech Council: Three Considerations for Data Discovery Solutions

Similarly, one more point to note is the compliance side of things. So far, we have only been talking about security. So, make sure to take into account privacy and security regulations and standards such as PCI DSS, GDPR, CCPA, and HIPAA, so you don’t compromise compliance in a quest to secure your data.