
July 21, 2022

A Data Protection and Data Privacy Glossary

The vocabulary of data protection seems to change every few years. Legislators pass new regulations, user expectations rise, and new technologies become available. It’s hard enough to keep up with the jargon, never mind best practices.

Those responsible for implementing data privacy solutions need to “talk the data privacy talk” before walking the walk. This helps ensure that you can be advised on the right set of solutions to solve the real problems at hand—and, hopefully, never have to guess whether or not you’re protected. This glossary provides definitions and explanations for 20+ data protection words, phrases, and concepts.


Anonymization

Data anonymization helps companies maximize the utility of data while preserving compliance. Anonymization removes personally identifiable information (PII), so the data cannot be tied to individuals if leaked or misused. Anonymizing the data eliminates privacy concerns so an organization can retain information for forecasting and other analysis. Businesses must avoid the most common data anonymization mistakes to keep their user information private.

Big Data

Big Data refers to data sets that are too large or complex for traditional data software solutions. Organizations are receiving and retaining increasing volumes of data, and modern data sets contain a much larger variety of information.

California Consumer Privacy Act (CCPA)

The CCPA is one of the most significant data privacy regulations. The regulation was signed into law in June of 2018 and went into effect at the beginning of 2020. The CCPA gives users increased data privacy rights, and it’s changing the ways businesses in the United States collect and use information.

California Privacy Rights Act (CPRA)

The CPRA was passed in November 2020 and goes into effect at the beginning of 2023. It extends the CCPA, bringing additional protections for consumer information and increasing fines for violations. The regulation applies to all companies doing business in California or with customers within the state.

Database Activity Monitoring (DAM)

Database activity monitoring is a security technology for detecting fraudulent, illegal, or otherwise inappropriate behavior within a database. DAM gives security professionals the ability to monitor access to sensitive data in real time. Immediate, ongoing reporting helps keep an organization audit-ready.

Data Minimization

This principle means an organization must limit the collection of personal information to what is relevant and necessary. Furthermore, organizations should retain information only for as long as needed to satisfy a specific purpose. The GDPR (defined below) was one of the first regulations to establish guidelines for data minimization.

Data Obfuscation

This term is often used interchangeably with data masking. Data obfuscation is the process of modifying sensitive data to protect the privacy of individuals. The process eliminates opportunities for hackers or other unauthorized parties to derive value from the data. At the same time, data obfuscation techniques can preserve the utility of data for authorized parties and software.

Data Privacy

Data privacy has to do with collecting, storing, and using data responsibly. Data privacy efforts focus on ensuring that only the appropriate parties have access to information. Explore the differences between data privacy and data protection to gain a deeper understanding of each.

Data Protection or Data Security

People often use data protection and data security interchangeably. These terms refer to strategies for ensuring the availability and integrity of data while guarding against threats. While there is some overlap with compliance, it’s worth noting that compliance with regulations is not the same as complete data security.

Data Retention

The principle of data retention outlines procedures for meeting requirements around data archiving and management. Organizations must store some information for specified periods to comply with government or industry regulations. Occasionally, there is tension between data retention and data privacy.

Data Scrambling

This method of obfuscating or removing confidential data is irreversible. Data scrambling techniques involve the generation of randomized strings that cannot be restored to the original information.
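
As a minimal sketch of the idea, the Python snippet below replaces a value with a random string of the same length. The function name `scramble`, the alphabet, and the sample value are illustrative assumptions rather than any particular product's behavior. Because the output comes from a random source and no mapping back to the input is stored, the original cannot be recovered from it.

```python
import secrets
import string

# Assumed output alphabet for this sketch: letters and digits only.
ALPHABET = string.ascii_letters + string.digits

def scramble(value: str) -> str:
    """Return a random string with the same length as `value`.

    Nothing about the original is kept, so the result cannot be reversed.
    """
    return "".join(secrets.choice(ALPHABET) for _ in value)

original = "jane.doe@example.com"
scrambled = scramble(original)
print(scrambled)  # random output; varies between runs
```

Keeping the length is a design choice for this sketch only; a real scrambling tool may preserve other properties of the format, or none at all.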

Data Scrubbing

Also known as data cleaning or data cleansing, data scrubbing is the process of fixing erroneous information within a data set. Examples that require scrubbing include incomplete, incorrect, and duplicate data. Data scrubbing is a two-step process. First, identify errors in the data set. Then change, update, or remove data as needed to correct issues.
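
The two steps can be sketched in a few lines of Python. The record layout, the `email` field, and the specific checks (missing, malformed, duplicate) are assumptions chosen for illustration; a real data set would need its own rules.

```python
def find_issues(records):
    """Step 1: identify errors. Returns (row index, reason) pairs."""
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        email = (rec.get("email") or "").strip().lower()
        if not email:
            issues.append((i, "missing email"))
        elif "@" not in email:
            issues.append((i, "malformed email"))
        elif email in seen:
            issues.append((i, "duplicate"))
        else:
            seen.add(email)
    return issues

def scrub(records):
    """Step 2: remove the flagged rows and normalize the remaining emails."""
    bad = {i for i, _ in find_issues(records)}
    return [
        {**rec, "email": rec["email"].strip().lower()}
        for i, rec in enumerate(records)
        if i not in bad
    ]

records = [
    {"name": "Ana", "email": "Ana@Example.com "},
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Bo", "email": ""},
    {"name": "Cy", "email": "cy.example.com"},
]
print(find_issues(records))
# [(1, 'duplicate'), (2, 'missing email'), (3, 'malformed email')]
print(scrub(records))
# [{'name': 'Ana', 'email': 'ana@example.com'}]
```

This sketch simply drops flagged rows. In practice, some errors are better corrected or sent for manual review than removed.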

Data Subject Access Rights

The right of subject access says individuals are entitled to obtain copies of their data. Technologies like data subject access rights automation help organizations respond to requests more efficiently.


De-identification

De-identification of data is a type of dynamic data masking. This process involves stripping identifiers from collected data. Removing links between data and personal identities helps protect the privacy of individuals.
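
A minimal Python sketch of stripping identifiers might look like the following. The field names in `DIRECT_IDENTIFIERS` and the sample record are hypothetical; which fields count as identifiers depends on the data set and the applicable regulation.

```python
# Assumed field names for this sketch; adjust to the actual schema.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}

def deidentify(record: dict) -> dict:
    """Return a copy of `record` without the direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "zip": "10001",
    "diagnosis": "asthma",
}
print(deidentify(patient))  # {'zip': '10001', 'diagnosis': 'asthma'}
```

Note that the remaining fields can still identify a person when combined with other information, so removing direct identifiers alone may not be enough.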


Encryption

Encryption is the process of encoding data to protect the information from unauthorized access. Typically, an algorithm will turn plaintext data into unreadable ciphertext. This helps when sharing data with third parties, which may then decrypt the information with the decryption key.

General Data Protection Regulation (GDPR)

Adopted in 2016 and effective in May 2018, the GDPR is a model for many other data privacy laws. The regulation is part of EU privacy law and human rights law. The GDPR gives individuals more control over their personal data and supersedes other data protection regulations for international business.

Health Insurance Portability and Accountability Act (HIPAA)

This act, passed in 1996, is a federal law in the United States. It established national standards to prohibit the disclosure of sensitive health information without the patient’s consent or knowledge.

Homomorphic Encryption

Homomorphic encryption is a specialized type of encryption designed for data in use. Typically, encrypted data is transferred, decrypted, and then analyzed. Homomorphic encryption allows data to be analyzed, and therefore to be valuable, without being decrypted first.
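
To make "computing on encrypted data" concrete, here is a toy Python sketch using unpadded textbook RSA, which happens to be homomorphic for multiplication only. This is an illustration under stated assumptions, not how production homomorphic encryption works: the primes are tiny, the scheme is insecure without padding, and real deployments use dedicated schemes that support richer computations.

```python
# Toy textbook-RSA parameters: far too small for real use.
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (requires Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 3
# Multiply the two ciphertexts; neither value is decrypted along the way.
product_ciphertext = (encrypt(a) * encrypt(b)) % n
print(decrypt(product_ciphertext))  # 21
```

The party doing the multiplication only ever sees ciphertexts; only the holder of the private exponent `d` can read the result.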


Masking

There are multiple types of data masking. Static data masking techniques like tokenization and encryption protect data in pre-production and non-production environments. Dynamic masking protects data in production environments when it’s in transit or in use.
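
The dynamic case can be sketched in Python as a masking step applied at read time. The role names (`auditor`, `support`) and the show-last-four rule are hypothetical choices for this example; the stored value itself is never modified.

```python
def mask_ssn(ssn: str) -> str:
    """Hide every digit except the last four, keeping the separators."""
    digits_to_hide = sum(ch.isdigit() for ch in ssn) - 4
    out = []
    for ch in ssn:
        if ch.isdigit() and digits_to_hide > 0:
            out.append("*")
            digits_to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)

def read_ssn(stored: str, role: str) -> str:
    """Dynamic masking: the stored value is untouched; only the view changes."""
    return stored if role == "auditor" else mask_ssn(stored)

stored = "123-45-6789"
print(read_ssn(stored, "auditor"))  # 123-45-6789
print(read_ssn(stored, "support"))  # ***-**-6789
```

Static masking would instead apply something like `mask_ssn` once, when copying data into a non-production environment, so the unmasked value never leaves production.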

Personally Identifiable Information (PII)

PII is any personal data that relates to an identifiable person. Names, addresses, and Social Security numbers are PII because they can directly identify an individual. Combinations of other information such as age, race, gender, birth date, and more can also be PII.

Protected Health Information (PHI)

As defined by HIPAA, PHI is any data related to an individual’s health. PHI also includes the healthcare provided to an individual or payment by the individual for said healthcare. PHI is a top consideration when developing  .

Personal Data Protection Act (PDPA)

The PDPA is a piece of data protection legislation from Singapore. It passed in 2012 and regulates the way organizations in the private sector can process personal data.

Privacy-Enhancing Technologies (PETs)

PETs are technologies to maximize data privacy while empowering individuals. These technologies help organizations get more from their data without compromising privacy or security.


Pseudonymization

Pseudonymization is the process of replacing sensitive data with a reversible, consistent value. Because the replacement can be reversed, however, it carries a higher risk of reidentification than irreversible techniques.
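
One way to get both properties, sketched here in Python, is to derive the pseudonym from a keyed hash (so the same input always yields the same output) and keep a private lookup table for reversal. The class name `Pseudonymizer`, the `user_` prefix, and the demo key are assumptions for illustration only.

```python
import hashlib
import hmac

class Pseudonymizer:
    """Consistent, reversible pseudonyms backed by a private lookup table."""

    def __init__(self, key: bytes):
        self._key = key
        self._reverse = {}  # pseudonym -> original; must be stored securely

    def pseudonymize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode(), hashlib.sha256).hexdigest()
        pseudonym = "user_" + digest[:16]
        self._reverse[pseudonym] = value
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        return self._reverse[pseudonym]

pseudo = Pseudonymizer(key=b"demo-key-not-for-production")
first = pseudo.pseudonymize("jane.doe@example.com")
second = pseudo.pseudonymize("jane.doe@example.com")
print(first == second)           # True: same input, same pseudonym
print(pseudo.reidentify(first))  # jane.doe@example.com
```

Consistency is what keeps pseudonymized records joinable across tables. The lookup table and key are also exactly what an attacker would need to reverse the process, so both have to be protected.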


Reidentification

Reidentification is the recovery of personal data from a source that was meant to be anonymous, whether the data is extracted directly or inferred. It is usually the result of bad actors attempting to steal that data. A classic case occurred when New York City released data on taxi travel but formatted it in such a way that it was trivial to recover personally identifiable information about drivers, such as income and home address.
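
A common way this kind of failure arises is replacing identifiers with an unkeyed hash when the set of possible identifiers is small. The Python sketch below uses hypothetical 4-digit driver IDs and SHA-256; it is meant to show the general weakness, not to reproduce the format of any real data release.

```python
import hashlib

def hash_id(value: str) -> str:
    """Unkeyed hash of an identifier (no secret, no salt)."""
    return hashlib.sha256(value.encode()).hexdigest()

# "Anonymized" release: hypothetical 4-digit driver IDs replaced by their hash.
released = [hash_id("0421"), hash_id("9310")]

# Attacker: hash every possible ID once, then look up each released hash.
lookup = {hash_id(f"{i:04d}"): f"{i:04d}" for i in range(10_000)}
recovered = [lookup[h] for h in released]
print(recovered)  # ['0421', '9310']
```

With only 10,000 possible inputs, building the full lookup table takes a fraction of a second, so the hash provides no real protection here.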

Sensitive Data Discovery

As organizations store increasing volumes of data, it becomes crucial to discover sensitive information that may be hidden or forgotten. Sensitive data discovery is the first step in any data privacy and data security strategy. After all, you can’t protect what you don’t know.
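
At its simplest, discovery can be sketched as pattern matching over stored text. The two regular expressions below (US Social Security number format and a loose email pattern) and the sample note are assumptions for illustration; pattern matching alone will miss sensitive values that follow no fixed format, and will sometimes flag values that are not sensitive.

```python
import re

# Assumed patterns for this sketch; real scanners use many more.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def discover(text: str) -> dict:
    """Return every match for each pattern, keyed by pattern name."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

note = "Call Jane (jane.doe@example.com), SSN 123-45-6789, about invoice 4471."
print(discover(note))
# {'ssn': ['123-45-6789'], 'email': ['jane.doe@example.com']}
```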

SOX Compliance

Outlined by the Sarbanes-Oxley (SOX) Act, SOX compliance involves annual auditing of public companies for accuracy and security in their financial reporting. To achieve SOX compliance, companies must keep data secure, track attempted breaches, keep event logs, and prove compliance for the most recent 90-day period.


Tokenization

Like encryption, tokenization replaces plaintext data with an algorithm-generated value or string of values. In tokenization, the original data is retained in a secure server. The generated token can be passed to that secure server to retrieve the original information.
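
The token-and-vault relationship can be sketched in Python as follows. The `TokenVault` class is a hypothetical in-memory stand-in for the secure server, and the `tok_` prefix and random token format are assumptions; an actual tokenization service would add access control, persistence, and auditing.

```python
import secrets

class TokenVault:
    """In-memory stand-in for the secure server that retains the original data."""

    def __init__(self):
        self._store = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]

vault = TokenVault()
token = vault.tokenize("4111 1111 1111 1111")
print(token)                    # random; reveals nothing about the original
print(vault.detokenize(token))  # 4111 1111 1111 1111
```

Because the token here is random rather than computed from the original, it cannot be reversed without access to the vault.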

Getting Started With Data Privacy and Data Security

To see how data protection and data privacy concepts fit into a comprehensive product suite, schedule a demo with Mage. We’re happy to address your specific questions and tailor the demo to fit your requirements.