Ultimate Guide to Sensitive Data Discovery


The rise of data privacy and security laws worldwide has transformed the data security landscape. Before these laws, security focused primarily on the perimeter: keeping the bad actors out. Laws like the GDPR in Europe and the CCPA in California fundamentally altered how companies must approach data security. For the first time, a major emphasis was placed on regulating how companies use customer data, and uses that were once considered benign can now result in serious regulatory action. Businesses of all sizes need a plan to handle their data in accordance with the ever-growing set of data privacy and security laws, and these plans are impossible without a robust sensitive data discovery process.

What is Sensitive Data?

Before exploring sensitive data discovery, we must first establish what sensitive data is. Under most data privacy and security laws, not all data is treated equally. The first data group is personal data, which is simply any data pertaining to a person. Note that this could be a user or customer of your company, or an employee. Sensitive data is a subset of personal data that a regulatory authority has deemed more sensitive and, thus, more worthy of protection.

So what counts as sensitive data under the law? Under the GDPR, the special categories include data revealing racial or ethnic origin, political opinions, and religious or philosophical beliefs, as well as biometric data, health data, and data concerning sexual orientation. California’s law has a similar list, but also includes items such as precise geolocation and Social Security numbers. For a deep dive into sensitive data types and a comparison of how different laws require companies to handle them, check out our article on the topic.

Sensitive data is usually categorized as such because, if it were leaked or accessed improperly, it could cause harm to the person who generated the data. It’s also important to note that the person who generated the data ultimately controls it under modern data privacy laws. That means they have the right to ask a company to provide all the data it holds about them for review, or even to ask the company to delete all data that relates to them. Compliance with these newer laws means that companies must be able to handle these data subject access requests at scale.

Depending on the law, the company may be required to obtain consent to process and use a person’s data and to only collect the minimum amount of data needed to provide a product or service. Laws also may restrict companies from moving personal information across borders unless it has been anonymized or pseudonymized.

What is Sensitive Data Discovery?

Given the tight regulations around how companies may use sensitive data, companies must take steps to protect sensitive data. However, this generally isn’t an easy task. A company’s data may be spread across dozens or hundreds of databases, which may be in dozens of different countries. If a customer asks to review their data, or if a regulator wants to ensure that you’re processing data correctly, you need to be able to deliver in near real time. And with such quantities of data stored in a deeply fragmented way, that can be a massive challenge.

Sensitive data discovery is the answer to that problem. It is the process of uncovering and organizing all data sources so that all sensitive data is known and accessible. As such, it is a subset of data discovery as a whole. During the process, you may answer questions such as:

  • How many discrete databases with sensitive information do we have?
  • How do the databases relate to each other (if at all)?
  • What types of sensitive data are stored?
  • How is the data secured? Who has access to it?

The above questions are a part of the subprocess known as data exploration. Data exploration is critical to understanding your data and the relative risk each piece of data represents. In addition to data exploration, your company may want to perform data preparation during the data discovery process. Data preparation aims to identify any Redundant, Obsolete, or Trivial (ROT) data and, if possible, eliminate it. ROT data takes up storage space, so eliminating it can save you money and reduce your overall data risk.
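
The exploration questions above can be made concrete with a small sketch. The example below is an illustration only, not how any particular product works: it walks every table in a SQLite database, samples each column, and flags columns whose values mostly match a known pattern. The `DETECTORS` table, the `explore` function, the 80% threshold, and the demo schema are all assumptions made for this example.

```python
import re
import sqlite3

# Hypothetical detectors; production tools use far richer rules and validation.
DETECTORS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def explore(conn, sample_size=100, threshold=0.8):
    """Return {(table, column): data_type} for columns whose sampled
    non-null values mostly match one of the detectors."""
    findings = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Column names are the second field of each PRAGMA table_info row.
        columns = [row[1] for row in conn.execute(
            f'PRAGMA table_info("{table}")')]
        for column in columns:
            rows = conn.execute(
                f'SELECT "{column}" FROM "{table}" '
                f'WHERE "{column}" IS NOT NULL LIMIT ?',
                (sample_size,)).fetchall()
            values = [str(row[0]) for row in rows]
            if not values:
                continue
            for data_type, pattern in DETECTORS.items():
                hits = sum(1 for value in values if pattern.match(value))
                if hits / len(values) >= threshold:
                    findings[(table, column)] = data_type
                    break
    return findings


def demo():
    """Build a tiny in-memory database and run the exploration on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER, contact TEXT, tax_id TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
        (1, "ana@example.com", "123-45-6789"),
        (2, "bo@example.org", "987-65-4321"),
    ])
    conn.execute("CREATE TABLE products (sku TEXT, title TEXT)")
    conn.execute("INSERT INTO products VALUES ('A-1', 'Widget')")
    return explore(conn)
```

Calling `demo()` flags `customers.contact` as an email column and `customers.tax_id` as a Social Security number column, while the `products` table produces no findings. Sampling values instead of trusting column names matters here, because the names `contact` and `tax_id` alone do not say what is stored.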

Why Is Sensitive Data Discovery Important?

The key reason sensitive data discovery is so important is that it is necessary for regulatory compliance. Unfortunately, partial compliance offers little protection: being 95% compliant with a data privacy or security law can expose you to the same regulatory action as not being compliant at all. Companies won’t be able to reach full compliance unless they know what sensitive data they have and where it is stored. Likewise, when a user submits a data access request, a company won’t be able to fulfill it as required unless it has a data discovery process in place.

But there’s more to the story than just regulatory compliance. Data has a lifecycle that begins with its creation and ends with its retirement. However, its exact path may vary widely based on its type. Some data, especially sensitive data, will require additional security. Other data will be leveraged for analysis. Yet more data may be critical to accomplishing your core mission. Whatever the case, you won’t be able to start each data point on its appropriate journey until you know what it is and where it lives.

Sensitive data discovery also helps accelerate the data analysis process. Data analysts generally need to categorize and clean data before they begin their analysis. The data discovery process takes care of much of that prework for them, since it already categorizes and sometimes cleans the data. As a result, analysts can spend more time generating insights and less time wrangling data, improving their overall efficiency.

How Common is Sensitive Data? Where is it Located?

Every company will have at least some sensitive data. How much your company has will vary based on your industry and your specific data handling practices. For example, financial services or healthcare companies will likely have more sensitive data, while retail companies will likely have less. However, how you approach data also influences how much sensitive data you have. Companies that primarily use their data to enable transactions likely have less personal data than those that leverage data-driven insights as a major part of their decision-making process.

As with all kinds of data, understanding what sensitive data you have and where it lives is not always straightforward. The first place most people start to look for data is in their most commonly used databases. This approach makes a lot of sense, but it won’t uncover all the places where sensitive data can be found. When companies adopt new systems, older databases often get “orphaned” or forgotten about, and they may still be full of sensitive data. Failing to secure that data properly could lead to a breach or regulatory action even if you’re not actively using it.

What about Unstructured Data Discovery?

When most people think of databases, they first think of a relational database, in which data points are held in the rows and columns of tables. Relationships between data points show connections and can exist within a table or across different tables. These relationships are central enough to give the “relational” database its name. Relational databases have a predictable data structure, which makes them robust and lets them scale well, but also makes them inflexible.

While relational, or “structured,” data tends to be what people think of first when talking about data, it’s only a portion of the data that exists. Much of the information businesses produce will be in the form of unstructured data. IBM reports that over 80 percent of all enterprise data is held in an unstructured format. Unlike structured data, unstructured data doesn’t adhere to any particular format or schema. As a result, it can be difficult to process its contents and understand the underlying data.

Unstructured data can be found in non-relational, or NoSQL, databases. However, not all unstructured data is found in a database. Employee emails, chat logs, presentations, and documents all contain data and frequently contain sensitive data. This data is fundamentally unstructured, making identifying and protecting sensitive data a huge challenge. Unfortunately, many companies overlook this data type and, as a result, fail to secure it, leading to unnecessary risk. Unstructured data discovery tools are designed to discover and process this kind of data to ensure that companies know what they have and that even unstructured data can be properly secured.
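
To illustrate why free text is harder to scan than a typed column, here is a minimal sketch of pattern-based detection over a chat message or email body. It is an assumption-laden example rather than a description of any real tool: the `SSN` and `CARD` patterns and the `scan_text` function are invented for illustration. The Luhn checksum is a standard check for payment card numbers and is used here to discard digit runs that merely look like cards.

```python
import re

# Illustrative patterns only; real text scanners also use context and
# dictionaries, and often machine learning, to cut false positives.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text: str):
    """Return (data_type, matched_text) pairs found in free text."""
    findings = [("us_ssn", match.group()) for match in SSN.finditer(text)]
    for match in CARD.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):  # skip order numbers and similar look-alikes
            findings.append(("payment_card", match.group()))
    return findings


message = (
    "Hi team, the customer's SSN is 123-45-6789 and the card on file is "
    "4111 1111 1111 1111. Order 1234 5678 9012 3456 has shipped."
)
print(scan_text(message))
```

In this message the scan reports the Social Security number and the well-known test card number `4111 1111 1111 1111`, while the 16-digit order number is dropped because it fails the checksum.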

Data Classification Tools in Sensitive Data Discovery

Once companies have discovered all the structured and unstructured data they hold, the next step is to classify the data to make it useful and ensure that it is properly protected. Data classification is the process of taking identified data and determining its sensitivity or potential impact level. Data points with similar sensitivity and impact levels can be grouped into classes.

Then security policies that cover creation, access, manipulation, and deletion can be set at the class level. This approach saves significant time when compared to securing each data point manually. Data classification tools help automate this process and work continually to ensure that your business is always classifying and appropriately securing newly discovered data.
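
As a rough sketch of class-level policies, the example below maps each discovered data type to a sensitivity class and attaches one policy per class. The class names, the `Policy` fields, the role names, and the choice to treat unknown data types as the strictest class are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Policy:
    encrypt_at_rest: bool
    mask_in_nonprod: bool
    allowed_roles: frozenset


# Hypothetical mapping from discovered data types to classes.
TYPE_TO_CLASS = {
    "product_name": Sensitivity.PUBLIC,
    "email": Sensitivity.CONFIDENTIAL,
    "us_ssn": Sensitivity.RESTRICTED,
    "health_record": Sensitivity.RESTRICTED,
}

# One policy per class, rather than one per data point.
POLICIES = {
    Sensitivity.PUBLIC: Policy(False, False, frozenset({"everyone"})),
    Sensitivity.INTERNAL: Policy(False, False, frozenset({"employee"})),
    Sensitivity.CONFIDENTIAL: Policy(True, True, frozenset({"support", "dpo"})),
    Sensitivity.RESTRICTED: Policy(True, True, frozenset({"dpo"})),
}


def policy_for(data_type: str):
    """Look up the class and policy for a data type.

    Unknown types fall back to the strictest class (an assumption: it is
    safer to over-protect until someone classifies them).
    """
    level = TYPE_TO_CLASS.get(data_type, Sensitivity.RESTRICTED)
    return level, POLICIES[level]
```

With this layout, adding a new data type is a one-line change to `TYPE_TO_CLASS`, and tightening a policy updates every data type in that class at once.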

Features to Look for in Sensitive Data Discovery Tools

Businesses should look for a few key features in a sensitive data discovery tool:

Automation
The massive scale of modern data means that organizations cannot handle their data manually. A sensitive data discovery tool should be able to automate this process entirely. That means it should handle all data types and be able to run scans at a set interval to ensure that your business remains secure.
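
One way to keep scheduled scans cheap is to rescan only what has changed since the previous run. The sketch below shows that idea with content fingerprints. The function names, the in-memory `seen` dictionary, and the `scan` callback are hypothetical, and a real scheduler (cron, a job queue, or the tool’s own) would call this at each interval.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(content).hexdigest()


def incremental_scan(documents, seen, scan):
    """Scan only new or changed documents.

    documents: {name: bytes} snapshot of the data source.
    seen:      {name: fingerprint} from earlier runs; updated in place.
    scan:      callable applied to the content of each changed document.
    """
    results = {}
    for name, content in documents.items():
        digest = fingerprint(content)
        if seen.get(name) == digest:
            continue  # unchanged since the last run, skip it
        results[name] = scan(content)
        seen[name] = digest
    return results
```

A first run with an empty `seen` behaves like a full scan; later runs return results only for documents whose content hash has changed.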

Flexible Data Types
The more data types a tool can support out of the box, the better. However, most organizations will have at least one data type that is unique to them, and possibly quite a few more. The ability to easily set a custom data type for your data discovery tool to identify can dramatically improve the overall user experience.
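
A custom data type can be as simple as a named pattern added alongside the built-in ones. The sketch below assumes a hypothetical `DetectorRegistry` class and a made-up loyalty ID format (`LOY-` followed by eight digits); neither comes from any real product.

```python
import re


class DetectorRegistry:
    """Holds named patterns, both built-in and organization-specific."""

    def __init__(self):
        self._patterns = {}

    def register(self, name: str, regex: str) -> None:
        """Add or replace a detector for one data type."""
        self._patterns[name] = re.compile(regex)

    def identify(self, value: str):
        """Return the sorted names of every data type the value fully matches."""
        return sorted(
            name for name, pattern in self._patterns.items()
            if pattern.fullmatch(value)
        )


registry = DetectorRegistry()
# A detector of the kind a tool might ship with.
registry.register("email", r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
# A custom, organization-specific type (hypothetical format).
registry.register("loyalty_id", r"LOY-\d{8}")
```

Here `registry.identify("LOY-00421337")` returns `["loyalty_id"]`, and a value that matches no pattern returns an empty list.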

Multi-Channel Operation
As we covered above, your data isn’t just in relational databases anymore. To ensure that you’re cataloging all data, you need a data discovery tool that can work with structured and unstructured data and that can look beyond traditional databases. Remember that employees often inadvertently produce new instances of sensitive data, so failing to identify sensitive data in employee communications and documents can open you up to legal scrutiny.

Risk Reporting
It’s important to recognize that not all data carries the same level of risk. Securing data requires time and resources, but over-securing data can drain worker productivity. Companies need a tool that can understand the relative risk of a data point and apply the correct level of security. And to ensure it’s doing its job right, employees will need regular risk reporting. As an added bonus, these risk reports can help a company understand what types of data drive the most risk and thus require the most attention.
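
To show what a relative risk ranking might look like, the sketch below scores each finding and totals the scores per data type. The scoring formula (sensitivity weight times row count, doubled when the store is unencrypted), the weights, and the field names are all assumptions chosen for illustration; a real tool would weigh many more factors, such as who has access.

```python
from collections import defaultdict

# Hypothetical sensitivity weights per data type.
WEIGHTS = {"us_ssn": 10, "health_record": 9, "email": 3}


def risk_report(findings):
    """Rank data types by total risk score, highest first.

    Each finding is a dict with the keys: store, data_type, rows, encrypted.
    """
    totals = defaultdict(int)
    for finding in findings:
        weight = WEIGHTS.get(finding["data_type"], 1)
        exposure = 1 if finding["encrypted"] else 2  # unencrypted counts double
        totals[finding["data_type"]] += weight * finding["rows"] * exposure
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


findings = [
    {"store": "crm.customers", "data_type": "email", "rows": 1000, "encrypted": False},
    {"store": "hr.payroll", "data_type": "us_ssn", "rows": 200, "encrypted": True},
    {"store": "legacy.export", "data_type": "us_ssn", "rows": 500, "encrypted": False},
]
print(risk_report(findings))  # [('us_ssn', 12000), ('email', 6000)]
```

Even though email addresses appear in more rows, Social Security numbers rank first because of their higher weight and the unencrypted legacy export.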

How Mage Helps with Sensitive Data Discovery

Sensitive data discovery matters because it impacts the rest of your data security process. If it’s not performing well, you’ll accumulate significant risk even if the rest of your process runs perfectly. So, it’s important that you get a sensitive data discovery tool that you can trust. Mage’s patented sensitive data discovery has a 4.4/5 rating on Gartner Peer Insights and has supported customers of all sizes, from small businesses to enterprises. It handles structured and unstructured data, identifies more than 70 data types out of the box, and offers custom classification options. And with sample, full, or incremental scans, it will leave you with the peace of mind that all sensitive data has been discovered and properly handled. Learn more about what our sensitive data discovery tool can do for you.