April 6, 2022

Role of Sensitive Data Discovery in Regulatory Compliance

We don’t know who coined the phrase, “what you don’t know can’t hurt you.” We do know they weren’t talking about data retention. Sensitive data discovery is necessary to know what data your organization has held onto—perhaps without realizing it.

When your organization isn’t tracking sensitive data, it can’t protect that data from leaks and other cybersecurity threats, which leaves you exposed to unnecessary risk. But there is an even more immediate threat: Unless there is an intentional security plan in place, your data likely isn’t in compliance with data privacy laws such as the GDPR and CCPA.

The point of this is not that you should live in constant fear of the unknown. Rather, you should shine a light on all of your sensitive data, make it known, then bring it under the protection of your data privacy strategy. Armed with an understanding of the why and the how of data discovery, you’re much better equipped to ensure regulatory compliance.

Why is Data Discovery Important?

Almost all modern enterprises collect and analyze massive amounts of personal data. Consider the personal information held by banks, retailers, social media platforms, and other organizations:

● Names
● Ages
● Credit card numbers
● Addresses
● IP addresses
● Social security numbers
● Driver’s license numbers

The list goes on. It’s difficult for most individuals to fathom all of the different places their personal information lives. Similarly, it can be hard for enterprises and other large organizations to keep track of everywhere they’ve stored sensitive data. Maps of data storage plans often get outdated, discarded, or forgotten entirely.

Meanwhile, data processing becomes fluid, and the rapid move to the cloud makes it even more difficult to track the flow of information. Modern companies generate and collect massive amounts of data from many different sources. With data coming in from many different directions, old data processing architectures fail to keep track of all the movement.

This all presents a complex problem: Enterprise data has gotten so out of control that most organizations don’t even know what they don’t know. The process of sensitive data discovery allows organizations to create accurate, updated records that capture everywhere personal information is stored. Once all retained data has been identified and documented, organizational leaders can make informed decisions about how to proceed.

Data Discovery Contributes to Regulatory Compliance

Data discovery is a critical part of regulatory compliance. For any organization that must comply with the GDPR and similar regulations, discovery is the essential first step in data protection. No sensitive data should be sitting in unknown locations.

Where is Undiscovered Data Hiding?

Some companies get in trouble because of intentional misconduct related to data privacy. In many cases, though, the problem is just that organizations don’t think there’s any sensitive data to discover. Maybe they think they already know where all their data is, or maybe they think they don’t even store any sensitive data. In either case, it’s better to verify than trust.

So, if you are thinking, “But I already know where all my data is,” let’s just say that you wouldn’t be the first person to say so, or the first to be caught by surprise when undiscovered data becomes a problem.

Undiscovered Data During Acquisitions

With data coming from so many sources, companies have to be worried about more than just their own practices. Marriott, for example, was fined £18.4 million by the Information Commissioner’s Office (ICO) for failure to discover another company’s mistake.

The data protection watchdog deemed that Marriott’s due diligence was insufficient during the acquisition of Starwood Hotels and Resorts Worldwide in 2016. Specifically, Marriott failed to discover a cyberattack on Starwood that had happened two years earlier, in 2014. Marriott didn’t even own Starwood at the time of the breach. Unfortunately for the hotel chain, it was still left holding the bag: the breach came to light in 2018, and the £18.4 million ICO fine followed in 2020.

Lost Visibility in the Data Lifecycle

The Spanish data protection authority, AEPD, fined Vodafone €8.15 million in 2021 for several GDPR violations. Some of those violations could have been prevented with effective, ongoing data discovery:

● The AEPD concluded that Vodafone should have required all of its marketing partners to filter their marketing data more accurately. There were numerous complaints of improper marketing messages.
● Vodafone failed to provide evidence of continuous data monitoring at every stage of the data lifecycle.

Data discovery alone wouldn’t have solved all of Vodafone’s problems, at least not directly. It certainly would have solved some, though, and proper data monitoring would have made other errors harder to overlook. For example, individuals who opted out of marketing were not always removed from outbound campaigns.

The improper messaging was the result of human error: Vodafone was having managers review lists of individuals who opted out. The AEPD didn’t deem this human-driven process acceptable. Indeed, the best data discovery tool is no longer a human. The scale and scope of enterprise-level data discovery are simply too big now.
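
To make the contrast with manual list review concrete, here is a minimal sketch of applying an opt-out list to an outbound campaign in code. The function name, the use of email addresses as identifiers, and the normalization rule are all assumptions made for illustration; this is not a description of Vodafone’s systems or of any particular product.

```python
def suppress_opt_outs(recipients, opt_outs):
    """Return the recipients list with every opted-out contact removed."""
    # Normalize case and whitespace so "Raj@Example.com " matches "raj@example.com".
    blocked = {contact.strip().lower() for contact in opt_outs}
    return [r for r in recipients if r.strip().lower() not in blocked]


# Hypothetical sample data.
campaign = ["ana@example.com", "Raj@Example.com", "li@example.com"]
opted_out = ["raj@example.com "]

print(suppress_opt_outs(campaign, opted_out))
```

Running the same check on every campaign, rather than relying on a manager to read the list, removes one place where a person can simply miss a name.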

What does Data Discovery Look Like?

Undiscovered data can lead to compliance nightmares, so how does data discovery help avoid those situations? At its simplest, the path from discovery to compliance should look something like this:

1. Discovery – First, take a detailed inventory of every bit of sensitive data stored across the entire organization.
2. Classification – Consider the information gathered during the discovery phase, such as how sensitive the data is and how much risk is associated. Tag all data based on its type, format, and who should have access to the information.
3. Protection – Implement the appropriate security to protect data from internal and external threats. After implementation, continue to monitor risks and responses on an ongoing basis.
4. Compliance – Maintain updated data processing records and complete other reporting as required by the applicable privacy laws.

There will always be challenges in enterprise data security, but thorough data discovery is the first step toward addressing all of those challenges head on.
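
The four steps can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the two regex detectors, the sensitivity tiers, the control names, and the sample records are all hypothetical, and a real deployment would cover far more data types and sources.

```python
import re
from dataclasses import dataclass

# Hypothetical detectors; real tools recognize many more data types.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Assumed sensitivity tiers and the control applied to each tier.
SENSITIVITY = {"email": "medium", "us_ssn": "high"}
CONTROLS = {"high": "mask", "medium": "restrict access"}


@dataclass(frozen=True)
class Finding:
    location: str
    data_type: str
    sensitivity: str


def discover(records):
    """Step 1: inventory every location where a detector matches."""
    return [
        (location, data_type)
        for location, text in records.items()
        for data_type, pattern in PATTERNS.items()
        if pattern.search(text)
    ]


def classify(hits):
    """Step 2: tag each hit with its type and sensitivity tier."""
    return [Finding(loc, dt, SENSITIVITY[dt]) for loc, dt in hits]


def protect(findings):
    """Step 3: choose a control for each finding."""
    return {f: CONTROLS[f.sensitivity] for f in findings}


def compliance_report(plan):
    """Step 4: produce a record of what is stored where and how it is protected."""
    return [
        f"{f.location}: {f.data_type} ({f.sensitivity}) -> {control}"
        for f, control in plan.items()
    ]


# Hypothetical sample data.
records = {
    "hr/notes.txt": "SSN 123-45-6789 on file",
    "crm/contacts.csv": "ana@example.com",
    "wiki/readme.md": "no personal data",
}

plan = protect(classify(discover(records)))
report = compliance_report(plan)
for line in report:
    print(line)
```

Each stage consumes the previous stage’s output, which is why gaps in discovery carry through to every later step.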

Data Discovery Techniques

For most, the thought of perusing an entire enterprise’s data to identify sensitive information is not pleasant. Worse still, given the rate at which data pours in, it’s essentially an impossible task. Fortunately, technology has evolved enough that manual data discovery is no longer necessary.

That said, the concept of data discovery predates smart data discovery technology. Years ago, data had to be stored in documents which had to be manually located whenever necessary. Did Billy from HR put all employee information in the shared drive, or is some of it on her local machine? Why are Sharon’s vendor contracts in the “Accounts Payable” folder when Admir’s are in a subfolder of the “Approved Vendors” folder?

This labor-intensive process led to the rise of data discovery specialists, who could charge high fees for their expertise. Even if it costs an arm and a leg, however, manual data discovery is still susceptible to human error. Like ignorance of where your data is stored, human error is an unacceptable reason to fall out of regulatory compliance.

This is where automated data discovery comes into play. State-of-the-art data discovery solutions completely eliminate the potential for human error while saving time and resources. Automated data discovery tools uncover all of an enterprise’s sensitive data. These solutions are built to identify even the most obscure locations, so data has nowhere to hide.
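
As a rough illustration of one building block such tools can use, the sketch below pairs a pattern match with a checksum so that random digit strings are not reported as card numbers. It is a simplified example, not how any specific product works; the regex and function names are assumptions, and the Luhn checksum is the standard check used for payment card numbers.

```python
import re

# Candidate: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in `text` that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test card number.
sample = "Order 1001: card 4111 1111 1111 1111, ref 1234 5678 9012 3456"
print(find_card_numbers(sample))
```

The second 16-digit string matches the pattern but fails the checksum, so only the first is reported. Validation of this kind is one way to cut down on false positives.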

Simplify the Data Discovery Process

Automated data discovery is only as useful as the tasks that have been automated. Before implementing a sensitive information discovery solution, it’s critical to confirm that the technology meets all of the organization’s technical requirements. For example, it may be worth asking the following questions:

● Can the tool ingest both structured and unstructured data by leveraging AI to understand context?
● Will big data and the cloud come into play?
● Does your data discovery tool have classifications for all popular PII and PHI data?
● Is it possible to add custom classifications if necessary?
● Are there flexible scanning methods, such as the option for incremental scanning of only new data?
● Does your tool return a scorecard, including confidence scores, to highlight risk?
● Will your sensitive data discovery scan the underlying code to determine which users and programs have access to the data?
● Does the automated discovery include a historical look at how data got to its current location?
● Can leadership see an updated data flow map to understand the movement of sensitive data?
● Does the automated data discovery tool support all of the relevant data sources?
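
Two of the questions above, incremental scanning and confidence scores, can be illustrated with a short sketch. Everything here is an assumption for illustration: the column names, the single SSN pattern, the choice of a content hash to detect changes, and the definition of confidence as the share of non-empty values that match.

```python
import hashlib
import re

SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def fingerprint(values):
    """Stable hash of a column's contents, used to detect changes between scans."""
    joined = "\x1f".join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def confidence(values, pattern=SSN):
    """Share of non-empty values matching the pattern, from 0.0 to 1.0."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    return sum(bool(pattern.match(v)) for v in non_empty) / len(non_empty)


def incremental_scan(columns, seen):
    """Scan only new or changed columns; `seen` is updated in place."""
    scorecard = {}
    for name, values in columns.items():
        fp = fingerprint(values)
        if seen.get(name) == fp:
            continue  # unchanged since the last scan, so skip it
        seen[name] = fp
        scorecard[name] = round(confidence(values), 2)
    return scorecard


# Hypothetical sample data.
columns = {
    "employees.tax_id": ["123-45-6789", "987-65-4321", "n/a", ""],
    "employees.nickname": ["Sam", "Ali"],
}
seen = {}

first = incremental_scan(columns, seen)   # everything is new
second = incremental_scan(columns, seen)  # nothing changed
columns["employees.nickname"].append("111-22-3333")
third = incremental_scan(columns, seen)   # only the changed column

print(first, second, third)
```

The second pass returns an empty scorecard because no fingerprints changed, and the third pass rescans only the column that received new data.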

Demonstrate Regulatory Compliance

Not all sensitive data discovery tools are created equal, especially when your organization has a specific goal in mind. Confirm that your chosen solution is built to accommodate all of the data classifications mandated by the GDPR, CCPA, HIPAA, and other regulations. A sensitive data discovery solution shouldn’t just keep an organization compliant; it should make the compliance easy to demonstrate with features like the following:

● Dashboards that weigh sensitive data to track the risk of individual datastores
● Automated security reports that are specifically relevant to the maintenance of regulatory compliance
● Reporting to display the presence of all sensitive data in a way that is easily digestible by auditors
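
One simple way a dashboard might weigh sensitive data is a weighted count per datastore. The sketch below assumes made-up weights, data type names, and findings; actual scoring models vary by tool and by regulation.

```python
# Assumed weights: higher means more sensitive.
WEIGHTS = {"us_ssn": 10, "credit_card": 8, "email": 3, "name": 1}


def datastore_risk(findings):
    """Rank datastores by weighted count of sensitive values.

    `findings` is an iterable of (datastore, data_type, count) tuples.
    Unknown data types fall back to a weight of 1.
    """
    scores = {}
    for store, data_type, count in findings:
        scores[store] = scores.get(store, 0) + WEIGHTS.get(data_type, 1) * count
    # Highest-risk datastore first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discovery results.
findings = [
    ("crm_db", "email", 1200),
    ("crm_db", "name", 1200),
    ("hr_share", "us_ssn", 300),
    ("hr_share", "name", 300),
    ("logs", "email", 40),
]

for store, score in datastore_risk(findings):
    print(f"{store}: {score}")
```

A ranking like this gives auditors and leadership a quick view of which datastores deserve attention first.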

As data privacy regulations become more complex, there are no points awarded for effort. A data protection plan is not “good enough” unless it’s completely correct, and there’s no way to know if data is adequately protected without a thorough discovery process. At Mage™, we like to say that if you aren’t finding 100% of your data, it might as well be zero. Learn how to automate sensitive data discovery by requesting a demo of Mage Sensitive Data Discovery.