
April 6, 2022

Role of Sensitive Data Discovery in Regulatory Compliance

Whoever first said, “what you don’t know can’t hurt you” wasn’t talking about data retention. Sensitive data discovery is necessary to know what data your organization has held onto, perhaps without realizing it. Any data you’re not aware of may be exposed, and a prime target for a leak or breach.

But there is an even more immediate issue: Data your company isn’t aware of likely hasn’t been processed and stored in compliance with data privacy laws such as the GDPR and CCPA. The point is not that you should live in constant fear of the unknown. Rather, you should shine a light on all of your sensitive data, make it known, then bring it under the protection of your data privacy strategy. Armed with an understanding of the why and the how of data discovery, you’re much better equipped to ensure privacy compliance.

Why Is Data Discovery Important?

Almost all modern enterprises collect and analyze massive amounts of personal data. Common types include:

  • Names
  • Ages
  • Credit card numbers
  • Addresses
  • IP addresses
  • Social security numbers
  • Driver’s license numbers

And this is only a sample of the information that a company might hold. Complicating the issue is the fact that companies are dynamic and have changing needs. As a result, databases are in flux. New ones are created and old ones slip under the radar. It can be hard for enterprises and other large organizations to keep track of where they’ve stored sensitive data. And that’s before we add in the complexity of data in a globally connected cloud environment.

Modern companies generate and collect massive amounts of data from many different sources. With data coming in from many different directions, old data processing approaches fail to keep track of all the movement. Even well-intentioned, well-run companies may lose track of data because there’s so much of it, and it’s constantly changing.

The process of sensitive data discovery allows organizations to create accurate, updated records that capture everywhere personal information is stored. Typically, this takes the form of automated data scanning, which catalogs all locations where a company stores data. Once all data has been identified and documented, organizational leaders can make informed decisions about privacy and security strategy.
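
As a rough illustration of what automated scanning involves, here is a minimal Python sketch that searches a piece of text for a few common kinds of personal data and records where each was found. The patterns, the `scan_text` function, and the sample location name are assumptions made for this example; commercial discovery tools rely on far richer detection than a handful of regular expressions.

```python
import re

# Hypothetical patterns for illustration only; real tools detect many more types.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return len(digits) >= 13 and checksum % 10 == 0


def scan_text(location: str, text: str) -> list[dict]:
    """Catalog where sensitive-looking values appear, without storing the values."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Digit runs that fail the Luhn check are unlikely to be card numbers.
            if label == "credit_card" and not luhn_valid(match.group()):
                continue
            findings.append(
                {"location": location, "type": label, "offset": match.start()}
            )
    return findings
```

Note that the catalog records only the location, type, and offset of each match, not the value itself, so the inventory does not become another copy of the sensitive data.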

Data Discovery Contributes to Privacy Compliance

Data discovery is a critical part of data privacy compliance. PII discovery is the essential first step toward complying with the GDPR and similar regulations. It’s important that there be no sensitive data sitting in unknown locations.

How Does Undiscovered Data Hide?

Some companies get in trouble because of intentional misconduct related to data privacy. In many cases, though, the problem is just that organizations don’t think there’s any sensitive data to discover. Maybe they think they already know where all their data is, or maybe they think they don’t store any sensitive data at all. In either case, it’s better to verify than trust.

So, if you are thinking, “but I already know where all my data is…” Let’s just say that you wouldn’t be the first person to say this…or to be caught by surprise when undiscovered data becomes a problem. Here are a few examples of companies that likely thought that, only to later turn out to be wrong.

Undiscovered Data During Acquisitions

With data coming from so many sources, companies need to take a more holistic approach to PII discovery and ensure they have the right sensitive data discovery tools. When they don’t, the consequences can be severe. Take what happened to Marriott, for example. Marriott acquired Starwood Hotels in 2016, without realizing that its new subsidiary’s systems had been breached two years earlier. The breach was not detected until 2018, and the UK’s ICO fined the hotel chain £18.4 million in 2020. With the right tools, Starwood and Marriott would have been better able to understand what data they held, and potentially could have demonstrated better compliance that resulted in a lower fine, or even prevented the breach in the first place.

Lost Visibility in the Data Lifecycle

The Spanish data protection authority, AEPD, fined Vodafone €8.15 million in 2021 for a few deviations from the GDPR. Some of those violations could have been prevented with effective, ongoing data discovery:

  • The AEPD concluded that Vodafone should have required all of its marketing partners to filter their marketing data more accurately. There were numerous complaints of improper marketing messages.
  • Vodafone failed to provide evidence of continuous data-monitoring at every stage of the data lifecycle.

The improper messaging was the result of human error—Vodafone was having managers review lists of individuals who opted out. The AEPD didn’t deem this human-driven process acceptable, and it’s hard to disagree with them.

Proper data monitoring would have made these discrepancies harder to overlook. Indeed, the best data discovery tool is no longer a human, but rather, data discovery software. The scale and scope of the enterprise-level data discovery process are simply too big for humans to handle reliably, as Vodafone demonstrated.

What Does Data Discovery Look Like?

As we’ve covered, undiscovered data can lead to compliance nightmares. Consequently, businesses should strive to get this process right. At its simplest, the path from discovery to compliance should look something like this:

  1. Discovery – First, take a detailed inventory of every bit of sensitive data stored across the entire organization.
  2. Classification – Consider the information gathered during the discovery phase, such as how sensitive the data is and how much risk is associated with it. Tag all data based on its type, format, and who should have access to the information.
  3. Protection – Implement the appropriate security to protect data from internal and external threats. After implementation, continue to monitor risks and responses on an ongoing basis.
  4. Compliance – Maintain updated data processing records and complete other reporting as required by the applicable privacy laws.
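
To make the classification step more concrete, the sketch below tags a discovered field with a sensitivity level and the roles that should have access to it. The rule table, tier names, roles, and field locations are hypothetical and chosen only for illustration; an actual scheme would come from the organization’s own policies and the regulations that apply to it.

```python
from dataclasses import dataclass

# Hypothetical sensitivity tiers and access roles, for illustration only.
CLASSIFICATION_RULES = {
    "us_ssn": ("high", {"hr", "compliance"}),
    "credit_card": ("high", {"finance", "compliance"}),
    "email": ("medium", {"marketing", "support", "compliance"}),
    "name": ("low", {"all_staff"}),
}


@dataclass(frozen=True)
class ClassifiedField:
    location: str
    data_type: str
    sensitivity: str
    allowed_roles: frozenset


def classify(location: str, data_type: str) -> ClassifiedField:
    """Tag a discovered field with its sensitivity and permitted roles."""
    # Unknown data types are flagged for review rather than assumed to be safe.
    sensitivity, roles = CLASSIFICATION_RULES.get(data_type, ("unreviewed", set()))
    return ClassifiedField(location, data_type, sensitivity, frozenset(roles))


def needs_protection(field: ClassifiedField) -> bool:
    """High-sensitivity and unreviewed fields move on to the protection step first."""
    return field.sensitivity in {"high", "unreviewed"}
```

Treating unrecognized data types as “unreviewed” rather than “low” is a deliberate choice in this sketch: data that has not been classified is handled as a risk until someone has looked at it.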

There will always be challenges in enterprise data security, but without thorough sensitive data discovery, companies’ efforts to keep their data private and secure are likely to fail.

Data Discovery Techniques

While it may seem impossible to lose track of a database, consider how workers at companies lose track of important documents all the time. Did Juanita from HR put all employee information in the shared drive, or is some of it on her local machine? Why are Sharon’s vendor contracts in the “Accounts Payable” folder when Admin’s are in a subfolder of the “Approved Vendors” folder?

Given this inconsistency, even if you knew what you were looking for, you might not be able to find it. And that goes for databases, too. For most, investigating all data at a company to identify sensitive information would not be a pleasant task. Worse still, given the rate at which data is created, it’s essentially an impossible task for humans to perform.

This labor-intensive process led to the rise of data discovery specialists, who could charge high fees for their expertise. Even if it is a costly process, however, manual data discovery is still susceptible to human error. Like ignorance of where your data is stored, human error is an unacceptable reason to fall out of privacy compliance, and regulatory agencies could still levy fines. Fortunately, technology has evolved enough that manual data discovery is no longer necessary.

This is where automated data scanning comes into play. State-of-the-art data discovery software completely eliminates the potential for human error while saving time and resources. Automated data discovery software uncovers all of an enterprise’s sensitive data by scanning its entire network. These solutions are built to identify even the most obscure locations, so data has nowhere to hide, and can be successfully cataloged.

Simplify the Data Discovery Process

Automated data discovery is only as useful as the tasks that have been automated. Before implementing a sensitive information discovery solution, it’s critical to confirm that the technology meets all of the organization’s technical requirements. For example, it may be worth asking the following questions:

  • Can the tool ingest both structured and unstructured data by leveraging AI to understand context?
  • Will big data and the cloud come into play?
  • Does your data discovery tool have classifications for all popular PII and PHI data?
  • Is it possible to add custom classifications if necessary?
  • Are there flexible scanning methods, such as the option for incremental scanning of only new data?
  • Does your tool return a scorecard, including confidence scores, to highlight risk?
  • Will your sensitive data discovery tools scan the underlying code to determine which users and programs have access to the data?
  • Does the automated discovery include a historical look at how data got to its current location?
  • Can leadership see an updated data flow map to understand the movement of sensitive data?
  • Does the automated data discovery tool support all of the relevant data sources?
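
To illustrate one item from the list above, incremental scanning, here is a minimal sketch that keeps a fingerprint of each data source and selects only sources that are new or have changed since the last run. The function names and the in-memory `seen` dictionary are assumptions made for this example; a real tool would persist this state and would more likely rely on change logs or timestamps than on hashing entire sources.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest that changes whenever the content changes."""
    return hashlib.sha256(content).hexdigest()


def sources_to_scan(sources: dict[str, bytes], seen: dict[str, str]) -> list[str]:
    """Return the names of sources that are new or changed, updating `seen`."""
    pending = []
    for name, content in sources.items():
        digest = fingerprint(content)
        if seen.get(name) != digest:
            pending.append(name)
            seen[name] = digest  # remember this version for the next run
    return pending
```

On the first run every source is selected. On later runs, only sources whose contents differ from the stored fingerprint are scanned again.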

Demonstrate Privacy Compliance

Not all data discovery tools are created equal, especially when your organization has a specific goal in mind. Confirm that your chosen solution is built to accommodate all of the data classifications mandated by the GDPR, CCPA, HIPAA, and other regulations. Sensitive data discovery tools shouldn’t just keep an organization compliant; they should make the compliance easy to demonstrate with features like the following:

  • Dashboards that weigh sensitive data to track the risk of individual datastores
  • Automated security reports that are specifically relevant to the maintenance of regulatory compliance
  • Reporting to display the presence of all sensitive data in a way that is easily digestible by auditors
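
As a simple picture of how a dashboard might weigh sensitive data per datastore, the sketch below multiplies the number of findings of each type by a weight and ranks datastores by the total. The weights, the default for unknown types, and the datastore names are hypothetical; this is not a description of how any particular product computes risk.

```python
# Hypothetical weights per data type; higher means more damaging if exposed.
WEIGHTS = {"us_ssn": 10, "credit_card": 8, "email": 3, "name": 1}
DEFAULT_WEIGHT = 5  # assumed weight for data types without an explicit entry


def datastore_risk(findings_by_store: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Rank datastores by a weighted count of the sensitive data they hold."""
    scores = {
        store: sum(WEIGHTS.get(data_type, DEFAULT_WEIGHT) * count
                   for data_type, count in counts.items())
        for store, counts in findings_by_store.items()
    }
    # Highest-risk datastores come first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, a store holding two Social Security numbers and five names would score 2 × 10 + 5 × 1 = 25 under these weights, ahead of a mailing list holding four email addresses at 4 × 3 = 12.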

As data privacy regulations become more complex, there are no points awarded for effort. A data protection plan is not “good enough” unless it’s completely correct, and there’s no way to know if data is adequately protected without a thorough discovery process. At MENTIS, we like to say that if you aren’t finding 100% of your data, it might as well be zero. Learn how to automate sensitive data discovery and hit your data discovery goals by requesting a demo of Mage Sensitive Data Discovery today.