August 17, 2022

Data Discovery Done Right

Data discovery is a necessity for modern enterprises. Organizations have been collecting and storing data for years, and they inevitably lose track of some sensitive information at some point. Because data discovery underpins privacy compliance, there is no room for error.

Sensitive information discovery is the foundation on which the entire privacy plan is built. If the foundation isn’t strong, everything else will crumble.

First, Data Discovery Done Wrong

Data discovery done right means uncovering, classifying, and protecting all of an organization’s information. Unfortunately, that’s easier said than done. To fully understand what goes into a successful data discovery initiative, it helps to understand what can go wrong.

Some organizations make the mistake of trusting (not verifying) that their existing data plans have been followed perfectly. Data is constantly evolving, which makes for a complex environment. The sensitivity of stored information must be constantly reevaluated as new data is collected or classifications evolve. If you only look where sensitive information is expected to be, you’ll never uncover anything hidden.

Data may have been stored before the data minimization plan went into effect. Or perhaps the plan has a hole in it. Maybe an employee left the company and their work was never covered, or someone just made a mistake.

Examples of Undiscovered Data

In October of 2020, H&M incurred a GDPR fine of more than €35 million. The infraction occurred when the company recorded “return to work” meetings for employees who had recently taken vacation or sick leave. The recordings were made available to managers and used to evaluate the employees. The problem was that some recordings contained information related to the employees’ family issues and religious beliefs. Had it been discovered that these recordings contained sensitive information, they could have been deleted or access could have been limited appropriately.

In another example, Swedish healthcare company Capio St. Göran was fined €2.9 million. The organization failed to limit access to medical data appropriately, leaving it available to staff who didn’t require it. The unrestricted access wasn’t an intentional or malicious choice, but the organization had failed to conduct proper data discovery and risk assessments.

Four Steps to Data Discovery and Protection

Hidden sensitive information presents a complex problem and requires a layered solution. After all, if IT leadership at an enterprise knew where all of its sensitive data or personal information was hidden, that data wouldn’t be missing in the first place. The very nature of data discovery implies that information will not be exactly where you expect to find it. This requires a solution that will leave no stone unturned, look in places a human might not, and ingest every piece of information it finds.

The solution must be able to uncover every piece of personal data or sensitive information throughout the entire enterprise. Even then, the problem has only been identified, not solved. Next, that data must be analyzed and classified. Upon receiving the appropriate classification, all stored information must be brought into the organization’s data protection plan.

This process is never truly complete because enterprises constantly gather data from so many different sources. As long as an organization is still creating or collecting data, it requires an iterative discovery process. The complete data discovery cycle must contain the following steps.

1. Discover Hidden Data

Unsurprisingly, the first step is to discover all sensitive data throughout the enterprise. Human beings can no longer perform this type of work effectively at scale. The best data discovery solutions leverage artificial intelligence to comb both structured and unstructured data. Identifying all sensitive information in this way is a crucial first step, as it removes uncertainty from the rest of the data protection process. Now that all sensitive information has been brought into the light, an enterprise can move forward with intentionality.
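
To make the idea of scanning unstructured text concrete, here is a minimal sketch. It swaps in simple regular expressions for the AI-based detection described above, so treat it as an illustration of the scanning step only. The pattern names and the `find_sensitive` helper are hypothetical and not part of any particular product.

```python
import re

# Hypothetical patterns for illustration only. Real discovery tools use
# many more detectors, plus validation, than these two regexes.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return a list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Contact jane.doe@example.com, SSN 123-45-6789, re: return to work."
hits = find_sensitive(sample)
```

A rule-based scan like this only finds what its patterns anticipate, which is exactly why free-form text such as meeting notes calls for more capable detection.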

2. Classify All Data

The data discovery solution should be able to assign classifications to every piece of information it uncovers. Just as the data discovery process should be automated, so should the data classification process. Ideally, these two tasks are intertwined, so data receives a classification immediately upon being discovered.

Leading sensitive information discovery and classification tools have classifications for all popular PII and PHI data. They should also allow enterprises to add their own custom classifications as necessary. This allows enterprises great flexibility when they work with different types of data, have to comply with various regulations, or have niche needs.

Part of the classification process should include the assignment of metadata, such as type of information and a risk scorecard. The data discovery solution should also return information about which users and applications have access to all sensitive data. Finally, it should be able to provide some sort of data flow map to explain how the data came to be where it is.
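
As a rough sketch of what such metadata might look like, the record below attaches a type, a classification, and an access list to each finding, then derives a toy risk score. The `Finding` class, the `SENSITIVITY` weights, and the scoring formula are all assumptions made for illustration; a real scorecard would follow the organization’s own policy.

```python
from dataclasses import dataclass, field

# Hypothetical sensitivity weights, chosen only for this example.
SENSITIVITY = {"public": 0, "internal": 1, "pii": 3, "phi": 4}


@dataclass
class Finding:
    location: str        # where the data was found
    data_type: str       # e.g. "email", "health note"
    classification: str  # key into SENSITIVITY
    accessible_by: list = field(default_factory=list)  # users and apps

    def risk_score(self):
        """Toy score: sensitivity weight times number of accessors."""
        return SENSITIVITY[self.classification] * len(self.accessible_by)


finding = Finding(
    "hr_db.meetings.notes",
    "health note",
    "phi",
    ["manager_a", "manager_b", "reporting_app"],
)
score = finding.risk_score()
```

Even a crude score like this makes one point visible: the same data becomes riskier as more users and applications can reach it.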

3. Protect Data

Once data has received an appropriate classification, it can be protected accordingly. This step will look different depending on the information and how an enterprise needs to use it. For example, should data be encrypted, tokenized, or masked? The answer depends on how the data needs to be accessed, and by whom.

In general, the best practice is always to keep as little data as is necessary. Sensitive data that cannot be purged must be protected. Sometimes this means tokenizing data and restricting access. For most enterprises, top data masking solutions keep information useful and secure at the same time.
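
The difference between masking and tokenizing can be sketched in a few lines. This is an illustration under stated assumptions, not how any specific masking product works: `mask_email` and `tokenize` are hypothetical helpers, and the token here is a one-way keyed hash (HMAC-SHA256) rather than a reversible, vault-backed token.

```python
import hashlib
import hmac


def mask_email(email):
    """Keep the first character and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def tokenize(value, key):
    """Deterministic keyed token: same input and key give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


masked = mask_email("jane.doe@example.com")
# The key below is a placeholder; a real key would come from a secrets manager.
token = tokenize("123-45-6789", b"demo-key-not-for-production")
```

Masking keeps the value recognizable to a human reader, while a deterministic token lets systems join or compare records without ever seeing the original value.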

As in every other step of the process, automation is a must. An enterprise should have rules about how data is stored, used, accessed, protected, and deleted. With such rules in place, the path from data discovery to protection requires little, if any, human intervention.

When the procedures for data retention and protection are automated to minimize manual steps, there’s less room for error. It’s possible to automate most things up to and including the retirement or deletion of sensitive data that’s past its retention period. Automation prevents deviation from the enterprise’s data protection strategy and makes it easier to demonstrate compliance.
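
A retention rule of this kind can be expressed as a small, testable function. The sketch below is only an example: the retention periods, record IDs, and the `is_past_retention` helper are hypothetical, and real retention periods depend on the regulations that apply.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, per classification.
RETENTION_DAYS = {"pii": 365, "phi": 6 * 365}


def is_past_retention(classification, collected_on, today):
    """True if a record has outlived its retention period."""
    limit = timedelta(days=RETENTION_DAYS[classification])
    return today - collected_on > limit


records = [
    ("rec-1", "pii", date(2020, 1, 15)),
    ("rec-2", "pii", date(2022, 6, 1)),
]
today = date(2022, 8, 17)

# Records flagged for retirement or deletion under the rule above.
to_delete = [rid for rid, cls, day in records if is_past_retention(cls, day, today)]
```

Because the rule is code rather than a manual checklist, the same decision is made the same way every time, and the rule itself can serve as evidence in an audit.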

4. Rinse and Repeat

Data protection is the kind of thing you should start immediately and never stop. Scans should be regular and iterative, so no new sensitive information goes undiscovered. Constant discovery, classification, and protection ensure strict, demonstrable regulatory compliance. Enterprises that stay on top of this process are always ready for data privacy audits, so there’s nothing extra to prepare and nothing to worry about.

This is yet another reason to remove the burden of this work from IT staff. Because every step of the cycle can be automated, the process is highly scalable and repeatable. It’s even possible to run incremental scans that only focus on new data. If there are gaps in the data minimization plan or an employee stores information incorrectly, automated processes bring the issue to light immediately. From there, it’s possible to take corrective action and patch up the weak points in the cybersecurity strategy.
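
One common way to implement an incremental scan is to fingerprint each item and rescan only what is new or changed. The sketch below assumes content hashing with SHA-256; the `incremental_scan` function and the in-memory `seen` store are hypothetical stand-ins for whatever state a real tool would persist.

```python
import hashlib


def fingerprint(content):
    """Stable fingerprint of an item's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def incremental_scan(items, seen):
    """Return names of new or changed items and update `seen` in place.

    items: dict of name -> content; seen: dict of name -> fingerprint.
    """
    changed = []
    for name, content in items.items():
        digest = fingerprint(content)
        if seen.get(name) != digest:
            changed.append(name)
            seen[name] = digest
    return changed


seen = {}
first = incremental_scan({"a.txt": "hello", "b.txt": "world"}, seen)
second = incremental_scan(
    {"a.txt": "hello", "b.txt": "world!", "c.txt": "new"}, seen
)
```

On the first pass everything is new, so everything is scanned. On the second pass only the edited file and the added file come back, which is what keeps repeated scans cheap.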

Identifying the Right Data Discovery Solution

For those without significant experience in data discovery, it can be difficult to vet different data discovery solutions. Don’t get lost in the technical jargon or get overwhelmed by various features. Instead, simplify the decision-making process by starting with three simple considerations for data discovery solutions:

  1. How modern and sophisticated is the solution? Enterprise data is dispersed across many systems, which is why it’s so difficult to discover. Find an all-inclusive solution with the flexibility to navigate complex data architectures.
  2. Is 99% efficacy enough? Some data discovery vendors pride themselves on discovering 99% of sensitive information. Unfortunately, that’s not good enough. Like a ship that can float 99% of the time, it’s the 1% that’s going to get you into trouble.
  3. Is your data discovery process repeatable? As mentioned above, the discovery of sensitive information must be an ongoing process. If the process is labor intensive or otherwise difficult to repeat, it will cause problems. Enterprises with burdensome discovery workflows either fall behind on data protection or waste resources in the process.
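
The 99% point in the list above is easier to appreciate with a quick calculation. The record count here is an assumed figure chosen only for illustration.

```python
def missed_records(total, detection_percent):
    """Number of sensitive records left undiscovered at a given detection rate."""
    return total * (100 - detection_percent) // 100


# Assumed example: 10 million sensitive records, 99% of them discovered.
missed = missed_records(10_000_000, 99)
```

At that scale, a 1% miss rate leaves 100,000 sensitive records unprotected.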

There’s a lot to data discovery, but these three considerations provide an excellent starting point. To take a deeper dive, download the Mage Sensitive Data Discovery data sheet and learn all about the leading data discovery capabilities.

Discover Your Sensitive Information Today

Simply put, data discovery and protection cannot be put off, and they must be prioritized. Modern enterprises have to comply with regulations like GDPR, CCPA, HIPAA, and others. Even more importantly, there is the ethical and professional obligation to protect personal information.

Strong data protection plans produce benefits that ripple throughout organizations, and the process doesn’t have to be burdensome. Mage offers a Sensitive Data Discovery tool to automate sensitive information discovery and simplify data protection. Request a demo today to get started, and one of our Magecians will show you what’s possible.