
January 6, 2021

Sensitive Data Discovery: The First Step of a Robust Security Strategy

Over the years, the fundamental approach to data security has evolved. Previously, organizations relied on perimeter security to protect their applications and data stores. The funny thing about technology is that it is widely available: the same technology that was used to secure the perimeter could also be used to breach it. So, as technology evolved, so did the threat to data. Soon enough, data security experts realized that perimeter security alone was not going to cut it, and so began the journey toward protecting the data at the source.

Securing data at source – sounds simple? Let’s just consider a few points:

  • The applications that organizations use are rarely (in fact, never) vanilla applications. They have been modified to suit the needs of the organization, and as a result, the underlying table structure for storing the data has changed as well.
  • Data is entered into the databases by human operators. Each operator has their own style of storing information, based on their comfort and ease. This results in duplicated data, and many times the data ends up in places where it was never intended to be. Such scenarios may require exact and fuzzy matching techniques that can dedupe the data.
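To make the exact and fuzzy matching idea in the last point concrete, here is a minimal sketch using Python's standard-library `difflib`. The function names, the sample names, and the 0.85 similarity threshold are all illustrative assumptions; a production deduplication process would be considerably more involved.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase the value and replace punctuation with spaces, then collapse whitespace."""
    cleaned = "".join(c if c.isalnum() else " " for c in value.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def dedupe(records, threshold=0.85):
    """Greedy dedupe: keep a record only if it is not close to one already kept.

    A ratio of 1.0 after normalization is an exact match; anything at or
    above the (assumed) threshold is treated as a fuzzy duplicate.
    """
    kept = []
    for record in records:
        if any(similarity(record, existing) >= threshold for existing in kept):
            continue
        kept.append(record)
    return kept


# Hypothetical operator-entered variants of the same customer name.
names = ["John A. Smith", "john a smith", "Jon A. Smith", "Mary Jones"]
unique = dedupe(names)
print(unique)
```

The threshold is the main design choice here: set it too low and distinct people are merged, set it too high and spelling variants survive as duplicates.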

Imagine undertaking a data security initiative on the assumption that sensitive data is present only in the places where it was originally intended to be! I hope you can see the futility of an exercise built on such a flawed assumption. Having worked on data security with many organizations, I have seen the tendency to start the wrong way round and focus only on what the organization already knows, as if the data in plain sight were the only data to consider. So, how do you bridge the gap between “where you think” your sensitive data is and “where it really is”?

This brings us to the need for Automated Sensitive Data Discovery: why we need data discovery and, more importantly, what makes an ‘effective’ data discovery solution.

The Need for Data Discovery

As I’ve mentioned earlier, perimeter security won’t suffice to protect sensitive information, for the simple reason that your data is not only in your hands. More and more data is being exposed to third parties through APIs and services, which increases the risk of exposing unprotected sensitive data. This is one of the reasons why privacy legislation such as the GDPR and the CCPA is forcing us to take better control. However, in my view, as legislation around data privacy increases, the certainty of data security decreases. This is because many become content with a certificate of compliance and fail to realize that true security goes beyond compliance.

Sensitive data is pervasive across the enterprise, so being compliant alone is not going to cut it either. How, then, do we ensure data protection while remaining compliant? Balancing access to data for the right person (or application or third party), at the right time and in the right context, is a real challenge. And as we’ve discussed earlier, another challenge is identifying and locating all of your sensitive data, owing to the dynamic nature and complexity of data sources.

Let me give you an example. In a database, going by the column names, you would think that you know where your sensitive data is, right? But did you know that, across the many clients I’ve known about, more than 70% of their sensitive data lies in undocumented and hard-to-find locations such as complex columns, free text fields, and temporary tables, to name a few?

Well, how did it get there? Developers take shortcuts and create temporary locations where sensitive data ends up being stored, and end users enter sensitive data where they are not supposed to. Take a real-life instance from a financial application: the company held employees’ bank account details in order to send expense report checks. In the description field, which gets printed on the check memo, there were Social Security Numbers in plain sight. And this was not a rare occurrence.
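The check memo case can be illustrated with a small sketch that scans a free text field for values shaped like Social Security Numbers. The row layout, the `description` field name, and the `find_ssns` helper are hypothetical; the pattern only covers the dashed `AAA-GG-SSSS` form and applies the basic structural rules (no `000`, `666`, or `9xx` area, no `00` group, no `0000` serial).

```python
import re

# Dashed SSN shape with basic structural checks; undashed or spaced forms
# would need additional patterns.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def find_ssns(rows, text_field="description"):
    """Return (row id, matched text) pairs for SSN-like values in a free text field."""
    hits = []
    for row in rows:
        text = row.get(text_field) or ""
        for match in SSN_PATTERN.finditer(text):
            hits.append((row["id"], match.group()))
    return hits


# Hypothetical rows from an expense table; none of this is real data.
rows = [
    {"id": 1, "description": "Expense report March, taxi and hotel"},
    {"id": 2, "description": "Reimbursement for J. Doe SSN 123-45-6789"},
    {"id": 3, "description": "Ref 000-12-3456 is structurally invalid"},
]
hits = find_ssns(rows)
print(hits)  # [(2, '123-45-6789')]
```

Note that a column named `description` would never be flagged by a scan of column names alone, which is exactly why the content itself has to be inspected.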

The effect of an incomplete discovery on downstream protection is even more catastrophic. Your security strategy remains incomplete, and any downstream data masking based on this incomplete information will lead to inconsistent data and only partial monitoring of sensitive data access.

This demonstrates that the first step of a robust security strategy is an equally robust discovery, which tells you exactly where your sensitive data is. On that note, let’s head to the next part of this article – what makes a robust discovery solution.

The Importance of a Robust Data Discovery Solution

The 2019 Verizon Data Breach Stats state that 63% of organizations that suffered a data breach did not know where their sensitive data was. And this figure makes sense: we’ve seen that data discovery is a task that many find challenging. But it doesn’t have to be this way.

Most organizations use traditional methods of discovery, such as rudimentary dictionary matching and regex matching. These methods deliver results quickly, but they are far from comprehensive. Furthermore, in the real world, you need to know where your sensitive data is and you need to find it with minimal false positives. This is another reason why traditional means of discovery do not make the cut: they produce too many false positives.

But what if there were a better way to locate ALL of your sensitive data with minimal error? For example, finding sensitive data in hard-to-find locations such as free text fields and complex columns using patterns and validations, in files stored in database columns, and even in temporary tables that can be found only through code scanning!
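As a rough illustration of how pairing a pattern with a validation cuts false positives, the sketch below first finds digit runs that look like payment card numbers and then keeps only those that pass the Luhn checksum. The helper names and the sample text are assumptions made for this example; it is not a description of how any particular product works.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_card_numbers(text: str):
    """Pattern match first, then drop candidates that fail validation."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_valid(m.group())]


# The order number has the right shape but fails the checksum;
# 4111 1111 1111 1111 is a widely used test card number.
text = "Order 1234567890123456 paid with card 4111 1111 1111 1111."
print(find_card_numbers(text))  # ['4111 1111 1111 1111']
```

A pattern-only scan would have reported both 16-digit values; the checksum removes the order number without any manual review.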

What your enterprise needs is a comprehensive, configurable sensitive data discovery solution that can find all locations of sensitive data across structured and unstructured data sources, with minimal false positives. An effective and efficient discovery solution goes beyond the rudimentary to include sophisticated methods such as pattern matching, master data matching, and code scanning, and it shows who has access to the data and who is modifying it.
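To give a feel for what code scanning can surface, here is a simplified sketch that searches SQL source text for statements that create temporary tables. The two regular expressions, the `find_temp_tables` helper, and the sample SQL are assumptions for illustration only; real code scanning would need a proper SQL parser and dialect awareness.

```python
import re

# CREATE [GLOBAL|LOCAL] TEMP[ORARY] TABLE [IF NOT EXISTS] <name>
CREATE_TEMP = re.compile(
    r"CREATE\s+(?:GLOBAL\s+|LOCAL\s+)?TEMP(?:ORARY)?\s+TABLE\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([\w.#]+)",
    re.IGNORECASE,
)
# SELECT ... INTO #name (temp tables prefixed with '#', as in SQL Server)
SELECT_INTO_TEMP = re.compile(r"\bINTO\s+(#\w+)", re.IGNORECASE)


def find_temp_tables(sql_source: str):
    """Return the sorted, lowercased names of temp tables created in the source."""
    names = CREATE_TEMP.findall(sql_source) + SELECT_INTO_TEMP.findall(sql_source)
    return sorted({name.lower() for name in names})


# Hypothetical application code; table and column names are made up.
sql = """
CREATE TEMPORARY TABLE tmp_payroll AS SELECT emp_id, ssn FROM employees;
SELECT name, account_no INTO #stage_accounts FROM bank_details;
INSERT INTO audit_log VALUES (1);
"""
print(find_temp_tables(sql))  # ['#stage_accounts', 'tmp_payroll']
```

Tables like these never appear in a schema inventory taken at rest, so scanning the code that creates them is one way to learn that they exist and what they copy.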

Why Should You Consider iDiscover™?

We’ve seen why it’s necessary to discover sensitive data, and why it’s imperative to have a robust solution in place. Armed with comprehensive sensitive data intelligence, enterprises can deliver business value through effective data protection while maintaining compliance. Such a discovery solution is also a strategic asset that can help secure cross-border data access, cloud computing, and data-driven innovation.

I would encourage anyone reading this to ask themselves: are you truly comfortable with your data awareness? In 2020, it is becoming increasingly important to act on what you know in order to find what you don’t know. Doing so will help protect you and your organization’s data, and lower your chances of data loss and regulatory non-compliance.

I have had the pleasure of working with Mage, and the thoroughness of their data discovery process is second to none. I don’t believe the discovery process should be anything less than 100%, and from what I have experienced, Mage has cracked this problem.

Mage has a demonstrable ability to find more sensitive data locations than the ones you already know, with the help of an enterprise-wide, configurable sensitive data discovery solution, iDiscover™. Their patented discovery module has also been acknowledged by analysts like Gartner and Bloor, and has helped large global organizations such as Ivy League institutes and the world’s leading HR service providers, to name a few. To know more about their solution and its capabilities, you can visit their website or download the iDiscover™ datasheet.


Lewis Hopkins

Founder at Seecuring – Software & a Service for securing Enterprise Application Security