
May 11, 2023

Why Do Test Data Management (TDM) Projects Fail?

Test data, and test data management, remain a huge challenge for tech-driven companies around the globe. Having good test data is a boon to the organization: It helps support rigorous testing to ensure that software is stable and reliable, while also mitigating security risks. But having good test data is exactly the issue. The way in which test data is created and subsequently managed has a huge effect on testers’ ability to do their jobs well.

This is why many test data management projects fail. Testers manage to create test data that is usable once, or perhaps a handful of times. But over time, problems accumulate. The data loses its freshness, for example, or the request process is not robust enough to produce appropriate data sets.

While we cannot diagnose every unique issue that organizations face when it comes to test data management, there are some common challenges that we’ve seen—challenges that routinely sink test data management projects, making them less effective, more costly, and more likely to fail outright.

Here are the top six:

#1: Lack of Buy-In

Test data management isn’t likely to be high on the list of priorities for any company. Too often, it’s an afterthought. There isn’t always an internal champion making the project a priority, let alone making the argument for investing in better tools.

When this lack of buy-in exists, especially among company leadership, a TDM project is liable to fail before it ever takes off. Thus, it is important to get stakeholders on board; they should understand why test data management is important, and have some say in new projects from the start. This includes both company leadership and those who will be responsible for implementing the plan.

#2: Lack of Standardization

One sign or outcome of a lack of buy-in is a lack of standardization. Answer these questions: Does your organization have a well-developed data dictionary? Where can that be found? Who created it? What does the data model look like? If the answers to these questions are not readily known by you or the team responsible for test data management, chances are you don’t have robust standards in place.

Another manifestation of this problem is an absence of standard data request forms. This leads to data requests arriving in different formats and for different kinds of tests, both of which ultimately lengthen testing cycles.
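As a rough illustration of what a standard request form could look like in code, here is a minimal sketch. The `DataRequest` class, its field names, and the allowed formats are all hypothetical; a real form would reflect your own data dictionary and tooling.

```python
from dataclasses import dataclass

# Hypothetical list of output formats the team has agreed to support.
ALLOWED_FORMATS = {"csv", "json", "sql"}


@dataclass(frozen=True)
class DataRequest:
    """One standardized test data request (all fields are illustrative)."""

    requester: str
    source_system: str
    tables: tuple
    row_limit: int
    output_format: str = "csv"
    mask_sensitive_fields: bool = True

    def __post_init__(self):
        # Reject incomplete or malformed requests up front,
        # instead of discovering the problem mid-test-cycle.
        if not self.tables:
            raise ValueError("at least one table is required")
        if self.row_limit <= 0:
            raise ValueError("row_limit must be positive")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format}")


request = DataRequest(
    requester="qa-team",
    source_system="billing",
    tables=("invoices", "payments"),
    row_limit=500,
)
```

Because every request carries the same fields and is validated the same way, whoever provisions the data no longer has to interpret free-form emails or tickets.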

#3: Older Data and/or Merged Data

Test data should cover a range of relevant test cases and legitimately resemble the data in your production environments. But data is not static; it tends to change over time. For example, the demographics of clients in a client database might shift over time, or account activity might change as new interfaces and new products are introduced. This means there is an expiration date on datasets, and yet outdated test data is routinely used.

This especially becomes a problem after an acquisition or merger. Each party will bring their own data to the new entity, and merging the datasets is a challenge in its own right. If the conversion is done badly, this can throw off the data sample.

#4: Privacy and Safety Standards

Many applications traffic in sensitive data: banking apps, tax preparation software, HR software, shopping apps, and the list goes on. These applications can reveal crucially important insights, but they also risk revealing sensitive personal and financial information.

This creates a trade-off between safety and relevance when it comes to test data. Using real data from production environments ensures relevance, but often at the cost of privacy. Using synthetic data ensures privacy but risks losing the relationships found in production data and, hence, not being relevant.

Using automated test data generation software can help the team to deal with this problem. This helps to create data with the relevant relations intact, but without revealing sensitive information. Masking and/or encryption should be part of this process.
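One common way to keep relations intact while hiding the underlying values is deterministic masking: the same input always maps to the same token, so keys still line up across tables. The sketch below is illustrative only and is not a description of any particular product. The `mask_value` helper, the key, and the sample records are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key belongs in a secrets manager.
MASKING_KEY = b"example-key-not-for-production"


def mask_value(value: str, prefix: str = "cust") -> str:
    """Return a stable pseudonym: equal inputs always yield equal tokens."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"


customers = [
    {"customer_id": "C-1001", "name": "Ada Lovelace"},
    {"customer_id": "C-1002", "name": "Alan Turing"},
]
orders = [
    {"order_id": 1, "customer_id": "C-1001", "total": 40.0},
    {"order_id": 2, "customer_id": "C-1001", "total": 15.5},
    {"order_id": 3, "customer_id": "C-1002", "total": 99.0},
]

# Mask the identifier the same way in both tables so the join still works.
masked_customers = [
    {"customer_id": mask_value(c["customer_id"]), "name": mask_value(c["name"], "name")}
    for c in customers
]
masked_orders = [{**o, "customer_id": mask_value(o["customer_id"])} for o in orders]
```

Every masked order still points at exactly one masked customer, yet neither table contains the original identifiers or names. Note that truncating the digest to 12 hex characters is a readability choice for the example; a longer token lowers the chance of collisions on large data sets.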

#5: Problems with Referential Integrity

Ideally, any set of test data should contain a representative cross-section of the data, maintaining a high degree of referential integrity. Again, this is easier to achieve with real production data but much harder with synthetic test data.

So, when creating synthetic testing data, it is important to have a data model that accurately defines the relationships between key pieces of data. Primary and foreign keys must be properly linked, and data relationships should be based on well-defined business rules.

Sometimes a TDM project will fail because the test data does not have this kind of referential integrity, and the entire testing process becomes an exercise in demonstrating the adage: “garbage in, garbage out.”
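A simple guard against this failure is to check a test data set for orphaned records before handing it to testers. The sketch below assumes rows are plain dictionaries; the `find_orphans` helper and the table and column names are hypothetical.

```python
def find_orphans(child_rows, parent_rows, foreign_key, primary_key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


accounts = [{"account_id": 1}, {"account_id": 2}]
transactions = [
    {"txn_id": 10, "account_id": 1},
    {"txn_id": 11, "account_id": 2},
    {"txn_id": 12, "account_id": 3},  # points at an account that does not exist
]

orphans = find_orphans(transactions, accounts, "account_id", "account_id")
if orphans:
    print(f"{len(orphans)} transaction(s) reference a missing account")
```

In this sample, transaction 12 is flagged because no account has ID 3. Running a check like this for each relationship in the data model catches broken links before they turn into misleading test results.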

#6: Waterfall TDM in an Agile Development Environment

There are plenty of data management tools out there today…and many of them assume a more-or-less waterfall approach to development. In a waterfall approach, a “subset, mask, and copy” methodology usually ensures that test data is representative of live data, is small enough for efficient testing, and meets all data privacy requirements. With the testing cycle lasting weeks or months and known well in advance, it’s relatively easy to schedule data refreshes as needed to keep test data fresh.
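To make the "subset" step concrete, here is a minimal sketch of one way it might work: sample the parent records first, then keep only the child records that belong to them, so the smaller data set stays internally consistent. The `subset_with_children` function and the sample tables are hypothetical, and the masking and copying steps are left out.

```python
import random


def subset_with_children(parents, children, key, sample_size, seed=0):
    """Sample parent rows, then keep only the child rows that reference them."""
    rng = random.Random(seed)  # fixed seed so the same subset can be rebuilt
    sample_size = min(sample_size, len(parents))
    chosen = rng.sample(parents, sample_size)
    chosen_keys = {parent[key] for parent in chosen}
    kept_children = [child for child in children if child[key] in chosen_keys]
    return chosen, kept_children


# Illustrative data: 100 customers with three orders each.
customers = [{"customer_id": i} for i in range(1, 101)]
orders = [{"order_id": n, "customer_id": (n % 100) + 1} for n in range(300)]

sample_customers, sample_orders = subset_with_children(
    customers, orders, "customer_id", sample_size=10
)
```

Sampling parents before children is a design choice: sampling each table independently would leave orders pointing at customers that were not selected.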

Things are not so straightforward in an agile development environment. Besides integration challenges, there are commonly timing challenges as well. Agile sprints tend to be much shorter than the traditional waterfall process, so the prep time for test data is dramatically shortened. The approach outlined above tends to impede operations by forcing a team to wait for test data and can create a backlog of completed, but untested, features waiting for deployment. Again, automation can help create test data on the fly and as needed.
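As a small illustration of on-demand generation, the sketch below builds synthetic account records inside the test run itself rather than waiting for a scheduled refresh. The `generate_accounts` function and its fields are hypothetical; a seeded generator is used so that a failing test can be reproduced with the same data.

```python
import random
from datetime import date, timedelta


def generate_accounts(count, seed):
    """Build `count` synthetic account rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    return [
        {
            "account_id": i,
            "opened_on": (start + timedelta(days=rng.randrange(365))).isoformat(),
            "balance": round(rng.uniform(0, 10_000), 2),
        }
        for i in range(1, count + 1)
    ]


# Each test can request exactly the data it needs, when it needs it.
accounts_for_sprint_test = generate_accounts(count=5, seed=42)
```

Real generators would also need to respect the data model and business rules discussed earlier; this example only shows the timing benefit of creating data at test time.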

Preventing Failure

Given that the above list represents the majority of reasons why TDM projects fail, we can reverse-engineer those reasons to put together a plan for TDM success:

  1. Get buy-in early from both teams and leadership,
  2. Start with appropriate standardization (including a data dictionary and current data model with appropriate referential integrity),
  3. Make a plan to clean older, irrelevant data and convert any data from other sources,
  4. Choose methods that will yield data with the appropriate trade-off between privacy/security and relevance (including using masking and encryption), and
  5. Automate the process wherever possible.

Mage can help with many of these steps. Our Test Data Management solution ensures the security and efficiency of your test data processes, allowing for the secure provisioning of sensitive data across teams, wherever required and whenever needed. Your teams will have a better experience with our ready-to-go, plug-and-play platform for TDM, irrespective of data store type. You can ask for a demo today.