Mage Data

Category: Blogs – TDM

  • The ROI of a Test Data Management Tool

    The ROI of a Test Data Management Tool

    As software teams increasingly take a “shift left” approach to software testing, the need to reduce testing cycle times and improve the rigor of tests is growing in lockstep. This creates a conundrum: Testing coverage and completeness are deeply dependent on the quality of the test dataset used—but provisioning quality test data has to take less time, not more.

    This is where Test Data Management (TDM) tools come into play, giving DevOps teams the resources to provision exactly what they need to test early and often. But, as with anything else, a quality TDM tool has a cost associated with it. How can decision makers measure the return on investment (ROI) for such a tool?

    To be clear, the issue is not how to do an ROI calculation; there is a well-defined formula for that. The challenge comes with knowing what to measure, and how to translate the functions of a TDM tool into concrete cost savings. To get started, it helps to consider the downsides of traditional testing that make TDM attractive, proceeding from there to categorize the areas where a TDM tool creates efficiencies as well as new opportunities.

    Traditional Software Testing without TDM—Slow, Ineffective, and Insecure

    The traditional method for generating test data is a largely manual process. A production database would be cloned for the purpose, and then an individual or team would be tasked with creating data subsets and performing other needed functions. This method is inefficient for several reasons:

    • Storage costs. Cloning an entire production database increases storage costs. Although the cost of storage is rather low today, production databases can be large; storing an entire copy is an unnecessary cost.
    • Time and labor. Cloning a database and manually preparing a subset can be a labor-intensive process. According to one survey of DevOps professionals, an average of 3.5 days and 3.8 people were needed to fulfill a request for test data that used production environment data; for 20% of the respondents, the timeframe was over a week.
    • Completeness/edge cases. Missing or misleading edge cases can skew the results of testing. A proper test data subset will need to include important edge cases, but not so many that they overwhelm test results.
    • Referential integrity. When creating a subset, that subset must be representative of the entire dataset. The data model underlying the test data must accurately define the relationships among key pieces of data. Primary keys must be properly linked, and data relationships should be based on well-defined business rules.
    • Ensuring data privacy and compliance. With the increasing number of data security and privacy laws worldwide, it’s important to ensure that your test data generation methods comply with relevant legislation.

    The goal in procuring a TDM tool is to overcome these challenges by automating large parts of the test data procurement process. Thus, the return on such an investment depends on the tool’s ability to guarantee speed, completeness, and referential integrity without consuming too many additional resources or creating compliance issues.

    Efficiency Returns—Driving Down Costs Associated with Testing

    When discussing saved costs, there are two main areas to consider: Internal costs and external ones. Internal costs reflect inefficiencies in process or resource allocation. External costs reflect missed opportunities or problems that arise when bringing a product to market. TDM can help organizations realize a return with both.

    Internal Costs and Test Data Procurement Efficiency

    There is no doubt that testing can happen faster, and sooner, when adequate data is provided more quickly with an automated process. Some industry experts report that, for most organizations, somewhere between 40% and 70% of all test data creation and provisioning can be automated.

    Part of an automated workflow should involve either subsetting the data, or virtualizing it. These steps alleviate the need to store complete copies of production databases, driving down storage costs. Even for a medium-sized organization, this can mean terabytes of saved storage space, with 80% to 90% reductions in storage space being reported by some companies.

    As for overall efficiency, team leaders say their developers are 20% to 25% more efficient when they have access to proper test data management tools.

    External Costs and Competitiveness in the Market

    Most organizations see TDM tools as a way to make testing more efficient, but just as important are the opportunity costs that accrue from slower and more error-prone manual testing. For example, the mean time to the detection of defects (MTTD) will be lower when test data is properly managed, which means software can be improved more quickly, preventing further bugs and client churn. The number of unnoticed defects is likely to decline as well. Catching an error early in development incurs only about one-tenth of the cost of fixing an error in production.

    Time-to-market (TTM) is also a factor here. Traditionally, software projects might have a TTM from six months to several years—but that timeframe is rapidly shrinking. If provisioning of test data takes a week’s worth of time, and there are several testing cycles needed, the delay in TTM due only to data provisioning can be a full month or more. That is not only a month’s worth of lost revenue, but adequate space for a competitor to become more established.

    The Balance

    To review, the cost of any TDM tool and its implementation needs to be balanced against the following factors (a rough calculation sketch follows this list):

    • The cost of storage space for test data
    • The cost of personnel needs (3.8 employees, on average, over 3.5 days)
    • The benefit of an increase in efficiency of your development teams
    • Overall cost of a bug when found in production rather than in testing
    • Lost opportunity due to a slower time-to-market
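    As a rough illustration of how these factors feed the standard ROI formula ((gain - cost) / cost), here is a minimal sketch in Python. Every figure below is a placeholder assumption, not a benchmark; substitute your own estimates for tool cost, labor rates, storage, and defect costs.

```python
# Hypothetical annual figures -- replace every value with your own estimates.
tool_cost = 100_000  # license plus implementation of the TDM tool

# Estimated annual gains (all placeholder assumptions)
saved_provisioning_labor = 3.8 * 3.5 * 8 * 100 * 12  # people x days x hours x hourly rate x requests/year
saved_storage = 40_000                                # fewer full clones of production databases
developer_efficiency = 0.20 * 10 * 120_000            # ~20% gain across a team of 10 developers
avoided_production_defects = 50_000                   # bugs caught in testing instead of production

total_gain = (saved_provisioning_labor + saved_storage
              + developer_efficiency + avoided_production_defects)

roi = (total_gain - tool_cost) / tool_cost
print(f"Estimated annual ROI: {roi:.0%}")
```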

    TDM Tools Achieve Positive ROI When They Solve These Challenges

    Admittedly, every organization will look different when these factors are assessed. So, while there are general considerations when it comes to the ROI of TDM tools, specific examples will vary wildly. We encourage readers to derive their own estimates for the above numbers.

    That said, the real question is not whether TDM tools provide an ROI. The question is which TDM tools are most likely to do so. Currently available tools differ in terms of their feature sets and ease of use. The better the tool, the higher the ROI will be.

    A tool will achieve positive ROI insofar as it can solve these challenges:

    • Ensuring referential integrity. This can be achieved through proper subsetting and pseudonymization capabilities. The proper number and kind of edge cases should be present, too.
    • Automated provisioning with appropriate security. This means being able to rapidly provision test data across the organization while also staying compliant with all major security and privacy regulations.
    • Scalability and flexibility. The more databases an organization has, the more it will need a tool that can work seamlessly across multiple data platforms. A good tool should have flexible deployment mechanisms to make scalability easy.

    These are specifically the challenges our engineers had in mind when developing Mage Data's TDM capabilities. Our TDM solution achieves that balance, providing an ROI by helping DevOps teams test more quickly and get to market faster. For more specific numbers and case studies, you can schedule a demo and speak with our team.

  • What is Considered Sensitive Data Under the GDPR?

    What is Considered Sensitive Data Under the GDPR?

    There are many different kinds of personal information that a company might store in the course of creating and maintaining user accounts: names, residential addresses, payment information, government ID numbers, and more. Obviously, companies have a vested interest in keeping this sensitive data safe, as data breaches can be both costly and embarrassing.

    What counts as private or sensitive data—and what sorts of responsibility companies have to protect such data—changed with the passage of the General Data Protection Regulation (GDPR) by the European Union. (The GDPR is a component of the EU’s privacy law and human rights law relevant to Article 8 of the Charter of Fundamental Rights of the European Union.) The GDPR is proving to be both expansive in what it covers and strict in what it requires of entities holding user data, and the fines levied for non-compliance can sometimes be harsh.

    The European Union’s own GDPR website has a good overview of what the regulation is, along with overviews of its many parts and guidelines for compliance. But one of the stickier points of this regulation is what is considered “sensitive data,” and how this might differ from personal data, which is at the core of the GDPR. Sensitive data forms a special protected category of data, and companies must take steps to find it using appropriate sensitive data discovery tools.

    The GDPR Protects Personal Data

    At the heart of the GDPR is the concept of personal data. Personal data includes any information which can be linked to an identified or identifiable person. Examples of such information include things like:

    • Names
    • Identification numbers.
    • Location data—this includes anything that can confirm your physical presence somewhere, such as security footage, fingerprints, etc.
    • Any data which represents physical, physiological, genetic, mental, commercial, cultural, or social identity.
    • Identifiers which are assigned to a person—telephone numbers, credit card numbers, account data, license plates, customer numbers, email addresses, and so on.
    • Subjective information such as opinions, judgments, or estimates—for example, an assessment of creditworthiness or review of work performance by an employer.

    It is important to note that some kinds of data might not successfully identify a person unless used with other data. For example, a common name like “James Smith” might apply to many people, and so would not pick out a single individual. But combining that name with an email address narrows things down to a specific organization and mailbox; together, the name and email are personal information. Likewise, things like gender, ZIP code, or date of birth would be non-sensitive, non-personal information unless combined with other information to identify someone. Hackers and bad actors will often use disparate pieces of data to identify individuals, so all potential personal information should be handled cautiously.

    That said, some personal information is also considered sensitive information; the GDPR discourages collecting, storing, processing, or displaying this information except under special circumstances—and in those cases, extra security measures are needed.

    Sensitive Information Under the GDPR

    Sensitive data under the GDPR (sometimes referred to as “sensitive personal data”) includes:

    • Any personal data revealing racial or ethnic origin, political opinions, or religious or philosophical beliefs;
    • Trade union membership;
    • Genetic data;
    • Biometric data used to identify a person;
    • Health-related data; and
    • Data concerning a person’s sex life or sexual orientation.

    According to Article 9 paragraph 1 of the GDPR, these kinds of information cannot be processed except for special cases as outlined in paragraph 2. This includes gathering and storing such data in the first place.

    Application of the GDPR: Does it Affect Your Organization?

    In short, yes, the GDPR is relevant even for companies operating largely outside of the European Union. The goal of the GDPR is to protect data belonging to EU citizens and residents; it categorizes many of its provisions as a right that people have. Thus, anyone handling data about EU residents is subject to GDPR regulations, independent of their location.

    For example, if you have a company in the U.S. with a website, and said website is accessed and used by citizens residing in the European Union, and part of that use is creating accounts which process and store user data, then your company must comply with the GDPR. (This is referred to as the “extra-territorial effect.”)

    Even more alarming is the fact that sensitive data might exist within an organization without the organization being aware of the scope and extent of that data’s existence.

    In short, no company should assume that it has a handle on sensitive data until it can verify the location of all sensitive personal data using a robust sensitive data discovery procedure.

    Data Subject Requests, The Right to Be Forgotten, and Data Minimization

    Processing sensitive information becomes an especially challenging conundrum when it comes to Data Subject Requests (DSRs). Such requests can include things like the Right to be Forgotten: The right that individuals have to request that information about them be deleted if they choose. According to the GDPR (and many other data protection regulations), organizations receiving requests from individuals have a limited and specific time period for honoring such requests.

    Most organizations will honor these requests simply by deleting the relevant information. But this approach runs into two problems.

    First, redundant copies of data often exist in complex environments—for example, the same personal information might appear in a testing environment, a production environment, and a marketing analytics database. Without robust sensitive data discovery, it’s possible that an individual isn’t really “forgotten” by the system after all.

    Second, there is the issue of database integrity. Deleting data might remove important bits of information, such as transaction histories. This can make it incredibly difficult to keep audit trails or maintain accurate data analytics. Companies that acquire sensitive information, then, would do better finding ways to minimize this data, rather than delete it completely.
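    As a hypothetical illustration of minimization rather than deletion, the sketch below replaces direct identifiers with a keyed, irreversible pseudonym so that transaction histories keep their structure while the individual is no longer identifiable. The field names and key handling are assumptions for the example, not a description of any specific product or of the GDPR's required method.

```python
import hashlib
import hmac

# Pseudonymization key; in practice this lives in a key-management system,
# and destroying it makes the pseudonyms permanently irreversible.
PSEUDONYM_KEY = b"example-secret-key"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a stable, non-reversible token."""
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical customer record that appears in several systems.
record = {
    "customer_id": "C-10482",
    "name": "Maria Keller",
    "email": "maria.keller@example.com",
    "last_order_total": 129.90,  # retained: needed for audit trails and analytics
}

minimized = {
    "customer_id": pseudonymize(record["customer_id"]),  # stable join key preserved
    "name": None,                                         # direct identifiers removed
    "email": None,
    "last_order_total": record["last_order_total"],
}

print(minimized)
```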

    If you would like to learn more about data minimization, sensitive data discovery, or GDPR compliance in general, feel free to browse our articles or contact a compliance expert. In the meantime, our case study of a Swiss Bank also highlights how cross-border data-sharing can be accomplished while maintaining compliance with the GDPR.


  • Are There Good Open Source Tools for Sensitive Data Discovery?

    Are There Good Open Source Tools for Sensitive Data Discovery?

    Open-source tools have come into their own in the past decade, including tools for sensitive data discovery. What used to be the domain of large corporations has been democratized, and teams of passionate people can (and do) develop amazing tools. However, with the ever-growing number of data privacy and security laws, the stakes around data classification have never been higher. Getting sensitive data discovery right has significant consequences…so it’s critical you understand what you’re getting with these tools, and how you can use them in ways that will keep you (and your customer and employee data) safe.

    What Makes Data Discovery Tools Open-Source?

    We’ve already covered what makes software open source in depth in this article, but we want to give a quick recap of what we’ll be discussing here. Unlike closed-source tools, open-source sensitive data discovery tools are released under a license that allows others to use and alter the software freely for their own purposes. Generally, instead of being created and owned by a corporation, open-source software is developed by a passionate community that collaborates to create new features and often determines future direction democratically.

    Many talented people are working on great open-source sensitive data discovery tools like OpenDataDiscovery, ReDiscovery, DataDefender, and more. Consequently, to answer the question in the title of the article, there are good open-source tools for sensitive data discovery. However, that’s not necessarily the question you should be asking—instead, you should be trying to determine if they’ll be right for your company. And one of the best ways to make that determination is through a SWOT Analysis, taking a detailed look at the Strengths, Weaknesses, Opportunities, and Threats that come from using open-source tools for data discovery.

    Data Discovery Tools: Strengths

    First up are the strengths—the things that open-source data discovery tools do well.

    Interoperability and Flexibility

    Because there are generally a variety of perspectives involved in open-source tools, there’s often little incentive to hide features and programs behind walled gardens. In this case, that often translates into tools with a wide range of integrations and connections for data. And even when a certain database type isn’t supported, these tools often provide a way for you to build the integration yourself, ensuring that getting data is rarely a roadblock.

    Price

    And, of course, the best price you can get for anything is free. That could mean you save a bit of money or free up resources to invest in areas that need it more. Whatever the case, it will be hard to get a better deal than what you get with open-source tools.

    Data Discovery Tools: Weaknesses

    Of course, no software is perfect. Here are some things open-source data discovery tools don’t always do well.

    Unknown Development Cycles

    Many B2B tools feature a regular and predictable development cycle. Some open-source projects are very organized, and others are less so. Regardless, there’s no guarantee that a feature or fix will come out on time—or even that there will be a roadmap to start with. The inherent unpredictability of the process can sometimes be frustrating.

    Enterprise Readiness

    As companies grow, their data environments become more complex at an exponential rate. Not all open-source data discovery tools can handle the complexity of a modern enterprise data environment. And of those that can, not all will be able to provide the detailed reporting and compliance options that companies need to meet their legal obligations.

    Data Discovery Tools: Opportunities

    With open-source tools, companies have some opportunities they wouldn’t necessarily have with paid tools.

    Opportunity to Influence Development

    As a user of an open-source tool, you’re part of the community developing it. While you still won’t have ultimate control over its development direction, you’ll likely have the ability to vote on next steps and generally have greater influence on the development process than you would over most paid tools. This can provide the opportunity to get the features you need faster than traditional development.

    Customization via Forking

    And if the community doesn’t prioritize your needs, you’re allowed to fork, or make a copy of, the underlying source code, allowing your company to continue development in the way it sees fit. That’s an option you’re typically never going to have with traditional software.

    Data Discovery Tools: Threats

    Of course, there are some downsides to open-source tools.

    Poor/Nonexistent Customer Support

    Because open-source tools are generally community-run projects where people work for free, customer support is not guaranteed. People, including other users, are often very helpful through online forums, but that rarely rises to the level of support you would get from just about any paid tool. When you have a serious issue with your software, this gap can keep you from resolving it quickly. And as a reminder, 99 percent success in data discovery isn’t good enough, and could open you up to serious legal ramifications. If you’re having an issue with sensitive data discovery, failing to find a quick solution can be an expensive mistake.

    Rogue Developers

    While it’s unlikely that the developers of an open-source data discovery tool would insert malware or create serious security vulnerabilities, it’s not unheard of. But even if no one acts maliciously, there’s a real chance that the project will eventually be abandoned without warning. And abandoned software won’t receive security updates or new features and could leave you looking for a new solution once more.

    How Mage Data Helps with Sensitive Data Discovery

    If you’ve reached the end of the above SWOT analysis feeling that the strengths and opportunities far outweigh the weaknesses and risks, then there’s a good chance that there’s a great open-source sensitive data discovery tool out there for you. But that won’t be the case for all businesses. It doesn’t mean that the tools are bad, just that they are not a good fit for all business contexts.

    Remember that sensitive data discovery is the starting point of good data management. There are so many more things that need to be done to keep data safe and companies compliant. Here at Mage, we’ve developed a world-class, AI-powered sensitive data discovery tool that’s part of a larger suite of tools designed to manage data from discovery all the way to retirement. If that sounds more like what you need, sign up for a free consultation today to learn more about what Mage Data can do for you.

  • Why Open-Source Tools Might Fall Short for Test Data Management

    Why Open-Source Tools Might Fall Short for Test Data Management

    You may have heard it said that the best things in life are free—but when it comes to Test Data Management (TDM), free is not always the best choice. For businesses, finding the right balance of value, security, stability, and performance is paramount. And while open-source tools can score well in those areas, there’s a chance that they’ll let you down when you need them most. Here’s what businesses need to know to evaluate open-source test data management tools before they commit.

    What Are Open-Source Tools?

    Before we dive into open-source test data management tools, we need to have a quick conversation about the term “open-source,” as it isn’t always used consistently. Upfront, it’s important to understand that not all free tools are open-source, and that open-source tools, because they tend to be community-developed, don’t come with the same expectations around security and customer support that closed-source tools offer.

    Open-source refers to software “designed to be publicly accessible—anyone can see, modify, and distribute the code as they see fit.” Most of the software used in a business context isn’t open-source. For example, common applications like Outlook, Dropbox, or Google Workspace are closed source. The code that powers these applications isn’t published for review, and even if you got access to it, you wouldn’t be able to reuse it in your projects or modify it to run differently.

    Open-source software, by contrast, is intentionally designed so that the code is publicly available. Users are allowed to reuse or modify the code and, in some cases, even contribute new code to the project. Because of its open nature, open-source tools are often developed jointly by a community of passionate developers rather than by a single company. While most open-source tools are free to use, not all software that is free is open-source. An application may be distributed for free, but it’s not open-source if the code isn’t available for reuse, modification, or distribution.

    What are Open-Source TDM Tools Used For?

    For companies, open-source software sometimes makes a lot of sense. It may cost little to nothing to adopt, and if the software has an enthusiastic community, it can often receive free updates that improve functionality and security for the foreseeable future. While feature sets vary among open-source test data management tools, you could reasonably expect them to do a mixture of the following tasks:

    • Model your data structure
    • Generate test data sets by subsetting
    • Generate synthetic data
    • Provide access rules to restrict who can view data
    • Integrate with a variety of common databases

    Some popular open-source tools in the test data management space include CloudTDMS, ERBuilder, Databucket, and OpenTDM.
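    To make those tasks concrete, here is a minimal sketch (not tied to any of the tools named above) of two of them in plain Python: subsetting a table while keeping referentially linked rows, and generating simple synthetic records. The table layouts and field names are illustrative assumptions.

```python
import random
import string

# Hypothetical "production" tables: customers and their orders.
customers = [{"id": i, "name": f"Customer {i}"} for i in range(1, 101)]
orders = [{"id": i, "customer_id": random.randint(1, 100),
           "total": round(random.uniform(5, 500), 2)} for i in range(1, 501)]

# Subsetting: take 10% of customers and keep only their orders,
# preserving referential integrity between the two tables.
subset_customers = random.sample(customers, k=10)
subset_ids = {c["id"] for c in subset_customers}
subset_orders = [o for o in orders if o["customer_id"] in subset_ids]

# Synthetic data: fabricate records that look realistic but reference no real person.
def synthetic_customer(i: int) -> dict:
    name = "".join(random.choices(string.ascii_uppercase, k=6))
    return {"id": 100_000 + i, "name": f"Synthetic {name}"}

synthetic_customers = [synthetic_customer(i) for i in range(5)]

print(len(subset_customers), "customers,", len(subset_orders), "linked orders")
print(synthetic_customers[0])
```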

    Issues with Open-Source TDM Tools

    For some purposes, the above list may cover all needs. But for businesses with more serious testing needs, there are several issues that can appear when using open-source tools, especially for test data management.

    Limited Functionality and Quality

    One of the core shortcomings of open-source tools is that they’re delivered “as is” at a pace that works for their developers. Unlike software with a team of paid developers, open-source software does not guarantee future support. If the application doesn’t have a feature you need now, there’s a chance you may never get it. Unlike with paid software, your requests for a new feature may carry no weight with the developers.

    With open-source test data management tools, this primarily creates issues in two areas. The first is user experience. Because these are often unpaid projects, time is a precious commodity. Consequently, development teams tend to spend more time on creating new features, with things like design and user experience being much lower priorities. Unless a designer wants to donate their time, the interfaces you use on the tool may be confusing, slow, or even broken in places.

    The second common issue is reporting. Most open-source TDM tools come with at least basic reporting capabilities. However, beyond small businesses with relatively small datasets, these reporting features might not be able to handle the complexity of a modern data environment. This can lead to inaccurate or misleading reporting, which can be especially damaging for businesses.

    Increased Compliance Risk

    Creating and using test data can carry substantial security and privacy risks, as it typically begins with production data containing personally identifiable information. Under most modern data privacy laws, such as the GDPR or CCPA, documenting how your data is used is necessary for compliance. While you might worry that an open-source tool might leak your data, the reality is that you’ll usually be running such tools locally.

    Instead, it’s more important to consider how well the tool integrates with your existing privacy and security workflow. Is it easy to integrate? Or does the connection keep breaking with each update? Does it provide good visibility into what and how data is being used? Or is it something of a black box? That’s not to say these tools generally have poor connectivity, just that they may not have the full range of integrations and security features you might expect from paid software.

    No Guarantee of Long-Term Availability

    When volunteers run a project, its continued existence often depends on their willingness to keep working for free. While an end to their work might not immediately remove the product from the market, it will eventually fall behind other programs’ features and security. And that means you will eventually need to make a change to avoid security issues or get the latest technology.

    Some businesses will already be planning to upgrade their TDM solution regularly, so that might not be a big deal. For others, changing to something new, even if it’s a new open-source software, means costs in terms of retraining, lost productivity, and possible delays in development during the upgrade. That can be an enormous cost, and open-source solutions are more likely to shut down without significant notice than paid ones.

    Limited Support

    Service-Level Agreements are a huge part of the modern software experience. If something breaks during an upgrade, knowing that you have both easy-to-reach support and a money-back guarantee can provide significant peace of mind. With open-source software, you’re unlikely to have significant support options beyond posting on a forum, and you can forget about an SLA. That doesn’t mean that all open-source solutions are unreliable. However, if something breaks and your team can’t fix it, there’s no way of knowing when it will be fixed.

    How Mage Data Helps with Test Data Management

    For some companies, choosing an open-source test data management system will be a great move. But, some businesses need that extra layer of reliability, security, and compatibility that only paid software can provide. When evaluating these solutions, it’s important to understand the benefits and risks to choose the best option for your business. At Mage, we’ve built a solution designed to handle the most challenging TDM issues, from small businesses to multi-billion-dollar enterprises. Contact us today to schedule a free demo to learn more about what Mage can do for you.

  • Static vs Dynamic Masking: What to Choose?

    Static vs Dynamic Masking: What to Choose?

    Although both static data masking (SDM) and dynamic data masking (DDM) have been around for half a decade, there is still some general confusion as to how these tools differ. The problem is not that the technical differences are not well understood—they are. The deeper issue is that it is not always clear what kinds of situations call for a static data masking solution, and which call for a dynamic masking solution. It does not help that the companies selling these solutions tend to re-use the same tired examples every time they write about the topic.

    Although both approaches do more or less the same thing—they replace sensitive data with comparable but “fake” information—the details of where and how they do this differ, and that has some pretty big ramifications for how they should be used.

    Any organization that needs to protect sensitive data would do well to recognize when one or the other is needed. In larger organizations, both kinds of data masking may be in play.

    What is Static Data Masking?

    Static data masking (SDM) involves changing information in the source database—that is, changing information while “at rest.” Once the information is changed, the database is then used, or copied and used, in the various applications in which it is needed.

    SDM is often used to create realistic test data in application development. Instead of creating data out of whole cloth, the development team can create datasets that are realistic because they are literally generated from real production data—while still preserving the privacy of their users.

    SDM is also used when sensitive data needs to be shared with third parties, especially if that third party is located in a different country. By masking the data, relationships can be preserved while still protecting any sensitive or personal information.

    The beauty of SDM is that it is straightforward and complete. All of the data in question is replaced, so there is no way a person or application could somehow access the true data accidentally—nor could a malicious actor compromise the database. The data is protected “across the board,” without the need to configure access on a user-by-user or role-by-role basis.

    Example of a Use Case for Static Data Masking: A financial institution wants a third party to run an analysis on some of their data. The financial institution wants to protect sensitive information and financial information of its clients, and must also comply with laws about data crossing national boundaries. They mask the data in their database before giving the analytics firm access, ensuring that no sensitive data can possibly be accessed or copied.
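    As a simplified sketch of that idea (not a description of any vendor’s implementation), static masking rewrites the sensitive columns of a copied dataset once, before anyone downstream touches it; the masked copy is then the only thing shared. The column names and masking rules here are illustrative assumptions.

```python
import random

def mask_name(_: str) -> str:
    # Substitute a realistic but fictitious name.
    return random.choice(["Alex Doe", "Sam Lee", "Jordan Ray"])

def mask_account(acct: str) -> str:
    # Keep the last four digits so the value still looks realistic for testing.
    return "****" + acct[-4:]

# Hypothetical rows copied from the production database.
rows = [
    {"name": "Maria Keller", "account": "4532015112830366", "balance": 1820.55},
    {"name": "John Okafor",  "account": "4716000000001234", "balance": 95.10},
]

masked_copy = [
    {"name": mask_name(r["name"]),
     "account": mask_account(r["account"]),
     "balance": r["balance"]}
    for r in rows
]

print(masked_copy)  # this masked copy, not the original, is what the third party receives
```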

    What is Dynamic Data Masking?

    Dynamic data masking (DDM) involves masking data on-demand at the point of use. The original data remains unchanged, but sensitive information is altered and masked on-the-fly. This allows for more fine-grained access control.

    Whereas SDM creates a copy of a database with masked data which teams then can access, DDM preserves access to the original database but modifies what a particular person can see. This means that the masked data a person sees with DDM is as close to real-time data as one could hope for, making it ideal for situations where someone needs to access fresh data but in a limited way.

    Example of a Use Case for Dynamic Data Masking: A large company might keep a large employee database that includes not only names and addresses, but Social Security numbers, direct deposit information, and more. An HR professional running payroll might need to access addresses and direct deposit information, but other HR professionals probably do not. What any given HR employee could see in the system would depend on a specific set of rules that masked data according to user or role.

    Because DDM allows organizations to enforce role-based access control, it is sometimes used for older applications that don’t have these kinds of controls built in. Again, think of older legacy HR databases, or customer service systems that might store credit card information.
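    Here is a minimal sketch of the rule-based idea behind DDM, assuming a simple role model of our own invention: the stored record never changes, but the value returned depends on who is asking. Real DDM products enforce this at the database or proxy layer rather than in application code.

```python
# Which fields each role may see in the clear (illustrative rules only).
VISIBLE_FIELDS = {
    "payroll_admin": {"name", "address", "bank_account"},
    "hr_generalist": {"name"},
}

def read_employee(record: dict, role: str) -> dict:
    """Return the record with fields masked on the fly for the caller's role."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {k: (v if k in allowed else "***MASKED***") for k, v in record.items()}

employee = {"name": "Dana Cruz", "address": "12 Elm St",
            "bank_account": "DE89370400440532013000"}

print(read_employee(employee, "payroll_admin"))  # sees everything needed to run payroll
print(read_employee(employee, "hr_generalist"))  # sees only the name
```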

    Static vs. Dynamic Masking: Main Differences

    Here is a summary, then, of some of the main differences between static data masking and dynamic data masking:

    Static Data Masking (SDM)                      | Dynamic Data Masking (DDM)
    Deployed on non-production environments        | Deployed in production
    Original data is overwritten                   | Original data is preserved
    All users have access to the same masked data  | Authorized users have access to original data

    Key Questions to Ask When Deciding on a Data Masking Solution

    Many vendors tip their hand when discussing data masking solutions; it becomes obvious that they favor one or the other. Unsurprisingly, the one they favor is the particular kind that they sell.

    Here at Mage, we have data masking solutions of both types, static and dynamic. Our goal is to find the solution that best fits your use cases. Many times, it turns out that a large organization needs both—they simply need them for different purposes. It pays to engage a vendor that understands these nuances and is adept at implementing both kinds of solution.

    For example, here are some of the questions we might have a new client consider when trying to decide between SDM and DDM for a particular use case:

    • Do you require the data being masked to reflect up-to-the-minute changes? Or can you work with batched data?
    • Are you looking to implement role-based access? Or do you feel more comfortable with a more complete masking of the data in question?
    • How much of a concern is the protection of your production environment?
    • What privacy laws or regulations are in play? For example, do you need to consider HIPAA laws for protected health information (PHI)? Or regulations like the Gramm-Leach-Bliley Act (GLBA) and Sarbanes-Oxley Act (SOX) because you handle personal financial information (PFI)?
    • How are you currently identifying the data that needs to be masked? Is sensitive data discovery needed in addition to any masking tools?

    There are other considerations that go into selecting a data masking tool as well, but these questions will help guide further research into which particular type of masking your organization might need.

    And again, if it turns out you have a need for both, it is worth contacting us to discuss your needs and set up a demo. You can also see one of our masking tools in action.


  • Data Security Platform: Securing the Digital Perimeter

    Data Security Platform: Securing the Digital Perimeter

    In today’s data-driven world, organizations face increasing challenges in protecting sensitive information while ensuring compliance with stringent data privacy regulations. The exponential growth of data has also led to a higher risk of unauthorized access, breaches and cyber-attacks being faced by organizations. In such a scenario, protecting sensitive information is a top priority for businesses, and the use of Data Security Platforms (DSP) has emerged as a crucial component in the battle against data threats. This article delves into the significance of a DSP, its role in compliance with data privacy regulations, and the common challenges faced during adoption.

    What is a Data Security Platform?

    A Data Security Platform is designed to protect sensitive and valuable data from unauthorized access, breaches and other security threats. Gartner defines Data Security Platforms (DSPs) as products and services characterized by data security offerings that target the integration of the unique protection requirements of data across data types, storage silos and ecosystems.

    Gartner, in its report “2023 Strategic Roadmap for Data Security Platform Adoption,” lists six capabilities required for a Data Security Platform (Fig. 1).

    Fig. 1: The six capabilities of a Data Security Platform

    Let us go through each of these capabilities in detail:

    Data Discovery and Classification

    Data Discovery and Classification involves the automated scanning and analysis of an organization’s data repositories to identify and categorize sensitive data. This process helps organizations understand where sensitive information resides, such as personal identifiable information (PII), financial data, intellectual property, or other confidential data.

    The data classification process tags data with relevant labels indicating its sensitivity level and compliance requirements. For example, data might be classified as “Confidential,” “Internal Use Only,” or “Public.” This classification enables organizations to enforce appropriate access controls, data protection measures, and data handling policies based on the data’s sensitivity. It also aids in compliance with data protection regulations since organizations can ensure that sensitive data is treated according to the applicable laws.
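    As a rough illustration of the scanning-and-tagging step, the sketch below applies simple regular-expression detectors to a few sample values and assigns a classification label. Production platforms add many more data types plus contextual and machine-learning scoring, so treat this only as a conceptual example with assumed labels.

```python
import re

# A few illustrative detectors; real platforms ship with far more data types.
DETECTORS = {
    "EMAIL":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "US_SSN":      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{13,16}\b"),
}

# Sensitivity label assigned per detected data type.
CLASSIFICATION = {"EMAIL": "Confidential", "US_SSN": "Restricted", "CREDIT_CARD": "Restricted"}

def classify(value: str) -> tuple:
    """Return (data_type, label) for the first detector that matches, else a default."""
    for data_type, pattern in DETECTORS.items():
        if pattern.search(value):
            return data_type, CLASSIFICATION[data_type]
    return "UNKNOWN", "Public"

for sample in ["jane@example.com", "123-45-6789", "hello world", "4532015112830366"]:
    print(sample, "->", classify(sample))
```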

    Data Access Controls

    Data Access Controls are mechanisms that ensure only authorized users have appropriate access to specific data. This component plays a vital role in preventing unauthorized access to sensitive information, reducing the risk of data breaches and insider threats.

    Role-based access control (RBAC) is a common approach in data security platforms, where permissions are assigned based on the user’s role within the organization. Access rights can be granted or revoked based on job functions, ensuring that users only have access to data they need to perform their tasks.

    Data Access Controls work hand-in-hand with the data classification process, as the access privileges are often determined based on the sensitivity level of the data. Strong access controls help ensure that data is only accessible to authorized individuals and minimize the risk of data leaks or unauthorized disclosures.
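    Below is a minimal sketch of role-based access control tied to those classification labels, with hypothetical roles and clearances: a read is allowed only if the caller’s role is cleared for the data’s sensitivity level.

```python
# Which classification levels each role may read (illustrative clearances only).
ROLE_CLEARANCE = {
    "analyst":       {"Public", "Internal Use Only"},
    "compliance":    {"Public", "Internal Use Only", "Confidential"},
    "administrator": {"Public", "Internal Use Only", "Confidential", "Restricted"},
}

def can_read(role: str, classification: str) -> bool:
    """Grant access only when the role's clearance covers the data's label."""
    return classification in ROLE_CLEARANCE.get(role, set())

print(can_read("analyst", "Confidential"))     # False: access denied (and logged)
print(can_read("compliance", "Confidential"))  # True
```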

    Data Masking

    Data Masking is the process of concealing original sensitive data by replacing it with realistic but fictional data. The purpose of data masking is to create a structurally similar version of the data without revealing the actual information. This is particularly important for non-production environments like testing or development, where real data is not needed.

    Data Masking is commonly used to protect sensitive data while ensuring that applications and processes can still function realistically with representative data. This prevents the exposure of actual sensitive data during testing or other non-production activities, reducing the risk of data breaches resulting from mishandling or accidental leaks in lower-security environments.

    Database Encryption

    Database Encryption involves converting plaintext data into ciphertext using encryption algorithms, rendering the data unreadable and useless without the appropriate decryption key.

    At-rest encryption ensures that data stored on disk or in a database is protected even if the physical storage media is compromised. In contrast, in-transit encryption safeguards data as it is transmitted over networks, preventing eavesdropping or interception by unauthorized parties.

    Database encryption adds an extra layer of security, making it significantly harder for attackers to access sensitive data, even if they gain unauthorized access to the underlying infrastructure.
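    The sketch below shows field-level at-rest encryption using the widely available cryptography library’s Fernet construction (AES-based symmetric encryption). Key storage, key rotation, and transparent database-level encryption are deliberately out of scope, and the sample value is an assumption for illustration.

```python
# Requires: pip install cryptography
from cryptography.fernet import Fernet

# In practice the key lives in a key-management service, never alongside the data.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"patient_id=48210;diagnosis=E11.9"
ciphertext = cipher.encrypt(plaintext)  # this is what gets written to disk
recovered = cipher.decrypt(ciphertext)  # readable only with the key

print(ciphertext[:32], b"...")
print(recovered)
```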

    Database Activity Monitoring

    Database Activity Monitoring (DAM) is a real-time surveillance mechanism that captures and records user activities and behaviors related to database access and usage. It tracks queries, data modifications, login attempts, and other interactions with the database.

    DAM helps detect suspicious or unauthorized activities, such as unauthorized attempts to access sensitive data or unusual data access patterns. When abnormal behavior is detected, the system can trigger alerts to security teams, enabling them to respond promptly to potential security threats and prevent data breaches.
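    As a conceptual sketch of the monitoring-and-alerting loop, the example below flags a session that reads an unusually large number of rows from a sensitive table. Real DAM products hook into the database’s native audit stream and learn per-user behavioral baselines; the threshold and events here are assumptions.

```python
from collections import Counter

# Hypothetical audit events captured from the database.
events = [
    {"user": "app_service", "table": "patients", "rows_read": 20},
    {"user": "j.smith",     "table": "patients", "rows_read": 15},
    {"user": "j.smith",     "table": "patients", "rows_read": 50_000},  # suspicious bulk read
]

ROW_THRESHOLD = 10_000  # illustrative static baseline

rows_per_user = Counter()
for event in events:
    rows_per_user[event["user"]] += event["rows_read"]
    if event["rows_read"] > ROW_THRESHOLD:
        print(f"ALERT: {event['user']} read {event['rows_read']} rows from {event['table']}")

print(dict(rows_per_user))
```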

    Data Risk Analytics

    Data Risk Analytics involves the use of advanced analytics and machine learning techniques to assess security risks associated with an organization’s data environment. By analyzing patterns, trends, and historical data, this component can identify potential vulnerabilities and predict security risks before they escalate.

    Data Risk Analytics helps security teams gain insights into potential data security issues, such as weak access controls, suspicious user behaviors, or unsecured data repositories. These insights enable organizations to take proactive measures to strengthen their overall data security posture and mitigate potential risks before they lead to security incidents or data breaches.

    The Advantages of a Data Security Platform (DSP)

    In an era where data breaches and privacy concerns dominate headlines, organizations need to fortify their data security measures across the entire enterprise data landscape to safeguard their reputation, build customer trust, and sustain financial stability. A Data Security Platform (DSP) provides a centralized approach to data security, enabling businesses to efficiently manage data protection across various systems and applications. It serves as a comprehensive solution comprising components that enable data security across the sensitive data lifecycle. The advantages organizations can realize by adopting a DSP are summarized in Figure 2.

    Figure 2: The advantages of adopting a Data Security Platform

    Ensuring Compliance with Data Privacy Regulations

    The implementation of a DSP significantly aids organizations in complying with various data privacy regulations:

    GDPR Compliance

    The GDPR mandates stringent data protection measures, including data minimization, purpose limitation, and user consent management. A DSP helps organizations meet these requirements by implementing encryption, access controls, and consent management mechanisms.

    CCPA and Other Privacy Regulations

    The California Consumer Privacy Act (CCPA) and similar regulations empower individuals with greater control over their personal information. A DSP enables organizations to manage user preferences, handle data subject requests, and maintain auditable records for compliance.

    Emerging Regulations

    As new privacy regulations continue to emerge globally, a DSP provides a future-proof solution by offering flexibility and scalability to adapt to evolving compliance requirements. This ensures organizations can stay ahead of the regulatory curve.

    Overcoming Challenges during DSP Adoption

    While adopting a DSP offers significant advantages, organizations may face certain challenges:

    Integration Complexity

    Integrating a DSP with existing IT infrastructure and applications can be complex. To overcome this challenge, organizations should carefully plan the integration process, seek vendor support, and collaborate closely with IT teams to ensure a seamless deployment.

    Employee Training and Awareness

    The successful adoption of a DSP depends on the knowledge and awareness of employees. Organizations should invest in comprehensive training programs to educate employees about the DSP’s functionalities, data protection best practices, and the importance of compliance.

    Balancing Security and Usability

    Organizations may face the challenge of balancing data security measures with usability and productivity. It is crucial to strike the right balance by implementing security controls that protect data without hindering operational efficiency.

    Keeping Pace with Changing Regulations

    Data privacy regulations continue to evolve, necessitating ongoing monitoring and updates to the DSP. Organizations should stay informed about regulatory changes, actively engage with legal and compliance experts, and collaborate with the DSP vendor to ensure the platform remains up to date.

    Conclusion

    In an era where data security and compliance with privacy regulations are critical, a Data Security Platform (DSP) emerges as a comprehensive solution for organizations. By adopting a DSP, organizations can fortify their data security measures, ensure compliance with regulations, and mitigate the risks associated with data breaches. Although challenges may arise during adoption, proactive planning, employee training, and ongoing monitoring can help organizations overcome them and achieve data security excellence in today’s complex digital landscape.

    At Mage Data, we focus our efforts on empowering organizations with the tools and technologies to secure their data throughout its lifecycle – from creation and storage to processing and transmission. With Mage Data, you get access to a Data Security Platform that has been ranked as the Gartner Peer Insights Customers’ Choice for 3 years in a row and has also been named an Overall Leader for Data Security Platforms by KuppingerCole. If you’re on the lookout for a comprehensive Data Security Platform that meets your organization’s IT strategic goals, feel free to reach out.

  • Data Privacy Solutions for the Healthcare Industry

    Data Privacy Solutions for the Healthcare Industry

    Gaps in data privacy regulation and a lack of trust in existing systems plague the healthcare industry. 80% of patients surveyed say they will not return to the same provider after an experience that caused them to lose trust. Providers must rebuild trust in healthcare, and data privacy is an important place to start.

    Legislators continue to work on mandates enforcing the privacy of protected health information (PHI). At the same time, new legislation forces organizations to make health information available and portable, giving patients greater access to their electronic data. These competing ideas create a compliance minefield for organizations to navigate.

    Healthcare organizations manage large, complex data sets. It’s already a challenge to maintain mountains of information, and the need to make information available complicates things further. Patients must be able to access their own health information with ease, but the unauthorized parties absolutely must be denied access. Portability can’t come at the expense of data privacy.

    Data Privacy Issues in the Healthcare Industry

    As the size of data grows, so does the median size of data breaches, reports the HIPAA Journal. Since 2009, the Department of Health and Human Services (HHS) has received more than 4,400 reports of significant breaches—and that only counts sizable breaches where 500 or more records were compromised.

    Overall, the number of lost, stolen, or improperly exposed healthcare records is greater than 314 million, reports the HIPAA Journal. To get a grasp of the scale of this problem, consider that the entire population of the United States is roughly 330 million people.

    The HIPAA Journal also reports more frequent hacking and related IT incidents, a 9.4X increase from 2015 to 2021. To keep pace with the sharp uptick in incidents, the Office for Civil Rights (OCR) has had to pass down more penalties. The reported volume of OCR penalties for HIPAA enforcement jumped 600% between 2010 and 2020.

    Offending healthcare organizations pay fines, settlements, and civil monetary penalties (CMPs). The total cost of such payments has exceeded tens of millions of dollars. Healthcare breaches are becoming dramatically more common and more costly.

    Fortunately, healthcare IT professionals can implement data privacy solutions to mitigate or eliminate risks. This is the way to protect patient information at scale.

    Key HIPAA Rules for Healthcare Privacy

    There are two key HIPAA rules to consider for healthcare privacy:

    • The HIPAA Privacy Rule (45 CFR §164.530) protects PHI and medical records by limiting unauthorized use and disclosure. Patients must be able to inspect their records and make changes to their files.
    • The HIPAA Security Rule (45 CFR §164.308) defines standards, methods, and procedures to protect PHI. These standards relate to data storage, accessibility, and transmission.

    These rules begin to address common data privacy challenges in the healthcare industry. HIPAA’s rules for healthcare data privacy exist to protect patients, but they also put organizations between a rock and a hard place.

    For example, hospitals can’t eliminate information or print it for storage in a secure facility – doing so would deprive patients of convenient access to their health data. On the other hand, the information can’t be too accessible to the point where unauthorized parties gain access. Organizations have to protect the privacy of data from some parties while removing all barriers to access for the appropriate patients.


    Beyond HIPAA: Patient Consent and Evolving Policies

    HIPAA rules like the two above are the bare minimum for data privacy in the healthcare industry. HIPAA preempts and overrides less protective privacy laws. It does not, however, affect laws that protect privacy more thoroughly. That is, healthcare providers may also be held accountable to other standards even beyond HIPAA.

    Federal and state laws may impose additional requirements on healthcare organizations. For example, they may regulate the ways patients consent to information disclosure. Evolving legal factors play a big part in the way healthcare IT professionals practice data privacy. Different organizations face unique challenges, but there are several common themes.

    Common Healthcare Data Privacy Challenges

    Electronic Health Records (EHRs) and Health Information Exchanges (HIEs) present healthcare data security challenges. Patients must get easy access to their information, but the same information must be inaccessible to unauthorized parties. The IT infrastructure must be simple enough to be efficient, and advanced enough to repel sophisticated threats.

    The Health Information Technology for Economic and Clinical Health (HITECH) Act adds complexity. HITECH has since been folded into HIPAA. The legislation encourages transparent sharing of medical information across numerous providers. The high portability of information allows patients to receive care from any number of providers. Unfortunately, it also creates a colossal attack surface.

    Healthcare Data Privacy Solutions

    IT professionals ensure healthcare data privacy by implementing the appropriate privacy-enhancing technologies (PETs). Healthcare PETs offer multiple ways to maintain privacy.

    • Data discovery
    • Data access control
    • Database activity monitoring
    • EHR data masking

    Healthcare organizations can use these techniques in concert. Comprehensive data privacy plans protect patients and maintain compliance.

    Data Discovery

    Sensitive information can’t be encrypted or masked until someone knows where it is. Data discovery solutions uncover sensitive information in obscure locations throughout an organization. The most complete technologies account for structured data, unstructured data, big data, and the cloud.

    Authentication, Access Control, and Activity Monitoring

    Restricting access to authorized users is at the core of data privacy. Healthcare IT professionals typically use access rights automation and database activity monitoring. Responding to Right to Access and Right to Erasure requests is faster after automating data subject access rights. Database activity monitoring provides audit-ready reporting at all times.

    Data Masking in Healthcare

    Data must be secure in all states for EHR data masking to work. That is, data masking solutions must protect data at rest, in transit, and in use.

    Static data masking techniques protect data in pre-production and non-production environments. Such techniques include encryption and tokenization. Dynamic data masking techniques provide control of sensitive data in production environments. This is critical when protecting PII, PHI, and other sensitive data in the most vulnerable states: in-transit and in-use.

    Patented vs. Open-Source Data Masking in Healthcare

    There are open-source data masking solutions. Unfortunately, native and open-source anonymization solutions have three limitations:

    • Scalability across databases
    • Flexibility of anonymization methods
    • Analytics access to achieve valuable insights without sacrificing privacy

    It isn’t always enough to tick individual boxes on a data privacy checklist. Patented solutions take a more holistic, comprehensive approach to healthcare data privacy.

    Healthcare Data Privacy: The Big Picture

    Healthcare information systems use various PETs to meet their unique needs. Data must be kept private while retaining its utility for all relevant processes. Working across multiple database platforms requires portability and flexibility, but also consistency. Healthcare data privacy solutions must be highly scalable and maintainable. The data landscape continues to sprawl, and data volume is constantly growing. Outsourcing privacy-enhancing technologies is the fastest way to implement effective healthcare cybersecurity measures.

    Choosing Privacy-Enhancing Technologies for the Healthcare Industry

    Privacy-enhancing technologies help healthcare organizations achieve optimal data privacy without forsaking utility. The first step is discovering all sensitive information. From there, cybersecurity implementers can use encryption, tokenization, and masking to secure data. Consistent scanning brings threats to light immediately, and associated reporting demonstrates compliance. Schedule a demo with Mage Data to see how our tools can ensure the privacy and security of your healthcare data.

  • What to Look for in a Sensitive Data Discovery Tool

    What to Look for in a Sensitive Data Discovery Tool

    Selecting the right sensitive data discovery tool for your organization can be challenging. Part of the difficulty lies in the fact that you will only get a feel for how effective your choice is after purchasing and implementing it. However, there are things you can do to help maximize your return on investment before you buy by focusing your attention on the right candidates. By selecting your finalists based on their ability to execute on the best practices for sensitive data discovery, you can significantly increase the odds that your final choice is a good fit for your needs.

    Best Practices for Sensitive Data Discovery

    Of course, you can’t effectively select for the best practices in sensitive data discovery without a deep understanding of what they are and how they impact your business. While any number of approaches could be considered “best practices,” here are four that we believe are the most impactful when implementing a new sensitive data discovery system.

    Maximize Automation

    While more automation is almost always good, when it comes to sensitive data discovery, there’s a big difference between increasing automation and maximizing automation. In an ideal world, your data team would configure the rules for detecting personally identifiable information once and then spend their time on higher-value activities like monitoring and reporting. But there’s more to automation than just data types. Is the reporting automated? Does the system work well with the system that handles “right to be forgotten” requests? Any human-driven process is likely to fail when scaled up to millions or billions of data points. Success in this area means finding a solution that maximizes automation and minimizes the burden on your team.

    Merge Classification and Discovery

    Data must be classified before its insights can be unlocked. Despite its similarities to data discovery, data classification is sometimes handled by a different department with different tools. A potential downside of that approach is that a key stakeholder gets a report from each department and asks why the numbers don’t match. As a result, your team is forced to spend time reconciling the different tools’ output—which is not a great place to expend resources. An easy way to fix this problem is to use a single tool to perform both processes. If that’s not a viable approach, ensuring the tools are integrated to produce the same results can be a great way to ensure that your company has a unified and consistent view of its data.

    Develop a Multi-Channel Approach

    One trap that companies sometimes fall into is believing that the discovery process is over once data from outside the company is identified and appropriately secured on the company network. This approach neglects one of the biggest sources of risk when it comes to data: your employees. Are you monitoring employee endpoints, such as laptops and desktops, for personally identifiable information? If so, are you able to manually or automatically remedy the situation? You won’t always be able to stop employees from making risky moves with data. However, with a multi-channel approach to sensitive data discovery, you can monitor the situation and develop procedures to limit the damage.

    Create Regular Risk Assessments

    Identifying your sensitive data is only the first step in the process. To understand your company’s overall risk, you must deeply understand the relative risk that each piece of sensitive information holds. For example, data moved across borders holds significantly more risk than that in long-term cold storage. Databases that hold customer information inherently have more risk than those holding only corporate information. To meaningfully prioritize your efforts in securing data and optimizing your processes, you need regular risk assessments. At scale, this can be difficult to do on your own—so your sensitive data discovery software either needs to do it for you or have a robust integration with a program that can.
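    As a rough illustration of how relative risk could be quantified, the sketch below weights each data store by a few of the factors mentioned above (cross-border movement, storage tier, data category). The store names and weights are hypothetical; a real assessment would use your organization’s own scoring model or the one built into your discovery software.

    ```python
    # Hypothetical risk-scoring sketch; weights and stores are illustrative only.
    RISK_WEIGHTS = {
        "cross_border": 3.0,   # data moved across borders carries more regulatory risk
        "customer_data": 2.0,  # customer information is riskier than corporate-only data
        "cold_storage": 0.5,   # long-term cold storage is comparatively lower risk
    }

    def risk_score(store: dict) -> float:
        """Score one data store: record count scaled by its risk factors."""
        score = store["records"] / 1_000
        if store.get("cross_border"):
            score *= RISK_WEIGHTS["cross_border"]
        if store.get("holds_customer_data"):
            score *= RISK_WEIGHTS["customer_data"]
        if store.get("cold_storage"):
            score *= RISK_WEIGHTS["cold_storage"]
        return score

    stores = [
        {"name": "eu_analytics_lake", "records": 2_000_000,
         "cross_border": True, "holds_customer_data": True},
        {"name": "hr_archive", "records": 50_000, "cold_storage": True},
    ]

    # Rank stores so remediation effort goes to the highest-risk data first.
    for store in sorted(stores, key=risk_score, reverse=True):
        print(f"{store['name']}: risk score {risk_score(store):,.0f}")
    ```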

    Choosing the Right Sensitive Data Discovery Software

    While there are many possible ways to select a sensitive data discovery tool, the best practices we’ve covered offer a good starting place for most businesses. Remember that the features one software package has vs. another are not necessarily as important as how those features support your business objectives. Maximizing automation, merging discovery and classification, developing a multi-channel approach, and creating regular risk assessments all have relatively little to do with the actual mechanics of data discovery—but they can all make a huge difference when building a healthy, secure company. There are a lot of different sensitive data discovery solutions that can solve your immediate problem. However, they may not do it in a way that holistically improves your business.

    Another important point is that data discovery is the first step in the data lifecycle that runs all the way to retirement. You could use a different tool for each stage of the process, but the end result would be a system with multiple independent parts that may or may not work well together. Ideally, you would be able to handle data throughout the lifecycle in one application. That’s where Mage Data comes in.

    How Mage Data Helps with Sensitive Data Discovery

    Mage Data’s approach to data security begins with robust data discovery through its patented Mage Sensitive Data Discovery tool, which is powered by artificial intelligence and natural language processing. It can identify more than 80 data types right out of the box and can be configured for as many custom data types as you need.

    But that’s only the start of the process. Mage Data’s Data Masking solutions provide powerful static and dynamic masking options to keep data safe during the middle of its lifecycle, and its Data Minimization tool helps companies handle data access and erasure requests and create robust data retention rules. Having an integrated platform that handles all aspects of data security and privacy can save you money and be far simpler to operate than having different platforms for different operations. We believe that it shouldn’t matter whether you’re a small business or an enterprise – your data solutions should just work. To learn more about how Mage Data can help you with sensitive data discovery, schedule a demo today.

  • Why Data Breaches Are So Costly…And So Difficult to Prevent

    Why Data Breaches Are So Costly…And So Difficult to Prevent

    No one in a large organization wants to hear the news that there has been a data breach, and that the organization’s data has been compromised. But many are reluctant to spend a significant portion of their budget on appropriate preventative measures. Why?

    The reason usually comes down to two misconceptions: either the organization’s leadership assumes that a data breach is unlikely, or it assumes that, if a breach were to happen, its risk exposure would be minimal and the problem easily fixed.

    The truth is that, today, data breaches are inevitable…and much more costly than they once were. Companies are often much more exposed than they know, which means that the potential costs of data compromise are much higher than assumed—and so is the ROI of preventive measures.

    Data Breaches Are Inevitable

    In 2022, there were over 1,800 data compromises of U.S. companies alone, impacting some 422 million individuals. This is four times the number of compromises reported just a decade ago.

    Think about this risk as you would a similar risk, such as a fire at a building or a plant. As the saying goes, companies don’t carry insurance because they think something bad might happen—they get insurance because bad things do happen. On a long enough timeline, it’s a virtual guarantee that something bad will strike your business. Yes, fires are rare, but they happen, and they are devastating. The same goes for data breaches.

    But here is one important way in which a fire is different from a data breach. The risk of a fire scales linearly with the number of locations you have; the risk that unsecured data poses to your business scales exponentially, even if you have a small number of total records. As a result, many companies’ data management practices may create millions or hundreds of millions of dollars of risk. Most are not even aware of it.

    Systems Are Complex, and There is More Risk Than You Imagine

    Gone are the days when a company had a server or two in a server closet housing its data. Today’s companies have multiple connected systems, many of which spin up cloud environments and transfer data on a daily basis.

    In these scenarios, data duplication creates a huge risk for companies should their systems be compromised. For example, a single company might have both client records and employee records, all of which are duplicated in a live “production” environment, a testing environment, and a data lake for analytics purposes. A single breach could potentially expose all of this data, multiplying the risk.

    (For a fuller accounting of the math here, see our whitepaper on the ROI of Risk Reduction, now available for download.)

    What is the Actual Cost of Exposed Data?

    So data compromise is inevitable, and companies hold richer stores of data these days. The real question is this: Does the cost associated with a data breach exceed the budget needed to prevent one?

    One of the very best resources for understanding what drives the cost behind a data breach is IBM’s annual Cost of a Data Breach report. The worldwide average cost of a breach in 2021 was $4.24 million, the highest average total cost in the history of the report. That works out to about $180 per record for customer information, and $176 per record for employee data.

    Importantly, it wasn’t just direct remediation costs that contributed to this total. Thirty-eight percent of the total cost was attributable to “customer turnover, lost revenue due to system downtime, and the increasing cost of acquiring new business due to diminished reputation,” which suggests that the pain caused by a breach lasts for years beyond the initial incident.

    Again, having duplicate records drives up costs here. A single customer, for example, might be tied to data that “lives” in several systems, both production and non-production environments, which means that a single customer represents not just $180 worth of risk, but potentially 4-5 times that amount.
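    To make that multiplication concrete, here is a back-of-the-envelope calculation using the per-record figure above. The customer count and the number of environments are purely illustrative.

    ```python
    # Illustrative only: exposure grows with every environment that duplicates a record.
    COST_PER_CUSTOMER_RECORD = 180  # approximate per-record cost from the IBM report
    customers = 100_000             # hypothetical customer count
    copies = 5                      # e.g., production, test, QA, analytics lake, backup

    single_copy_exposure = customers * COST_PER_CUSTOMER_RECORD
    duplicated_exposure = single_copy_exposure * copies

    print(f"Exposure with one copy per customer:    ${single_copy_exposure:,}")
    print(f"Exposure with {copies} copies per customer: ${duplicated_exposure:,}")
    # One copy: $18,000,000 -- five copies: $90,000,000
    ```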

    Prevention Needs to be Modern, Too

    In short, data breaches are much larger and more complex than they were even a decade or two ago, which also makes them much more costly. It also means that the methods for preventing breaches and reducing risk need to be similarly modern and sophisticated.

    For example, data discovery needs to be a part of any security efforts. Discovering all databases and all instances of records in a working organization can be a massive challenge; AI-based tools are now necessary to both find and identify all the data in play.

    Once data is discovered, there are various tools that can be used to protect that data, including encryption, masking, and access controls. Which tools are appropriate for which data sets depends on factors such as how often the data needs to be accessed, who will need to access it, and system performance requirements.
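    As a simple illustration of one such control, the sketch below applies deterministic pseudonymization to email addresses so the values stay consistent across systems (preserving joins and test scenarios) while no longer revealing the real address. This is a generic example, not Mage Data’s masking implementation.

    ```python
    import hashlib

    def mask_email(email: str, salt: str = "example-salt") -> str:
        """Replace a real address with a consistent, non-reversible pseudonym."""
        digest = hashlib.sha256((salt + email.lower()).encode()).hexdigest()[:12]
        return f"user_{digest}@masked.example"

    records = ["alice@customer.com", "bob@customer.com", "alice@customer.com"]
    for email in records:
        print(email, "->", mask_email(email))
    # Both "alice" rows map to the same pseudonym, preserving referential integrity.
    ```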

    That said, there is a set procedure that should be followed to reduce the risk of exposure. Here at Mage Data, we’ve honed that procedure over the years; in some cases, we can reduce the dollar-amount risk by more than 90%.

    To see what this procedure is, and to see the math behind this reduction of risk, download our white paper, The ROI of Risk Reduction for Data Breaches.

  • Seven Key Test Data Management Metrics to Understand and Track

    Seven Key Test Data Management Metrics to Understand and Track

    If your organization performs software testing, then you probably use test data in some form. But is your process for producing, updating, and implementing that test data adding net value for the organization? Or is it a hidden roadblock to producing quality software?

    Unfortunately, most organizations assume that simply having some sort of Test Data Management (TDM) process in place is sufficient. True, doing some Test Data Management is better than doing none at all. (And there are telltale signs that an organization needs Test Data Management to move forward.) But even with a Test Data Management program in place, it’s important to set up appropriate metrics and KPIs to ensure that your Test Data Management efforts are actually producing the kind of test data needed for optimal quality control.

    Why Are Metrics Needed for Test Data Management?

    Managing test data can be challenging, especially in large and complex software systems. Many TDM projects fail because the processes involved work at first, but erode over time. For example, the test data could lose its “freshness,” or the request process is not robust enough to create effective data sets.

    This is why it is important to gain insight into the TDM process by collecting various metrics. Some of these can be captured using Test Data Management tools, while others will require some customized reporting. But the more complete your picture of the Test Data Management process is, the better your organization will be able to keep its testing process on-track and delivering according to schedule.

    7 Key Test Data Management Metrics

    Here, then, are seven key metrics to consider for tracking your Test Data Management capabilities. These can be split into two categories: Metrics that measure the test data itself and its quality, and metrics for the testing process.

    Metrics for Test Data Quality

    Data completeness. Data completeness is a measure of how well test data covers scenarios from production domains. This can especially be a concern if test data is created via subsetting or by creating synthetic data. Special cases exist in all data sets and workflows, and those cases need to be represented in test data. There also need to be appropriate boundary cases, null cases, and negative-path cases. Otherwise, testing will not be sufficient.

    Data quality. While data completeness is a measure of which cases are covered, data quality is a measure of how well the test data respects the rules and constraints of the database schema, as well as the application being tested. In other words, it is a measure of how well the test data “matches” production data, which in turn affects how well the data will turn up bugs with consistency and reliability.

    Data freshness (data age). Aspects of data sets change over time; using test data that accurately represents the freshest production data is thus crucial. For example, the demographics of clients in a client database might shift over time, or account activity might change as new interfaces and new products are introduced. The freshness of one’s test data can be measured in terms of the age of the data itself, or in terms of the rate at which new test data is generated (the refresh rate).
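    A minimal sketch of how data age and refresh status might be tracked, assuming each test data set records the date of the production snapshot it was derived from. The data set names, dates, and threshold are illustrative.

    ```python
    from datetime import date

    # Hypothetical test data sets and the production snapshot dates they came from.
    test_datasets = {
        "billing_subset": date(2023, 1, 15),
        "crm_synthetic": date(2023, 6, 1),
    }

    MAX_AGE_DAYS = 90  # illustrative freshness threshold

    today = date(2023, 6, 30)
    for name, snapshot_date in test_datasets.items():
        age = (today - snapshot_date).days
        status = "OK" if age <= MAX_AGE_DAYS else "STALE - refresh needed"
        print(f"{name}: {age} days old ({status})")
    ```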

    Metrics for Test Data Management Processes

    Data security. To what degree do the processes for generating and using test data ensure the security of the original production data? How is test data itself kept secure and compliant? Data security metrics should give numeric proof that datasets are handled in such a way as to keep sensitive information secure and its use compliant with local laws and regulations.

    Test cycle time. Test cycle time is the total time for a testing cycle to complete, from request to test data creation to actual testing and validation. The goal is to reduce test cycle time as much as possible without sacrificing quality—by introducing automation, for example.

    Data request % completion. Are all requests for reliable test data being met? Data request % completion is the other side of the coin from test cycle time; while cycle time measures the average speed of provisioning, data request % completion measures how many requests are actually being met in a timely manner.

    Test effectiveness. If all of the above metrics were to improve within an organization, then overall test effectiveness should improve as well. So, even though test effectiveness is a lagging indicator of the quality of test data and Test Data Management, it is important to track, since effectiveness is what will ultimately affect the bottom line.

    Here, test effectiveness is simply a count of the number of bugs found during a period of testing, divided by the total bugs found (that is, both bugs found during testing and bugs found after shipping/deployment). For example, if all bugs are found during testing and none in production, testing effectiveness is 100%. If testing only reveals half of the bugs, with the other half discovered after deployment, testing effectiveness is 50%. The higher test effectiveness is, the better: Catching bugs in testing often makes remediation orders of magnitude cheaper than if caught in production.
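    Because the ratio is so simple, it is easy to compute and track release over release. A minimal sketch with illustrative bug counts:

    ```python
    def test_effectiveness(bugs_in_testing: int, bugs_in_production: int) -> float:
        """Percentage of all known bugs that were caught before release."""
        total = bugs_in_testing + bugs_in_production
        return 100.0 * bugs_in_testing / total if total else 100.0

    # Hypothetical release-by-release tracking.
    releases = {"v1.0": (40, 40), "v1.1": (45, 15), "v1.2": (57, 3)}
    for release, (in_test, in_prod) in releases.items():
        print(f"{release}: {test_effectiveness(in_test, in_prod):.0f}% effective")
    # v1.0: 50%, v1.1: 75%, v1.2: 95%
    ```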

    How Mage Data Helps with Test Data Management

    If you have a Test Data Management strategy in place, you’ve already taken the first step in the right direction. Now it is important to start collecting the right metrics, measuring your efforts, and bringing on board the right tools to improve the process.

    Mage Data’s own Test Data Management solution ensures the security and efficiency of your test data processes with secure provisioning of sensitive data across teams, wherever required and whenever needed. This tool lets organizations rapidly provision high-quality test data, allowing your TDM efforts to scale while staying secure.

    To learn more about how Mage Data can help solve your Test Data Management challenges, contact us today to schedule a free demo.