Test Data Management
Frequently Asked Questions (FAQs)
Testing software for bugs and issues has gained prominence in recent years with the rapid pace at which new applications are developed and launched. To test effectively, it is imperative to obtain test data that is representative of production data, yet anonymized to avoid unnecessary exposure of any inherent sensitive data, and easily accessible or deliverable to testing teams. This prevents unwanted bottlenecks in the testing process and thereby makes the rollout of new applications smoother. All the above-mentioned capabilities fall under the domain of Test Data Management.
Test Data Management serves a critical function in the application development and testing lifecycle. With the increasing adoption of Agile development methodologies in traditional software development pipelines, improper testing can result in more bugs reaching production, applications failing to meet regulatory compliance and privacy rules, and higher testing and development costs, all of which negatively impact performance, with the bad press and potential revenue loss that follow.
A good TDM solution will ensure complete and accurate discovery of any sensitive data contained within the database, provide flexible and consistent anonymization of production data, offer scalable delivery options, and support enterprise-wide deployment with transparent governance of the processes involved.
Organizations have difficulty enforcing TDM best practices across their enterprise data landscape due to issues with the reliability of data discovery scripts, limited data masking options, and the lack of scalable deployment approaches. This can result in inconsistent masking that leaves sensitive data unprotected and ultimately delays the delivery of test data. Slow test data provisioning, especially when done manually, can significantly delay application delivery. Development teams with use cases that require time-sensitive test data also face problems when accurate, representative data is not available. Data masking, if not done right, can add to the complexity of the test data management process and lead to increased costs.
An industry ready TDM solution must enable the following functionalities:
- 100% discovery of sensitive data
- Rapid provisioning of high-quality, anonymized test data
- State-of-the-art data protection mechanisms
- Scalability across the enterprise
- Readiness to deploy across on-prem, cloud, and SaaS applications
Without an industry-ready data masking solution, any Test Data Management approach will leave sensitive data exposed and unprotected during the testing process. This can lead to non-compliance with the stringent data privacy and protection regulations established in most countries around the world. Data masking techniques used in Test Data Management should aim to ensure that any risk of a data breach in test environments is completely neutralized. As a best practice, the masking of sensitive data should involve purpose-built methods that ensure data usability while maintaining referential integrity and preserving data security.
Each TDM solution provider approaches the space from a different angle based on its innate strengths and capabilities. Expertise in the fundamental approaches to TDM, including data discovery, data masking, automation, ease of use, and the integration of additional test data capabilities, is an important factor to consider when evaluating TDM solutions. Over the years, the TDM market has evolved into a critical business enabler for enterprises, so the focus is now on providing an efficient and flexible approach to TDM that contributes positively to business outcomes.
Inefficient TDM practices built on weak solutions are a constraint on DevOps success. Taking shortcuts with sensitive test data in order to save time increases the risk of accidental or malicious leaks. Further, poorly secured test systems also risk exposing production data to new vulnerabilities. Enterprises must therefore leverage the wide-ranging security and privacy capabilities offered by a TDM solution to ensure a good testing experience.
The ease and effectiveness with which a TDM solution distributes test data is an important indicator of DevOps success. If the organization follows an Agile methodology for development, test data must be provisioned continuously to the teams and refreshed during each sprint. Automating the provisioning of test data sets so they are ready for use whenever and wherever they're needed within the development pipeline is also gaining wide acceptance. Reducing the time taken to deliver high-quality test data to a developer or tester is critical to ensuring a good development and testing experience.
With the increasing rate of development of new applications and the presence of stringent data security and privacy regulations worldwide, a good Test Data Management solution will provision the right test data to testing teams at the right time, ensuring the anonymization of any sensitive data involved and complying with applicable privacy regulations as required. These measures ensure reliable development of applications, positively impacting performance and scalability when the application goes into production.
The Mage TDM solution provides native support for multiple approaches including Database Cloning, Data Subsetting, Sensitive Data Discovery, Data Masking, and Synthetic Data Generation.
What are the differentiators of Mage's Test Data Management solution?
Mage offers over 80 anonymization options, including masking, encryption, and tokenization, that preserve not only the format but also the context of the data. For example, "John" would be masked to "Jack" and not "Name01". This preserves the context, demographics, and any validations that are required.
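The idea of context-preserving masking can be illustrated with a minimal sketch. This is not Mage's implementation; it assumes a small hypothetical pool of replacement names and uses a hash of the input so the same original value always maps to the same realistic substitute, which is one way to keep masked values consistent across data sources.

```python
import hashlib

# Hypothetical curated pool of realistic replacement names; a real solution
# would draw from much larger, demographically rich repositories.
REPLACEMENT_NAMES = ["Jack", "Emma", "Liam", "Olivia", "Noah", "Ava"]

def mask_name(original: str) -> str:
    """Deterministically map a name to a realistic replacement.

    Hashing the input keeps the mapping stable across runs and data
    sources, which helps preserve referential integrity.
    """
    digest = hashlib.sha256(original.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]

masked = mask_name("John")
print(masked)                              # a realistic name, never "Name01"
print(mask_name("John") == masked)         # True: same input, same output
```

Because the mapping is deterministic, "John" masks to the same replacement wherever it appears, so joins between masked tables continue to line up.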
A patented discovery solution helps identify all locations of sensitive data in a customer's non-production systems. This helps maintain referential integrity within and across data sources and applications.
Our market-leading static data masking solution helps eliminate sensitive data in non-production environments. Static data masking changes the underlying data itself, whether in structured, unstructured, or cloud data sources. This ensures that no application developer or tester has access to sensitive data in non-production and pre-production environments.
To help customers with enterprise-wide deployment and rollout of a data anonymization solution, Mage provides various approaches to static data masking depending on the application infrastructure.
Static data masking can be done using four different approaches: In-place, In-transit, On-demand, and As-it-happens. Users (application/QA testers) will find the data usable for application development and testing purposes and will not be able to differentiate between production and masked data.
The Mage TDM solution has multiple different approaches to provision secure test data including:
In-Place: Production data is cloned, and the cloned data is anonymized.
In-Transit: Production data is cloned, and the sensitive data is anonymized while being copied to the target data store.
ETL Integrations (via API): Mage integrates into existing ETL tools and anonymizes the data as it is extracted, masked, and loaded into the target data store.
Synthetic Data Generation: Mage anonymizes the operational data while keeping transactional data intact.
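The in-transit approach above can be sketched in a few lines. This is a simplified illustration, not Mage's pipeline: it uses two in-memory SQLite databases as stand-ins for the production source and the test target, and a hypothetical masking rule for a social security number that is applied while each row is copied.

```python
import sqlite3

def mask_ssn(ssn: str) -> str:
    # Hypothetical rule: preserve the format, keep only the last four digits.
    return "XXX-XX-" + ssn[-4:]

# In-memory stand-ins for the production source and the test target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, ssn TEXT)")
source.execute("INSERT INTO customers VALUES (1, 'John', '123-45-6789')")
target.execute("CREATE TABLE customers (id INTEGER, name TEXT, ssn TEXT)")

# In-transit: each row is anonymized while being copied to the target store,
# so unmasked sensitive values never land in the test environment.
for cid, name, ssn in source.execute("SELECT id, name, ssn FROM customers"):
    target.execute("INSERT INTO customers VALUES (?, ?, ?)",
                   (cid, name, mask_ssn(ssn)))

row = target.execute("SELECT ssn FROM customers WHERE id = 1").fetchone()
print(row[0])  # XXX-XX-6789
```

The key property is that masking happens on the wire: the target never stores the original sensitive value.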
Mage’s patented sensitive data discovery solution identifies the locations of all sensitive data, even in hard-to-find places like complex columns, free-text fields, and description fields, with the help of sophisticated methods and pathways. The Data Discovery module does not stop at data scans but extends its capability to scanning code to find sensitive data locations and users. These mechanisms ensure data integrity within a dataset.
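To make the free-text discovery idea concrete, here is a minimal sketch, not Mage's patented approach: two hypothetical pattern-based pathways, each with an assumed confidence score, scanning a free-text field for sensitive values. A real discovery engine would combine many more pathways (metadata analysis, code scans, and so on).

```python
import re

# Hypothetical pathways with assumed confidence scores.
PATHWAYS = {
    "SSN":   (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.9),
    "EMAIL": (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), 0.8),
}

def scan_free_text(text: str):
    """Return (classification, match, score) hits found in a free-text field."""
    hits = []
    for label, (pattern, score) in PATHWAYS.items():
        for match in pattern.findall(text):
            hits.append((label, match, score))
    return hits

hits = scan_free_text("Contact john@example.com, SSN 123-45-6789 on file.")
print(hits)
```

Each hit carries a score, so results from multiple pathways can be weighed against a threshold before a column or field is flagged as sensitive.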
The discovery process generates metadata (sensitive data and locations) from all connected data sources. The generated metadata is stored in the Mage Engine as a centralized repository. This shared metadata architecture is used for downstream de-identification and ensures data integrity across related data sets.
Mage has 11 different methods (pathways) with corresponding scores for finding sensitive data; both the pathways and the scores are customizable. Mage supports 81 out-of-the-box data classifications, pre-configured with appropriate pathways and scorecards based on the rich experience built over the last 16 years. Most common personal data classifications are available out-of-the-box, and any other data classification can be configured. For example, for a leading medical device manufacturing company, we have configured custom classifications like patient ID, employee ID, and Medic…
What are the strengths of the Static Data Masking capability available with Mage's TDM solution?
The Mage static masking product is equipped with over 70 masking functions, which can mask data in-place or in-transit, handle file data such as CSV or JSON, and be automated to mask data as it enters your system while maintaining referential integrity. Each data classification can be equipped with a masking method, which can even be assigned automatically using fuzzy logic. The assigned masking methods can be saved, along with their respective data classifications and the locations of any identified sensitive data, as a reusable masking template. Static masking can also be integrated with other processes such as data integration and ETL via APIs.
Static data masking can be done using four different approaches:
In-place: Ensures data residency, as sensitive data never leaves the database.
In-transit: Data is offloaded from the source instance, scrambled in the Mage engine, and loaded into lower environments or back to the source database.
On-demand: Large datasets that take too much time to mask in place can be masked using this method; data is extracted in a masked format. This helps overcome infrastructure constraints.
As-it-happens: Only incremental data is scrambled, as the data is inserted into tables. The masked data will be realistic and will preserve the format and characteristics of the source data.
Mage supports data subsetting using the date-time stamp of the record to limit the amount of data to be scrambled on the target database. Mage also supports subsetting at the table level.
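Timestamp-based subsetting can be sketched as a simple cutoff query. This is an illustrative example with a hypothetical orders table in an in-memory SQLite database, not Mage's subsetting engine: only records newer than the cutoff are selected for scrambling and provisioning, limiting the data volume on the target.

```python
import sqlite3

# In-memory stand-in for a production table with a record timestamp.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, created TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2020-01-15", 100.0),
    (2, "2022-03-01", 250.0),
    (3, "2022-06-10", 75.0),
])

# Subsetting by date-time stamp: only records at or after the cutoff are
# pulled into the masking/provisioning run.
cutoff = "2022-01-01"
subset = con.execute(
    "SELECT id FROM orders WHERE created >= ? ORDER BY id", (cutoff,)
).fetchall()
print([r[0] for r in subset])  # [2, 3]
```

In practice the cutoff would be chosen per use case (for example, the last two sprints' worth of data) rather than hard-coded.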
Mage creates realistic data in place of the original data. Dependencies stay intact, and the masked data remains usable for application development and testing purposes.
Mage's masking methods preserve application and data integrity. The masked data will be realistic and will preserve the format and characteristics of the source data. Users (application/QA testers) will find the data usable for application development and testing purposes and will not be able to differentiate between production and masked data.
Mage provides various tokenization methods out of the box, offering either length-preserving tokens or fixed tokens.
The product supports the technique of pseudonymization, i.e., the replacement of direct identifiers within a data record with one or more artificial identifiers, or pseudonyms.
The current version of the Mage Platform (R22.1) supports the generation of anonymized data by replacing operational data with fake but realistic data sourced from curated data repositories. The re-identification risk is greatly reduced by leveraging large repositories and added noise. The result set replicates real-world scenarios and retains the characteristics and complexities of the original data.
Mage R22.2 will leverage AI/ML models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) to identify the data distribution (age, gender, nationality, etc.) of the customer's production instance and then build a universe of synthetic data that accurately represents the customer's actual data distribution. GA: Q3 2022.
Mage R23 will extend the synthetic data generation footprint to all transaction tables, but will be restricted to simple data models only (complex data models such as Oracle, PeopleSoft, and SAP ERPs won't be included for application-wide synthetic data generation). GA: Q4 2022.
Along with multiple recognitions from renowned research houses for the range of products offered as part of the Mage Test Data Management solution, Mage Data is also one of only two vendors to be named a Gartner Peer Insights Customers' Choice for Data Masking for three consecutive years, receiving the highest overall score among all participating vendors.
Mage offers a complete solution for test data management, providing native options for multiple approaches to TDM including data subsetting, database cloning, synthetic data generation, sensitive data discovery, data masking and support for data virtualization (through integration with database virtualization vendors).
Mage's market-leading Sensitive Data Discovery solution utilizes a patented approach combining several different discovery methods, enabling organizations to locate all sensitive data accurately wherever it may be present, with virtually no manual intervention and minimal false positives.
The award-winning Data Masking product provided by Mage is equipped with over 70 masking functions, almost all of which maintain referential integrity. It can mask data in-place or in-transit and can be automated to mask data as it enters the system.
Mage provides a ‘Test Run’ option to verify masking validations. After successful validation of the test run, the actual run can be executed. Mage also provides a detailed log of all masking activities.