Test Data Management
Frequently Asked Questions (FAQs)
Test Data Management (TDM) serves a critical function in the application development and testing lifecycle. With the increasing adoption of Agile development methodologies in traditional software development pipelines, the consequences of improper testing include more bugs in production, applications failing to meet regulatory compliance and privacy rules, and higher testing and development costs, along with the negative impact on performance, bad press, and potential revenue loss that follow.
A good TDM solution will ensure complete & accurate discovery of any sensitive data contained within the database, ensure flexible & consistent anonymization of production data, provide scalable delivery options and enterprise-wide deployment with transparent governance of the processes involved.
Organizations struggle to apply best TDM practices across their enterprise data landscape because of unreliable data discovery scripts, limited data masking options, and the lack of scalable deployment approaches. These issues can lead to inconsistent masking that leaves sensitive data unprotected and ultimately delays the delivery of test data. Slow test data provisioning, especially when done manually, can significantly delay application delivery. Development teams whose use cases require time-sensitive test data can also face issues if accurate, representative data is not available. Data masking, if not done right, can add to the complexity of the test data management process and increase costs.
Without an industry-ready data masking solution, any Test Data Management approach will leave sensitive data exposed and unprotected during the testing process. This can lead to non-compliance with the stringent data privacy and protection regulations established in most countries around the world. Data masking techniques used in Test Data Management should aim at ensuring that any risk of a data breach in test environments is completely neutralized. As a best practice, the masking of sensitive data should involve purpose-built methods that ensure data usability while maintaining referential integrity and preserving data security.
Each TDM solution provider approaches the space from a different angle based on its innate strengths and capabilities. Expertise and compatibility in the fundamental approaches to TDM, with capabilities that include data discovery, data masking, automation, ease of use, and the integration of additional test data capabilities, are important factors to consider when evaluating TDM solutions. Over the years, the TDM market has evolved, through the adoption of new initiatives, into a critical business enabler for enterprises; the focus is now on providing an efficient and flexible approach to TDM that contributes positively to good business outcomes.
Inefficient TDM practices are a constraint to DevOps success if weak solutions are used. Taking shortcuts with sensitive test data in order to save time can increase the risk of accidental or malicious leaks. Further, poorly secured test systems also run the risk of exposing production data to newer vulnerabilities. In such cases, enterprises must leverage the wide-encompassing security and privacy capabilities offered by a TDM solution to ensure a good testing experience.
The ease and effectiveness with which a TDM solution distributes test data is an important indicator of DevOps success. If the organization follows Agile methodology for development, test data must be provisioned continuously to the teams and refreshed during each sprint. Automating the provisioning of test data sets to make them ready for use whenever and wherever they're needed within the development pipeline is also gaining wide acceptance. Reducing the time taken to deliver high-quality test data to a developer or tester is critical to ensuring a good development and testing experience.
The Mage TDM solution provides native support for multiple approaches, including Database Cloning, Data Subsetting, Sensitive Data Discovery, Data Masking, and Synthetic Data Generation.
What are the differentiators of Mage's Test Data Management solution?
Mage’s patented sensitive data discovery solution identifies the locations of all sensitive data, even in hard-to-find places such as complex columns, free-text fields, and description fields, with the help of sophisticated methods and pathways. The Data Discovery module does not stop with data scans but extends its capability to scanning code to find sensitive data locations and users. These Mage mechanisms ensure data integrity within a dataset.
The discovery process generates metadata (sensitive data and locations) from all connected data sources. The generated metadata is stored in the Mage Engine as a centralized repository. This shared metadata architecture is used for downstream de-identification and ensures data integrity across related data sets.
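As an illustration of how such a shared metadata catalog might be structured (a hypothetical sketch, not Mage's actual schema), each discovered sensitive location can be recorded with its classification and the masking method assigned downstream:

from dataclasses import dataclass
from typing import Optional

@dataclass
class SensitiveLocation:
    """One discovered sensitive-data location (illustrative only)."""
    source: str            # connected data source, e.g. "hr_db"
    table: str
    column: str
    classification: str    # e.g. "SSN", "Email"
    confidence: float      # discovery score from the matching pathways
    masking_method: Optional[str] = None  # assigned later, reused during de-identification

class MetadataRepository:
    """Central catalog shared by discovery and downstream masking jobs."""
    def __init__(self):
        self._locations: list[SensitiveLocation] = []

    def register(self, loc: SensitiveLocation) -> None:
        self._locations.append(loc)

    def locations_for(self, classification: str) -> list[SensitiveLocation]:
        # Downstream masking looks up every location of a classification so the
        # same method is applied consistently, preserving referential integrity.
        return [l for l in self._locations if l.classification == classification]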
Mage has 11 different methods (pathways), each with a corresponding score, for finding sensitive data; both the pathways and the scores are customizable. Mage supports 81 out-of-the-box data classifications that are pre-configured with appropriate pathways and scorecards, based on the rich experience built over the last 16 years. Most common personal data classifications are available out of the box, and any other data classification can be configured. For example, for a leading medical device manufacturing company, we have configured custom classifications like patient ID, employee ID, and Medic
What are the strengths of the Static Data Masking capability available with Mage's TDM solution?
The Mage Static Data Masking product is equipped with over 70 masking functions that can mask data in-place or in-transit, as well as file data such as CSV or JSON, and can be automated so that it masks data as it enters your system while maintaining referential integrity. Each data classification can be assigned a masking method, which can even be assigned automatically using fuzzy logic. The assigned masking methods can be saved, along with the respective data classifications and the locations of any identified sensitive data, as a reusable masking template. Static masking can also be integrated with other processes such as data integration and ETL via APIs.
Mage supports data subsetting using the date-time stamp of a record to limit the amount of data to be scrambled on the target database. Mage also supports subsetting at the table level.
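As a simple illustration of timestamp-based subsetting (a hypothetical sketch; the table and column names are assumptions, not Mage configuration), only records newer than a cutoff would be carried into the target environment, and table-level subsetting simply restricts which tables are copied at all:

from datetime import datetime, timedelta

# Hypothetical rule: keep only the last 90 days of orders in the test copy.
cutoff = datetime.now() - timedelta(days=90)

# Illustrative SQL for a timestamp-driven subset; ORDERS and CREATED_AT are
# assumed names, not part of any Mage schema.
subset_query = """
    SELECT *
    FROM ORDERS
    WHERE CREATED_AT >= :cutoff
"""

# Table-level subsetting: only these tables are included in the subset.
tables_to_copy = ["ORDERS", "ORDER_ITEMS", "CUSTOMERS"]

print(subset_query, cutoff, tables_to_copy)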
Mage creates realistic data in place of the original data. Dependencies stay intact, and the masked data remains usable for application development and testing purposes.
Mage masking methods preserve application and data integrity. The masked data is realistic and preserves the format and characteristics of the source data. Users (application and QA testers) will find the data usable for application development and testing purposes and will not be able to differentiate between production and masked data.
Mage provides various tokenization methods out of the box, offering either length-preserving tokens or fixed tokens.
The product supports pseudonymization, i.e., the replacement of direct identifiers within a data record by one or more artificial identifiers, or pseudonyms.
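A minimal sketch of what pseudonymization can look like in practice (an illustrative example using a keyed hash, not a description of Mage's internal method): the direct identifier is replaced by a stable pseudonym that is generated consistently for the same input.

import hmac
import hashlib

# Secret key held by the data controller; the pseudonym cannot be linked back
# without it (illustrative only).
SECRET_KEY = b"replace-with-a-securely-managed-key"

def pseudonymize(direct_identifier: str) -> str:
    """Replace a direct identifier (e.g., an email address) with a stable pseudonym."""
    digest = hmac.new(SECRET_KEY, direct_identifier.encode("utf-8"), hashlib.sha256)
    return "PSN-" + digest.hexdigest()[:16]

record = {"customer_id": "C-1001", "email": "jane.doe@example.com"}
record["email"] = pseudonymize(record["email"])  # same input always yields the same pseudonym
print(record)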
Along with multiple recognitions received from renowned research houses for the range of products offered as part of the Mage Test Data Management solution, Mage Data is also one of only two vendors to be named a Gartner Peer Insights Customers' Choice for Data Masking for three consecutive years, receiving the highest overall score among all participating vendors.
Mage provides a ‘Test Run’ option to verify masking validations. After the test run is successfully validated, the actual run can be executed. Mage also provides a detailed log of all masking activities.
Cross Border Data Sharing
Frequently Asked Questions (FAQs)
The European Union currently has a cross-border data protection law called the General Data Protection Regulation, which replaces the EU’s Data Protection Directive 95/46/EC.
The regulation addresses the transfer of personal data to locations outside the EU or EEA (European Economic Area). The transfer of citizens’ information to recipients will be generally prohibited unless:
The jurisdiction in which the recipient is located has an adequate level of data protection
Data exporters have data-protection safeguards in place
An exemption exists to the prohibition
Companies, data processors, and cloud service providers must all comply with these requirements in order to legally transfer data out of the EEA.
According to the U.S. Department of Health and Human Services (HHS), the HIPAA Privacy Rule, or Standards for Privacy of Individually Identifiable Health Information, establishes national standards for the protection of certain health information. Additionally, the Security Rule establishes a national set of security standards for protecting specific health information that is held or transferred in electronic form.
The Security Rule operationalizes the Privacy Rule’s protections by addressing the technical and nontechnical safeguards that covered entities must put in place to secure individuals’ electronic PHI (e-PHI). Within HHS, the Office for Civil Rights (OCR) is responsible for enforcing the Privacy and Security Rules with voluntary compliance activities and civil money penalties.
Mage Sensitive Data Discovery and Data Anonymization happen at the target data store, which ensures that no sensitive data leaves the target database. Mage does not in any way impact the customer’s encrypted communications.
Only metadata is transferred to the Mage engine, using secure communications such as JDBC. The entire Mage application uses secure connections such as HTTPS, and the application has gone through extensive penetration testing by a top Swiss bank to identify data leakage. The solution passed all penetration testing and has the approval of the Swiss legal forum for offshoring production data at that bank.
Mage’s patented sensitive data discovery solution identifies the locations of all sensitive data, even in hard-to-find places such as complex columns, free-text fields, and description fields, with the help of sophisticated methods and pathways. The Data Discovery module does not stop with data scans but extends its capability to scanning code to find sensitive data locations and users. These Mage mechanisms ensure data integrity within a dataset.
The discovery process generates metadata (sensitive data and locations) from all connected data sources. The generated metadata is stored in the Mage Engine as a centralized repository. This shared metadata architecture is used for downstream de-identification and ensures data integrity across related data sets.
Static data masking can be done using four different approaches:
In-place: Ensures data residency, as sensitive data never leaves the database.
In-transit: Data is offloaded from the source instance, scrambled in the Mage engine, and loaded into lower environments or back to the source database.
On-demand: Large datasets that take too much time to mask in place can be masked using this method. Data is extracted in a masked format, which helps overcome infrastructure constraints.
As-it-happens: Only incremental data is scrambled, as the data is inserted into tables. The masked data will be realistic and will preserve the format and characteristics of the source data.
The Mage masking solution masks data to produce realistic yet fictitious values, preserving the format between the de-identified and original data.
For example, “John” would be masked to “Jack” and not “Name01”. Similarly, the phone number “1234567890” would be masked to “9987654321” (ten digits), keeping the data characteristics so that the masked value passes all application validations.
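A minimal sketch of the idea behind format-preserving masking (illustrative only; Mage's actual masking functions are not shown here): a name is replaced by another plausible name, and a phone number is replaced digit by digit so that its length and structure survive.

import hashlib
import random

FIRST_NAMES = ["Jack", "Maria", "Ravi", "Chen", "Olivia"]

def _rng_for(value: str) -> random.Random:
    # Derive a stable seed from the input so the same value always masks the same way,
    # which helps preserve referential integrity across tables.
    seed = int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)
    return random.Random(seed)

def mask_name(name: str) -> str:
    """Replace a real first name with a plausible, consistent substitute."""
    return _rng_for(name).choice(FIRST_NAMES)

def mask_phone(phone: str) -> str:
    """Replace each digit with another digit, preserving length and any separators."""
    rng = _rng_for(phone)
    return "".join(str(rng.randint(0, 9)) if ch.isdigit() else ch for ch in phone)

print(mask_name("John"))         # a realistic name, not "Name01"
print(mask_phone("1234567890"))  # still ten digits, so format validations pass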
The solution should ideally address the following requirements:
Identification and access to personal data: The solution should effectively identify personal and sensitive data stored within the organization’s IT landscape. The solution also should provide a mechanism to restrict sharing of such data based on available consent.
Centralized definition and digitization of privacy policy: The data privacy policies governing the techniques for data anonymization should be defined and digitized at a central location. The policies should be managed and maintained centrally and should follow the automated workflow request and authorization mechanism.
The digital implementation of these policies should be in line with the data privacy policies defined by organization’s data privacy committee.
Consent management: Consent management capability should be deployed in the respective geographies to obtain consent from individuals and government organizations. These consents should be stored and managed centrally by a consent management and authorization module.
Data request management: All data requests and data provisioning should be routed through an automated, distributed data management system. This would ensure the application of relevant data privacy policies and audit recordings before data is shared.
Data privacy policy implementation: The data privacy policies should be implemented and executed in each country or geography as defined by their respective regulations. For instance, a geography-specific policy can contain personally identifiable attributes in addition to common sensitive attributes (a minimal illustration of such a policy appears after this list). After sensitive data is anonymized, the solution should also verify the result against the stored consent before sharing it with data requestors.
Data breach notification: The existing data breach notification framework should be enhanced and extended to integrate with the distributed data management system.
Incident reporting: The distributed data management solution should include a mechanism to reconcile data against consent and purpose. Any mismatch with an organization’s data policy should be reported to the respective application owner in the form of an automated notification.
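As referenced above, here is a hedged illustration of what a centrally defined, geography-specific privacy policy could look like once digitized (the field names, classifications, and masking methods below are assumptions for illustration, not any particular product's schema):

# Illustrative, centrally managed policy definitions; each geography extends the
# common sensitive attributes with its own identifiable attributes.
PRIVACY_POLICIES = {
    "common": {
        "sensitive_attributes": {"email": "pseudonymize", "phone": "mask_digits"},
    },
    "EU": {
        "extends": "common",
        "sensitive_attributes": {"national_id": "tokenize"},
        "requires_consent_check": True,
        "breach_notification_hours": 72,
    },
    "US": {
        "extends": "common",
        "sensitive_attributes": {"ssn": "tokenize"},
        "requires_consent_check": True,
    },
}

def resolve_policy(geography: str) -> dict:
    """Merge the common policy with the geography-specific additions."""
    geo = PRIVACY_POLICIES[geography]
    merged = dict(PRIVACY_POLICIES[geo["extends"]]["sensitive_attributes"])
    merged.update(geo["sensitive_attributes"])
    return {"sensitive_attributes": merged,
            "requires_consent_check": geo.get("requires_consent_check", False)}

print(resolve_policy("EU"))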
Database Security
Frequently Asked Questions (FAQs)
Comprehensive monitoring with no loopholes
High availability and minimal performance impact on production systems
Lightweight monitoring of sensitive data
Integration with existing SIEM tools and no requirement for any additional hardware or special tools
Monitor authorized user access to sensitive data and alert based on user-defined conditions such as the following (a minimal rule-evaluation sketch appears after this list):
- User connections (user, terminal, IP address, connection protocols)
- User statements (programs, data)
- In-built alerts:
- Alert if a user processes a huge volume of data (e.g., 10,000 rows)
- Alert if a user queries a sensitive column
- Alert if the number of rows processed exceeds a threshold (e.g., 1,000 rows)
- Alert if a user logs in to the database outside business hours
- Alert based on the statement type executed (CREATE, DROP, TRUNCATE)
- Alert if a user logs in to the database outside business days
- Alert if a user makes a huge number of database connections
- Alert based on data classification (credit card, bank account, etc.)
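As referenced above, a minimal sketch of how such user-defined alert conditions might be evaluated against a monitored database event (purely illustrative; field and rule names are assumptions, not iMonitor configuration):

from datetime import datetime

# A monitored database event, as it might be captured at the data source.
event = {
    "user": "report_svc",
    "statement_type": "SELECT",
    "rows_processed": 12_500,
    "touched_sensitive_column": True,
    "timestamp": datetime(2024, 5, 18, 23, 40),  # a Saturday, 23:40
}

def outside_business_hours(ts: datetime) -> bool:
    return ts.hour < 8 or ts.hour >= 18 or ts.weekday() >= 5

# User-defined alert rules: (rule name, predicate over the event).
ALERT_RULES = [
    ("huge_volume", lambda e: e["rows_processed"] >= 10_000),
    ("sensitive_column_access", lambda e: e["touched_sensitive_column"]),
    ("ddl_statement", lambda e: e["statement_type"] in {"CREATE", "DROP", "TRUNCATE"}),
    ("off_hours_login", lambda e: outside_business_hours(e["timestamp"])),
]

triggered = [name for name, check in ALERT_RULES if check(event)]
print("Alerts raised:", triggered)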
Mage iMonitor is purpose-built for monitoring sensitive data access and is unique for the following reasons:
- Monitoring is at the data source, so nothing is missed, including dynamic SQL
- It does not need special high-availability systems
- It operates in-memory and post-transaction, with no overhead
Mage supports dynamic data de-identification through its iMask module. Authorized users, programs, or conditions will see the original data, while unauthorized users, programs, or conditions will see anonymized data. Mage iMask works at the database layer, the application layer, and at a proxy layer.
This is required to:
1. Ensure comprehensive security at the database and application layers
2. Give you flexibility to support different application architectures
3. Give you options to optimize performance with minimal overhead
For dynamic data masking defined at the database level (embedded), there is no performance impact for valid business users. These valid users are configured to have direct access to the application data (the original data).
All other users (configured to see different flavors of data) are routed via the Mage agent, and data masking is performed dynamically for these connections. The performance impact for these connections depends on the masking methods used (encryption vs. masking vs. redaction vs. tokenization) and the volume of data extracted. As a rule of thumb, we can anticipate a 5 to 7% performance impact for connections requiring an anonymized data set. For example, a query that completes in 100 seconds may require 105 to 107 seconds for connections requiring masked output.
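A minimal sketch of the routing idea described above (illustrative only; the authorization check and the redaction function are assumptions, not iMask's implementation): authorized connections receive rows untouched, while all other connections receive the same rows with sensitive columns dynamically anonymized.

SENSITIVE_COLUMNS = {"ssn", "email"}   # assumed classification of the result set
AUTHORIZED_USERS = {"payroll_app"}     # connections allowed to see original data

def redact(value: str) -> str:
    """Simple redaction for illustration; real deployments could use encryption,
    masking, or tokenization instead."""
    return "*" * len(str(value))

def serve_row(row: dict, user: str) -> dict:
    # Authorized connections bypass masking entirely, so they see original data
    # with no added overhead.
    if user in AUTHORIZED_USERS:
        return row
    # Unauthorized connections get the same row with sensitive columns anonymized.
    return {col: (redact(val) if col in SENSITIVE_COLUMNS else val)
            for col, val in row.items()}

row = {"employee": "Jane Doe", "ssn": "123-45-6789", "email": "jane@example.com"}
print(serve_row(row, "payroll_app"))   # original values
print(serve_row(row, "qa_tester"))     # sensitive columns redacted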
In static data masking, the underlying data is changed permanently and hence cannot be compared with the pre-anonymized data.
In special use cases, the Mage translator functionality can be used, with which an authorized user can unmask a particular record for the production replication process. This helps compare data before and after anonymization.
In dynamic data masking, the underlying data is not changed. Pre- and post-anonymization data can be compared by accessing the data as an authorized user (to see the real data) and as an unauthorized user (to see the anonymized data).
Mage can securely de-identify inactive sensitive data while keeping the transactional integrity of data sets.
Once the lifecycle of the sensitive data is complete, enterprises can de-identify the sensitive data to make sure there is no unnecessary exposure. For instance, if an employee has left the organization, there is no need to retain their sensitive information and increase the risk of exposure.
Mage’s unique sensitive data de-identification solution, iRetire, tokenizes inactive sensitive data and securely de-identifies it. If the customer wants to delete the data, Mage provides the option to securely delete it while maintaining integrity.
Retention rules are defined based on business and regulatory requirements. The retention period (in days) for 'archive data' and 'remove after' must be finalized before iRetire is executed.
Entity - An entity is a business concept, like Employees, Customers, Vendors, etc. Data minimization rules are defined based on the Entity.
For example, Vendor data can be retired after 100 days of inactivity, whereas Employee data can be retired after 1 year of exit from the company.
An Entity consists of “Driver columns” and “Connection columns”.
Driver Column - This is the column based on which the application decides if the retention rules have been met.
For example, Date_of_Exit can be set as the driver column and 100 days can be the Retention Period. In this case, the application checks for rows where an Entity (say a vendor) has exited more than 100 days prior to the current date.
Connection Column - This is the column based on which the application searches the database for sensitive data related to an entity.
For example, assume Vendor_ID is the Connection column. The application filters down the candidates for Data Minimization using the Driver column (i.e. Date_of_Exit is more than 100 days ago). Let us assume there is only one vendor who matches our criteria, Vendor_ID = 1001. The application then searches the database for all rows where Vendor_ID = 1001 and tokenizes the sensitive columns of that row.
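As a hedged illustration of how a driver column and a connection column combine (the table, column, and rule names below are assumptions for illustration, not iRetire configuration), the sketch first finds entities that satisfy the retention rule and then tokenizes the sensitive columns in every related row:

from datetime import date, timedelta

RETENTION_DAYS = 100

# Simplified in-memory tables; in practice these would be database queries.
vendors = [
    {"Vendor_ID": 1001, "Date_of_Exit": date(2023, 1, 15)},
    {"Vendor_ID": 1002, "Date_of_Exit": date.today()},   # still active, not retired
]
vendor_contacts = [
    {"Vendor_ID": 1001, "Contact_Name": "Jane Doe", "Bank_Account": "00112233"},
    {"Vendor_ID": 1002, "Contact_Name": "John Roe", "Bank_Account": "44556677"},
]

def tokenize(value: str) -> str:
    # Placeholder token for illustration; the document describes AES-256-based tokenization.
    return f"TKN-{abs(hash(value)) % 10**8:08d}"

cutoff = date.today() - timedelta(days=RETENTION_DAYS)

# Driver column: Date_of_Exit decides whether the retention rule is met.
expired_ids = {v["Vendor_ID"] for v in vendors if v["Date_of_Exit"] <= cutoff}

# Connection column: Vendor_ID links every related row that must be de-identified.
for row in vendor_contacts:
    if row["Vendor_ID"] in expired_ids:
        row["Contact_Name"] = tokenize(row["Contact_Name"])
        row["Bank_Account"] = tokenize(row["Bank_Account"])

print(vendor_contacts)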
Mage supports right-to-be-forgotten and data retention requirements through our iRetire module. Our data minimization product helps customers de-identify or delete the specific record of an individual who has exercised the right to be forgotten or erasure. This is a separate module that customers will need to purchase.
Complete automation of data subject rights covers everything from informing the end customer about what data was held on their systems to informing users about what data has been de-identified or retired. Customers can create retention rules based on geography and comply with regulations.
Mage uses standard AES-256-based algorithms to tokenize the data, retaining the transactional information for the retention period and deleting the record once the retention period has elapsed.
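A minimal sketch of AES-256-based tokenization of a sensitive value (illustrative only; it uses the widely available Python cryptography package, and key management, token storage, and the retention workflow are outside this snippet and not Mage's implementation):

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# 32-byte key -> AES-256; in practice this would come from a managed key store.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

def tokenize(sensitive_value: str) -> bytes:
    """Encrypt the value into an opaque token; the original is recoverable only with the key."""
    nonce = os.urandom(12)                       # unique per tokenization
    ciphertext = aesgcm.encrypt(nonce, sensitive_value.encode("utf-8"), None)
    return nonce + ciphertext                    # store the nonce alongside the ciphertext

def detokenize(token: bytes) -> str:
    nonce, ciphertext = token[:12], token[12:]
    return aesgcm.decrypt(nonce, ciphertext, None).decode("utf-8")

token = tokenize("123-45-6789")
print(token.hex())        # what the store keeps during the retention period
print(detokenize(token))  # recoverable while the key exists; deleting the record
                          # (or key) after the retention period removes access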
The Mage masking solution masks data to produce realistic yet fictitious values, preserving the format between the de-identified and original data.
For example, “John” would be masked to “Jack” and not “Name01”. Similarly, the phone number “1234567890” would be masked to “9987654321” (ten digits), keeping the data characteristics so that the masked value passes all application validations.