Mage Data

Category: Blogs – Privacy Enhancing Techniques

  • Reimagining Test Data: Secure-by-Design Database Virtualization

    Reimagining Test Data: Secure-by-Design Database Virtualization

    Enterprises today are operating in an era of unprecedented data velocity and complexity. The demand for rapid software delivery, continuous testing, and seamless data availability has never been greater. At the same time, organizations face growing scrutiny from regulators, customers, and auditors to safeguard sensitive data across every environment—production, test, or development.

    This dual mandate of speed and security is reshaping enterprise data strategies. As hybrid and multi-cloud infrastructures expand, teams struggle to provision synchronized, compliant, and cost-efficient test environments fast enough to keep up with DevOps cycles. The challenge lies not only in how fast data can move, but in how securely it can be replicated, masked, and managed.

    Database virtualization was designed to solve two of the biggest challenges in Test Data Management—time and cost. Instead of creating multiple full physical copies of production databases, virtualization allows teams to provision lightweight, reusable database instances that share a common data image. This drastically reduces storage requirements and accelerates environment creation, enabling developers and QA teams to work in parallel without waiting for lengthy data refresh cycles. By abstracting data from its underlying infrastructure, database virtualization improves agility, simplifies DevOps workflows, and enhances scalability across hybrid and multi-cloud environments. In short, it brings speed and efficiency to an otherwise resource-heavy process—freeing enterprises to innovate faster.
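
    To make the sharing model concrete, here is a minimal, purely conceptual sketch of the copy-on-write idea that underlies database virtualization in general: every clone references one shared base image and stores only the blocks it changes. The class and field names are hypothetical, and this is an illustration of the technique, not Mage Data’s implementation.

    ```python
    # Conceptual copy-on-write "virtual clone": many clones share one base image
    # and each clone stores only the blocks it has modified. Names are hypothetical.

    class BaseImage:
        """A read-only snapshot of production data, shared by all clones."""
        def __init__(self, blocks: dict[int, bytes]):
            self.blocks = blocks  # block number -> block contents


    class VirtualClone:
        """A lightweight clone: reads fall through to the base image,
        writes are kept in a private delta map."""
        def __init__(self, base: BaseImage):
            self.base = base
            self.delta: dict[int, bytes] = {}  # only changed blocks are stored

        def read(self, block_no: int) -> bytes:
            return self.delta.get(block_no, self.base.blocks[block_no])

        def write(self, block_no: int, data: bytes) -> None:
            self.delta[block_no] = data  # copy-on-write: the base image is untouched

        def storage_used(self) -> int:
            return sum(len(b) for b in self.delta.values())


    if __name__ == "__main__":
        base = BaseImage({i: b"x" * 8192 for i in range(1000)})  # ~8 MB "production" image
        dev, qa = VirtualClone(base), VirtualClone(base)         # two instant "environments"
        dev.write(42, b"dev change")
        print(dev.read(42), qa.read(42)[:4])          # dev sees its change; qa still sees base data
        print(dev.storage_used(), qa.storage_used())  # only the delta consumes extra storage
    ```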

    Database virtualization was introduced to address inefficiencies in provisioning and environment management. It promised faster test data creation by abstracting databases from their underlying infrastructure. But for many enterprises, traditional approaches have failed to evolve alongside modern data governance and privacy demands.

    Typical pain points include:

    • Storage-Heavy Architectures: Conventional virtualization still relies on partial or full data copies, consuming vast amounts of storage.
    • Slow, Manual Refresh Cycles: Database provisioning often depends on DBAs, leading to delays, inconsistent refreshes, and limited automation.
    • Fragmented Data Privacy Controls: Sensitive data frequently leaves production unprotected, exposing organizations to compliance violations.
    • Limited Integration: Many solutions don’t integrate natively with CI/CD or hybrid infrastructures, making automated delivery pipelines cumbersome.
    • Rising Infrastructure Costs: With exponential data growth, managing physical and virtual copies across clouds and data centers drives up operational expenses.

    The result is an environment that might be faster than before—but still insecure, complex, and costly. To thrive in the AI and automation era, enterprises need secure-by-design virtualization that embeds compliance and efficiency at its core.

    Modern data-driven enterprises require database virtualization that does more than accelerate. It must automate security, enforce privacy, and scale seamlessly across any infrastructure—cloud, hybrid, or on-premises.

    This is where Mage Data’s Database Virtualization (DBV) sets a new benchmark. Unlike traditional tools that treat masking and governance as secondary layers, Mage Data Database Virtualization builds them directly into the virtualization process. Every virtual database created is masked, compliant, and policy-governed by default—ensuring that sensitive information never leaves production unprotected.

    Database Virtualization’s lightweight, flexible architecture enables teams to provision virtual databases in minutes, without duplicating full datasets or requiring specialized hardware. It’s a unified solution that accelerates innovation while maintaining uncompromising data privacy and compliance. Key capabilities include:

    1. Instant, Secure Provisioning
      Create lightweight, refreshable copies of production databases on demand. Developers and QA teams can access ready-to-use environments instantly, reducing cycle times from days to minutes.
    2. Built-In Data Privacy and Compliance
      Policy-driven masking ensures that sensitive data remains protected during every clone or refresh. Mage Data Database Virtualization is compliance-ready with frameworks like GDPR, HIPAA, and PCI-DSS, ensuring enterprises maintain regulatory integrity across all environments.
    3. Lightweight, Flexible Architecture
      With no proprietary dependencies or hardware requirements, Database Virtualization integrates effortlessly into existing IT ecosystems. It supports on-premises, cloud, and hybrid infrastructures, enabling consistent management across environments.
    4. CI/CD and DevOps Integration
      DBV integrates natively with Jenkins, GitHub Actions, and other automation tools, empowering continuous provisioning within DevOps pipelines (see the sketch after this list).
    5. Cost and Operational Efficiency
      By eliminating full physical copies, enterprises achieve up to 99% storage savings and dramatically reduce infrastructure, cooling, and licensing costs. Automated refreshes and rollbacks further cut manual DBA effort.
    6. Time Travel and Branching (Planned)
      Upcoming capabilities will allow enterprises to rewind databases or create parallel branches, enabling faster debugging and parallel testing workflows.
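
    To illustrate point 4, the sketch below shows a pipeline step (callable from Jenkins, GitHub Actions, or any CI runner) that requests a masked virtual database over a REST call and hands the connection string to the test stage. The endpoint, payload fields, and response keys are hypothetical placeholders, not Mage Data’s documented API.

    ```python
    # Hypothetical CI step: provision a masked virtual database before tests run.
    # The endpoint and payload below are illustrative placeholders, not a documented API.
    import json
    import os
    import urllib.request

    DBV_API = os.environ.get("DBV_API", "https://dbv.example.internal/api/v1/clones")
    API_TOKEN = os.environ["DBV_API_TOKEN"]  # injected as a CI secret

    def provision_clone(source_db: str, masking_policy: str) -> dict:
        payload = json.dumps({
            "source": source_db,
            "maskingPolicy": masking_policy,   # masking is applied at clone time
            "ttlHours": 8,                     # clone is disposable and auto-expires
        }).encode()
        req = urllib.request.Request(
            DBV_API,
            data=payload,
            headers={"Authorization": f"Bearer {API_TOKEN}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        clone = provision_clone("orders_prod", "pci-default")
        # Hand the connection string to the next pipeline stage (e.g. via an env file).
        with open("clone.env", "w") as f:
            f.write(f"TEST_DB_URL={clone['connectionString']}\n")
        print("Provisioned masked clone:", clone.get("id"))
    ```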

    The AI-driven enterprise depends on speed—but the right kind of speed: one that doesn’t compromise security or compliance. Mage Data Database Virtualization delivers precisely that. By uniting instant provisioning, storage efficiency, and embedded privacy, it transforms database virtualization from a performance tool into a strategic enabler of governance, innovation, and trust.

    As enterprises evolve to meet the demands of accelerating development, they must modernize their entire approach to data handling—adapting for an AI era where agility, accountability, and assurance must coexist seamlessly.

    Mage Data’s Database Virtualization stands out as the foundation for secure digital transformation—enabling enterprises to accelerate innovation while ensuring privacy and compliance by design.

  • Building Trust in AI: Strengthening Data Protection with Mage Data

    Building Trust in AI: Strengthening Data Protection with Mage Data

    Artificial Intelligence is transforming how organizations analyze, process, and leverage data. Yet, with this transformation comes a new level of responsibility. AI systems depend on vast amounts of sensitive information — personal data, intellectual property, and proprietary business assets — all of which must be handled securely and ethically.

    Across industries, organizations are facing a growing challenge: how to innovate responsibly without compromising privacy or compliance. The European Commission’s General-Purpose AI Code of Practice (GPAI Code), developed under the EU AI Act, provides a structured framework for achieving this balance. It defines clear obligations for AI model providers under Articles 53 and 55, focusing on three key pillars — Safety and Security, Copyright Compliance, and Transparency.

    However, implementing these requirements within complex data ecosystems is not simple. Traditional compliance approaches often rely on manual audits, disjointed tools, and lengthy implementation cycles. Enterprises need a scalable, automated, and auditable framework that bridges the gap between regulatory expectations and real-world data management practices.

    Mage Data Solutions provides that bridge. Its unified data protection platform enables organizations to operationalize compliance efficiently — automating discovery, masking, monitoring, and lifecycle governance — while maintaining data utility and accelerating AI innovation.

    The GPAI Code establishes a practical model for aligning AI system development with responsible data governance. It is centered around three pillars that define how providers must build and manage AI systems.

    1. Safety and Security
      Organizations must assess and mitigate systemic risks, secure AI model parameters through encryption, protect against insider threats, and enforce multi-factor authentication across access points.
    2. Copyright Compliance
      Data sources used in AI training must respect intellectual property rights, including automated compliance with robots.txt directives and digital rights management. Systems must prevent the generation of copyrighted content (a minimal robots.txt check is sketched after this list).
    3. Transparency and Documentation
      Providers must document their data governance frameworks, model training methods, and decision-making logic. This transparency ensures accountability and allows regulators and stakeholders to verify compliance.
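
    To ground the copyright point in item 2, here is a minimal sketch of an automated robots.txt check using Python’s standard-library urllib.robotparser. The crawler name and URL are examples only; a real training-data pipeline would layer licensing and rights-management checks on top of a gate like this.

    ```python
    # Minimal robots.txt compliance gate for a training-data crawler.
    # Uses only the Python standard library; the user agent and URL are examples.
    from urllib.robotparser import RobotFileParser
    from urllib.parse import urlparse

    def allowed_to_fetch(url: str, user_agent: str = "ExampleTrainingBot") -> bool:
        """Return True only if the site's robots.txt permits this user agent to fetch the URL."""
        parts = urlparse(url)
        robots_url = f"{parts.scheme}://{parts.netloc}/robots.txt"
        parser = RobotFileParser()
        parser.set_url(robots_url)
        try:
            parser.read()          # fetch and parse robots.txt
        except OSError:
            return False           # fail closed if robots.txt is unreachable
        return parser.can_fetch(user_agent, url)

    if __name__ == "__main__":
        url = "https://example.com/articles/some-page.html"
        if allowed_to_fetch(url):
            print("Permitted by robots.txt - safe to include in the crawl")
        else:
            print("Disallowed by robots.txt - skip this source")
    ```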

    These pillars form the foundation of the EU’s AI governance model. For enterprises, they serve as both a compliance obligation and a blueprint for building AI systems that are ethical, explainable, and secure.

    Mage Data’s platform directly maps its data protection capabilities to the GPAI Code’s requirements, allowing organizations to implement compliance controls across the full AI lifecycle — from data ingestion to production monitoring.

    Each entry below pairs a GPAI requirement and the Mage Data capability that addresses it with the resulting compliance outcome:

    • Safety & Security (Article 53) – Sensitive Data Discovery: Automatically identifies and classifies sensitive information across structured and unstructured datasets, ensuring visibility into data sources before training begins.
    • Safety & Security (Article 53) – Static Data Masking (SDM): Anonymizes training data using over 60 proven masking techniques, ensuring AI models are trained on de-identified yet fully functional datasets (a simplified discovery-and-masking sketch follows this list).
    • Safety & Security (Article 53) – Dynamic Data Masking (DDM): Enforces real-time, role-based access controls in production systems, aligning with Zero Trust security principles and protecting live data during AI operations.
    • Copyright Compliance (Article 55) – Data Lifecycle Management: Automates data retention, archival, and deletion processes, ensuring compliance with intellectual property and “right to be forgotten” requirements.
    • Transparency & Documentation (Article 55) – Database Activity Monitoring: Tracks every access to sensitive data, generates audit-ready logs, and produces compliance reports for regulatory or internal review.
    • Transparency & Accountability – Unified Compliance Dashboard: Provides centralized oversight for CISOs, compliance teams, and DPOs to manage policies, monitor controls, and evidence compliance in real time.
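
    As a deliberately simplified illustration of what the first two rows describe, the sketch below runs a regex-based discovery pass and then statically masks the columns it flags. The patterns and the masking rule are toy examples, not the classification logic or the 60+ masking techniques of Mage Data’s platform.

    ```python
    # Toy illustration of sensitive-data discovery followed by static masking.
    # The regex patterns and masking rule are simplified examples only.
    import re

    PATTERNS = {
        "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    }

    def discover(records: list[dict]) -> dict[str, set[str]]:
        """Return column -> set of sensitive data types found in that column."""
        findings: dict[str, set[str]] = {}
        for row in records:
            for column, value in row.items():
                for label, pattern in PATTERNS.items():
                    if isinstance(value, str) and pattern.search(value):
                        findings.setdefault(column, set()).add(label)
        return findings

    def mask(records: list[dict], findings: dict[str, set[str]]) -> list[dict]:
        """Statically mask every flagged column (here: redact all but the last 4 characters)."""
        masked = []
        for row in records:
            new_row = dict(row)
            for column in findings:
                if column in new_row:
                    value = str(new_row[column])
                    new_row[column] = "*" * max(len(value) - 4, 0) + value[-4:]
            masked.append(new_row)
        return masked

    if __name__ == "__main__":
        data = [{"name": "Alex", "contact": "alex@example.com", "tax_id": "987-65-4321"}]
        found = discover(data)       # {'contact': {'email'}, 'tax_id': {'ssn'}}
        print(found)
        print(mask(data, found))     # de-identified copy suitable for test or training use
    ```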

    By aligning these modules to the AI Code’s compliance pillars, Mage Data helps enterprises demonstrate accountability, ensure privacy, and maintain operational efficiency.

    Mage Data enables enterprises to transform data protection from a compliance requirement into a strategic capability. The platform’s architecture supports high-scale, multi-environment deployments while maintaining governance consistency across systems.

    Key advantages include:

    • Accelerated Compliance: Achieve AI Act alignment faster than traditional, fragmented methods.
    • Integrated Governance: Replace multiple point solutions with a unified, policy-driven platform.
    • Reduced Risk: Automated workflows minimize human error and prevent data exposure.
    • Proven Scalability: Secures over 2.5 billion data rows and processes millions of sensitive transactions daily.
    • Regulatory Readiness: Preconfigured for GDPR, CCPA, HIPAA, PCI-DSS, and EU AI Act compliance.

    This integrated approach enables security and compliance leaders to build AI systems that are both trustworthy and operationally efficient — ensuring every stage of the data lifecycle is protected and auditable.

    Mage Data provides a clear, step-by-step implementation plan.

    This structured approach takes the guesswork out of compliance and ensures organizations are always audit-ready.

    The deadlines for AI Act compliance are approaching quickly. Delaying compliance not only increases costs but also exposes organizations to risks such as:

    • Regulatory penalties that impact global revenue.
    • Data breaches that erode brand trust.
    • Missed opportunities, as competitors who comply early gain a reputation for trustworthy, responsible AI.

    By starting today, enterprises can turn compliance from a burden into a competitive advantage.

    The General-Purpose AI Code of Practice sets high standards, but meeting them doesn’t have to be slow or costly. With Mage Data’s proven platform, organizations can achieve compliance in weeks, not years — all while protecting sensitive data, reducing risks, and supporting innovation.

    AI is the future. With Mage Data, enterprises can embrace it responsibly, securely, and confidently.

    Ready to get started? Contact Mage Data for a free compliance assessment and see how we can help your organization stay ahead of the curve.

  • What is Homomorphic Encryption and How It’s Used

    What is Homomorphic Encryption and How It’s Used

    Most data encryption protects data that is either at rest or in transit. Most security experts do not consider encryption a viable option for data in use because it is hard to process and analyze encrypted data. As the need for privacy and security increases, however, organizations increasingly want to keep data encrypted even while it is in use. Encrypting data and using it at the same time is not an easy task. Enter homomorphic encryption.

    What Is Homomorphic Encryption?

    Homomorphic encryption is an emerging type of encryption that allows users or systems to perform operations using encrypted data (without decrypting it first). The result of the operation is also encrypted. Once the result is decrypted, however, it will be exactly the same as it would have been were it computed with the unencrypted data.
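
    A toy example makes the definition tangible. Textbook (unpadded) RSA happens to be multiplicatively homomorphic: multiplying two ciphertexts produces a ciphertext of the product of the plaintexts. The sketch below uses tiny hard-coded primes purely for illustration; it is insecure by design and is not how production homomorphic encryption libraries work.

    ```python
    # Toy demonstration of a homomorphic property: textbook RSA is multiplicatively
    # homomorphic, so E(a) * E(b) mod n decrypts to a * b. Tiny primes, insecure, demo only.

    p, q = 61, 53
    n = p * q                  # 3233
    phi = (p - 1) * (q - 1)    # 3120
    e = 17                     # public exponent, coprime with phi
    d = pow(e, -1, phi)        # private exponent (modular inverse), 2753

    def encrypt(m: int) -> int:
        return pow(m, e, n)

    def decrypt(c: int) -> int:
        return pow(c, d, n)

    a, b = 12, 7
    ca, cb = encrypt(a), encrypt(b)

    # Multiply the ciphertexts WITHOUT decrypting them first.
    c_product = (ca * cb) % n

    assert decrypt(c_product) == (a * b) % n   # same answer as computing on the plaintexts
    print(decrypt(c_product))                  # -> 84
    ```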

    When Should Homomorphic Encryption Be Used?

    Thanks to homomorphic encryption, organizations are able to use cloud computing in external environments while keeping the data there encrypted the entire time. That is, third parties can handle sensitive data without compromising the security or privacy of that data. If the third party becomes compromised in any way, the data will still be secure, because it is never decrypted while it is with the third party.

    Before, it was impossible to outsource certain data processing tasks because of privacy concerns. Because it was necessary to decrypt data to perform computations, the data would be exposed while in use. Homomorphic encryption addresses those concerns. This is a game changer for organizations in a wide variety of industries.

    For example, homomorphic encryption allows healthcare providers to outsource private medical data for computation and analysis. The benefits of homomorphic encryption are not limited to healthcare. As regulations like GDPR become more common and more strict, it becomes crucial to protect personal data at all times, even while performing data analysis on it.

    Is Homomorphic Encryption Practical?

    Homomorphic encryption has been theoretically possible for a long time. The first fully homomorphic encryption schemes are already more than 10 years old. The problem is that the process requires an immense amount of computing power. The herculean effort that goes into this particular type of encryption has prevented it from becoming a viable option for most organizations.

    Now, though, an immense amount of computing power is not as hard to come by as it used to be. We are still not seeing much homomorphic encryption adoption just yet, but more organizations are taking interest.

    Expect to see it become a hot new opportunity in cybersecurity circles as homomorphic encryption becomes more necessary and more attainable at the same time. (The increased necessity comes from strict new rules for data privacy, while the increased attainability comes from the exponential growth of computing power.)

    Partially Homomorphic vs. Fully Homomorphic Encryption

    There are multiple types of homomorphic encryption schemes. At two ends of the spectrum, cybersecurity experts classify these schemes as partially homomorphic or fully homomorphic. As this type of encryption becomes more viable, people are finding new ways to classify it, introducing new categories between partially and fully homomorphic.

    Currently, we talk about homomorphic encryption in the following ways:

    • Partially Homomorphic Encryption – The lowest level; supports only one type of operation (such as addition or multiplication)
    • Somewhat Homomorphic Encryption – Supports more than one type of operation, but only for a limited number of computations
    • Fully Homomorphic Encryption – Supports an unlimited number of computations on any number of ciphertexts

    As applications of homomorphic encryption become more plausible, expect to see greater nuance emerge. We will see pros and cons of homomorphic encryption that may not be apparent until there are more case studies.

    Potential Vulnerabilities of Homomorphic Encryption

    In March 2022, academics from North Carolina State University and Dokuz Eylul University collaborated to identify a vulnerability in homomorphic encryption. Specifically, researchers showed they could steal data during homomorphic encryption by using a side-channel attack.

    “We were not able to crack homomorphic encryption using mathematical tools,” said Aydin Aysu, an assistant professor of computer engineering at North Carolina State University. “Instead, we used side-channel attacks. Basically, by monitoring power consumption in a device that is encoding data for homomorphic encryption, we are able to read the data as it is being encrypted. This demonstrates that even next generation encryption technologies need protection against side-channel attacks.”

    Is Homomorphic Encryption Safe?

    Before this study scares you away from the potential of homomorphic encryption, it is worth noting a few things:

    1. The vulnerability discovered was only in Microsoft SEAL, an open-source implementation of homomorphic encryption technology.
    2. The researchers were studying versions of Microsoft SEAL released before December 3, 2020. Later versions of the product have replaced the algorithm that created the vulnerability.
    3. The academics did not conclude that this type of homomorphic encryption was entirely unsafe, only that it needed protection from side-channel attacks. And there are established ways to protect against side-channel attacks.

    Does this mean modern homomorphic encryption is necessarily invulnerable? No. However, the results of this study are not cause for excessive concern. One big takeaway is that the vulnerability in software from 2020 was not discovered until 2022, when newer versions had already corrected the problem. With commitment to an evolving cybersecurity plan, companies can stay a step ahead of hackers (and academic researchers).

    Assistant Professor Aysu seems confident about the future of homomorphic encryption, as long as organizations also take additional precautions. “As homomorphic encryption moves forward, we need to ensure that we are also incorporating tools and techniques to protect against side-channel attacks,” he says.

    How to Use Homomorphic Encryption

    There are multiple open source homomorphic encryption libraries, and Microsoft SEAL is the most common. It was developed by the Microsoft Research Cryptography Research Group. More cybersecurity experts are becoming interested in homomorphic encryption, and it is getting faster.

    For now, though, it still is not the best option for most organizations. Upon comparing the differences between encryption, tokenization, and masking, most find that masking is currently the best option for data in use.

  • The Comparative Advantages of Encryption vs. Tokenization vs. Masking

    The Comparative Advantages of Encryption vs. Tokenization vs. Masking

    Any company that handles data (especially any company that handles personal data) will need a method for de-identifying (anonymizing) that data. Any technology for doing so will involve trade-offs. The various methods of de-identification—encryption, tokenization, and masking—will navigate those trade-offs differently.

    This fact has two important consequences. First, the decision of which method to use, and when, has to be made carefully. One must take into consideration the trade-offs between (for example) performance and usability. Second, companies that traffic in data all the time will want a security solution that provides all three options, allowing the organization to tailor their security solution to each use case.

    We’ve previously discussed some of the main differences among encryption, tokenization, and masking; the next step is to look more closely at these trade-offs and the subsequent use cases for each type of anonymization.

    The Security Trade-Off Triangle

    Three of the main qualities needed in a data anonymization solution are security, usability, and performance. We can think of these as forming a triangle; as one gets closer to any one quality, one is likely going to have to trade off the other two.

    Security (Data Re-Identification)

    Security is, of course, the main reason for anonymizing data in the first place. The way in which the various methods differ is in the ease with which data can be de-anonymized—that is, how easy it is for a third party to take a data set and re-identify the items in that set.

    A great example of such re-identification came from a news story several years ago, when data from a New York-based cab company was released under the Freedom of Information Act. That data, which covered over 173 million individual trips and included information about the cab drivers, had been anonymized using a common technique called hashing. A third party was able to prove that the data could be very easily re-identified—and with a little work, a clever hacker could even infer things like individual cab drivers’ salaries and where they lived.

    A good way to measure the relative security of a process like encryption, tokenization, or masking, then, is to assess how difficult re-identification of the data would be.

    Usability (Analytics)

    The more that a bit of data can be changed, the less risk there is for re-identification. But this also means that the pieces of data lose any kind of relationship to each other, and hence any pattern. The more the pattern is lost, the less useful that data is when doing analysis.

    Take a standard 9-digit Social Security number, for example. We could replace each digit with a single character, say XXXXXXXXX or 999999999. This is highly secure, but a database full of Xs will not reveal any useful patterns. In fact, it won’t even be clear that the data are numeric.

    Now consider the other extreme, where we simply increase a single digit by 1. Thus, the Social Security number 987-65-4321 becomes 987-65-4322. In this case, much of the information is preserved. Each unique Social Security number in the database will preserve its relations with other numbers and other pieces of data. The downside is that the algorithm is easily cracked, and the data becomes easily reversible.
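
    The two extremes above can be written out in a few lines, which makes the trade-off easy to see: full redaction destroys every pattern in the data, while a trivially reversible transformation preserves the patterns but offers essentially no protection. Both functions are illustrative toys, not recommended techniques.

    ```python
    # The two extremes from the Social Security number example above.
    # Neither is a recommended technique; they only illustrate the trade-off.

    def redact(ssn: str) -> str:
        """Maximum security, minimum utility: every value collapses to the same string."""
        return "XXXXXXXXX"

    def shift_last_digit(ssn: str) -> str:
        """Maximum utility, minimum security: patterns survive, but the scheme is trivially reversed."""
        digits = ssn.replace("-", "")
        shifted = digits[:-1] + str((int(digits[-1]) + 1) % 10)
        return f"{shifted[:3]}-{shifted[3:5]}-{shifted[5:]}"

    ssns = ["987-65-4321", "987-65-4322"]
    print([redact(s) for s in ssns])            # ['XXXXXXXXX', 'XXXXXXXXX'] - relationships lost
    print([shift_last_digit(s) for s in ssns])  # ['987-65-4322', '987-65-4323'] - easily reversed
    ```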

    This is a problem for non-production environments, too. Sure, one can obtain test data using pseudo values generated by algorithms. But even in testing environments, one often needs a large volume of data that has the same complexity as real-world data. Pseudo data simply does not have that kind of complexity.

    Performance

    Security happens in the real world, not on paper. Any step added to a data process requires compute time and storage. It is easy for such costs to add up. Having many servers running to handle encryption, for example, will quickly become costly if encryption is being used for every piece of data sent.

    How Do Encryption, Tokenization, and Masking Compare?

    Again, setting the technical details aside for the moment, the major difference among these methods is the way in which each navigates the trade-offs in this triangle.

    Encryption

    Encryption is best suited for unstructured fields (though it also supports structured), or for databases that aren’t stored in multiple systems. It is also commonly used for protecting files and exchanging data with third parties.

    With encryption, performance varies depending on the time it takes to establish a TCP connection, plus the time for requesting and getting a response from the server. If these connections are being made in the same data center, or to servers that are very responsive, performance will not seem that bad. Performance will degrade, however, if the servers are remote, unresponsive, or simply busy handling a large number of requests.

    Thus, while encryption is a very good method for security of more sensitive information, performance can be an issue if you try to use encryption for all your data.

    Tokenization

    Tokenization is similar to encryption, except that the data in question is replaced by a random string of values (a token) instead of being modified by an algorithm. The relationship between the token and the original data is preserved in a table on a secure server. When the original data is needed, the application looks up the token in that table to retrieve it.

    Tokenization always preserves the format of the data, which helps with usability, while maintaining high security. It also tends to create less of a performance hit compared to encryption, though scaling can be an issue if the lookup table becomes too large. And unlike with encryption, sharing data with outside parties is tricky, because they, too, would need access to the same table.
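
    A minimal sketch of the mechanism just described: a token vault that hands out random, format-preserving tokens and keeps the token-to-original mapping in a lookup table. This is a conceptual illustration only; production tokenization systems add access controls, persistence, and hardened vault management.

    ```python
    # Conceptual token vault: random format-preserving tokens plus a lookup table
    # that maps each token back to the original value. Illustrative only.
    import secrets

    class TokenVault:
        def __init__(self):
            self._token_to_value: dict[str, str] = {}
            self._value_to_token: dict[str, str] = {}

        def tokenize(self, value: str) -> str:
            if value in self._value_to_token:          # reuse the token for repeat values
                return self._value_to_token[value]
            # Format-preserving token: same length, digits stay digits, letters stay letters.
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit()
                else secrets.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") if ch.isalpha()
                else ch
                for ch in value
            )
            self._token_to_value[token] = value
            self._value_to_token[value] = token
            return token

        def detokenize(self, token: str) -> str:
            return self._token_to_value[token]          # only the vault can reverse the mapping

    vault = TokenVault()
    card = "4111-1111-1111-1111"
    token = vault.tokenize(card)
    print(token)                            # e.g. "7305-9821-0047-6613" - same format, no real data
    print(vault.detokenize(token) == card)  # True, via the secure lookup table
    ```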

    Masking

    There are different types of masking, so it is hard to generalize across all of them. One of the more sophisticated approaches to masking is to replace data with pseudo data that nevertheless retains many aspects of the original data, so as to preserve its analytical value without much risk of re-identification.

    When done this way, masking tends to require fewer resources than encryption while retaining the highest data usability.

    Choosing on a Case-by-Case Basis

    So which method is appropriate for a given organization? That depends, of course, on the needs of the organization, the resources available, and the sensitivity of the data in question. But there need not be a single answer; the method used might vary depending on the specific use case.

    For example, consider a simple email system residing on internal on-premises servers. Encryption might be appropriate here, as the data are unstructured, the servers are nearby and dedicated to this purpose, and the need for security might well be high for some communications.

    But now consider an application in a testing environment that will need a large amount of “real-world-like” data. In this case, usability and performance are much more important, and so masking would make more sense.

    And all of this might change if, for example, you find yourself having to undergo a cloud migration.

    The way forward for larger organizations with many and various needs, then, is to find a vendor that can provide all three and help with applying the right techniques in the right circumstances. Here at Mage Data, we aim to gain an understanding of our clients’ data, its characteristics, and its use, so we can help them protect that data appropriately. For more about our anonymization and other security solutions, you can download a data sheet here.

  • Differences between Anonymization and Pseudonymization

    Differences between Anonymization and Pseudonymization

    Under the umbrella of various data protection methods are anonymization and pseudonymization. More often than not, these terms are used interchangeably. But with the introduction of laws such as the GDPR, it becomes necessary to distinguish the two techniques clearly, as anonymized data and pseudonymized data fall under different categories of the regulation. Moreover, this knowledge also helps organizations make an informed choice when selecting data protection methods.

    So, let’s break it down. Anonymization is the permanent replacement of sensitive data with unrelated characters; once data has been anonymized, it cannot be re-identified. Therein lies the difference between the two methods: in pseudonymization, the sensitive data is replaced in such a way that it can be re-identified with the help of an identifier (additional information). In short, while anonymization eliminates direct re-identification risk, pseudonymization substitutes the identifiable data with a reversible, consistent value.
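
    A small sketch, using hypothetical helper names, shows the practical difference: an anonymized value keeps no link back to the original, while a pseudonymized value can be restored by whoever holds the mapping (the “additional information” mentioned above).

    ```python
    # Illustrative contrast between anonymization (irreversible) and
    # pseudonymization (reversible with additional information). Helper names are hypothetical.
    import secrets

    def anonymize(value: str) -> str:
        """Replace the value with unrelated characters; no mapping is kept, so it cannot be reversed."""
        return "".join(secrets.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for _ in value)

    class Pseudonymizer:
        """Replace the value with a consistent pseudonym; the separately kept mapping allows re-identification."""
        def __init__(self):
            self._mapping: dict[str, str] = {}

        def pseudonymize(self, value: str) -> str:
            if value not in self._mapping:
                self._mapping[value] = f"ID-{secrets.token_hex(4)}"
            return self._mapping[value]

        def reidentify(self, pseudonym: str) -> str:
            return next(v for v, p in self._mapping.items() if p == pseudonym)

    name = "H. H. Munro"
    print(anonymize(name))                 # e.g. 'QKWPXRAGZNT' - random and unlinkable
    ps = Pseudonymizer()
    alias = ps.pseudonymize(name)
    print(alias, ps.reidentify(alias))     # e.g. 'ID-3f9a12bc' 'H. H. Munro'
    ```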

    However, it is essential to note that anonymization may sometimes carry the risk of indirect re-identification. For example, let’s say you picked up the novel The Open Window. The author’s name on the book is Saki. But this is a pen name. If you were to pick up another book of his, called The Chronicles of Clovis, you would notice that he has used his real name there, which is H. H. Munro, and that the writing style was similar. Hence, even though you didn’t know that the book was by Munro, you could put two and two together and find out that this is also a book by Saki based on the style of writing.

    The same example could also apply to a shopping experience, where you may not know the name of the customer who made a purchase but may be able to find out who it is if you can identify a consistent buying pattern. Every day for the past year, Alex has visited the Starbucks at 1500 Broadway at 10:10 am and ordered the same Tall Mocha Frappuccino. Hence, even if his personally identifiable information, such as name and address, has been anonymized or eliminated, his buying behavior still allows you to re-identify him. Therefore, organizations should be meticulous when they anonymize sensitive data, taking care to hide any additional information that might aid re-identification.

    There are a variety of methods available to anonymize data:

    • Directory replacement – modifying the individual’s name while maintaining consistency between values
    • Scrambling – obfuscating characters; the process can sometimes be reversible
    • Masking – hiding part of the data with random characters (for example, pseudonymizing identities)
    • Personalized anonymization – custom anonymization defined for a specific use
    • Blurring – making the meaning of data values obsolete or their re-identification impossible

    Pseudonymization methods include data encryption (changing the original data into ciphertext, reversible with a decryption key) and data masking (masking data while maintaining its usability for different functions). Organizations can select one or more techniques depending on the degree of risk and the intended use of the data.

    Mage Data approaches anonymization and pseudonymization with its leading-edge solutions, named Customers’ Choice 2020 by Gartner Peer Insights. To read more, visit Mage Data Static Data Masking and Mage Data Dynamic Data Masking.