Mage Data

Category: Blogs – TDM

  • Data Security Challenges in Healthcare Industry

    Data Security Challenges in Healthcare Industry

    The healthcare industry is constantly innovating and has made significant improvements in technology over the last couple of years to enhance patient treatment. For example, the use of AI has changed the game for hospitals around the world, helping physicians make smarter decisions at the point of care, improving the ease and accuracy of viewing patient scans, and reducing physician burnout.1

    As more and more of the healthcare sector goes digital, it unfortunately becomes a tempting target for cybercriminals. Like any other industry, healthcare has faced its share of mega data breaches, such as those at Banner Health and Newkirk Products, where close to 4 million people were affected.2 But unlike other industries, healthcare faces the highest average data breach cost of any sector, at $6.45 million. What’s even more alarming is that, at 329 days, it also takes the longest to identify and contain a breach.3

    Even with laws like HIPAA, which mandates strict standards and processes for the protection and confidential handling of PHI, compliance does not guarantee complete data security and, on its own, is not enough to protect hospitals from cybercrime.

    Challenges

    Let’s go through some of the major data security challenges faced by medical institutions:

    • Transfer of Electronic Health Records (EHRs)

    The Health Information Technology for Economic and Clinical Health (HITECH) Act encourages healthcare providers to adopt EHRs and Health Information Exchanges (HIEs) so that doctors can easily share data with their patients. However, this sprawling network of medical information flowing between numerous providers becomes a hotspot for hackers if it is not protected properly.

    • Maintaining compliance

    The HITECH Act offers incentives for EHR and HIE adoption. That said, it also creates the responsibility of maintaining compliance. For instance, healthcare providers are required to notify their patients if there is a breach of their unsecured data. In addition, healthcare institutions have to comply with HIPAA as well as other data protection regulations, such as the GDPR or the CCPA, whichever applies to them.

    • Inability of end users to protect medical information

    Apart from the compliance burden on medical providers, the adoption of EHRs also introduces the risk of end-user error. Once users access their medical data from the provider’s portal, the privacy of those records becomes their responsibility too. By sending unsecured data on to anyone else, a user opens an easy path for hackers to get through. While healthcare organizations are bound by data security laws, the same cannot be said for users, who often overlook data security best practices.

    • The adoption of digital platforms to store, access and transfer data

    The digital progression is evident as a growing number of hospitals move their resources to the cloud and to mobile platforms. The COVID-19 pandemic has also fundamentally changed the face of care provision across the world. Telehealth adoption in the US, for instance, has grown around 3,000% since the start of the crisis, taking much of primary care into people’s homes rather than tying it to a doctor’s office or hospital.4

    • Inefficient IT infrastructure

    Nobody said running a hospital would be cost efficient. In an episode of one of my favourite TV shows, Grey’s Anatomy, the chief of the hospital decides to cut back on fundamental necessities because the money has gone to expensive medical tech. Sadly, this is true for hospitals in the real world too. While spending adequately on something like IT infrastructure may seem like a tough decision, or unimportant compared to all the other crucial activities that go on in a healthcare organization, it is far better than facing the cost of a data breach.

    • Evolution of technology vis-à-vis the threat landscape

    As the healthcare sector continues to offer life-critical services while working to improve treatment and patient care with new technologies, criminals and cyber threat actors look to exploit the vulnerabilities that come with these changes. Apart from data breaches, the following are some of the sources of frustration for healthcare IT and cybersecurity specialists:5

    • Ransomware
    • DDoS attacks
    • Insider threats
    • Business email compromise
    • Fraud scams

    As healthcare institutions keep enhancing their technology, they inevitably increase their exposure to cyber risk. The COVID-19 outbreak has not provided any relief in this regard: the INTERPOL Cybercrime Threat Response team has detected a significant increase in attempted ransomware attacks against key organizations and infrastructure engaged in the virus response.6

    Solutions

    Technology is largely what enables cybercrime, but technology is also what is needed to thwart it. People and processes are not enough; organizations must put the right technology in place to build a strong data security posture.

    • Monitor user activity for all actions performed on sensitive data in your enterprise.
    • Choose from different methods, or a combination of techniques such as encryption, tokenization, Static Data Masking, and Dynamic Data Masking, to secure your data whether it is at rest, in use, or in motion. Before this step, sensitive data discovery is a must: if you don’t know where your data is, how will you protect it? (A minimal sketch of this discover-then-mask step follows this list.)
    • Deploy consistent and flexible data security approaches that protect sensitive data in high-risk applications without compromising the application architecture.
    • Your data security platform should be scalable and well-integrated, consistent across all data sources, and span both production and non-production environments.
    • Finally, ensure the technology you’re implementing is well-integrated with existing data protection tools for efficient compliance reporting and breach notifications.
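
    To make the discovery and masking recommendations above concrete, here is a minimal, hypothetical sketch in Python. It is not Mage Data’s implementation; the regex patterns, field names, and masking rule are assumptions chosen for illustration, and a real platform would use far richer classification and format-preserving techniques.

    ```python
    # Illustrative "discover, then mask" sketch. Patterns, field names, and the
    # masking rule are assumptions for demonstration purposes only.
    import re

    # Very simple discovery rules: data class -> pattern.
    DISCOVERY_PATTERNS = {
        "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    }

    def discover(record: dict) -> dict:
        """Return a map of field name -> sensitive data classes found in it."""
        findings = {}
        for field, value in record.items():
            classes = [name for name, pattern in DISCOVERY_PATTERNS.items()
                       if pattern.search(str(value))]
            if classes:
                findings[field] = classes
        return findings

    def mask(record: dict, findings: dict) -> dict:
        """Statically mask every field flagged by discovery (irreversible)."""
        masked = dict(record)
        for field in findings:
            masked[field] = re.sub(r"\w", "X", str(masked[field]))
        return masked

    if __name__ == "__main__":
        patient = {"name": "Jane Doe", "ssn": "123-45-6789", "email": "jane@example.com"}
        found = discover(patient)      # {'ssn': ['US_SSN'], 'email': ['EMAIL']}
        print(mask(patient, found))    # SSN and email replaced with X characters
    ```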

    Conclusion

    Cybercrime is a menacing threat for any industry, but more so for the healthcare sector, given its high breach costs and the long time it takes to identify and contain a breach. The consequences of information theft are too great a risk, especially given the ethical commitment medical providers share with their patients. Building a robust data security platform should be a principal goal of any hospital.

    About Mage Data

    The Mage Data platform is a comprehensive solution that protects sensitive data across its lifecycle in the customer’s systems, providing capabilities from Sensitive Data Discovery, masking, and monitoring through to data retirement. Engineered with a unique, scalable architecture and built-in separation of duties, it delivers comprehensive, consistent, and reliable data and application security across data sources (mainframe, relational databases, unstructured data, big data, on-premises, and cloud).

    How a leading healthcare company in the US is effectively handling data security

    A leading provider of hospital medicine and related facility-based services ran an Oracle environment storing information for more than 2,000 providers across 1,500 facilities. Because of the time required to manage the Oracle data masking tool that had been in place for two and a half years, they looked to the market for a data masking solution that offered ease of use and full automation.

    The organization noted several advantages to using the Mage Data platform instead of Oracle DM, one of the main ones being the time required to implement and run the software. Apart from gaining a fully automated anonymization solution, the organization was also able to uncover many hidden sensitive data locations with the Mage Data sensitive data discovery tool.
    Click here to read more: Mage Data Customer Success Stories

    References

    1 Cleveland Clinic Newsroom – Cleveland Clinic Unveils Top 10 Medical Innovations for 2019
    2 Digital Guardian Data Insider – Top 10 Biggest Healthcare Data Breaches of All Time
    3 Ponemon Institute – Cost of a Data Breach Report, 2019
    4 Healthcare IT News – Digital transformation in the time of COVID-19
    5 Center for Internet Security (CIS) – Cyber Attacks: In the Healthcare Sector
    6 Forbes – Cyber Attacks Against Hospitals Have ‘Significantly Increased’ As Hackers Seek To Maximize Profits

  • Test Data Management Best Practices

    Test Data Management Best Practices

    Do your test data environments put Production data at risk of exposure?

    Since test data environments usually require real-world data to tackle complex issues, issues that may not be reproducible with fake data, they present one of the most significant security risks to sensitive data. Credentials may not be as secure as in Production, and access may not be monitored as stringently; in some cases there is simply too much access. And unauthorized access can reveal troves of production data, or other information that provides a foothold for broader access to protected data or systems.

    So how do we enable effective Test Data Management while minimizing risk?

    First of all, everyone likely agrees that access should follow the principle of least privilege (access limited to the test environment, and nothing else). Combine that with two-factor authentication as a second line of defense. So far, no problem.

    Second, don’t use real data (or mask it if you can’t avoid it).

    You have some useful options for minimizing the risks of loading real data into a test data environment. Data subsetting and data virtualization both reduce risk while enabling efficiency; test data generation lets you avoid loading real data altogether; and data masking allows you to protect the real data where you must use it. Let’s take a look at these options.

    • Data subsetting consists of taking a subset (usually much smaller than the whole) from one or more production databases. The small size is a significant advantage, since it makes both test data distribution and testing much faster than with a complete database clone. There are some challenges with this approach: you must have a way of ensuring that your subset is representative of your entire dataset, and it must be referentially intact. And it still exposes Production data.
    • Data virtualization has, at its core, a motivation similar to data subsetting: take large production databases and make them efficient to distribute and test against. However, where subsetting does this by reducing the amount of data, virtualization leaves the data in different types of data models and integrates it virtually; it doesn’t replicate data from source systems, but only stores the integration logic needed to view it. So there is still some risk in this method.
    • Manual test data generation can be a tedious and time-consuming process; additionally, it can be difficult to manually ensure that all the attributes needed to make the data “testable” are present.
    • Finally, synthetic data generation breaks with data subsetting and data virtualization by disregarding your production data entirely. Instead, it allows you to create your own “synthetic” test data. This test data will look real, and will be representative of your production data, while being entirely fake. The biggest obstacle is achieving this while making sure your test data covers the range of relevant test cases. A secondary concern is avoiding making the process so laborious that it loses any benefit over the manual creation of test data. (A rough sketch of this approach follows this list.)
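
    As a rough illustration of the synthetic option, the sketch below generates fake but realistic-looking records using only Python’s standard library. The field names and value pools are invented for this example; a real generator would be driven by your actual schema and the test cases you need to cover.

    ```python
    # A rough sketch of synthetic test data generation with the standard library.
    # Field names and value pools are invented; they are not tied to any real schema.
    import random
    import string
    from datetime import date, timedelta

    FIRST_NAMES = ["Alex", "Jordan", "Sam", "Riya", "Chen", "Maria"]
    LAST_NAMES = ["Smith", "Garcia", "Patel", "Nguyen", "Okafor", "Klein"]

    def synthetic_patient(record_id: int) -> dict:
        """Produce one fake patient record that looks real but maps to no one."""
        birth = date(1950, 1, 1) + timedelta(days=random.randint(0, 25000))
        return {
            "id": record_id,
            "name": f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}",
            "dob": birth.isoformat(),
            "mrn": "".join(random.choices(string.digits, k=8)),  # record-number-like format
            "email": f"user{record_id}@example.com",
        }

    test_data = [synthetic_patient(i) for i in range(1, 101)]  # 100 fake records
    ```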

    Each of these options has a drawback that, when you are just trying to get the job done, may mean loading real (production) data into your test data environment. And even with data subsetting and data virtualization, you will be distributing significant quantities of production data to your testers and leaving it exposed to unauthorized access.
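
    To show what the subsetting option from the list above looks like in practice, here is a minimal sketch against a hypothetical SQLite schema with patients and visits tables (the table and column names are assumptions for illustration). It also illustrates the drawback just mentioned: every row copied is still real production data, and naive sampling is exactly where representativeness becomes hard.

    ```python
    # Minimal sketch of referentially intact subsetting over a hypothetical schema:
    # patients(id, name, dob) and visits(id, patient_id -> patients.id, visit_date, notes).
    import sqlite3

    def build_subset(source_path: str, target_path: str, one_in_every: int = 20) -> None:
        """Copy roughly 1 in `one_in_every` patients, plus only their visits."""
        src = sqlite3.connect(source_path)
        dst = sqlite3.connect(target_path)
        dst.executescript("""
            CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, dob TEXT);
            CREATE TABLE visits   (id INTEGER PRIMARY KEY,
                                   patient_id INTEGER REFERENCES patients(id),
                                   visit_date TEXT, notes TEXT);
        """)

        # Sample parent rows first, so every child row copied has its parent present.
        patients = src.execute(
            "SELECT id, name, dob FROM patients WHERE id % ? = 0", (one_in_every,)
        ).fetchall()
        dst.executemany("INSERT INTO patients VALUES (?, ?, ?)", patients)

        sampled_ids = [row[0] for row in patients]
        if sampled_ids:
            placeholders = ",".join("?" for _ in sampled_ids)
            visits = src.execute(
                "SELECT id, patient_id, visit_date, notes FROM visits "
                f"WHERE patient_id IN ({placeholders})", sampled_ids
            ).fetchall()
            dst.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", visits)

        dst.commit()
        src.close()
        dst.close()
    ```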

    Anonymizing the data is the gold standard in these cases. To make anonymization (masking) successful, these key considerations must be kept in mind:

    1. Sensitive data discovery: apply a comprehensive discovery solution to find all of the data that needs to be masked.
    2. Referential integrity: ensure consistency and functionality of data instances during roll-out, and consistent masking of the data itself across applications and databases (a simplified sketch of consistent masking follows this list).
    3. Data for testing: developers and testers DO NOT need to see the real data. What they do require, however, is realistic data, which preserves formats and passes validations.
    4. Efficiency: to ensure efficiency in the masking process, consider performance constraints, security policies, and environmental limitations.
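
    The referential integrity point above is often met with deterministic (consistent) masking: the same input always produces the same masked value, so joins across tables and databases still line up after masking. The keyed-HMAC approach below is a simplified sketch of that idea, not a description of any vendor’s algorithm; the key and the digit-mapping trick are assumptions for illustration.

    ```python
    # Simplified sketch of consistent (deterministic) masking. The keyed HMAC and
    # digit-mapping trick are illustrative assumptions, not any vendor's method.
    import hmac
    import hashlib

    MASKING_KEY = b"replace-with-a-securely-stored-secret"

    def consistent_mask(value: str, length: int = 9, key: bytes = MASKING_KEY) -> str:
        """Map a sensitive value to a stable, digits-only pseudonym (format-friendly)."""
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        # Convert part of the digest to digits so the result resembles an ID number.
        return str(int(digest[:15], 16))[:length].zfill(length)

    # The same SSN masks to the same token wherever it appears, preserving joins.
    token_a = consistent_mask("123-45-6789")
    token_b = consistent_mask("123-45-6789")
    assert token_a == token_b
    ```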

    A note of warning: home-grown scripts for data masking are the path of least resistance, but they are not the most effective. They generally fail to eliminate all sensitive data and, worse, can cause inconsistencies in masking rollouts.

    Conclusion

    Unless you are using synthetically generated data, you will need to a) find and b) anonymize any sensitive information within your test data before distributing it to your testers. This is usually achieved via comprehensive Sensitive Data Discovery and Static Data Masking capabilities, respectively. Dynamic Data Masking and encryption may also be used as ancillary capabilities to complete the toolkit. There is no reason to expose data, even in subsets, when anonymization can create a realistic and useful test data environment.

    About Mage Data

    The Mage Data Test Data Management solution includes integrated and comprehensive Discovery, Static Data Masking, and Dynamic Data Masking capabilities, along with a data subsetting option. Additionally, with Mage Data Identities you can create generalized data sets from internal or external data sources, a process that is far more efficient and functionally capable than synthetic data generation. To read more about the Test Data Management market and vendors, download the Bloor TDM market update.

  • Differences between Anonymization and Pseudonymization

    Differences between Anonymization and Pseudonymization

    Anonymization and pseudonymization both fall under the umbrella of data protection methods, and more often than not the terms are used interchangeably. But with the introduction of laws such as the GDPR, it has become necessary to distinguish the two techniques clearly, as anonymized data and pseudonymized data fall under different categories of the regulation. This knowledge also helps organizations make an informed choice when selecting data protection methods.

    So, let’s break it down. Anonymization is the permanent replacement of sensitive data with unrelated characters; once data has been anonymized, it cannot be re-identified, and therein lies the difference between the two methods. In pseudonymization, sensitive data is replaced in such a way that it can be re-identified with the help of an identifier (additional information). In short, while anonymization eliminates direct re-identification risk, pseudonymization substitutes the identifiable data with a reversible, consistent value.

    However, it is essential to note that anonymization may still carry the risk of indirect re-identification. For example, say you picked up the novel The Open Window. The author’s name on the book is Saki, but this is a pen name. If you then picked up another of his books, The Chronicles of Clovis, you would notice that he used his real name there, H. H. Munro, and that the writing style is similar. Even though the two books carry different names, you could put two and two together and work out, from the style of writing, that Saki and Munro are the same person.

    The same idea applies to a shopping experience, where you may not know the name of the customer who made a purchase but may be able to work out who it is if you can identify a consistent buying behavior. Suppose that every day for the past year, Alex has visited the Starbucks at 1500 Broadway at 10:10 am and ordered the same Tall Mocha Frappuccino. Even if his personally identifiable information, such as his name and address, has been anonymized or removed, his buying behavior still allows you to re-identify him. Organizations should therefore be meticulous when they anonymize sensitive data, taking care to hide any additional information that might aid re-identification.

    There are a variety of methods available to anonymize data, such as directory replacement (modifying the individual’s name while maintaining consistency between values), scrambling (obfuscation, which can sometimes be reversible), masking (hiding part of the data with random characters, for example pseudonymization with identities), personalized anonymization (custom anonymization), and blurring (making data values meaningless or rendering re-identification of data values impossible). Pseudonymization methods include data encryption (changing the original data into ciphertext, reversible with a decryption key) and data masking (masking data while maintaining its usability for different functions). Organizations can select one or more techniques depending on the degree of risk and the intended use of the data.
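
    To make the core distinction concrete, here is an illustrative Python sketch (not any vendor’s implementation; the class and function names are invented): pseudonymization keeps the “additional information” (here, a mapping table) that allows re-identification, while anonymization keeps no link back to the original value.

    ```python
    # Illustrative contrast between pseudonymization and anonymization.
    # Class and function names are invented for this example.
    import secrets

    class Pseudonymizer:
        """Replace values with reversible tokens; the stored mapping is the identifier."""
        def __init__(self) -> None:
            self._forward: dict = {}
            self._reverse: dict = {}

        def pseudonymize(self, value: str) -> str:
            if value not in self._forward:
                token = secrets.token_hex(8)
                self._forward[value] = token
                self._reverse[token] = value
            return self._forward[value]          # same value -> same token

        def re_identify(self, token: str) -> str:
            return self._reverse[token]          # possible only while the mapping is kept

    def anonymize(value: str) -> str:
        """Replace the value with random characters and keep no mapping at all."""
        return secrets.token_hex(len(value) // 2 + 1)  # irreversible by design

    p = Pseudonymizer()
    token = p.pseudonymize("H. H. Munro")
    assert p.re_identify(token) == "H. H. Munro"   # reversible with the mapping
    gone = anonymize("H. H. Munro")                # nothing links this back to Munro
    ```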

    Mage Data approaches anonymization and pseudonymization with its leading-edge solutions, named a Gartner Peer Insights Customers’ Choice for 2020. To read more, visit Mage Data Static Data Masking and Mage Data Dynamic Data Masking.