Data security 101: Remediating sensitive data
With the right policies, strategies, and tools, your data remediation program can keep your sensitive data compliant, secure, and useful.
What is Data Remediation?
Data is the modern organization’s most valuable asset. However, this is often only true if the data is correct and protected against unauthorized access. If customer information, intellectual property, or other sensitive data is leaked, an organization may lose its competitive advantage and incur legal and regulatory penalties.
Data remediation is the process of ensuring that the data in a company’s possession meets its needs. This includes correcting errors and omissions in the data and ensuring it is properly protected against unauthorized access or potential breach.
A data remediation policy should include plans both for ensuring the completeness and correctness of data and for securing it. This blog focuses on the security side of data remediation, including classifying data and ensuring that it is protected in accordance with the requirements for its classification type.
When do you need a data remediation strategy and policy?
In theory, every organization should have a data remediation policy because every company has data that should be properly managed and protected. However, certain events may make it even more important to create or update a data remediation policy, including:
- Compliance and regulatory changes: Most companies are subject to at least one law or regulation that mandates the protection of sensitive customer data. Companies may also voluntarily seek compliance with various standards, such as ISO 27001 or SOC 2, that have security requirements. If a new regulation introduces data security requirements or data retention and deletion rules, or grants data subjects the right to correct or modify their data, it implicitly requires a company to have a data remediation policy to manage compliance with those requirements. Likewise, if new laws are introduced, certification requirements change, or existing regulations are updated, the corporate data remediation policy should be updated to ensure continued compliance.
- Business changes: If an organization’s operations change significantly—such as opening a new business line or acquiring a company—then this may have an impact on the types of data that the company is storing and processing. When this occurs, the data remediation policy may need to be updated to reflect new regulatory responsibilities or the need to manage different types of data.
- Incident response: If an organization suffers a data breach or other incident affecting sensitive data, this may indicate a failure of the corporate data remediation policy. If this occurs, the policy should be reviewed to determine if any additional security controls or safeguards might reduce the probability of future incidents.
Who should be involved?
A data remediation strategy should include input from all relevant stakeholders. This includes the data's owner, anyone who uses the data (application developers, data scientists, etc.), and those responsible for securing it. A data remediation team should also include representatives from the legal department, given the policy's legal and regulatory compliance implications, and from management, since policies have impacts across the entire organization.
How should you prepare for the discussion?
A data remediation policy aims to ensure that an organization has complete, correct, and secure data. A good starting point for a discussion regarding data remediation is a clear understanding of the data that an organization has in its possession.
Before implementing a data remediation policy, the team should perform data discovery (our friends at BigID can help with that). Once uncovered, this data can be classified based on various factors, including:
- Sensitivity: Some data that an organization holds is more sensitive than others, either to the company or its customers. For example, a customer’s identifying information, credit card number, and login credentials are all sensitive data that require special protection.
- Regulatory Considerations: Certain types of data are protected under various regulations, which impose requirements on their collection, storage, and use. For example, the GDPR protects the personal data of individuals in the EU and has rules for how long data can be retained, how it should be secured against attack, and how data anonymization affects these requirements.
- Impacts of Disclosure: If certain types of data are breached, corrupted, or deleted, the loss can have a significant impact on an organization's ability to operate. For example, the leak of a trade secret like the Coca-Cola recipe would be devastating to its owner.
Often, companies will use the government or military classification system to classify various data types. For example, marketing materials may be Unclassified since they are intended for public dissemination. High-level details about R&D projects might be Confidential, while more in-depth information may be Secret. Information that could cause significant harm to the organization if disclosed, such as the Coca-Cola secret recipe, would be classified as Top Secret and protected accordingly.
When assigned to applications and data, classifications and impact levels help organizations programmatically enforce these requirements, requiring various actors (i.e., users or systems) to authenticate their identity and prove they have the proper authorization before being granted access to the information. For example, developers may set the Classification and Impact Level such that a customer service rep can only see the last four digits of a customer's Social Security number, while their manager can see the full number.
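To make this concrete, here's a minimal sketch of how classification-driven access might be enforced in application code. Everything in it (the levels, the role-to-clearance mapping, and the read_ssn helper) is hypothetical and for illustration only:

```python
from enum import IntEnum

class Classification(IntEnum):
    """Hypothetical classification levels, lowest to highest."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3

# Hypothetical mapping of roles to the highest level they may read
ROLE_CLEARANCE = {
    "customer_service_rep": Classification.CONFIDENTIAL,
    "customer_service_manager": Classification.TOP_SECRET,
}

def read_ssn(ssn: str, role: str) -> str:
    """Return the full SSN only to roles cleared for its classification;
    everyone else sees a masked view with the last four digits."""
    clearance = ROLE_CLEARANCE.get(role, Classification.UNCLASSIFIED)
    if clearance >= Classification.TOP_SECRET:  # assume SSNs are Top Secret
        return ssn
    return "***-**-" + ssn[-4:]

print(read_ssn("123-45-6789", "customer_service_rep"))      # ***-**-6789
print(read_ssn("123-45-6789", "customer_service_manager"))  # 123-45-6789
```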
Implementing Data Remediation
For data scientists, the process of cleansing data to ensure that it is complete and correct is well-established. However, ensuring that data is properly protected can be more complicated.
An organization may use one of several different approaches to protect its sensitive data against unauthorized disclosure. When choosing between them, it is important to consider both data usability and security. The competitive advantage that an organization’s data provides is destroyed just as much if legitimate users can’t access it as if an attacker can.
Encryption
One of the most widely-known methods of protecting an organization’s sensitive data is via encryption. Modern encryption algorithms are designed to scramble data in a way that can only be reversed with the correct decryption key. Encryption moves the onus of data security from the data to the key since encrypted data is useless to an attacker without the decryption key.
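To illustrate, here's a minimal sketch using the widely used Python cryptography package with AES-256 in GCM mode. The sample plaintext and the choice of AES-GCM are assumptions for demonstration, not a recommendation for any particular workload:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Generate a 256-bit key; in production this would live in a key
# management system (KMS), not alongside the data it protects.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

# AES-GCM requires a unique nonce for every encryption under the same key.
nonce = os.urandom(12)
ciphertext = aesgcm.encrypt(nonce, b"4111 1111 1111 1111", None)

# Without the key, the ciphertext is useless to an attacker.
plaintext = aesgcm.decrypt(nonce, ciphertext, None)
assert plaintext == b"4111 1111 1111 1111"
```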
The use of encryption to protect data as part of a corporate data remediation process has its advantages, including:
- Strong Data Protection: Used correctly, modern encryption algorithms provide very strong protection to the encrypted data. Under some regulations, the exposure of properly encrypted data without the corresponding key may not even count as a reportable breach.
- Regulatory Compliance: Encryption commonly appears in regulations as an option for securing sensitive data. In some cases, the regulations may recommend or require the use of particular algorithms or configuration settings (key length, etc.).
While encryption can be a powerful tool for data security, it also has its limitations. Some of the downsides of encryption for data remediation include:
- Protection of Data in Use: Encryption algorithms can easily be used to secure data at rest or in transit, but securing data in use is more complex. Without homomorphic encryption, which is generally too inefficient for production use, it is impossible to perform calculations on encrypted data. Therefore, data must be decrypted and is at risk of exposure whenever it is used.
- Scope of Compliance: Data protection laws apply to users and systems with access to unencrypted data. Because data in use can't stay encrypted, the scope of compliance can grow to include systems that are difficult to monitor and secure, such as point-of-sale (POS) terminals with access to payment card data.
- Key Management: Encryption protects data by making access to a decryption key necessary for access to the data. However, ensuring that only authorized users have access to the encryption key and that it is properly protected against exposure can be challenging.
- Backward Compatibility: Encryption algorithms convert data into random-looking binary values. For example, the Advanced Encryption Standard (AES) operates on 128-bit blocks, so its ciphertexts are a multiple of that size regardless of the input's format. Since encrypted data doesn't resemble plaintext data, implementing encryption can be challenging and require significant reengineering if databases and applications expect data of a particular format.
- Encryption Changes: The encryption algorithms used by an organization may be deprecated due to new attacks or technological advances. Changing encryption algorithms requires decrypting and re-encrypting every copy of the stored data.
Tokenization
Tokenization replaces sensitive data with a token. Storing tokens in applications and databases—instead of the original value—reduces or eliminates the compliance scope or burden that comes with securing high-risk data.
For example, imagine a startup that doesn't want to take on the PCI compliance requirements of holding customer card data but needs to charge its monthly subscribers. Using tokenization, the startup would receive and store a token as a stand-in for the payment information. Its counterpart, the sensitive card data, would be stored in an external PCI-compliant token vault offered by a tokenization platform.
When it’s time to charge a customer, the startup passes the customer's payment token to the tokenization provider for detokenization and, ultimately, payment processing. In this scenario, the startup avoids exposing its systems to sensitive data, keeps control of its payment flow, complies with PCI requirements, and avoids the costs and risks of securing the data itself.
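The toy in-memory vault below shows the core mechanic of this flow. The TokenVault class and tok_ prefix are invented for illustration; a real platform would encrypt vaulted values, enforce access controls, and expose the vault behind an authenticated API:

```python
import secrets

class TokenVault:
    """Toy in-memory token vault mapping opaque tokens to sensitive values.
    A real vault encrypts values at rest and sits behind an authenticated API."""

    def __init__(self):
        self._store = {}

    def tokenize(self, value: str) -> str:
        # The token is random, so it reveals nothing about the value.
        token = f"tok_{secrets.token_urlsafe(16)}"
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]

vault = TokenVault()

# At signup: the startup's own database stores only the token.
token = vault.tokenize("4111111111111111")

# At billing time: exchange the token for the card number inside
# the provider's PCI-compliant environment.
card_number = vault.detokenize(token)
```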
Tokenization can provide significant benefits to a data remediation program, including:
- Permissioning: Tokens can carry properties that reference their underlying data's Impact Levels and Classification, allowing organizations to tailor access to them in line with internal policies.
- Strong Data Protection: Tokens are non-sensitive and can only be mapped back to real data using the vault's lookup table. This means they can be stored and used anywhere without exposing the underlying data.
- Regulatory Acceptance: Tokenization is explicitly endorsed by some regulations as a means of protecting sensitive data. Examples include the Reserve Bank of India’s (RBI) requirement that credit card transactions be tokenized and a similar NACHA requirement for ACH tokenization.
- Simplified Compliance: Tokenization can be used to outsource secure computations by ensuring that data is only stored and used on the tokenization provider’s platform, reducing the scope of regulatory compliance. Corporate systems can use tokens and request the tokenization provider to perform calculations on the data as needed.
- Regulatory Agility: With tokenization, real data is only stored on a single platform. This makes it easier to update the security controls protecting the data in the face of regulatory changes or other factors.
- Flexibility: Because tokens are abstractions of the original data, they can be easily configured to meet different business requirements and use cases. For example, Basis Theory Tokens contain programmable properties that allow their underlying data to be formatted, searched, masked, and much more.
Like encryption, tokenization is a powerful data protection solution but might not be the best fit for all use cases. When implementing tokenization, it is important to consider:
- Third-Party Impacts: One of the benefits of tokenization is that it allows sensitive data storage and processing to be outsourced to a third-party provider. However, doing so may add latency when accessing data and creates the potential for outages if the service provider's systems go down; then again, this risk is inherent in nearly all cloud implementations.
- Additional Compliance Requirements: Tokenization eliminates the risk to sensitive data on systems that only use the tokenized data. However, an organization still needs security controls (encryption, access controls, etc.) to protect the master copy of the data and the lookup table that maps data to tokens, wherever these are stored. Fortunately, many tokenization platforms, like Basis Theory, build these capabilities into the platform or offer fine-grained access controls that satisfy these requirements.
Deletion
Data deletion, also known as nullification or redaction, involves removing sensitive fields from an organization's data. For example, if a company doesn't need to store a customer's credit card number after processing a payment, deleting it reduces the organization's data security risks.
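As a minimal sketch, a hypothetical redact helper might strip sensitive fields from a record once they are no longer needed (the field names here are invented for illustration):

```python
def redact(record: dict, sensitive_fields: set) -> dict:
    """Return a copy of the record with sensitive fields removed."""
    return {k: v for k, v in record.items() if k not in sensitive_fields}

payment = {
    "order_id": "ord_123",
    "amount_cents": 4999,
    "card_number": "4111111111111111",  # not needed after the charge
}

# Once the charge succeeds, persist only the redacted record.
payment = redact(payment, {"card_number"})
print(payment)  # {'order_id': 'ord_123', 'amount_cents': 4999}
```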
Data nullification or deletion has a couple of benefits, including:
- Strong Data Protection: Deletion provides the strongest protection against data exposure because the data is no longer present to leak. However, this assumes the deletion process is carried out properly and leaves no remanent copies behind.
- Regulatory Compliance: Some data privacy laws, such as the EU’s GDPR, mandate that organizations delete data when it is no longer required for its original purpose. After this point, data deletion is the only option for data remediation.
The main limitation of deletion as a data remediation strategy is that it is a one-way street. Once data is deleted, it can’t be restored or used, meaning that it no longer provides value to the organization.
Other Common Data Remediation Terms
When researching data remediation and security options, an organization might come across other terms beyond encryption, tokenization, and deletion. Some common terms include:
- Masking: Masking involves concealing sensitive fields, or portions of them, within a record or dataset. It can be seen as a special case of partial tokenization or deletion (see the sketch after this list).
- Anonymization: Anonymization makes it infeasible to determine the subject of a particular record by deleting certain identifying fields. Anonymization is difficult to perform effectively because an attacker can often aggregate data from multiple sources to de-anonymize the data.
- Deidentification: Deidentification is another term for data anonymization.
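For illustration, a generic masking helper might look like the sketch below; the mask function and its defaults are invented, not part of any particular library:

```python
def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` characters with `mask_char`."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

print(mask("123-45-6789"))       # *******6789
print(mask("4111111111111111"))  # ************1111
```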
Selecting the Right Data Remediation Solution
Encryption, tokenization, and deletion are all designed to address different use cases. Often, a corporate data remediation strategy will include elements of each based on the data in question and where it is in its lifecycle.
For example, an organization may use tokenization to protect data in use across all of its systems that do not require access to the real data. On the tokenization server, the data would be stored in an encrypted format to protect it against unauthorized access and potential breach. When the data reaches the end of its lifecycle, it would be deleted.
Finally, and not to be overlooked, organizations should assess the burden their data remediation approach places on the organization, especially developers. Finding the right balance between security and utility is never easy, but the effort to do so can improve adherence, responsiveness, and your overall security and compliance postures. (It's one of the reasons we call data security a developer experience problem.)
Let’s recap
- Data Remediation is the process of ensuring that the data in a company’s possession meets its needs.
- A solid data remediation policy and strategy helps reduce the impact that regulatory, business, and security changes have on your data.
- Classifying data by risk is a popular method for determining who can access it.
- Deletion, encryption, and tokenization are all viable strategies to mitigate security risks identified in your data remediation policy.
- A strong data remediation strategy considers the usability of sensitive data.
Want to supercharge your data remediation efforts? Contact us to learn more about our data tokenization platform and how we can help you secure and use your sensitive information.