What is pseudonymization? Simple definition, benefits and comparisons
Pseudonymization is one of several techniques by which an organization can remove identifying information from its data and still operationalize that data, gaining both privacy and security benefits.
What is Pseudonymization?
A pseudonym (literally “false name” in Greek) is an alias or unique identifier used in place of a name. For example, suppose a large retailer wants to remove bias from its hiring process. To do so, it could replace a candidate’s name, like “Jon Smith”, with a different value. This new value could be:
- a random string of numbers or letters, like “yUg e%423”;
- a fake name, like “Tom”; or
- a mask, like “********” or “Jon XXXXX”.
One of the defining features of pseudonymization, as opposed to anonymization (covered below), is that it is reversible. Returning to our hiring example, suppose the company now wants to extend an offer to the right candidate. To do so, it can look up the unique identifier to re-identify the person and retrieve their contact info.
Both true anonymization and effective pseudonymization require stripping enough information from a record that it can’t be re-identified on its own (that is, without the pseudonymization mapping). That means removing not only unique values, such as names, addresses, and phone numbers, but also values that become unique in combination with one another, such as gender, birth date, and ZIP code. If you remove all of this data from a dataset, it quickly becomes unusable. Removing some of it but not all, however, may not have the intended effect.
For example, a study performed in 2000 found that over 87% of Americans can be uniquely identified by the combination of gender, birth date, and ZIP code. None of these values is commonly considered sensitive, but together they identify an individual more precisely than their name does.
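To gauge this risk on your own data, you can count how many records are unique on a given combination of quasi-identifiers. The Python sketch below does this for a handful of hypothetical records; the field names `gender`, `birth_date`, and `zip` and the helper `unique_fraction` are illustrative assumptions, and the sketch does not reproduce the study’s methodology.

```python
from collections import Counter

# Hypothetical records: names are already removed, leaving only quasi-identifiers.
records = [
    {"gender": "F", "birth_date": "1985-03-14", "zip": "02139"},
    {"gender": "M", "birth_date": "1985-03-14", "zip": "02139"},
    {"gender": "F", "birth_date": "1990-07-01", "zip": "60614"},
    {"gender": "F", "birth_date": "1990-07-01", "zip": "60614"},
    {"gender": "M", "birth_date": "1972-11-30", "zip": "94110"},
]

QUASI_IDENTIFIERS = ("gender", "birth_date", "zip")


def unique_fraction(rows: list[dict], keys: tuple = QUASI_IDENTIFIERS) -> float:
    """Return the fraction of rows whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    unique_rows = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique_rows / len(rows)


print(f"{unique_fraction(records):.0%} of records are unique on gender + birth date + ZIP")
# prints: 60% of records are unique on gender + birth date + ZIP
```

Any record whose combination appears only once can be singled out by someone who already knows those three attributes, even though no name is present.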
A variety of techniques make pseudonymization possible. A straightforward option is to hash the PII, which produces a fixed-length value that can’t feasibly be reversed to the original, and use the hash as the unique identifier. Because values like names or Social Security numbers can be guessed and hashed by an attacker, a keyed hash (such as an HMAC with a secret key) is safer than a plain hash. Alternatively, an organization could use a format-preserving pseudonymization scheme, such as replacing a data subject’s name or address with random ones selected from a table of valid names and addresses.
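A minimal Python sketch of hash-based pseudonymization, assuming a keyed hash (HMAC-SHA256 from the standard library) rather than a plain hash, and an in-memory dictionary standing in for the lookup table. The names `PSEUDONYM_KEY`, `pseudonymize`, and `reidentify` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: in production this key would live in a key-management system,
# stored separately from the pseudonymized data. Generated here for illustration.
PSEUDONYM_KEY = secrets.token_bytes(32)

# Lookup table: pseudonym -> original value. Keeping it is what makes this
# pseudonymization (reversible) rather than anonymization.
lookup_table: dict[str, str] = {}


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a PII value with its HMAC-SHA256 digest and record the mapping."""
    pseudonym = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    lookup_table[pseudonym] = value
    return pseudonym


def reidentify(pseudonym: str) -> str:
    """Reverse a pseudonym; only possible with access to the lookup table."""
    return lookup_table[pseudonym]


candidate = {"name": "Jon Smith", "score": 87}
candidate["name"] = pseudonymize(candidate["name"])
print(candidate)                      # name is now a 64-character hex string
print(reidentify(candidate["name"]))  # Jon Smith
```

Because the keyed hash is deterministic, the same name always yields the same pseudonym, which lets pseudonymized datasets be joined. It also means two candidates who share a name share a pseudonym.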
For generic or masked values, like “Tom” or “Jon XXXXX”, which lack the uniqueness that hashing provides, your pseudonymization scheme may need to generate a separate unique identifier, called a reference ID, to correlate the data.
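The reference-ID approach can be sketched as follows, assuming a random UUID as the reference ID and an in-memory dictionary as the mapping store. `mask_name`, `pseudonymize_record`, and `DIRECT_IDENTIFIERS` are hypothetical names chosen for illustration.

```python
import uuid

# Hypothetical mapping store: reference ID -> original record. In practice it
# would be kept apart from the pseudonymized data, with restricted access.
reference_table: dict[str, dict] = {}

# Fields assumed to be direct identifiers that must not travel with the data.
DIRECT_IDENTIFIERS = ("email",)


def mask_name(full_name: str) -> str:
    """Keep the first name and apply a fixed mask: 'Jon Smith' -> 'Jon XXXXX'."""
    first_name = full_name.split(" ", 1)[0]
    return f"{first_name} XXXXX"


def pseudonymize_record(record: dict) -> dict:
    """Mask the name, drop direct identifiers, and attach a random reference ID."""
    ref_id = str(uuid.uuid4())
    reference_table[ref_id] = dict(record)  # copy of the original for re-identification
    masked = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    masked["name"] = mask_name(record["name"])
    masked["ref_id"] = ref_id
    return masked


a = pseudonymize_record({"name": "Jon Smith", "email": "jon@example.com", "score": 87})
b = pseudonymize_record({"name": "Jon Stone", "email": "jstone@example.com", "score": 91})
print(a["name"], b["name"])                   # Jon XXXXX Jon XXXXX
print(reference_table[a["ref_id"]]["email"])  # jon@example.com
```

Both candidates mask to the same value, so only the reference ID tells them apart and links each one back to the original record.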
Why Pseudonymize Data?
Data protection regulations, such as the General Data Protection Regulation (GDPR), put strict requirements on companies’ use of personally identifiable information (PII), placing limitations on how it can be processed and secured. This can make it difficult for companies to maintain existing business opportunities or pursue new ones, better serve their customers, or build stronger products. Pseudonymization helps address this: the GDPR explicitly recognizes it as a measure that can reduce risks to data subjects and help organizations meet their data protection obligations.
Pseudonymization vs. Anonymization
Pseudonymization and anonymization both de-identify data by removing information that could be used to uniquely identify the data subject. However, they use different processes, and those processes determine whether the data subject’s identity can be recovered.
Pseudonymization replaces sensitive data with an alias or unique identifier and saves the mapping of PII to unique identifier in a lookup table. This pseudonym can safely be used without revealing the identity of the data subject, but anyone with access to the lookup table can re-identify the data.
Anonymization may replace PII with a unique identifier, but it doesn’t save the lookup table. The goal is to make re-identification impossible even with access to additional information.
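The difference comes down to whether the mapping survives. In this hypothetical Python sketch, names are replaced with random identifiers and no lookup table is kept; it also drops ZIP code as an assumed quasi-identifier, in line with the combination risk described earlier. The field names and the `anonymize` helper are assumptions, and real anonymization would need a far more thorough review of which remaining fields could single someone out.

```python
import secrets

# Hypothetical input records.
records = [
    {"name": "Jon Smith", "zip": "02139", "score": 87},
    {"name": "Ana Lopez", "zip": "60614", "score": 91},
]

# Fields removed outright: the direct identifier plus an assumed quasi-identifier.
REMOVED_FIELDS = ("name", "zip")


def anonymize(rows: list[dict]) -> list[dict]:
    """Drop identifying fields and attach a random ID. No lookup table is kept,
    so the random IDs cannot be mapped back to names."""
    anonymized = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in REMOVED_FIELDS}
        kept["id"] = secrets.token_hex(8)  # random; not derived from the name
        anonymized.append(kept)
    return anonymized


anon = anonymize(records)
print(anon)  # each row keeps only 'score' plus a random 16-character 'id'
```

Compare this with the pseudonymization sketches: there, a lookup table or reference table is retained so that an authorized party can reverse the process.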
Pseudonymization for Compliance and Security
Pseudonymization provides value beyond a fair hiring process. Companies often use it to balance operational needs and regulatory compliance. Some of the applications include:
- Enhancing data security: Properly pseudonymized data can’t be used to identify the data subject without access to additional information. If a system only has access to a pseudonym or token representing a user’s identity, a breach of that system can’t expose the sensitive data associated with that token.
- Reducing third-party risk: Companies commonly need to share data with third-party organizations for purposes such as fulfilling orders or processing payments. Pseudonymization allows companies to share only the information these partners need, reducing data security risks and compliance challenges.
- Simplifying compliance: Regulations such as the GDPR impose security requirements on all systems with access to PII. Using de-identified data can help meet important compliance principles such as “data protection by design” under the GDPR, and can free up the data for more permitted use cases.
- Application testing: Applications use test data to ensure that they function correctly; however, using real customer data for software testing is risky and generally discouraged. A format-preserving pseudonymization scheme allows testing with fake but realistic data that can’t be traced back to the original data subject.
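As an illustration of the third-party case, the sketch below shares a minimized, pseudonymized copy of an order with a partner, assuming a hypothetical partner that needs only the order contents. The token format (`tok_` plus a random string), the in-memory `vault`, `PARTNER_FIELDS`, and `share_with_partner` are all assumptions, not any particular product’s API.

```python
import secrets

# Hypothetical token vault kept inside the company: token -> customer identity.
vault: dict[str, dict] = {}

# Hypothetical allow-list of the fields the partner actually needs.
PARTNER_FIELDS = ("order_id", "items", "total")


def share_with_partner(order: dict, allowed_fields: tuple = PARTNER_FIELDS) -> dict:
    """Build a minimized payload: allowed fields plus a token in place of the customer."""
    token = "tok_" + secrets.token_urlsafe(16)
    vault[token] = {"name": order["name"], "email": order["email"]}
    payload = {field: order[field] for field in allowed_fields}
    payload["customer_token"] = token
    return payload


order = {
    "order_id": "A-1001",
    "name": "Jon Smith",
    "email": "jon@example.com",
    "items": ["sku-42"],
    "total": 19.99,
}
payload = share_with_partner(order)
print(payload)  # order_id, items, total, and customer_token; no name or email
```

If the partner later needs to reach the customer, it sends the token back and the company resolves it internally, so the identity never leaves the company’s systems.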
Pseudonymization for Payments
In the context of payments, pseudonymization can offer many benefits for merchants and consumers alike by providing:
- Enhanced Payment Data Protection: Pseudonymization can help protect sensitive payment data, such as credit card numbers and bank account details, from unauthorized access.
- Secure Payment Processing: By pseudonymizing payment data, merchants can store and process it without directly exposing the sensitive values.
- Data Sharing and Analysis: Pseudonymized payment data can be shared with authorized parties for analysis, including fraud detection.
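For the fraud-detection use case, one common pattern is to give analysts a deterministic pseudonym for each card instead of the card number, so activity can still be grouped per card. This Python sketch assumes an HMAC-based fingerprint (a keyed hash, because card numbers are structured enough that a plain hash could be brute-forced), keeps the last four digits for display, and applies a deliberately simple, hypothetical velocity rule of more than two uses per card. The card numbers are widely published test numbers, not real accounts.

```python
import hashlib
import hmac
import secrets
from collections import Counter

# Assumption: the key is held by the merchant or its tokenization provider,
# never by the analytics system that receives the fingerprints.
FINGERPRINT_KEY = secrets.token_bytes(32)


def card_fingerprint(pan: str) -> str:
    """Deterministic keyed hash of a card number, so one card maps to one value."""
    digits = pan.replace(" ", "")  # normalize formatting before hashing
    return hmac.new(FINGERPRINT_KEY, digits.encode(), hashlib.sha256).hexdigest()


transactions = [
    {"pan": "4111 1111 1111 1111", "amount": 25.00},
    {"pan": "4111111111111111", "amount": 30.00},
    {"pan": "5555 5555 5555 4444", "amount": 12.50},
    {"pan": "4111 1111 1111 1111", "amount": 45.00},
]

# What the fraud team receives: fingerprint, last four digits, amount. No full PANs.
pseudonymized = [
    {"card": card_fingerprint(t["pan"]), "last4": t["pan"][-4:], "amount": t["amount"]}
    for t in transactions
]

# Illustrative velocity rule: flag any card used more than twice.
counts = Counter(row["card"] for row in pseudonymized)
flagged = {row["last4"] for row in pseudonymized if counts[row["card"]] > 2}
print(flagged)  # {'1111'}
```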
Achieving Pseudonymization with Tokenization
Pseudonymization provides significant benefits for data privacy, security, and regulatory compliance. By removing identifying information before data is processed on a system or shared with external parties, an organization significantly reduces its risk in the event of a breach.
Tokenization, the process of exchanging a raw value (e.g., an SSN) for an entirely new value called a token, inherently provides the functionality needed to pseudonymize data. Using a tokenization provider like Basis Theory takes pseudonymization even further, providing the access controls, data-level configurability, serverless functions, and secure environment needed to confidently share or process datasets internally or with third parties.
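To make the tokenization idea concrete, here is a minimal in-memory token vault in Python. It is a sketch under stated assumptions, not Basis Theory’s API or a production design: tokens are random strings with a hypothetical `tok_` prefix, and access control is reduced to a role allow-list checked in `detokenize`.

```python
import secrets


class TokenVault:
    """Minimal in-memory token vault (illustrative sketch only)."""

    def __init__(self, detokenize_roles: set[str]) -> None:
        self._store: dict[str, str] = {}           # token -> raw value
        self._detokenize_roles = detokenize_roles  # roles allowed to reverse tokens

    def tokenize(self, raw_value: str) -> str:
        """Swap a raw value for a new random token and remember the mapping."""
        token = "tok_" + secrets.token_urlsafe(16)
        self._store[token] = raw_value
        return token

    def detokenize(self, token: str, role: str) -> str:
        """Return the raw value, but only for roles on the allow-list."""
        if role not in self._detokenize_roles:
            raise PermissionError(f"role {role!r} is not allowed to detokenize")
        return self._store[token]


vault = TokenVault(detokenize_roles={"payroll"})
ssn_token = vault.tokenize("123-45-6789")
print(ssn_token)                                    # random, differs on each run
print(vault.detokenize(ssn_token, role="payroll"))  # 123-45-6789

try:
    vault.detokenize(ssn_token, role="analytics")
except PermissionError as err:
    print(err)  # role 'analytics' is not allowed to detokenize
```

Because the token is random, a system holding only `ssn_token` learns nothing about the SSN; re-identification requires both the vault and an authorized role.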
Learn more about how to quickly and easily integrate tokenization into your data security and compliance strategy with Basis Theory.