Data Tokenization FAQs and Examples
Data tokenization refers to the process of generating a digital identifier—called a token—to reference an original value. If we tokenized an original value like a credit card number, we’d create a token that might look like bb8b7ed0-fee5-11ec-9686-ff66557783a8.
Tokens allow developers to safely use sensitive data within their applications, databases, or devices without exposing the system to the risks and requirements of holding it—unlike encryption, which is designed to be reversed once the original data reaches its intended destination.
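To make the idea concrete, here is a minimal illustrative sketch (not any particular vendor's implementation): a token can be as simple as a random UUID that stands in for the original value.

```python
import uuid

card_number = "4242424242424242"   # the original sensitive value (a test card number)
token = str(uuid.uuid4())          # a random UUID that stands in for the card number

# The token carries no card data itself; only the vault that issued it
# knows which original value it maps to.
print(token)
```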
Governments, including India and the European Union, are driving new data and residency requirements, and clarifying old ones, at a dizzying pace. These laws require companies to accommodate varying levels of access, localization, and rules across multiple countries, depending on where the underlying data is stored and how it is used.
This trend has added significant complexity and compliance risk for merchants, especially high-risk merchants.
As we work with merchants worldwide to build their own data tokenization solutions, several frequently asked questions and themes continue to reappear.
Working with Tokens
How do you use a token?
To create a token within Basis Theory, an application sends the original value, like a credit card number, to a specialized environment called a token vault or data tokenization system. This triggers two events:
- The original value is encrypted and stored for safekeeping.
- A token is generated and sent to an application or database for future use.
Detokenization refers to the process of exchanging a token for the right to access and use the original value.
To do this, the actor (i.e., a system or a person) sends a token to the data tokenization solution. Then, after a series of secret handshakes between the application and this system to verify access and permissions, the token is detokenized and the original value is ready for use.
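The sketch below makes that flow concrete. It is illustrative only, not Basis Theory's actual implementation: the `Fernet` cipher, the in-memory dictionary, and the `allowed_actors` check are all assumptions chosen to keep the example small.

```python
import uuid
from cryptography.fernet import Fernet  # pip install cryptography

class TokenVault:
    """Illustrative in-memory token vault (not production code)."""

    def __init__(self, allowed_actors):
        self._cipher = Fernet(Fernet.generate_key())  # key management stays inside the vault
        self._store = {}                              # token -> encrypted original value
        self._allowed_actors = set(allowed_actors)

    def tokenize(self, plaintext: str) -> str:
        token = str(uuid.uuid4())                     # net-new identifier, safe to share
        self._store[token] = self._cipher.encrypt(plaintext.encode())
        return token

    def detokenize(self, token: str, actor: str) -> str:
        if actor not in self._allowed_actors:         # the "secret handshake": an access check
            raise PermissionError(f"{actor} may not detokenize")
        return self._cipher.decrypt(self._store[token]).decode()

# Usage sketch
vault = TokenVault(allowed_actors={"checkout-service"})
token = vault.tokenize("4242424242424242")
card = vault.detokenize(token, actor="checkout-service")
```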
Mitigating Security Threats Using Tokenization
Whether it's a zero-day exploit or an encryption-busting quantum computer, the security needs of tomorrow require a level of expertise and posture that few companies are willing to invest in today. While that leaves a significant gap that no single solution will bridge by itself, tokenization platforms offer various enduring benefits to protect your data.
- Mitigates the fragmentation and proliferation of sensitive data as systems scale
- Centralizes the complexities of encryption and key management
- Upgrades encryption algorithms without loss of business functionality
- Offers distributed systems for global redundancy
- Provides dedicated tenant environments
- Maintains certified and attested controls and environments
- Manages Infrastructure as Code
The difficulty of securing data as a developer should not be discounted. The simpler it is for developers to do the right thing, the better the security and compliance posture of the entire company.
Tokenization platforms simplify data security by abstracting the risky challenges of securing and managing data at rest.
Data Tokenization Examples
Imagine an eCommerce company trying to facilitate a faster checkout experience for repeat customers (and reduce security and compliance risks). The credit card data is tokenized; however, if the backend sends the token to the payment processor as-is, the payment will fail.
Instead, the checkout application sends the token and payment instructions to the tokenization system. Once the tokenization system verifies that the checkout application has the necessary access and permissions, it detokenizes the previously tokenized payment data.
From here, the system may forward the card and payment information to a card processor.
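Sketched below is what that hand-off might look like from the checkout application's side. The proxy URL, header names, and the `{{ token }}` placeholder syntax are hypothetical; the point is that the application sends only the token plus payment instructions, and the tokenization system detokenizes before forwarding the charge to the processor.

```python
import requests  # pip install requests

TOKEN = "bb8b7ed0-fee5-11ec-9686-ff66557783a8"   # token from the earlier tokenization step

# Hypothetical proxy endpoint exposed by the tokenization system.
# The card number itself never appears in this request; the proxy
# substitutes the real value for the placeholder before forwarding
# the charge to the payment processor.
response = requests.post(
    "https://proxy.example-tokenization.com/charge",
    headers={"Authorization": "Bearer <api-key>"},   # proves the app's access and permissions
    json={
        "card": "{{ " + TOKEN + " }}",               # placeholder the proxy detokenizes
        "amount": 4999,                              # payment instructions, in cents
        "currency": "USD",
    },
    timeout=10,
)
print(response.status_code)
```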
Payment Optimization
A popular retailer wanted to pursue payment optimizations, including intelligent payment routing to reduce card transaction fees, but its card data was locked in with its current provider. The retailer migrated the card data from its payment processor to a hosted, compliant environment, using tokens and a proxy service to programmatically route payments to the processors offering the lowest cost.
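A simplified sketch of that routing decision, assuming per-processor fee estimates and a proxy endpoint per processor (all names hypothetical): pick the cheapest processor for each charge and forward the tokenized payment through the proxy.

```python
import requests

# Hypothetical fee estimates (fraction of transaction amount) per processor.
PROCESSOR_FEES = {"processor-a": 0.029, "processor-b": 0.025, "processor-c": 0.027}

def route_payment(card_token: str, amount_cents: int) -> requests.Response:
    cheapest = min(PROCESSOR_FEES, key=PROCESSOR_FEES.get)   # intelligent routing: lowest fee wins
    return requests.post(
        f"https://proxy.example-tokenization.com/{cheapest}/charge",  # hypothetical proxy route
        headers={"Authorization": "Bearer <api-key>"},
        json={"card": "{{ " + card_token + " }}", "amount": amount_cents},
        timeout=10,
    )
```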
Confidential Computing
A consortium of private companies wanted to leverage key aspects of its members' proprietary geo-location data without sharing the larger sample set. The consortium's members migrated their data to a hosted environment and set up the necessary permissions and controls to allow members to ask the dataset Yes/No questions.
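Here is a toy sketch of the idea, with an invented dataset and query: members never see each other's raw geo-location records, only Yes/No answers computed inside the controlled environment.

```python
# Runs inside the consortium's controlled environment; raw records never leave it.
SHARED_GEO_DATA = [
    {"member": "a", "lat": 40.71, "lon": -74.00},
    {"member": "b", "lat": 34.05, "lon": -118.24},
]

def any_records_within(lat: float, lon: float, radius_deg: float) -> bool:
    """Answer a Yes/No question without exposing the underlying rows."""
    return any(
        abs(r["lat"] - lat) <= radius_deg and abs(r["lon"] - lon) <= radius_deg
        for r in SHARED_GEO_DATA
    )

print(any_records_within(40.7, -74.0, radius_deg=0.5))  # returns only True or False
```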
Information and Payment Clearinghouse
An embedded finance app wanted to provide a seamless end user experience, but three parties needed varying pieces of information that would’ve required the customer to sign up for two distinct services. The embedded finance company used a compliant hosted environment, permissions, and a proxy service to establish a clearinghouse where Personally Identifiable Information (PII) and related payment information could be exchanged, and only one registration was needed.
Embedded eCommerce
A popular streaming service provider wanted to allow its customers to shop ads with their existing card-on-file (stored on its hardware device), but it couldn't bridge the relationship between its existing payment processor and its retail partner (one of the largest in the world). The streaming service used the tokenization platform's proxy service to direct interactions to and from the retailer and processor, allowing the encrypted credit card number on a customer's device to be used by its processor to pay the retailer.
Data Tokenization Solutions for Merchants
How do you make data tokenization accessible?
Until recently, most companies have been unable to build the supporting systems, culture, and functionality needed to deliver the gains Big Tech enjoys from tokenization.
Tokenization platforms, however, bridge that gap by providing hosted, compliant infrastructure to secure the data and the developer-friendly tools, experiences, and documentation to use it. As a result, organizations achieve similar privacy, security, and compliance postures for a fraction of the cost and time it would take to build a data tokenization platform themselves.
Emerging platforms have simplified access to robust and cost-effective tokenization, democratizing its benefits and fueling its adoption.
What are tokenization platforms, and why are they valuable?
Tokenization refers to the process of creating and using a token. Data tokenization platforms refer to the package of compliant functions, support, tools, and infrastructure needed to unlock all the benefits tokens offer.
Tokenization platforms can be built in-house or purchased but typically contain some variation of the following:
- Compliant infrastructure: Auditors and regulators require that companies secure and manage sensitive data in a compliant location. These requirements can vary by country or region, data type, etc. Tokenization platforms may offer managed hosted environments that comply with these specialized requirements or accommodate a customer's existing infrastructure.
- Access controls and permissions: Sensitive data should be accessible and editable only to those with the proper access and permissions. Tokenization platforms allow developers to quickly assign NIST impact levels and classifications to their applications and their data, and offer or connect to existing tools to better manage these controls at scale.
- Developer services, tools, and documentation: Being able to tokenize and store sensitive data is only part of the equation. Developers require tools, like APIs, services, and compute capabilities, to embed, build, and support tokenization within their system or a third party’s.
Tokenization platforms provide the foundation and supporting documentation developers need to build new products, partnerships, and insights. For example, with Basis Theory’s platform, you can search sensitive data without detokenizing or decrypting the original value, collect user data seamlessly from within your application, or route requests and responses to and from third parties.
- Token capabilities: Tokenization platforms give their tokens immediate flexibility, allowing developers to interact with sensitive data much as they would with plaintext. Here are some of the properties developers may receive on day one (see the sketch after this list).
- Aliasing: Tokens can be formatted or lengthened to preserve the look and feel of the underlying data.
- Masking: Tokens can reveal all or part of the original value.
- Tagging: Tokens can carry metadata, allowing you to reference all of the data owned by a single customer.
- Fingerprinting: Tokens can be correlated, allowing developers to create irreversible relationships between multiple tokens that contain identical data.
- Impact levels and classifications: Tokens can ensure only authenticated and authorized actors have access and permissions to the underlying data.
- Searching: Tokens can be indexed, allowing developers to search the underlying data set without decrypting the original values.
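To illustrate how those properties might be expressed when a token is created, here is a hypothetical request payload. The field names (`alias`, `mask`, `metadata`, `fingerprint`, `searchable`, `classification`) are illustrative, not any specific platform's schema.

```python
# Hypothetical token-creation payload showing the capabilities listed above.
create_token_request = {
    "type": "card_number",
    "data": "4242424242424242",
    "alias": {"format": "preserve_length_and_luhn"},  # aliasing: token looks like a card number
    "mask": "{{ last4 }}",                            # masking: reveal only the last four digits
    "metadata": {"customer_id": "cus_123"},           # tagging: find every token for a customer
    "fingerprint": True,                              # fingerprinting: correlate identical values
    "searchable": True,                               # searching: index without decrypting
    "classification": "pci",                          # impact level / classification
}
```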
Why would a merchant need data tokenization tools?
Many applications, databases, and devices rely on seeing or holding sensitive plaintext data to complete day-to-day operations. Doing so can bring these systems "into scope," creating significant complexity, overhead, and costs for their stewards. The more places this plaintext data exists, the more effort it takes to ensure proper compliance.
This can hinder an organization’s response to shifting markets or customer demands. To complicate matters, the rules and requirements governing this data often change based on factors like geography, data type, and usage.
By replacing sensitive data with tokens, you reduce the number of applications, databases, and devices interacting with the plaintext value (e.g., credit card numbers). In doing so, tokens reduce the scope of requirements and their impact on an organization. To give you a sense of this effect, customers using Basis Theory’s compliant environment to store encrypted plaintext values can reduce their PCI Level 1 reporting requirements by up to 90%.
Data tokenization services provide tokens that can take any shape and are safe to expose, allowing them to integrate with existing systems easily to replace sensitive data. Some of the core benefits of data tokenization include:
- Reduced risk: Applications, devices, and databases collect, store, and use sensitive data to complete day-to-day operations, making them targets for adversaries. In addition, the more fragmented the system, the more surface area criminals have to attack and exfiltrate sensitive data.
- Tokens are undecipherable, unreadable, and unusable to those without permissions and access. These attributes prevent adversaries from seeing and using any data exfiltrated from applications, databases, or devices. Meanwhile, the real credit card information stays encrypted and stored in a firewalled environment.
- Reduced or eliminated compliance requirements: Companies, industry groups, and governments have created rules to govern the use, storage, and management of sensitive data, like personally identifiable information (PII), primary account numbers (PANs), and bank account data. These mandated protections, like access controls, firewalls, and audits, aim to prevent the theft and abuse of data used by organizations.
With a smaller compliance footprint comes the ability to respond to shifting regulations more quickly. Now centralized, encrypted, and stored in a compliant location, sensitive data can more easily adapt to new data residency laws, data protection requirements, and industry standards.
And because applications use tokens instead of the plaintext values, organizations can accommodate these new requirements without disrupting day-to-day operations.
- Safe use of sensitive data: Historically, the risks and compliance requirements governing sensitive data have made it difficult to move beyond its primary use case (e.g., using SSNs only for credit checks). By quarantining and locking down this data, organizations lose opportunities to make better risk decisions, create new partnerships, and design more unified customer experiences.
By abstracting sensitive data, replacing them with tokens, and gating access, developers can unlock new partnerships, products, and insights that drive revenue or save costs.
Where can tokens be stored or shared?
Tokens do not contain the original plaintext values, allowing them to be stored anywhere.
What kind of data can be tokenized?
Any data, files, images, etc. We like to say, “If it can be serialized, it can be tokenized.”
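In practice, that means anything you can turn into a string or bytes can be handed to a tokenization system. A quick sketch of the serialization step, using standard-library helpers:

```python
import base64
import json

profile = {"name": "Jane Doe", "ssn": "123-45-6789"}        # structured data
serialized_profile = json.dumps(profile)                    # -> a tokenizable string

image_bytes = b"\x89PNG...raw image bytes..."               # any binary blob (file, image, etc.)
serialized_image = base64.b64encode(image_bytes).decode()   # -> a tokenizable string

# Either string can now be sent to a tokenization system in place of the raw value.
```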
Comparing Tokenization with Encryption
What’s the difference between tokenization and encryption?
Fundamentally, tokenization creates an entirely new value. By decoupling the sensitivity from the value, tokens allow an organization to lock down, say, a Social Security number in a protected environment while its token counterpart lives out its life in any number of applications, devices, or databases.
On the other hand, encryption works by scrambling and unscrambling the original value back and forth between an unrecognizable state, called ciphertext, and its original plaintext version. While the ciphertext may look like a net-new value, it still contains the SSN somewhere inside it.
How do encryption and tokenization secure payments?
For effectively all online vendors, the following process holds. All communications between the buyer and the seller are encrypted using TLS (Transport Layer Security), the successor to SSL (Secure Sockets Layer). Without going into the details of the public/private key asymmetric encryption system that TLS uses, suffice it to say that when the little padlock appears in the address bar, the connection is encrypted.
Once the seller has received the CHD (Cardholder Data), communication with the PSP (Payment Service Provider) is also encrypted. Depending on the level of their PCI-DSS compliance certification, the vendor may store much of the CHD, as well as customer PII, in their own databases; this data will also be encrypted.
That said, vendors cannot store all CHD. For instance, the Card Verification Value (CVV) should never be stored by the vendor.
Advanced vendors know to add tokenization to this trail, owing to one uncomfortable reality: if their own data storage is hacked, and the hackers are able to decrypt the information they find, all that CHD and PII can be harvested and sold. So they use a tokenization service to place all of this sensitive data in a secure vault, and store only tokens in their own systems.
This way, the merchant is protected by both encryption, which secures the data in motion, and tokenization, which secures the data at rest.
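A condensed sketch of that division of labor, with hypothetical endpoints and response shape: the card data travels only over TLS to the vault, and the merchant's own database ends up holding nothing but the token.

```python
import sqlite3
import requests

# Data in motion: card data travels over an encrypted (HTTPS/TLS) connection
# straight to the tokenization vault -- never into the merchant's database.
resp = requests.post(
    "https://vault.example-tokenization.com/tokens",   # hypothetical vault endpoint
    headers={"Authorization": "Bearer <api-key>"},
    json={"type": "card_number", "data": "4242424242424242"},
    timeout=10,
)
token = resp.json()["id"]                              # assumed response shape

# Data at rest: the merchant's own storage only ever sees the token.
db = sqlite3.connect("orders.db")
db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, card_token TEXT)")
db.execute("INSERT INTO orders (card_token) VALUES (?)", (token,))
db.commit()
```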
What are some of the challenges to tokenization?
Latency
Tokens rely on a request to retrieve the underlying data, which adds latency to the process; however, this can be addressed through geo-replication, horizontal and vertical scaling of resources, concurrency, and caching.
Abstract-only
Simply generating a token won't restrict access to the stored data or account for the other compliance, security, and risk challenges involved in housing and using it. To take advantage of all that tokenization can offer, developers would need to build and maintain many other compliance controls and best practices.
These may include, but are not limited to, managing users, enabling permissions, maintaining proper encryption key management, standing up compliant environments, and much more.
As noted, tokenization platforms, like Basis Theory, offer these services out-of-the-box.
API Keys
If encryption keys can be stolen, so can the API keys used to broker access and permissions to the underlying data. These API keys must be protected to the fullest extent possible.
Fortunately, tokens provide more methods for authentication than encryption and are far easier to maintain as your system scales. Combined, these two factors reduce a significant amount of the risk that exists regardless of the tool used to secure and share sensitive data today.