February 25, 2025

Data Security

What Data Masking is and Why Mask Data

Types of Data Masking
Data Masking vs Tokenization
Is encrypting data the same as masking data?
Benefits of Data Masking

Data masking is the process of hiding elements of an original value, while still keeping enough context for the string to make sense to the user. For example, masking a credit card’s Primary Account Number (PAN) of 3566-0020-2036-0505 to reveal the last four-digits might look like XXXX-XXXX-XXXX-0505.

With the global cost of cybercrime projected to reach $10.5 trillion in 2025, and the increasing number of data privacy regulations, like CCPA and the GDPR, businesses need to use confidential information intelligently. Data masking offers a smart way to minimize or eliminate compliance requirements while maintaining day-to-day operations.

Masked values retain usability and functionality, while enhancing privacy and confidentiality. Because of this, masked data can be used for things—like sales demos, user training, account validation, and software testing—without increasing a company’s risk footprint, as plaintext data does. It also protects private information when sharing your business data with a third party.

When secured with strong privacy policies, identity access management tools, and role-based access controls, data masking provides a powerful and dynamic tool for desensitizing and using data. Let’s look at a few ways organizations mask data today.

Using Masked Data as a Visual Prompt

Masked data provides powerful visual cues to those familiar with the underlying data. For example, when a customer checks out at an online store, her digital wallet prompts her to select a stored card and uses the last four-digits of their respective PANs to help identify the correct payment method.

Using Masked Data for Testing

Humans and applications need data to test various system functions or standard operating procedures. Using sensitive plaintext data, or original values, is dangerous and expands both compliance scope and costs considerably.

When done right, masked data provides a cost-effective way to test whether a system or design will perform as expected in real-life scenarios without revealing sensitive data.

Using Masked Data to Help Migrate Data

Data masking can apply new formats to the underlying data. When combined with an abstraction layer, like tokenization, masked data may help format, structure, or clean data to satisfy new business or schema requirements encountered during a migration.

Return to Top

Types of Data Masking
While other methods exist, masked data is primarily created either as a copy of the original data or during processing. Let’s take a closer look at the two.

Static Data Masking (SDM): SDM duplicates your data with your data masking rules and algorithms applied to the new data set. This is helpful for test data environments, especially with third parties, where you may want to avoid the risk of something bad happening to your source data.
Dynamic Data Masking (DDM): DDM does not require a second data source to store the masked data dynamically. Instead, it masks and presents data according to the access policies and permissions of the actor requesting the data. For example, a customer support manager may have access to full PAN while his support representatives only have access to the last four digits.

In doing so, DDM provides access to masked data in real-time to an authorized user and limits the exposure risk static data may create.

Because there are so many ways to obfuscate data, it’s helpful to consider how each of the following techniques brings you closer to your usability and risk goals. In fact, most companies mix and match these approaches based on the effort it takes to build, maintain, and use them.

For example, while using format-preserving encryption to mask a social security number adds a significant level of security, an organization may not pursue it due to layers of complexity and cost to the application required to decrypt it. (We’ll get into encryption a bit more, later).

Return to Top

Obfuscating all but the last four-digits of a card’s Primary Account Number (PAN) provides some level of protection and usability to both users and organizations alike.

Substituting Data

Substitution replaces part of an original value with other characters or numbers—sometimes similar in nature. For example, if your email was Claude.Shannon@fakeemail.com, a substituted version might look like: Terry.Shannon@fakeemail.com.

Many methods for populating the data are used to substitute all or part of the original value. For example, you could draw from a look-up table of fake names or dates, use a generator or function to generate values, or calculate an average using aggregate values.

Redaction or Masking out Data

Redaction, also known as masking, is worth calling out as a notable sub-genre of substitution because of its popularity. This approach obscures a portion of the original value with fixed characters, like “X” or “*” and is by far the simplest form of data abstraction (it requires the least amount of power from humans and computers alike to understand and process).

Deleting or Truncating Data

Deletion or truncation desensitizes a piece of information by removing its essential elements.

Shuffling Data

Shuffling creates a mask by rearranging the characters or numbers of the original value; however, its inclusion in this is purely informational. While it’s low effort to build, shuffling is easily reverse-engineered and most experts strongly recommend against using it for sensitive data.

Return to Top

Data Masking vs Data Tokenization

Data masking modifies the existing original value while tokenizing data creates a net new one, called a token.

Masking takes a string and obfuscates an original raw value while tokenizing create a new value in the form of a token. — Obfuscating all but the last four-digits of a card’s Primary Account Number (PAN) provides some level of protection and usability to users and organizations alike.

Using masked data in applications and databases is a great way to reduce your compliance footprint in your environment, but it doesn’t eliminate the compliance obligations or security risks that come with storing the original value. Depending on the type of data being held and its usage, this could cost hundreds of thousands of dollars to build, maintain, and audit.

Tokenization platforms, like Basis Theory, store the original sensitive data in a compliant, hosted environment outside of your system. Instead of holding onto the full plaintext value, you’d store references, called Token Identifiers or Token IDs, to it in your application or database. Systems use these Token IDs to authorize and call back different properties of the original value (e.g., masks), allowing them to do many of the same things its plaintext counterparts can without increasing scope or risk. Let’s look at a tokenization example while incorporating what we know about DDM.

Say your customer support application needs access to the last four digits of a customer’s PAN to help a representative process a return on behalf of a customer. The application would use the Token ID to request a redacted version of the data. Basis Theory uses DDM, so once the application and user (i.e., the support rep) have been authorized, a mask of the customer’s card, “XXXX-XXXX-XXXX-0505”, would be generated and passed back to the application.

In this situation, the original value stays encrypted in a PCI-compliant environment and a masked value is returned to the representative so the customer can confirm which card to use to issue a refund. This workflow keeps your larger system and the web app out of PCI scope while reducing or even eliminating many of the obligations and costs that come with hosting full plaintext card numbers.

Return to Top

Is Encrypting Data the Same as Masking Data?

Encryption uses math and a combination of keys to transform an original value into an unrecognizable string of characters. The same set of keys used to encrypt the original value is then used to decrypt it.

There are many that maintain encryption is a form of masking. While encryption does obfuscate data and render it unreadable, the resulting string provides no context to a user or business trying to complete a task, so it fails to meet our definition of data masking.

While format-preserving encryption does offer some value here, the compute costs, latency, and operational overhead of managing the necessary keys and services have traditionally made the other data masking techniques much more attractive.

Learn more about managing your keys with Basis Theory’s Open Source KMS.

Return to Top

What are the Benefits of Data Masking?

Here are reasons organizations might use data masking:

Reduce Compliance Scope: Using masked data inside your applications and databases instead of plaintext greatly reduces your compliance scope and the costs, complexity, and speed of your compliance efforts.
Least Privilege: When combined with strong access rules and policies, masking ensures authorized actors have just the right amount of visibility needed to complete a job or function.
Compliance: If done well, masked data not only satisfies business needs but also sovereign or industry requirements, like GDPR, CCPA, or PCI.
Programmability: Masked data can be generated to assume or change its value based on various static or dynamic factors.
Made for Humans: Human memory is a fickle thing. It's easier for us to recognize and remember data sets when they are provided in a familiar pattern.
Reduce the Impact of a Breach: Masking data greatly decreases its usability, potentially preventing, or at least, cyber criminals' ability to cause harm.

Unlike other tokenization providers, Basis Theory allows developers to segment their data’s access controls as containers and define their tokens’ masks using a familiar Liquid syntax, called Expressions. If you’re interested in learning more, reach out for a quick demo!

Return to Top