What is data encryption and how does it work?
Every day, businesses accumulate more data to help drive their decisions and understand their market. That data is not only confidential to the business, but often contains the personal information of its customers. Keeping this data safe is imperative to the business, to its customers, and to its partners.
In one sense, it is easy to keep the data safe: put it somewhere that it is completely inaccessible. This, of course, is not practical. Systems need to be able to access the data in order to analyze it; some users need to review and analyze the data to assist with business decisions; some data, such as payment information, needs to be used on a periodic basis to process invoices. How does a business keep the data safe and secure while still keeping it usable?
There are many different ways to encrypt and decrypt data. Most systems employ several different solutions, making the overall data security and encryption strategy more robust but also more complicated and harder to manage. Understanding the fundamentals of data encryption can help design those solutions and remove some of the pain points.
What is data encryption?
The dictionary definition of encryption is “to change (information) from one form to another especially to hide its meaning.” Many kids learned one of the most basic forms of encryption in grade school: the Caesar Cipher. Simply shifting the letters of the alphabet over results in a message that looks like nonsense, but if you know how many characters the alphabet was shifted, it is easy to decode. Imagine shifting the alphabet 4 characters, where “A” becomes “E”, “B” becomes “F”, and so on. Using this most basic cipher, decrypt this message:
FI WYVI XS HVMRO CSYV SZEPXMRI
This example is extremely simple and very easy for modern computers to break; with only 26 possible shifts, a computer can try all of the possibilities in less than a second. However, it illustrates four key elements behind encryption:
- The data or message to be encrypted;
- the algorithm (in this case, the Caesar Cipher) used to encrypt the data;
- the key (in this example, “4” is the key);
- and the ciphertext, or the encrypted result.
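The four elements above can be seen in a few lines of code. The following is a minimal sketch of a Caesar Cipher restricted to the letters A to Z; the function names (`caesar_shift`, `encrypt`, `decrypt`, `brute_force`) are chosen for this illustration only. It also shows why the cipher is so weak: trying every key takes a single list comprehension.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(text: str, shift: int) -> str:
    """Shift each letter A-Z by `shift` positions, wrapping around.

    Characters outside A-Z (spaces, digits, punctuation) are left unchanged.
    """
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)


def encrypt(plaintext: str, key: int) -> str:
    """The message plus the key go in; the ciphertext comes out."""
    return caesar_shift(plaintext.upper(), key)


def decrypt(ciphertext: str, key: int) -> str:
    """Decryption is the same algorithm run with the shift reversed."""
    return caesar_shift(ciphertext.upper(), -key)


def brute_force(ciphertext: str) -> list[str]:
    """Return one candidate plaintext per possible key (index = key)."""
    return [decrypt(ciphertext, key) for key in range(26)]


ciphertext = encrypt("HELLO WORLD", 4)
print(ciphertext)                  # LIPPS ASVPH
print(decrypt(ciphertext, 4))      # HELLO WORLD
print(brute_force(ciphertext)[4])  # HELLO WORLD
```

Here the message is `"HELLO WORLD"`, the algorithm is `caesar_shift`, the key is `4`, and the ciphertext is `"LIPPS ASVPH"`.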
What is decryption?
Decryption refers to the process of unscrambling the encrypted ciphertext back to the original value. At the highest level, it’s the same process as encryption, but in reverse. We’ll unpack this a bit more in the section below on different types of encryption.
There are numerous different encryption algorithms. NIST classifies the algorithms into two categories: symmetric and asymmetric.
Symmetric encryption uses the same key for encryption and decryption. In general, symmetric encryption is more efficient than asymmetric encryption, making it more suited to bulk data encryption than asymmetric encryption. However, to use symmetric encryption, you need a means of securely sharing a secret key between the sender and recipient. The same is not true of asymmetric encryption, which uses a public/private key pair.
Asymmetric encryption is valuable when you need to share the data, allowing you to share the public key that can be used to encrypt the data, while only the private key can decrypt it. Often, asymmetric cryptographic algorithms are used to establish a shared secret key, enabling symmetric cryptography to be used to encrypt the actual data.
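To make the idea of establishing a shared secret key concrete, here is a toy sketch of Diffie-Hellman key agreement, one well-known asymmetric technique for this purpose (the article does not name a specific one). The parameters `P` and `G` and the helper names are assumptions picked for readability; they are far too weak for real use, and production systems should rely on a vetted library and standardized parameters.

```python
import hashlib
import secrets

# Toy public parameters, chosen for readability only (NOT secure).
P = 2**127 - 1  # a Mersenne prime
G = 3


def make_keypair() -> tuple[int, int]:
    """Return (private, public), where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(my_private: int, their_public: int) -> bytes:
    """Combine our private value with the peer's public value.

    Both sides compute G^(a*b) mod P; hashing it yields a 32-byte key
    that could then be handed to a symmetric cipher.
    """
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


# Each party keeps its private value and publishes only the public one.
alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

alice_key = shared_key(alice_private, bob_public)
bob_key = shared_key(bob_private, alice_public)
assert alice_key == bob_key  # same symmetric key, never sent over the wire
```

Only the public values cross the network, yet both sides end up with the same key, which is the pattern that lets the efficient symmetric algorithm take over for the bulk data.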
Many algorithms once considered to be a best practice are now considered weak. As computers get more powerful and can process more data in less time, the time and energy needed to break the algorithm becomes less and less. A couple of examples of algorithms now considered weak are Data Encryption Standard (DES), which is symmetric, and RSA with a short key (asymmetric).
Currently recommended algorithms are:
- Rivest-Shamir-Adleman (RSA): Not to be confused with the security firm, RSA is a widely used asymmetric encryption algorithm. RSA can be insecure if the key is too short, so the current recommendation from the National Institute of Standards and Technology (NIST) is a 2048-bit key length or greater. The larger the key size, the harder it is to crack.
- 3DES (or Triple-DES): This algorithm is built on the outdated DES algorithm, but it runs DES three times over each block of data. It has been used by common services such as Secure Shell (SSH) and IPsec. DES uses a 56-bit key; three-key 3DES uses 168 bits of key material but provides only about 112 bits of effective security because of the meet-in-the-middle attack. Note that NIST has since deprecated 3DES, so AES is the better choice for new systems.
- Advanced Encryption Standard (AES): Developed as an easy-to-implement solution for both hardware and software, AES comes with three different key sizes. You’ll often see the key size as a suffix (AES128, AES192, or AES256).
- CRYSTALS-Kyber: CRYSTALS-Kyber is the first post-quantum encryption algorithm standardized by NIST. If quantum computers become capable of breaking classical asymmetric cryptographic algorithms (such as RSA), CRYSTALS-Kyber and similar algorithms will provide ongoing data security.
Other algorithms still considered secure today are Blowfish and Twofish. Taking the time to find which algorithm is right for your application could mean the difference between a secure and insecure system.
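The effect of key length on brute-force resistance is simple arithmetic: every extra bit doubles the number of keys an attacker has to try. The sketch below assumes a hypothetical attacker who can test one trillion keys per second (`RATE` is an illustrative figure, not a measurement). It applies to symmetric keys only; RSA keys are attacked by factoring rather than by exhaustive search, so their bit lengths are not directly comparable.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_search(key_bits: int, guesses_per_second: float) -> float:
    """Years needed to try every key of the given length at a fixed rate."""
    return 2**key_bits / guesses_per_second / SECONDS_PER_YEAR


RATE = 1e12  # hypothetical attacker speed: one trillion guesses per second

for label, bits in [
    ("DES", 56),
    ("3DES (effective)", 112),
    ("AES-128", 128),
    ("AES-256", 256),
]:
    print(f"{label:18s} {bits:3d} bits  ~{years_to_search(bits, RATE):.3g} years")
```

At this assumed rate the full 56-bit DES key space is exhausted in about 20 hours, while the 128-bit key space would take on the order of 10^19 years.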
Encryption algorithms can be used to protect data at all stages of its lifecycle. Encryption use cases can be generally classified into three categories:
- At-Rest Encryption: Data is encrypted “at rest” (local files on the storage devices) to protect it from being compromised if an unauthorized user gains access to a device. For example, many computers and smartphones offer full-disk encryption, and data in corporate databases is commonly encrypted to protect it from breach. Full-disk encryption is commonly performed using symmetric encryption algorithms because encryption and decryption are usually performed on the same device, eliminating the challenges of sharing secret keys.
- In-Transit Encryption: When you visit a website using the HTTPS protocol, your data is encrypted while in transit between your computer and the server. HTTPS uses the TLS protocol for encryption, which uses asymmetric cryptography to establish a shared key and symmetric encryption to protect the actual data. Additionally, hash functions are commonly used to ensure that the data has not been modified or corrupted in-transit.
- In-Use Encryption: Ideally, data would be encrypted while in use as well; however, with conventional encryption algorithms, adding two ciphertexts (or performing other operations on them) does not yield the encrypted version of the sum of their plaintexts. While encryption of data in use is possible with a fully homomorphic encryption algorithm, existing algorithms are too inefficient for most practical workloads.
Pros and cons of Encryption
Benefits of Encryption
One immediate benefit of encryption is exactly the reason you implemented it in the first place: it helps keep your data secure. Encryption provides an immediate safeguard against leaking sensitive information. There are many layers to a data security strategy, so what other benefits make encryption such a popular solution?
- Low cost: Nearly every programming language and framework has a powerful and up-to-date encryption library that enables developers to implement encryption with ease. Encrypting data in your application doesn’t require additional hardware or infrastructure. If you are working with hardware, such as mobile devices, most devices come with encryption services built in!
- Regulatory requirements: Many regulations require significant security measures and safeguards against data breaches. Because encryption is low cost and relatively simple to implement, fulfilling some of those regulatory needs is straightforward, saving your business the headache of dealing with non-compliance issues (or worse!).
- Ubiquitous: Encryption isn’t, and shouldn’t be, handled in one spot. For instance, encrypting the data in your database keeps your data secure at rest, but if your application needs to send a message to another service, that data should also be encrypted. Data can be encrypted at rest or in transit, and can be encrypted using different methods. The combination of these different methods makes it significantly harder for a malicious actor to get your sensitive data.
It is hard to argue against encryption as a critical tool for data security. Is there a catch?
Challenges of Encryption
There are several challenges when thinking about encryption. First there are the implementation details to consider: what data are you encrypting, what encryption algorithms are you using, and how will you store the keys? Those are the easy questions to ask, but not so easy to answer! There are several things to consider as you plan for your implementation:
- Missing features or capabilities: Encryption libraries will not solve every use case, but attempting to hand-roll your own can lead to costly consequences. Encryption algorithms are complex and nuanced, and one small mistake in your algorithm or other cryptographic operations can lead to huge weaknesses.
- Using encrypted data: Though data can be encrypted at rest (protecting the data from being copied out of storage and opened on another system) or in transit (protecting the data from being intercepted), you still need to decrypt the data to use it. For instance, you need to decrypt credit card information to charge a customer. Safely handling the decrypted data is a topic itself!
- Sharing encrypted data: Accessing encrypted data requires access to the key used for decryption (the shared secret key for symmetric algorithms, or the private key for asymmetric ones). If an organization shares data with third-party organizations, it needs mechanisms in place to securely share and update encryption keys and to revoke access if needed.
- Key Management: Every encryption algorithm requires a key to encrypt the data, and only the matching key (the same key for symmetric algorithms, the private key for asymmetric ones) can decrypt it. If that key is ever compromised, all of the data it protects is at risk. Rotating keys is therefore a necessary component of any encryption strategy: it limits the damage a compromised key can do by ensuring that each key is valid for only a short period of time. However, you cannot just change the key and expect existing data to remain readable; you will need to re-encrypt your data with the new key, which can be an expensive and complicated process. Planning and implementing a key management and rotation policy is critical to the success of your data security strategy.
- Nothing is future proof: As computers become more powerful, algorithms will become weaker. Scientists are working on quantum computing, which has the potential to break the asymmetric encryption algorithms commonly used today. Who knows what the future will bring! Any encryption strategy needs to be a living document, always evolving with the latest security best practices, and keeping up to date with the strongest algorithms.
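The key rotation challenge described above can be sketched as bookkeeping: tag every record with the version of the key that encrypted it, and re-encrypt records under the newest key before retiring an old one. Everything here is hypothetical. `KeyRing` is an invented class, and `toy_encrypt`/`toy_decrypt` are a deliberately simple stand-in (a SHA-256 keystream) used only so the example runs with the standard library. In a real system, swap in a vetted authenticated cipher such as AES-GCM from a maintained library.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode). NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce, then XOR the plaintext with the keystream."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and XOR the rest with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KeyRing:
    """Hypothetical key store: versioned keys, newest version encrypts."""

    def __init__(self) -> None:
        self.keys: dict[int, bytes] = {}
        self.current = 0
        self.rotate()  # create key version 1

    def rotate(self) -> int:
        """Generate a new key and make it the current one."""
        self.current += 1
        self.keys[self.current] = secrets.token_bytes(32)
        return self.current

    def encrypt(self, plaintext: bytes) -> tuple[int, bytes]:
        """Return (key_version, ciphertext) so the record knows its key."""
        return self.current, toy_encrypt(self.keys[self.current], plaintext)

    def decrypt(self, record: tuple[int, bytes]) -> bytes:
        version, blob = record
        return toy_decrypt(self.keys[version], blob)

    def reencrypt(self, record: tuple[int, bytes]) -> tuple[int, bytes]:
        """Move a record to the current key; unchanged if already current."""
        if record[0] == self.current:
            return record
        return self.encrypt(self.decrypt(record))

    def retire(self, version: int) -> None:
        """Delete an old key once nothing is encrypted under it."""
        del self.keys[version]


ring = KeyRing()
records = [ring.encrypt(b"4111 1111 1111 1111"), ring.encrypt(b"alice@example.com")]

ring.rotate()                                   # new key becomes current
records = [ring.reencrypt(r) for r in records]  # the expensive step
ring.retire(1)                                  # old key can now be destroyed
```

The re-encryption loop is where the cost shows up: it touches every stored record, which is why rotation needs to be planned rather than improvised.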
Encryption vs Tokenization
Using encrypted data is a problem space unto itself. As we described earlier, you typically cannot use the encrypted value in normal operations without decrypting the value itself. Tokenization and data vaults provide a solution to this issue. The original plaintext value gets stored in a system (vault) outside of your application’s infrastructure, keeping it safe and secure. Instead of passing the ciphertext around your systems, you pass a token. Unlike encrypted data, a token is not decipherable or reversible: it is simply a reference, not an encrypted or hashed value. You cannot “detokenize” a token into its original value without access to the vault.
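A minimal sketch of the idea, assuming a hypothetical in-memory `TokenVault` class (this is not any vendor's API; a real vault would encrypt the stored values, enforce access control, and live outside your application):

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Store the value and hand back a random reference to it."""
        token = "tok_" + secrets.token_urlsafe(16)  # random, not derived from value
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look the value up; raises KeyError for unknown tokens."""
        return self._store[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"
token = vault.tokenize(card_number)

# The token can travel through logs, queues, and databases safely:
# it contains no information about the card number.
assert card_number not in token
assert vault.detokenize(token) == card_number
```

Because the token is generated randomly rather than computed from the value, there is no key or algorithm that turns it back into the original data; only a lookup in the vault can.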
While tokenizing provides a helpful abstraction layer over otherwise hard to use encrypted data, a token’s usefulness to a developer depends on its capabilities. Some tokenization services pass back only the reference to the stored secured value, while other platforms allow developers to configure and call back multiple properties of a token. For example, Basis Theory developers can index, mask, fingerprint, and alias all or part of the token and its secured encrypted data. This flexibility makes tokens a favorite of developers needing to balance utility, security, system complexity, and compliance as the data traverses different systems.
Encryption vs Hashing
Hashing is often confused with encryption, but there is a distinct difference: a hash cannot be decrypted. Encryption creates a ciphertext that can be decrypted using the algorithm and the key, whereas hashing produces a fixed-length digest that cannot be reversed to recover the input. In general, hash functions are used to protect data integrity, while encryption is focused on data confidentiality.
Hashes are used in situations where you don’t need to know the original value and only need to check if the hash matches another hashed value. Passwords are an obvious use case, because there is no scenario in which the application needs to know what the original password was; it only needs to know that the password the user entered is the same as the hashed password stored in the database. When logging in, the password the user entered is passed through the hashing algorithm and compared to the value in the database.
Hash functions are deterministic and collision resistant. This means that hashing the same input multiple times always produces the same hash output, but it is infeasible to find two different inputs that produce the same output. This makes hashing perfect for data integrity: the hash before and after an operation such as transferring a file to another system should match. If not, there was an issue with the transfer and some data is likely missing or corrupted.
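As an illustration of the integrity check, the sketch below uses SHA-256 from Python's standard `hashlib` module; the helper name `sha256_hex` and the sample byte strings are made up for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


sent = b"quarterly-report contents"
received_ok = b"quarterly-report contents"
received_bad = b"quarterly-report content5"  # one byte changed in transit

# Deterministic: identical input always gives an identical digest.
assert sha256_hex(sent) == sha256_hex(received_ok)

# Any change to the input produces a different digest.
assert sha256_hex(sent) != sha256_hex(received_bad)

print(len(sha256_hex(sent)))  # 64 hex characters, regardless of input size
```

In practice the sender publishes the digest alongside the file, and the receiver recomputes it after the transfer and compares the two.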
However, hash function determinism has an obvious disadvantage: because the hashing algorithm always creates the same hash, hackers can use this to their advantage. If they know the algorithm, they can use existing lookup tables (known as rainbow tables) that contain both the plaintext and hashed values. For this reason, applications like password storage use a random value called a “salt” to make identical passwords yield different hashes. Even though the salt is public, it makes rainbow tables worthless because an attacker would need a table for each possible salt. This forces an attacker to brute force the password, checking various possibilities until they find the right one.
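The salting approach can be sketched with the standard library alone. This example uses PBKDF2 (`hashlib.pbkdf2_hmac`), a deliberately slow salted hash, rather than a bare hash function. The iteration count and the function names `hash_password` and `verify_password` are assumptions for illustration; check current guidance before settling on parameters.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 200_000  # assumption for this sketch; tune to current guidance


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest). A fresh random salt is generated when none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("correct horse battery staple")
salt_b, hash_b = hash_password("correct horse battery staple")
assert hash_a != hash_b

# Login: the stored salt is reused, so the right password still matches.
assert verify_password("correct horse battery staple", salt_a, hash_a)
assert not verify_password("wrong guess", salt_a, hash_a)
```

The salt is stored next to the hash in the database. It does not need to be secret; its job is only to make each stored hash unique so that precomputed tables stop being useful.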
Bringing it all together
Encryption is clearly a necessary component of your data security strategy, but it’s not as easy as saying “Go encrypt our data!” Though encryption is easy to implement, it requires planning before any code is written. What data are you encrypting? Where will it be stored? What algorithm are you using? How will you generate the key? How will you store and use the key? How will you rotate the key? How will you ensure the decrypted data is safe while your systems are using it for daily operations?
There is no single answer for any of these questions. They are dependent on your application and tech stack, on your locality, on regulatory and compliance requirements, and more.
The complexity of these decisions is one reason tokenization is becoming a popular option for data security. Using a third-party tokenization platform removes the complexity of managing a data encryption strategy, and gives developers powerful tools to safely and securely store and use sensitive data.