What is symmetric cryptography?

My understanding of cryptography is very superficial. Here are some notes.

Traditional communication is easy: go into a meeting room and speak to each other. But in the modern world, we don’t always have this luxury; instead we have to talk over large distances, via wires and other people, with many people listening. Unfortunately some of these people are “adversarial” and can now do bad things:

They can read what you say.
They can change what you say.
They can pretend to be you.

Crypto is all about recovering some of the guarantees of the meeting room, while in the presence of such adversaries. Two important guarantees are that:

Adversaries can’t read your messages. (“Encryption”.)
The recipient can verify that your messages are really from you. (“Authentication”, “verification”, or “signing”.)

There are two main forms:

Private/secret/symmetric key. This is the old-school crypto which requires you to share a secret with the recipient. The hardships with it include initial key exchange, and that anyone who can decrypt your messages can also impersonate you.
Public/asymmetric key. This is the amazing post-1970 crypto which “fixes” some of those hardships.

I’d like to know more about symmetric key crypto. You can do both things with it:

Encryption: Send a message to someone else with the secret, such that adversaries can’t read it.
Authentication (signing): Send a message to someone else with the secret, such that they can verify it’s from you.

The basic API is:

ciphertext = encrypt(secret, plaintext)
plaintext = decrypt(secret, ciphertext)

signedtext = sign(secret, plaintext)
ok = verify(secret, signedtext)

There are obvious relationships between these. Decryption with the same secret is of course the inverse of encryption:

forall secret, plaintext: decrypt(secret, encrypt(secret, plaintext)) == plaintext

And signed text should verify if checked with the same secret:

forall secret, plaintext: verify(secret, sign(secret, plaintext))

Those are the mainline cases with no adversaries. There are also the important properties where the secrets differ:

forall secret1, secret2, plaintext: (decrypt(secret2, encrypt(secret1, plaintext)) == plaintext) == (secret1 == secret2)
forall secret1, secret2, plaintext: verify(secret2, sign(secret1, plaintext)) == (secret1 == secret2)

See what this means in the case of an adversary trying to read my messages:

decrypt(fake_secret, encrypt(secret, plaintext)) != plaintext  // can't read my messages
verify(fake_secret, sign(secret, plaintext)) == false  // can't see that it's from me

See what this means in the case of an adversary trying to pretend to be me:

decrypt(secret, encrypt(fake_secret, plaintext)) != plaintext  // can't pretend to be me
verify(secret, sign(fake_secret, plaintext)) == false  // can't pretend to be me

Actually, the sign/verify API is usually a bit different: instead of giving you a full “signed text”, it just gives you a signature. In crypto terminology, this is called a “Message Authentication Code”, or MAC. That is:

sig = gen_mac(secret, plaintext)
ok = verify_sig(secret, sig, plaintext)

Then, instead of sending a signed text, one sends the plaintext and the signature. The important property is then:

verify_sig(secret2, gen_mac(secret1, plaintext1), plaintext2) == (secret1 == secret2 && plaintext1 == plaintext2)

Actually, the verify_sig function often doesn’t exist, because it is trivially implemented:

verify_sig(secret, sig, plaintext) = (sig == gen_mac(secret, plaintext))

However, the == function here can be vulnerable to a timing attack if implemented in the normal way, so a dedicated verify_sig function can (and should) exist in order to protect against this common error.

The above says that, to verify the signature, we re-generate the signature and check that it is the same as the one sent. Then the important property is:

(gen_mac(secret1, plaintext1) == gen_mac(secret2, plaintext2)) == (secret1 == secret2 && plaintext1 == plaintext2)

Notice that this process verifies two properties:

The received message is from someone with the shared secret. (“Authentication”.)
The received message is exactly what that person wrote. (“Data integrity”.)

If verify_sig fails, there are a few possible causes:

The message is not from someone with the shared secret.
An adversary tampered with the message in transit.
The message was corrupted in transit.
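
Here's a rough sketch of the gen_mac/verify_sig pair in Python, using the standard library's hmac module. Choosing HMAC-SHA256 is my assumption, and the secret and messages are made up. The comparison uses hmac.compare_digest, which is designed to avoid the timing problem with ==.

```python
import hashlib
import hmac


def gen_mac(secret: bytes, plaintext: bytes) -> bytes:
    """Generate a MAC for plaintext (HMAC-SHA256 is an illustrative choice)."""
    return hmac.new(secret, plaintext, hashlib.sha256).digest()


def verify_sig(secret: bytes, sig: bytes, plaintext: bytes) -> bool:
    """Re-generate the MAC and compare it to the received one in constant time."""
    return hmac.compare_digest(sig, gen_mac(secret, plaintext))


secret = b"shared secret"          # hypothetical shared secret
message = b"pay Bob 10 pounds"     # hypothetical message
sig = gen_mac(secret, message)

ok = verify_sig(secret, sig, message)                     # same secret, same message
wrong_secret = verify_sig(b"fake secret", sig, message)   # sender lacks the secret
tampered = verify_sig(secret, sig, b"pay Eve 10 pounds")  # message changed in transit
```

Only the first check passes. Note that the recipient can't tell the failure causes apart: a wrong secret and a modified message both just produce a mismatch.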

This MAC process has some similarity with a different concept, a message digest (aka checksum):

digest = gen_digest(plaintext)
ok = verify_digest(digest, plaintext)

A message digest verifies data integrity, i.e. that no accidental corruption occurred, but does not verify that it has not been deliberately tampered with! An adversary can easily modify the plaintext in transit, create a new digest for it, and substitute that for the old one.

Thus a MAC does strictly more than a message digest: as well as verifying that no accidental corruption has occurred, it also verifies that no deliberate corruption has occurred.
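
To make the difference concrete, here's a small sketch in Python. SHA-256 as the digest and HMAC-SHA256 as the MAC are my choices for illustration, and the messages and secrets are made up. The adversary can recompute a digest for a forged message, but can only guess at the MAC.

```python
import hashlib
import hmac


def gen_digest(plaintext: bytes) -> bytes:
    return hashlib.sha256(plaintext).digest()


def gen_mac(secret: bytes, plaintext: bytes) -> bytes:
    return hmac.new(secret, plaintext, hashlib.sha256).digest()


secret = b"shared secret"       # known only to sender and recipient
forged = b"pay Eve 10 pounds"   # the adversary's replacement message

# Digest: the adversary swaps the message and simply recomputes the digest.
forged_digest = gen_digest(forged)
digest_check_passes = gen_digest(forged) == forged_digest

# MAC: the adversary does not know the secret, so must guess one.
forged_mac = gen_mac(b"adversary's guess", forged)
mac_check_passes = hmac.compare_digest(gen_mac(secret, forged), forged_mac)
```

The digest check passes for the forged message, so the tampering goes unnoticed. The MAC check fails.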

There are a few kinds of MAC. A popular kind is HMAC (Hash-based MAC), which relies on a cryptographic hash function. Conceptually, HMAC is implemented as:

hmac(secret, plaintext) = hash(secret ++ plaintext)

It is not quite implemented this way, because some underlying hash functions are vulnerable to a “length-extension attack”. This means an attacker who sees a message and its MAC can append data to the plaintext and compute a valid MAC for the extended message, all without knowing the secret. Because of this technical problem with some hash functions, HMAC instead applies the hash(secret ++ ...) function twice, to get:

hmac(secret, plaintext) = hash(secret ++ hash(secret ++ plaintext))

There are other ways of generating a MAC, but HMAC is by far the most popular.
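
The formula above is a simplification. As specified in RFC 2104, HMAC pads the secret to the hash's block size and XORs it with two different constants, so the inner and outer hashes use different derived keys. Here's a sketch for SHA-256, checked against Python's built-in hmac module; the function name hmac_sha256 is my own.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes its input in 64-byte blocks


def hmac_sha256(secret: bytes, plaintext: bytes) -> bytes:
    # Secrets longer than one block are hashed first; then pad with zero bytes.
    if len(secret) > BLOCK_SIZE:
        secret = hashlib.sha256(secret).digest()
    key = secret.ljust(BLOCK_SIZE, b"\x00")
    inner_key = bytes(b ^ 0x36 for b in key)  # "ipad" constant
    outer_key = bytes(b ^ 0x5C for b in key)  # "opad" constant
    inner = hashlib.sha256(inner_key + plaintext).digest()
    return hashlib.sha256(outer_key + inner).digest()


secret = b"shared secret"
message = b"hello"
expected = hmac.new(secret, message, hashlib.sha256).digest()
matches_stdlib = hmac_sha256(secret, message) == expected
```

The shape is still hash(key ++ hash(key ++ plaintext)); only the two keys differ.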

Commonly, we’ll want to combine the two properties offered by secret-key crypto: encryption and authentication. How do we combine them? There are at least three techniques. Together, they’re called “Authenticated Encryption” (authenc).

First, Encrypt-then-MAC (“EtM”). EtM works by:

string encrypt_then_mac(secret, plaintext) {
  let ciphertext = encrypt(secret, plaintext);
  let mac = gen_mac(secret, ciphertext);
  return ciphertext ++ mac;
}

string verify_then_decrypt(secret, ciphertext_and_mac) {
  let ciphertext, mac = split(ciphertext_and_mac);
  assert(gen_mac(secret, ciphertext) == mac);
  return decrypt(secret, ciphertext);
}

Notice that in terms of API, this is basically the same as the encrypt and decrypt functions, but decrypt will only return plaintext if authentication succeeds.

The next is Encrypt-and-MAC (E&M):

string encrypt_and_mac(secret, plaintext) {
  let ciphertext = encrypt(secret, plaintext);
  let mac = gen_mac(secret, plaintext);  // uses plaintext instead of ciphertext
  return ciphertext ++ mac;
}

string verify_and_decrypt(secret, ciphertext_and_mac) {
  let ciphertext, mac = split(ciphertext_and_mac);
  let plaintext = decrypt(secret, ciphertext);
  assert(gen_mac(secret, plaintext) == mac);
  return plaintext;
}

Finally, there is MAC-then-Encrypt (MtE):

string mac_then_encrypt(secret, plaintext) {
  let mac = gen_mac(secret, plaintext);
  return encrypt(secret, plaintext ++ mac);
}

string decrypt_then_verify(secret, ciphertext) {
  let plaintext, mac = split(decrypt(secret, ciphertext));
  assert(gen_mac(secret, plaintext) == mac);
  return plaintext;
}

All three approaches are used in different major systems: EtM in IPsec, E&M in SSH, and MtE in SSL/TLS. There seems to be no consensus on which is better. There are also “signcryption” algorithms which do encryption and signing at the same time, instead of mashing the two together. Signcryption does not seem to be commonly used. One recommendation I came across is that we should:

“Always compute the MACs on the ciphertext, never on the plaintext”, i.e. encrypt_then_mac.
“Use two different keys, one for encryption and one for the MAC.”
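
Following those two recommendations, an Encrypt-then-MAC sketch in Python might look like this. Big caveat: Python's standard library has no real cipher, so encrypt/decrypt here are a toy keystream cipher I made up from SHA-256, purely so the example runs. The nonce, key sizes and HMAC-SHA256 are also my assumptions. Don't use this for real data.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32    # HMAC-SHA256 output size in bytes
NONCE_LEN = 16  # random per-message value, so keystreams are not reused


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a counter. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))


def encrypt_then_mac(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    ciphertext = encrypt(enc_key, plaintext)
    mac = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()  # MAC the ciphertext
    return ciphertext + mac


def verify_then_decrypt(enc_key: bytes, mac_key: bytes, sealed: bytes) -> bytes:
    ciphertext, mac = sealed[:-MAC_LEN], sealed[-MAC_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("MAC check failed")  # never decrypt unauthenticated data
    return decrypt(enc_key, ciphertext)


enc_key = secrets.token_bytes(32)  # one key for encryption...
mac_key = secrets.token_bytes(32)  # ...and a different one for the MAC
sealed = encrypt_then_mac(enc_key, mac_key, b"attack at dawn")
opened = verify_then_decrypt(enc_key, mac_key, sealed)
```

Because the MAC covers the ciphertext, a modified message is rejected before any decryption happens.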

Within symmetric-key crypto, there are two forms of cipher:

Block ciphers. A block cipher encrypts fixed-length plaintexts. For example, the Blowfish algorithm works on 64-bit blocks (8 bytes).
Stream ciphers. A stream cipher encrypts variable-length plaintexts. It works “online”, i.e. can emit the ciphertext as it consumes the plaintext.

To understand either, a good place to start seems to be the XOR function (denoted ⊕ in crypto-land). It’s one of the classic binary functions we were taught in school:

0 ⊕ 0 = 0
0 ⊕ 1 = 1
1 ⊕ 0 = 1
1 ⊕ 1 = 0

XOR is used EVERYWHERE. It’s the heart of the most fundamental symmetric-key algorithm, the ONE-TIME PAD. This works by having a shared secret (the “pad”) which is the same length as the plaintext, and computing the ciphertext quite simply as plaintext ⊕ pad.

Why does the one-time pad work? Consider the case where the ciphertext is a single bit, 0 or 1. Clearly, the plaintext is either 0 or 1 - the only plaintexts of the same length. Which was it? The answer depends on the pad bit (0 or 1), which we do not know, and whose two values are equally likely. Since both pads are equally likely, both plaintexts are equally likely. Now consider a ciphertext of length N bits - the same argument applies, because all the bits are encrypted independently: in a sense, we are seeing N 1-bit ciphertexts, generated from N 1-bit plaintexts, each with its own 1-bit pad. Literally any plaintext of length N could have generated it.
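
Here's the one-time pad as a short Python sketch (the messages are made up). It also shows the “any plaintext could have generated it” claim: for any decoy message of the same length, there is a pad that maps the same ciphertext to that decoy.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


plaintext = b"attack at dawn"
pad = secrets.token_bytes(len(plaintext))  # random, same length, used only once
ciphertext = xor_bytes(plaintext, pad)
recovered = xor_bytes(ciphertext, pad)     # XOR with the same pad undoes it

# An eavesdropper cannot rule out any message of the same length:
decoy = b"retreat at six"
decoy_pad = xor_bytes(ciphertext, decoy)   # the pad that would "explain" the decoy
decoy_result = xor_bytes(ciphertext, decoy_pad)
```

Here recovered equals the original plaintext, and decoy_result equals the decoy. Without the pad, both readings of the ciphertext are equally plausible.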

The same arguments do not apply to other encryption algorithms, because the bits of the ciphertext are not independent in the same way. Typically, the bit at ciphertext[n] is not just affected by plaintext[n] and secret[n]; it is affected by plaintext[0..n] and the entire secret.

Another way to see why other algorithms are not as secure as the one-time pad is that, for a given ciphertext, the set of possible plaintexts is much smaller.

The big problem with the one-time pad is the onerous requirement to share a secret the same length as the message, and to never re-use it.

Block ciphers and stream ciphers attempt to do secure encryption of arbitrary-length messages with a small fixed-size secret. For example, a stream cipher might let you securely send a live video broadcast (of size unknown until completion), but with a secret of just 128 bits.

The fixed small key size explains why, for a given ciphertext, there is a much smaller set of possible plaintexts (compared to the one-time pad). If you receive a ciphertext, and you know that the key size was 8 bits, you can try all possible 2^8=256 keys to find all possible plaintexts. One of these will be the true plaintext.
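
As a toy illustration in Python: the “cipher” below is something I made up with an 8-bit key (it XORs every byte with the key), just small enough to brute-force. It is not a real cipher.

```python
def toy_encrypt(key: int, plaintext: bytes) -> bytes:
    """Toy cipher with an 8-bit key: XOR every byte with the key. Not secure."""
    return bytes(b ^ key for b in plaintext)


toy_decrypt = toy_encrypt  # XOR is its own inverse

ciphertext = toy_encrypt(0x5A, b"attack at dawn")

# The attacker tries all 2^8 = 256 keys and collects the candidate plaintexts.
candidates = [toy_decrypt(key, ciphertext) for key in range(256)]
true_plaintext_found = b"attack at dawn" in candidates
```

There are only 256 candidates for this 14-byte message, where a one-time pad would leave all 256^14 messages of that length possible. Picking the readable one out of 256 is easy.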

Block/stream ciphers do not work on the one-time-pad principle that “all possible plaintexts of this size are equally likely”. Instead, they work on the principle that “you can’t try all the keys, because there are too many”. So keys are set to, say, 128 or 256 bits, because even 2^128 is just too many to practically try.

The one-time pad is secure against any adversary, regardless of computational power. Modern crypto is only secure based on assumptions about adversaries’ computational power.

A block or stream cipher is considered “broken” to the extent that attackers can find the key or plaintext without having to try all the possible keys.

Let’s look at block ciphers. A given block cipher has the API:

block = encrypt(secret, block)
block = decrypt(secret, block)

This looks just like our encrypt and decrypt functions above. The difference is that, for a given block cipher, the block type has a fixed length and the secret type has a fixed length. For example:

block64 = blowfish_encrypt(secret, block64)
block64 = blowfish_decrypt(secret, block64)

An example in Python:

>>> import blowfish
>>> cipher = blowfish.Cipher(b"this is my secret key")
>>> cipher.encrypt_block(b"12345678")
b'\x10N\xb1\x0c]\x98\xd7\xb3'
>>> cipher.encrypt_block(b"h5ageragreshs54ht")  # 17 bytes, not a single 8-byte block, so this call fails
>>> cipher.decrypt_block(cipher.encrypt_block(b"htrsh34s"))
b'htrsh34s'

The obvious problem with block ciphers is that real-world messages are rarely exactly 64 bits long. How can we encrypt longer strings? The obvious approach is to split the plaintext into block-sized pieces, and apply the block cipher to each of these. That way, we get:

ciphertext_blocks[0] = encrypt(secret, plaintext_blocks[0])
ciphertext_blocks[1] = encrypt(secret, plaintext_blocks[1])
...
ciphertext_blocks[n] = encrypt(secret, plaintext_blocks[n])

The decryption is then obvious:

plaintext_blocks[n] = decrypt(secret, ciphertext_blocks[n])

This scheme is known as “Electronic Codebook” (ECB). A key problem with ECB is that identical blocks of plaintext will encrypt to the same blocks of ciphertext. This reveals some structure of the plaintext. This flaw makes ECB insecure and rarely used.
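
Here's a sketch of the ECB problem in Python. Since the standard library has no block cipher, encrypt_block/decrypt_block below are a toy 64-bit Feistel cipher I built on SHA-256 purely for illustration (not Blowfish, and not secure). The messages are made up.

```python
import hashlib

BLOCK = 8   # bytes, i.e. 64-bit blocks like Blowfish
ROUNDS = 4


def _round(secret: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(secret + bytes([i]) + half).digest()[:4]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(secret: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(secret, i, right))
    return left + right


def decrypt_block(secret: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
        left, right = _xor(right, _round(secret, i, left)), left
    return left + right


def ecb_encrypt(secret: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK != 0:
        raise ValueError("plaintext must be a whole number of blocks")
    return b"".join(
        encrypt_block(secret, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )


secret = b"this is my secret key"
ciphertext = ecb_encrypt(secret, b"AAAAAAAA" + b"BBBBBBBB" + b"AAAAAAAA")
blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
repeats_leak = blocks[0] == blocks[2]  # identical plaintext blocks show through
```

The first and third ciphertext blocks are identical, so an eavesdropper learns that the first and third plaintext blocks were the same, without knowing the key.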

There are various approaches which work around this problem. The classic is “Cipher Block Chaining”:

ciphertext_blocks[0] = encrypt(secret, xor(plaintext_blocks[0], initialization_vector))
ciphertext_blocks[1] = encrypt(secret, xor(plaintext_blocks[1], ciphertext_blocks[0]))
ciphertext_blocks[2] = encrypt(secret, xor(plaintext_blocks[2], ciphertext_blocks[1]))
...
ciphertext_blocks[n] = encrypt(secret, xor(plaintext_blocks[n], ciphertext_blocks[n-1]))

What is this initialization_vector? It’s another block (it should be called initialization block) filled with random bits.
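
A sketch of CBC in Python, under the same caveat as any stdlib-only example: the block cipher is a toy 64-bit Feistel construction I made up from SHA-256, for illustration only. The decryption function is my addition; it just reverses the chaining.

```python
import hashlib
import secrets

BLOCK = 8   # bytes per block
ROUNDS = 4


def _round(secret: bytes, i: int, half: bytes) -> bytes:
    return hashlib.sha256(secret + bytes([i]) + half).digest()[:4]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(secret: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in range(ROUNDS):
        left, right = right, xor(left, _round(secret, i, right))
    return left + right


def decrypt_block(secret: bytes, block: bytes) -> bytes:
    left, right = block[:4], block[4:]
    for i in reversed(range(ROUNDS)):
        left, right = xor(right, _round(secret, i, left)), left
    return left + right


def split_blocks(data: bytes) -> list:
    if len(data) % BLOCK != 0:
        raise ValueError("data must be a whole number of blocks")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(secret: bytes, iv: bytes, plaintext: bytes) -> bytes:
    prev, out = iv, []
    for block in split_blocks(plaintext):
        prev = encrypt_block(secret, xor(block, prev))  # chain in previous ciphertext
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(secret: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for block in split_blocks(ciphertext):
        out.append(xor(decrypt_block(secret, block), prev))
        prev = block
    return b"".join(out)


secret = b"this is my secret key"
iv = secrets.token_bytes(BLOCK)          # fresh random block per message
plaintext = b"AAAAAAAA" * 3              # three identical plaintext blocks
ciphertext = cbc_encrypt(secret, iv, plaintext)
blocks = split_blocks(ciphertext)
recovered = cbc_decrypt(secret, iv, ciphertext)
```

Unlike ECB, the three identical plaintext blocks come out as three different ciphertext blocks. The recipient needs the same initialization_vector to decrypt; as far as I know it is normally sent alongside the ciphertext rather than kept secret.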

The general approach is to thread some state between each encryption. There exist many such “modes of operation”, which allow a block cipher to encrypt messages longer than a single block.

Let’s look at stream ciphers. They have the API:

ciphertext_byte, new_state = encrypt_byte(secret, plaintext_byte, prev_state)
plaintext_byte, new_state = decrypt_byte(secret, ciphertext_byte, prev_state)
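
To make that API concrete, here's a toy stream cipher in Python. The keystream (one byte of SHA-256 over the secret and a counter) is something I made up for illustration, with the counter playing the role of the state. It is not a real or secure cipher.

```python
import hashlib


def keystream_byte(secret: bytes, state: int) -> int:
    """Toy keystream: first byte of SHA-256(secret ++ counter). Illustration only."""
    return hashlib.sha256(secret + state.to_bytes(8, "big")).digest()[0]


def encrypt_byte(secret: bytes, plaintext_byte: int, prev_state: int):
    return plaintext_byte ^ keystream_byte(secret, prev_state), prev_state + 1


def decrypt_byte(secret: bytes, ciphertext_byte: int, prev_state: int):
    return ciphertext_byte ^ keystream_byte(secret, prev_state), prev_state + 1


def encrypt_stream(secret: bytes, plaintext_bytes):
    """Works "online": yields each ciphertext byte as its plaintext byte arrives."""
    state = 0
    for b in plaintext_bytes:
        c, state = encrypt_byte(secret, b, state)
        yield c


secret = bytes(range(16))  # a 128-bit secret (hypothetical value)
ciphertext = bytes(encrypt_stream(secret, iter(b"a live video broadcast")))

# Decrypt byte by byte, threading the state through in the same way.
state, recovered = 0, bytearray()
for c in ciphertext:
    p, state = decrypt_byte(secret, c, state)
    recovered.append(p)
```

The encryptor never needs to know the total message length up front, which is what lets it handle something like a live broadcast.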

Tagged #cryptography.


This page copyright James Fisher 2016. Content is not associated with my employer.
