This lesson presents material from chapter 3. Objectives important
to this lesson:
Background on cryptography
Algorithms and ciphers
Symmetric encryption
Asymmetric encryption
Public key infrastructure
Hashing
Common systems
Cryptanalysis
Possible future forms
Concepts:
Chapter 3
The chapter begins with a reference to the cipher
used by Julius Caesar. It will be helpful to define a cipher
as a substitution of characters. A code
is actually something else. Codes do not translate individual characters.
They often use short "words" or symbols to represent real sentences or
sequences of actual words. In the example in the text, each letter in
the original message (called cleartext or plaintext)
is changed to another character by a specific algorithm
(method), resulting in an encrypted message that displays
in ciphertext. The Caesar cipher is easy to break, since
the same cipher letter is substituted each time the same plaintext letter
occurs in the message. The cipher is even more vulnerable due to its simple
algorithm: add three to each plaintext letter, wrap to front of alphabet
as needed (for x, y, and z). It would be harder to break if a random substitution
of letters had been used.
More history appears on pages 54 and 55. It compares Egyptian hieroglyphics
to the Caesar Cipher, calling them another example of a substitution
cipher. In case you didn't know, most hieroglyphic symbols stood
for sounds in the Egyptian language,
what we might call phonemes.
Someone who could read the symbols as sounds could speak the message on
the wall, column, or stone where it was carved. More modern substitution
ciphers (like Julius Caesar's) substitute a particular character
for a letter in the alphabet of
the message, which takes less space on the page.
Page
55 shows part of a more elaborate substitution cipher, based on a Vigenère
square for English. The square shows a Caesar cipher variant for
each letter of the alphabet, as you can see in the image on the right.
The row for the letter G, for example, starts with G and ends with F.
If that was all there was to it, this would simply be a repeating series
of 26 ciphers. There is more to it.
The person encrypting the message needs to know a key word. For
instance, the key could be Caesar. That would mean we would
use the six letters in that word to control which row in the table is
used to encrypt or decrypt the next letter in the message. "Caesar"
would mean we would use row C, row A, row E, row S, row A again, and row
R on a rotating basis as we encrypt the whole message.
The method is more difficult to break if the key is not an actual word.
A long series of randomly chosen letters could be the key, making it hard
for a human to use, and also hard to communicate a new key whenever it
is needed. Computers make this one more workable.
Regarding the cipher in the first line of the table, note that there
is no offset. A stands for A and so on. This row appears in the cipher
to make the table symmetrical. It would never be used in an actual transmitted
message unless you were rotating for each character. As a single row cipher,
it would accomplish nothing, since the result would be identical to the
original message. A
mechanical device that incorporated changing ciphers was the Enigma
machine. There were many models, including commercial versions
dating back to 1923. Modifications were made for the versions used by
the German military in World War II. Major sections of the device are
indicated in the image on the right. Follow the link above for a very
good article that describes the operation and functions of various models.
One could argue that the Enigma mechanism was invented for security
alone, but it seems likely that it was also created to eliminate human
errors in what was a process done by hand before its invention.
In fact, such a device allows a rotating cipher system to be more elaborate,
faster to use, and more accurate.
The text mentions using a concealment cipher, which is a better
name for steganography.
(concealed writing?) It can be done several ways. The text only
mentions using the first letter of each word in a message, which is not
very concealed. Other methods of doing this include hiding a message in
unused parts of a file, hiding it in metadata, hiding it where the uninformed
would not look, and hiding it in images. An image typically has
three bytes (RGB) of color information
for each pixel in it. It is unlikely
that anyone just looking at an image could tell the difference between
pixels that are true to color and those that have had each of their least
significant color bits changed as needed to hide/provide data.
If you change one bit per color, you can hide one byte in every three
pixels. Where the color is the same, it means 1. Where the color is one
bit different in a byte, it means 0. Let me show you what I mean.
Imagine that the table below represents a series of pixels. I have used
cells in a table to make the idea more readable. I have put a reference
color in the first cell: hex code 58C314 stands for 111, because
I chose that color as the key. I have modified the color in each
of the other cells in the second row to indicate three bits. Remember,
those cells represent consecutive pixels in an image that are supposed
to be the same color as the first one. The bits are indicated by the pixel's
color deviation from the key color. By the way, the colors displayed
are as accurate as your browser can make them. Real image files would
be exchanged by real information passers.
58c314
(reference color)
57c313
57 is 1 less than 58, so 0.
c3 is a match, so 1.
13 is 1 less than 14, so 0.
58c213
58 is a match, so 1.
c2 is 1 less than c3, so 0.
13 is 1 less than 14, so 0.
58c313
58c314
57c313
57c214
111
010
100
110
111
010
001
58c213
58c214
57c314
58c214
58c213
58c313
57c313
58c314
100
101
011
101
100
110
010
111
The binary code for that sequence, which would have taken 15 pixels,
is:
first three ones are reference
010 100 11
0 111 010 0
01 100 101
011 101 10
0 110 010 1
last two ones are padding
This
example used one base color and seven variations. The sender could send
an image in which every pixel was modified if the receiver already
had a reference copy to the image for comparison. This would avoid the
need for a sequence of pixels that that were meant to be the same color.
I have done this operation by hand: an application that encrypts a message
in an image or audio file would be much faster. This is
not a foolproof system, but it has an advantage. Most people would not
know to look for it, and would not notice the difference between the eight
possible colors being used in this process.
In the image on the right, I have made eight rows, each of which is filled
with a single color. The colors are as described above, and they are true.
They do not suffer from your web browser approximating the intended color.
Trust me, there are eight colors in it. Or, don't trust me. Download
the image and examine it in an art program. Which is the base color, the
stripe at the top of the image, or the one at the bottom?
Or have I arranged them differently?
Let's move on to page 57, and the discussion of symmetric keys.
The methods in this group use the same key to encrypt
and to decrypt, which is why they are called symmetric.
They are also called private key algorithms because the
key must remain private to the users of the system or there is no security.
(This seems like an obvious point, but we will consider another system
where it is not true.) In an FYI box on page 58, the text mentions that
we can consider the length of a key as part of the strength of a cipher,
but we have to consider the complexity of the algorithm that will use
the key as well. A very long key is not a guarantee of security if the
algorithm that uses it is relatively easy to crack.
The text remarks that Data Encryption Standard (DES) was
well named when it came out in 1976. It was a standard symmetric encryption
algorithm for a long time in the lifespan of such things. It was cracked
in a demonstration in 1998. NIST asked for a replacement, and
the chosen algorithm was Advanced
Encryption Standard (AES). Just considering the calendar,
it may be time for a new standard soon. Several are listed on pages 59
and 60.
The text continues with a discussion of asymmetric key systems,
which typically mean that one key is used for encrypting a message, but
another must be used to decrypt that message. The most used system is
public key cryptography, in which a pair of keys is created
for each entity. One is called the private key, and is kept private
and secure. The other is called the public key, which is provided
to anyone wishing to send encrypted data to the first entity. The principle
is that messages encrypted with my public key can only be decrypted with
my private key, which works as long as no one else has a copy of my private
key. The video below summarizes the main points in the discussions of
symmetric and asymmetric systems.
The text provides more detail about how the world uses the public key
method: Public Key Infrastructure. Public Key Infrastructure is
not the only cipher system used in business or government, but it is widely
used by both, and by individuals to protect personal or sensitive information.
There is a difference between PKI and public key cryptography.
Public key cryptography is a system in which each entity has
two cryptographic keys, each of which is the only means
to decrypt what was encrypted by the other.
Public Key Infrastructure is a system of using public
key cryptography, distributing keys through trusted sources,
and revoking keys that have been compromised.
Public
key cryptography
is
howSSL
encryptionon a web site
works. I connect to a vendor's web site. I obtain the vendor's public
key by making the secure connection. My browser encrypts my credit card
data with the vendor's public key and sends the ciphertext to the vendor.
If the vendor's private key is secure, the vendor is the only one who
can decrypt the data sent through the public key.
That's the way it is supposed to work in a perfect world. However, attackers
have created a need for a security net around the process. In a way, PKI
is the success story of businesses that have grown up around this technology.
Components of public key infrastructure:
Certificate
authority - An entity, typically a company, that creates digital
certificates, which are verified statements of a public key and its
owner. They may also create the key pair for the customer, and are responsible
for storing and providing certificates as needed.
Registration
authority - An entity that receives requests for certificates,
verifies the requests are from recognized users (such as merchants processing
credit cards), and forwards the requests to certificate authorities.
Certificate server - A service, or the device that runs the
service, that responds to certificate requests.
Certificate repository - A database for storing digital certificates,
sometimes including records of revoked certificates.
Certificate revocation list - A list of certificates that are
no longer valid for various reasons.
Certificate
validation - A process used to make sure that a request submitted
for certificate creation actually came from the organization it appears
to come from, and that the key submitted in the request is theirs.
Key Recovery Service - A service that stores and recovers encryption
keys in case they should be lost, for example in a system crash or attack.
Time server - A service that provides a standard time reference,
used to mark the time of requests and responses. Timestamps may be used
to judge whether requests are being processed by the entity we expect
to process it.
Signing
server - In a system that is increasingly automated, this is
a central control over related services.
Basic Encryption and PKI
The text lists several encryption systems have been used. You should
be aware of some examples of symmetric and asymmetric systems.
The text lists three symmetric algorithms to be aware of:
DES - Data Encryption Standard
3DES - Triple Data Encryption Standard
AES - Advanced Encryption Standard
It also lists some asymmetric algorithms:
RSA - named for its creators, so there is no acronym meaning
Diffie-Hellman - also named for its creators; does not seem
to belong in this group, since it is only used to allow two users to
share a key, enabling them to use symmetric cryptography
Elliptic
Curve Cryptography - the link takes you to an Ars
Technica article that reviews all three methods, and may hurt to
read; just know that it exists, and that it predicted the end of RSA
within 5 years. The article was written in 2013. Well, that has not
happened yet, but it is still a good article.
Algorithms use a set of values or characters to create keys and
to encrypt messages with those keys. The set of values is
the keyspace. Larger keyspaces mean more possible keys from the
algorithm. This is what makes it harder to guess the actual contents of
a key. Think about that. We rely on secrecy about the algorithm and on
the complexity of the keyspace to make security of this type possible.
And unless we do something special with the algorithm, most are known,
so we only have to know the key and right algorithm to be able to decrypt
a message sent in symmetric key system. Are you worried now?
Digital Certificates
A very useful concept is a digital certificate, which contains
data about a keyholder, and is provided by a certificate authority. The
most critical feature is the public key, but the other factors are required
by the X.509 standard.
The link to Wikipedia tells us that X.509 is an international standard
for PKI. Some of the elements included in that standard are:
Version Number
Serial Number
Signature Algorithm ID
Issuer Name
Validity period
Not Before
Not After
Subject name
Subject Public Key Info
Public Key Algorithm
Subject Public Key
Issuer Unique Identifier (optional)
Subject Unique Identifier (optional)
Extensions (optional)
You should know that keys are destroyed when they are compromised
and when they reach the end of their intended life. This is more
about private keys than public keys. Note that lifetime should
be related to the sensitivity of the use the key serves. More sensitive
equals shorter life.
This concept is easily confused with certificate revocation. When certificates
are compromised (stolen and used by other entities) those certificates
are moved to a Certificate Revocation List (CRL) which may or may
not be part of the certificate repository.
What PKI is and is not
PKI can provide security, integrity, and nonrepudiation.
It is used for financial transactions and downloaded file integrity. PKI
is meant to be one layer of security.
It does not include authorization functions. It does not
prove the identity of someone who is only using the public
key in a key pair. If encrypted data is desired in both directions in
a session, both entities should have a key pair, and each will need the
other's public key.
Also, PKI does not prove that someone who has a key pair is trustworthy.
Unless you already trust a vendor, the fact that they use a public key
pair does not by itself mean you should trust them. Consider the difference
between a vendor who we already trust and respect, and a vendor we know
nothing about. The transaction from each may be trustworthy, but the vendor
in each case either deserves our trust or does not. PKI provides an outside
authority to vouch for the vendors they represent.
Page 73 presents a list of common computer protocols that use encryption.
Several have been discussed before:
SSH
SSL
TLS
IPSec
Password Authentication Protocol (PAP) should use encryption,
but it does not. Not recommended. Use CHAP instead.
PPTP, L2TP, and SSTP, three tunneling protocols for VPNs
The discussion of attacks at the end of the chapter is not useful, so
we will skip that.
Assignments
This week you need to complete Lab 2. It is due next week, which is
week 4.
Assignment 2 and Part 2 of an ongoing course project are due in week
5.