Cryptography on Windows Part 3 - Message digests and hashes
In my prior post, I introduced several abstractions — Cryptographic Service Providers, cryptographic contexts and key containers — that are part of Windows CryptoAPI and promised to look at cryptographic keys next.
Well, I changed my mind, figuring it might be better to first talk about simpler operations that do not require the use of keys at all. This post thus describes the generation of hashes, message digests and message integrity codes using Windows CryptoAPI and TWAPI.
## Common uses of message digests
Message digests (or hashes, we use the terms interchangeably) are some of the most commonly encountered operations in computer security. Here we outline some of their applications.
The most common use of message digests is to detect inadvertent or malicious modification of data. For example, file download sites will often publish a message digest corresponding to a download file on their web page. Anyone downloading the file can compute the digest over the content of the downloaded file and compare it against the one published on the web site. If the two are not the same, the downloaded file differs from the one posted on the site. This function is similar to the use of CRC checks on data. Assuming the web site itself is secure so the posted message digest cannot be tampered with, this also offers some level of additional protection against intentional tampering during the download process. A similar use case is intrusion monitoring, where the message digests of file contents are compared against a securely stored list of digests to detect unauthorized modification of files.
Another common use of message digests is for comparing two values without having access to the values themselves, an example being the use of digests to store passwords. The password entered by the user is hashed and compared to the message digest stored in the password database. Since the latter does not contain the actual password itself, having access to the file does not help the attacker to guess passwords, assuming of course that the hash function has no weaknesses revealing anything about the original hashed content.
Message digests are also used as auxiliary functions in larger cryptographic contexts. For example, because digitally signing a message is a relatively slow operation, it is preferable to generate a message digest of the content (a much faster operation) and sign that digest, which is generally much shorter than the content.
An interesting application of message digests in combination with digital signatures, highlighting their versatility, is a closed electronic bidding system where contractors submit bids for a contract. The danger here is that if the proposal from one contractor (such as cost) were leaked to another, the second contractor could undercut the first knowing the details of his bid. To negate this possibility, instead of submitting the actual bid, each contractor submits a signed message digest of the bid. After the close of the bidding period, all contractors submit the actual bids themselves. The contracting party can compute the message digests of these bids and compare them against the original submissions to verify they have not been changed. A little thought will make clear this makes the bidding process both fair and safe from any leaks during the bidding period.
Having looked at some use cases, it is time to look at how digests are generated using CryptoAPI. Cryptographic functions operate mostly on binary data so for convenience let us first define a proc to dump binary data as hexadecimal strings.
```
proc hex bin {
    binary encode hex $bin
}
```
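As a quick check of what the helper produces, here is a small sketch that needs nothing beyond Tcl 8.6 (for `binary encode` and `binary decode`). The `unhex` proc is a hypothetical inverse helper added only for illustration; it is not used elsewhere in this post.

```tcl
# Same helper as above, repeated so this snippet runs on its own.
proc hex {bin} {
    binary encode hex $bin
}

# Hypothetical inverse helper: turn a hex string back into binary data,
# e.g. before comparing against a digest published as hex.
proc unhex {hexstr} {
    binary decode hex $hexstr
}

puts [hex "abc"]                             ;# 616263
puts [hex [binary format c3 {0 127 255}]]    ;# 007fff
puts [expr {[unhex [hex "abc"]] eq "abc"}]   ;# 1
```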
## Generating a message digest using wrappers
For simple use cases, TWAPI defines wrapper commands that generate message digests for common hashing algorithms. For example, we can generate a message digest over the string `abcd` using the SHA256 algorithm as
```
% hex [sha256 abcd]
88d4266fd4e6338d13b845fcf289579d209c897823b9217da3e161936f031589
```
There are similar commands in TWAPI for other message digest algorithms like `md5`, `sha512` etc.
Note: Because of recently discovered weaknesses, you should not use SHA1 and MD5 message digests except in cases where backward compatibility with existing applications is required.
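To make the download-verification use case described earlier concrete, the following is a minimal sketch built on the `sha256` wrapper. The helper name `download_matches` and its arguments are hypothetical, and the sketch assumes TWAPI is loaded with `package require twapi` and that its commands are reachable in the `twapi` namespace. The file is read in binary mode, so the data handed to `sha256` is already a sequence of bytes.

```tcl
package require twapi

# Hypothetical helper: returns 1 if the SHA256 digest of the file at
# $path matches the hex digest published by the download site, else 0.
proc download_matches {path published_hex} {
    # Read the whole file as raw bytes (no newline or encoding translation).
    set fd [open $path rb]
    try {
        set data [read $fd]
    } finally {
        close $fd
    }
    set computed [binary encode hex [twapi::sha256 $data]]
    # Published digests vary in letter case and may carry stray whitespace.
    return [string equal -nocase $computed [string trim $published_hex]]
}
```

Reading the whole file at once keeps the sketch short but is only reasonable for files that comfortably fit in memory.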
There is however a problem with generalizing the `sha256 abcd` example as written. Conceptually, digests are computed on a sequence of bytes, not on character strings. If the message is pure ASCII, as was the case in that example, the distinction does not matter. In all other cases, the character string must be converted to a binary string, either explicitly with the `encoding` command or by passing the encoding to the hashing command.
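The distinction between characters and bytes can be seen without any hashing at all. The sketch below uses only core Tcl commands (`binary encode hex` needs Tcl 8.6); the choice of `iso8859-1` as a second encoding is just for contrast.

```tcl
# One character string, two different byte sequences depending on
# which character encoding is used for the conversion.
set message "abcd\u00e9"

puts [string length $message]                                       ;# 5
puts [binary encode hex [encoding convertto utf-8 $message]]        ;# 61626364c3a9
puts [binary encode hex [encoding convertto iso8859-1 $message]]    ;# 61626364e9
```

Since a digest is computed over the bytes, sender and receiver must agree on the encoding or their digests will not match.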
As an illustration, first the incorrect way to generate the message digest for `abcdé`:
```
% set message "abcd\u00e9"
abcdé
% hex [sha256 $message]
0f402080cac48eb7c50a84c4bbac3f0fd40749bb1964afa1796c5c99223332ce
```

The **correct** method would be either of the following:

```
% hex [sha256 $message utf-8]
bef7c2493f307962cfaccb84748251ced81bb61b87924b97304ee7d911eedf3c
% hex [sha256 [encoding convertto utf-8 $message]]
bef7c2493f307962cfaccb84748251ced81bb61b87924b97304ee7d911eedf3c
```

## Generating a message digest using a hash context

The wrapper commands, like the `sha256` command illustrated above, are convenient to use as they encapsulate the multiple primitive commands required to generate a message digest. However, there are a couple of reasons why you may prefer to use the underlying primitives described in this section instead.

* First, the wrapper commands require you to present the data to be hashed as a single chunk passed to the command. This may not be suitable in a streaming environment where it is infeasible or inefficient in memory usage to collect the entire message as one chunk. An example would be generating the message digest for a very large file.
* Second, when generating digests for multiple messages, using the primitive commands may be a little more efficient. Recall from our previous post that practically all CryptoAPI operations require a cryptographic context. The wrapper commands above allocate and free a cryptographic context on every call. This cost can be saved by allocating and reusing a single cryptographic context for all the messages.

At a high level, the sequence of steps for generating message digests using the primitive commands is as follows:

1. Allocate a cryptographic context via `crypt_acquire` using an appropriate CSP that supports the desired hashing algorithm.
1. Allocate a _hash context_ with `capi_hash_create` to hold state for the computation of a message digest for a single message.
1. Incrementally feed the message content into the _hash context_ in one or more calls to `capi_hash_bytes` or `capi_hash_string`.
1. Upon reaching the end of the message content, retrieve the message digest with `capi_hash_value`.
1. Free the hash context with `capi_hash_free`.
1. Repeat steps 2-5 for each message.
1. Free the cryptographic context with `crypt_free`.

The equivalent of our example from the previous section would involve the following sequence of commands.

First, a cryptographic context is required so we allocate one. For SHA256, the default CSP cannot be used as it does not implement that algorithm. Any CSP of type `prov_rsa_aes` is required to support SHA256 so we just specify that.

```
% set hcrypt [crypt_acquire -csptype prov_rsa_aes]
1486609810096 HCRYPTPROV
```

Then we allocate a hash context within the cryptographic context. Here we need to specify the specific algorithm to compute the message digest. We will say a little more about algorithm identifiers in a future post; for now, it suffices to specify `sha_256` to indicate that the SHA256 algorithm is desired.

```
% set hhash [capi_hash_create $hcrypt sha_256]
1486609827408 HCRYPTHASH
```

We now hash the message content. We will break up the message from our previous section into two chunks to illustrate incremental generation of the message digest. Either `capi_hash_bytes` or `capi_hash_string` can be used to pass data to the hash context. The former expects a binary string so we explicitly encode the data with the `encoding` command. The latter accepts a character string and internally converts it to a binary string using the specified character encoding.

```
% capi_hash_bytes $hhash [encoding convertto utf-8 abc]
% capi_hash_string $hhash d\u00e9 utf-8
```

Note that `capi_hash_string` assumes `utf-8` if the encoding argument is not specified so we could have left it out above.

Now that we have reached the end of the message content, we can retrieve and print the message digest.
```
% hex [capi_hash_value $hhash]
bef7c2493f307962cfaccb84748251ced81bb61b87924b97304ee7d911eedf3c
```

The hash context cannot be reused and must be freed.

```
% capi_hash_free $hhash
```

Any further messages would need new hash contexts to be allocated but not cryptographic contexts. As always, once done with all messages the cryptographic context also needs to be freed.

```
% crypt_free $hcrypt
```

## Message authentication codes using HMAC

Message digests do not protect against wilful tampering by a malicious attacker. Unless some additional protections are in place, the attacker can modify the original message and correspondingly change the associated message digest so they appear to match. Protecting against this requires some additional mechanism; for example, the use of message digests for ensuring integrity of downloaded files uses an out-of-band mechanism of listing message digests on a (presumably) secure page inaccessible to the attacker.

A message authentication code (MAC) provides an alternative in-band solution to this problem through the use of a shared secret known only to the message sender and receiver. Instead of computing the message digest solely based on the message data, it is computed over some (carefully chosen) combination of the secret key and the message data. To verify the message integrity, the receiver again computes the hash over the combination of the secret key and message data and checks it against the received hash. In our above example, an attacker's attempt to replace the file and its hash without detection would fail because the message digest cannot be faked without access to the secret key as it is included in computation of the digest.

There are several algorithms for generating a MAC. Since our topic is hashes and message digests, here we look at a specific one based on hashing called _Hash-based Message Authentication Code_, or _HMAC_.
This is actually a family of algorithms since the hashing algorithm used is a parameter to HMAC. Thus, HMAC-SHA1 would generate a MAC based on the SHA1 hashing algorithm, HMAC-SHA256 would use SHA256 etc.

In TWAPI, the `hmac` command generates a MAC based on this algorithm. It takes as parameters the message, the shared secret key and the hashing algorithm to be used, and returns the computed HMAC. Again, this is a binary string so we dump it using our `hex` procedure.

```
% hex [hmac "my valuable data" [conceal oursecret] sha_256]
d4a6a8ba7012e47ea1eae59199f460068e2fb2ad9066e39f880ea7f143128f8e
```

The generated HMAC is passed to the receiver along with the message itself. The receiver then uses the same exact code to calculate the HMAC over the **received** message. If the computed HMAC matches the received one, the receiver is assured there has been no tampering of the message. An attacker altering the message would not be able to generate a valid HMAC as the shared secret is unknown to him.

A couple of comments are in order regarding the above snippet.

Shared secrets composed of a sequence of ASCII characters as in the example above are considered to be low quality in terms of cryptographic security. In a future post, we will see how to generate stronger shared secrets based on an input ASCII string.

Secondly, you might be wondering about the purpose of the `conceal` command above. The short answer is that to avoid inadvertent exposure of keying information, many TWAPI commands, including `hmac`, expect keys to be passed in an encrypted form. The `conceal` command returns this encrypted form of its argument so that it is acceptable to `hmac`. Normally you would not need to use it as the key would already come in protected form, since commands for reading passwords and credentials, like `read_credentials`, also return data in this encrypted form. Here we need to use it since we are just typing in the plaintext key on the command line.
Again, the `conceal` command and related topics about in-memory protection will be the subject of a future post.

Before closing this topic, a slightly subtle point should be noted: because of their use of shared secrets, HMACs cannot be used for the file distribution scenario we described above. All downloaders would need to know the shared secret, which of course means it is not a secret at all! However, digital signature schemes based on asymmetric keying systems, which we will describe (many, many) posts down the road, do work for this scenario.

## Coming up next

This post focused on one class of cryptographic mechanisms, message digests and integrity codes, which protect against tampering of data but do not offer confidentiality. We will begin looking at the latter in our next post, starting with symmetric algorithms that provide such protection.
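As a postscript to the hash context section, here is a minimal sketch of the large-file case mentioned there, following the same allocate, feed, retrieve, free sequence. The proc name `file_sha256` and the 64 KB default chunk size are hypothetical choices, and the sketch assumes TWAPI is loaded with `package require twapi` and that the commands shown earlier are reachable in the `twapi` namespace. It uses Tcl 8.6's `try ... finally` so both contexts are freed even if reading fails.

```tcl
package require twapi

# Hypothetical helper: SHA256 digest (binary) of a file, hashed in
# fixed-size chunks so the whole file is never held in memory.
proc file_sha256 {path {chunksize 65536}} {
    set hcrypt [twapi::crypt_acquire -csptype prov_rsa_aes]
    try {
        set hhash [twapi::capi_hash_create $hcrypt sha_256]
        try {
            set fd [open $path rb]
            try {
                # Feed each chunk of raw bytes into the hash context.
                while {[string length [set chunk [read $fd $chunksize]]] > 0} {
                    twapi::capi_hash_bytes $hhash $chunk
                }
            } finally {
                close $fd
            }
            return [twapi::capi_hash_value $hhash]
        } finally {
            twapi::capi_hash_free $hhash
        }
    } finally {
        twapi::crypt_free $hcrypt
    }
}
```

A caller would typically wrap the result for display, for example `binary encode hex [file_sha256 $path]`. When hashing many files, acquiring the cryptographic context once outside the proc and passing it in would avoid the repeated allocation discussed above.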