Cryptography with Prolog


Prolog has several features that make it extremely well-suited for cryptographic applications. For example, built-in integer arithmetic that works for arbitrarily large numbers makes it easy to reason about large prime numbers and various operations on them that frequently arise in the context of cryptography. As another example, Prolog's built-in search mechanism lets you easily experiment with brute-force attacks such as exhaustive search for keys, which has great didactic value. The Prolog toplevel lets us interactively try various predicates and their parameters. And so on.

Support for cryptographic algorithms varies between Prolog systems. In the following, we are using SWI-Prolog due to its extensive OpenSSL bindings and library(crypto). In addition, SWI-Prolog is especially useful for building web applications, and in such applications, sound cryptographic principles are often particularly important.

In this text, we consider three aspects of cryptography that are extremely relevant in practice: We shall focus on the practical application of these methods, with sample code that you can try out and use. The methods I use in this document are all deemed at least reasonably secure in January 2019. Beware though: Make sure to follow best current practices when you use these methods in your own applications. This includes studying additional reference material and at least an introductory textbook on this subject. For example, consider reading Introduction to Modern Cryptography by Jonathan Katz and Yehuda Lindell, and the references included therein.

To try the Prolog examples of this document, you need SWI-Prolog 7.5.15 or greater, and OpenSSL 1.1.0 or greater. You can find out which OpenSSL version you are using with the following Prolog query:
?- use_module(library(ssl)), current_prolog_flag(ssl_library_version, V).
V = 'OpenSSL 1.1.0f  25 May 2017'.
The convention I use in this document is that information that ought to be kept secret is written in red.

Data representation: codes, bytes, octets, hex etc.

Before we start with the actual cryptographic functionality, we consider a few different ways to represent data in Prolog, and especially binary data.

In Prolog, all data is represented by Prolog terms. In SWI-Prolog, the cryptographic predicates let you encrypt, decrypt, sign and hash the following types of terms: As long as you use atoms or strings throughout in your applications, you need not worry much about how these terms are internally represented in Prolog. However, if you are using lists of integers in this context, you should consider what you want to denote. In Prolog, such lists often denote lists of codes, such as Unicode code points. For example:
?- atom_codes(κρυπτός, Cs).
Cs = [954, 961, 965, 960, 964, 972, 962].
However, cryptographic methods typically work on bytes, and code points that are greater than 255 do not fit into a single byte. Therefore, if all you have is such a list of integers, you must make sure to use the intended interpretation. By default, they will be interpreted as a list of codes, and are internally converted to a suitable encoding (UTF-8) such that they become amenable to processing with the cryptographic routines. If you want the integers to be interpreted as bytes, you can specify this via the option encoding(octet). Of course, this only works if all occurring integers are in the range 0..255.

There are conversion predicates that let you transform every such term into any other type. Of particular interest in this context is the DCG nonterminal utf8_codes//1:
?- atom_codes(κρυπτός, Cs),
   phrase(utf8_codes(Cs), UTF8).
Cs = [954, 961, 965, 960, 964, 972, 962],
UTF8 = [206, 186, 207, 129, 207, 133, 207, 128, 207|...].
Thus, if you need total control over the representation, you can always convert your data to bytes, and use the encoding(octet) option of several cryptographic predicates.

In cryptographic applications, short lists of bytes also often need to be communicated to other programs and participants, and in these use cases it is common to represent such lists in hexadecimal notation. This means that each byte is represented by exactly two hexadecimal digits (between 0 and F) that are then stringed together. You can use hex_bytes/2 to easily convert between these encodings:
?- hex_bytes('501ACE', Bs).
Bs = [80, 26, 206].

?- hex_bytes(Hex, [80,26,206]).
Hex = '501ace'.
To create a list of cryptographically strong pseudo-random bytes, you can use crypto_n_random_bytes/2. For example, here is a 256-bit (i.e., 32 bytes) sequence that in all likelihood nobody else has ever generated before me:
?- crypto_n_random_bytes(32, Bs),
   hex_bytes(Key, Bs).
Bs = [169, 150, 152, 246, 5, 80, 194, 181, 117|...],
Key = a99698f60550c2b575cb6ffc01c065b195b32a468ff046621f4db91308c1bfd3.
Thus, we can easily generate strong keys and unique tokens when needed.

To more compactly store and embed binary data in your applications, also consider Base64 encoding. In SWI-Prolog, you can use base64_encoded/3 for this conversion. For example:
?- base64_encoded(κρυπτός, Base64, []).
Base64 = "vcTFw8fMwg==".

Cryptographic hash functions

A cryptographic hash function lets us efficiently map data of arbitrary size to a bit string of fixed size in such a way that the mapping is infeasible to invert and collisions are very unlikely.

Hash functions are needed in almost all applications of modern cryptography. In library(crypto), among the most important predicates for computing cryptographic hashes are crypto_data_hash/3 and crypto_file_hash/3.

Here is an example (click on the hash to expand it):
?- crypto_data_hash('Hello world!', Hash, [algorithm(blake2s256)]).
Hash = c63813a8f804abece06213a46acd04a2d738c8e7a58fbf94bfe066a9c7f89197.
We can use hex_bytes/2 to convert such hex-encoded values to lists of bytes, by which we mean lists of integers between 0 and 255. For example:
?- hex_bytes(c63813a8f804abece06213a46acd04a2d738c8e7a58fbf94bfe066a9c7f89197, Bytes).
Bytes = [198, 56, 19, 168, 248, 4, 171, 236, 224|...].
Security of default algorithms is an important design principle of library(crypto). For this reason, the only guarantee that the hash predicates give is that the default algorithm is cryptographically secure. The default may change in the future. To find out which algorithm was actually used, you can specify the algorithm/1 option, and use a logical variable as argument. For example:
?- crypto_data_hash('Hello world!', Hash, [algorithm(A)]).
Hash = c0535e4be2b79ffd93291305436bf889314e4a3faec05ecffcbb7df31ad9e51a,
A = sha256.
This shows that sha256 is currently the default algorithm. You can use this approach to ensure interoperability and at the same time benefit from more secure defaults that may be used in the future.

A hash can be used to assess the integrity of data: By computing the hash and comparing it against a reference value, you can detect corruption and manipulation of your data.

This raises the question: How can we be certain that such a reference value is actually authentic, i.e., truly stemming from the purported originator? One way to ensure the authenticity of data is to use digital signatures.

Storing passwords safely

Hashes can also be used to securely store user passwords for the purpose of authentication. In fact, to store a password securely, the main idea is to avoid storing the password altogether. Instead, we store only the hash of the password. Later, when the user enters a password, we compute its hash, and compare that hash against the stored value.

Two additional features make this process vastly more secure:
  1. First, we make computing the hash as slow as we can. This counteracts brute-force attacks where an attacker tries many different passwords to find one whose hash matches the stored hash. One way to do this is to compute the N-fold application of the hash function. For example, N = 217 makes brute-force attacks more than 100 000 times slower than applying the hash only once.
  2. Second, we generate a so-called salt, which is a list of random bytes that we combine with the password before it is hashed. We thus ensure (with extremely high probability) that even identical passwords yield different hashes. We store the salt together with the computed hash, so that we can use it for later reference.
All of this happens completely automatically with crypto_password_hash/2.

For example:
?- crypto_password_hash(test, Hash).
Hash = '$pbkdf2-sha512$t=131072$Xj6ZIfB4U+QOeZr3ymE/AA$2KYXsPFI2zJVMb9PHVtN+pVwQ6f7LleXF8ehbyqgOmkINcIYjO8IFhz8LelwMjzidEtojRHmC0B5RQJDEB2/tw'.

?- crypto_password_hash(test, Hash).
Hash = '$pbkdf2-sha512$t=131072$+aXCnE1r3gAFjpQ3qHcsVw$JAiD2sLbBZPQD1/FtBJUE+iWXRF0VvC/p8etsP6JGSo2dz5U+lV3a6tFDo84mluW1BufFoGZkAuaMW+K74DIaQ'.
In this case, even though the password was the same in both queries, the hashes are different.

The resulting hash encapsulates everything that is necessary to later verify a password and thus authenticate the user:
?- crypto_password_hash(test, '$pbkdf2-sha512$t=131072$+aXCnE1r3gAFjpQ3qHcsVw$JAiD2sLbBZPQD1/FtBJUE+iWXRF0VvC/p8etsP6JGSo2dz5U+lV3a6tFDo84mluW1BufFoGZkAuaMW+K74DIaQ').

?- crypto_password_hash(password, '$pbkdf2-sha512$t=131072$+aXCnE1r3gAFjpQ3qHcsVw$JAiD2sLbBZPQD1/FtBJUE+iWXRF0VvC/p8etsP6JGSo2dz5U+lV3a6tFDo84mluW1BufFoGZkAuaMW+K74DIaQ').
Experience shows that most users choose very predictable passwords, and also reuse the same passwords for different applications. Using crypto_password_hash/2 in your applications makes it hard for attackers to find out which passwords were used, even if they manage to obtain all hashes you store.

If necessary, you can use crypto_password_hash/3 to specify the applied algorithm, a custom salt, and the number of iterations.

Digital signatures

To establish the authenticity of data, there are different signature algorithms with different strengths and weaknesses. Of the widely used public key algorithms, library(crypto) currently supports ECDSA and RSA.

Private and public keys can for example be loaded from PEM and DER encoding via load_private_key/3 and load_public_key/2, respectively.

As a concrete example, let us consider the signatures in the certificate chain of the domain

Using SWI-Prolog, it is easy to obtain the certificate chain of this domain, using http_open/3 with the HTTPS scheme, followed by ssl_peer_certificate_chain/2 on the resulting SSL stream:
?- http_open('', Stream, []),
   ssl_peer_certificate_chain(Stream, Cs).
This yields in Cs a list of certificates, where each certificate is represented as a list of Name(Value) terms, one for each accessible attribute of the certificate.

Note that the SSL bindings of SWI-Prolog automatically verify the signature of each involved certificate. In the following, we want to do this ourselves to see how it is done internally. We say that we reify the verification, because we make explicit what is otherwise implicit.

Let us inspect the first certificate in this list, using portray_clause/1 to print each attribute:
?- http_open('', Stream, []),
   ssl_peer_certificate_chain(Stream, [First|_]),
   maplist(portray_clause, First).
This outputs the following Prolog terms:
issuer_name(['C'='US', 'O'='thawte, Inc.', 'OU'='Domain Validated SSL', 'CN'='thawte DV SSL SHA256 CA']).
key(public_key(rsa("DC7E4EE613582D9D6293BC792B8569C15F939AC2CCC0817D48779948AA54A001B7ECF5266DD2B05A6C266595BF0F55EBE11F64081B2F58B9F65F4B3F11BCE1FF00E8E3FFEDE7B5165CEA0694925F67E757A740BD9DED6028D1080420C0DFE48E672C0759D36FCF5141647851F7FCF9081EEDA1BEB4834F21D973AF03D30DD58EDC3C1D6FA04C812B608C97F677B45A011F8DDA92E073C83DB86A18CBF7C048AE03F8145A94333DAEA1E6B87B963EED0C78C7DE5068F8A9B80D1A4DDA62DB80BBBD81041BA1CEE491D5C630E73F1F0F0CDDCD4E198B15092C3904DD71BB348F252392EBC66858767B4F9E4EE64C792C4D68D8BE48233FDDEECD78AEDEC457C345", "010001", -, -, -, -, -, -))).
To verify the signature of this certificate, we need the following information: The output from the above query includes the to-be-signed portion, and also the algorithm that was used for signing: It is RSA-SHA256. Thus, we first use crypto_data_hash/3 to compute the SHA256 hash of the to-be-signed portion. We first convert the hex value to bytes, and then use the encoding(octet) option to compute the hash of the octet sequence, since this is how signatures of SSL certificates work:
?- hex_bytes('3082047DA00302010202102FD9ADA774690FCFD2D69B0AF2096AE2300D06092A864886F70D01010B05003065310B300906035504061302555331153013060355040A130C7468617774652C20496E632E311D301B060355040B1314446F6D61696E2056616C6964617465642053534C3120301E060355040313177468617774652044562053534C20534841323536204341301E170D3137303431333030303030305A170D3138303631323233353935395A301B3119301706035504030C107777772E6D6574616C6576656C2E617430820122300D06092A864886F70D01010105000382010F003082010A0282010100DC7E4EE613582D9D6293BC792B8569C15F939AC2CCC0817D48779948AA54A001B7ECF5266DD2B05A6C266595BF0F55EBE11F64081B2F58B9F65F4B3F11BCE1FF00E8E3FFEDE7B5165CEA0694925F67E757A740BD9DED6028D1080420C0DFE48E672C0759D36FCF5141647851F7FCF9081EEDA1BEB4834F21D973AF03D30DD58EDC3C1D6FA04C812B608C97F677B45A011F8DDA92E073C83DB86A18CBF7C048AE03F8145A94333DAEA1E6B87B963EED0C78C7DE5068F8A9B80D1A4DDA62DB80BBBD81041BA1CEE491D5C630E73F1F0F0CDDCD4E198B15092C3904DD71BB348F252392EBC66858767B4F9E4EE64C792C4D68D8BE48233FDDEECD78AEDEC457C3450203010001A38202893082028530290603551D110422302082107777772E6D6574616C6576656C2E6174820C6D6574616C6576656C2E617430090603551D1304023000302B0603551D1F042430223020A01EA01C861A687474703A2F2F746D2E73796D63622E636F6D2F746D2E63726C306E0603551D20046730653063060667810C0102013059302606082B06010505070201161A68747470733A2F2F7777772E7468617774652E636F6D2F637073302F06082B0601050507020230230C2168747470733A2F2F7777772E7468617774652E636F6D2F7265706F7369746F7279301F0603551D230418301680147D29312FC11E6EAE31056AB3EB1CCDA9DDAE809A300E0603551D0F0101FF0404030205A0301D0603551D250416301406082B0601050507030106082B06010505070302305706082B06010505070101044B3049301F06082B060105050730018613687474703A2F2F746D2E73796D63642E636F6D302606082B06010505073002861A687474703A2F2F746D2E73796D63622E636F6D2F746D2E63727430820105060A2B06010401D6790204020481F60481F300F1007600DDEB1D2B7A0D4FA6208B81AD8168707E2E8E9D01D55C888D3D11C4CDB6ECBECC0000015B679E9C9A0000040300473045022043D95C8201CF802751D104F1A39AD69C648A1D9F6D66C00FB7EE10B598762D46022100A73811D11680E14DDA9048C2BDB61BBD17A443F2EA67940C6594C3BE49FE733A007700A4B90990B418581487BB13A2CC67700A3C359804F91BDFB8E377CD0EC80DDC100000015B679E9DA40000040300483046022100E40047A5966BED0A726326D0E4CA5D1C1E4F8C0AFC67CBA074582C3AC779B0FF022100DED9A9A854867EEB37C464726AE3E6C57C15EB40019FF639C25413495C8D5CC6', Bs),
   crypto_data_hash(Bs, Hash, [encoding(octet)]).
Bs = [48, 130, 4, 125, 160, 3, 2, 1, 2|...],
Hash = b21649782577d4328540e77232ad494415b6fe67c4524b10e5b74146405e79ba.
We now fetch the public key of the entity that signed this certificate. In the list of certificates, each certificate ought to be signed by its successor, when considering the elements from left to right. Therefore, in order to verify the signature of the first certificate in the list, we must fetch the public key of the second certificate in the list:
?- http_open('', Stream, []),
   ssl_peer_certificate_chain(Stream, [_,Second|_]),
   member(key(Key), Second).
Stream = <stream>(0x1d7a370,0x1d75570),
Second = [...],
Key = public_key(rsa("B3AC0D7FADBB134D945F67426AD08971A9ED74049324C84D56A1F0919684D9846ACF5221E31AB1544CE6C69E9E4B38A996541DF5B3ED9204D06E54906E2FE97D98B48A2D12A3B442471D7F5F40E1FC7F91A601DC55A450782A633F847E2CC82B21B6C60E5EBCB8B1D41B98B3C6F8E1E828ED32441BCB7FF7E4B111EBC608B05BEEA8C2EC46AA8F29DFB9B7A403A0357A583F8B2947C1D222FA2CC6C76CCDD3F758329394D16FA92A9C0F0A2892AB140AB6DFED407A640754CEEA759732B996A075C977310274AF54774F99A2814B7959B8923FF907EA4274572E35EC558AFC613C3E5771923BABE4C1E1172C64360084B57C1A7DB041337C23F64E775A2CC14B", "010001", -, -, -, -, -, -)) .
To verify an RSA signature, we have at least two options:
  1. use integer arithmetic
  2. use rsa_verify/4 from library(crypto)
We consider both approaches in the following.

Verification via integer arithmetic

Let us consider option (1) first. To verify a signature Sig, we compute R = SigExp mod M, where:
  1. Exp is the exponent of the public key, i.e., 10001 (in hexadecimal notation, which is 65537 in decimal notation) in this case.
  2. M is the modulus of the public key, i.e., B3AC0D7FADBB134D945F67426AD08971A9ED74049324C84D56A1F0919684D9846ACF5221E31AB1544CE6C69E9E4B38A996541DF5B3ED9204D06E54906E2FE97D98B48A2D12A3B442471D7F5F40E1FC7F91A601DC55A450782A633F847E2CC82B21B6C60E5EBCB8B1D41B98B3C6F8E1E828ED32441BCB7FF7E4B111EBC608B05BEEA8C2EC46AA8F29DFB9B7A403A0357A583F8B2947C1D222FA2CC6C76CCDD3F758329394D16FA92A9C0F0A2892AB140AB6DFED407A640754CEEA759732B996A075C977310274AF54774F99A2814B7959B8923FF907EA4274572E35EC558AFC613C3E5771923BABE4C1E1172C64360084B57C1A7DB041337C23F64E775A2CC14B in this case.
In Prolog, we can use numbers in hexadecimal notation simply by prepending "0x". For example, the number 252 can be written in hexadecimal notation as 0xfc. Conversely, we can output any number in hexadecimal notation by using the the format string "~16r". For example:
?- format("~16r", [252]).
Therefore, we can compute R and output it in hexadecimal notation as follows, using (#=)/2:
?- R #= 0x35A5467AAFE158229AB363A88740881B853F3D1A3B6A6747998C9DC0F0CB3BD6062F266D19D88942D164EF6FF109699ABB9F8156340C8DA979AFE31B6D994DCB3B84EFBE8E5CC998D3D80B84D70917EEBF49CC7BF931A6930998C243B57F5E4C26FBF4EDC1F55F321D28116ED9ED4E41A510C0A4C663AB1B3DF324128C375476BC3A950E8E58A03BDE619566E943D04E40852253A38CC79A76A3AEFE85E7F45B06EC46D35747894BAB386BAD74761342C3FB5495DF1CE88BF61411B451A1E1FFCCCA057D3BC3CEDC69E29C0378F8AFDB010A8D89158C2778AC32D43A2BA7CA0911DCC50ABEBE9ED78779BC98BE3A3A03A73479E8D4A00B038E3633F2C38FA518^0x10001 mod 0xB3AC0D7FADBB134D945F67426AD08971A9ED74049324C84D56A1F0919684D9846ACF5221E31AB1544CE6C69E9E4B38A996541DF5B3ED9204D06E54906E2FE97D98B48A2D12A3B442471D7F5F40E1FC7F91A601DC55A450782A633F847E2CC82B21B6C60E5EBCB8B1D41B98B3C6F8E1E828ED32441BCB7FF7E4B111EBC608B05BEEA8C2EC46AA8F29DFB9B7A403A0357A583F8B2947C1D222FA2CC6C76CCDD3F758329394D16FA92A9C0F0A2892AB140AB6DFED407A640754CEEA759732B996A075C977310274AF54774F99A2814B7959B8923FF907EA4274572E35EC558AFC613C3E5771923BABE4C1E1172C64360084B57C1A7DB041337C23F64E775A2CC14B,
   format("~16r", [R]).
We see that the hash value which we computed is a suffix of the result of this arithmetic operation. This means that the signature is valid.

Verification with rsa_verify/4

Using rsa_verify/4, all of the above is readily encapsulated in a single predicate call that succeeds iff the signature is valid.

In particular, we need to specify: In total, we have:
?- Key = public_key(rsa("B3AC0D7FADBB134D945F67426AD08971A9ED74049324C84D56A1F0919684D9846ACF5221E31AB1544CE6C69E9E4B38A996541DF5B3ED9204D06E54906E2FE97D98B48A2D12A3B442471D7F5F40E1FC7F91A601DC55A450782A633F847E2CC82B21B6C60E5EBCB8B1D41B98B3C6F8E1E828ED32441BCB7FF7E4B111EBC608B05BEEA8C2EC46AA8F29DFB9B7A403A0357A583F8B2947C1D222FA2CC6C76CCDD3F758329394D16FA92A9C0F0A2892AB140AB6DFED407A640754CEEA759732B996A075C977310274AF54774F99A2814B7959B8923FF907EA4274572E35EC558AFC613C3E5771923BABE4C1E1172C64360084B57C1A7DB041337C23F64E775A2CC14B", "010001", -, -, -, -, -, -)),
   Hash = 'b21649782577d4328540e77232ad494415b6fe67c4524b10e5b74146405e79ba',
   Signature = "35A5467AAFE158229AB363A88740881B853F3D1A3B6A6747998C9DC0F0CB3BD6062F266D19D88942D164EF6FF109699ABB9F8156340C8DA979AFE31B6D994DCB3B84EFBE8E5CC998D3D80B84D70917EEBF49CC7BF931A6930998C243B57F5E4C26FBF4EDC1F55F321D28116ED9ED4E41A510C0A4C663AB1B3DF324128C375476BC3A950E8E58A03BDE619566E943D04E40852253A38CC79A76A3AEFE85E7F45B06EC46D35747894BAB386BAD74761342C3FB5495DF1CE88BF61411B451A1E1FFCCCA057D3BC3CEDC69E29C0378F8AFDB010A8D89158C2778AC32D43A2BA7CA0911DCC50ABEBE9ED78779BC98BE3A3A03A73479E8D4A00B038E3633F2C38FA518",
   rsa_verify(Key, Hash, Signature, [type(sha256)]).
Since this succeeds, the signature is valid.

Symmetric encryption

We now come to the topic you are probably most interested in: symmetric encryption. This means that the same key is used for encrypting and decrypting the data. For example, an important use case is encrypting files with a password, so that the same password can be used to decrypt them.

SWI-Prolog makes it extremely easy to encrypt arbitrary data in a secure way, using the predicate crypto_data_encrypt/6.

In addition to the data you want to encrypt, you must provide:
  1. the algorithm you want to use
  2. the key that is used for encryption
  3. the initialization vector (IV).
We now consider each of these parameters.

For a start, let us use a widely known and very secure encryption algorithm: AES, which is a subset of the Rijndael cipher developed by Vincent Rijmen and Joan Daemen. AES is the only publicly accessible cipher that the NSA has approved for top secret information. AES is a block cipher and can be used in several modes that guarantee different properties. As our first example, we shall use AES with a 128-bit key and 128-bit IV in Cipher Block Chaining (CBC) mode. We obtain this algorithm by specifying the atom 'aes-128-cbc'.

Ideally, key and IV are randomly chosen, with each key and IV being equally likely. Reusing the same combination of key and IV typically leaks at least some information about the plaintext. For example, identical plaintexts will then correspond to identical ciphertexts. For some algorithms, reusing an IV with the same key has disastrous results and can cause the loss of all properties that are otherwise guaranteed. Especially in such cases, an IV is also called a nonce. This is short for "number used once" and emphasizes that for a fixed key, an IV must be used at most once.

We can specify keys and IVs as lists of bytes. In this case, since the algorithm we want to try needs 128-bit keys and IVs, we specify 16 bytes. We can use crypto_n_random_bytes/2 to generate cryptographically strong pseudo-random bytes. With extremely high likelihood, each invocation will generate a completely different key and nonce.

With this knowledge, we can already try an encryption! For example:
?- crypto_n_random_bytes(16, Ks),
   crypto_n_random_bytes(16, IV),
   crypto_data_encrypt(test, 'aes-128-cbc', Ks, IV, CipherText, []).
The result depends on which random numbers are actually generated when you run it. For example, I get:
Ks = [31, 240, 156, 156, 161, 198, 9, 230, 109|...],
IV = [165, 227, 153, 21, 246, 53, 136, 179, 201|...],
CipherText = "uç'þß\026\nåÉ®#º(î\nâ".
The key must be kept absolutely secret to retain the confidentiality of the encrypted text. In contrast, the IV can be safely stored and transmitted in plain text. By default, padding is used so that input of any length can be processed even though this algorithm always operates on blocks that contain exactly 16 bytes. For this reason, the ciphertext is longer than the plaintext in this case.

Before we continue, a few test runs are highly appropriate. For example, let us see whether we can decrypt the ciphertext if we supply the same algorithm, key and IV to crypto_data_decrypt/6. In addition, let us try this not only once, but over and over, with a freshly generated key and IV in each run:
?- repeat,
      PlainText = "test",
      Algorithm = 'aes-128-cbc',
      crypto_n_random_bytes(16, Ks),
      crypto_n_random_bytes(16, IV),
      crypto_data_encrypt(PlainText, Algorithm, Ks, IV, CipherText, []),
      crypto_data_decrypt(CipherText, Algorithm, Ks, IV, PlainText, []),
As result, we get:
After a few hundred thousand iterations of this, we can be reasonably confident that what is encrypted can also be decrypted. When experimenting with different algorithms, it is a common error to specify keys or IVs that are shorter than what the chosen algorithm requires, and such test cases help us to detect these mistakes.

Authenticated encryption

As long as the key and IV are chosen (sufficiently) randomly, AES in CBC mode ensures confidentiality of the plain text. However, the cipher falls critically short in other respects: It does not guarantee integrity and therefore also not authenticity of the ciphertext. This means that an attacker can change the ciphertext during transmission or on a storage device, and we will not notice the change if we only rely on this cipher. This can cause highly dramatic consequences. For example, suppose the plaintext is "Send you a kiss!", and malicious modifications of the ciphertext yield a deviating decrypted text that reads "Send him a kiss!".

There are several ways to solve this. We start—and end—with the best way to do it: Use a cipher that, in addition to confidentiality of the plaintext, also ensures integrity and authenticity of the ciphertext.

To illustrate the idea, we now use a powerful and efficient algorithm denoted by the atom 'chacha20-poly1305': It is the ChaCha20 stream cipher that was introduced by Daniel J. Bernstein, coupled with the Poly1305 authenticator that was also introduced by Daniel J. Bernstein.

This cipher uses a 256-bit key and a 96-bit nonce, i.e., 32 and 12 bytes, respectively. Authenticated ciphers work by computing a tag that is obtained upon encryption, and must be supplied for decryption. The tag is obtained and supplied via the tag/1 option in both cases.

For example, here is a concrete encryption with ChaCha20-Poly1305, using a random key and nonce:
?- crypto_n_random_bytes(32, Ks),
   crypto_n_random_bytes(12, IV),
   crypto_data_encrypt(test, 'chacha20-poly1305', Ks, IV, CipherText, [tag(Ts)]).
In response, we get the encrypted text, and a 128-bit tag which is specified as a list of 16 bytes:
Ks = [84, 148, 85, 236, 235, 183, 51, 68, 144|...],
IV = [182, 70, 102, 111, 6, 170, 45, 76, 148|...],
CipherText = "Pç,õ",
Ts = [119, 23, 173, 207, 167, 255, 29, 135, 101|...].
Again, the key must be kept completely secret. In contrast, the tag and nonce (IV) can be safely stored and shared in plain text.

Decryption only works if the correct tag is supplied. For example:
?- crypto_data_decrypt($CipherText, 'chacha20-poly1305', $Ks, $IV, PlainText, [tag($Ts)]).
PlainText = "test",
CipherText = "Pç,õ",
Ks = [84, 148, 85, 236, 235, 183, 51, 68, 144|...],
IV = [182, 70, 102, 111, 6, 170, 45, 76, 148|...],
Ts = [119, 23, 173, 207, 167, 255, 29, 135, 101|...].
In contrast, even if we only slightly shorten, extend or modify the ciphertext, nonce, or the required tag in any way, we get for example:
?- crypto_data_decrypt($CipherText, 'chacha20-poly1305', $Ks, $IV, PlainText, [tag([1|$Ts])]).
ERROR: SSL(00000000) func(0): reason(0)
Thus, an attacker who changes the ciphertext must also make a fitting change in the tag for the change to go unnoticed. However, without knowing the secret key, such a change is extremely improbable.

Prolog is well-suited for studying how such algorithms work by prototyping their implementations. For example, here is a Prolog implementation of the Poly1305 authenticator:

A second example of an authenticated cipher is 'aes-128-gcm', which denotes AES in Galois/Counter mode.

Deriving keys and initialization vectors

In most actual applications, it is not enough to generate a random key. Instead, we typically must find a way to derive a key and IV from other information, which is called input keying material (IKM).

For example, in the Diffie-Hellman-Merkle key exchange, we first negotiate a shared secret with another participant, and then must derive a suitable key from that secret. Examples of such secrets are integers, and points on an elliptic curve.

There is a standard algorithm that lets us derive keys and IVs from arbitrary input data. It is called HKDF (HMAC-based key derivation function), and it is described in RFC 5869. In SWI-Prolog, it is available as crypto_data_hkdf/4.

For example, let us now derive a key and an IV from a password in such a way that entering the same password always yields the same key and IV.

Using HKDF, we could of course derive a key directly from a given password. However, consider what we said about storing passwords: First, attackers that try to guess the password should be slowed down. Second, using a password should not leak any information even if the same password is also used elsewhere.

Therefore, instead of passwordkey, we will do:

password and salt → intentionally slow hashkey.

Thus, we will now combine password-based hash derivation with HMAC-based key derivation, using crypto_password_hash/3 and then crypto_data_hkdf/4.

To ensure that the same key is generated also when the same password is entered later, we must fix all parameters of crypto_password_hash/3. For example, recall that by default, a random salt is used. Now, we will supply our own salt, and store it for later reference. We can easily generate a salt with crypto_n_random_bytes/2. For example, let us use a 128-bit salt (16 bytes). In addition, let us explicitly supply the algorithm we want to use ('pbkdf2-sha512', which is currently the only supported option in any case) and also the number of iterations. For example, let us use 219 iterations.

From the generated hash, we can easily derive a key and an IV: The info/1 option of crypto_data_hkdf/4 can be used to derive several different keys and IVs from the same IKM. We also fix the algorithm that is used by crypto_data_hkdf/4.

Taking all these considerations into account, we obtain for example the following predicate to derive a 128-bit (i.e., 16 bytes) key and IV from a given password and salt:
password_salt_key_iv(Password, Salt, Ks, IV) :-
        crypto_password_hash(Password, Hash, [algorithm('pbkdf2-sha512'),
        crypto_data_hkdf(Hash, 16, Ks, [info(key),algorithm(sha256)]),
        crypto_data_hkdf(Hash, 16, IV, [info(iv),algorithm(sha256)]).
Sample usage, with a fresh 128-bit salt:
?- crypto_n_random_bytes(16, Salt),
   password_salt_key_iv(test, Salt, Ks, IV).
This yields:
Salt = [203, 81, 172, 46, 86, 244, 37, 2, 215|...],
Ks = [38, 141, 86, 95, 83, 22, 243, 31, 38|...],
IV = [36, 149, 175, 179, 48, 192, 213, 175, 71|...].
When we later use the same password and salt, exactly the same results are derived:
?- password_salt_key_iv(test, $Salt, Ks, IV).
Salt = [203, 81, 172, 46, 86, 244, 37, 2, 215|...].
Ks = [38, 141, 86, 95, 83, 22, 243, 31, 38|...],
IV = [36, 149, 175, 179, 48, 192, 213, 175, 71|...],
Thus, we only need to store the salt that was used, and can later (again) use HKDF to derive all required values when the password is entered. It is safe to store the salt in plain text, since it has no discernible or realistically computable relation with the derived values, as long as the password is kept completely secret.

The SWI documentation contains an example for negotiating a shared secret via ECDH. Again, once such a secret is established, you can derive keys from it using crypto_data_hkdf/4.

Sometimes, keys are themselves derived from keys. For example, you may have a master key, and derive further keys from it, such as one key per file you want to encrypt. In such cases, you can for example specify a file name in the info/1 option of crypto_data_hkdf/4. Even if an attacker finds out one of these keys, the master key remains safe. As another example, you could prove the authenticity of the cipher text by deriving a further key that you use to compute an HMAC over the encrypted data and the IV. However, although this construction does not leak the primary key even if the derived key is broken, it is less error-prone and therefore safer to use a cipher with built-in support for authenticated encryption for such use cases.

Further reading

We have barely scratched the surface of what you can and should do with Prolog in the context of cryptography. For example, An Efficient Cryptographic Protocol Verifier Based on Prolog Rules by Bruno Blanchet outlines how you can use Prolog to verify properties of a cryptographic protocol.

In many cases, you can benefit from cryptographic features even without knowing how they work internally. For example, if you simply want to enable encrypted traffic for your web applications, see LetSWICrypt to set up an HTTPS server with SWI-Prolog.

In other cases, you can use the available functionality to implement specific applications on your own. For example, you can use Prolog to reason about Bitcoin addresses. See Bitcoinolog for more information. You can use the above methods to encrypt your Bitcoin wallets.

The cryptographic functionality of SWI-Prolog is subject to continuous improvements. Please see the SSL Roadmap for more information.

More about Prolog

Main page