Cryptosystem ME6: Statistical Tests


Introduction

Ideally a cryptosystem should generate ciphertext which is indistinguishable from a sequence of randomly generated bytes. Such a sequence has the property that, given an initial segment (no matter how long), it is not possible to guess the next byte with a probability of success greater than 1/256. In other words, complete knowledge of the sequence up to a certain point provides no information concerning the bytes which follow in the sequence.

For example, suppose a sequence of bytes was such that twice as many 1 bits occurred as 0 bits (among the bits making up the bytes). In this case, when asked to guess the next byte in the sequence, you are likely to be right more often than not if you choose a byte with more 1 bits than 0 bits. Thus a truly random sequence of bytes should have about as many 0 bits as 1 bits (though it is not likely to have exactly the same number of each).

Ciphertext produced by Cryptosystem ME6 has been subjected to a number of statistical tests in an attempt to find some regularity that would distinguish it from randomly generated bytes. No such regularity has been found. The tests applied to ME6-ciphertext are described below, with sample results.

Statistical Tests

In order to investigate whether statistical regularities in the plaintext carry over into statistical regularities in the ciphertext, it is best to start with plaintext consisting of highly patterned bytes and to use a key which is uniform. This ensures that any apparent randomness in the ciphertext is due to the encryption process itself, not to irregularities already present in the plaintext.

Cryptosystem ME6 compresses the plaintext when possible (which is usually the case). This compression eliminates most regularities in the plaintext, and thus works against the purpose of this test, which is to see whether the encryption process itself removes all pattern. Thus in this test a version of the Cryptosystem ME6 software was used which does not compress the plaintext (and which also does not interchange the two halves of each block, as normally occurs according to the description of the encryption process).

For this test (and for the other tests described below) three files were created, each consisting only of repetitions of the 32-byte sequence

AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH

their lengths being 4 KB, 64 KB and 1024 KB (exact byte counts are given in the tables below). When viewed with a hex editor, the first 256 bytes of each of these files look like this:

Each file was encrypted using the 16-byte key PPPPPPPPPPPPPPPP (the hexadecimal representation of P is 50 and its binary representation is 01010000). The randomness measures for the plaintext files and for the encrypted files (with and without compression) are given below.
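
The plaintext test files described above can be generated with a short script such as the following (a minimal sketch; the helper name is ours, the file names and the 32-byte pattern are those used in these tests):

```python
# Create plaintext test files consisting only of repetitions of the
# 32-byte pattern used throughout these tests.
PATTERN = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH"  # 32 bytes

def make_test_file(path, size_bytes):
    """Write size_bytes of the repeating pattern to path
    (size_bytes is assumed to be a multiple of 32)."""
    with open(path, "wb") as f:
        f.write(PATTERN * (size_bytes // len(PATTERN)))

for name, size in [("4k1.tst", 4 * 1024), ("64k1.tst", 64 * 1024),
                   ("1024k1.tst", 1024 * 1024)]:
    make_test_file(name, size)
```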

(a) Randomness

The randomness measures were as follows:

The plaintext:

                File    Size in bytes     Randomness
             4k1.tst            4,096          0.182
            64k1.tst           65,536          0.134
          1024k1.tst        1,048,832          0.106

The ciphertext (without compression):

                File    Size in bytes     Randomness
             4k1.en0            4,439          0.939
            64k1.en0           68,105          0.945
          1024k1.en0        1,089,368          0.947

The ciphertext (with compression):

                File    Size in bytes     Randomness
             4k1.en1              994          0.952
            64k1.en1           10,046          0.952
          1024k1.en1          161,387          0.940

These randomness values show that the ciphertext appears as if composed of randomly generated bytes (whether or not compression is used).
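
The exact definition of the randomness measure used in these tables is not given here. As a rough stand-in (not the measure used above, so its values differ from those in the tables), the Shannon entropy of the byte-value distribution, normalized to the range 0 to 1, gives the same qualitative picture: patterned plaintext scores low, ciphertext scores near 1:

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, normalized to [0, 1].

    1.0 means all 256 byte values are equally frequent; a highly patterned
    file scores much lower.  This is only a stand-in for the (unspecified)
    randomness measure used in the tables above.
    """
    n = len(data)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
    return h / 8.0  # 8 bits per byte is the maximum possible entropy

patterned = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH" * 128  # like 4k1.tst
print(byte_entropy(patterned))         # 0.375: only 8 byte values occur
print(byte_entropy(os.urandom(4096)))  # close to 1.0
```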

The first 256 bytes of 4k1.en0 look like this:

The graphical displays of randomness for the three ciphertext files (without compression) are:

(b) Byte frequency

One can count the frequency with which byte values occur in the three ciphertext files. The 20 most-frequent byte values are as follows:

File 4k1.en0 contains 4439 bytes and 256 different byte values.
  Rank     Byte      Frequency   ASCII value
    1   126 = 0x7E          30      ~
    2   164 = 0xA4          30
    3    57 = 0x39          27      9
    4    35 = 0x23          26      #
    5   128 = 0x80          26
    6   221 = 0xDD          26
    7   135 = 0x87          25
    8   171 = 0xAB          25
    9    69 = 0x45          24      E
   10    76 = 0x4C          24      L
   11   187 = 0xBB          24
   12    46 = 0x2E          23      .
   13    48 = 0x30          23      0
   14    54 = 0x36          23      6
   15    77 = 0x4D          23      M
   16    84 = 0x54          23      T
   17   124 = 0x7C          23      |
   18   184 = 0xB8          23
   19   193 = 0xC1          23
   20   210 = 0xD2          23

File 64k1.en0 contains 68105 bytes and 256 different byte values.
  Rank     Byte      Frequency   ASCII value
    1     5 = 0x05         327
    2    36 = 0x24         312      $
    3    99 = 0x63         307      c
    4   182 = 0xB6         302
    5    75 = 0x4B         301      K
    6    22 = 0x16         299
    7   250 = 0xFA         299
    8     0 = 0x00         298      null
    9     8 = 0x08         298
   10   143 = 0x8F         298
   11   188 = 0xBC         298
   12   236 = 0xEC         298
   13   120 = 0x78         296      x
   14   145 = 0x91         294
   15   247 = 0xF7         294
   16    28 = 0x1C         293
   17    53 = 0x35         293      5
   18   187 = 0xBB         293
   19   135 = 0x87         292
   20   195 = 0xC3         292

File 1024k1.en0 contains 1089368 bytes and 256 different byte values.
  Rank     Byte      Frequency   ASCII value
    1   203 = 0xCB        4395
    2   236 = 0xEC        4394
    3    12 = 0x0C        4393      formfeed
    4   174 = 0xAE        4388
    5   193 = 0xC1        4383
    6   108 = 0x6C        4380      l
    7    94 = 0x5E        4376      ^
    8    10 = 0x0A        4373      linefeed
    9   206 = 0xCE        4367
   10   214 = 0xD6        4365
   11    99 = 0x63        4362      c
   12   250 = 0xFA        4362
   13   209 = 0xD1        4359
   14    29 = 0x1D        4354
   15    91 = 0x5B        4354      [
   16   146 = 0x92        4354
   17   102 = 0x66        4351      f
   18   151 = 0x97        4348
   19   160 = 0xA0        4347
   20    21 = 0x15        4345

The byte values are thus distributed approximately uniformly, which is what we would expect for ciphertext in which the bytes appear to be random.
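
A frequency table like those above can be produced in a few lines (a hypothetical sketch; the helper name is ours):

```python
from collections import Counter

def top_bytes(data: bytes, k: int = 20):
    """Return the k most frequent byte values as (value, count) pairs,
    most frequent first."""
    return Counter(data).most_common(k)

# For a file of n uniformly random bytes, each of the 256 values is
# expected to occur about n/256 times (e.g. 1089368 / 256 = approx. 4255
# for 1024k1.en0, consistent with the counts tabulated above).
```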

(c) Bit comparison

The 4k1.en0 and 64k1.en0 files were analysed as to their bit counts (1024k1.en0 was too large for the analysis program), not only for total bit counts but also for bit counts for the 8 positions within a byte. The results were:

4k1.en0 is 4439 bytes long.
Number of zero bits = 17949
Number of  one bits = 17563
Difference = 386, 1.087% of total bits
bit posn       7       6       5       4       3       2       1       0
  0 bits    2276    2252    2210    2240    2207    2265    2253    2246
  1 bits    2163    2187    2229    2199    2232    2174    2186    2193
   0 - 1     113      65     -19      41     -25      91      67      53
 % total   2.546   1.464  -0.428   0.924  -0.563   2.050   1.509   1.194

64k1.en0 is 68105 bytes long.
Number of zero bits = 273376
Number of  one bits = 271464
Difference = 1912, 0.351% of total bits
bit posn       7       6       5       4       3       2       1       0
  0 bits   34186   34373   34440   34043   33979   34115   34058   34182
  1 bits   33919   33732   33665   34062   34126   33990   34047   33923
   0 - 1     267     641     775     -19    -147     125      11     259
 % total   0.392   0.941   1.138  -0.028  -0.216   0.184   0.016   0.380

Bytes which are truly randomly generated should have about as many 0 bits as 1 bits. In ME6-ciphertext the number of 0 bits is in fact found to be close to the number of 1 bits, even for the individual bit positions within a byte, so this statistical test, like the others, does not distinguish ME6-ciphertext from randomly generated bytes.
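
The per-position bit counts shown above can be computed as follows (a hypothetical sketch; the function name and return format are ours):

```python
def bit_position_counts(data: bytes):
    """Count 0 and 1 bits at each bit position (7 = most significant).

    Returns ({position: (zeros, ones)}, total_zeros, total_ones).
    """
    ones = [0] * 8
    for byte in data:
        for pos in range(8):
            if byte & (1 << pos):
                ones[pos] += 1
    n = len(data)
    per_position = {pos: (n - ones[pos], ones[pos]) for pos in range(8)}
    total_ones = sum(ones)
    return per_position, 8 * n - total_ones, total_ones
```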

(d) Kappa test

The kappa test is designed to detect repetitions of bytes at specified offsets in a file. Suppose that the elements of the ciphertext are any of the 256 possible bytes (00 through FF in hexadecimal). Consider the ciphertext to be a sequence of bytes (in a row). Now duplicate this sequence and place it beneath the first (with the first byte of the first sequence directly above the first byte of the second). We now have a sequence of pairs of identical bytes. Slide the lower sequence to the right a certain distance, say, 8 places, then count how many pairs there are in which the two bytes are identical. If the sequence of bytes were truly random then we would expect about 1/256 of the pairs to consist of identical bytes, i.e. about 0.39% of them. We may analyse a file by calculating the indices of coincidence (also known as the kappa values) for one or more displacement values (also called offsets).
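
The calculation just described amounts to the following (a hypothetical sketch):

```python
def kappa(data: bytes, offset: int) -> float:
    """Index of coincidence at the given offset: the fraction of pairs
    (data[i], data[i + offset]) whose two bytes are identical."""
    n = len(data) - offset
    hits = sum(data[i] == data[i + offset] for i in range(n))
    return hits / n

# For truly random bytes the expected IC is 1/256, about 0.39%,
# at every offset.
```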

When we run such a program on ordinary English text we obtain values such as the following ("IC" stands for "index of coincidence"):

                  Offset    IC      coincidences
                   1      5.85%     2397 in 40968
                   2      6.23%     2551 in 40967
                   3      9.23%     3780 in 40966
                   4      8.31%     3406 in 40965
                   5      7.91%     3240 in 40964
                   6      7.88%     3227 in 40963
                   7      7.78%     3187 in 40962
                   8      7.92%     3244 in 40961
                   9      8.24%     3377 in 40960
                  10      7.98%     3268 in 40959
                  11      8.16%     3341 in 40958
                  12      8.09%     3315 in 40957
                  13      8.15%     3337 in 40956
                  14      7.97%     3264 in 40955
                  15      7.97%     3265 in 40954
                  16      8.07%     3306 in 40953
                  17      8.04%     3293 in 40952
                  18      7.85%     3214 in 40951

Typically only 80 or so different byte values occur in a file of English text. If these byte values occurred randomly then we would expect an index of coincidence for each displacement of about 1/80, i.e. about 1.25%. However, the distribution of characters in English text is not random ("e", "t" and the space character occur most frequently), which is why we obtain the larger IC values shown above.

The kappa test can be useful in breaking a weak cryptosystem. The index of coincidence for the displacement equal to the length of the encryption key will sometimes be significantly higher than the other indices, in which case one can infer the length of the key.

The offset value most likely to produce anomalies in the case of ME6-ciphertext is the key size, in this case 16. The results of running the kappa test on the file 4k1.en0, testing all offset values from 1 through 512, are as follows:

Offset       IC       coincidences
     1      0.41%     18 in 4405
     2      0.45%     20 in 4404
     3      0.41%     18 in 4403
     4      0.39%     17 in 4402
     5      0.43%     19 in 4401
     6      0.43%     19 in 4400
     7      0.43%     19 in 4399
     8      0.43%     19 in 4398
     9      0.34%     15 in 4397
    10      0.30%     13 in 4396
    11      0.36%     16 in 4395
    12      0.43%     19 in 4394
    13      0.39%     17 in 4393
    14      0.34%     15 in 4392
    15      0.52%     23 in 4391
    16      0.25%     11 in 4390
    17      0.23%     10 in 4389
    18      0.30%     13 in 4388
    19      0.27%     12 in 4387
    20      0.36%     16 in 4386
            ...
   502      0.33%     13 in 3904
   503      0.54%     21 in 3903
   504      0.10%     4 in 3902
   505      0.38%     15 in 3901
   506      0.41%     16 in 3900
   507      0.44%     17 in 3899
   508      0.38%     15 in 3898
   509      0.21%     8 in 3897
   510      0.39%     15 in 3896
   511      0.49%     19 in 3895
   512      0.31%     12 in 3894

Average IC = 0.38%.  256 different byte values occur; kappa-r(256) = 0.39%.

The highest IC values were:

Offset       IC       coincidences
   134      0.61%     26 in 4272
    59      0.62%     27 in 4347
   344      0.64%     26 in 4062
   157      0.64%     27 in 4249
   120      0.68%     29 in 4286
   105      0.70%     30 in 4301

The results of the kappa test for 64k1.en0, with displacements from 1 through 10,000, are as follows:

Offset       IC       coincidences
     1      0.42%     288 in 68176
     2      0.42%     285 in 68175
     3      0.42%     286 in 68174
     4      0.43%     293 in 68173
     5      0.39%     266 in 68172
     6      0.44%     299 in 68171
     7      0.37%     252 in 68170
     8      0.44%     297 in 68169
     9      0.39%     265 in 68168
    10      0.44%     301 in 68167
    11      0.40%     274 in 68166
    12      0.37%     251 in 68165
    13      0.43%     292 in 68164
    14      0.41%     281 in 68163
    15      0.42%     286 in 68162
    16      0.37%     254 in 68161
    17      0.44%     297 in 68160
    18      0.35%     236 in 68159
    19      0.40%     274 in 68158
    20      0.41%     281 in 68157
            ...
  9995      0.37%     215 in 58182
  9996      0.38%     223 in 58181
  9997      0.38%     220 in 58180
  9998      0.41%     237 in 58179
  9999      0.39%     228 in 58178
 10000      0.38%     219 in 58177

Average IC = 0.39%.  256 different byte values occur; kappa-r(256) = 0.39%.

The highest IC values were:

Offset       IC       coincidences
  2018      0.47%     310 in 66159
    34      0.47%     319 in 68143
   372      0.47%     322 in 67805
  6417      0.48%     294 in 61760
  5132      0.48%     304 in 63045
  6263      0.49%     303 in 61914

Thus the kappa test reveals no regularities in this ciphertext, even though the plaintext consisted solely of repetitions of AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH and the key was the uniform PPPPPPPPPPPPPPPP.

The absence of any apparent regularities resulting from the encryption method is thus to be expected also in the case where the plaintext is much more diverse (as in normal English text), redundancy is eliminated by compression, and the key is not uniform.

Single-bit Change

(a) Single-bit change in the key

When evaluating an encryption program it is reasonable to ask whether the encryption method used is something as weak as a repeated exclusive-or cipher, in which the bytes of the key are repeatedly XOR'd against those of the plaintext. In such a cryptosystem each byte of the ciphertext is affected only by the corresponding byte in the key, not by every byte in the key. In this case the system is generally easy to crack (by using the kappa test to determine the length of the key, say n, and then considering the n sets of bytes, each consisting only of the bytes encrypted by a particular byte of the key).
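
The weakness just described is easy to demonstrate. In the following sketch (hypothetical code, not part of ME6), the patterned plaintext used above is encrypted with a repeating 16-byte key; since the plaintext has period 32 and the key has period 16, the ciphertext repeats with period lcm(32, 16) = 32, and the kappa test at offset 32 exposes it at once:

```python
def xor_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """The weak scheme described above: repeating-key XOR."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def index_of_coincidence(data: bytes, offset: int) -> float:
    """Fraction of positions i with data[i] == data[i + offset]."""
    n = len(data) - offset
    return sum(data[i] == data[i + offset] for i in range(n)) / n

pt = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH" * 128   # the patterned plaintext
ct = xor_encrypt(pt, b"PPPPPPPPPPPPPPPP")        # 16-byte key, as above
print(index_of_coincidence(ct, 32))  # 1.0 -- the period leaks through
```

A strong cipher shows no comparable spike at any offset, as the kappa results for ME6-ciphertext above demonstrate.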

When an attempt is made to break a cryptosystem this test is likely to be one of the first to be performed, namely, take some known plaintext and encrypt it with slightly different keys and compare the resulting ciphertext to see whether a particular change in the key produces a particular change in the ciphertext. With a strong cipher a change of a single bit in the key will have a cascade effect, producing large changes in the resulting ciphertext.

In this test some plaintext is encrypted, using Cryptosystem ME6, by two keys which differ only in a single byte (and only in a single bit). As stated above, the file 4k1.tst was encrypted (without compression) using the key PPPPPPPPPPPPPPPP to produce the file 4k1.en0 (4406 bytes). 4k1.tst was then encrypted using the key PPPPPPPPPPPPPPPQ, which differs from the first key only in a single bit, to produce the file 4k1.en2 (4355 bytes).

When 4k1.en0 and 4k1.en2 are XORed the result is 4k1.xor, the first 256 bytes of which are:

4k1.xor has a randomness measure of 0.949:

The kappa values show no anomalies (the highest IC over all displacements from 1 through 512 being 0.72%), and the bit count of the 4355 bytes is:

4k1.xor is 4355 bytes long.
Number of zero bits = 17537
Number of  one bits = 17303
Difference = 234, 0.672% of total bits
bit posn       7       6       5       4       3       2       1       0
  0 bits    2254    2156    2224    2171    2215    2137    2130    2250
  1 bits    2101    2199    2131    2184    2140    2218    2225    2105
   0 - 1     153     -43      93     -13      75     -81     -95     145
 % total   3.513  -0.987   2.135  -0.299   1.722  -1.860  -2.181   3.330

so these bytes appear to be random and ME6 passes this test (i.e., a single-bit change in the key used to encrypt a file results in ciphertext so different that XORing it with the first ciphertext produces what appear to be random bytes).

(b) Single-bit change in the plaintext

In this test two pieces of plaintext, differing only in a single byte (and in a single bit), were encrypted using the same key.

The two files of plaintext were the file 4k1.tst and a file 4k2.tst (both 4096 bytes in size) differing only in one byte (and in one bit) at the middle of the file, byte 2048, which is an @ (ASCII 0x40) instead of an A (ASCII 0x41).

Again the key used for encryption was PPPPPPPPPPPPPPPP. The resulting ciphertext files, 4k1.en0 and 4k2.en0, each 4439 bytes in size, are identical in the first 2218 bytes except for one byte. Then differences begin to appear:

From byte 2720 onwards the two files are completely different:

Thus changing a single bit in the plaintext causes every byte in every ciphertext sub-block (ciphertext segment) following the sub-block which corresponds to the affected plaintext sub-block to be different (apart from chance coincidences), until the end of the block is reached. And not only different, but so different that the differences appear to be random.

Changing a bit in the plaintext affects only the ciphertext up to the end of the block in which it occurs. A change of one or more bits in one block of plaintext does not affect the ciphertext produced for other blocks of plaintext.

Common Key Cryptotext XOR

It is convenient to use a single encryption key to encrypt many different files (a different key for each file would be a little more secure but a lot more inconvenient). In some cryptosystems this may result in what is called a "cryptotext-cryptotext compromise", meaning that a comparison of the cryptotexts produced with a common encryption key provides useful information to an unauthorized decryptor.

It was shown above that the XOR of two ciphertext files produced with the same key from plaintext files which differ in a single byte consists of apparently random bytes from a point a few hundred bytes after the position of the byte at which the two plaintext files differ.
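
Forming the byte-wise XOR of two files, truncated to the length of the shorter (as in the test below), can be sketched as follows (hypothetical helper):

```python
def xor_files(path_a: str, path_b: str, out_path: str) -> None:
    """Write the byte-wise XOR of two files, up to the shorter length."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    with open(out_path, "wb") as out:
        out.write(bytes(x ^ y for x, y in zip(a, b)))  # zip stops at shorter
```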

In this test we took two quite different plaintext files (of similar length), encrypted them using the same key, formed the XOR, and applied the bit count and kappa tests to the result. In this test the normal version of Cryptosystem ME6 (i.e., including compression) was used.

The first file was eng1.txt (52,294 bytes) and the second file was eng2.txt (47,829 bytes), each consisting solely of English text. The key was "once more unto the breach", the encrypted files being named eng1.enc (32,271 bytes) and eng2.enc (29,198 bytes). These two files were XORed to produce eng.xor (29,198 bytes), whose first 256 bytes are:

eng.xor has a randomness measure of 0.954:

The bit count test applied to eng.xor produces:

eng.xor is 29198 bytes long.
Number of zero bits = 116155
Number of  one bits = 117429
Difference = -1274, -0.545% of total bits
bit posn       7       6       5       4       3       2       1       0
  0 bits   14292   14503   14556   14658   14536   14580   14547   14483
  1 bits   14906   14695   14642   14540   14662   14618   14651   14715
   0 - 1    -614    -192     -86     118    -126     -38    -104    -232
 % total  -2.103  -0.658  -0.295   0.404  -0.432  -0.130  -0.356  -0.795

Clearly the bit frequencies are approximately uniform.

The kappa test applied to eng.xor, with offsets from 1 to 10,000 produces:

eng.xor is 29198 bytes long.

Offset       IC       coincidences
     1      0.42%     124 in 29197
     2      0.42%     123 in 29196
     3      0.41%     119 in 29195
     4      0.45%     130 in 29194
     5      0.36%     104 in 29193
     6      0.40%     116 in 29192
     7      0.36%     106 in 29191
     8      0.41%     120 in 29190
     9      0.37%     108 in 29189
    10      0.34%      99 in 29188
    11      0.43%     125 in 29187
    12      0.37%     109 in 29186
    13      0.38%     111 in 29185
    14      0.44%     129 in 29184
    15      0.38%     112 in 29183
    16      0.40%     118 in 29182
    17      0.50%     147 in 29181
    18      0.38%     111 in 29180
    19      0.41%     121 in 29179
    20      0.39%     114 in 29178
            ...
  9992      0.38%      73 in 19206
  9993      0.41%      79 in 19205
  9994      0.42%      81 in 19204
  9995      0.41%      78 in 19203
  9996      0.42%      80 in 19202
  9997      0.35%      67 in 19201
  9998      0.42%      80 in 19200
  9999      0.38%      73 in 19199
 10000      0.39%      75 in 19198

Average IC = 0.39%.  256 different byte values occur; kappa-r(256) = 0.39%.

The highest IC values were:

Offset       IC       coincidences
  1536      0.52%     144 in 27662
  9819      0.53%     103 in 19379
  8806      0.53%     109 in 20392
  8738      0.53%     109 in 20460
  6254      0.53%     121 in 22944
  2423      0.53%     142 in 26775
  4291      0.56%     139 in 24907

Thus the kappa test reveals nothing remarkable in the XOR of the two ciphertext files produced using the same key.