INDIAN STATISTICAL INSTITUTE




                                          Salsa
A Detailed Study on Salsa under the guidance of
               Dr. Bimal K. Roy
        Amit Kumar Ghosh , Abhijnan Chattopadhyay, Priyanka Syal, Preetha Bhattacharjee




This document gives an overall description of the specification of Salsa as a hash function, an
expansion function, and an encryption function; presents a range of benchmarks relevant to
cryptographic speed; explains, at a lower level, the techniques used to achieve this performance;
and discusses modern cryptanalysis and the security of Salsa, pointing out the alternative measures
and other variants proposed to date.
Table of Contents

Designing Salsa20
   Introduction
   Operations
   Encryption
   Hashing
Benchmarking Salsa20
   The Salsa20 structure
   Salsa20 on different Platforms
Salsa20 specification
   Defining Functions
   Specifications
       The Salsa20 hash function
       The Salsa20 expansion function
       The Salsa20 encryption function
Security and cryptanalysis of Salsa20
   Side-channel attacks
   Notes on the uniform randomness and diagonal constants
   Differential Cryptanalysis of Salsa20/8
       Truncated differential cryptanalysis of five rounds of Salsa20
   Algebraic attacks
   Other notions of security
Alternative Proposals
   Extending the Salsa20 nonce




Designing Salsa20

      Introduction
      eSTREAM, the ECRYPT Stream Cipher Project, called for submissions of stream ciphers in
      November 2004. It received more than 30 proposals from 97 cryptographers in 19 countries, and
      over the subsequent years collected a total of 200 papers. The final eSTREAM portfolio, containing
      four software stream ciphers and four hardware stream ciphers, was announced in April 2008. The
      portfolio was revised in September 2008 to eliminate a hardware stream cipher, F-FCSR v2, that
      had been broken.

      Salsa20/r is a software-oriented (profile 1) stream cipher proposed by Daniel J. Bernstein. The
      algorithm supports keys of 128 bits and 256 bits. During its operation, the key, a 64-bit nonce
      (unique message number), a 64-bit counter and four 32-bit constants are used to construct the
      512-bit initial state. After r iterations of the Salsa20/r round function, the updated state is used as
      a 512-bit keystream output. Each such output block is an independent combination of the key,
      nonce, and counter and, since there is no chaining between blocks, the operation of Salsa20/r
      resembles the operation of a block cipher in counter mode. Salsa20/r therefore shares the very
      same implementation advantages, in particular the ability to generate output blocks in any order
      and in parallel. The maximum length of the keystream produced by Salsa20/r is 2^70 bytes
      (2^64 counter values, each selecting one 64-byte block).
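
      The counter-mode property described above can be illustrated with a little arithmetic. The
      following is a minimal sketch, assuming only the 64-bit counter and 512-bit (64-byte) output
      block stated in the text; the helper names max_keystream_bytes and locate are hypothetical.
      It shows how a byte position in the keystream maps to an independent block, which is why
      blocks can be generated in any order.

```python
BLOCK_BYTES = 64      # one Salsa20/r output block is 512 bits
COUNTER_BITS = 64     # width of the block counter in the initial state


def max_keystream_bytes():
    """Total keystream addressable by the counter for one (key, nonce) pair."""
    return (1 << COUNTER_BITS) * BLOCK_BYTES


def locate(byte_offset):
    """Map a keystream byte offset to (block counter, offset within block)."""
    if not 0 <= byte_offset < max_keystream_bytes():
        raise ValueError("offset outside the keystream")
    return divmod(byte_offset, BLOCK_BYTES)


# Blocks can be requested in any order: each lookup is independent of the others.
for offset in (1000, 0, 64):
    print(offset, locate(offset))
```

      Because each block depends only on the key, the nonce, and its own counter value, seeking to
      an arbitrary offset costs one block computation rather than a pass over the preceding stream.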


      Operations
      Integer multiplications
         Argument: The basic argument for integer multiplication is that the output bits are
         complicated functions of the input bits, mixing the inputs more thoroughly than a few
         simple integer operations.
         Counter-argument: Integer multiplication takes several cycles on typical CPUs, and many
         more cycles on some CPUs. For comparison, a comparably complex series of simple integer
         operations is always reasonably fast. Multiplication might be slightly faster on some CPUs
         but it is not consistently fast. A further argument against integer multiplication is that
         it increases the risk of timing leaks. What really matters is not the speed of integer
         multiplication, but the speed of constant-time integer multiplication, which is often much
         slower.

      S-box lookups
         [An S-box lookup is an array lookup using an input-dependent index. Most ciphers are
         designed to take advantage of this operation. For example, typical high-speed AES software
         has several 1024-byte S-boxes, each of which converts 8-bit inputs to 32-bit outputs.]
         Argument: The basic argument for S-boxes is that a single table lookup can mangle its
         input more thoroughly than a chain of a few simple integer operations taking the same
         amount of time.
         Counter-argument: A simple integer operation takes one or two 32-bit inputs rather than
         one 8-bit input, so it effectively mangles several 8-bit inputs at once. It is not obvious
         that a series of S-box lookups (even with rather large S-boxes, as in AES, increasing L1
         cache pressure on large CPUs and forcing different implementation techniques for small
         CPUs) is faster than a comparably complex series of integer operations. A further argument
         against S-box lookups is that, on most platforms, they are vulnerable to timing attacks.
         NIST's statement to the contrary ("table lookup is not vulnerable to timing attacks") is
         erroneous.

      Rotations
         [Rotations account for about one third of the integer operations in Salsa20, and more on
         the UltraSPARC. Replacing some of the rotations with a comparable number of additions
         might achieve comparable diffusion in fewer rounds.]
         Argument: The basic argument for rotations is that one xor of a rotated quantity provides
         as much diffusion as two xors of shifted quantities.
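
      The rotation argument can be checked directly. The sketch below assumes 32-bit words; the
      function names are hypothetical. A left rotation by c bits is the combination of a left shift
      by c and a right shift by 32 - c, and the two shifted halves occupy disjoint bit positions,
      so xoring in one rotated word gives the same result as xoring in both shifted words.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, c):
    """Rotate the 32-bit word x left by c bits, for 0 < c < 32."""
    return ((x << c) & MASK32) | (x >> (32 - c))


def xor_rotated(y, x, c):
    """One xor of a rotated quantity."""
    return y ^ rotl32(x, c)


def xor_two_shifts(y, x, c):
    """Two xors of shifted quantities covering the same bits."""
    return y ^ ((x << c) & MASK32) ^ (x >> (32 - c))


samples = [(0x01234567, 0x89ABCDEF, 7), (0xFFFFFFFF, 0x80000001, 13), (0, 1, 18)]
for y, x, c in samples:
    print(hex(xor_rotated(y, x, c)), hex(xor_two_shifts(y, x, c)))
```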



        Encryption
      Different encryption and decryption
         The popularity of CBC appears to be a historical accident. The author reports finding very
         few people arguing for CBC over counter mode, and none of the arguments are even
         marginally convincing. One occasionally encounters the superstitious notion that
         "encryption by xor is too simple"; but a one-time pad (in conjunction with a Wegman-Carter
         MAC) provably achieves perfect secrecy (and any desired level of integrity), so there is
         obviously nothing wrong with xor. There are several clear arguments against CBC. One
         disadvantage of CBC is that it requires different code for encryption and decryption,
         increasing costs in many contexts. Another disadvantage of CBC is that the extra
         communication from the cryptanalyst into the cipher state is a security threat; regaining
         the original level of confidence means adding rounds, taking additional time.

      Stream's dependency on plaintext
         Argument: The basic argument for incorporating plaintext into the stream (specifically,
         incorporating plaintext bytes into subsequent bytes of the stream) is that this allows
         message authentication "for free." After encrypting the plaintext, one can generate a
         constant number of additional stream bytes and output them as an authenticator of the
         plaintext.
         Counter-argument: One counterargument is that "free" is a wild exaggeration. Incorporating
         the plaintext into the stream takes time for every block, and generating an authenticator
         takes time for every message. Moreover, incorporation of plaintext, being extra
         communication from the cryptanalyst into the cipher state, is a security threat. Regaining
         the original level of confidence means adding rounds, which takes additional time for
         every block.

      State
         Argument: The argument for a larger state is that one does not need as many cipher rounds
         to achieve the same conjectured security level. Copying state across blocks seems to
         provide just as much mixing as the first few cipher rounds. A larger state therefore saves
         some time after the first block.
         Counter-argument: A larger state loses time in some contexts. Reuse forces serialization:
         one cannot take advantage of extra hardware to reduce the latency of encrypting or
         decrypting long messages. Furthermore, large states reduce the number of messages that can
         be processed simultaneously on limited hardware.

      Block size
         Argument: The basic argument for a larger block size, say 256 bytes, is that one does not
         need as many cipher rounds to achieve the same conjectured security level. Using a larger
         block size, like copying state across blocks, seems to provide just as much mixing as the
         first few cipher rounds. A larger block size therefore saves time.
         Counter-argument: A larger block size also loses time. On most CPUs, the communication
         cost of sweeping through a 256-byte block is a bottleneck; CPUs are designed for
         computations that don't involve so much data.
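
      The point made above about CBC needing different code for encryption and decryption, while
      xor-based encryption does not, can be sketched as follows. This is a minimal illustration
      under stated assumptions: xor_stream is a hypothetical helper, and toy_keystream is a
      placeholder byte pattern, not real Salsa20 output.

```python
def xor_stream(data, keystream):
    """Xor data with the first len(data) bytes of the keystream."""
    if len(keystream) < len(data):
        raise ValueError("keystream too short")
    return bytes(d ^ k for d, k in zip(data, keystream))


# Placeholder keystream for illustration only; it is NOT Salsa20 output.
toy_keystream = bytes((37 * i + 11) % 256 for i in range(64))

plaintext = b"attack at dawn"
ciphertext = xor_stream(plaintext, toy_keystream)
recovered = xor_stream(ciphertext, toy_keystream)   # the same function decrypts
print(ciphertext.hex(), recovered)
```

      Since xor is its own inverse, a single routine serves both directions, and the plaintext
      never feeds back into the cipher state.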



     Hashing
      Implementation of a block cipher
         Argument: The basic argument for a block cipher, that is, for keeping the k words
         independent of the n words, is that, for fixed k, it is easy to make a block cipher be an
         invertible function of n. But this feature seems to be of purely historical interest.
         Invertibility is certainly not necessary for encryption.
         Counter-argument: The basic disadvantage of a block cipher is that the k words consume
         valuable communication resources. A 64-byte block cipher with a 32-byte key would need to
         repeatedly sweep through 96 bytes of memory (plus a few bytes of temporary storage) for
         its 64 bytes of output; in contrast, Salsa20 repeatedly sweeps through just 64 bytes of
         memory (plus a few bytes of temporary storage) for its 64 bytes of output.

      Code length
         Argument: The argument for using two different kinds of rounds is the idea that attacks
         will have some extra difficulty passing through the switch from one kind to another. This
         extra difficulty would allow the cipher to reach the same security level with fewer
         rounds.
         Counter-argument: The basic counterargument is that extra code is expensive in many
         contexts. It increases pressure on a CPU's L1 cache, for example, and it increases the
         minimum size of a hardware implementation.

      Diffusion among words
         Salsa20 views its 16 words as a 4×4 array. During the first round, there is no
         communication between columns; each column has its own chain of 12 serial operations
         modifying the words in that column. During the second round, there is no communication
         between rows; each row has its own chain of 12 serial operations modifying the words in
         that row. Et cetera.
         There are pairs (i, j) such that a change in word i has no opportunity to affect word j
         until the third round. A different communication structure would allow much faster
         diffusion of changes through all 16 words. On the other hand, it doesn't appear to be
         possible to achieve much faster diffusion of changes through all 512 bits.

      Modifications other than add-rotate-xor
         There are many plausible ways to modify each word in a column using other words in the
         same column. The author settled on "xor a rotated sum" as bouncing back and forth between
         incompatible structures on the critical path. The author chose "xor a rotated sum" over
         "add a rotated xor" for simple performance reasons: the x86 architecture has a
         three-operand addition (LEA) but not a three-operand xor.
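
      The claim that the first round involves no communication between columns can be observed
      experimentally. The sketch below is an illustration under stated assumptions: the rotation
      distances (7, 9, 13, 18) and the word ordering within each column are not given in this
      document and are taken from the published Salsa20 specification, and the input words are
      arbitrary test values. It flips one bit of word 0 and lists which words differ after one
      column round.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, c):
    """Rotate the 32-bit word x left by c bits."""
    return ((x << c) & MASK32) | (x >> (32 - c))


def quarterround(y0, y1, y2, y3):
    """Four 'xor a rotated sum' modifications: 12 serial operations in total."""
    y1 ^= rotl32((y0 + y3) & MASK32, 7)
    y2 ^= rotl32((y1 + y0) & MASK32, 9)
    y3 ^= rotl32((y2 + y1) & MASK32, 13)
    y0 ^= rotl32((y3 + y2) & MASK32, 18)
    return y0, y1, y2, y3


# Word indices of each column of the 4x4 array, starting from the diagonal word.
COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]


def columnround(x):
    y = list(x)
    for a, b, c, d in COLUMNS:
        y[a], y[b], y[c], y[d] = quarterround(x[a], x[b], x[c], x[d])
    return y


def changed_words(before, after):
    return {i for i in range(16) if before[i] != after[i]}


base = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(16)]  # arbitrary test words
flipped = list(base)
flipped[0] ^= 1                                              # change one bit of word 0
affected = changed_words(columnround(base), columnround(flipped))
print(sorted(affected))
```

      Every affected index lies in the column containing word 0; the other twelve words are
      untouched until a later round mixes along rows.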



                                       Benchmarking Salsa20

     The Salsa20 structure
     A 64-byte block is encrypted by xoring it with the output of the Salsa20 hash function, where
     the input consists of the 32-byte Salsa20 key, the 8-byte nonce (unique message number), the
     8-byte block counter, and 16 constant bytes. The reader is cautioned that encryption time is
     slightly longer than hashing time: in particular, a 64-byte xor is not free. The Salsa20 hash
     function regards its 64-byte input x as an array of 16 words in little-endian form. It performs
     320 invertible modifications, where each modification changes one word of the array. The
     resulting words are added to the original words, producing, in little-endian form, the 64-byte
     output Salsa20(x). Each modification involves xoring into one word a rotated version of the sum
     of two other words. Thus the 320 modifications involve, overall, 320 additions, 320 rotations,
     and 320 xors. The rotations are all by constant distances. The entire series of modifications
     is a series of 10 identical double-rounds. Each double-round is a series of 2 rounds. Each
     round is a set of 4 parallel quarter-rounds. Each quarter-round is a series of 4 word
     modifications.
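
     The operation counts quoted above follow directly from the nesting of double-rounds, rounds,
     quarter-rounds, and word modifications. As a short worked check, using only the figures given
     in the paragraph:

```latex
\begin{align*}
\text{word modifications} &= 10 \times 2 \times 4 \times 4 = 320,\\
\text{integer operations} &= 320 \times 3 = 960
  \quad (\text{320 additions, 320 rotations, 320 xors}).
\end{align*}
```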

     Salsa20 on different Platforms
     AMD Athlon
        Implementation: salsa20_word_pm software takes 29.25 Athlon cycles for a Salsa20 round,
        totalling 585 cycles (9.15 cycles/byte) for 20 rounds, totalling 645 cycles (10.08
        cycles/byte) for the Salsa20 hash function, timed as 680 cycles with 35 cycles timing
        overhead. The timings are actually 655 or 656 cycles most of the time but 849 cycles on
        every eighth call, presumably because of branch mispredictions. The compiled code occupies
        1248 bytes. Its main loop occupies 937 bytes and handles 4 rounds.
        Comparison to AES timings: Osvik reports that unpublished software, with no protection
        against timing leaks, takes 225 Athlon cycles (over 14 cycles/byte) to encrypt a 16-byte
        block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One
        can reasonably extrapolate that similar software would take over 300 Athlon cycles (over 18
        cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was
        pre-expanded into 240 bytes.

     IBM PowerPC RS64 IV
        Implementation: salsa20_word_aix software takes 33 PowerPC RS64 IV cycles for each Salsa20
        round, totalling 660 cycles (10.32 cycles/byte) for 20 rounds, totalling 756 cycles (11.82
        cycles/byte) for the Salsa20 hash function, timed as 770 cycles with 14 cycles timing
        overhead. The compiled code for the Salsa20 hash function occupies 768 bytes. Its main loop
        occupies 392 bytes and handles 2 rounds.

     Intel Pentium III
        Implementation: salsa20_word_pii software takes 37.5 Pentium III cycles for each Salsa20
        round, totalling 750 cycles (11.72 cycles/byte) for 20 rounds, totalling 837 cycles (13.08
        cycles/byte) for the Salsa20 hash function, timed as 872 cycles with 35 cycles timing
        overhead. (The timings are actually 859 cycles most of the time but 908 cycles on every
        fourth call, presumably because of branch mispredictions.) The compiled code for the
        Salsa20 hash function occupies 1280 bytes. Its main loop occupies 937 bytes and handles 4
        rounds.
        Comparison to AES timings: Osvik reports that unpublished software, with no protection
        against timing leaks, takes 224 Pentium III cycles (14 cycles/byte) to encrypt a 16-byte
        block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One
        can reasonably extrapolate that similar software would take over 300 Pentium III cycles
        (over 18 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the
        key was pre-expanded into 240 bytes.

     Intel Pentium 4 f12
        Implementation: salsa20_word_p4 software takes 48 Pentium 4 f12 (Willamette) cycles for
        each Salsa20 round, totalling 960 cycles (15 cycles/byte) for 20 rounds, totalling 1052
        cycles (16.44 cycles/byte) for the Salsa20 hash function, timed as 1136 cycles with 84
        cycles timing overhead. The compiled code for the Salsa20 hash function occupies 1144
        bytes. Its main loop occupies 629 bytes and handles 4 rounds.
        Comparison to AES timings: Osvik reports that unpublished software, with no protection
        against timing leaks, takes 260 Pentium 4 (f12?) cycles (16.25 cycles/byte) to encrypt a
        16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176
        bytes. Matsui and Fukuda report that unpublished software, with no protection against
        timing leaks, takes 251 Pentium 4 (f29?) cycles (15.68 cycles/byte) and 284 Pentium 4 f33
        cycles (17.75 cycles/byte). One can reasonably extrapolate that similar software would take
        over 340 Pentium 4 f12 cycles (over 21 cycles/byte) to encrypt a 16-byte block with a
        32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

     Intel Pentium M
        Implementation: salsa20_word_pm software takes 33.75 Pentium M cycles for each Salsa20
        round, totalling 675 cycles (10.55 cycles/byte) for 20 rounds, totalling 740 cycles (11.57
        cycles/byte) for the Salsa20 hash function, timed as 790 cycles with 50 cycles timing
        overhead. (The timings are actually 780 or 781 cycles most of the time but 856 cycles on
        every eighth call, presumably because of branch mispredictions.) The compiled code for the
        Salsa20 hash function occupies 1248 bytes. Its main loop occupies 937 bytes and handles 4
        rounds.
        Comparison to AES timings: The Pentium M might compute AES in marginally less time than the
        Pentium III, but both CPUs face the same basic AES bottleneck: encrypting a 16-byte block
        with a 16-byte AES key requires 200 S-box lookups, which cannot take fewer than 200 cycles
        (12.5 cycles/byte). Similarly, encrypting a 16-byte block with a 32-byte AES key requires
        280 S-box lookups, which cannot take fewer than 280 cycles (17.5 cycles/byte). Even more
        S-box lookups are required if keys are not pre-expanded.

     Motorola PowerPC 7410
        Implementation: salsa20_word_macos software takes 24.5 PowerPC 7410 cycles for each Salsa20
        round, totalling 490 cycles (7.66 cycles/byte) for 20 rounds, totalling approximately 570
        cycles (8.91 cycles/byte) for the Salsa20 hash function, timed as approximately 584 cycles
        with 14 cycles timing overhead. (Precise timings are difficult: the CPU's cycle counter has
        16-cycle resolution.) The compiled code for the Salsa20 hash function occupies 768 bytes.
        Its main loop occupies 392 bytes and handles 2 rounds.
        Comparison to AES timings: Lipmaa reports that AES software by Ahrens, presumably with no
        protection against timing leaks, takes 401 PowerPC 7400 cycles (over 25 cycles/byte) to
        encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into
        176 bytes. The author is not aware of any relevant differences between the PowerPC 7400 and
        the PowerPC 7410. It should be possible to do somewhat better (the author's own
        public-domain AES software, including key expansion, takes about 490 cycles on the PowerPC
        7410), but AES is clearly much slower than Salsa20 on this CPU.

     Sun UltraSPARC II
        Implementation: salsa20_word_sparc software takes 40.5 UltraSPARC II cycles for each
        Salsa20 round, totalling 810 cycles (12.66 cycles/byte) for 20 rounds, totalling 881 cycles
        (13.77 cycles/byte) for the Salsa20 hash function, timed as 892 cycles with 11 cycles
        timing overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its
        main loop occupies 652 bytes and handles 2 rounds.
        Comparison to AES timings: Lipmaa reports that unpublished software, presumably with no
        protection against timing leaks, takes 270 UltraSPARC II cycles (over 16 cycles/byte) to
        encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into
        176 bytes. One can reasonably extrapolate that similar software would take over 370
        UltraSPARC II cycles (over 23 cycles/byte) to encrypt a 16-byte block with a 32-byte AES
        key, assuming that the key was pre-expanded into 240 bytes.

     Sun UltraSPARC III
        Implementation: salsa20_word_sparc software takes 41 UltraSPARC III cycles for each Salsa20
        round, totalling 820 cycles (12.82 cycles/byte) for 20 rounds, totalling 889 cycles (13.90
        cycles/byte) for the Salsa20 hash function, timed as 905 cycles with 16 cycles timing
        overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its main loop
        occupies 652 bytes and handles 2 rounds.
        Comparison to AES timings: AES on an UltraSPARC III is at least as slow as AES on an
        UltraSPARC II.
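
     The cycles/byte figures in these comparisons are cycle counts divided by the number of bytes
     produced per call: 64 for a Salsa20 block, 16 for an AES block. The following is a minimal
     sketch of that arithmetic using the AMD Athlon and Pentium 4 numbers quoted above; the helper
     name cycles_per_byte is hypothetical.

```python
SALSA20_BLOCK_BYTES = 64
AES_BLOCK_BYTES = 16
ROUNDS = 20


def cycles_per_byte(cycles, block_bytes):
    """Cost of producing one byte, given the cost of one block."""
    return cycles / block_bytes


# AMD Athlon figures quoted in the text.
athlon_round_cycles = 29.25
athlon_rounds_total = athlon_round_cycles * ROUNDS   # cost of the 20 rounds alone
athlon_hash_cycles = 680 - 35                        # timed value minus timing overhead

print(athlon_rounds_total)
print(round(cycles_per_byte(athlon_hash_cycles, SALSA20_BLOCK_BYTES), 2))
print(round(cycles_per_byte(225, AES_BLOCK_BYTES), 2))   # Osvik's AES figure on the Athlon
print(cycles_per_byte(960, SALSA20_BLOCK_BYTES))         # Pentium 4 f12, 20 rounds
```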



                                        Salsa20 specification

     Defining Functions
     The quarterround function
        Inputs and outputs: If y is a 4-word sequence then quarterround(y) is a 4-word sequence.

     The rowround function
        Inputs and outputs: If y is a 16-word sequence then rowround(y) is a 16-word sequence.

     The columnround function
        Inputs and outputs: If x is a 16-word sequence then columnround(x) is a 16-word sequence.

     The doubleround function
        Inputs and outputs: If x is a 16-word sequence then doubleround(x) is a 16-word sequence.
        Definition: A double round is a column round followed by a row round:
        doubleround(x) = rowround(columnround(x)).

     The littleendian function
        Inputs and outputs: If b is a 4-byte sequence then littleendian(b) is a word.
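
     Of the functions listed above, only doubleround carries a definition in this copy of the
     document. The sketch below fills in the others from the published Salsa20 specification, which
     is an assumption relative to this text: the rotation distances 7, 9, 13, 18 and the word
     orderings in ROWS and COLUMNS do not appear here. The helper _apply is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, c):
    """Rotate the 32-bit word x left by c bits."""
    return ((x << c) & MASK32) | (x >> (32 - c))


def quarterround(y0, y1, y2, y3):
    """Map a 4-word sequence to a 4-word sequence."""
    z1 = y1 ^ rotl32((y0 + y3) & MASK32, 7)
    z2 = y2 ^ rotl32((z1 + y0) & MASK32, 9)
    z3 = y3 ^ rotl32((z2 + z1) & MASK32, 13)
    z0 = y0 ^ rotl32((z3 + z2) & MASK32, 18)
    return z0, z1, z2, z3


# Each tuple lists the word indices fed to one quarter-round, diagonal word first.
ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]
COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]


def _apply(words, groups):
    """Run one quarter-round per index group; the groups touch disjoint words."""
    out = list(words)
    for a, b, c, d in groups:
        out[a], out[b], out[c], out[d] = quarterround(words[a], words[b], words[c], words[d])
    return out


def rowround(y):
    return _apply(y, ROWS)


def columnround(x):
    return _apply(x, COLUMNS)


def doubleround(x):
    """A column round followed by a row round."""
    return rowround(columnround(x))


def littleendian(b):
    """Interpret a 4-byte sequence as a word, least significant byte first."""
    b0, b1, b2, b3 = b
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


print([hex(w) for w in quarterround(1, 0, 0, 0)])
print(hex(littleendian(bytes([86, 75, 30, 9]))))
```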



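      Specifications
  Functions          Inputs & Outputs                                       Definition
The Salsa20        If x is a 64-byte        Salsa20(x) = x + doubleround^10(x), where each 4-byte sequence is
hash function      sequence then            viewed as a word in little-endian form and + is word-wise addition
                   Salsa20(x) is a 64-      modulo 2^32.
                   byte sequence.
The Salsa20        If k is a 32-byte or     For a 32-byte key k = (k0, k1), with k0 and k1 of 16 bytes each:
expansion          16-byte sequence             Salsa20_k(n) = Salsa20(s0, k0, s1, n, s2, k1, s3),
function           and n is a 16-byte       where s0, s1, s2, s3 are the four 4-byte pieces of the ASCII string
                   sequence then            "expand 32-byte k".
                   Salsa20_k(n)             For a 16-byte key k:
                   is a 64-byte                 Salsa20_k(n) = Salsa20(t0, k, t1, n, t2, k, t3),
                   sequence.                where t0, t1, t2, t3 are the four 4-byte pieces of the ASCII string
                                            "expand 16-byte k".
The Salsa20        Let k be a 32-byte       Salsa20_k(v) is the keystream formed by the 64-byte blocks
encryption         or 16-byte               Salsa20_k(v, 0), Salsa20_k(v, 1), Salsa20_k(v, 2), ..., where the
function           sequence. Let v be       block number is written as an 8-byte little-endian sequence and
                   an 8-byte                appended to v to form the 16-byte input n. The keystream is
                   sequence. Let m          truncated to l bytes and combined with m by bytewise XOR.
                   be an l-byte
                   sequence for some
                   l ∈ {1, 2, …, 2^70}. The
                   Salsa20 encryption
                   of m
                   with nonce v
                   under key k,
                   denoted
                   Salsa20_k(v) XOR m, is
                   an l-byte
                   sequence.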
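
The hash, expansion and encryption functions in the table above can be sketched as follows. This is a minimal illustration based on the Salsa20 specification [1], not a reference implementation: the function names salsa20_hash, salsa20_expand and salsa20_encrypt are chosen for this example, the two rounds are folded into a single doubleround helper for brevity, and Python gives no constant-time guarantee.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = b"expand 32-byte k"  # diagonal constants for 32-byte keys
TAU = b"expand 16-byte k"    # diagonal constants for 16-byte keys


def rotl(w, c):
    return ((w << c) & MASK) | (w >> (32 - c))


def quarterround(y0, y1, y2, y3):
    z1 = y1 ^ rotl((y0 + y3) & MASK, 7)
    z2 = y2 ^ rotl((z1 + y0) & MASK, 9)
    z3 = y3 ^ rotl((z2 + z1) & MASK, 13)
    z0 = y0 ^ rotl((z3 + z2) & MASK, 18)
    return z0, z1, z2, z3


def doubleround(x):
    x = list(x)
    groups = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),   # column round
              (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))  # row round
    for a, b, c, d in groups:
        x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    return x


def salsa20_hash(x):
    """Salsa20(x) = x + doubleround^10(x) on sixteen little-endian words."""
    if len(x) != 64:
        raise ValueError("input must be 64 bytes")
    words = struct.unpack("<16I", x)
    z = list(words)
    for _ in range(10):
        z = doubleround(z)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(words, z)))


def salsa20_expand(k, n):
    """Expand a 32-byte or 16-byte key k and a 16-byte input n to 64 bytes."""
    if len(n) != 16:
        raise ValueError("n must be 16 bytes")
    if len(k) == 32:
        c, k0, k1 = SIGMA, k[:16], k[16:]
    elif len(k) == 16:
        c, k0, k1 = TAU, k, k
    else:
        raise ValueError("key must be 16 or 32 bytes")
    return salsa20_hash(c[0:4] + k0 + c[4:8] + n + c[8:12] + k1 + c[12:16])


def salsa20_encrypt(k, v, m):
    """XOR m with the keystream Salsa20_k(v, 0), Salsa20_k(v, 1), ..."""
    if len(v) != 8:
        raise ValueError("nonce must be 8 bytes")
    out = bytearray()
    for block_no in range((len(m) + 63) // 64):
        stream = salsa20_expand(k, v + struct.pack("<Q", block_no))
        chunk = m[64 * block_no: 64 * block_no + 64]
        out.extend(a ^ b for a, b in zip(chunk, stream))
    return bytes(out)
```

Because encryption is a XOR with the keystream, applying salsa20_encrypt a second time with the same key and nonce recovers the message, so no separate decryption routine is needed.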



                             Security and cryptanalysis of Salsa20
      Side-channel attacks
      Natural Salsa20 implementations take constant time on a huge variety of CPUs; here constant means
      input-independent. There is no incentive for the authors of Salsa20 software to use variable-time
      operations such as S-box lookups. Timing attacks against Salsa20 are therefore just as difficult as pure
      cryptanalysis of the Salsa20 outputs. The operations in Salsa20 are also among the easiest to protect
      against power attacks and other side-channel attacks.

      Notes on the uniform randomness and diagonal constants

          †   Salsa20 column round: Each Salsa20 column round affects each column in the same way
              starting from the diagonal. Each Salsa20 row round affects each row in the same way
              starting from the diagonal. Consequently, shifting the entire Salsa20 hash-function input
              array along the diagonal has exactly the same effect on the output.
          †   Salsa20 expansion function:
                  o   Eliminates this shift structure by limiting the attacker's control over the hash-
                      function input. In particular, the input diagonal is always 0x61707865, 0x3320646e,
                      0x79622d32, 0x6b206574, which is different from all its nontrivial shifts. In other
                      words, two distinct arrays with this diagonal are always in distinct orbits under the
                      shift group.
                  o   Eliminates this rotation structure. The input diagonal is different from all its
                      nontrivial shifts and all its nontrivial rotations and all nontrivial shifts of its nontrivial
                      rotations. In other words, two distinct arrays with this diagonal are always in distinct
                      orbits under the shift/rotate group.
    †   Salsa20 hash function: Operations are almost compatible with rotation of each input
        word by, say, 10 bits. Rotation changes the effect of carries that cross the rotation boundary,
        but it is consistent with all other carries, and with the Salsa20 operations other than
        addition.
    †   Attacks based on non-randomness: Simon Fischer, Willi Meier, Côme Berbain, Jean-
        François Biasse and M. J. B. Robshaw start from the principle that stream cipher
        initialisation should ensure that the initial state or keystream is not detectably related to the
        key and initialisation vector. Their paper analyses the key/IV setup of the eSTREAM
        Phase 2 candidates Salsa20 and TSC-4. For Salsa20 they demonstrate a key recovery
        attack on six rounds and observe non-randomness after seven. For TSC-4, non-randomness
        is detected over the full eight-round initialisation phase and would also persist for more
        rounds.
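
The shift structure described in these notes can be checked numerically. The sketch below is an illustration only: shift_diagonal and diagonal_symmetries are names invented for this example. It confirms on a random array that a double round commutes with a shift along the diagonal, and that the diagonal constants quoted above are fixed by no nontrivial combination of a shift and a word rotation.

```python
import random

MASK = 0xFFFFFFFF
DIAGONAL = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def rotl(w, c):
    return ((w << c) & MASK) | (w >> (32 - c))


def quarterround(y0, y1, y2, y3):
    z1 = y1 ^ rotl((y0 + y3) & MASK, 7)
    z2 = y2 ^ rotl((z1 + y0) & MASK, 9)
    z3 = y3 ^ rotl((z2 + z1) & MASK, 13)
    z0 = y0 ^ rotl((z3 + z2) & MASK, 18)
    return z0, z1, z2, z3


def doubleround(x):
    x = list(x)
    groups = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),   # column round
              (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))  # row round
    for a, b, c, d in groups:
        x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    return x


def shift_diagonal(x):
    """Move the word at row i, column j to row i+1, column j+1 (indices mod 4)."""
    out = [0] * 16
    for i in range(4):
        for j in range(4):
            out[4 * ((i + 1) % 4) + (j + 1) % 4] = x[4 * i + j]
    return out


def diagonal_symmetries(diag):
    """List every (shift, rotation) pair that maps the diagonal onto itself."""
    found = []
    for s in range(4):
        for r in range(32):
            moved = tuple(rotl(diag[(i - s) % 4], r) for i in range(4))
            if moved == tuple(diag):
                found.append((s, r))
    return found


rng = random.Random(0)
x = [rng.getrandbits(32) for _ in range(16)]

# Shifting the input along the diagonal shifts the output in the same way.
commutes = doubleround(shift_diagonal(x)) == shift_diagonal(doubleround(x))

# Only the trivial pair (no shift, no rotation) fixes the Salsa20 diagonal.
symmetries = diagonal_symmetries(DIAGONAL)
```

An all-zero diagonal, by contrast, is fixed by every one of the 4 x 32 shift/rotation pairs, which is the kind of structure the fixed constants are chosen to rule out.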
Differential Cryptanalysis of Salsa20/8
The idea of a differential attack is that some “small” differences in input states have a perceptible
chance of producing “small” differences after the first step of the computation, the second step of
the computation, etc.
                      AES                                                   Salsa20
AES is a cipher whose input size is as large as      Salsa20 is quite different in this respect. It has
its state size. It has 16-byte inputs, 16-byte       16-byte inputs, 64-byte outputs, and 32-byte keys;
outputs, and (at least) 16-byte keys; there are      there are 2^512 choices of (n, n', k), so there is
2^384 choices of (n, n', k), so presumably there     no a-priori reason to believe that any of the
are a great many choices in which both 128-bit       choices make both the 128-bit input difference
quantities, the difference between the inputs n      between n and n' and the 512-bit output difference
and n' and the difference between the outputs        between Salsa20_k(n) and Salsa20_k(n') "small".
AES_k(n) and AES_k(n'), are "small".

Here n and n' denote two inputs and k denotes the key.

    †   Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima
        published a paper which presents a cryptanalysis of the Salsa20 stream cipher proposed in
        2005. Salsa20 was submitted to eSTREAM, the ECRYPT Stream Cipher Project. The cipher
        uses bitwise XOR, addition modulo 2^32, and constant-distance rotation operations on an
        internal state of 16 32-bit words.

    †   The paper reports a significant bias in the differential probability for the internal state
        of Salsa20 after the 4th round. It further shows that, using this bias, it is possible to break
        Salsa20 reduced to 8 rounds with a 256-bit secret key at a lower computational complexity than
        an exhaustive key search. The cryptanalysis method exploits characteristics of addition, and
        succeeds in reducing the computational complexity compared to previous methods.
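
The idea of a "small" input difference surviving the first steps of the computation can be explored empirically. The sketch below is not any of the published attacks: it flips one input bit, applies the double round repeatedly to both states, and records how many of the 512 state bits differ after each step. The function name difference_weights and the choice of word 6, bit 0 are assumptions made for this example.

```python
import random

MASK = 0xFFFFFFFF


def rotl(w, c):
    return ((w << c) & MASK) | (w >> (32 - c))


def quarterround(y0, y1, y2, y3):
    z1 = y1 ^ rotl((y0 + y3) & MASK, 7)
    z2 = y2 ^ rotl((z1 + y0) & MASK, 9)
    z3 = y3 ^ rotl((z2 + z1) & MASK, 13)
    z0 = y0 ^ rotl((z3 + z2) & MASK, 18)
    return z0, z1, z2, z3


def doubleround(x):
    x = list(x)
    groups = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),   # column round
              (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))  # row round
    for a, b, c, d in groups:
        x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    return x


def hamming(a, b):
    """Number of bit positions in which two 16-word states differ."""
    return sum(bin(u ^ v).count("1") for u, v in zip(a, b))


def difference_weights(x, word, bit, doublerounds):
    """Track the XOR-difference weight of two states that start one bit apart."""
    a = list(x)
    b = list(x)
    b[word] ^= 1 << bit
    weights = [hamming(a, b)]
    for _ in range(doublerounds):
        a, b = doubleround(a), doubleround(b)
        weights.append(hamming(a, b))
    return weights


rng = random.Random(1)
state = [rng.getrandbits(32) for _ in range(16)]
weights = difference_weights(state, word=6, bit=0, doublerounds=4)
print(weights)  # weight after 0, 2, 4, 6 and 8 rounds
```

A differential attack looks for input differences and intermediate bit positions where this spreading is measurably biased rather than uniform; a single run like this one only gives a feel for how quickly differences grow.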



Truncated differential cryptanalysis of five rounds of Salsa20


In a separate paper, "Truncated differential cryptanalysis of five rounds of Salsa20", Paul Crowley
presents an attack on Salsa20 reduced to five of its twenty rounds. This attack uses many clusters
of truncated differentials and requires 2^165 work and 2^6 plaintexts.

This conclusion leaves some open questions.



It is clear that a naive attack of this type cannot be extended to more than a handful of rounds; this
has no negative implications for the security of the full Salsa20- 32/20 presented to eSTREAM.

Nonetheless, the degree of clustering exhibited by these differential characteristics is surprising; it is
more usual for a single differential trail to dominate. It is also striking to find differential trails whose
overall probability is so greatly mispredicted by the products of the probabilities of its components,
marking a violation of the independence assumption usual in differential cryptanalysis. In both
instances, it would bear investigation whether other ciphers that rely heavily on addition mod 2^n to
introduce nonlinearity in GF(2) would also show these properties in differential cryptanalysis, or
related properties in other forms of cryptanalysis.



Algebraic attacks

General-purpose equation-solving methods, notably Buchberger's algorithm for computing Groebner
bases, are remarkably powerful. Clegg, Edmonds, and Impagliazzo proved, for a comparable
problem (finding proofs in propositional logic), that a Groebner-basis computation can quickly
solve any problem that can be quickly solved by various ad-hoc proof-finding techniques.

Even better, the Groebner-basis computation can quickly solve other problems that cannot be quickly
solved by the ad-hoc techniques. It would be interesting to see analogous theorems regarding various
ad-hoc cryptanalytic techniques. Fortunately, there does not seem to exist any “small” set of
equations for the state bits in Salsa20. Each of the 320 32-bit additions in the Salsa20 computation
requires dozens of quadratic equations, producing a substantially larger system of equations than are
required to describe, for example, the bits in AES. Groebner-basis techniques for solving the AES-bit
equations are, by the most optimistic estimates, slightly faster than brute-force search for a 256-bit
key, but they use vastly more memory and thus have a much worse price-performance ratio.
Algebraic attacks against Salsa20 appear to be even more difficult.


Other notions of security

        Attacks                                                    Explanation
Weak-key attacks           This type of attack seems highly implausible for Salsa20. The Salsa20 key is
                           mangled along with the input in an extremely complicated way. Any key differences
                           rapidly spread through the entire Salsa20 state for the same reason that input
                           differences do.
Equivalent-key attacks     This type of attack, like a weak-key attack, seems highly implausible for
                           Salsa20, as such an attack would violate the Salsa20 security conjecture.
                           In other words, there is no need to make a separate conjecture regarding
                           equivalent keys.
Related-key attacks        The standard solutions to all the standard cryptographic problems (encryption,
                           authentication, etc.) are protocols that do not allow related-key attacks on the
                           underlying primitives. No such attack has been reported to date.
Key-recovery attacks       At FSE 2008, Aumasson et al. improved on earlier attacks on Salsa20/7 and presented
                           the first key-recovery attack on Salsa20/8. It is a differential attack based on a
                           technique called probabilistic neutral bits.
                           For Rumba, the same authors identify collision and preimage attacks on two
                           simplified variants, discuss differential attacks on the original version, and
                           exploit a high-probability differential to reduce the complexity of collision
                           search from 2^256 to 2^79 for 3-round Rumba.




Alternative Proposals

Extending the Salsa20 nonce

Daniel J. Bernstein, the creator of Salsa20, published another paper, "Extending the Salsa20
nonce", which introduces the XSalsa20 stream cipher. XSalsa20 is based upon the Salsa20 stream
cipher but has a much longer nonce: 192 bits instead of 64 bits. XSalsa20 has exactly the same
streaming speed as Salsa20, and its extra nonce-setup cost is slightly smaller than the cost of
generating one block of Salsa20 output. The paper proves that XSalsa20 is secure if Salsa20 is
secure: any fast attack on XSalsa20 using q queries and succeeding with probability p can be
converted into a fast attack on Salsa20 succeeding with probability at least p/(q + 1).

The paper introduces a new family of stream ciphers, XSalsa20. XSalsa20 is, at first glance, quite
similar to Salsa20: it is built from exactly the same operations, has exactly the same protections
against side-channel attacks, has exactly the same streaming speed, supports 256-bit keys, and
allows reduced-round variants such as XSalsa20/12. Note that the speed reports above are for
full-round Salsa20/20, not Salsa20/12. The advantage of XSalsa20 over Salsa20 is a longer nonce:
192 bits rather than 64 bits. The disadvantage is that nonce setup is less efficient, but the
extra cost here is comparable to, and in fact slightly smaller than, the cost of generating a single
Salsa20 output block.

XSalsa20 might at first appear to be an ad-hoc design, following standard principles but potentially
vulnerable to new attacks. On the contrary! The paper proves that any fast successful attack on
XSalsa20 can be converted into a fast successful attack on Salsa20. Confidence in the security of
Salsa20 therefore implies confidence in the security of XSalsa20.

The paper is not meant to take a position in the dispute regarding the necessity of longer nonces. The
paper does not claim any benefits for XSalsa20 in an application that already works with Salsa20's
64-bit nonces. What the paper shows is that, in case an application does want longer nonces, the
Salsa20 nonce can be safely extended at surprisingly low cost.
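
The text above does not spell out how the longer nonce is handled. The sketch below assumes the two-step structure described in Bernstein's paper: a function usually called HSalsa20 derives a 32-byte subkey from the key and the first 16 bytes of the nonce, and ordinary Salsa20 is then run under that subkey with the remaining 8 nonce bytes. The function names and the exact word selection in hsalsa20 are assumptions of this sketch and should be checked against the paper before any real use.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = b"expand 32-byte k"


def rotl(w, c):
    return ((w << c) & MASK) | (w >> (32 - c))


def quarterround(y0, y1, y2, y3):
    z1 = y1 ^ rotl((y0 + y3) & MASK, 7)
    z2 = y2 ^ rotl((z1 + y0) & MASK, 9)
    z3 = y3 ^ rotl((z2 + z1) & MASK, 13)
    z0 = y0 ^ rotl((z3 + z2) & MASK, 18)
    return z0, z1, z2, z3


def twenty_rounds(words):
    """Apply ten double rounds (twenty rounds) without the final addition."""
    x = list(words)
    groups = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),   # column round
              (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))  # row round
    for _ in range(10):
        for a, b, c, d in groups:
            x[a], x[b], x[c], x[d] = quarterround(x[a], x[b], x[c], x[d])
    return x


def initial_words(key, n16):
    """Lay out constants, a 32-byte key and a 16-byte input as sixteen words."""
    block = SIGMA[0:4] + key[:16] + SIGMA[4:8] + n16 + SIGMA[8:12] + key[16:] + SIGMA[12:16]
    return struct.unpack("<16I", block)


def hsalsa20(key, n16):
    """Assumed subkey derivation: keep eight state words, skip the final addition."""
    z = twenty_rounds(initial_words(key, n16))
    return struct.pack("<8I", z[0], z[5], z[10], z[15], z[6], z[7], z[8], z[9])


def salsa20_block(key, nonce8, counter):
    """One 64-byte Salsa20 keystream block for a 32-byte key."""
    words = initial_words(key, nonce8 + struct.pack("<Q", counter))
    z = twenty_rounds(words)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(words, z)))


def xsalsa20_xor(key, nonce24, m):
    """Encrypt or decrypt m under a 32-byte key and a 24-byte (192-bit) nonce."""
    if len(key) != 32 or len(nonce24) != 24:
        raise ValueError("need a 32-byte key and a 24-byte nonce")
    subkey = hsalsa20(key, nonce24[:16])
    out = bytearray()
    for i in range((len(m) + 63) // 64):
        stream = salsa20_block(subkey, nonce24[16:], i)
        out.extend(a ^ b for a, b in zip(m[64 * i: 64 * i + 64], stream))
    return bytes(out)
```

Under this structure the only extra work compared with Salsa20 is the single hsalsa20 call per nonce, which is consistent with the claim that nonce setup costs slightly less than generating one Salsa20 output block.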



References
    1. Daniel J. Bernstein, Salsa20 - Design, Specification, Security and Speed. URL:
       http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3.zip
    2. Paul Crowley, Truncated differential cryptanalysis of five rounds of Salsa20. URL:
       http://eprint.iacr.org/2005/375
    3. Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima,
       Differential Cryptanalysis of Salsa20/8. URL:
       http://sasc.crypto.rub.de/files/sasc2007_039.pdf
    4. Jean-Philippe Aumasson, Simon Fischer, Shahram Khazaei, Willi Meier and Christian
       Rechberger, New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba. URL:
       http://www.springerlink.com/content/j35241j881018085/
    5. Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse and M. J. B. Robshaw,
       Non-randomness in eSTREAM Candidates Salsa20 and TSC-4. URL:
       http://www.springerlink.com/content/46wv58h040218wp4/
    6. Daniel J. Bernstein, Extending the Salsa20 nonce. URL:
       http://cr.yp.to/snuffle/xsalsa-20110204.pdf
    7. Robshaw, Matthew; Billet, Olivier (Eds.), New Stream Cipher Designs. URL:
       http://www.springer.com/computer/security+and+cryptology/book/978-3-540-68350-6



  • 4. timing attacks") is erroneous. small CPUs -is faster than a comparably complex series of integer operations. Rotations Argument Counter-argument [Rotations account for about The basic argument for rotations is that one one third of the integer xor of a rotated quantity provides as much operations in Salsa20, and diffusion as two xors of shifted quantities. more on the UltraSPARC. Replacing some of the rotations with a comparable number of additions might achieve comparable di usion in fewer rounds.] Encryption Topic Explanations Different encryption and The popularity of CBC appears to be a historical accident. I have found very decryption few people arguing for CBC over counter mode, and none of the arguments are even marginally convincing. On occasion I encounter the superstitious notion that encryption by xor is too simple"; but a one-time pad (in conjunction with aWegman-Carter MAC) provably achieves perfect secrecy (and any desired level of integrity), so there is obviously nothing wrong with xor. There are several clear arguments against CBC. One disadvantage of CBC is that it requires different code for encryption and decryption, increasing costs in many contexts. Another disadvantage of CBC is that the extra communication from the cryptanalyst into the cipher state is a security threat; regaining the original level of confidence means adding rounds, taking additional time. Stream’s dependency over Argument Counter-argument plaintext  The basic argument for incorporating  One counterargument is that plaintext into the stream (specically, “free" is a wild exaggeration. incorporating plaintext bytes into Incorporating the plaintext into subsequent bytes of the stream) is that the stream takes time for every this allows message authentication for block, and generating an free." After encrypting the plaintext, authenticator takes time for one generate a constant number of every message. 
additional stream bytes and output  Incorporation of plaintext, them as an authenticator of the being extra communication plaintext. from the cryptanalyst into the . cipher state, is a security threat. Regaining the original level of condence means adding rounds, which takes additional time for every block. State Argument Counter-argument  The argument for a larger state is that  A larger state loses time in one does not need as many cipher some contexts. Reuse forces rounds to achieve the same serialization: one cannot take conjectured security level. Copying advantage of extra hardware to 3|P a ge
  • 5. state across blocks seems to provide reduce the latency of just as much mixing as the rst few encrypting or decrypting long cipher rounds. A larger messages. Furthermore, large state therefore saves some time after states reduce the number of the first block.. messages that can be processed simultaneously on limited hardware. Block Size Argument Counter-argument  The basic argument for a larger block  A larger block size also loses size, say 256 bytes, one does not need time. On most CPUs, the as many cipher rounds to achieve the communication cost of same conjectured security level. Using sweeping through a 256-byte a larger block size, like copying state block is a bottleneck; CPUs are across blocks, seems to provide just as designed for computations that much mixing as the rst few cipher don't involve so much rounds. A larger state therefore saves data. time. Hashing Topic Explanations Implementation of Block Argument Counter-argument cipher  The basic argument for a block cipher  The basic disadvantage of a for keeping the k words independent block cipher is that the k words of the n words is that, for fixed k, it is consume valuable easy to make a block cipher be an communication resources. A invertible function of n. But this 64-byte block cipher with a 32- feature seems to be of purely historical byte key would need interest. Invertibility is certainly not to repeatedly sweep through 96 necessary for encryption. bytes of memory (plus a few bytes of temporary storage) for its 64 bytes of output; in contrast, Salsa20 repeatedly sweeps through just 64 bytes of memory (plus a few bytes of temporary storage) for its 64 bytes of output. Code-Length Argument Counter-argument  Using two different kinds of rounds is  The basic counterargument is the idea that attacks will have some that extra code is expensive in extra difficulty passing through the many contexts. It increases switch from one kind to another. 
This pressure on a CPU's L1 cache, extra difficulty would allow the cipher for example, and it increases to reach the same security level with the minimum size of a fewer rounds.. hardware implementation. Diffusion among words  Salsa20 views its 16 words as a 4 4 array. During the rst round, there is no communication between columns; each column has its own chain of 12 serial 4|P a ge
  • 6. operations modifying the words in that column. During the second round, there is no communication between rows; each row has its own chain of 12 serial operations modifying the words in that row. Et cetera.  There are pairs (i; j) such that a change in word i has no opportunity to affect word j until the third round. A different communication structure would allow much faster diffusion of changes through all 16 words. On the other hand, it doesn't appear to be possible to achieve much faster diffusion of changes through all 512 bits. Modifications other than add- There are many plausible ways to modify each word in a column using other words rotate-xor in the same column. The author settled on xor a rotated sum" as bouncing back and forth between incompatible structures on the critical path. The author chose xor a rotated sum" over add a rotated xor" for simple performance reasons: the x86 architecture has a three-operand addition (LEA) but not a three-operand xor. Benchmarking Salsa20 The Salsa20 structure Encryption of a 64-byte block is xor with the output of the Salsa20 hash function, where the input consists of the 32-byte Salsa20 key, the 8-byte nonce (unique message number), the 8-byte block counter, and 16 constant bytes. The reader is cautioned that encryption time is slightly longer than hashing time: in particular, a 64-byte xor is not free. The Salsa20 hash function regards its 64-byte input x as an array of 16 words in little-endian form. It performs 320 invertible modfications, where each modfication changes one word of the array. The resulting words are added to the original words, producing, in little-endian form, the 64-byte output Salsa20(x). Each modifiation involves xoring into one word a rotated version of the sum of two other words. Thus the 320 modifiations involve, overall, 320 additions, 320 rotations, and 320 xors. The rotations are all by constant distances. 
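The single word modification described above can be sketched in a few lines. This is a minimal illustration, not the author's reference code: the helper names `rotl32` and `modify`, the word indices, and the rotation distance 7 are assumptions chosen for the example. It also shows why each modification is invertible: the two summed words are left untouched, so applying the same xor a second time restores the original word.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(value, distance):
    """Rotate a 32-bit word left by a constant distance (1..31)."""
    return ((value << distance) & MASK32) | (value >> (32 - distance))


def modify(words, target, a, b, distance):
    """Xor into words[target] a rotated version of words[a] + words[b].

    target must differ from a and b, otherwise the step is not invertible.
    """
    total = (words[a] + words[b]) & MASK32  # addition modulo 2^32
    words[target] ^= rotl32(total, distance)


# Hypothetical 4-word example; indices and distance are illustrative only.
state = [0x00000001, 0x00000000, 0x00000000, 0x00000000]

modify(state, target=1, a=0, b=3, distance=7)
after_once = list(state)

# Repeating the identical step undoes it, because xor is its own inverse
# and words[a], words[b] were not changed by the first application.
modify(state, target=1, a=0, b=3, distance=7)
after_twice = list(state)
```

One addition, one rotation and one xor per modification is what gives the overall count of 320 additions, 320 rotations and 320 xors.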
The entire series of modifications is a series of 10 identical double-rounds. Each double-round is a series of 2 rounds. Each round is a set of 4 parallel quarter-rounds. Each quarter-round is a series of 4 word modifications. This accounts for the total of 10 x 2 x 4 x 4 = 320 modifications.

Salsa20 on different Platforms

Platform: AMD Athlon
Implementation: salsa20_word_pm software takes 29.25 Athlon cycles for a Salsa20 round, totalling 585 cycles (9.15 cycles/byte) for 20 rounds and 645 cycles (10.08 cycles/byte) for the Salsa20 hash function, timed as 680 cycles with 35 cycles timing overhead. The timings are actually 655 or 656 cycles most of the time but 849 cycles on every eighth call, presumably because of branch mispredictions. The compiled code occupies 1248 bytes. Its main loop occupies 937 bytes and handles 4 rounds.
Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 225 Athlon cycles (over 14 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 300 Athlon cycles (over 18 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Platform: IBM PowerPC RS64 IV
Implementation: salsa20_word_aix software takes 33 PowerPC RS64 IV cycles for each Salsa20 round, totalling 660 cycles (10.32 cycles/byte) for 20 rounds and 756 cycles (11.82 cycles/byte) for the Salsa20 hash function, timed as 770 cycles with 14 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 768 bytes. Its main loop occupies 392 bytes and handles 2 rounds.

Platform: Intel Pentium III
Implementation: salsa20_word_pii software takes 37.5 Pentium III cycles for each Salsa20 round, totalling 750 cycles (11.72 cycles/byte) for 20 rounds and 837 cycles (13.08 cycles/byte) for the Salsa20 hash function, timed as 872 cycles with 35 cycles timing overhead. (The timings are actually 859 cycles most of the time but 908 cycles on every fourth call, presumably because of branch mispredictions.) The compiled code for the Salsa20 hash function occupies 1280 bytes. Its main loop occupies 937 bytes and handles 4 rounds.
Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 224 Pentium III cycles (14 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 300 Pentium III cycles (over 18 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Platform: Intel Pentium 4 f12
Implementation: salsa20_word_p4 software takes 48 Pentium 4 f12 (Willamette) cycles for each Salsa20 round, totalling 960 cycles (15 cycles/byte) for 20 rounds and 1052 cycles (16.44 cycles/byte) for the Salsa20 hash function, timed as 1136 cycles with 84 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 1144 bytes. Its main loop occupies 629 bytes and handles 4 rounds.
Comparison to AES timings: Osvik reports that unpublished software (with no protection against timing leaks) takes 260 Pentium 4 (f12?) cycles (16.25 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. Matsui and Fukuda report that unpublished software (with no protection against timing leaks) takes 251 Pentium 4 (f29?) cycles (15.68 cycles/byte) and 284 Pentium 4 f33 cycles (17.75 cycles/byte). One can reasonably extrapolate that similar software would take over 340 Pentium 4 f12 cycles (over 21 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Platform: Intel Pentium M
Implementation: salsa20_word_pm software takes 33.75 Pentium M cycles for each Salsa20 round, totalling 675 cycles (10.55 cycles/byte) for 20 rounds and 740 cycles (11.57 cycles/byte) for the Salsa20 hash function, timed as 790 cycles with 50 cycles timing overhead. (The timings are actually 780 or 781 cycles most of the time but 856 cycles on every eighth call, presumably because of branch mispredictions.) The compiled code for the Salsa20 hash function occupies 1248 bytes. Its main loop occupies 937 bytes and handles 4 rounds.
Comparison to AES timings: The Pentium M might compute AES in marginally less time than the Pentium III, but both CPUs face the same basic AES bottleneck: encrypting a 16-byte block with a 16-byte AES key requires 200 S-box lookups, which cannot take fewer than 200 cycles (12.5 cycles/byte). Similarly, encrypting a 16-byte block with a 32-byte AES key requires 280 S-box lookups, which cannot take fewer than 280 cycles (17.5 cycles/byte). Even more S-box lookups are required if keys are not pre-expanded.

Platform: Motorola PowerPC 7410
Implementation: salsa20_word_macos software takes 24.5 PowerPC 7410 cycles for each Salsa20 round, totalling 490 cycles (7.66 cycles/byte) for 20 rounds and approximately 570 cycles (8.91 cycles/byte) for the Salsa20 hash function, timed as approximately 584 cycles with 14 cycles timing overhead. (Precise timings are difficult: the CPU's cycle counter has 16-cycle resolution.) The compiled code for the Salsa20 hash function occupies 768 bytes. Its main loop occupies 392 bytes and handles 2 rounds.
Comparison to AES timings: Lipmaa reports that AES software by Ahrens (with, presumably, no protection against timing leaks) takes 401 PowerPC 7400 cycles (over 25 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. The author is not aware of any relevant differences between the PowerPC 7400 and the PowerPC 7410. It should be possible to do somewhat better (the author's own public-domain AES software, including key expansion, takes about 490 cycles on the PowerPC 7410), but AES is clearly much slower than Salsa20 on this CPU.

Platform: Sun UltraSPARC II
Implementation: salsa20_word_sparc software takes 40.5 UltraSPARC II cycles for each Salsa20 round, totalling 810 cycles (12.66 cycles/byte) for 20 rounds and 881 cycles (13.77 cycles/byte) for the Salsa20 hash function, timed as 892 cycles with 11 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its main loop occupies 652 bytes and handles 2 rounds.
Comparison to AES timings: Lipmaa reports that unpublished software (with, presumably, no protection against timing leaks) takes 270 UltraSPARC II cycles (over 16 cycles/byte) to encrypt a 16-byte block with a 16-byte AES key, assuming that the key was pre-expanded into 176 bytes. One can reasonably extrapolate that similar software would take over 370 UltraSPARC II cycles (over 23 cycles/byte) to encrypt a 16-byte block with a 32-byte AES key, assuming that the key was pre-expanded into 240 bytes.

Platform: Sun UltraSPARC III
Implementation: salsa20_word_sparc software takes 41 UltraSPARC III cycles for each Salsa20 round, totalling 820 cycles (12.82 cycles/byte) for 20 rounds and 889 cycles (13.90 cycles/byte) for the Salsa20 hash function, timed as 905 cycles with 16 cycles timing overhead. The compiled code for the Salsa20 hash function occupies 936 bytes. Its main loop occupies 652 bytes and handles 2 rounds.
Comparison to AES timings: AES on an UltraSPARC III is at least as slow as AES on an UltraSPARC II.

Salsa20 specification

Defining Functions

Function: The quarterround function
Inputs and outputs: If y is a 4-word sequence then quarterround(y) is a 4-word sequence.
Function: The rowround function
Inputs and outputs: If y is a 16-word sequence then rowround(y) is a 16-word sequence.

Function: The columnround function
Inputs and outputs: If x is a 16-word sequence then columnround(x) is a 16-word sequence.

Function: The doubleround function
Inputs and outputs: If x is a 16-word sequence then doubleround(x) is a 16-word sequence.
Definition: A double round is a column round followed by a row round: doubleround(x) = rowround(columnround(x)).

Function: The littleendian function
Inputs and outputs: If b is a 4-byte sequence then littleendian(b) is a word.

Specifications

Function: The Salsa20 hash function
Inputs and outputs: If x is a 64-byte sequence then Salsa20(x) is a 64-byte sequence.

Function: The Salsa20 expansion function
Inputs and outputs: If k is a 32-byte or 16-byte sequence and n is a 16-byte sequence then Salsa20_k(n) is a 64-byte sequence.

Function: The Salsa20 encryption function
Inputs and outputs: Let k be a 32-byte or 16-byte sequence. Let v be an 8-byte sequence. Let m be an l-byte sequence for some l in {1, 2, ..., 2^70}. The Salsa20 encryption of m with nonce v under key k, denoted Salsa20_k(v) ⊕ m, is an l-byte sequence.

Security and cryptanalysis of Salsa20

Side-channel attacks

Natural Salsa20 implementations take constant time on a huge variety of CPUs; here constant means input-independent. There is no incentive for the authors of Salsa20 software to use variable-time operations such as S-box lookups. Timing attacks against Salsa20 are therefore just as difficult as pure cryptanalysis of the Salsa20 outputs. The operations in Salsa20 are also among the easiest to protect against power attacks and other side-channel attacks.

Notes on the uniform randomness and diagonal constants

† Shift structure of the column and row rounds: Each Salsa20 column round affects each column in the same way starting from the diagonal. Each Salsa20 row round affects each row in the same way starting from the diagonal. Consequently, shifting the entire Salsa20 hash-function input array along the diagonal has exactly the same effect on the output.

† Salsa20 expansion function and the shift structure: The expansion function eliminates this shift structure by limiting the attacker's control over the hash-function input. In particular, the input diagonal is always 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, which is different from all its nontrivial shifts. In other words, two distinct arrays with this diagonal are always in distinct orbits under the shift group.

† Rotation structure of the Salsa20 hash function: The operations are almost compatible with rotation of each input word by, say, 10 bits. Rotation changes the effect of carries that cross the rotation boundary, but it is consistent with all other carries, and with the Salsa20 operations other than addition.

† Salsa20 expansion function and the rotation structure: The expansion function also eliminates this rotation structure. The input diagonal is different from all its nontrivial shifts, all its nontrivial rotations, and all nontrivial shifts of its nontrivial rotations. In other words, two distinct arrays with this diagonal are always in distinct orbits under the shift/rotate group.

† Attacks based on non-randomness: Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse and M. J. B. Robshaw published a paper arguing that stream cipher initialisation should ensure that the initial state or keystream is not detectably related to the key and initialisation vector. The paper analyses the key/IV setup of the eSTREAM Phase 2 candidates Salsa20 and TSC-4. For Salsa20 it demonstrates a key recovery attack on six rounds and observes non-randomness after seven. For TSC-4, non-randomness is detected over the full eight-round initialisation phase, and would also persist for more rounds.

Differential Cryptanalysis of Salsa20/8

The idea of a differential attack is that some "small" differences in input states have a perceptible chance of producing "small" differences after the first step of the computation, the second step of the computation, and so on. Salsa20 is quite different in this respect from ciphers such as AES, where the input size is as large as the state size.

AES: AES has 16-byte inputs, 16-byte outputs, and (at least) 16-byte keys; there are 2^384 choices of (n, n', k), so presumably there are very many choices in which both of the 128-bit quantities n ⊕ n' and AES_k(n) ⊕ AES_k(n') are "small".

Salsa20: Salsa20 has 16-byte inputs, 64-byte outputs, and 32-byte keys; there are 2^512 choices of (n, n', k), so there is no a-priori reason to believe that any of the choices have both the 128-bit quantity n ⊕ n' and the 512-bit quantity Salsa20_k(n) ⊕ Salsa20_k(n') "small".
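The differential idea can be explored empirically with a small experiment. The sketch below is an illustration under stated assumptions, not an attack and not the author's code: the rotation distances 7, 9, 13, 18 and the column/row index patterns are taken from Bernstein's Salsa20 specification rather than from this document, the starting state is random instead of a real key/nonce/counter layout, the final feed-forward addition is omitted, and the choice of word 7 for the one-bit difference is arbitrary.

```python
import random

MASK32 = 0xFFFFFFFF


def rotl32(v, c):
    """Rotate a 32-bit word left by c bits."""
    return ((v << c) & MASK32) | (v >> (32 - c))


def quarterround(y0, y1, y2, y3):
    """Four xor-a-rotated-sum modifications (distances assumed from the spec)."""
    y1 ^= rotl32((y0 + y3) & MASK32, 7)
    y2 ^= rotl32((y1 + y0) & MASK32, 9)
    y3 ^= rotl32((y2 + y1) & MASK32, 13)
    y0 ^= rotl32((y3 + y2) & MASK32, 18)
    return y0, y1, y2, y3


# Word indices fed to each quarter-round, each starting from the diagonal.
COLUMNS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)]
ROWS = [(0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rounds(x, r):
    """Apply r rounds (column, row, column, ...) with no final addition."""
    x = list(x)
    for i in range(r):
        for idx in (COLUMNS if i % 2 == 0 else ROWS):
            for j, v in zip(idx, quarterround(*(x[j] for j in idx))):
                x[j] = v
    return x


def difference_weight(x, x2, r):
    """Number of state bits that differ after r rounds."""
    return sum(bin(u ^ v).count("1")
               for u, v in zip(rounds(x, r), rounds(x2, r)))


rng = random.Random(2005)          # fixed seed so the run is repeatable
x = [rng.getrandbits(32) for _ in range(16)]
x2 = list(x)
x2[7] ^= 0x80000000                # "small" difference: one input bit

weights = [difference_weight(x, x2, r) for r in range(9)]
diff_after_one = [u ^ v for u, v in zip(rounds(x, 1), rounds(x2, 1))]
print(weights)
```

A difference in the top bit of a word passes through addition modulo 2^32 with certainty, so after one round the differences in words 7 and 11 are still single bits; carries then make later steps probabilistic, and the printed weights show how quickly the difference spreads over the 512-bit state as rounds are added.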
† Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima published a paper presenting a cryptanalysis of the Salsa20 stream cipher, which was proposed in 2005 and submitted to eSTREAM, the ECRYPT Stream Cipher Project. The cipher uses bitwise XOR, addition modulo 2^32, and constant-distance rotation operations on an internal state of 16 32-bit words.

† The paper reports a significant bias in the differential probability for Salsa20's 4th-round internal state. It further shows that, using this bias, it is possible to break 8-round reduced Salsa20 with a 256-bit secret key at a lower computational complexity than an exhaustive key search. The cryptanalysis method exploits characteristics of addition, and reduces the computational complexity compared to previous methods.

Truncated differential cryptanalysis of five rounds of Salsa20

In addition to the paper by Tsunoo et al., Paul Crowley published a paper entitled "Truncated differential cryptanalysis of five rounds of Salsa20", which presents an attack on Salsa20 reduced to five of its twenty rounds. This attack uses many clusters of truncated differentials and requires 2^165 work and 2^6 plaintexts. The result leaves some open questions.
It is clear that a naive attack of this type cannot be extended to more than a handful of rounds; this has no negative implications for the security of the full Salsa20-32/20 presented to eSTREAM. Nonetheless, the degree of clustering exhibited by these differential characteristics is surprising; it is more usual for a single differential trail to dominate. It is also striking to find differential trails whose overall probability is so greatly mispredicted by the product of the probabilities of their components, marking a violation of the independence assumption usual in differential cryptanalysis. In both instances, it would bear investigation whether other ciphers that rely heavily on addition mod 2^n to introduce nonlinearity in GF(2) would also show these properties in differential cryptanalysis, or related properties in other forms of cryptanalysis.

Algebraic attacks

General-purpose equation-solving methods, notably Buchberger's algorithm for computing Groebner bases, are remarkably powerful. Clegg, Edmonds, and Impagliazzo proved, for a comparable problem (namely finding proofs in propositional logic), that a Groebner-basis computation can quickly solve any problem that can be quickly solved by various ad-hoc proof-finding techniques. Even better, the Groebner-basis computation can quickly solve other problems that cannot be quickly solved by the ad-hoc techniques. It would be interesting to see analogous theorems regarding various ad-hoc cryptanalytic techniques.

Fortunately, there does not seem to exist any "small" set of equations for the state bits in Salsa20. Each of the 320 32-bit additions in the Salsa20 computation requires dozens of quadratic equations, producing a substantially larger system of equations than is required to describe, for example, the bits in AES. Groebner-basis techniques for solving the AES-bit equations are, by the most optimistic estimates, slightly faster than brute-force search for a 256-bit key, but they use vastly more memory and thus have a much worse price-performance ratio. Algebraic attacks against Salsa20 appear to be even more difficult.

Other notions of security

Attack: Weak-key attacks
Explanation: This type of attack seems highly implausible for Salsa20. The Salsa20 key is mangled along with the input in an extremely complicated way. Any key differences rapidly spread through the entire Salsa20 state for the same reason that input differences do.

Attack: Equivalent-key attacks
Explanation: This type of attack, like a weak-key attack, seems highly implausible for Salsa20, since it would violate the Salsa20 security conjecture. In other words, there is no need to make a separate conjecture regarding equivalent keys.

Attack: Related-key attacks
Explanation: The standard solutions to all the standard cryptographic problems (encryption, authentication, etc.) are protocols that do not allow related-key attacks on the underlying primitives. There is no evidence of a violation to date.

Attack: Key recovery attack
Explanation: At FSE 2008, Aumasson et al. improved on earlier attacks on Salsa20/7 and presented the first key-recovery attack on Salsa20/8. It is a differential attack based on a technique called probabilistic neutral bits. For Rumba, the authors identify collision and preimage attacks for two simplified variants, then discuss differential attacks on the original version, and exploit a high-probability differential to reduce the complexity of collision search from 2^256 to 2^79 for 3-round Rumba.
Alternative Proposals

Extending the Salsa20 nonce

Daniel J. Bernstein, the creator of Salsa20, published another paper entitled "Extending the Salsa20 nonce", which introduces the XSalsa20 stream cipher. XSalsa20 is based upon the Salsa20 stream cipher but has a much longer nonce: 192 bits instead of 64 bits. XSalsa20 has exactly the same streaming speed as Salsa20, and its extra nonce-setup cost is slightly smaller than the cost of generating one block of Salsa20 output. The paper proves that XSalsa20 is secure if Salsa20 is secure: any fast attack on XSalsa20 using q queries and succeeding with probability p can be converted into a fast attack on Salsa20 succeeding with probability at least p/(q + 1).

XSalsa20 is, at first glance, quite similar to Salsa20: it is built from exactly the same operations, has exactly the same protections against side-channel attacks, has exactly the same streaming speed, supports 256-bit keys, and allows reduced-round variants such as XSalsa20/12. Note that the speed reports above are for full-round Salsa20/20, not Salsa20/12.

The advantage of XSalsa20 over Salsa20 is the longer nonce. The disadvantage is that nonce setup is less efficient, but the extra cost is comparable to, and in fact slightly smaller than, the cost of generating a single Salsa20 output block.

XSalsa20 might at first appear to be an ad-hoc design, following standard principles but potentially vulnerable to new attacks. On the contrary, the paper proves that any fast successful attack on XSalsa20 can be converted into a fast successful attack on Salsa20. Confidence in the security of Salsa20 therefore implies confidence in the security of XSalsa20.

The paper is not meant to take a position in the dispute regarding the necessity of longer nonces. It does not claim any benefits for XSalsa20 in an application that already works with Salsa20's 64-bit nonces. What the paper shows is that, in case an application does want longer nonces, the Salsa20 nonce can be safely extended at surprisingly low cost.

References

1. Daniel J. Bernstein, Salsa20 - Design, Specification, Security and Speed. URL: http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3.zip
2. Paul Crowley, Truncated differential cryptanalysis of five rounds of Salsa20. URL: http://eprint.iacr.org/2005/375
3. Yukiyasu Tsunoo, Teruo Saito, Hiroyasu Kubo, Tomoyasu Suzaki, and Hiroki Nakashima, Differential Cryptanalysis of Salsa20/8. URL: http://sasc.crypto.rub.de/files/sasc2007_039.pdf
4. Jean-Philippe Aumasson, Simon Fischer, Shahram Khazaei, Willi Meier and Christian Rechberger, New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba. URL: http://www.springerlink.com/content/j35241j881018085/
5. Simon Fischer, Willi Meier, Côme Berbain, Jean-François Biasse and M. J. B. Robshaw, Non-randomness in eSTREAM Candidates Salsa20 and TSC-4. URL: http://www.springerlink.com/content/46wv58h040218wp4/
6. Daniel J. Bernstein, Extending the Salsa20 nonce. URL: http://cr.yp.to/snuffle/xsalsa-20110204.pdf
7. Matthew Robshaw and Olivier Billet (Eds.), New Stream Cipher Designs. URL: http://www.springer.com/computer/security+and+cryptology/book/978-3-540-68350-6