By Wolfgang Keller
Originally written 2020-09-05
Last modified 2022-06-19
What we want to find is an algorithm for \begin{align*} \mathit{exp}: \left(M \times \coprod_{n \in \mathbb{Z}_{\geq 0}} \{0,1\}^{\{0, \ldots, n-1\}}\right) & \rightarrow M: \\ a\ n\ f & \mapsto a^{\sum_{i=0}^{n-1} f_i \cdot 2^i}, \end{align*} where \(M\) is a (multiplicatively written) monoid. (The domain is a disjoint union over \(n\), since the exponent is given as a bit string \(f\) of some length \(n\).)
For \(k \in \{n, \ldots, 0\}\), set \[ \texttt{R0}_k := a^{\sum_{i=k}^{n-1} f_i \cdot 2^{i-k}}. \] We have for \(k \in \{n-1, \ldots, 0\}\): \[ \texttt{R0}_k = \begin{cases} \texttt{R0}_{k+1} \cdot \texttt{R0}_{k+1} & \text{if } f_k = 0, \\ \texttt{R0}_{k+1} \cdot \texttt{R0}_{k+1} \cdot a & \text{if } f_k = 1, \end{cases} \]
This yields the following algorithm:
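The recurrence for \(\texttt{R0}_k\) translates directly into left-to-right binary exponentiation. A minimal Python sketch (the function name and the representation of the exponent as a bit list \(f_0, \ldots, f_{n-1}\) are assumptions of the sketch):

```python
def exp_left_to_right(a, f, identity=1):
    """Left-to-right square-and-multiply.

    Computes a**e for e = sum(f[i] * 2**i), where f is the bit list
    f[0], ..., f[n-1].  Mirrors the recurrence for R0_k: start with
    R0_n = a^0 and go down to R0_0.
    """
    n = len(f)
    r0 = identity                   # R0_n = a^0
    for k in range(n - 1, -1, -1):  # k = n-1, ..., 0
        r0 = r0 * r0                # square: R0_{k+1} * R0_{k+1}
        if f[k] == 1:
            r0 = r0 * a             # extra factor a when f_k = 1
    return r0
```

Since only multiplication and a neutral element are used, this works in any monoid, not just for integers.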
In particular for cryptographic applications, one often desires that, independently of the execution path that the code takes, the exact number of executed monoid multiplications is the same. The idea behind this is that for many cryptographic applications, the exponent \(e\) must be kept secret (with, for example, only \(\deg e\) known to the public).
One method for this, which we present in this section, is called a Montgomery ladder.
Using a Montgomery ladder does, of course, not by itself suffice to ensure that the code is secure against timing attacks (even if the monoid operations are implemented in such a way that they execute in constant time). For example, if the operands of the monoid are stored in memory (because they do not fit into registers), cache timing information can leak which operands were accessed and thus which code branch was taken.
Because of this, in most cases a better implementation option for code that must be secure against timing attacks is to execute both possible branches (current bit 0 vs.\ 1) and afterwards use a (constant-time) conditional move to select the result of the branch that the code “actually took”.
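The masking trick behind such a constant-time conditional move can be illustrated as follows. This is only a Python sketch of the idea: Python's arbitrary-precision integers are not actually constant time, and the 64-bit word size is an assumption; real implementations do this on fixed-width words in C or assembly.

```python
WORD = (1 << 64) - 1  # mask for a (hypothetical) 64-bit word

def ct_select(bit, x, y):
    """Return x if bit == 1, else y, without branching on bit.

    mask is all-ones when bit == 1 and all-zeros when bit == 0,
    so exactly one of the two operands survives the AND.
    """
    mask = (-bit) & WORD                    # 0xFF..FF or 0x00..00
    return (x & mask) | (y & ~mask & WORD)  # branch-free selection
```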
With the previous warning in mind, we describe the Montgomery ladder mainly for the mathematical ideas behind it.
The first step is to observe that in the loop of Algorithm , \texttt{R0} is set either to \texttt{R0 * R0} or to \texttt{(R0 * R0) * a} \(=\) \texttt{R0 * (R0 * a)}.
So, we introduce another variable \texttt{R1} whose value is always \texttt{R0 * a} before and after each iteration of the loop, i.e.\ for \(k \in \{n, \ldots, 0\}\), set
\[
\texttt{R1}_k := a^{\sum_{i=k}^{n-1} f_i \cdot 2^{i-k} + 1}.
\]
This yields the following algorithm:
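A Python sketch of this intermediate algorithm, which maintains the invariant \(\texttt{R1} = \texttt{R0} \cdot a\) after every iteration (function name and bit-list representation of the exponent are assumptions of the sketch):

```python
def exp_two_registers(a, f, identity=1):
    """Square-and-multiply with a second register R1 = R0 * a.

    Not yet the Montgomery ladder: the two branches still perform
    a different number of multiplications.
    """
    n = len(f)
    r0 = identity  # R0_n = a^0
    r1 = a         # R1_n = a^1
    for k in range(n - 1, -1, -1):
        if f[k] == 0:
            r0 = r0 * r0
            r1 = r0 * a  # restore the invariant R1 = R0 * a
        else:
            r0 = r0 * r0
            r0 = r0 * a
            r1 = r0 * a  # restore the invariant R1 = R0 * a
    return r0
```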
The next step is to observe that the code block \texttt{R0 := R0 * R0};\ \texttt{R1 := R0 * a} can be replaced by \texttt{R1 := R0 * R1};\ \texttt{R0 := R0 * R0}, and that the code block \texttt{R0 := R0 * R0};\ \texttt{R0 := R0 * a};\ \texttt{R1 := R0 * a} can be replaced by \texttt{R0 := R0 * R1};\ \texttt{R1 := R1 * R1}. Both replacements are correct precisely because of the invariant \(\texttt{R1} = \texttt{R0} \cdot a\) that holds before each iteration.
This yields the following algorithm, the Montgomery ladder:
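With both replacements applied, each branch performs exactly two monoid multiplications per bit. A Python sketch of the resulting ladder (names and exponent representation are mine):

```python
def montgomery_ladder(a, f, identity=1):
    """Montgomery ladder.

    Invariant: R1 = R0 * a before and after every iteration.
    Both branches perform exactly two monoid multiplications,
    which is the point of the construction.
    """
    n = len(f)
    r0 = identity
    r1 = a
    for k in range(n - 1, -1, -1):
        if f[k] == 0:
            r1 = r0 * r1
            r0 = r0 * r0
        else:
            r0 = r0 * r1
            r1 = r1 * r1
    return r0
```

Note that, as warned above, the branch on \texttt{f[k]} would still have to be replaced by constant-time selection in a real implementation.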
A version of the Montgomery ladder that one often finds in the literature is based on the following observation: For some groups, the neutral element is “inconvenient” to use. For example, for the group of an elliptic curve, the neutral element is “the point at \(\infty\)”.
So, one demands that \(f_{n-1} = 1\), uses this fact to unroll one loop iteration in Algorithm , and obtains the following algorithm:
If one simplifies the initial code lines of Algorithm , one obtains:
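This simplified version, which assumes \(f_{n-1} = 1\) and therefore never needs the neutral element, can be sketched in Python as follows (names mine):

```python
def montgomery_ladder_msb_set(a, f):
    """Montgomery ladder assuming the top bit f[n-1] is 1.

    The first iteration is unrolled and simplified: after it,
    R0 = a and R1 = a * a, so the neutral element never appears.
    """
    n = len(f)
    assert f[n - 1] == 1, "this variant requires f[n-1] == 1"
    r0 = a      # result of the (simplified) first iteration
    r1 = a * a
    for k in range(n - 2, -1, -1):
        if f[k] == 0:
            r1 = r0 * r1
            r0 = r0 * r0
        else:
            r0 = r0 * r1
            r1 = r1 * r1
    return r0
```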
Because of the \(f_{n-1} = 1\) condition in Algorithm , Algorithm “leaks” the value of \(\left\lfloor \log_2 \left(\sum_{i=0}^{n-1} f_i \cdot 2^i\right)\right\rfloor = n-1\).
On the other hand, Algorithm only leaks the upper bound \(n-1\) for \(\left\lfloor \log_2 \left(\sum_{i=0}^{n-1} f_i \cdot 2^i\right)\right\rfloor\): writing \(n' := \left\lfloor \log_2 \left(\sum_{i=0}^{n-1} f_i \cdot 2^i\right)\right\rfloor\), the situation \(n' < n-1\) can occur in Algorithm , because for the latter algorithm we do not require \(f_{n-1} = 1\). If \(n' < n-1\), we of course have \(f_{n'+1} = \ldots = f_{n-1} = 0\).
For \(k \in \{-1, \ldots, n-1\}\), set \begin{align*} \texttt{R0}_k & := a^{\sum_{i=0}^k f_i \cdot 2^i}, \\ \texttt{R2}_k & := a^{2^{k+1}}. \end{align*} We have for \(k \in \{0, \ldots, n-1\}\): \begin{align*} \texttt{R0}_k & = \begin{cases} \texttt{R0}_{k-1} & \text{if } f_k = 0, \\ \texttt{R0}_{k-1} \cdot \texttt{R2}_{k-1} & \text{if } f_k = 1, \end{cases} \\ \texttt{R2}_k & = \texttt{R2}_{k-1} \cdot \texttt{R2}_{k-1}. \end{align*}
This yields the following algorithm:
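The recurrences for \(\texttt{R0}_k\) and \(\texttt{R2}_k\) yield right-to-left binary exponentiation; a Python sketch (names and exponent representation are mine):

```python
def exp_right_to_left(a, f, identity=1):
    """Right-to-left square-and-multiply.

    r0 accumulates the result (R0_k), r2 holds the running
    square a^(2^(k+1)) (R2_k); bits are consumed from f[0] upward.
    """
    r0 = identity              # R0_{-1} = a^0
    r2 = a                     # R2_{-1} = a^(2^0)
    for k in range(len(f)):    # k = 0, ..., n-1
        if f[k] == 1:
            r0 = r0 * r2       # multiply in a^(2^k)
        r2 = r2 * r2           # advance to a^(2^(k+1))
    return r0
```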
Let \(R\) be a commutative ring, let \(M\) be an \(R\)-module, let \(k \in \mathbb{Z}_{\geq 1}\), and let \(f_{-(k-1)}, \ldots, f_0 \in M\) (start values) and \(a_{-k}, \ldots, a_{-1} \in R\) (coefficients) be given. Consider the sequence \begin{align*} f: \mathbb{Z}_{\geq -(k-1)} & \rightarrow M: \\ n & \mapsto \begin{cases} f_n & \text{if } n \in \{-(k-1), \ldots, 0\}, \\ \sum\limits_{i=-k}^{-1} a_i f_{n+i} & \text{otherwise.} \end{cases} \end{align*} Of course, we can use an iterative algorithm to compute the \(n\)th element of the sequence. But we want to use one of the exponentiation by squaring algorithms from section to obtain a faster algorithm.
Let \begin{align*} A & := \left( \begin{array}{ccccc} a_{-1} & a_{-2} & \cdots & a_{-(k-1)} & a_{-k} \\ 1 & & & & \\ & 1 & & & \\ & & \ddots & & \\ & & & 1 & \end{array} \right), \\ f_{start} & := \left( \begin{array}{c} f_0 \\ f_{-1} \\ \vdots \\ f_{-(k-1)} \end{array} \right). \end{align*} The trick for the algorithm is to consider that for \(n \in \mathbb{Z}_{\geq 0}\), we have \[ f_n = \left( A^n \cdot f_{start} \right)_0. \]
Square matrices of a fixed size over a commutative ring form a monoid with respect to multiplication. Thus, we can use one of the exponentiation by squaring algorithms from section to compute \(A^n\).
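Combining the companion matrix \(A\) with exponentiation by squaring gives, for example, the following Python sketch (function names are mine; for simplicity the ring is \(\mathbb{Z}\) and matrices are lists of rows):

```python
def mat_mul(A, B):
    """Multiply two square matrices over a commutative ring."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, n):
    """A**n by right-to-left square-and-multiply on matrices."""
    size = len(A)
    R = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    while n > 0:
        if n & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        n >>= 1
    return R

def recurrence_term(coeffs, start, n):
    """n-th term of f(m) = sum_i coeffs[i] * f(m - 1 - i).

    coeffs = [a_{-1}, ..., a_{-k}], start = [f_0, f_{-1}, ..., f_{-(k-1)}].
    Uses f_n = (A^n * f_start)_0 with the companion matrix A.
    """
    k = len(coeffs)
    A = [coeffs] + [[1 if j == i else 0 for j in range(k)]
                    for i in range(k - 1)]
    An = mat_pow(A, n)
    return sum(An[0][j] * start[j] for j in range(k))
```

For the Fibonacci-like recurrence \(f_n = f_{n-1} + f_{n-2}\) with \(f_{-1} = 1\), \(f_0 = 0\), one would call \texttt{recurrence\_term([1, 1], [0, 1], n)}; the number of matrix multiplications is \(O(\log n)\) instead of the \(O(n)\) ring operations of the iterative algorithm.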