# Formal Software Methods for Cryptosystems Implementation Security

Pablo Rauzy rauzy@enst.fr pablo.rauzy.name

Advisor: Sylvain Guilley



#### PhD Defense

July 13, 2015 @ ENS Ulm, Paris.

Pablo Rauzy (Telecom ParisTech)

PhD Defense





 $N = p \cdot q$  $d \cdot e \equiv 1 \bmod \varphi(N)$ 

Private key $\left( d,N\right)$ 





$$\boxed{S = M^d \bmod N}$$

Step 2

 $\underbrace{M \stackrel{?}{\equiv} S^e \mod N}_{\text{Step 4}}$ 

Introduction Implementations

$$N = p \cdot q$$
$$d \cdot e \equiv 1 \mod \varphi(N)$$



 $\begin{array}{l} d_p \doteq d \bmod \varphi(p) \\ d_q \doteq d \bmod \varphi(q) \\ i_q \doteq q^{-1} \bmod p \end{array}$ 

$$N = p \cdot q$$
$$d \cdot e \equiv 1 \mod \varphi(N)$$

Private key  $(p, q, d_p, d_q, i_q)$ 





 $\begin{array}{l} d_p \doteq d \bmod \varphi(p) \\ d_q \doteq d \bmod \varphi(q) \\ i_q \doteq q^{-1} \bmod p \end{array}$ 

$$N = p \cdot q$$
$$d \cdot e \equiv 1 \mod \varphi(N)$$

Private key  $(p, q, d_p, d_q, i_q)$ 

Public key (e, N)





- Protection against this kind of attacks?
- Proof of the protection?
- Automation of the protection? Of the proof?

- Fault injection attacks:
  - introduced in 1996 by Boneh, DeMillo, and Lipton.
- Side-channel attacks:
  - introduced in 1996 by Kocher.

- Since then, many countermeasures have been developed:
  - at hardware- and software-level,
  - mostly in the industry, but also in academia.

► However, virtually all of them are a matter of artisanal work:

- there is no automation,
- parameter choices are scarcely motivated,
- there are seldom rigorous security proofs.

### ▶ Use tools from mathematics and theoretical computer science:

- prove that systems respect some functional and security properties,
- automatize the proof.
- Enable optimization:
  - security is costly,
  - mechanized proofs can be used as non-regression tests.

Formal methods are uncommon in the cryptologic community:

- cryptology is a fast-moving field,
- animated by many engineers and hackers.

> And even more unusual for security against physical attacks:

- even more engineering (e.g., trial-and-error development),
- physical behavior seemingly difficult to model.

- ► However, a few attempts have been made before I started my PhD.
- Proofs:
  - Rivain and Prouff (2010): provable masking.
  - ► Christofi et al. (2012): proved BellCoRe countermeasure.
- Automation:
  - Bayrak et al. (2011): automatic random precharging.
  - Moss et al. (2012): assisted masking.

- 1. Proof and automation of a generic side-channel countermeasure.
- 2. Proof and simplifications of fault injection countermeasures.
- 3. Automation of a proven generic fault injection countermeasure.

- ▶ Proof and automation of a *balancing* countermeasure:
  - null side-channel signal-to-noise ratio,
  - i.e., constant leakage,
  - ▶ using *Dual-rail with Precharge Logic* in software.
- Automation of the (proven correct) code transformation.
- Automation of the security proof:
  - symbolic execution ensures the constant-leakage property holds.



Formal study of CRT-RSA countermeasures against fault attacks:

- security proofs,
- optimization (speed, number of tests, randomness requirement),
- symbolic evaluation by term rewriting in an arithmetic framework.
- Classification of a family of existing countermeasures:
  - extract protection principles from the employed techniques,
  - design method to resist an arbitrary (but fixed) number of faults.

Extraction of the principles of CRT-RSA countermeasures:

- integrity verification,
- independent from the fault model,
- independent from the algorithm,
- applicable to all asymmetric cryptography.
- Security proof of the generic countermeasure.
- Automation of the (proved correct) code transformation:
  - protection of previously unprotected (but attacked) algorithms.

### Introduction

- Towards Generic Countermeasures Against Fault Injection Attacks

- RSA, CRT-RSA, the BellCoRe attack, countermeasures.
- Formal study, finja, proved countermeasures.
- Classification of countermeasures, building better countermeasures.
- Integrity verification, enredo, generic countermeasures.



### RSA (Rivest, Shamir, Adleman)

RSA is an algorithm for public key cryptography. It can be used as both an encryption and a signature algorithm.

- ▶ Let M be the message, (N, e) the public key, and (N, d) the private key, such that  $d \cdot e \equiv 1 \mod \varphi(N)$ .
- The signature S is computed by  $S \equiv M^d \mod N$ .
- The signature can be verified by checking that  $M \equiv S^e \mod N$ .

## CRT (Chinese Remainder Theorem)

CRT-RSA is an optimization of the RSA computation which allows a fourfold speedup.

- Let p and q be the primes from the key generation  $(N = p \cdot q)$ .
- ▶ These values are pre-computed (considered part of the private key):

$$d_p \doteq d \mod (p-1) d_q \doteq d \mod (q-1) i_q \doteq q^{-1} \mod p$$

► S is then computed as follows:

## BellCoRe (Bell Communications Research)

Definition

The BellCoRe attack consists in revealing the secret primes p and q by faulting the computation. It is very powerful as it works even with very random faulting.

- ▶ If  $S_p$  (resp.  $S_q$ ) is faulted as  $\widehat{S_p}$  (resp.  $\widehat{S_q}$ ), the attacker:
  - gets an erroneous signature  $\widehat{S}$ ,
  - ▶ can recover p (resp. q) as  $gcd(N, S \hat{S})$ .

- For all integer x, gcd(N, x) can only take 4 values:
  - 1, if N and x are co-prime,
  - p, if x is a multiple of p,
  - q, if x is a multiple of q,
  - ▶ N, if x is a multiple of both p and q, i.e., of N.
- If S<sub>p</sub> is faulted (i.e., replaced by S<sub>p</sub> ≠ S<sub>p</sub>):
  S Ŝ = q · ((i<sub>q</sub> · (S<sub>p</sub> S<sub>q</sub>) mod p) (i<sub>q</sub> · (Ŝ<sub>p</sub> S<sub>q</sub>) mod p))
  ⇒ gcd(N, S Ŝ) = q
- If S<sub>q</sub> is faulted (i.e., replaced by S<sub>q</sub> ≠ S<sub>q</sub>):
  S S ≡ (S<sub>q</sub> S<sub>q</sub>) (q mod p) · i<sub>q</sub> · (S<sub>q</sub> S<sub>q</sub>) mod p
  ⇒ gcd(N, S S) = p

Many countermeasures have been proposed:

- ► ~20 papers,
- from 1999 to now,
- both from academia and industry.
- Including:
  - Shamir (1999),
  - Aumüller et al. (2002),
  - ▶ Vigilant (2008) + Coron et al. (2010).

► The goal is making sure countermeasures are trustworthy:

- by proving the algorithm at high-level (the proof should apply to any refinement),
- by covering a very general attacker model.

$$M o \overline{\mathrm{RSA}} o S$$
 vs



#### Attacker model

Definition

The attacker can request CRT-RSA computations, inject fault(s) during the computation, and read the final result of the computation.

- Data fault (on intermediate values):
  - zeroing or randomizing,
  - permanent or transient.
- Code fault:
  - skipping any number of consecutive instructions.
- Attack order:
  - number of fault injections during the computation (an attack is said *high-order* if its order is > 1).

### Equivalence between faults on the code and on the data

Lemma

The effect of a skipping fault (i.e., fault on the code) can be captured by considering only randomizing and zeroing faults (i.e., fault on the data).

proof sketch:

- If the skipped instructions are part of an arithmetic operation:
  - either the computation has not been done at all: its result becomes zero (if initialized) or random (if not),
  - or the computation has partly been done: its result is thus considered random at our modeling level.
- If the skipped instruction is a branching instruction, it is equivalent to fault the result of the branching condition:
  - at zero (i.e., false), to avoid branching,
  - at random (i.e., true), to force branching.

Inputs:

- a high-level description of the algorithm,
- an attack success condition,
- a fault model.

### Output:

- the list of working attacks, or
- ▶ a proof that the computation is resistant to fault injections.
- http://pablo.rauzy.name/sensi/finja.html

- 1. The algorithm is parsed into an internal representation (an AST):
  - that finja can execute symbolically (simplified),
  - that encodes properties of the intermediate variables.
- 2. finja makes a copy of the original tree and simplifies it.
- 3. For each possible fault(s) injection(s) in the fault model, finja:
  - produces a copy of the original tree,
  - injects the fault in the copy,
  - simplifies the faulted tree,
  - checks attack success condition holds,
    - if yes, the working attack is reported,
    - if not, the countermeasure is considered secure against this attack.
- 4. finja outputs an HTML report.

- Most of the  $\mathbb{Z}$  ring axioms,
- $\mathbb{Z}_N$  subrings,
- And a few theorems.

▶ Most of the Z ring axioms:

- neutral elements (0 for sums, 1 for products);
- absorbing element (0, for products);
- inverses and opposites;
- associativity and commutativity;
- but no distributivity (not confluent).
- $\mathbb{Z}_N$  subrings,
- And a few theorems.

- Most of the  $\mathbb{Z}$  ring axioms,
- $\mathbb{Z}_N$  subrings:
  - identity:
    - $\blacktriangleright (a \mod N) \mod N = a \mod N,$
    - $\blacktriangleright N^k \mod N = 0;$
  - inverse:
    - $(a \mod N) \times (a^{-1} \mod N) \mod N = 1$ ,
    - $\blacktriangleright (a \mod N) + (-a \mod N) \mod N = 0;$
  - associativity and commutativity:
    - $\blacktriangleright (b \mod N) + (a \mod N) \mod N = a + b \mod N,$
    - $(a \mod N) \times (b \mod N) \mod N = a \times b \mod N;$
  - ▶ subrings:  $(a \mod N \times m) \mod N = a \mod N$ .
- And a few theorems.

- Most of the  $\mathbb{Z}$  ring axioms,
- ▶  $\mathbb{Z}_N$  subrings,
- And a few theorems:
  - Fermat's little theorem;
  - its generalization, Euler's theorem;
  - Chinese remainder theorem;
  - Binomial theorem in  $\mathbb{Z}_{r^2}$  rings
    - $(1+r)^d \equiv 1 + dr \mod r^2.$

#### minimal-example.fia

```
noprop a, b, c ;
t := a + b * c ;
return t ;
%%
```

@ !=[b] a

• Computation:  $t = a + b \times c$ .

- Attack success condition:  $t \not\equiv a \mod b$ .
- ▶ finja -r minimal-example.fia
- ▶ finja -z minimal-example.fia

#### randomizing fault on $\, {\rm c}$

```
noprop a, b, c ;
t := a + b * Random ;
return t ;
%%
```

#### zeroing fault on $\operatorname{a}$

@ !=[b] a

```
noprop a, b, c ;
t := Zero + b * c ;
return t ;
%%
```

0 !=[b] a

- @ !=[b] a
   => a + b \* Random !=[b] a
   => a != a
   => false
- Harmless fault injection.

```
    @ !=[b] a
    => Zero + b * c !=[b] a
    => b * c !=[b] a
    => 0 != a
    => true
```

Attack successful.

- Using finja, I have proved the security of:
  - Aumüller et al. (2002) at PROOFS 2013 and
  - ▶ Vigilant (2008) + Coron et al. (2010) at PPREW 2014.
- I have optimized:
  - Aumüller: from 7 to 5 verifications,
  - Vigilant: from 9 to 3 verifications, from 5 to 1 random number (plus removed unnecessary computations).

- High-order attacks?
- High-order countermeasures?
- Proven high-order countermeasures?
## High-order attacks have been studied and shown practical:

- Fault Attacks for CRT Based RSA: New Attacks, New Results, and New Countermeasures, by C. H. Kim and J.-J. Quisquater at WISTP'07.
- Multi Fault Laser Attacks on Protected CRT-RSA, by E. Trichina and R. Korkikyan at FDTC'10.

- A few countermeasures claim to be second-order:
  - Practical fault countermeasures for chinese remaindering based RSA, by M. Ciet and M. Joye at FDTC'05.
  - On Second-Order Fault Analysis Resistance for CRT-RSA Implementations, by E. Dottax, C. Giraud, M. Rivain, and Y. Sierra at WISTP'09.
- But they do not work in our more general fault model:
  - ▶ finja -t -n 2 -z -z -s crt-rsa\_ciet-joye.fia
  - ▶ finja -t -n 2 -r -z -s crt-rsa\_dottax-etal.fia
- We found no countermeasure claiming to resist > 2 faults.

- ▶ If we want a high-order countermeasure, we have to create it:
  - What is a countermeasure?
  - What makes a countermeasure work? What makes it fail?
  - How do the existing first-order countermeasures work?

- ▶ What are the methods used by the existing countermeasures?
- ▶ We used 4 main parameters to classify countermeasures:
  - 1. Shamir's or Giraud's family,
  - 2. test-based or infective,
  - 3. intended order,
  - 4. usage of the small subrings.

| Countermeasure         | Family | Verification<br>method/count | Ord<br>intended | er<br>actual | Small subrings usage                                                                                |
|------------------------|--------|------------------------------|-----------------|--------------|-----------------------------------------------------------------------------------------------------|
| Shamir (1999)          | Shamir | test / 1                     | 1               | 0            | $r_1 \ = \ r_2$ , consistency of intermediate signatures                                            |
| Joye et al. (2001)     | Shamir | test / 2                     | 1               | 0            | checksums of the intermediate CRT sig-<br>natures                                                   |
| Aumüller et al. (2002) | Shamir | test / 5                     | 1               | 1            | $r_1=r_2$ , consistency of the checksums of both intermediate signatures                            |
| Blömer et al. (2003)   | Shamir | infection / 2                | 1               | 1            | direct verification of the intermediate<br>CRT signatures, CRT recombination<br>happens in overring |
| Ciet & Joye (2005)     | Shamir | infection / 2                | 2               | 1            | checksums of the intermediate CRT sig-<br>natures, CRT recombination happens in<br>overring         |
| Giraud (2006)          | Giraud | test / 1                     | 1               | 1            | NA                                                                                                  |
| Boscher et al. (2007)  | Giraud | test / 1                     | 1               | 1            | NA                                                                                                  |
| Vigilant (2008)        | Shamir | test / 7                     | 1               | 1            | $r_1 = r_2$ , embedded control values, CRT recombination happens in overring                        |
| Rivain (2009)          | Giraud | test / 2                     | 1               | 1            | NA                                                                                                  |
| Kim et al. (2011)      | Giraud | infection / 6                | 1               | 1            | NA                                                                                                  |

- ► Formal study and classification of countermeasures:
  - provided a better understanding of their working factors,
  - > allow to fix broken countermeasures, and build better ones.

Algorithm: CRT-RSA with Shamir's countermeasure **Output:** Signature  $M^d \mod N$ , or error Input: Message M, key  $(p, q, d, i_q)$ 1 Choose a small random integer r.  $2 \quad p' = p \cdot r$ 3  $q' = q \cdot r$  $S'_{p} = M^{d \mod \varphi(p')} \mod p'$ // Intermediate signature in  $\mathbb{Z}_{pr}$ 6  $S'_{q} = M^{d \mod \varphi(q')} \mod q'$ // Intermediate signature in  $\mathbb{Z}_{qr}$ 7 if  $S'_p \not\equiv S'_q \mod r$  then return error 8  $S_p = S'_p \mod p$ // Retrieve intermediate signature in  $\mathbb{Z}_n$ 9  $S_q = S'_q \mod q$ // Retrieve intermediate signature in  $\mathbb{Z}_q$ 10  $S = S_q + q \cdot (i_q \cdot (S_p - S_q) \mod p)$ // Recombination in  $\mathbb{Z}_N$ 

12 return S

Algorithm: CRT-RSA with Shamir's countermeasure **Output:** Signature  $M^d \mod N$ , or error Input: Message M, key  $(p, q, d, i_q)$ 1 Choose a small random integer r. 2  $p' = p \cdot r$ 3  $a' = a \cdot r$ 4 if  $p' \not\equiv 0 \mod p$  or  $q' \not\equiv 0 \mod q$  then return error  $S'_{p} = M^{d \mod \varphi(p')} \mod p'$ // Intermediate signature in  $\mathbb{Z}_{pr}$ 6  $S'_{q} = M^{d \mod \varphi(q')} \mod q'$ // Intermediate signature in  $\mathbb{Z}_{qr}$ 7 if  $S'_p \not\equiv S'_q \mod r$  then return error 8  $S_p = S'_p \mod p$ // Retrieve intermediate signature in  $\mathbb{Z}_p$ 9  $S_q = S'_q \mod q$ // Retrieve intermediate signature in  $\mathbb{Z}_{q}$ 10  $S = S_q + q \cdot (i_q \cdot (S_p - S_q) \mod p)$ // Recombination in  $\mathbb{Z}_N$ 11 if  $S \not\equiv S'_p \mod p$  or  $S \not\equiv S'_q \mod q$  then return error 12 return S

Simplification of Vigilant's countermeasure in 4 steps:

- our first simplifications + those of Coron et al.'s corrections,
- remove additional computation with random numbers,
- verify the 3 necessary invariants in a single step,
- bonus: transform the countermeasure into an infective variant.

**Output:** Signature  $M^d \mod N$ , or error **Input:** Message M, key  $(p, q, d_n, d_q, i_q)$ 1 Choose a small random integer r,  $R_1$ ,  $R_2$ ,  $R_3$ ,  $R_4$ .  $N = p \cdot q$  $2 \quad p' = p \cdot r^2$  $i_{pr} = p^{-1} \mod r^2$ 4  $M_p = M \mod p'$ 5  $B_n = p \cdot i_{nr}$ ;  $A_n = 1 - B_n \mod p'$  $\mathbf{6} \quad M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ // CRT insertion of verification value in  $M'_n$  $\tau \quad d'_p = d_p + R_3 \cdot (p-1)$ 8  $S'_n = M'_n {d'_p \mod \varphi(p') \mod p'}$ // Intermediate signature in  $\mathbb{Z}_{nr2}$ 9 if  $M'_n \not\equiv M \mod p$  or  $d'_n \not\equiv d_p \mod p - 1$  or  $B_p \cdot S'_n \not\equiv B_p \cdot (1 + d'_n \cdot r) \mod p'$  then return error 10  $S_{pr} = S'_p - B_p \cdot (1 + d'_p \cdot r - R_1)$ // Verification value of  $S_p'$  swapped with  $R_1$ 11  $a' = a \cdot r^2$ 12  $i_{ar} = q^{-1} \mod r^2$ 13  $M_a = M \mod a'$ 14  $B_a = q \cdot i_{ar}$ ;  $A_a = 1 - B_a \mod q'$ 15  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ // CRT insertion of verification value in  $M_a^\prime$ 16  $d'_{a} = d_{a} + R_{4} \cdot (q-1)$ 17  $S'_{a} = M'_{a}^{d'_{q} \mod \varphi(q')} \mod q'$ // Intermediate signature in  $\mathbb{Z}_{ar2}$ 18 if  $M'_a \not\equiv M \mod q$  or  $d'_a \not\equiv d_q \mod q - 1$  or  $B_q \cdot S'_a \not\equiv B_q \cdot (1 + d'_a \cdot r) \mod q'$  then return error 19  $S_{qr} = S'_q - B_q \cdot (1 + d'_q \cdot r - R_2)$ // Verification value of  $S_q^\prime$  swapped with  $R_2$ 20 if  $M_p \not\equiv M_q \mod r^2$  then return error 21  $S_r = S_{ar} + q \cdot (i_q \cdot (S_{pr} - S_{ar}) \mod p')$ // Recombination checksum in  $\mathbb{Z}_{Nr2}$ 23 if  $N \cdot (S_r - R_2 - q \cdot i_q \cdot (R_1 - R_2)) \not\equiv 0 \mod Nr^2$  then return error 24 if  $q \cdot i_q \not\equiv 1 \mod p$  then return error

25 return  $S = S_r \mod N$ 

// Retrieve result in  $\mathbb{Z}_N$ 

**Output:** Signature  $M^d \mod N$ , or error **Input:** Message M, key  $(p, q, d_n, d_q, i_q)$ 1 Choose a small random integer r,  $R_1$ ,  $R_2$ ,  $R_3$ ,  $R_4$ .  $N = p \cdot q$  $2 \quad p' = p \cdot r^2$  $i_{pr} = p^{-1} \mod r^2$  $4 \quad M_p = M \mod p'$ 5  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$  $\mathbf{6} \quad M_p' = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ // CRT insertion of verification value in  $M'_n$  $\tau \quad d'_p = d_p + R_3 \cdot (p-1)$ 8  $S'_n = M'_n {d'_p \mod \varphi(p') \mod p'}$ // Intermediate signature in  $\mathbb{Z}_{nr2}$ 9 if  $M'_p \not\equiv M \mod p$  or  $d'_n \not\equiv d_p \mod p - 1$  or  $B_p \cdot S'_n \not\equiv B_p \cdot (1 + d'_n \cdot r) \mod p'$  then return error 10  $S_{pr} = S'_p - B_p \cdot (1 + d'_p \cdot r - R_1)$ // Verification value of  $S_p'$  swapped with  $R_1$ 11  $a' = a \cdot r^2$ 12  $i_{ar} = q^{-1} \mod r^2$ 13  $M_a = M \mod a'$ 14  $B_a = q \cdot i_{ar}$ ;  $A_a = 1 - B_a \mod q'$ 15  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ // CRT insertion of verification value in  $M_a^\prime$ 16  $d'_a = d_a + R_4 \cdot (q-1)$ 17  $S'_q = M'_q q' \mod \varphi(q') \mod q'$ // Intermediate signature in  $\mathbb{Z}_{ar^2}$ 18 if  $M'_a \not\equiv M \mod q$  or  $d'_a \not\equiv d_q \mod q - 1$  or  $B_q \cdot S'_a \not\equiv B_q \cdot (1 + d'_q \cdot r) \mod q'$  then return error 19  $S_{qr} = S'_q - B_q \cdot (1 + d'_q \cdot r - R_2)$ // Verification value of  $S_q^\prime$  swapped with  $R_2$ 21  $S_r = S_{qr} + q \cdot (i_q \cdot (S_{pr} - S_{qr}) \mod p')$ // Recombination checksum in  $\mathbb{Z}_{Nr2}$ if  $pq \cdot (S_r - R_2 - q \cdot i_q \cdot (R_1 - R_2)) \not\equiv 0 \mod Nr^2$  then return error 25 return  $S = S_r \mod N$ // Retrieve result in  $\mathbb{Z}_N$ 

**Output:** Signature  $M^d \mod N$ , or error **Input:** Message M, key  $(p, q, d_n, d_a, i_a)$ 1 Choose a small random integer  $\hat{r}$ ,  $\hat{R_1}$ ,  $\hat{R_2}$ .  $N = p \cdot q$  $_{2} p' = p \cdot r^{2}$  $i_{pr} = p^{-1} \mod r^2$ 4  $M_p = M \mod p'$ 5  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$  $\mathbf{6} \quad M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ // CRT insertion of verification value in  $M'_n$ 8  $S'_n = M'_n {}^{d_p \mod \varphi(p')} \mod p'$ // Intermediate signature in  $\mathbb{Z}_{nr2}$ 9 if  $M'_p \not\equiv M \mod p$  or  $B_p \cdot S'_p \not\equiv B_p \cdot (1 + d_p \cdot r) \mod p'$  then return error 10  $S_{pr} = S'_p - B_p \cdot (1 + d_p \cdot r - R_1)$ // Verification value of  $S_{n}^{\prime}$  swapped with  $R_{1}$ 11  $a' = a \cdot r^2$ 12  $i_{qr} = q^{-1} \mod r^2$ 13  $M_q = M \mod q'$ 14  $B_q = q \cdot i_{qr}$ ;  $A_q = 1 - B_q \mod q'$ 15  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ // CRT insertion of verification value in  $M_a^\prime$ 17  $S'_q = M'_q \stackrel{\text{mod } \varphi(q')}{\mod q'} \mod q'$ // Intermediate signature in  $\mathbb{Z}_{qr^2}$ 18 if  $M'_a \not\equiv M \mod q$  or  $B_q \cdot S'_a \not\equiv B_q \cdot (1 + d_q \cdot r) \mod q'$  then return error 19  $S_{qr} = S'_{q} - B_{q} \cdot (1 + d_{q} \cdot r - R_{2})$ // Verification value of  $S_a^\prime$  swapped with  $R_2$ 21  $S_r = S_{ar} + q \cdot (i_a \cdot (S_{pr} - S_{ar}) \mod p')$ // Recombination checksum in  $\mathbb{Z}_{Nr^2}$ if  $pq \cdot (S_r - R_2 - q \cdot i_q \cdot (R_1 - R_2)) \not\equiv 0 \mod Nr^2$  then return error 25 return  $S = S_r \mod N$ // Retrieve result in  $\mathbb{Z}_N$ 

Input: Message M, key  $(p, q, d_p, d_q, i_q)$ 1 Choose a small random integer r.  $N = p \cdot q$  $2 \quad p' = p \cdot r^2$  $i_{pr} = p^{-1} \mod r^2$  $4 \quad M_p = M \mod p'$ 5  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$  $\mathbf{6} \quad M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ 8  $S'_n = M'_n {}^{d_p \mod \varphi(p')} \mod p'$ 9 if  $M'_p + N \not\equiv M \mod p$  then return error 10  $S_{nr} = 1 + d_n \cdot r$ 11  $a' = a \cdot r^2$ 12  $i_{ar} = q^{-1} \mod r^2$ 13  $M_a = M \mod q'$ 14  $B_a = q \cdot i_{ar}$ ;  $A_a = 1 - B_a \mod q'$ 15  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ 17  $S'_q = M'_q \mod \varphi(q') \mod q'$ 18 if  $M'_q + N \not\equiv M \mod q$  then return error 19  $S_{ar} = 1 + d_a \cdot r$ 21  $S_r = S_{qr} + q \cdot (i_q \cdot (S_{pr} - S_{qr}) \mod p')$ 22  $S' = S'_{a} + q \cdot (i_{q} \cdot (S'_{n} - S'_{a}) \mod p')$ 23 if  $S' \not\equiv S_r \mod r^2$  then return error

25 return  $S = S' \mod N$ 

**Output:** Signature  $M^d \mod N$ , or error // CRT insertion of verification value in  $M_n^\prime$ // Intermediate signature in  $\mathbb{Z}_{pr2}$ // Checksum in  $\mathbb{Z}_{r^2}$  for  $S'_n$ // CRT insertion of verification value in  $M_a^\prime$ // Intermediate signature in  $\mathbb{Z}_{ar^2}$ // Checksum in  $\mathbb{Z}_{a2}$  for  $S_a'$ 

// Recombination checksum in  $\mathbb{Z}_{r^2}$  // Recombination in  $\mathbb{Z}_{Nr^2}$ 

// Retrieve result in  $\mathbb{Z}_N$ 

**Output:** Signature  $M^d \mod N$ , or a random value in  $\mathbb{Z}_N$ Input: Message M, key  $(p, q, d_p, d_q, i_q)$ 1 Choose a small random integer r.  $N = p \cdot q$  $2 \quad p' = p \cdot r^2$  $i_{pr} = p^{-1} \mod r^2$  $4 \quad M_p = M \mod p'$ 5  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$  $6 \quad M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ // CRT insertion of verification value in  $M_{n}^{\prime}$ 8  $S'_n = M'_n {}^{d_p \mod \varphi(p')} \mod p'$ // Intermediate signature in  $\mathbb{Z}_{nr2}$ 9  $c_p = M'_p + N - M + 1 \mod p$ 10  $S_{nr} = 1 + d_p \cdot r$ // Checksum in  $\mathbb{Z}_{r^2}$  for  $S'_n$ 11  $a' = a \cdot r^2$ 12  $i_{ar} = q^{-1} \mod r^2$ 13  $M_q = M \mod q'$ 14  $B_a = q \cdot i_{ar}$ ;  $A_a = 1 - B_a \mod q'$ 15  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ // CRT insertion of verification value in  $M_a^\prime$ 17  $S'_q = M'_q \stackrel{\text{mod } \varphi(q')}{\mod q'} \mod q'$ // Intermediate signature in  $\mathbb{Z}_{ar^2}$ 18  $c_q = M'_q + N - M + 1 \mod q$ 19  $S_{ar} = 1 + d_a \cdot r$ // Checksum in  $\mathbb{Z}_{a2}$  for  $S_a'$ 21  $S_r = S_{qr} + q \cdot (i_q \cdot (S_{pr} - S_{qr}) \mod p')$ // Recombination checksum in  $\mathbb{Z}_{n2}$ 22  $S' = S'_q + q \cdot (i_q \cdot (S'_p - S'_q) \mod p')$ // Recombination in  $\mathbb{Z}_{N_n 2}$ 23  $c_S = S' - S_r + 1 \mod r^2$ 25 return  $S = S' c_p c_q c_S$  $\mod N$ // Retrieve result in  $\mathbb{Z}_N$ 

# High-Order Countermeasures

Against randomizing faults, all first-order correct countermeasures are high-order.

However, there are no generic high-order countermeasures if the three types of faults in our attack model are taken into account, but it is possible to build Dth-order countermeasures for any D.

proof sketch:

- Random faults cannot induce a verification skip (whether test-based or infective).
- Repeating verifications D times can force to inject D+1 faults.

Algorithm: Generation of CRT-RSA with Vigilant's countermeasure at order D Input: order D Output: CRT-RSA algorithm protected against fault injection attack of order D **print** Choose a small random integer r. print  $N = p \cdot q$ print  $p' = p \cdot r^2$ ;  $i_{pr} = p^{-1} \mod r^2$ ;  $M_p = M \mod p'$ ;  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$ 3 print  $M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ **5** print  $q' = q \cdot r^2$ ;  $i_{ar} = q^{-1} \mod r^2$ ;  $M_a = M \mod q'$ ;  $B_a = q \cdot i_{ar}$ ;  $A_a = 1 - B_a \mod q'$ 6 print  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ 7 print  $S'_p = M'_n {}^{d_p \mod \varphi(p')} \mod p'$ 8 print  $S'_{a} = M'_{a}^{dq \mod \varphi(q')} \mod q'$ 9 print  $S_{pr} = 1 + d_p \cdot r$ 10 print  $\hat{S}_{ar} = 1 + \hat{d}_a \cdot r$ 11 print  $S_r = S_{qr} + q \cdot (i_q \cdot (S_{pr} - S_{qr}) \mod p')$ print  $S' = S'_a + q \cdot (i_q \cdot (S'_p - S'_a) \mod p')$ print  $S_0 = S' \mod N$ 13 for  $i \leftarrow 1$  to D do 14 print if  $M'_p + N \not\equiv M \mod p$  then return error 15 print S'; print i print = S; print i-116 print if  $M'_a + N \not\equiv M \mod q$  then return error 17 print S''; print *i* print = S'; print *i* 18 print if  $S \not\equiv S_r \mod r^2$  then return error 19 print S; print i print = S''; print i 20 21 end print return S; print  $_D$ 22

**Output:** Signature  $M^d \mod N$ , or error Input: Message M, key  $(p, q, d_p, d_q, i_q)$ 1 Choose a small random integer r. 2  $N = p \cdot q$ 3  $p' = p \cdot r^2$ ;  $i_{pr} = p^{-1} \mod r^2$ ;  $M_p = M \mod p'$ ;  $B_p = p \cdot i_{pr}$ ;  $A_p = 1 - B_p \mod p'$  $4 \quad M'_p = A_p \cdot M_p + B_p \cdot (1+r) \mod p'$ 5  $q' = q \cdot r^2$ ;  $i_{ar} = q^{-1} \mod r^2$ ;  $M_q = M \mod q'$ ;  $B_q = q \cdot i_{qr}$ ;  $A_q = 1 - B_q \mod q'$ 6  $M'_q = A_q \cdot M_q + B_q \cdot (1+r) \mod q'$ 7  $S'_n = M'_n {}^{d_p \mod \varphi(p')} \mod p'$ ;  $S_{pr} = 1 + d_p \cdot r$ 8  $S'_q = M'_q \operatorname{mod} \varphi(q') \mod q'$ ;  $S_{qr} = 1 + d_q \cdot r$ 9  $S_r = S_{ar} + q \cdot (i_a \cdot (S_{pr} - S_{ar}) \mod p')$ 10  $S' = S'_{a} + q \cdot (i_{q} \cdot (S'_{p} - S'_{a}) \mod p')$ 11  $S_0 = S' \mod N$ 12 if  $M'_p + N \not\equiv M \mod p$  then return error else  $S'_1 = S_0$ 13 if  $M'_q + N \not\equiv M \mod q$  then return error else  $S''_1 = S'_1$ 14 if  $S \not\equiv S_r \mod r^2$  then return error else  $S_1 = S_1''$ 15 if  $M'_p + N \not\equiv M \mod p$  then return error else  $S'_2 = S_1$ 16 if  $M'_q + N \not\equiv M \mod q$  then return error else  $S''_2 = S'_2$ 17 if  $S \not\equiv S_r \mod r^2$  then return error else  $S_2 = S_2''$ 18 if  $M'_p + N \not\equiv M \mod p$  then return error else  $S'_3 = S_2$ 19 if  $M'_q + N \not\equiv M \mod q$  then return error else  $S''_3 = S'_3$ 20 if  $S \not\equiv S_r \mod r^2$  then return error else  $S_3 = S_3''$ 21 return  $S_3$ 

- The working factors of countermeasures:
  - are not tied to the BellCoRe attack,
  - nor to the CRT-RSA algorithm.
- ► Cost-effective integrity verification of any arithmetic computation.

Obvious idea: repeat the computation and compare the results:

- may be easy to inject the same fault twice,
- costs too much.
- Signature verification is an RSA-specific possibility:
  - but it requires to have both parts of the key,
  - may cost too much.

Existing countermeasures are optimizations of the redundancy idea.

• Use the isomorphism between  $\mathbb{F}_p \times \mathbb{F}_r$  and  $\mathbb{Z}_{pr}$ .



Notation:  $\mathbb{Z}_n$  is a shorthand for  $\mathbb{Z}/n\mathbb{Z}$ .

## Divisions optimization

#### Proposition

To get the inverse of z in  $\mathbb{F}_p$  while computing in  $\mathbb{Z}_{pr}$ , one has:

 $\triangleright z = 0 \mod r \implies (z^{p-2} \mod pr) \equiv z^{-1} \mod p$ ,

• otherwise 
$$(z^{-1} \mod pr) \equiv z^{-1} \mod p$$
.

proof sketch:

Inputs:

- > an asymmetric cryptography algorithm to be protected,
- a desired redundancy level.

Output:

- the (proved to be the) same algorithm
- provably protected against fault injection attacks.

http://pablo.rauzy.name/sensi/enredo.html

- 1. The algorithm is parsed and type-checked:
  - ▶ type-checker gather necessary information for the transformation.
- 2. enredo applies the *entanglement* transformation:
  - the transformation has been formally defined,
  - and it is proved correct (semantic preserving).
- 3. enredo outputs the protected algorithm.

## Correctness

contribution

Proposition

The transformation is correct if at all time during the execution the invariant defining the transformation of the memory holds, and when a value is returned, it is the same as in the original program. The enredo transformation is correct.

proof sketch:



# Security

### Proposition

The algorithm is secure if when it returns a value it is either the right one or an error constant. It is not secure only with a probability asymptotically inversely proportional to the security parameter r.

proof sketch:

► Faulted results are polynomials of corrupted intermediate values:

- the result can be expressed as a polynomial of the inputs and the faults,
- ▶ a fault on x is not detected if:  $P(\hat{x}) = P(x) \mod r \text{ and } P(\hat{x}) \neq P(x) \mod p,$
- i.e., when  $\widehat{x_1}$  is a root of  $\Delta P(\widehat{x_1}) = P(\widehat{x_1}) P(x_1)$  in  $\mathbb{Z}_r$ .
- ▶ Non-detection probability  $\mathbb{P}_{n.d.}$  is inversely proportional to r:
  - $\mathbb{P}_{n.d.} \approx \# roots(\Delta P)/r$  in  $\mathbb{Z}_r$ ,
  - If  $\Delta P$  is uniformly distributed, when  $r \to \infty$ ,  $\# roots(\Delta P)$  tends to 1,
  - ▶ in practice  $\mathbb{P}_{n.d.} \gtrsim \frac{1}{r}$ , i.e.,  $\mathbb{P}_{n.d.} \ge \frac{1}{r}$  but is close to  $\frac{1}{r}$ .

# Towards Generic Countermeasures Against Fault Injection Attacks Practical Case Study with ECSM on 32-bit ARMv7



| Point order                    | ord = 0xffffffffffffffffffffffffffffffffffff                                                                                                                        |
|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Point coordinates              | $\begin{array}{l} Gx = \texttt{0x188da80eb03090f67cbf20eb43a18800f4ff0afd82ff1012} \\ Gy = \texttt{0x07192b95ffc8da78631011ed6b24cdd573f977a11e794811} \end{array}$ |
| Curve equation<br>coefficients | $a = 0 \\ xffffffffffffffffffffffffffffffffffff$                                                                                                                    |
| Field characteristic           | p = 0xffffffffffffffffffffffffffffffffffff                                                                                                                          |

Parameters of our ECSM implementation (namely NIST P-192)

Towards Generic Countermeasures Against Fault Injection Attacks / Practical Case Study with ECSM on 32-bit ARMv7 Security Results

| r value              | r size<br>(bit) | Positives (%)<br>true false |      | Negatives (%)<br>true false |       |
|----------------------|-----------------|-----------------------------|------|-----------------------------|-------|
| 1                    | 1               | 0.00                        | 0.00 | 2.74                        | 97.26 |
| 251                  | 8               | 63.65                       | 0.00 | 2.56                        | 33.79 |
| 1021                 | 10              | 89.09                       | 0.00 | 2.96                        | 7.95  |
| 2039                 | 11              | 98.82                       | 0.00 | 0.00                        | 1.18  |
| 4093                 | 12              | 97.61                       | 0.00 | 1.91                        | 0.48  |
| 65521                | 16              | 97.79                       | 0.00 | 2.21                        | 0.00  |
| 4294967291           | 32              | 97.19                       | 0.00 | 2.81                        | 0.00  |
| 18446744073709551557 | 64              | 99.79                       | 0.00 | 0.21                        | 0.00  |

 $\approx 1000$  tests for each value of r



Pablo Rauzy (Telecom ParisTech)

PhD Defense

| r value              | r size<br>(bit) | $\mathbb{Z}_{pr}$ | time (ms)<br>$\mathbb{F}_r$ | test    | overhead      |
|----------------------|-----------------|-------------------|-----------------------------|---------|---------------|
| 1                    | 1               | 683               | 24                          | $\ll 1$ | ×1.04         |
| 251                  | 8               | 883               | 91                          | $\ll 1$ | $\times 1.43$ |
| 1021                 | 10              | 899               | 100                         | $\ll 1$ | $\times 1.46$ |
| 2039                 | 11              | 902               | 197                         | $\ll 1$ | $\times 1.61$ |
| 4093                 | 12              | 903               | 197                         | $\ll 1$ | $\times 1.61$ |
| 65521                | 16              | 883               | 189                         | $\ll 1$ | $\times 1.56$ |
| 4294967291           | 32              | 832               | 172                         | $\ll 1$ | $\times 1.47$ |
| 18446744073709551557 | 64              | 996               | 246                         | $\ll 1$ | $\times 1.82$ |

Signature verification overhead  $\approx \times 4.5$ .

Code C + mini-gmp compiled with gcc -OO (no optimization).

- ► CRT-RSA implementations can provably resist the BellCoRe attack.
- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.



- ► CRT-RSA implementations can provably resist the BellCoRe attack.
- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.



- ► CRT-RSA implementations can provably resist the BellCoRe attack.
- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.



- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.



- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.



- CRT-RSA implementations can provably resist multiple faults.
- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.





- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.





- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.




- CRT-RSA implementations can provably resist new fault attacks.
- Any asymmetric cryptography implementations can be protected.





Any asymmetric cryptography implementations can be protected.









- Articles in conferences with proceedings:
  - PPREW 2014, FDTC 2014, HOST 2015.
- Articles in journals
  - ► Journal of Cryptographic Engineering (×2).
- Communications in workshops without proceedings:
  - COSADE 2013 (short paper + talk),
  - PROOFS 2013 and 2014 (full papers + talks),
  - CHES 2013 and 2015 (posters),
  - TRUDEVICE 2015 (short paper + talk),
  - talks: DigiCosme working group, GDR SoC-SiP Security Day, Formal Methods and Security seminar Inria/DGA, Crypto'Spain Itinerant Seminar, SAS deptartment seminar @ EMSE, ...
- Two articles under submission and one book chapter in progress.

- Direct follow-up.
- ► Formalize security beyond cryptology.

# Questions?

Pablo Rauzy (Telecom ParisTech)

- Two main families of countermeasures:
  - descendants of Giraud's countermeasure (2006),
  - descendants of Shamir's countermeasure (1999).

- Use particular exponentiation algorithms that:
  - keep track of variables involved in intermediate steps,
  - check an invariant that is supposed to be spread till the last steps.
- Examples of countermeasures in this family include:
  - Boscher et al. (2007),
  - Rivain (2009) (and its recently improved version from 2014),
  - Kim et al. (2011).
- > Study of Giraud's family countermeasures was left as future work:
  - Ágnes Kiss did it with Juliane Krämer at TU Berlin,
  - we co-authored a paper (to be published).

- Use a "checksum" in a smaller mathematical structure:
  - ▶ RSA computes in rings  $\mathbb{Z}_a$  (typically, a = p, a = q, or a = pq),
  - any small number b is coprime with a,
  - ▶ there is an isomorphism between the *direct* product  $\mathbb{Z}_{ab}$  and  $\mathbb{Z}_a \times \mathbb{Z}_b$ ,
  - original computation and checksum can be conducted together in  $\mathbb{Z}_{ab}$ ,
  - redundant checksum can be computed in parallel in  $\mathbb{Z}_b$ .
- Verify that invariants on the computations and the checksums hold.



Notation:  $\mathbb{Z}_n$  is a shorthand for  $\mathbb{Z}/n\mathbb{Z}$ .

- Invariant verification can be done in two ways:
  - step-wise internal checks during the CRT computation,
  - computationally, using an infective computation strategy.

## Test-based countermeasure

A countermeasure is said to be *test-based* if it attempts to detect fault injections by verifying that some arithmetic invariants are respected, and branch to return an error instead of the numerical result of the algorithm in case of invariant violation.

- Examples of test-based countermeasures:
  - Shamir (1999),
  - Joye et al. (2001),
  - Aumüller et al. (2002),
  - Vigilant (2008).



## Infective countermeasure

Definition

A countermeasure is said to be *infective* if rather than testing arithmetic invariants it uses them to compute a neutral element of some arithmetic operation in a way that would not result in this neutral element if the invariant is violated.

It then uses the results of these computations to infect the result of the algorithm before returning it to make it unusable by the attacker (thus, it does not need branching instructions).

- Examples of infective countermeasures:
  - Blömer et al. (2003),
  - Ciet & Joye (2005),
  - Kim et al. (2011).

# Equivalence between test-based and infective verification Proposition

Each test-based (resp. infective) countermeasure has a direct equivalent infective (resp. test-based) countermeasure.

proof sketch:

- Invariants verified by countermeasures are modular equality:
  - $\blacktriangleright a \stackrel{?}{\equiv} b \mod m.$
  - ▶ Test-based: if a != b [mod m] then return error.
  - Infective: c := a b + 1 mod m; ... return S<sup>c</sup>.

- ▶ In our fault model, all existing countermeasures:
  - resist any number of randomizing faults,
  - can be broken by a second-order attack:
     a well targeted fault injection, and
     a verification skip (e.g., using a skipping or a zeroing fault).
- ► The concept of integrity verification does not depend on the order.

- In most countermeasures:
  - computations in  $\mathbb{Z}_p$  and  $\mathbb{Z}_q$  are lifted in  $\mathbb{Z}_{pr_1}$  and  $\mathbb{Z}_{qr_2}$ ,
  - $\blacktriangleright$  allowing the retrieval of the results modulo p and q, and
  - the verification of the signature modulo  $r_1$  and  $r_2$ .
- The following questions were raised:
  - Are the smaller rings used to verify the intermediate signatures?
  - Or are they used directly to compute checksums that are verified?
  - Does CRT recombination takes place in an overring?
  - When  $r_1 = r_2$ , what is permitted by the resulting symmetry?



## Polynomials of Faults #roots probability for ECSM [k]G.





Pablo Rauzy (Telecom ParisTech)

- ► The DPL countermeasure consists in computing on a redundant representation: each bit *y* is implemented as a pair (*y*<sub>False</sub>, *y*<sub>True</sub>).
- ▶ The bit pair is then used in a protocol made up of two phases:
  - 1. a *precharge* phase, during which all the bit pairs are zeroized  $(y_{\text{False}}, y_{\text{True}}) = (0, 0)$ , such that the computation starts from a known reference state;
  - 2. an *evaluation* phase, during which the  $(y_{\text{False}}, y_{\text{True}})$  pair is equal to (1,0) if it carries the logical value 0, or (0,1) if it carries the logical value 1.

- ► Each sensitive instruction should replaced by a *DPL macro*.
- ▶ The DPL macro assumes that the system is in a valid DPL state.
- And leaves it in a valid DPL state to make the macros chainable.
- ► The basic idea is to concatenate two DPL encoded values.
- ► Then use the result as an index in a look-up table.

| • | In this example we use the two LSB.<br>Logical value 1 is 1 (01).<br>Logical value 0 is 2 (10). |
|---|-------------------------------------------------------------------------------------------------|
|   |                                                                                                 |
|   | Precharge phases (activity: 1 if sensitive)                                                     |
|   | Evaluation phases (activity: 1)                                                                 |
|   | Masks (activity: normally 0)                                                                    |
|   | Shifts (activity: 2)                                                                            |
|   | Concatenation (activity: 1)                                                                     |
|   | Look-up (activity: 1 + 2)                                                                       |

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |
|       |              |                |

 $\begin{array}{l} \mathsf{DPL} \mbox{ macro for} \\ d = a \mbox{ op } b \end{array}$ 

- In this example we use the two LSB.
- Logical value 1 is 1 (01).
- Logical value 0 is 2 (10).
- Precharge phases (activity: 1 if sensitive)
- Evaluation phases (activity: 1)
- Masks (activity: normally 0)
- Shifts (activity: 2)
- Concatenation (activity: 1)
- Look-up (activity: 1 + 2)

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |

DPL macro for d = a op b

- In this example we use the two LSB.
- Logical value 1 is 1 (01).
- ▶ Logical value 0 is 2 (10).
- Precharge phases (activity: 1 if sensitive)
- Evaluation phases (activity: 1)
- Masks (activity: normally 0)
- Shifts (activity: 2)
- Concatenation (activity: 1)
- Look-up (activity: 1 + 2)

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |

 $\begin{array}{l} \mathsf{DPL} \text{ macro for} \\ d = a \text{ op } b \end{array}$ 

| ► In this example we use the two LSB.       |
|---------------------------------------------|
| ► Logical value 1 is 1 (01).                |
| Logical value 0 is 2 (10).                  |
| Precharge phases (activity: 1 if sensitive) |
| Evaluation phases (activity: 1)             |
| Masks (activity: normally 0)                |
| Shifts (activity: 2)                        |
| Concatenation (activity: 1)                 |
| Look-up (activity: 1 + 2)                   |

| $r_1$                         | $\leftarrow$                                                                                                                                                 | $r_0$                                      |
|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------|
| $r_1$                         | $\leftarrow$                                                                                                                                                 | a                                          |
| $r_1$                         | $\leftarrow$                                                                                                                                                 | $r_1 \wedge 3$                             |
| $r_1$                         | $\leftarrow$                                                                                                                                                 | $r_1 \ll 1$                                |
| $r_1$                         | $\leftarrow$                                                                                                                                                 | $r_1 \ll 1$                                |
| $r_2$                         | $\leftarrow$                                                                                                                                                 | $r_0$                                      |
| $r_2$                         | $\leftarrow$                                                                                                                                                 | b                                          |
| $r_2$                         | $\leftarrow$                                                                                                                                                 | $r_2 \wedge 3$                             |
| -                             |                                                                                                                                                              | . 2                                        |
| $r_1$                         | $\leftarrow$                                                                                                                                                 | $r_1 \vee r_2$                             |
| $r_1 \\ r_3$                  | ←<br>←                                                                                                                                                       | $r_1 \lor r_2 \\ r_0$                      |
| $r_1 \\ r_3 \\ r_3 \\ r_3$    | $\leftarrow \leftarrow \leftarrow \leftarrow$                                                                                                                | $r_1 \lor r_2$ $r_0$ $op[r_1]$             |
| $r_1$<br>$r_3$<br>$r_3$<br>d  | $\begin{array}{c} \leftarrow \\ \leftarrow \\ \leftarrow \\ \leftarrow \\ \leftarrow \\ \leftarrow \end{array}$                                              | $r_1 \lor r_2$ $r_0$ $op[r_1]$ $r_0$       |
| $r_1 \\ r_3 \\ r_3 \\ d \\ d$ | $\begin{array}{c} \downarrow \\ \downarrow $ | $r_1 \lor r_2$ $r_0$ $op[r_1]$ $r_0$ $r_3$ |

 $\begin{array}{l} \mathsf{DPL} \mbox{ macro for} \\ d = a \mbox{ op } b \end{array}$ 

| ► In this example we use the two LSB        |
|---------------------------------------------|
| Logical value 1 is 1 (01).                  |
| • Logical value 0 is $2 (10)$ .             |
| Precharge phases (activity: 1 if sensitive) |
| Evaluation phases (activity: 1)             |
| Masks (activity: normally 0)                |
| Shifts (activity: 2)                        |
| Concatenation (activity: 1)                 |
| ► Look-up (activity: 1 + 2)                 |

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |
|       |              |                |

DPL macro for d = a op b

| • | In this example we use the two LSB.         |
|---|---------------------------------------------|
|   | Logical value $1$ is $1 (01)$ .             |
| Þ | Logical value $0$ is $2 (10)$ .             |
|   | Precharge phases (activity: 1 if sensitive) |
|   | Evaluation phases (activity: 1)             |
|   | Masks (activity: normally 0)                |
|   | Shifts (activity: 2)                        |
| • | Concatenation (activity: 1)                 |
|   | Look-up (activity: 1 + 2)                   |

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |
|       |              |                |

DPL macro for d = a op b

| <ul> <li>In this example we use the two LSB.</li> <li>Logical value 1 is 1 (01).</li> <li>Logical value 0 is 2 (10).</li> </ul> |
|---------------------------------------------------------------------------------------------------------------------------------|
| Precharge phases (activity: 1 if sensitive)                                                                                     |
| Evaluation phases (activity: 1)                                                                                                 |
| Masks (activity: normally 0)                                                                                                    |
| Shifts (activity: 2)                                                                                                            |
| Concatenation (activity: 1)                                                                                                     |
| ► Look-up (activity: 1 + 2)                                                                                                     |

| $r_1$ | $\leftarrow$ | $r_0$          |
|-------|--------------|----------------|
| $r_1$ | $\leftarrow$ | a              |
| $r_1$ | $\leftarrow$ | $r_1 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_1$ | $\leftarrow$ | $r_1 \ll 1$    |
| $r_2$ | $\leftarrow$ | $r_0$          |
| $r_2$ | $\leftarrow$ | b              |
| $r_2$ | $\leftarrow$ | $r_2 \wedge 3$ |
| $r_1$ | $\leftarrow$ | $r_1 \vee r_2$ |
| $r_3$ | $\leftarrow$ | $r_0$          |
| $r_3$ | $\leftarrow$ | $op[r_1]$      |
| d     | $\leftarrow$ | $r_0$          |
| d     | $\leftarrow$ | $r_3$          |

 $\begin{array}{l} \mathsf{DPL} \mbox{ macro for} \\ d = a \mbox{ op } b \end{array}$ 

- 1. Bitslice code.
- 2. DPL macros expansion.
- 3. Look-up tables.

- Always possible (by Turing machines equivalence theorem)
- But, hard to do automatically in practice.
- However, there are a lot of already (manually) bitsliced implementations, since it is a common optimization technique.
- $\rightarrow\,$  We take already bitsliced code as input.

## Sensitive value

Definition

A *value* is said *sensitive* if it depends on sensitive data. A sensitive data depends on the secret key or the plaintext.

## Sensitive instruction

An *instruction* is said *sensitive* if it may modify the Hamming weight of a sensitive value.

- ► All the sensitive instructions must be expanded to a DPL macro.
- ► Thus, all the sensitive data must be transformed too.

- Bitsliced code means that only the logical (bit level) operators, except shifts, are used in sensitive instructions.
- DPL protocol implies that not instructions are replaced by xor.
- $\rightarrow$  Only and, or, and xor instructions need to be expanded to DPL macros.
- Addresses of the look-up tables are sensitive too: their indices are sensitive values.
- ► Thus, the addresses bits corresponding to the accessed cell must be 0.
- ▶ In our example, the look-up table addresses must be multiple of 16.

| index              | 0000,               | 0001,            | 0010,                 | 0011,            | 0100,                       | 0101,           | 0110,0111                              |
|--------------------|---------------------|------------------|-----------------------|------------------|-----------------------------|-----------------|----------------------------------------|
| and                | 00,                 | 00,              | 00,                   | 00,              | 00,                         | 01 ,            | 10 , 00                                |
| or                 | 00,                 | 00,              | 00,                   | 00,              | 00,                         | 01 ,            | 01 , 00                                |
| xor                | 00,                 | 00,              | 00,                   | 00,              | 00,                         | 10 ,            | 01 , 00                                |
|                    |                     |                  |                       |                  |                             |                 |                                        |
|                    |                     |                  |                       |                  |                             |                 |                                        |
| index              | 1000,               | 1001             | , 101                 | 0,10             | 11, 11(                     | 0, 1101         | , 1110, 1111                           |
| index<br>and       | 1000,<br>00 ,       | 1001             | , 101<br>, 10         | 0,10             | 11, 110<br>0 , 00           | )0,1101         | , 1110, 1111<br>, 00 , 00              |
| index<br>and<br>or | 1000,<br>00,<br>00, | 1001<br>10<br>01 | , 101<br>, 10<br>, 10 | 0,10<br>,0<br>,0 | 11, 110<br>0 , 00<br>0 , 00 | )0,1101<br>),00 | , 1110, 1111<br>, 00 , 00<br>, 00 , 00 |

- Addresses of the look-up tables are sensitive too: their indices are sensitive values.
- ► Thus, the addresses bits corresponding to the accessed cell must be 0.
- ▶ In our example, the look-up table addresses must be multiple of 16.

| index | 0000, | 0001, | 001 | D, O | 011,  | 010   | 0,  | 0101 | ,  | 011  | 0  | , ( | )111 |
|-------|-------|-------|-----|------|-------|-------|-----|------|----|------|----|-----|------|
| and   | 00,   | 00,   | 00  | , (  | 00,   | 00    | ),  | 01   | ,  | 10   |    | ,   | 00   |
| or    | 00,   | 00,   | 00  | , (  | 00,   | 00    | ),  | 01   | ,  | 01   |    | ,   | 00   |
| xor   | 00,   | 00,   | 00  | , (  | 00,   | 00    | ),  | 10   | ,  | 01   |    | ,   | 00   |
|       |       |       |     |      |       |       |     |      |    |      |    |     |      |
| index | 1000, | 1001  | , 1 | 010  | , 101 | 11, 1 | 110 | 0,11 | 01 | , 11 | 10 | , - | 1111 |
| and   | 00,   | 10    | ,   | LO   | , 0(  | ),    | 00  | , C  | 0  | , 0  | 0  | ,   | 00   |
| or    | 00,   | 01    | ,   | LO   | , 00  | Э,    | 00  | , C  | 0  | , 0  | 0  | ,   | 00   |
|       |       |       |     |      |       |       |     |      |    |      |    |     |      |

# Correct DPL transformation

Let S be a valid state of the system (values in registers and memory).

Let c be a sequence of instructions of the system.

Let  $\widehat{S}$  be the state of the system after the execution of c with state S, we denote that by  $S \xrightarrow{c} \widehat{S}$ .

We write dpl(S) for the DPL state equivalent to the state S.

We say that c' is a *correct DPL transformation* of the code c if  $S \xrightarrow{c} \widehat{S} \implies dpl(S) \xrightarrow{c'} dpl(\widehat{S}).$ 

## Correctness of our code transformation

The expansion of the sensitive instructions into DPL macros is a correct DPL transformation.

Proof in the paper.

Proposition

| a, b            | 10, 10 |      |    |    | Sensitive |
|-----------------|--------|------|----|----|-----------|
|                 | d      | r1   | r2 | r3 | activity  |
| mov r1 r0       | ?      | 0    | ?  | ?  | 0         |
| mov rl <i>a</i> | ?      | 10   | ?  | ?  | 1         |
| and r1 r1 #3    | ?      | 10   | ?  | ?  | 0         |
| shl r1 r1 #1    | ?      | 100  | ?  | ?  | 2         |
| shl r1 r1 #1    | ?      | 1000 | ?  | ?  | 2         |
| mov r2 r0       | ?      | 1000 | 0  | ?  | 0         |
| mov r2 <i>b</i> | ?      | 1000 | 10 | ?  | 1         |
| and r2 r2 #3    | ?      | 1000 | 10 | ?  | 0         |
| orr r1 r1 r2    | ?      | 1010 | 10 | ?  | 1         |
| mov r3 r0       | ?      | 1010 | 10 | 0  | 0         |
| mov r3 !r1, and | ?      | 1010 | 10 | 10 | 3         |
| mov $d$ r0      | 0      | 1010 | 10 | 10 | 0         |
| mov d r3        | 10     | 1010 | 10 | 10 | 1         |

| a, b            | 10,01 |      |    |    | Sensitive |
|-----------------|-------|------|----|----|-----------|
|                 | d     | r1   | r2 | r3 | activity  |
| mov r1 r0       | ?     | 0    | ?  | ?  | 0         |
| mov rl <i>a</i> | ?     | 10   | ?  | ?  | 1         |
| and r1 r1 #3    | ?     | 10   | ?  | ?  | 0         |
| shl r1 r1 #1    | ?     | 100  | ?  | ?  | 2         |
| shl r1 r1 #1    | ?     | 1000 | ?  | ?  | 2         |
| mov r2 r0       | ?     | 1000 | 0  | ?  | 0         |
| mov r2 <i>b</i> | ?     | 1000 | 01 | ?  | 1         |
| and r2 r2 #3    | ?     | 1000 | 01 | ?  | 0         |
| orr r1 r1 r2    | ?     | 1001 | 01 | ?  | 1         |
| mov r3 r0       | ?     | 1001 | 01 | 0  | 0         |
| mov r3 !r1, and | ?     | 1001 | 01 | 10 | 3         |
| mov $d$ r0      | 0     | 1001 | 01 | 10 | 0         |
| mov d r3        | 10    | 1001 | 01 | 10 | 1         |

| a, b            | 01,10 |      |    |    | Sensitive |
|-----------------|-------|------|----|----|-----------|
|                 | d     | r1   | r2 | r3 | activity  |
| mov r1 r0       | ?     | 0    | ?  | ?  | 0         |
| mov rl <i>a</i> | ?     | 01   | ?  | ?  | 1         |
| and r1 r1 #3    | ?     | 01   | ?  | ?  | 0         |
| shl r1 r1 #1    | ?     | 010  | ?  | ?  | 2         |
| shl r1 r1 #1    | ?     | 0100 | ?  | ?  | 2         |
| mov r2 r0       | ?     | 0100 | 0  | ?  | 0         |
| mov r2 <i>b</i> | ?     | 0100 | 10 | ?  | 1         |
| and r2 r2 #3    | ?     | 0100 | 10 | ?  | 0         |
| orr r1 r1 r2    | ?     | 0110 | 10 | ?  | 1         |
| mov r3 r0       | ?     | 0110 | 10 | 0  | 0         |
| mov r3 !r1, and | ?     | 0110 | 10 | 10 | 3         |
| mov $d$ r0      | 0     | 0110 | 10 | 10 | 0         |
| mov d r3        | 10    | 0110 | 10 | 10 | 1         |

| a, b            | 01, 01 |      |    |    | Sensitive |
|-----------------|--------|------|----|----|-----------|
|                 | d      | r1   | r2 | r3 | activity  |
| mov r1 r0       | ?      | 0    | ?  | ?  | 0         |
| mov rl <i>a</i> | ?      | 01   | ?  | ?  | 1         |
| and r1 r1 #3    | ?      | 01   | ?  | ?  | 0         |
| shl r1 r1 #1    | ?      | 010  | ?  | ?  | 2         |
| shl r1 r1 #1    | ?      | 0100 | ?  | ?  | 2         |
| mov r2 r0       | ?      | 0100 | 0  | ?  | 0         |
| mov r2 <i>b</i> | ?      | 0100 | 01 | ?  | 1         |
| and r2 r2 #3    | ?      | 0100 | 01 | ?  | 0         |
| orr r1 r1 r2    | ?      | 0101 | 01 | ?  | 1         |
| mov r3 r0       | ?      | 0101 | 01 | 0  | 0         |
| mov r3 !r1, and | ?      | 0101 | 01 | 01 | 3         |
| mov $d$ r0      | 0      | 0101 | 01 | 01 | 0         |
| mov d r3        | 01     | 0101 | 01 | 01 | 1         |

- > Our tool does this verification automatically for the whole program.
- ▶ It uses *symbolic computations* to keep track of possible leakages.
- The strategy is to *simulate* a CPU and memory in software, and *compute with sets* of values.
- ▶ Initially, all sensitive data values can be either 0 or 1.
- ► At each cycle and for each possible combination of actual values:
  - it looks at the Hamming weight of values and Hamming distance of updates in registers, memory, and addresses; and
  - if one can have different values, it reports a leak.
- ► This verification is independent from the code transformation.

|          | bitslice | DPL      | cost          |
|----------|----------|----------|---------------|
| code (B) | 1620     | 3056     | $\times 1.88$ |
| RAM (B)  | 288      | 352      | +64           |
| #cycles  | 78,403   | 235, 427 | $\times 3$    |
|          |          |          |               |

DPL cost.

- We attacked three implementations:
  - a bitsliced but unprotected one,
  - a DPL protected one using the two less significant bits,
  - ▶ a DPL protected one taking the hardware characterization into account.
- ▶ We took 100,000 execution traces.
- ▶ We computed the success rate of using *monobit* CPA of the output of the S-Box as a model.

- ▶ The unprotected implementation breaks using about 400 traces.
- $\blacktriangleright$  The poorly balanced one is still not broken using 100,000 traces.
- $\rightarrow$  But we want to show that the hardware characterization is beneficial!
- Let's make the attacker "cheat".
- ▶ We used our knowledge of the key to select a narrow part of the traces where we knew that the attack would work.
- We used the NICV to select the point where the signal-to-noise ratio of the CPA attack is the highest.

- ▶ The unprotected implementation breaks using about 400 traces.
- $\blacktriangleright$  The poorly balanced one is still not broken using 100,000 traces.
- $\rightarrow$  But we want to show that the hardware characterization is beneficial!
- Let's make the attacker "cheat".
- We used our knowledge of the key to select a narrow part of the traces where we knew that the attack would work.
- We used the NICV to select the point where the signal-to-noise ratio of the CPA attack is the highest.

- ▶ The unprotected implementation breaks using 138 traces.
- The poorly balanced one breaks using 1,470 traces.
- The better balanced one breaks using 4,810 traces.



Monobit CPA attack on unprotected bitslice implementation.



Monobit CPA attack on poorly balanced DPL implementation (bits 0 and 1).



Monobit CPA attack on better balanced DPL implementation (bits 1 and 2).