ardrand/tex/status.tex

327 lines
16 KiB
TeX

\documentclass[a4paper]{article} % Switch to report for front page
\usepackage{amsmath,amsfonts,amsthm,amssymb}
\usepackage[utf8]{inputenc}
%\usepackage[icelandic]{babel}
\usepackage[T1]{fontenc}
\usepackage{setspace} % Allows more fine-grained control over line spacing
\usepackage{fancyhdr} % Headers and footers
\usepackage{lastpage} % Allows references to the last page of the document
\usepackage{chngpage} % Change format mid-page ?
\usepackage{soul} % Highlights text, with \hl{}
\usepackage[usenames,dvipsnames]{color} % Add color to text
\usepackage{graphicx,float,wrapfig}
\usepackage{ifthen} % \ifthenelse{}
\usepackage{listings}
%\usepackage{courier} % Write in a monospace font
%\usepackage{geometry} % Because 'fullpage' is outdated
\usepackage{hyperref}
\newcommand{\Title}{The Arduino as a Hardware Random-Number Generator}
\newcommand{\SubTitle}{Status Report}
\newcommand{\DueDate}{\today} % Or \today
\newcommand{\Class}{Ardrand}
\newcommand{\AuthorClearSpace}{3in} % How far below the title the author name shoudl appear
\newcommand{\ClassInstructor}{Ýmir Vigfússon}
\newcommand{\AuthorName}{Benedikt Kristinsson}
\newcommand{\DueLang}{} % Icelandic (perhaps some ifelse on language pack)
%\newcommand{\DueLang}{Due on} % English
%\topmargin=-0.45in
%\evensidemargin=0in
%\oddsidemargin=0in
%\textwidth=6.5in
%\textheight=9.0in
%\headsep=0.25in
% This is the color used for comments below
\definecolor{MyDarkGreen}{rgb}{0.0,0.4,0.0}
\definecolor{MyDarkRed}{rgb}{0.4,0.0,0.0}
\lstloadlanguages{Python}
\lstset{language=Python,
%frame=single, % Single frame around code
%basicstyle=\small\ttfamily, % Use small true type font
basicstyle=\small, % Don't use ttf
keywordstyle=[1]\color{Blue}, %\bf % functions green (bold commented out)
keywordstyle=[2]\color{Green}, % function arguments purple
keywordstyle=[3]\color{Red}\underbar, % User functions underlined and blue
identifierstyle=,
commentstyle=\usefont{T1}{pcr}{m}{sl}\color{MyDarkRed}\small,
stringstyle=\color{MyDarkGreen}, % Strings
showstringspaces=false, % Don't put marks in string spaces
tabsize=4,
% To add more keywords
%morekeywords={},
% Function parameters
%morekeywords=[2]{on, off, interp},
%%% Put user defined functions here
%morekeywords=[3]{FindESS, homework_example},
%
%morecomment=[l][\color{Grey}]{...}, % Line continuation (...) like blue comment
%numbers=left, % Line numbers on left
%firstnumber=1, % Line numbers start with line 1
%numberstyle=\tiny\color{Grey}, % Line numbers are blue
%stepnumber=1 % Line numbers go in steps of 1
}
% Setup the header and footer
\pagestyle{fancy} % Pagestyle allows for header/foother
%\pagestyle{plain} % No header/footerer
\lhead{\AuthorName}
\chead{\SubTitle}
\rhead{\Class}
%\cfoot{\thepage}
% \rfoot{Page\ \thepage\ of\ \protect\pageref{LastPage}}
\renewcommand\headrulewidth{0.4pt}
%\renewcommand\footrulewidth{0.4pt}
% This is used to trace down (pin point) problems in latexing a document:
%\tracingall
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Some tools
% Includes a figure
% The first parameter is the label, which is also the name of the figure
% with or without the extension (e.g., .eps, .fig, .png, .gif, etc.)
% IF NO EXTENSION IS GIVEN, LaTeX will look for the most appropriate one.
% The second parameter is the width of the figure normalized to column width
% (e.g. 0.5 for half a column, 0.75 for 75% of the column)
% The third parameter is the caption.
\newcommand{\scalefig}[3]{
\begin{figure}[ht!]
% Requires \usepackage{graphicx}
\centering
\includegraphics[width=#2\columnwidth]{#1}
%%% I think \captionwidth (see above) can go away as long as
%%% \centering is above
%\captionwidth{#2\columnwidth}%
\caption{#3}
\label{#1}
\end{figure}}
% Includes code
% The first parameter is the label, which also is the name of the script
% with the file extension the '#1' in the command belove
% The second parameter is the optional caption.
\newcommand{\code}[2]
{\begin{itemize}\item[]\lstinputlisting[caption=#2,label=#1]{#1}\end{itemize}}
\setcounter{secnumdepth}{0}
\newcommand{\problem}[2]
{
\subsubsection*{\sc{#1}}
#2
}
\newcommand{\tmpsection}[1]{}
\let\tmpsection=\section
\renewcommand{\section}[2]{
\ifthenelse{
\equal{#2}{Heimildir} % I have to be oddly specific here
}
{
\tmpsection{\sc{#1} }
\tmpsection{\sc{#2} }
}
{\tmpsection{\sc{#1} } }
}
\title{
\Class:\ \Title
\ifthenelse{\equal{\SubTitle}{}}{}{\\{\SubTitle}}
}
\date{\small{\DueLang\ \DueDate}}
\author{\AuthorName\\Advisor: \ClassInstructor}
\begin{document}
\maketitle
% Uncomment the \tableofcontents and \newpage lines to get a Contents page
% Uncomment the \setcounter line as well if you do NOT want subsections
% listed in Contents
% Remeber to compile twice
%\setcounter{tocdepth}{1}
%\tableofcontents
%\newpage
%\clearpage
%x\section{Lausn verkefnis og útfærsla}
\section{Introduction}
It has been somewhat of an urban legend surrounding the Arduino that it can easily be used as a hardware random number generator. The idea for this project came while working on the TSeense project\footnote{\url{http://code.google.com/p/tsense}} in the summer of 2010.
The Arduino has 6 analog inputs (see \url{http://arduino.cc/en/Reference/AnalogRead}) that are 10-bit analog to digital converters.
When not connected to anything these outputs should be influenced by analog noise, much in the same was a static on a radio that is not tuned in to anything. This is dependent on the design of the connectors and some other factors.
\subsection{Goals}
The original, yet optimistic, goal was to design and implement a True Random-Number Generator on the Arduino. As the project has progressed it has become evidently clear that it is not possible to do this properly (without adding some hardware, such as a diode and read the noise in its p-n junction).
However, Arduino themselves claim in the Arduino Reference Manual\footnote{\url{http://arduino.cc/en/Reference/HomePage}} that \texttt{analogRead()} can be used to generate a ``fairly random'' integer to seed the PRNG algorithm\footnote{\url{http://arduino.cc/en/Reference/Random}}. I claim that the vanilla output of \texttt{analogRead()} is not even ``fairly random''. The goal of the project is now to show this and ultimately write a program that given a PRNG sequence from the Arduino, finds the seed value.
\section{Attempts at harvesting entropy}
The Arduino toolkit has the \texttt{analogRead()} function that reads from a given analog pin on the board and returns a 10-bit integer. This is what we are trying to use in order to extract entropy on the Arduino. The function maps voltages between 5 and 0 volts to integers in the range $[0..1023]$.
The raw readings should be influenced by analog noise but our experiments have shown that the output is fairly regular, although operating conditions can influence it greatly. We see a much wider range of values when operating in either heat or cold (such as a heating element or freezer). As Figures 3 and 4 show, there are certain interference patterns that show up (they are rather interesting). Although we are not sure what causes them, electrical fluctuations seem like a worthy candidate. In the case of the fridge, it stops ``operating'' when it reaches a certain temperature and starts up again when the temp rises too high. It seems very likely that it gives away electromagnetical waves that ``disturb'' the readings of \texttt{analogRead()}. These patterns might also be a product of the nature of the analog pins themselves, or their manufacturing process. Unfortunately we bricked one Arduino by leaving it in the fridge overnight.
As we can see in Figure 1, the output of \texttt{analogRead()} spans a very short range. The different locations even give off very similar readings at the same or similar temperatore. Notice that the purple values are slightly different --- they are collected inside a standard Dragon computer case. The temperature is slightly above room temperature and it might be affected by electromagnectic waves amongst other things.
Figure 2 shows a short span of a sample collected at room temperature in a bedroom. Just by staring at the plot for a minute or two, we can see that there isn't much going on --- the entropy is very low. This causes very low performance in generating the random bits. We have seen bitrates between 3 bps and 25 bps.
This data should disprove the claims put forth in the Arduino reference manual.
\subsection{Requires further investigation}
Note the drop at the beginning for all readings. This needs to be investigated further. We have collected some data but nothing conclusive yet.
\section{The (in)feasibility of the Arduino as a TRNG}
So far we have implemented and tested roughly three different algorithms to generate random bits. So far none has even passed the monobit test according to the FIPS specifications\footnote{See Menezes Ch.5 and FIPS 140-1}.
\subsection{RAND Algorithms}
Two algorithms have been implemented and tested. One more has been implemented but not tested enough (due to the unfortunately-timed-bricked Arduino).
The \texttt{Mean-RAND} algorithm is implemented by keeping a list of the $k$ last values and their mean. Then we compare the new reading to the mean and evalute to 0 if it is less, otherwise 1. To remove bias and lessen correlation we run it through the von Neumann-box.
The von Neumann box is a function that inputs two bits. If we pass 00 or 11 to it, it discards the bits. If we input 01 then it outputs a 0 and if 10 then it outputs a 1.
\begin{lstlisting}[caption=The \texttt{Mean-RAND} algorithm in Python-ish pseudocode]
def meanrand(n):
buf = deque([0]*k)
meanval = sum(buf)/len(buf)
for i in [0..n]:
while True:
meanval -= buf.pop()/k
buf.push(analogRead())
meanval += buf[-1]/k
m = ceil(meanval)
b0 and b1 = 1 if analogRead() > m else 0
if b0 == b1:
discard
else:
break # Break out of the vN-box
if b0 == 1:
yield '1'
else:
yield '0'
\end{lstlisting}
The \texttt{Updown-RAND} algorithm first reads an initial value $v_0$ which is then used to determins if the next bit value $v_1$ is 1 if $v_1 > v_0$ and 0 otherwise. We do this twice, i.e. we collect $v_{1,0}$ and $v_{1,1}$ and compare them with the von Neumann box until we obtain a legit bit.
\begin{lstlisting}[caption=The \texttt{Updown-RAND} algorithm]
def updownrand(n):
v0 = analogRead()
for i in [0..n]:
while True:
v10 and v11 = 1 if analogRead() > v0 else 0
if v10 == v11:
discard
else:
break # Break out of the vN-box
if v10 == 1:
yield '1'
else:
yield '0'
\end{lstlisting}
A third algorithm, \texttt{MixUpDownMean-RAND} has been implemented but we do not have any conclusive results on it yet, since the Arduino meant for testing it is bricked. It works by generating one bit $b_M$ from \texttt{Mean-RAND} and another bit $b_U$ from \texttt{Updown-RAND} and then return $b = b_M \oplus b_U$
\begin{lstlisting}[caption=The \texttt{MixUpDownMean-RAND} algorithm]
def mixrand(n):
m = meanrand(n)
u = updownrand(n)
for i in [0..n]:
while True:
bm = m.next()
bu = u.next()
if bm == bu:
discard
else:
break # Break out of the vN-box
yield bm^bu
\end{lstlisting}
\subsection{Testing for entropy}
The Monobit test measures if the number of 1's and 0's are approximately the same, which we would expect from a random bit sequence. We let $n_1$ and $n_o$ denote the number of 1's and 0's, where $n = n_0 + n_1$. Then the stastic used is
\[
X_1 = \frac{(n_0 - n_1)^2}{n}.
\]
The FIPS 140-1 docmument specifies certain requirements that a RNG has to satisfy. The odd thing is that FIPS doesnt say anything about the statistic $X_1$ itself for the Monobit test. Rather, it says that a bitstring $s$ of length 20,000 has to to have a value $n_1$ that satisfies
\[
9654 < n_1 < 10346.
\]
Neither the \texttt{Mean-RAND} or \texttt{Updown-RAND} have been able to produce a satisfying $n_1$. Further, due to the lack of entropy, generating 20,000 bits has taken anywhere between 13 minutes to 108 minutes in our experiments (Or, 3 bps to 25 bps).\\
\begin{centering}
\begin{tabular}{l l}
Algorithm & $n_1$ (best observed case) \\
\hline
\hline
\texttt{Mean-RAND} & Fill me in \\
\texttt{Updown-RAND} & Fill me in too \\
\hline
& \\
\end{tabular}
\end{centering}
Until we have something that actually passes the Monobit test (the most basic of the tests) we will not use more advanced tests.
\subsection{Disproving claims made by Arduino}
Since we have found that using a vanilla Arduino as a RNG is infeasible, this changes the goal of the project. Instead we will show that what the Arduino manual claims to be a ``fairly random'' seed is, on the contrarly, not that random at all and a very bad seed source.
First off, \texttt{analogRead()} returns a 10-bit integer. The \texttt{randomSeed()} function inputs a regular 32-bit integeger, i.e. there are 22 bits left ``unutilized''. Instead of a full range from $[0..2^{32}]$ we have the much smaller range $[0..2^{10}]$.
By looking at Figure 1, we can see that for almost every single reading under normal operating conditions we read a value roughly in the range $[210..375]$. This means that a possible adversary would only have to try roughly 160 different seeds or psuedo-random sequences. With a truly random seed, he would have to try $2^{32}$ different seeds, but in the case of \texttt{analogRead()} he can do much better.
Even under the more extreme conditions there is a fairly low number of seed values or sequences the adversary would have to try. In the case of the freezer, we see output in the range of $[50..410]$, that is 360 possible values. The most interesting case is the heating element, were the large cluster of values is in the interval $[200..500]$ but we see values all the way down to 0 much more commonly than in the case of the room temperature experiments.
The drops in the beginning have to be investigated further before we can say anything conclusive about those values. The bottom values appear rather linerally and non-randomly and seem to happen when we read with short enough intervals and the serial buffer fills up or sends malformed data (We have found out heuristically (?) that $0.02\%$ of all readings are either empty of malformed strings).
\subsection{Fallback}
The original goal obviously doesn't hold any more and we have shown that using \texttt{analogRead()} as a seed source is a bad idea. The fallback-plan is to implement a program that finds the seed given a PRNG-sequce.
\section{Figures}
Will be small in print, buf if you are reading the PDF you can zoom in (.png pictures)
\scalefig{img/RoomTempOverlay.png}{0.5}{Readings of \texttt{analogRead()} at room temperature}
\scalefig{img/Room_1500-1700_zoom.png}{1}{Reading at room temp. Shows $x = [1500, 1700]$}
\scalefig{img/Fridge50k.png}{0.5}{Readings inside a fridge at $1^\circ$C}
\scalefig{img/Freezer10k.png}{0.5}{Readings inside a freezer at $-11^\circ$C}
\scalefig{img/HeatingElement50k.png}{0.5}{Readings on a heating element at approx $40^\circ$C}
\end{document}