
\section{ARM}

ARM was originally developed by Acorn Computers; the name first stood for Acorn RISC Machine and later for Advanced RISC Machine. Several
versions of the instruction set exist. This report only gives an overview of
the fundamental concepts of the architecture.

\subsection{Application Areas}

ARM generally uses a fixed instruction length of 32 bits; the compressed Thumb instruction set encodes instructions in only 16 bits.
There are 31 general-purpose registers, of which 16 can be addressed directly through the register specifiers in the instructions.
The remaining registers are banked copies that speed up exception handling. Data-processing instructions name one destination
and two source operand registers, and memory is accessed only through load and store instructions. The ARM architecture therefore
describes a register machine with a RISC architecture.
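
Because every register specifier is a 4-bit field, exactly 16 registers can be named directly in an instruction. The C sketch below extracts the fields of a data-processing instruction whose second operand is a plain register. It deliberately ignores immediates, shifted operands and all other instruction classes, and the names \texttt{DataProc} and \texttt{decode\_dp} are hypothetical, chosen only for illustration.

\begin{lstlisting}[language=C,caption={Sketch: decoding the register fields of a data-processing instruction (C)}]
#include <assert.h>
#include <stdint.h>

/* Fields of an ARM data-processing instruction with a register as
 * second operand (immediates and shifted operands are not handled). */
typedef struct {
    unsigned cond;   /* bits 31..28: condition field            */
    unsigned opcode; /* bits 24..21: operation, e.g. 0x4 = ADD  */
    unsigned s;      /* bit 20: update the condition flags?     */
    unsigned rn;     /* bits 19..16: first operand register     */
    unsigned rd;     /* bits 15..12: destination register       */
    unsigned rm;     /* bits 3..0: second operand register      */
} DataProc;

/* Extract `width` bits of `insn` starting at bit `lsb`. */
static unsigned field(uint32_t insn, unsigned lsb, unsigned width) {
    return (unsigned)((insn >> lsb) & ((1u << width) - 1u));
}

/* Every register field is 4 bits wide, so it can only name R0..R15. */
DataProc decode_dp(uint32_t insn) {
    assert(field(insn, 26, 2) == 0u); /* bits 27..26 must be 00 for data processing */
    DataProc d;
    d.cond   = field(insn, 28, 4);
    d.opcode = field(insn, 21, 4);
    d.s      = field(insn, 20, 1);
    d.rn     = field(insn, 16, 4);
    d.rd     = field(insn, 12, 4);
    d.rm     = field(insn, 0, 4);
    return d;
}
\end{lstlisting}

Applied to \texttt{0xE0820003}, the encoding of \texttt{ADD R0, R2, R3}, the sketch yields the condition \texttt{0xE} (always), the opcode \texttt{0x4} and the registers \texttt{rd = 0}, \texttt{rn = 2} and \texttt{rm = 3}. Changing only the condition field to \texttt{0x0} gives \texttt{ADDEQ R0, R2, R3} (\texttt{0x00820003}).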

Registers \texttt{R13}, \texttt{R14} and \texttt{R15} have special roles: \texttt{R13} is the stack pointer, \texttt{R14} the link register and \texttt{R15} the program counter.
Nevertheless, these registers can also be written by ordinary instructions. On a subroutine call (\texttt{BL}), the link register receives
the return address.
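
The use of the link register for calls and returns can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: \texttt{r[PC]} holds the address of the instruction currently being executed (real ARM code reading \texttt{R15} sees that address plus 8), only ARM state is modelled, and \texttt{Cpu}, \texttt{bl} and \texttt{bx\_lr} are hypothetical names.

\begin{lstlisting}[language=C,caption={Sketch: call and return through the link register (C)}]
#include <assert.h>
#include <stdint.h>

/* Special-purpose registers among the 16 visible ones. */
enum { SP = 13, LR = 14, PC = 15 };

/* Hypothetical register file. In this sketch r[PC] is the address of the
 * instruction being executed, not the "+8" value ARM code reads from R15. */
typedef struct {
    uint32_t r[16];
} Cpu;

/* BL target: save the address of the following instruction (4 bytes
 * further in ARM state) in the link register, then jump to target. */
void bl(Cpu *cpu, uint32_t target) {
    cpu->r[LR] = cpu->r[PC] + 4u;
    cpu->r[PC] = target;
}

/* BX LR: return by copying the saved address back into the PC.
 * Bit 0 set would switch to Thumb state, which is not modelled here. */
void bx_lr(Cpu *cpu) {
    assert((cpu->r[LR] & 3u) == 0u);
    cpu->r[PC] = cpu->r[LR];
}
\end{lstlisting}

A call from address \texttt{0x8000} to \texttt{0x9000} leaves \texttt{0x8004} in \texttt{R14}, and \texttt{bx\_lr} continues execution there. Because a nested call would overwrite \texttt{R14}, a subroutine that itself calls other subroutines has to save the link register first, typically on the stack addressed by \texttt{R13}.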

ARM offers a wide range of cores for different application areas.
For example, the ARM710 family is designed for hand-held devices and other multimedia applications.
The more powerful ARM10 family provides a Vector Floating-Point (VFP) unit. The Cortex-A8, which implements the ARMv7 instruction set,
powers the iPhone 3GS, and the Nintendo DS also uses ARM processors. Various Linux distributions run on the more powerful ARM processors.
According to a 2007 report, about 98 percent of the mobile phones sold contain at least one ARM processor (Wikipedia).

%\subsection{Where are processors that implement the ISA deployed? In embedded systems (microcontrollers,
%communication, multimedia), in servers, in desktop computers?}

%ARM offers a big variety of cores for usage in different areas.
%The ARM710 family is designed for usage in hand-helds and multimedia. The more powerful ARM10 family provides a Vector Floating Point unit.
%The Cortex A8, which uses the ARMv7 ISA, powers the iPhone 3GS. Many Linux distributions run on more powerful ARM processors.
%Nintendo DS uses an ARM core.
%As of 2007, about 98 percent of the more than one billion mobile phones sold each year use at least one ARM processor (Wikipedia).

%\subsection{Does the ISA describe an accumulator, a register, or a stack machine and does it describe a CISC or RISC architecture?}

%Except for the Thumb version, the instructions have a fixed length of 32 bits and can address one destination and two
%operand registers at once. ARM offers 31 general-purpose registers, of which 16 can be addressed at any time by the
%register specifiers in the instructions. The other registers are used to speed up exception processing.
%So this describes a register machine with a RISC architecture.

%Registers R13, R14 and R15 are used as the Stack Pointer (R13), the Link Register (R14) and the Program Counter (R15),
%but they can be addressed via the normal instruction set. The Link Register contains the return address in case of a
%subroutine call.

\subsection{Conditional Instructions and Jumps}

In contrast to older RISC architectures, ARM does not use branch delay slots, so branch latencies are handled by the hardware
and are not visible at the software level. In addition, ARM offers predicated instructions, which are executed only if their
predicate (condition) holds. Almost every ARM instruction carries a 4-bit condition field. In theory this allows
16 different conditions, of which 15 are used.
A compare instruction (\texttt{CMP}) tests a condition and records the result in the condition flags of the
status register (for example \texttt{CMP R1, R2}).
It can then be followed by, for example, \texttt{ADDEQ R0, R2, R3} and \texttt{ADDNE R7, R2, R3}, which implements the following code:
\begin{lstlisting}[caption=Example]{}
if (R1 == R2)
        R0 = R2 + R3;
else
        R7 = R2 + R3;
\end{lstlisting}
Small if-blocks can thus be mapped to a sequence of instructions without any branch.
If the condition of an instruction is not met, the instruction has no effect, i.e.\ it behaves like a \texttt{NOP}.
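
To make the mechanism concrete, the following C sketch models \texttt{CMP} as a subtraction that only updates the N, Z, C and V flags, and then executes \texttt{ADDEQ} and \texttt{ADDNE} from the example above. The \texttt{if} statements stand for the hardware's condition check and are not branches in the ARM code. The names \texttt{Flags}, \texttt{cmp} and \texttt{predicated\_example} are hypothetical.

\begin{lstlisting}[language=C,caption={Sketch: CMP followed by ADDEQ/ADDNE (C)}]
#include <stdbool.h>
#include <stdint.h>

/* Condition flags of the status register that CMP writes. */
typedef struct { bool n, z, c, v; } Flags;

/* CMP rn, op2: compute rn - op2, discard the result, keep only the flags. */
Flags cmp(uint32_t rn, uint32_t op2) {
    uint32_t res = rn - op2;
    Flags f;
    f.n = (res >> 31) & 1u;                       /* result negative    */
    f.z = (res == 0u);                            /* result zero        */
    f.c = (rn >= op2);                            /* no borrow occurred */
    f.v = (((rn ^ op2) & (rn ^ res)) >> 31) & 1u; /* signed overflow    */
    return f;
}

/* Models the sequence
 *     CMP   R1, R2
 *     ADDEQ R0, R2, R3
 *     ADDNE R7, R2, R3
 * on the register array r. Both ADDs are issued; each one only writes
 * its destination if its condition holds (otherwise it acts as a NOP). */
void predicated_example(uint32_t r[16]) {
    Flags f = cmp(r[1], r[2]);
    if (f.z)  r[0] = r[2] + r[3];   /* ADDEQ: condition EQ (Z set)   */
    if (!f.z) r[7] = r[2] + r[3];   /* ADDNE: condition NE (Z clear) */
}
\end{lstlisting}

With \texttt{R1 = R2 = 5} and \texttt{R3 = 7}, only \texttt{R0} becomes 12; with \texttt{R1 = 4}, only \texttt{R7} becomes 12. In both cases the same three ARM instructions pass through the pipeline.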

Another possible predicate is Always (\texttt{AL}), in which case no condition in the status register is checked.
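
The 15 used conditions and their dependence on the status flags can be written as a single C function. In this sketch, the name \texttt{condition\_passed} and the \texttt{Nzcv} type are hypothetical. The unused 16th encoding \texttt{0xF} is treated as ``never'', its original meaning, although later architecture versions reuse that encoding for unconditional instructions.

\begin{lstlisting}[language=C,caption={Sketch: evaluating the 4-bit condition field (C)}]
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status flags N, Z, C, V as set by CMP and other flag-setting instructions. */
typedef struct { bool n, z, c, v; } Nzcv;

/* Decide whether an instruction with condition field `cond` executes.
 * If it does not, it has no effect (it behaves like a NOP). */
bool condition_passed(unsigned cond, Nzcv f) {
    assert(cond <= 0xFu);
    switch (cond) {
    case 0x0: return f.z;                /* EQ: equal              */
    case 0x1: return !f.z;               /* NE: not equal          */
    case 0x2: return f.c;                /* CS/HS: unsigned >=     */
    case 0x3: return !f.c;               /* CC/LO: unsigned <      */
    case 0x4: return f.n;                /* MI: negative           */
    case 0x5: return !f.n;               /* PL: positive or zero   */
    case 0x6: return f.v;                /* VS: overflow           */
    case 0x7: return !f.v;               /* VC: no overflow        */
    case 0x8: return f.c && !f.z;        /* HI: unsigned >         */
    case 0x9: return !f.c || f.z;        /* LS: unsigned <=        */
    case 0xA: return f.n == f.v;         /* GE: signed >=          */
    case 0xB: return f.n != f.v;         /* LT: signed <           */
    case 0xC: return !f.z && f.n == f.v; /* GT: signed >           */
    case 0xD: return f.z || f.n != f.v;  /* LE: signed <=          */
    case 0xE: return true;               /* AL: always             */
    default:  return false;              /* 0xF: treated as "never" here */
    }
}
\end{lstlisting}

After \texttt{CMP R1, R2} with equal operands, Z and C are set, so EQ (\texttt{0x0}) and AL (\texttt{0xE}) pass while NE (\texttt{0x1}) fails.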

%\subsection{Are latencies handled by the hardware or are they visible at the ISA level? For example,
%branch delay slots expose the latencies of branches at the ISA level.}

%Unlike older RISC systems, ARM doesn't have a branch delay slot, so latencies are not visible at the software level.
%Conditional instructions allow small if-blocks to be implemented in a fast and compact way.

%\subsection{Are conditional branches (compute condition and jump) performed in a single step, or
%are testing and branching unbundled?}

%Every ARM instruction has a 4-bit condition field in its top bits (31 to 28). The predicated instruction is only executed when the condition
%matches the flags set in the status register. For example, the bit vector $(0000)_2$ means equal, $(0001)_2$ means not equal and $(1110)_2$ means always.
%In case the condition doesn't hold, the instruction behaves like a NOP.
%To update the status register a compare instruction is needed. So you first have to test the condition and afterwards
%you can place as many conditional statements as needed.

\subsection{Goals}

Since this architecture is increasingly used in mobile multimedia devices, its goals lie in
the following areas: performance, die area, energy efficiency and code size. The list could be
extended, but these are the most important points.
Energy efficiency is important for long battery life and because such devices have no room for cooling equipment.
Small code size is achieved through predicated instructions and through instructions that address a destination
and two operand registers. This also benefits the performance of the processor. However, because ARM is a
load/store architecture, variables must always be loaded into registers first. The load-multiple instruction (\texttt{LDM})
reduces this overhead by loading several registers with a single instruction.
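
As a rough illustration of the code-size argument, consider the if-else example from the previous subsection. Assume a straightforward translation with branches (\texttt{CMP}, \texttt{BNE}, \texttt{ADD}, \texttt{B}, \texttt{ADD}), the predicated version (\texttt{CMP}, \texttt{ADDEQ}, \texttt{ADDNE}), and 4 bytes per instruction:

\[
\begin{array}{rcl}
S_{\mathrm{branch}} &=& 5 \cdot 4\,\mathrm{B} = 20\,\mathrm{B}\\
S_{\mathrm{pred}}   &=& 3 \cdot 4\,\mathrm{B} = 12\,\mathrm{B}
\end{array}
\]

The predicated version also avoids taken branches, which cost additional cycles on a pipelined core.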

%\subsection{What are the goals of the architecture? Performance, die area, energy efficiency, code
%size, . . .  and how are these goals reflected in the ISA?}

%Because of the extensive use in mobile and multimedia devices, the goals are all of the mentioned parts.
%Energy efficiency plays a huge role, as you don't have the possibility to add a cooling device to your smartphone.
%Code size is reflected in the ISA by using predicated instructions. The code gets much smaller when you don't need to
%add a branch for every small if-else-block. The possibility of accessing 2 operand registers in a computation instruction
%and the load multiple instruction increase the performance of the system.

\subsection{Strengths and Weaknesses}

In my opinion, predicated instructions are implemented very cleverly, because the execution of an instruction
can be made dependent on a wide range of conditions. Moreover, conditional instructions make it possible
to achieve a constant execution time for an algorithm. This simplifies worst-case execution time (WCET) analysis in safety-critical real-time systems.
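
The idea behind a constant execution time can be illustrated in C. The sketch below rewrites the if-else example without a data-dependent branch. Both updates are always computed, and a bit mask decides which one takes effect, much as \texttt{ADDEQ} and \texttt{ADDNE} always pass through the pipeline. The function name \texttt{if\_else\_branchless} is hypothetical. A compiler is not guaranteed to keep this code branch-free or to turn it into conditional ARM instructions.

\begin{lstlisting}[language=C,caption={Sketch: branch-free form of the if-else example (C)}]
#include <stdint.h>

/* Branch-free version of
 *     if (R1 == R2) R0 = R2 + R3; else R7 = R2 + R3;
 * Both destination updates are always computed; the mask decides which
 * one actually changes its register, similar to ADDEQ/ADDNE. */
void if_else_branchless(uint32_t r[16]) {
    /* all ones if R1 == R2, otherwise all zeros */
    uint32_t eq  = (uint32_t)0 - (uint32_t)(r[1] == r[2]);
    uint32_t sum = r[2] + r[3];
    r[0] = (sum & eq)  | (r[0] & ~eq);   /* takes effect only if equal     */
    r[7] = (sum & ~eq) | (r[7] & eq);    /* takes effect only if not equal */
}
\end{lstlisting}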
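
A negative aspect is the lack of dedicated shift and rotate instructions. Instead, shifts are built into the individual instructions:
the barrel shifter can apply a shift or rotation to the second operand of an instruction, so a standalone shift is written as a
\texttt{MOV} with a shifted operand (e.g.\ \texttt{MOV R0, R1, LSL \#2}). This also has advantages: a shift can be combined with an
arithmetic operation in a single instruction, as in \texttt{add r0, r1, r0, LSL\#2} in the example code below.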
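
The effect of the barrel shifter on the second operand can be sketched in C as follows. This is a simplified model. Only shift amounts from 0 to 31 are handled, and an amount of 0 means ``no shift''. The special encodings \texttt{LSR \#32}, \texttt{ASR \#32} and \texttt{RRX} and the shifter's carry output are ignored. The names \texttt{barrel\_shift} and \texttt{add\_shifted} are hypothetical.

\begin{lstlisting}[language=C,caption={Sketch: barrel shifter applied to the second operand (C)}]
#include <assert.h>
#include <stdint.h>

/* Shift types of a register operand (encoded in bits 6..5). */
typedef enum { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 } ShiftType;

/* Simplified barrel shifter: amounts 0..31 only, 0 means "no shift";
 * LSR #32, ASR #32, RRX and the carry output are not modelled. */
uint32_t barrel_shift(uint32_t x, ShiftType type, unsigned amount) {
    assert(amount < 32u);
    if (amount == 0u)
        return x;
    switch (type) {
    case SHIFT_LSL: return x << amount;
    case SHIFT_LSR: return x >> amount;
    case SHIFT_ASR: /* shift right, filling the vacated bits with the sign bit */
        return (x >> amount) | ((x & 0x80000000u) ? ~(0xFFFFFFFFu >> amount) : 0u);
    case SHIFT_ROR: return (x >> amount) | (x << (32u - amount));
    }
    return x; /* unreachable for valid shift types */
}

/* ADD rd, rn, rm, <type> #amount: shift the second operand and add,
 * all within one instruction. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, ShiftType type, unsigned amount) {
    return rn + barrel_shift(rm, type, amount);
}
\end{lstlisting}

With \texttt{rn} holding the array address and \texttt{rm} holding \texttt{len}, \texttt{add\_shifted(rn, rm, SHIFT\_LSL, 2)} yields the end address $\mathit{rn} + 4 \cdot \mathit{len}$, which the example code computes with a single \texttt{add}.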

%\subsection{Which features of the ISA do you like, which features would you change?}

%The usage of the predicated statements is very ingenious, because it firstly reduces the code size and secondly is a very important
%feature for WCET analysis. It allows you to implement algorithms with a constant execution time, which is essential in safety-critical and fault-
%tolerant real-time systems.
%The barrel shifter (shift as part of an instruction) is great, but a single shift operation is missing.

\subsection{Implementation of the Example Code}

\begin{lstlisting}[caption=ARM Code]{ARM-Code}
        ; r0 = len, r1 = pointer to arr
        ; returns the sum of the len words at arr in r0
        cmp r0, #0
        bxeq lr                ; len == 0: r0 already holds 0, return it
        add r0, r1, r0, LSL#2  ; r0 = end pointer = arr + 4*len
        mov r2, #0             ; r2 = running sum, starts at 0
loop:
        ldr r3, [r1], #4       ; load element, then advance r1 by 4
        add r2, r2, r3         ; sum += element
        cmp r1, r0             ; end pointer reached?
        bne loop
        mov r0, r2             ; return the sum in r0
        bx lr
\end{lstlisting}
Each loop iteration executes a load, an add, a compare and a branch-if-not-equal (\texttt{bne}).
Implemented on an ARM7TDMI, which uses a three-stage pipeline, this gives the following clock cycle counts
(taken from the processor's manual):

\texttt{ldr} 3, \texttt{add} 1, \texttt{cmp} 1, \texttt{bne} 3 (taken), so one loop iteration requires eight cycles.
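
Written out, the loop body for $n \geq 1$ array elements needs the number of cycles given below. This additionally assumes zero-wait-state memory. It also assumes that the final, not-taken \texttt{bne} costs one cycle instead of three, because an instruction whose condition fails takes a single cycle on the ARM7TDMI.

\[
\begin{array}{rcl}
T_{\mathrm{iter}} &=& T_{\mathtt{ldr}} + T_{\mathtt{add}} + T_{\mathtt{cmp}} + T_{\mathtt{bne}} = 3 + 1 + 1 + 3 = 8\\
T_{\mathrm{loop}}(n) &=& 8\,(n-1) + (3 + 1 + 1 + 1) = 8n - 2
\end{array}
\]

The instructions before and after the loop add a small constant that is not included here.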

The loop contains four instructions, since the branch back to the start of the loop is counted as well.
The code size of the loop is 16 bytes, because every instruction is 32 bits (4 bytes) long.
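
A C reference for what the assembly routine is meant to compute may help when checking it. This sketch mirrors the structure of the listing: an early return for \texttt{len == 0}, an end pointer, a post-incremented load and a comparison against the end pointer. The name \texttt{sum\_array} is hypothetical, and the sum wraps modulo $2^{32}$ just like the 32-bit \texttt{add}.

\begin{lstlisting}[language=C,caption={Sketch: C reference for the ARM example code}]
#include <stdint.h>

/* Returns the sum of the len 32-bit words starting at arr.
 * The sum wraps modulo 2^32, like the 32-bit ADD instruction. */
uint32_t sum_array(uint32_t len, const uint32_t *arr) {
    if (len == 0)                     /* cmp r0, #0 / bxeq lr              */
        return 0;
    const uint32_t *end = arr + len;  /* add r0, r1, r0, LSL#2             */
    uint32_t sum = 0;                 /* r2: running sum, starts at 0      */
    do {
        sum += *arr++;                /* ldr r3, [r1], #4 / add r2, r2, r3 */
    } while (arr != end);             /* cmp r1, r0 / bne loop             */
    return sum;                       /* mov r0, r2 / bx lr                */
}
\end{lstlisting}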