doc/handbook/x86.tex

   1 \section{IA32 (x86, i386) code generator}
   2 \label{sectionia32codegenerator}
   3
   4
   5 \subsection{Introduction}
   6
   7 The IA32 architecture is the most important architecture on the
   8 desktop market. Since the current IA32 processors are getting faster
   9 and more powerful, the IA32 architecture also becomes more important
  10 in the low-end and mid-end server market. Major Java Virtual Machine
  11 vendors, like Sun or IBM, have highly optimized IA32 ports of their
  12 Virtual Machines, so it's fairly important for an Open Source Java
  13 Virtual Machine to have a good IA32 performance.
  14
  15 Porting CACAO to the IA32 platform was more effort than
  16 expected. CACAO was designed to run on RISC machines from ground up,
  17 so the whole code generation part has to be adapted. The first
  18 approach was to replace the simple RISC macros with IA32 code, but
  19 this turned out to be not successful. So new IA32 code generation
  20 macros were written, with no respect to the old RISC macros.
  21
  22 Some smaller problems occured since the IA32 port was the first 32 bit
  23 target platform, like segmentation faults due to heap corruption,
  24 which turned out to be a simple \texttt{for} loop bug only hit on 32
  25 bit systems. Most of the CACAO system already was
  26 \textit{32-bit-ready}, namely an architecture dependent
  27 \texttt{types.h} with definitions of the used datatypes and some
  28 feature flags, which features the processor itself natively
  29 supports. Most noticeable change was the \texttt{s8} and \texttt{u8}
  30 datatype, changed from \texttt{long} to \texttt{long long} to support
  31 64 bit calculations.
  32
  33
  34 \subsection{Code generation}
  35
  36 One big difference in writing the new code generation macros was, that
  37 the IA32 architecture is not a \textit{load-store architecture} like
  38 the RISC machines, but the \textit{machine instructions} can handle
  39 both \textit{memory operands} and \textit{register operands}. This led
  40 to a much more complicated handling of the various ICMDs. The typical
  41 handling of an ICMD on RISC machines looks like this (on the example
  42 of the integer add ICMD):
  43
  44 \begin{verbatim}
  45         case ICMD_IADD:
  46             var_to_reg_int(s1, src->prev, REG_ITMP1);
  47             var_to_reg_int(s2, src, REG_ITMP2);
  48             d = reg_of_var(iptr->dst, REG_ITMP3);
  49             M_IADD(s1, s2, d);
  50             store_reg_to_var_int(iptr->dst, d);
  51             break;
  52 \end{verbatim}
  53
  54 This means loading the two \textit{source operands} from memory to
  55 temporary registers, if necessary, getting a \textit{destination
  56 register}, do the calculation and store the result to memory, if the
  57 destination variable resides in memory. If all operands are assigned
  58 to registers, only the calculation is done. This design also works on
  59 IA32 machines but leads to much bigger code size, reduces decoding
  60 bandwith and increases register pressure in the processor itself,
  61 which results in lower performance~\cite{IA32opt}. Thus CACAO uses all
  62 kinds of instruction types that are available and decide which one is
  63 used in some \texttt{if} statements:
  64
  65 \begin{verbatim}
  66         if (IS_INMEMORY(iptr->dst)) {
  67             if (IS_INMEMORY(src) && IS_INMEMORY(src->prev)) {
  68                 ...
  69             } else if (IS_INMEMORY(src) && !IS_INMEMORY(src->prev)) {
  70                 ...
  71             } else if (!IS_INMEMORY(src) && IS_INMEMORY(src->prev)) {
  72                 ...
  73             } else {
  74                 ...
  75             }
  76         } else {
  77             if (IS_INMEMORY(src) && IS_INMEMORY(src->prev)) {
  78                 ...
  79             } else if (IS_INMEMORY(src) && !IS_INMEMORY(src->prev)) {
  80                 ...
  81             } else if (!IS_INMEMORY(src) && IS_INMEMORY(src->prev)) {
  82                 ...
  83             } else {
  84                 ...
  85             }
  86         }
  87 \end{verbatim}
  88
  89 For most ICMDs the generated code can be further optimized when one
  90 source variable and the destination variable share the same local
  91 variable.
  92
  93 To be backward compatible, mostly in respect of embedded systems, all
  94 generated code can be run on i386 compatible systems.
  95
  96 Another problem was the access to the functions' data segment. Since
  97 RISC platforms like Alpha and MIPS have a procedure vector register,
  98 for the IA32 platform there had to be implemented a special handling
  99 for accesses to the data segment, like \texttt{ICMD\_PUTSTATIC} and
 100 \texttt{ICMD\_GETSTATIC} instructions. The solution is like the
 101 handling of \textit{jump references} or \textit{check cast
 102 references}, which also have to be code patched, when the code and
 103 data segment are relocated. This means, there has to be an extra
 104 \textit{immediate-to-register} move (\texttt{i386\_mov\_imm\_reg()})
 105 before every \texttt{ICMD\_PUTSTATIC}/\texttt{ICMD\_GETSTATIC}
 106 instruction, which moves the start address of the procedure, and thus
 107 the start address of the data segment, in one of the temporary
 108 registers (code snippet from \texttt{ICMD\_PUTSTATIC}):
 109
 110 \begin{verbatim}
 111         i386_mov_imm_reg(0, REG_ITMP2);
 112         dseg_adddata(mcodeptr);
 113 \end{verbatim}
 114
 115 The \texttt{dseg\_adddata()} call inserts the current postion of the
 116 code generation pointer into a datastructure which is later processed
 117 by \texttt{codegen\_finish()}, where the final address of the data
 118 segment is patched.
 119
 120
 121 \subsection{Constant handling}
 122
 123 Unlike RISC machines the IA32 architecture has \textit{immediate move}
 124 instructions which can handle the maximum bitsize of the
 125 registers. Thus the IA32 port of CACAO does not have to load big
 126 constants indirect from the data segment, which means a \textit{memory
 127 load} instruction, but can move 32 bit constants \textit{inline} into
 128 their destination registers.
 129
 130 \begin{verbatim}
 131         i386_mov_imm_reg(0xcafebabe, REG_ITMP1);
 132 \end{verbatim}
 133
 134 For constants bigger than 32 bits up to 64 bits, we split the move
 135 up into two immediate move instructions.
 136
 137
 138 \subsection{Calling conventions}
 139
 140 The normal calling conventions of the IA32 processor is passing all
 141 function arguments on the stack~\cite{IA32vol1}. The store size on the
 142 stack depends on the data type (see table
 143 \ref{ia32callingconventionstackstoresizes}).
 144
 145 \begin{table}
 146 \begin{center}
 147 \begin{tabular}[b]{|l|c|}
 148 \hline
 149 JAVA Data Type   & Bytes \\ \hline
 150 \texttt{boolean} &       \\
 151 \texttt{byte}    &       \\
 152 \texttt{char}    &       \\
 153 \texttt{short}   & 4     \\
 154 \texttt{int}     &       \\
 155 \texttt{void}    &       \\
 156 \texttt{float}   &       \\ \hline
 157 \texttt{long}    &       \\
 158 \texttt{double}  & 8     \\ \hline
 159 \end{tabular}
 160 \caption{IA32 calling convention stack store sizes}
 161 \label{ia32callingconventionstackstoresizes}
 162 \end{center}
 163 \end{table}
 164
 165 This convention has been changed for CACAO in a way, that each
 166 datatype uses always 8 bytes on the stack. due to this fact after
 167 calling the function
 168
 169 \begin{verbatim}
 170         void sub(int i, long l, float f, double d);
 171 \end{verbatim}
 172
 173 the stack layout looks like in figure \ref{stacklayout}.
 174
 175 \begin{figure}[htb]
 176 \begin{center}
 177 \setlength{\unitlength}{1mm}
 178 \begin{picture}(50,54)
 179 \thicklines
 180 \put(0,0){\framebox(24,54){}}
 181 \put(0,42){\line(1,0){24}}
 182 \put(0,36){\line(1,0){24}}
 183 \put(0,30){\line(1,0){24}}
 184 \put(0,18){\line(1,0){24}}
 185 \put(0,12){\line(1,0){24}}
 186 \put(0,6){\line(1,0){24}}
 187 \put(30,0){\vector(-1,0){6}}
 188 \put(30,3){\makebox(24,6){\textit{+4 bytes}}}
 189 \put(30,-3){\makebox(24,6){\textit{stack pointer}}}
 190
 191 \put(0,45){\makebox(24,6){\texttt{d}}}
 192 \put(0,36){\makebox(24,6){\textit{unused}}}
 193 \put(0,30){\makebox(24,6){\texttt{f}}}
 194 \put(0,21){\makebox(24,6){\texttt{l}}}
 195 \put(0,12){\makebox(24,6){\textit{unused}}}
 196 \put(0,6){\makebox(24,6){\texttt{i}}}
 197 \put(0,0){\makebox(24,6){return address}}
 198 \end{picture}
 199 \caption{CACAO IA32 stack layout after function call}
 200 \label{stacklayout}
 201 \end{center}
 202 \end{figure}
 203
 204 If the function passes a 32-bit variable, CACAO just push 4 bytes onto
 205 the stack and leave the remaining 4 bytes untouched. This does not
 206 make any problems since CACAO does not read a 64-bit value from a
 207 32-bit location. Passing a 64-bit value is straightforward.
 208
 209 With this adaptation, it is possible to use the \textit{stack space
 210 allocation algorithm} without any changes. The drawback of this
 211 decision is, that all arguments of a native function calls have to be
 212 copied into a new stack frame and the memory footprint is slightly
 213 bigger.
 214
 215 But calling a native function always means a stack manipulation,
 216 because the \textit{JNI environment}, and additionally for
 217 \texttt{static} functions the \textit{class pointer}, have to be
 218 stored in front of the function parameters. So this negligible.
 219
 220 For some \texttt{BUILTIN} functions there are assembler function
 221 counterparts, which copy the 8 byte parameters in their correct size
 222 in a new stack frame. But this only affects \texttt{BUILTIN} functions
 223 with more than one function parameter. To be precise, two functions,
 224 namely \texttt{arrayinstanceof} and \texttt{newarray}. So this is not
 225 a big speed impact.
 226
 227 Return parameters are stored in different places, this depends on the
 228 return type of the function:
 229
 230 \begin{itemize}
 231  \item \texttt{boolean}, \texttt{byte}, \texttt{char}, \texttt{short},
 232  \texttt{int}, \texttt{void}: return value resides in \texttt{\%eax}
 233  (\texttt{REG\_RESULT})
 234
 235  \item \texttt{long}: return value is split up onto the register pair
 236  \texttt{\%edx:\%eax}
 237  (\texttt{REG\_RESULT2:REG\_RESULT}, high 32-bit in
 238  \texttt{\%edx}, low 32-bit in \texttt{\%eax})
 239
 240  \item \texttt{float}, \texttt{double}: return value resides in the
 241  \textit{top of stack} element of the \textit{floating point unit}
 242  stack (\texttt{st(0)}, described in more detail in section
 243  \ref{ia32floatingpointarithmetic})
 244 \end{itemize}
 245
 246
 247 \subsection{Register allocation}
 248 \label{sectionia32registerallocation}
 249
 250 Register usage was another problem in porting the CACAO to IA32. An
 251 IA32 processor has 8 integer general-purpose registers (GPR), of which
 252 one is the \textit{stack pointer} (SP) and thus can not be used for
 253 arithmetic instructions. From the remaining 7 registers, in
 254 \textit{worst-case instructions} like \texttt{CHECKCAST} or
 255 \texttt{INSTANCEOF}, 3 temporary registers need to be reserved for
 256 storing temporary values. Due to this fact there are 4 integer
 257 registers available for arithmetic operations.
 258
 259 CACAO uses \texttt{\%ebp}, \texttt{\%esi}, \texttt{\%edi} as callee
 260 saved registers, which are callee saved registers in the IA32 ABI and
 261 \texttt{\%ebx} as scratch register, which is also a callee saved
 262 register in the IA32 ABI. The remaining \texttt{\%eax}, \texttt{\%ecx}
 263 and \texttt{\%edx} registers are used as the previously mentioned
 264 temporary registers.
 265
 266 The register allocator itself is very straightforward, this means, it
 267 does neither \textit{linear scan} nor any other analyse of the methods
 268 variables, but allocates registers for the local variables in order as
 269 they are defined---\textit{first-come-first-serve}. This may result in
 270 a fairly good register allocation on RISC machines, as there are
 271 almost always enough registers available for the functions local
 272 variables, but can result in a really bad allocation on IA32
 273 processors.
 274
 275 So the first step to make the IA32 port more competitive with Sun's or
 276 IBM's JVM would be to rewrite the register allocator.
 277
 278 Only small register allocator changes were necessary for the IA32
 279 port. Since CACAO---on the IA32 architecture---stores all
 280 \texttt{long} variables, because of lack of integer general-purpose
 281 registers, in memory locations (described in more detail in section
 282 \ref{sectionia32longarithmetic}) the register allocator has to be
 283 adapted to support this feature. This means all \texttt{long}
 284 variables are assigned to stack locations and tagged with the
 285 \texttt{INMEMORY} flag.
 286
 287
 288 \subsection{Long arithmetic}
 289 \label{sectionia32longarithmetic}
 290
 291 Unlike the PowerPC port, the IA32 port cannot easily store
 292 \texttt{long}'s in two 32-bit integer registers, since there are too
 293 little of them. Maybe this could bring a speedup, if the register
 294 allocator would be more intelligent or in leaf functions which have
 295 only \texttt{long} variables. But this is not implemented yet. So, the
 296 current approach is to store all \texttt{long}'s in memory, this means
 297 they are always spilled.
 298
 299 Nearly all \texttt{long} instructions are inline, except two of them:
 300 \texttt{ICMD\_LDIV} and \texttt{ICMD\_LREM}. These two are computed
 301 via \texttt{BUILTIN} calls. It would also be possible to inline them,
 302 but the code size would be too big and the latency of the
 303 \texttt{idiv} machine instruction is so high, that the function calls
 304 are negligible.
 305
 306 The IA32 processor has some machine instructions which are
 307 specifically designed for 64-bit operations. With them several 64-bit
 308 integer arithemtic operations can be implemented very
 309 efficiently~\cite{AMDopt}. Some of them are
 310
 311 \begin{itemize}
 312  \item \texttt{cltd} --- Convert Signed Long to Signed Double Long
 313  \item \texttt{adc} --- Integer Add With Carry
 314  \item \texttt{sbb} --- Integer Subtraction With Borrow
 315 \end{itemize}
 316
 317 Thus some of the 64-bit calculations like \texttt{ICMD\_LADD} or
 318 \texttt{ICMD\_LSUB} could be executed in two instructions, if both
 319 operand would reside in registers. But this does not apply to CACAO,
 320 yet.
 321
 322 The generated machine code of intermediate commands which operate on
 323 \texttt{long} variables instructions always operate on 64-bit, even if
 324 it is not necessary. The dependency information that would be required
 325 to just operate on the lower or upper half of the \texttt{long}
 326 variable, is not generated by CACAO.
 327
 328
 329 \subsection{Floating point arithmetic}
 330 \label{ia32floatingpointarithmetic}
 331
 332 Since the i386, with it's i387 extension or the i486, the IA32
 333 processor has a \textit{floating point unit} (FPU). This FPU is
 334 implemented as a stack with 8 elements (see table \ref{FPUStack}).
 335
 336 \begin{table*}
 337 \begin{center}
 338 \begin{tabular}[b]{|l|l|}
 339 \hline
 340 st(x) & FPU Data Register Stack \\ \hline
 341 0     & TOS (Top Of Stack) \\ \hline
 342 1     & \\ \hline
 343 2     & \\ \hline
 344 3     & \\ \hline
 345 4     & \\ \hline
 346 5     & \\ \hline
 347 6     & \\ \hline
 348 7     & \\ \hline
 349 \end{tabular}
 350 \caption{x87 FPU Data Register Stack}
 351 \label{FPUStack}
 352 \end{center}
 353 \end{table*}
 354
 355 This stack is designed to wrap around if values are loaded to the
 356 \textit{top of stack} (TOS). For this reason, it has a special register which
 357 points to the TOS. This pointer is increased if a load instruction to
 358 the TOS is executed and decreased for a store from the TOS.
 359
 360 At first sight, the stack design of the FPU is perfect for the stack
 361 based design of a Java Virtual Machine. But there is one problem. The
 362 JVM stack has no fixed size, so it can grow up to much more than 8
 363 elements and this simply results in an stack wrap around and thus an
 364 stack overflow. For this reason it it necessary to implement an
 365 \textit{stack-element-to-register-mapping}.
 366
 367 A very basic design idea, is to define a pointer to the current TOS
 368 offset (\texttt{fpu\_st\_offset}). With this offset the current
 369 register position in the FPU stack of an arbitrary register can
 370 determined. From the 8 stack elements the last two ones
 371 (\texttt{st(6), st(7)}) are reserved, so two memory operands can be
 372 loaded onto the stack and to preform an arithmetic operation. Most
 373 IA32 floating point arithmetic operations have an \textit{do
 374 arithmetic and pop one element} version of the instruction, that means
 375 the float arithmetic is done and the TOS element is poped off. The
 376 remaining stack element has the result of the calculation. On the
 377 example of the \texttt{ICMD\_FADD} intermediate command with two
 378 memory operands, it looks like this:
 379
 380 \begin{verbatim}
 381 var_to_reg_flt(s1, src->prev, REG_FTMP1); /* load 1st operand in st(0), increase
 382                                              fpu_st_offset                          */
 383 var_to_reg_flt(s2, src, REG_FTMP2);       /* load 2nd operand in st(0), increase
 384                                              fpu_st_offset                          */
 385 i386_faddp();       /* add 2 upper most elements st(1) = st(1) + st(0) -- pop st(0) */
 386 fpu_st_offset--;                        /* decrease fpu_st_offset from previous pop */
 387 store_reg_to_var_flt(iptr->dst, d); /* store result -- decrease fpu_st_offset       */
 388 \end{verbatim}
 389
 390 This mapping works very good with \textit{scratch registers}
 391 only. Defining \textit{callee saved float registers} makes some
 392 problemes since the IA32 ABI has no callee saved float registers. This
 393 would need a special handling in the \textit{native stub} of a native
 394 function, namely saving the registers and cleaning the whole FPU
 395 stack, because a C function expects a clear FPU stack.
 396
 397 Basically the IA32 FPU can handle 3 float data types:
 398
 399 \begin{itemize}
 400  \item single-precision (32 bit)
 401  \item double-precision (64 bit)
 402  \item double extended-precision (80 bit)
 403 \end{itemize}
 404
 405 The FPU has a 16 bit \textit{control register} which has a
 406 \textit{precision control field} (PC) and a \textit{rounding control
 407 field} (RC), each of 2 bit length (see table \ref{PCField} and
 408 \ref{RCField}).
 409
 410 \begin{table*}
 411 \begin{center}
 412 \begin{tabular}[b]{|l|c|}
 413 \hline
 414 Precision                          & PC Field \\ \hline
 415 single-precision (32 bit)          & 00B      \\ \hline
 416 reserved                           & 01B      \\ \hline
 417 double-precision (64 bit)          & 10B      \\ \hline
 418 double extended-precision (80 bit) & 11B      \\ \hline
 419 \end{tabular}
 420 \caption{Precision Control Field (PC)}
 421 \label{PCField}
 422 \end{center}
 423 \end{table*}
 424
 425 \begin{table*}
 426 \begin{center}
 427 \begin{tabular}[b]{|l|c|}
 428 \hline
 429 Rounding Mode                 & RC Field \\ \hline
 430 round to nearest (even)       & 00B      \\ \hline
 431 round down (toward -$\infty$) & 01B      \\ \hline
 432 round up (toward +$\infty$)   & 10B      \\ \hline
 433 round toward zero (truncate)  & 11B      \\ \hline
 434 \end{tabular}
 435 \caption{Rounding Control Field (RC)}
 436 \label{RCField}
 437 \end{center}
 438 \end{table*}
 439
 440 The internal data type used by the FPU is always the \textit{double
 441 extended-precision} (80 bit) format. Therefore implementing a IEEE 754
 442 compliant floating point code on IA32 processors is not trivial.
 443
 444 Correct rounding to \textit{single-precision} or
 445 \textit{double-precision} is only done if the value is stored into
 446 memory. This means for certain instructions, like \texttt{ICMD\_FMUL}
 447 or \texttt{ICMD\_FDIV}, a special technique called
 448 \textit{store-load}, has to be implemented~\cite{OgKoNa02}. This
 449 technique is in fact very simple but takes two memory accesses more
 450 for this instruction.
 451
 452 For single-precision floats the \textit{store-load} code could looks
 453 like this:
 454
 455 \begin{verbatim}
 456 i386_fstps_membase(REG_SP, 0);    /* store single-precision float to stack  */
 457 i386_flds_membase(REG_SP, 0);     /* load single-precision float from stack */
 458 \end{verbatim}
 459
 460 Another technique which has to be implemented to be IEEE 754
 461 compliant, is \textit{precision mode switching}. Mode switching on the
 462 IA32 processor is made with the \texttt{fldcw}---load control
 463 word---instruction. A \texttt{fldcw} instruction has a quite large
 464 overhead, so frequently mode switches should be avoided. For this
 465 technique there are two different approaches:
 466
 467 \begin{itemize}
 468  \item \textbf{Mode switch on float arithmetic} --- switch the FPU on
 469  initialization in one precision mode, mostly \textit{double-precision
 470  mode} because \texttt{double} arithmetic is more common. With this
 471  setting \texttt{double}s are calculated correctly. To handle
 472  \texttt{float}s in this approach, the FPU has to be switched into
 473  \textit{single-precision mode} before each \texttt{float} instruction
 474  and switched back afterwards. Needless to say, that this is only
 475  useful, if \texttt{float} arithmetic is sparse. For a simple
 476  calculation like
 477
 478  \begin{verbatim}
 479         float f = 1.234F;
 480         float g = 2.345F;
 481         float e = f * f + g * g;
 482  \end{verbatim}
 483
 484  the generated ICMD's look like this (with comments where precision
 485  mode switches take place):
 486
 487  \begin{verbatim}
 488         ...
 489         <fpu in double-precision mode>
 490         FLOAD         1
 491         FLOAD         1
 492         <switch fpu to single-precision mode>
 493         FMUL
 494         <switch fpu to double-precision mode>
 495         FLOAD         2
 496         FLOAD         2
 497         <switch fpu to single-precision mode>
 498         FMUL
 499         <switch fpu to double-precision mode>
 500         <switch fpu to single-precision mode>
 501         FADD
 502         <switch fpu to double-precision mode>
 503         FSTORE        3
 504         ...
 505  \end{verbatim}
 506
 507  For this corner case situation the mode switches are extrem, but the
 508  example should demonstrate how this technique works.
 509
 510  \item \textbf{Mode switch on float data type change} --- switch the
 511  FPU on initialization in on precision mode, like before, in
 512  \textit{double-precision mode}. But the difference on this approach
 513  is, that the precision mode is only switched if the float data type
 514  is changed. That means if your calculation switches from
 515  \texttt{double} arithmetic to \texttt{float} arithmetic or
 516  backwards. This technique makes much sense due to the fact that there
 517  are always a bunch of instructions of the same data type in one row
 518  in a normal program. Now the same example as before with this
 519  approach:
 520
 521  \begin{verbatim}
 522         ...
 523         <fpu in double-precision mode>
 524         FLOAD         1
 525         FLOAD         1
 526         <switch fpu to single-precision mode>
 527         FMUL
 528         FLOAD         2
 529         FLOAD         2
 530         FMUL
 531         FADD
 532         FSTORE        3
 533         ...
 534  \end{verbatim}
 535
 536  After this code sequence the FPU is in \textit{single-precision mode}
 537  and if a function return would occur, the caller function would not
 538  know which FPU mode is currently set. One solution would be to reset
 539  the FPU to \textit{double-precision mode} on a function return, if
 540  the actual mode is \textit{single-precision}.
 541 \end{itemize}
 542
 543 Each of these FPU techniques described have been implemented in
 544 CACAO's IA32 port, but the results were not completly IEEE 754
 545 compliant. So the CACAO developer team decided to be on the safe side
 546 and to store all float variables in memory, until we have found a
 547 solution which is fast and 100\% compliant.
 548
 549
 550 \subsection{Exception handling}
 551
 552 The exception handling for the IA32 architecture is implemented as
 553 intended to be for CACAO. To handle the common and unexpected, but
 554 often checked, \texttt{java.lang.NullPointerException} very fast,
 555 CACAO uses \textit{hardware null-pointer checking}. That means a
 556 signal handler for the \texttt{SIGSEGV} operating system signal is
 557 installed. If the signal is hit, the CACAO signal handler forwards the
 558 exception to CACAO's internal exception handling system. So if an
 559 instruction tries to access the memory at address \texttt{0x0}, a
 560 \texttt{SIGSEGV} signal is raised because the memory page is not read
 561 or writeable. After the signal is hit, the handler has to be
 562 reinstalled, so that further exceptions can be catched. This is done
 563 in the handler itself.
 564
 565 The \texttt{SIGSEGV} handler is used on any architecture CACAO has
 566 been ported to. Additionally on the IA32 architecture a handler for
 567 the \texttt{SIGFPE} signal is installed. With this handler a
 568 \texttt{java.lang.ArithmeticException}'s for integer \textit{/ by
 569 zero} can be catched in hardware and there is no need to write helper
 570 functions, like \texttt{asm\_builtin\_idiv}, which check the division
 571 operands as this is done for the Alpha or MIPS port.