From: twisti
Date: Wed, 28 Jan 2004 10:51:58 +0000 (+0000)
Subject: Second try.
X-Git-Url: http://wien.tomnetworks.com/gitweb/?p=cacao.git;a=commitdiff_plain;h=b26f4bc2a8cdf8b44aa4fee85145b4efb89c0934

Second try.
---
diff --git a/doc/handbook/x86.tex b/doc/handbook/x86.tex
index 9aeee691f..c0baf95c6 100644
--- a/doc/handbook/x86.tex
+++ b/doc/handbook/x86.tex
@@ -38,9 +38,9 @@ arguments on the stack. The store size depends on the data type (the
 following types reflect the JAVA data types):
 
 \begin{itemize}
-\item \texttt{byte}, \texttt{char}, \texttt{short}, \texttt{int},
-  \texttt{float}, \texttt{void} --- 4 bytes
-\item \texttt{long}, \texttt{double} --- 8 bytes
+  \item \texttt{byte}, \texttt{char}, \texttt{short}, \texttt{int},
+    \texttt{float}, \texttt{void} --- 4 bytes
+  \item \texttt{long}, \texttt{double} --- 8 bytes
 \end{itemize}
 
 We changed this convention in a way, that we are using always 8 bytes
@@ -109,4 +109,179 @@ not generated by CACAO.
 
 Since the i386, with it's i387 counterpart or the i486, the x86
 processor has an \textit{floating point unit} (FPU). This FPU is
-implemented as a stack with 8 elements.
+implemented as a stack with 8 elements (see Table~\ref{FPUStack}).
+
+\begin{table*}
+\begin{center}
+\begin{tabular}[b]{|l|l|}
+\hline
+ST($i$) & FPU Data Register Stack \\ \hline
+0 & TOS (Top Of Stack) \\ \hline
+1 & \\ \hline
+2 & \\ \hline
+3 & \\ \hline
+4 & \\ \hline
+5 & \\ \hline
+6 & \\ \hline
+7 & \\ \hline
+\end{tabular}
+\caption{x87 FPU Data Register Stack}
+\label{FPUStack}
+\end{center}
+\end{table*}
+
+The stack is organized as a ring. A 3-bit field in the FPU status word,
+called TOP, holds the index of the physical register that currently
+forms the \textit{top of stack} (TOS). Registers are addressed relative
+to TOP: ST(0) is the TOS, ST(1) the register below it, and so on. A load
+instruction decrements TOP and writes the value into the new TOS. A
+store that pops the stack (e.g.\ \texttt{FSTP}) reads the TOS and then
+increments TOP. Both operations wrap around modulo 8.
+
+The x86 FPU supports three floating-point data types:
+
+\begin{itemize}
+  \item single-precision (32 bit)
+  \item double-precision (64 bit)
+  \item double extended-precision (80 bit)
+\end{itemize}
+
+The FPU has a 16-bit \textit{control register} that contains a
+\textit{precision control field} (PC, bits 8 and 9) and a
+\textit{rounding control field} (RC, bits 10 and 11), each 2 bits wide
+(see Tables~\ref{PCField} and~\ref{RCField}).
+
+\begin{table*}
+\begin{center}
+\begin{tabular}[b]{|l|c|}
+\hline
+Precision & PC Field \\ \hline
+single-precision (32 bit) & 00B \\ \hline
+reserved & 01B \\ \hline
+double-precision (64 bit) & 10B \\ \hline
+double extended-precision (80 bit) & 11B \\ \hline
+\end{tabular}
+\caption{Precision Control Field (PC)}
+\label{PCField}
+\end{center}
+\end{table*}
+
+\begin{table*}
+\begin{center}
+\begin{tabular}[b]{|l|c|}
+\hline
+Rounding Mode & RC Field \\ \hline
+round to nearest (even) & 00B \\ \hline
+round down (toward $-\infty$) & 01B \\ \hline
+round up (toward $+\infty$) & 10B \\ \hline
+round toward zero (truncate) & 11B \\ \hline
+\end{tabular}
+\caption{Rounding Control Field (RC)}
+\label{RCField}
+\end{center}
+\end{table*}
+
+Internally, the FPU registers always hold values in the \textit{double
+extended-precision} (80 bit) format. Therefore, implementing IEEE 754
+compliant floating-point code on x86 processors is not trivial.
+
+Correct rounding to \textit{single-precision} or
+\textit{double-precision} is only guaranteed when the value is stored
+to memory. For certain instructions, such as \texttt{FMUL} or
+\texttt{FDIV}, a technique called \textit{store-reload} therefore has
+to be implemented. The result is written to memory in the target format
+and immediately loaded back. The technique is simple, but it costs two
+additional memory accesses per instruction.
+
+For single-precision floats, the \textit{store-reload} code could look
+like this:
+
+\begin{verbatim}
+i386_fstps_membase(REG_SP, 0);   /* store single-precision float to stack */
+i386_flds_membase(REG_SP, 0);    /* load single-precision float from stack */
+\end{verbatim}
+
+Another technique required for IEEE 754 compliance is
+\textit{precision mode switching}. On x86, the precision mode is
+changed with the \texttt{fldcw} (load control word) instruction.
+Because \texttt{fldcw} has a considerable overhead, frequent mode
+switches should be avoided. There are two different approaches:
+
+\begin{itemize}
+  \item \textbf{Mode switch on float arithmetic} --- at initialization,
+  put the FPU into one precision mode. This is usually
+  \textit{double-precision mode}, because \texttt{double} arithmetic is
+  more common. With this setting, \texttt{double}s are computed
+  correctly. To handle \texttt{float}s, the FPU has to be switched into
+  \textit{single-precision mode} before each \texttt{float} arithmetic
+  instruction and switched back afterwards. This is only worthwhile if
+  \texttt{float} arithmetic is rare. For a simple calculation like
+
+  \begin{verbatim}
+  float f = 1.234F;
+  float g = 2.345F;
+  float e = f * f + g * g;
+  \end{verbatim}
+
+  the generated ICMDs look like this (comments mark where precision
+  mode switches take place):
+
+  \begin{verbatim}
+  ...
+
+  FLOAD 1
+  FLOAD 1
+  /* switch to single-precision mode */
+  FMUL
+  /* switch back to double-precision mode */
+  FLOAD 2
+  FLOAD 2
+  /* switch to single-precision mode */
+  FMUL
+  /* switch back to double-precision mode */
+  /* switch to single-precision mode */
+  FADD
+  /* switch back to double-precision mode */
+  FSTORE 3
+  ...
+  \end{verbatim}
+
+  In this corner case the number of mode switches is extreme, but the
+  example demonstrates how the technique works.
+
+  \item \textbf{Mode switch on float data type change} --- as before,
+  the FPU is put into one precision mode at initialization, typically
+  \textit{double-precision mode}. The difference in this approach is
+  that the precision mode is only switched when the floating-point data
+  type changes.
+  That is, a switch happens only when the calculation moves from
+  \texttt{double} to \texttt{float} arithmetic or back. This approach
+  makes sense because typical programs tend to execute runs of
+  consecutive instructions of the same data type. Here is the same
+  example as before with this approach:
+
+  \begin{verbatim}
+  ...
+
+  FLOAD 1
+  FLOAD 1
+  /* switch to single-precision mode */
+  FMUL
+  FLOAD 2
+  FLOAD 2
+  FMUL
+  FADD
+  FSTORE 3
+  ...
+  \end{verbatim}
+
+  After this code sequence the FPU is in \textit{single-precision mode}.
+  If the function returned at this point, the caller would not know
+  which mode the FPU is currently in. One solution is to reset the FPU
+  to \textit{double-precision mode} on return whenever the current mode
+  is \textit{single-precision}.
+\end{itemize}
+
+TODO: description of stack-to-register mapping
+
+None of these techniques worked perfectly for CACAO, so the decision
+was to store all \texttt{float} and \texttt{double} values to memory
+to obtain correct rounding. This behaviour will be changed once we
+have found a solution that works.
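The ring-buffer behaviour of the x87 register stack can be illustrated with a small C model. This is a sketch for illustration only, not CACAO code. The names `x87_stack`, `x87_push`, `x87_pop` and `x87_st` are hypothetical. Plain `double`s stand in for the 80-bit registers, and the tag word and stack-fault detection are left out. The model assumes the TOP semantics documented by Intel: a load decrements TOP, a store that pops increments it, both modulo 8, and ST(i) names physical register (TOP + i) mod 8.

```c
#include <assert.h>

/* Hypothetical, simplified model of the x87 data register stack:
   eight physical registers R0..R7 plus the 3-bit TOP field of the
   FPU status word. Tag word, stack faults and the 80-bit format are
   omitted; plain doubles stand in for the register contents. */
typedef struct {
    double reg[8];  /* physical registers R0..R7 */
    unsigned top;   /* TOP: physical index of ST(0), always 0..7 */
} x87_stack;

/* Load (e.g. FLD): decrement TOP modulo 8, then write the new ST(0). */
void x87_push(x87_stack *s, double v) {
    s->top = (s->top + 7u) & 7u;  /* TOP - 1 mod 8: 0 wraps to 7 */
    s->reg[s->top] = v;
}

/* Store and pop (e.g. FSTP): read ST(0), then increment TOP modulo 8. */
double x87_pop(x87_stack *s) {
    double v = s->reg[s->top];
    s->top = (s->top + 1u) & 7u;  /* TOP + 1 mod 8: 7 wraps to 0 */
    return v;
}

/* ST(i) names the physical register (TOP + i) mod 8. */
double x87_st(const x87_stack *s, unsigned i) {
    assert(i < 8u);
    return s->reg[(s->top + i) & 7u];
}
```

Starting from TOP = 0, which is what FINIT leaves behind, the first push wraps TOP to 7 and the second moves it to 6. At that point ST(0) holds the second value and ST(1) the first. Each pop moves TOP back up by one.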
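The PC and RC encodings from tables PCField and RCField are ordinary bit fields, so the value loaded with `fldcw` can be computed with masking and shifting. The following C sketch assumes the Intel layout, with PC in bits 8 and 9 and RC in bits 10 and 11. It also assumes the control word 0x037F that FINIT sets: all exceptions masked, PC = 11B, RC = 00B. The helper names are hypothetical. The sketch only computes the 16-bit value. Actually loading it requires `fldcw`, emitted by the code generator or written as compiler-specific inline assembly, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define CW_PC_SHIFT   8u    /* PC occupies bits 8 and 9 */
#define CW_RC_SHIFT   10u   /* RC occupies bits 10 and 11 */
#define CW_FIELD_MASK 0x3u  /* both fields are 2 bits wide */

/* Control word after FINIT: exceptions masked, PC = 11B, RC = 00B. */
#define CW_FINIT_DEFAULT ((uint16_t)0x037Fu)

/* PC encodings (table PCField) and RC encodings (table RCField). */
enum { PC_SINGLE = 0, PC_RESERVED = 1, PC_DOUBLE = 2, PC_EXTENDED = 3 };
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_ZERO = 3 };

/* Replace one 2-bit field of the control word, leaving all other bits. */
static uint16_t cw_set_field(uint16_t cw, unsigned shift, unsigned value) {
    assert(value <= CW_FIELD_MASK);
    uint16_t mask = (uint16_t)(CW_FIELD_MASK << shift);
    return (uint16_t)((cw & (uint16_t)~mask) | (value << shift));
}

uint16_t cw_set_precision(uint16_t cw, unsigned pc) {
    assert(pc != PC_RESERVED);  /* 01B is reserved */
    return cw_set_field(cw, CW_PC_SHIFT, pc);
}

uint16_t cw_set_rounding(uint16_t cw, unsigned rc) {
    return cw_set_field(cw, CW_RC_SHIFT, rc);
}

unsigned cw_get_precision(uint16_t cw) {
    return (cw >> CW_PC_SHIFT) & CW_FIELD_MASK;
}

unsigned cw_get_rounding(uint16_t cw) {
    return (cw >> CW_RC_SHIFT) & CW_FIELD_MASK;
}
```

For example, `cw_set_precision(CW_FINIT_DEFAULT, PC_DOUBLE)` yields 0x027F and `PC_SINGLE` yields 0x007F. If both values are precomputed, each mode switch costs a single `fldcw` from memory.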
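The two mode-switching strategies differ only in how many `fldcw` instructions they emit. The following C sketch counts them for a simplified instruction stream. The `icmd` record and the enum names are hypothetical and much simpler than CACAO's real intermediate representation. The model relies on the fact that the PC field only affects arithmetic instructions such as FADD, FSUB, FMUL, FDIV and FSQRT, so loads and stores never need a switch. The second function can optionally count the reset to double precision on return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ICMD stream: only the kind of operation and
   its floating-point type matter for counting fldcw instructions. */
typedef enum { OP_LOAD, OP_STORE, OP_ARITH } op_kind;
typedef enum { T_FLOAT, T_DOUBLE } fp_type;
typedef struct { op_kind kind; fp_type type; } icmd;

/* Strategy 1: the FPU stays in double-precision mode; every float
   arithmetic instruction is wrapped in a switch to single and back. */
size_t switches_per_float_op(const icmd *code, size_t n) {
    assert(code != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (code[i].kind == OP_ARITH && code[i].type == T_FLOAT)
            count += 2;  /* one fldcw before, one after */
    return count;
}

/* Strategy 2: switch only when an arithmetic instruction needs a
   different precision than the current one. If reset_on_return is set,
   count one more switch when the sequence ends in single precision. */
size_t switches_on_type_change(const icmd *code, size_t n, int reset_on_return) {
    assert(code != NULL || n == 0);
    fp_type mode = T_DOUBLE;  /* precision chosen at initialization */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind != OP_ARITH)
            continue;  /* PC does not affect loads and stores */
        if (code[i].type != mode) {
            mode = code[i].type;
            count++;
        }
    }
    if (reset_on_return && mode != T_DOUBLE)
        count++;
    return count;
}

/* The ICMDs of e = f * f + g * g with float operands. */
const icmd float_example[] = {
    { OP_LOAD,  T_FLOAT }, { OP_LOAD,  T_FLOAT }, { OP_ARITH, T_FLOAT },
    { OP_LOAD,  T_FLOAT }, { OP_LOAD,  T_FLOAT }, { OP_ARITH, T_FLOAT },
    { OP_ARITH, T_FLOAT }, { OP_STORE, T_FLOAT }
};
const size_t float_example_len = sizeof float_example / sizeof float_example[0];
```

For the `f * f + g * g` example above, the first strategy emits six switches. The second emits only one, or two if the FPU is reset to double precision before returning.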