\chapter{Run Time System} \section{Introduction} \section{Object layout} \section{Method invocation} Java and the programming language Theta do not implement multiple inheritance, but single inheritance with multiple subtyping. This important difference makes object layout and method dispatch more efficient. Although the bidirectional layout was designed for a language with multiple subtyping, it has the problem that more than one {\em vtbl} pointer has to be included in objects. The CACAO JIT compiler \cite{KrGr97b} moves the dispatch header of the bidirectional layout into the class information with negative offsets from the {\em vtbl}. For each implemented interface there is a distinct interface {\em vtbl}. Unimplemented interfaces are represented by null pointers. % \begin{figure*}[htb] \begin{center} \setlength{\unitlength}{1mm} \begin{picture}(114,72) \put( 0,42){\vector(1,0){12}} \put( 0,42){\makebox(8,6){\it objptr}} \put(12,60){\makebox(24,6){\it object}} \put(48,63){\makebox(30,6){\it class}} \put(90,66){\makebox(24,6){\it code}} \put(12,42){\makebox(24,18){\it instance data}} \put(12,36){\makebox(24,6){\it class pointer}} \put(12,42){\line(1,0){24}} \put(36,39){\vector(1,0){12}} \put(48,57){\makebox(30,6){\it method pointer}} \put(48,57){\line(1,0){30}} \put(48,51){\makebox(30,6){\it method pointer}} \put(48,51){\line(1,0){30}} \put(48,45){\makebox(30,6){\it method pointer}} \put(48,45){\line(1,0){30}} \put(48,39){\makebox(30,6){\it class info}} \put(48,27){\makebox(30,6){\it interface pointer}} \put(48,33){\line(1,0){30}} \put(48,33){\makebox(30,6){\it interface pointer}} \put(48, 0){\makebox(30,6){\it method pointer}} \put(48, 6){\makebox(30,6){\it method pointer}} \put(48, 6){\line(1,0){30}} \put(48,21){\makebox(30,6){\it interfaces}} \put(90,36){\framebox(24,6){\it method code}} \put(44,15){\vector(1,0){4}} \put(44,15){\line(0,1){15}} \put(44,30){\line(1,0){4}} \put(40, 0){\vector(1,0){8}} \put(40, 0){\line(0,1){36}} \put(40,36){\line(1,0){8}} \put(78, 3){\line(1,0){8}} \put(86, 3){\line(0,1){51}} \put(78, 9){\line(1,0){4}} \put(82, 9){\line(0,1){39}} \put(78,18){\line(1,0){4}} \put(78,54){\line(1,0){8}} \put(86,36){\vector(1,0){4}} \put(90,48){\framebox(24,6){\it method code}} \put(78,48){\vector(1,0){12}} \put(90,60){\framebox(24,6){\it method code}} \put(78,60){\vector(1,0){12}} \thicklines \put(48, 0){\framebox(30,12){}} \put(48,15){\framebox(30,6){\it method pointer}} \put(12,36){\framebox(24,24){}} \put(48,27){\framebox(30,36){}} \put(48,39){\line(1,0){30}} \end{picture} \caption{CACAO object and compact class descriptor layout} \label{objectcompactlayout} \end{center} \end{figure*} To call a virtual method, two memory access instructions are necessary (load the class pointer, load the method pointer) followed by the call instruction. Calling an interface method needs an additional indirection. % \begin{figure*}[htb] \begin{center} \setlength{\unitlength}{1mm} \begin{picture}(114,42) \put(0,15){\vector(1,0){12}} \put(0,15){\makebox(12,6){\it objptr}} \put(12,33){\makebox(24,6){\it object}} \put(48,36){\makebox(30,6){\it class}} \put(90,39){\makebox(24,6){\it code}} \put(12,15){\makebox(24,18){\it instance data}} \put(12,9){\makebox(24,6){\it class pointer}} \put(12,15){\line(1,0){24}} \put(36,12){\vector(1,0){12}} \put(48,30){\makebox(30,6){\it method pointer}} \put(48,30){\line(1,0){30}} \put(48,24){\makebox(30,6){\it method pointer}} \put(48,24){\line(1,0){30}} \put(48,18){\makebox(30,6){\it method pointer}} \put(48,18){\line(1,0){30}} \put(48,12){\makebox(30,6){\it class info}} \put(48,0){\makebox(30,6){\it ifmethod pointer}} \put(48,6){\line(1,0){30}} \put(48,6){\makebox(30,6){\it ifmethod pointer}} \put(90,3){\framebox(24,6){\it method code}} \put(78,3){\vector(1,0){12}} \put(82,3){\line(0,1){18}} \put(78,21){\line(1,0){4}} \put(78,27){\line(1,0){8}} \put(78,9){\line(1,0){8}} \put(86,9){\line(0,1){18}} \put(86,18){\vector(1,0){4}} \put(90,18){\framebox(24,6){\it method code}} \put(78,33){\vector(1,0){12}} \put(90,33){\framebox(24,6){\it method code}} \thicklines \put(12,9){\framebox(24,24){}} \put(48,0){\framebox(30,36){}} \put(48,12){\line(1,0){30}} \end{picture} \caption{CACAO object and fast class descriptor layout} \label{objectfastlayout} \end{center} \end{figure*} In the faster scheme, we store interface methods in an additional table at negative offsets from the {\em vtbl} (see figure~\ref{objectfastlayout}). Segregating the interface virtual function table keeps the standard virtual function table small and allows interface methods to be called with just two memory accesses. The memory consumption of virtual function tables containing interface and class methods would be {\em number of (classes + interfaces) $\times$ number of distinct methods}. The memory consumption of the interface tables is only {\em number of classes which implement interfaces $\times$ number of interface methods}. Colouring can be used to reduce the number of distinct offsets for interface methods further, but complicates dynamic class loading, leading to renumbering and code patching. The Jalapeno virtual machine implements an interface method invocation similar to the fast class descriptor layout of CACAO, but instead of colouring hashing of the method indices is used \cite{Alpern+01}. The table for the interface method pointers is allocated with a fixed size much smaller than the number of interface methods. When two method indices hash to the same offset, a conflict resolving stub is called instead of the interface methods directly. For conflict resolution the stub is passed the method index in a scratch register as additional argument. An interface method invocation can be executed with the following four machine instructions: % \begin{verbatim} LD vtblptr,(obj) ; load vtbl pointer LD mptr,hash(method_ptr)(vtblptr) ; load method pointer MV mindex,idreg ; load method index JSR (mptr) ; call method (or conflict stub) \end{verbatim} % The number of machine instructions is the same as in the compact class descriptor layout of CACAO, but the indirection is eliminated, which reduces the number of cycles needed for the execution of this instruction sequence on a pipelined architecture. Compared to the old interface method invocation in the Jalapeno VM, which searched the superclass hierarchy for a matching method signature, the new method resulted in changes in the run-time from one percent slowdowns to speedups up to 51 percent. \section{Run time type checking} \label{sectionruntimetypechecking} Since type tests for trees are more (space) efficient than type tests for DAGs, a possible solution is to split a DAG into a tree part and the remaining graph. For languages with single inheritance and multiple subtyping this partitioning of the class hierarchy is already done in the language itself. \begin{figure}[htb] \begin{center} \setlength{\unitlength}{1mm} \begin{picture}(86,26) \thicklines \put(23, 3){\circle{6}} \put(43, 3){\circle{6}} \put(63, 3){\circle{6}} \put(33,13){\circle{6}} \put(53,13){\circle{6}} \put(43,23){\circle{6}} \put(20, 0){\makebox(6,6){D}} \put(40, 0){\makebox(6,6){E}} \put(60, 0){\makebox(6,6){F}} \put(30,10){\makebox(6,6){B}} \put(50,10){\makebox(6,6){C}} \put(40,20){\makebox(6,6){A}} \put(19, 0){\makebox(0,6)[r]{\{3,0\}}} \put(47, 0){\makebox(6,6)[l]{\{4,0\}}} \put(67, 0){\makebox(0,6)[l]{\{6,0\}}} \put(29,10){\makebox(0,6)[r]{\{2,2\}}} \put(57,10){\makebox(0,6)[l]{\{5,1\}}} \put(47,20){\makebox(0,6)[l]{\{1,5\}}} \put(25, 5){\line( 1,1){6}} \put(35,15){\line( 1,1){6}} \put(41, 5){\line(-1,1){6}} \put(51,15){\line(-1,1){6}} \put(61, 5){\line(-1,1){6}} \end{picture} \caption{Relative numbering with \{{\tt baseval}, {\tt diffval}\} pairs} \label{relnumbering} \end{center} \end{figure} CACAO uses a subtype test based on relative numbering for classes and a kind of matrix implementation for interfaces. Two numbers {\em low} and {\em high} are stored for each class in the class hierarchy. A depth first traversal of the hierarchy increments a counter for each class and assigns the counter to the {\em low} field when the class is first encountered and assigns the counter to the {\em high} field when the traversal leaves the class. In languages with dynamic class loading a renumbering of the hierarchy is needed whenever a class is added. A class {\em sub} is a subtype of another class {\em super}, if {\em super.low $\le$ sub.low $\le$ super.high}. Since a range check is implemented more efficiently by an unsigned comparison, CACAO stores the difference between the {\em low} and {\em high} values and compares it against the difference of the {\em low} values of both classes. The code for {\tt instanceof} looks similar to: \begin{verbatim} return (unsigned) (sub->vftbl->baseval - super->vftbl->baseval) <= (unsigned) (super->vftbl->diffval); \end{verbatim} For leaf nodes in the class hierarchy the {\tt diffval} is 0 which results in a faster test (a simple comparison of the {\tt baseval} fields of the sub- and superclass). In general a JIT compiler can generate the faster test only for final classes. An AOT compiler or a JIT compiler which does patching of the already generated machine code may additionally replace both the {\tt baseval} and the {\tt diffval} of the superclass by a constant. Currently CACAO uses constants only when dynamic class loading is not used. CACAO stores an interface table at negative offsets from the virtual method table (see figure~\ref{objectcompactlayout}). This table is needed for the invocation of interface methods. The same table is additionally used by the subtype test for interfaces. If the table is empty for the index of the superclass, the subtype test fails. The code for {\tt instanceof} looks similar to: \begin{verbatim} return (sub->vftbl->interfacetable[-super->index] != NULL); \end{verbatim} Both subtype tests can be implemented by very few machine code instructions without using branches which are expensive on modern processors. \section{Exception handling} \subsection{Introduction} Exceptions in Java occur either implicitly or explicitly. Typical implicit exceptions are references to the {\tt null} pointer, array index out of bounds and division by zero. Exceptions also can be raised explicitly with the {\tt throw} instruction. To handle exceptions occurring during execution, code which can raise an exception is included in a {\tt try} block. An efficient implementation of exception handling has to take care of managing {\tt try} blocks and to check for implicit exceptions efficiently . \subsection{Known implementation techniques} Three standard methods exist for implementing exception handling: \begin{itemize} \item dynamically create a linked list of {\tt try} block data structures, \item use static {\tt try} block tables and search these tables at run time (suggested for JavaVM interpreters), \item use functions with two return values. \end{itemize} The first method has been used in portable implementations of exception handling for C++ \cite{Cameron+92} or Ada \cite{Giering+94} using {\tt setjmp} and {\tt longjmp}. A linked exception handling data structure is created when entering a {\tt try} block and the structure is discarded when leaving the protected block. Java requires precise exceptions. It means that all expressions evaluated before the exception raising instruction must have finished and all expressions after the raising instruction must not have been started. Therefore, in practice, some instructions may not be moved freely. In the case of subroutine calls, the callee-saved registers must be restored to their original value. The data structure can be used to store such information. The disadvantage of this method is that creating and discarding of the data structure takes some time even if an exception is never raised. The second method has been suggested for an efficient exception handling implementation of C++ \cite{KoenigStroustrup90} and is used in Java implementations. For every method, the JavaVM maintains an exception table. This exception table contains the program counter of the start and the end of the {\tt try} block, the program counter of the exception handler and the type of the exception. A JavaVM interpreter can easily interpret this structure and dispatch to the corresponding handler code. If the byte code is translated to native code, the equivalent technique is more complicated. To simplify restoration of the registers, the old CACAO implementation used a different scheme \cite{KrGr97b}. A method has two return values: the real return value and an exception value stored in a register. After each method call, the exception register is checked and, if it is non-zero, the exception handling code is executed. Since an exception is rarely raised, the branch is easy to predict and cheap. Entering and leaving a {\tt try} block have no associated costs. At compile time, the dispatch information contained in the exception table is translated into method dispatching native code. Run time checks for null pointers and array bounds are quite frequent, but can be eliminated in many cases. It is often possible to move a loop invariant null pointer check before the loop or to eliminate a bound check. Some more sophisticated compilers use these code motion techniques. \subsection{Motivation for a change} \begin{table*} \begin{center} \begin{tabular}[b]{|l|c|c|c|c|c|} \hline & JavaLex & javac & espresso & Toba & java\_cup \\ \hline null pointer checks & 6859 & 8197 & 11114 & 5825 & 7406 \\ \hline method calls & 3226 & 7498 & 7515 & 4401 & 5310 \\ \hline try blocks & 20 & 113 & 44 & 28 & 27 \\ \hline \end{tabular} \caption{Number of pointer checks, method invocations and try blocks} \label{MethodTry} \end{center} \end{table*} The old CACAO implementation was simple, but it only makes sense if the number of try blocks is high. We made an empirical study to count the numbers of static occurrences of method invocations and of {\tt try} blocks in some applications (see table \ref{MethodTry}). The number of method invocations is two magnitudes bigger than the number of {\tt try} blocks. Furthermore, an exception is rarely raised during program execution. This led us to a new implementation of exception handling. \subsection{The new exception handling scheme} The new exception handling scheme is similar to that in a JavaVM interpreter. If an exception occurs, information in the exception table is interpreted. However native code complicates the matter. The pointers to Java byte code must be replaced by pointers to native code. It requires that, during native code generation, the order of basic blocks not be allowed to change. If basic blocks are eliminated because of dead code, the information about a block can not be discarded if there is a pointer to it in the exception table. A CACAO stack frame only contains copies of saved or spilled registers. There is no saved frame pointer. The size of a stack frame is only contained in the instructions which allocate and deallocate the stack. Therefore, to support exception handling, additional information has to be stored elsewhere. The code for a method needs access to constants (mostly address constants). Since a global constant table would be too large for short address ranges and, because methods are compiled on demand, every method has its own constant area which is allocated directly before the start of the method code (see fig.\ \ref{methodlayout}). A register is reserved which contains the method pointer. Constants are addressed relative to the method pointer. \begin{figure}[htb] \begin{center} \setlength{\unitlength}{1mm} \begin{picture}(50,18) \put(24,6){\vector(-1,0){6}} \put(24,3){\makebox(25,6){\it method pointer}} \put(0,0){\makebox(18, 6){\it constants}} \put(0,6){\makebox(18,12){\it code}} \thicklines \put(0,6){\line(1,0){18}} \put(0,0){\framebox(18,18){}} \end{picture} \caption{CACAO method layout} \label{methodlayout} \end{center} \end{figure} During a method call, the method pointer of the calling method is destroyed. However the return address is stored in a register which is preserved during execution of the called method and has to be used for returning from the method. After a method return, the method pointer of the calling method is recomputed using the return address. The following code for a method call demonstrates the method calling convention: \begin{verbatim} LDQ cp,(obj) ; load class pointer LDQ mp,met(cp) ; load method pointer JSR ra,(mp) ; call method LDA mp=ra+offset ; recompute method pointer \end{verbatim} At the beginning of the constant area, there are fields which contain the necessary information for register restoration: \begin{mylist}{{\tt framesize}} \item[{\tt framesize}] the size of the stack frame \item[{\tt isleaf}] a flag which is true if the method is a leaf method \item[{\tt intsave}] number of saved integer registers \item[{\tt floatsave}] number of saved floating point registers \item[{\tt extable}] the exception table -- similar to the JavaVM table \end{mylist} The exception handler first checks if the current executing method has an associated handler and may dispatch to this handler. If there is no handler, it unwinds the stack and searches the parent methods for a handler. The information in the constant area is used to restore the registers and update the stack pointer. The return address and the offset in the immediately following {\tt LDA} instruction is used to recompute the method pointer. \begin{table*} \begin{center} \begin{tabular}[b]{|l|c|c|c|c|c|} \hline & JavaLex & javac & espresso & Toba & java\_cup \\ \hline CACAO old & 61629 & 156907 & 122951 & 67602 & 87489 \\ \hline CACAO new & 37523 & 86346 & 69212 & 41315 & 52386 \\ \hline \end{tabular} \caption{Number of generated native instructions} \label{CodeSize} \end{center} \end{table*} The change to the new scheme allowed us to implement the null pointer check for free. We protect the first 64 Kbyte of memory against read and write access. If an bus error is raised, we catch the signal and check if the faulting address is within the first 64K. If this is the case, we dispatch to our exception handler, otherwise we propagate the signal. This gives faster programs and reduces the work of the compiler in generating pointer checking code. As shown in table \ref{MethodTry}, the numbers of null pointer checks are quite high. \section{Garbage collector}