\chapter{Class Loader}

\section{Introduction}

A \textit{Java Virtual Machine} (JVM) dynamically loads, links and initializes classes and interfaces when they are needed. Loading a class or interface means locating its binary representation (the class file) and creating a class or interface structure from that binary representation. Linking takes a loaded class or interface and transfers it into the runtime state of the \textit{Java Virtual Machine} so that it can be executed. Initialization of a class or interface means executing the static class or interface initializer \texttt{<clinit>}.

The following sections describe the process of loading, linking and initializing a class or interface in the CACAO \textit{Java Virtual Machine} in greater detail. Furthermore, the data structures and techniques used in CACAO and the interaction with the GNU classpath are described.

\section{System class loader}

The class loader of a \textit{Java Virtual Machine} (JVM) is responsible for loading all types of classes and interfaces into the runtime system of the JVM. Every JVM has a \textit{system class loader} which is implemented in \texttt{java.lang.ClassLoader}; this class interacts with the JVM itself via native function calls.

\begingroup
\tolerance 10000
The \textit{GNU classpath} implements the system class loader in \texttt{gnu.java.lang.SystemClassLoader}, which extends \texttt{java.lang.ClassLoader} and interacts with the JVM. The \textit{bootstrap class loader} is implemented in \texttt{java.lang.ClassLoader} plus the JVM-dependent class \texttt{java.lang.VMClassLoader}. \texttt{java.lang.VMClassLoader} is the main class through which the bootstrap class loader of the GNU classpath interacts with the JVM.
The main function of this class is
\endgroup
\begin{verbatim}
    static final native Class loadClass(String name, boolean resolve)
        throws ClassNotFoundException;
\end{verbatim}
\begingroup
\tolerance 10000
This is a native function implemented in the CACAO JVM, which is located in \texttt{nat/VMClassLoader.c} and calls the internal loader functions of CACAO. If the \texttt{name} argument is \texttt{NULL}, a new \texttt{java.lang.NullPointerException} is created and the function returns \texttt{NULL}.
\endgroup
If the \texttt{name} is non-\texttt{NULL}, a new UTF8 string of the class name is created in the internal \textit{symbol table} via
\begin{verbatim}
    utf *javastring_toutf(java_lang_String *string, bool isclassname);
\end{verbatim}
This function converts a \texttt{java.lang.String} string into the internally used UTF8 string representation. \texttt{isclassname} tells the function to convert any \texttt{.} (periods) found in the class name into \texttt{/} (slashes), so the class loader can find the specified class.

Then a new \texttt{classinfo} structure is created via the
\begin{verbatim}
    classinfo *class_new(utf *classname);
\end{verbatim}
function call. This function creates a unique representation of this class, identified by its name, in the JVM's internal \textit{class hashtable}. The newly created \texttt{classinfo} structure is initialized with well-defined values, like \texttt{loaded = false;}, \texttt{linked = false;} and \texttt{initialized = false;}. This guarantees a definite state of a new class.

The next step is to actually load the requested class.
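The name conversion described above can be illustrated with a minimal sketch. The helper \texttt{classname\_to\_internal} is hypothetical and is not CACAO's \texttt{javastring\_toutf}: it works on plain C strings instead of \texttt{java.lang.String} objects and the symbol table, and only shows how periods are replaced by slashes when the \texttt{isclassname} flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CACAO): copies name into out and, if
   isclassname is set, replaces every '.' with '/'. Returns false if name
   is NULL or does not fit into out, including the terminating '\0'. */
static bool classname_to_internal(const char *name, bool isclassname,
                                  char *out, size_t outsize)
{
    size_t len, i;

    assert(out != NULL && outsize > 0);   /* caller must supply a buffer */

    if (name == NULL)
        return false;

    len = strlen(name);
    if (len + 1 > outsize)
        return false;

    for (i = 0; i < len; i++)
        out[i] = (isclassname && name[i] == '.') ? '/' : name[i];
    out[len] = '\0';

    return true;
}
```

With \texttt{isclassname} set, the name \texttt{java.lang.Object} becomes \texttt{java/lang/Object}, which matches the directory layout used to locate class files.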
For this, the main loader function
\begin{verbatim}
    classinfo *class_load(classinfo *c);
\end{verbatim}
is called, which is a wrapper function around the real loader function
\begin{verbatim}
    classinfo *class_load_intern(classbuffer *cb);
\end{verbatim}
The wrapper function is responsible for the following tasks:
\begin{itemize}
\item enter a monitor on the \texttt{classinfo} structure, so that only one thread at a time can load a given class
\item initialize the \texttt{classbuffer} structure with the actual class file data
\item remove the \texttt{classinfo} structure from the internal table if an exception occurred during loading
\item free any allocated memory and leave the monitor
\end{itemize}

The \texttt{class\_load\_intern} function performs the actual loading of the binary representation of the class or interface. During loading some verifier checks are performed which can throw an error. This error can be a \texttt{java.lang.ClassFormatError} or a \texttt{java.lang.NoClassDefFoundError}. Some of the checks that result in a \texttt{java.lang.ClassFormatError} are
\begin{itemize}
\item \textit{Truncated class file} --- unexpected end of class file data
\item \textit{Bad magic number} --- class file does not start with the magic bytes (\texttt{0xCAFEBABE})
\item \textit{Unsupported major.minor version} --- the bytecode version of the given class file is not supported by the JVM
\end{itemize}

The actual loading of the bytes from the binary representation is done via the \texttt{suck\_*} functions.
These functions are
\begin{itemize}
\item \texttt{suck\_u1}: load one \texttt{unsigned byte} (8 bit)
\item \texttt{suck\_u2}: load two \texttt{unsigned byte}s (16 bit)
\item \texttt{suck\_u4}: load four \texttt{unsigned byte}s (32 bit)
\item \texttt{suck\_u8}: load eight \texttt{unsigned byte}s (64 bit)
\item \texttt{suck\_float}: load four \texttt{byte}s (32 bit) converted into a \texttt{float} value
\item \texttt{suck\_double}: load eight \texttt{byte}s (64 bit) converted into a \texttt{double} value
\item \texttt{suck\_nbytes}: load \textit{n} bytes
\end{itemize}
Loading \texttt{signed} values is done via the \texttt{suck\_s[1,2,4,8]} macros, which cast the loaded bytes to \texttt{signed} values. All these functions take a pointer to a \texttt{classbuffer} structure (Figure \ref{classbuffer}) as argument.

\begin{figure}[h]
\begin{verbatim}
    typedef struct classbuffer {
        classinfo *class;    /* pointer to classinfo structure */
        u1        *data;     /* pointer to byte code           */
        s4         size;     /* size of the byte code          */
        u1        *pos;      /* current read position          */
    } classbuffer;
\end{verbatim}
\caption{\texttt{classbuffer} structure}
\label{classbuffer}
\end{figure}

This \texttt{classbuffer} structure is filled with data via the
\begin{verbatim}
    classbuffer *suck_start(classinfo *c);
\end{verbatim}
function. This function tries to locate the class, specified by the \texttt{classinfo} structure, in the \texttt{CLASSPATH}. The class can be a plain class file in the filesystem or an entry in a \texttt{zip}/\texttt{jar} file. If the class file is found, the \texttt{classbuffer} is filled with data collected from the class file, including the class file size and the binary representation of the class.

Before reading any byte of the binary representation with a \texttt{suck\_*} function, the number of remaining bytes in the \texttt{classbuffer} data array must be checked with the
\begin{verbatim}
    static inline bool check_classbuffer_size(classbuffer *cb, s4 len);
\end{verbatim}
function.
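How the size check and the byte-wise reads fit together can be sketched as follows. All \texttt{demo\_*} names are hypothetical stand-ins and not CACAO code: the buffer omits the \texttt{classinfo} pointer, failures are reported by returning \texttt{false} instead of throwing a \texttt{java.lang.ClassFormatError}, and the sketch relies on the class file format storing multi-byte values in big-endian order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the classbuffer structure (assumption). */
typedef struct {
    const uint8_t *data;   /* start of the class file bytes */
    int32_t        size;   /* number of bytes in data       */
    const uint8_t *pos;    /* current read position         */
} demo_classbuffer;

/* True if at least len bytes remain between pos and the end of data. */
static bool demo_check_size(const demo_classbuffer *cb, int32_t len)
{
    int32_t remaining = cb->size - (int32_t) (cb->pos - cb->data);
    return len >= 0 && len <= remaining;
}

/* The readers expect the caller to have checked the size beforehand. */
static uint8_t demo_suck_u1(demo_classbuffer *cb)
{
    assert(demo_check_size(cb, 1));
    return *cb->pos++;
}

static uint16_t demo_suck_u2(demo_classbuffer *cb)
{
    uint16_t hi = demo_suck_u1(cb);   /* most significant byte first */
    uint16_t lo = demo_suck_u1(cb);
    return (uint16_t) ((hi << 8) | lo);
}

static uint32_t demo_suck_u4(demo_classbuffer *cb)
{
    uint32_t hi = demo_suck_u2(cb);
    uint32_t lo = demo_suck_u2(cb);
    return (hi << 16) | lo;
}

/* Reads magic number, minor and major version from the start of a class
   file. Returns false where a real loader would raise an error. */
static bool demo_read_header(demo_classbuffer *cb,
                             uint16_t *minor, uint16_t *major)
{
    if (!demo_check_size(cb, 4 + 2 + 2))
        return false;                     /* "Truncated class file" */
    if (demo_suck_u4(cb) != 0xCAFEBABEu)
        return false;                     /* "Bad magic number" */
    *minor = demo_suck_u2(cb);
    *major = demo_suck_u2(cb);
    return true;
}
```

Checking the size once for a whole group of reads, as in \texttt{demo\_read\_header}, keeps the individual readers free of error handling.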
If fewer bytes remain in the buffer than the number of bytes to be read, given by the \texttt{len} argument of \texttt{check\_classbuffer\_size}, a \texttt{java.lang.ClassFormatError} with the detail message \textit{Truncated class file} (as mentioned before) is thrown.

The following subsections describe the individual steps of loading a class or interface from its binary representation in greater detail and in chronological order.

\subsection{Constant pool loading}

The constant pool of the class is loaded via
\begin{verbatim}
    static bool class_loadcpool(classbuffer *cb, classinfo *c);
\end{verbatim}
from the \texttt{constant\_pool} table in the binary representation of the class or interface. The constant pool needs to be parsed in two passes. In the first pass the information loaded is saved in temporary structures, which are further processed in the second pass, when the complete constant pool has been traversed. Only after all constant pool entries have been loaded can an individual entry be completely resolved, and this resolving must be done in a specific order:
\begin{enumerate}
\item \texttt{CONSTANT\_Class}
\item \texttt{CONSTANT\_String}
\item \texttt{CONSTANT\_NameAndType}
\item \texttt{CONSTANT\_Fieldref}, \texttt{CONSTANT\_Methodref} and \texttt{CONSTANT\_InterfaceMethodref} --- these are combined into one structure
\end{enumerate}
\begingroup
\tolerance 10000
The remaining constant pool types \texttt{CONSTANT\_Integer}, \texttt{CONSTANT\_Float}, \texttt{CONSTANT\_Long}, \texttt{CONSTANT\_Double} and \texttt{CONSTANT\_Utf8} can be completely resolved in the first pass and need no further processing.
\endgroup
The temporary structures, shown in Figure \ref{constantpoolstructures}, are used to \textit{forward} the data from the first pass into the second.
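The two-pass scheme described above can be sketched in a strongly reduced form. The sketch is not CACAO's \texttt{class\_loadcpool}: it assumes the entries have already been decoded into a hypothetical \texttt{demo\_raw\_entry} array indexed by constant pool index (index 0 is unused), it handles only \texttt{CONSTANT\_Utf8} and \texttt{CONSTANT\_Class}, and it stores the name string where CACAO would store a \texttt{classinfo} pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_UTF8 = 1, DEMO_CLASS = 7 };   /* tag values of the class file format */

/* Hypothetical pre-decoded constant pool entry. */
typedef struct {
    uint8_t     tag;
    const char *utf8;         /* DEMO_UTF8: the string itself         */
    uint16_t    name_index;   /* DEMO_CLASS: index of the name entry  */
} demo_raw_entry;

/* Forward record carrying a class entry from pass one to pass two. */
typedef struct {
    uint16_t thisindex;
    uint16_t name_index;
} demo_forward_class;

#define DEMO_MAX 16

static bool demo_load_cpool(const demo_raw_entry *raw, uint16_t count,
                            uint8_t *cptags, const void **cpinfos)
{
    demo_forward_class forwards[DEMO_MAX];
    uint16_t nforwards = 0;
    uint16_t i;

    assert(count >= 1 && count <= DEMO_MAX);

    cptags[0]  = 0;
    cpinfos[0] = NULL;

    /* Pass one: resolve UTF8 entries, remember class entries for later. */
    for (i = 1; i < count; i++) {
        cptags[i]  = 0;
        cpinfos[i] = NULL;

        switch (raw[i].tag) {
        case DEMO_UTF8:
            cptags[i]  = DEMO_UTF8;
            cpinfos[i] = raw[i].utf8;
            break;
        case DEMO_CLASS:
            forwards[nforwards].thisindex  = i;
            forwards[nforwards].name_index = raw[i].name_index;
            nforwards++;
            break;
        default:
            return false;     /* entry type not covered by this sketch */
        }
    }

    /* Pass two: every UTF8 entry is available now, so names can be looked up. */
    for (i = 0; i < nforwards; i++) {
        uint16_t name = forwards[i].name_index;

        if (name == 0 || name >= count || cptags[name] != DEMO_UTF8)
            return false;

        cptags[forwards[i].thisindex]  = DEMO_CLASS;
        cpinfos[forwards[i].thisindex] = cpinfos[name];
    }

    return true;
}
```

A class entry at index 1 whose name is stored at index 2 shows why a single pass is not sufficient: when entry 1 is read, the entry it refers to has not been seen yet.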
\begin{figure}[h]
\begin{verbatim}
    /* CONSTANT_Class entries */
    typedef struct forward_class {
        struct forward_class *next;
        u2 thisindex;
        u2 name_index;
    } forward_class;

    /* CONSTANT_String */
    typedef struct forward_string {
        struct forward_string *next;
        u2 thisindex;
        u2 string_index;
    } forward_string;

    /* CONSTANT_NameAndType */
    typedef struct forward_nameandtype {
        struct forward_nameandtype *next;
        u2 thisindex;
        u2 name_index;
        u2 sig_index;
    } forward_nameandtype;

    /* CONSTANT_Fieldref, CONSTANT_Methodref or CONSTANT_InterfaceMethodref */
    typedef struct forward_fieldmethint {
        struct forward_fieldmethint *next;
        u2 thisindex;
        u1 tag;
        u2 class_index;
        u2 nameandtype_index;
    } forward_fieldmethint;
\end{verbatim}
\caption{temporary constant pool structures}
\label{constantpoolstructures}
\end{figure}

The \texttt{classinfo} structure has two pointers to arrays which contain the constant pool information of the class, namely \texttt{cptags} and \texttt{cpinfos}. \texttt{cptags} contains the type of each constant pool entry. \texttt{cpinfos} contains a pointer to the constant pool entry itself.

In the second pass the references are resolved and the runtime structures are created.
In detail, the second pass performs the following steps for
\begin{itemize}
\item \texttt{CONSTANT\_Class}: get the UTF8 name string of the class, store type \texttt{CONSTANT\_Class} in \texttt{cptags}, create a class in the class hashtable with the UTF8 name and store the pointer to the new class in \texttt{cpinfos}
\item \texttt{CONSTANT\_String}: get the UTF8 string of the referenced string, store type \texttt{CONSTANT\_String} in \texttt{cptags} and store the UTF8 string pointer in \texttt{cpinfos}
\begingroup
\tolerance 10000
\item \texttt{CONSTANT\_NameAndType}: create a \texttt{constant\_nameandtype} (Figure \ref{constantnameandtype}) structure, get the UTF8 name and descriptor strings of the field or method and store them in the \texttt{constant\_nameandtype} structure, store type \texttt{CONSTANT\_NameAndType} in \texttt{cptags} and store a pointer to the \texttt{constant\_nameandtype} structure in \texttt{cpinfos}
\endgroup
\begin{figure}[h]
\begin{verbatim}
    typedef struct {            /* NameAndType (Field or Method)       */
        utf *name;              /* field/method name                   */
        utf *descriptor;        /* field/method type descriptor string */
    } constant_nameandtype;
\end{verbatim}
\caption{\texttt{constant\_nameandtype} structure}
\label{constantnameandtype}
\end{figure}
\begingroup
\tolerance 10000
\item \texttt{CONSTANT\_Fieldref}, \texttt{CONSTANT\_Methodref} and \texttt{CONSTANT\_InterfaceMethodref}: create a \texttt{constant\_FMIref} (Figure \ref{constantFMIref}) structure, get the referenced \texttt{constant\_nameandtype} structure, which contains the name and descriptor resolved in a previous step, and store the name and descriptor in the \texttt{constant\_FMIref} structure, get the pointer to the referenced class, which was created in a previous step, and store it in the \texttt{constant\_FMIref} structure, store the type of the current constant pool entry in \texttt{cptags} and store a pointer to the \texttt{constant\_FMIref} structure in \texttt{cpinfos}
\endgroup
\begin{figure}[h]
\begin{verbatim}
    typedef struct {           /* Fieldref, Methodref and InterfaceMethodref     */
        classinfo *class;      /* class containing this field/method/interface   */
        utf       *name;       /* field/method/interface name                    */
        utf       *descriptor; /* field/method/interface type descriptor string  */
    } constant_FMIref;
\end{verbatim}
\caption{\texttt{constant\_FMIref} structure}
\label{constantFMIref}
\end{figure}
\end{itemize}

Any UTF8 strings, \texttt{constant\_nameandtype} structures or referenced classes are resolved with the
\begin{verbatim}
    voidptr class_getconstant(classinfo *c, u4 pos, u4 ctype);
\end{verbatim}
function. This function checks that the entry at \texttt{pos} has the expected type \texttt{ctype} and then returns the requested \texttt{cpinfos} slot of the specified class.

\subsection{Interface loading}

Interface loading is simple and straightforward. After reading the number of interfaces, a \texttt{u2} constant pool index is read from the currently loading class or interface for every interface referenced. This index is used to resolve the class from the constant pool via the \texttt{class\_getconstant} function. This means that interface \textit{loading} is more interface \textit{resolving} than loading.

\subsection{Field loading}

The number of fields of the class or interface is read as a \texttt{u2} value. For each field the function
\begin{verbatim}
    static bool field_load(classbuffer *cb, classinfo *c, fieldinfo *f);
\end{verbatim}
is called. The \texttt{fieldinfo *} argument is a pointer to a \texttt{fieldinfo} structure allocated by the class loader. The field's \texttt{name} and \texttt{descriptor} are resolved from the class constant pool via \texttt{class\_getconstant}. If the verifier option is turned on, the field's \texttt{flags}, \texttt{name} and \texttt{descriptor} are checked for validity, which can result in a \texttt{java.lang.ClassFormatError}.

Each field can have some attributes. The number of attributes is read as a \texttt{u2} value from the binary representation.
If the field has the \texttt{ACC\_FINAL} bit set in the flags, a \texttt{ConstantValue} attribute can be present. This is the only attribute processed by \texttt{field\_load} and it can occur only once, otherwise a \texttt{java.lang.ClassFormatError} is thrown. The \texttt{ConstantValue} entry in the constant pool contains the value for the \texttt{final} field. Depending on the field's type, the proper constant pool entry is resolved and assigned.

\subsection{Method loading}

As for the fields, the number of methods of the class or interface is read from the binary representation as a \texttt{u2} value. For each method the function
\begin{verbatim}
    static bool method_load(classbuffer *cb, classinfo *c, methodinfo *m);
\end{verbatim}
is called.

The beginning of the method loading code is nearly the same as the field loading code. The \texttt{methodinfo *} argument is a pointer to a \texttt{methodinfo} structure allocated by the class loader. The method's \texttt{name} and \texttt{descriptor} are resolved from the class constant pool via \texttt{class\_getconstant}. With the verifier turned on, some method checks are carried out. These include \texttt{flags}, \texttt{name} and \texttt{descriptor} checks and an argument count check.

The method loading function has to distinguish between a \texttt{native} and a ``normal'' Java method. Depending on the \texttt{ACC\_NATIVE} flag, a different stub is created.

For a Java method, a \textit{compiler stub} is created. The purpose of this stub is to call the CACAO JIT compiler with a pointer to the byte code of the Java method as argument, so that the method is compiled into machine code. During code generation a pointer to this compiler stub routine is used as a temporary call target if the method is not compiled yet. After the target method has been compiled, the new entry point of the method is patched into the generated code; the compiler stub is then no longer needed and is freed.
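The idea behind the compiler stub can be sketched with plain function pointers. This is an illustration under simplifying assumptions and not CACAO's mechanism: the \texttt{demo\_*} names are hypothetical, the ``compiled code'' is an ordinary C function, and the sketch patches a per-method entry pointer, whereas the text above describes patching the generated machine code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_method demo_method;
typedef int (*demo_entry)(demo_method *m, int arg);

struct demo_method {
    demo_entry entrypoint;      /* what callers invoke                 */
    int        compile_count;   /* how often the "compiler" has run    */
};

/* Stands in for the machine code a JIT compiler would produce. */
static int demo_compiled_code(demo_method *m, int arg)
{
    (void) m;
    return arg * 2;
}

/* Stands in for the compiler stub: "compiles" the method on its first
   call, patches the entry point and forwards the call. */
static int demo_compiler_stub(demo_method *m, int arg)
{
    m->compile_count++;
    m->entrypoint = demo_compiled_code;   /* later calls bypass the stub */
    return m->entrypoint(m, arg);
}

/* A freshly loaded method starts out pointing at the stub. */
static void demo_method_init(demo_method *m)
{
    assert(m != NULL);
    m->entrypoint    = demo_compiler_stub;
    m->compile_count = 0;
}
```

The first call through \texttt{entrypoint} triggers compilation; every later call reaches the compiled code directly, so the stub runs once per method.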
If the method is a \texttt{native} method, the loader tries to find the native function. If the function is found, a \textit{native stub} is generated. This stub is responsible for adapting the method's arguments so that they are suitable for the \texttt{native} function called. This includes inserting the \textit{JNI environment} pointer as first argument and, if the \texttt{native} method has the \texttt{ACC\_STATIC} flag set, inserting a pointer to the method's class as second argument. If the \texttt{native} method is \texttt{static}, the native stub also checks whether the method's class is already initialized. If the method's class is not yet initialized when the native stub is generated, code that calls \texttt{asm\_check\_clinit} is emitted.

Each method can have some attributes. The method loading function processes two of them: \texttt{Code} and \texttt{Exceptions}.

The \texttt{Code} attribute contains the byte code of the Java method itself. If the method is either \texttt{native} or \texttt{abstract}, it must not have a \texttt{Code} attribute, otherwise it must have exactly one \texttt{Code} attribute. In addition to the byte code, the \texttt{Code} attribute contains the exception table and the attributes of the \texttt{Code} attribute itself. One exception table entry contains the \texttt{start\_pc}, \texttt{end\_pc} and \texttt{handler\_pc} of the \texttt{try-catch} block, each read as a \texttt{u2} value, plus a reference to the class of the \texttt{catch\_type}. Currently there are two attributes of the \texttt{Code} attribute defined in the JVM specification: \texttt{LineNumberTable} and \texttt{LocalVariableTable}. CACAO only processes the \texttt{LineNumberTable} attribute. A \texttt{LineNumberTable} entry contains the \texttt{start\_pc} and the \texttt{line\_number}.
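The layout of the exception table can be made concrete with a small parsing sketch. It is not CACAO's loader code: the \texttt{demo\_*} names are hypothetical, the input is assumed to start at the \texttt{u2} entry count of the table, values are big-endian as in the class file format, and \texttt{catch\_type} is kept as a raw constant pool index instead of being resolved to a class.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One exception table entry; catch_type stays an unresolved index here. */
typedef struct {
    uint16_t start_pc;
    uint16_t end_pc;
    uint16_t handler_pc;
    uint16_t catch_type;
} demo_exception_entry;

/* Reads one big-endian u2 value. */
static uint16_t demo_read_u2(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Parses the entry count followed by that many 8-byte entries.
   Returns the number of entries, or -1 if the data is truncated or
   out cannot hold all entries. */
static int demo_parse_exception_table(const uint8_t *data, size_t size,
                                      demo_exception_entry *out, size_t max)
{
    size_t count, i;

    assert(data != NULL && out != NULL);

    if (size < 2)
        return -1;

    count = demo_read_u2(data);
    if (count > max || (size - 2) / 8 < count)
        return -1;

    for (i = 0; i < count; i++) {
        const uint8_t *p = data + 2 + i * 8;

        out[i].start_pc   = demo_read_u2(p);
        out[i].end_pc     = demo_read_u2(p + 2);
        out[i].handler_pc = demo_read_u2(p + 4);
        out[i].catch_type = demo_read_u2(p + 6);
    }

    return (int) count;
}
```

Checking the total size before the loop corresponds to the buffer size check that has to precede every group of reads.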
Any attributes which are not processed by the CACAO class loading system are skipped via
\begin{verbatim}
    static bool skipattributebody(classbuffer *cb);
\end{verbatim}
which skips one attribute, or
\begin{verbatim}
    static bool skipattributes(classbuffer *cb, u4 num);
\end{verbatim}
which skips a specified number \texttt{num} of attributes. If any problem occurs in the method loading function, a \texttt{java.lang.ClassFormatError} with a specific detail message is thrown.

\subsection{Attribute loading}

Attribute loading is done via the
\begin{verbatim}
    static bool attribute_load(classbuffer *cb, classinfo *c, u4 num);
\end{verbatim}
function. The currently loading class or interface can contain some additional attributes which have not already been loaded. The CACAO system class loader processes two of them: \texttt{InnerClasses} and \texttt{SourceFile}.

The \texttt{InnerClasses} attribute is a \textit{variable-length} attribute in the \texttt{attributes} table of the binary representation of the class or interface. An \texttt{InnerClasses} entry contains the \texttt{inner\_class} constant pool index itself, the \texttt{outer\_class} index, the \texttt{name} index of the inner class name and the flags bitmask of the inner class. All these values are read in \texttt{u2} chunks. The constant pool indexes are used with the
\begin{verbatim}
    voidptr innerclass_getconstant(classinfo *c, u4 pos, u4 ctype);
\end{verbatim}
function call to resolve the classes or UTF8 strings.

The other attribute, \texttt{SourceFile}, consists of just one \texttt{u2} constant pool index, which refers to the name of the source file of the class. Both attributes must occur only once. Attributes other than these two are skipped with the \texttt{skipattributebody} function mentioned earlier.

After the attribute loading is done and no error has occurred, the \texttt{class\_load\_intern} function returns the \texttt{classinfo} pointer to signal that there was no problem.
If \texttt{NULL} is returned, an exception occurred during loading.

\section{Dynamic class loader}

\section{Eager - lazy class loading}

\section{Linking}

\section{Initialization}