A \textit{Java Virtual Machine} (JVM) dynamically loads, links and
initializes classes and interfaces when they are needed. Loading a
class or interface means locating its binary representation (the
class files) and creating a class or interface structure from that
binary representation. Linking takes a loaded class or interface and
transfers it into the runtime state of the \textit{Java Virtual
Machine} so that it can be executed. Initialization of a class or
interface means executing the static class or interface initializer.

The following sections describe the process of loading, linking and
initializing a class or interface in the CACAO \textit{Java Virtual
Machine} in greater detail, together with the data structures and
techniques used in CACAO and the interaction with the GNU classpath.

\section{System class loader}

The class loader of a \textit{Java Virtual Machine} (JVM) is
responsible for loading all types of classes and interfaces into the
runtime system of the JVM. Every JVM has a \textit{system class
loader}, which is implemented in \texttt{java.lang.ClassLoader}; this
class interacts with the JVM itself via native function calls.

The \textit{GNU classpath} implements the system class loader in
\texttt{gnu.java.lang.SystemClassLoader}, which extends
\texttt{java.lang.ClassLoader} and interacts with the JVM. The
\textit{bootstrap class loader} is implemented in
\texttt{java.lang.ClassLoader} plus the JVM-dependent class
\texttt{java.lang.VMClassLoader}. \texttt{java.lang.VMClassLoader} is
the main class through which the bootstrap class loader of the GNU
classpath interacts with the JVM. The main function of this class is

\begin{verbatim}
        static final native Class loadClass(String name, boolean resolve)
            throws ClassNotFoundException;
\end{verbatim}

This is a native function implemented in the CACAO JVM. It is
located in \texttt{nat/VMClassLoader.c} and calls the internal loader
functions of CACAO. If the \texttt{name} argument is \texttt{NULL}, a
new \texttt{java.lang.NullPointerException} is created and the
function returns \texttt{NULL}.

If \texttt{name} is non-\texttt{NULL}, a new UTF8 string of the class
name is created in the internal \textit{symbol table} via

\begin{verbatim}
        utf *javastring_toutf(java_lang_String *string, bool isclassname);
\end{verbatim}

This function converts a \texttt{java.lang.String} string into the
internally used UTF8 string representation. \texttt{isclassname} tells
the function to convert any \texttt{.} (periods) found in the class
name into \texttt{/} (slashes), so the class loader can find the
class.
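
The period-to-slash conversion can be illustrated with a minimal sketch. It works on plain C strings instead of \texttt{java.lang.String} objects and the internal \texttt{utf} type, and the helper name \texttt{classname\_to\_internal} is hypothetical; it is not part of CACAO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not CACAO's javastring_toutf): copies a class
   name into `out` and replaces every '.' with '/'. Returns false if
   `out` cannot hold the name plus the terminating NUL. */
static bool classname_to_internal(const char *name, char *out, size_t outsize)
{
    size_t len;
    size_t i;

    assert(name != NULL && out != NULL);

    len = strlen(name);
    if (len + 1 > outsize)
        return false;

    for (i = 0; i < len; i++)
        out[i] = (name[i] == '.') ? '/' : name[i];
    out[len] = '\0';

    return true;
}
```

With this sketch, \texttt{java.lang.String} becomes \texttt{java/lang/String}, which matches the directory layout of class files in the file system or in an archive.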

Then a new \texttt{classinfo} structure is created via the

\begin{verbatim}
        classinfo *class_new(utf *classname);
\end{verbatim}

function call. This function creates a unique representation of this
class, identified by its name, in the JVM's internal \textit{class
hashtable}. The newly created \texttt{classinfo} structure is
initialized with correct values, like \texttt{loaded = false;},
\texttt{linked = false;} and \texttt{initialized = false;}. This
guarantees a well-defined state for a new class.
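
The idea of one unique, name-keyed entry per class can be sketched as follows. This is a simplified illustration and not CACAO's implementation: the type \texttt{sketch\_class}, the fixed table size and the hash function are assumptions, and plain C strings stand in for \texttt{utf} strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 64          /* arbitrary bucket count */

typedef struct sketch_class {
    struct sketch_class *next;        /* next entry in the same bucket */
    char *name;                       /* class name, used as the key   */
    bool  loaded;
    bool  linked;
    bool  initialized;
} sketch_class;

static sketch_class *sketch_table[SKETCH_TABLE_SIZE];

static unsigned sketch_hash(const char *s)
{
    unsigned h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h % SKETCH_TABLE_SIZE;
}

/* Return the unique entry for `name`, creating it on first request. */
static sketch_class *sketch_class_new(const char *name)
{
    unsigned      slot;
    sketch_class *c;
    size_t        len;

    assert(name != NULL);
    slot = sketch_hash(name);

    for (c = sketch_table[slot]; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c;                 /* already known: same pointer again */

    c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    len = strlen(name) + 1;
    c->name = malloc(len);
    if (c->name == NULL) {
        free(c);
        return NULL;
    }
    memcpy(c->name, name, len);

    c->loaded = false;                /* well-defined initial state */
    c->linked = false;
    c->initialized = false;

    c->next = sketch_table[slot];
    sketch_table[slot] = c;
    return c;
}
```

Requesting the same name twice yields the same pointer, so later stages can compare classes by pointer identity.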

The next step is to actually load the requested class. For this,

\begin{verbatim}
        classinfo *class_load(classinfo *c);
\end{verbatim}

is called, which is a wrapper around the real loader function

\begin{verbatim}
        classinfo *class_load_intern(classbuffer *cb);
\end{verbatim}

The wrapper function is needed to take care of the following:

\begin{itemize}
 \item enter a monitor on the \texttt{classinfo} structure, so that
 only one thread can load a given class at a time

 \item initialize the \texttt{classbuffer} structure with the actual
 class file data

 \item remove the \texttt{classinfo} structure from the internal table
 if an exception occurred during loading

 \item free any allocated memory and leave the monitor
\end{itemize}
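
The wrapper steps listed above can be sketched as follows. The sketch assumes a POSIX mutex as a stand-in for the monitor, a flag instead of a real class table, and a trivial stand-in for the internal loader; all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_class {
    pthread_mutex_t lock;             /* stands in for the per-class monitor */
    bool loaded;
    bool in_table;                    /* still registered in the class table */
} sketch_class;

typedef struct sketch_buffer {
    sketch_class        *cls;
    const unsigned char *data;
    size_t               size;
    const unsigned char *pos;
} sketch_buffer;

/* Stand-in for the real loader: accepts any non-empty buffer. */
static bool sketch_load_intern(sketch_buffer *cb)
{
    return cb->size > 0;
}

static sketch_class *sketch_class_load(sketch_class *c,
                                       const unsigned char *bytes, size_t size)
{
    sketch_buffer cb;
    sketch_class *result = c;

    assert(c != NULL);
    pthread_mutex_lock(&c->lock);     /* only one loading thread per class */

    if (!c->loaded) {
        cb.cls  = c;                  /* initialize the buffer */
        cb.data = bytes;
        cb.size = size;
        cb.pos  = bytes;

        if (sketch_load_intern(&cb)) {
            c->loaded = true;
        } else {
            c->in_table = false;      /* loading failed: unregister the class */
            result = NULL;
        }
    }

    /* The buffer lives on the stack here, so nothing has to be freed. */
    pthread_mutex_unlock(&c->lock);   /* the monitor is left on every path */
    return result;
}
```

Routing both the success and the failure case through a single unlock keeps the monitor balanced even when loading fails.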

The \texttt{class\_load\_intern} function performs the actual loading
of the binary representation of the class or interface. During loading
some verifier checks are performed which can throw an error. This
error can be a \texttt{java.lang.ClassFormatError} or a
\texttt{java.lang.NoClassDefFoundError}. Some of these
\texttt{java.lang.ClassFormatError} checks are

\begin{itemize}
 \item \textit{Truncated class file}: unexpected end of class file

 \item \textit{Bad magic number}: class file does not start with
 the magic bytes (\texttt{0xCAFEBABE})

 \item \textit{Unsupported major.minor version}: the bytecode
 version of the given class file is not supported by the JVM
\end{itemize}

The actual loading of the bytes from the binary representation is done
via the \texttt{suck\_*} functions. These functions are

\begin{itemize}
 \item \texttt{suck\_u1}: load one \texttt{unsigned byte} (8 bit)

 \item \texttt{suck\_u2}: load two \texttt{unsigned byte}s (16 bit)

 \item \texttt{suck\_u4}: load four \texttt{unsigned byte}s (32 bit)

 \item \texttt{suck\_u8}: load eight \texttt{unsigned byte}s (64 bit)

 \item \texttt{suck\_float}: load four \texttt{byte}s (32 bit)
 converted into a \texttt{float} value

 \item \texttt{suck\_double}: load eight \texttt{byte}s (64 bit)
 converted into a \texttt{double} value

 \item \texttt{suck\_nbytes}: load \textit{n} bytes
\end{itemize}

Loading \texttt{signed} values is done via the
\texttt{suck\_s[1,2,4,8]} macros, which cast the loaded bytes to
\texttt{signed} values. All these functions take a
\texttt{classbuffer} (Figure \ref{classbuffer}) structure pointer as
argument.

\begin{figure}
\begin{verbatim}
    typedef struct classbuffer {
        classinfo *class;           /* pointer to classinfo structure */
        u1        *data;            /* pointer to byte code           */
        s4         size;            /* size of the byte code          */
        u1        *pos;             /* current read position          */
    } classbuffer;
\end{verbatim}
\caption{\texttt{classbuffer} structure}
\label{classbuffer}
\end{figure}

This \texttt{classbuffer} structure is filled with data via the

\begin{verbatim}
        classbuffer *suck_start(classinfo *c);
\end{verbatim}

function. This function tries to locate the class, specified by the
\texttt{classinfo} structure, in the \texttt{CLASSPATH}. This can be
a plain class file in the filesystem or a file in a
\texttt{zip}/\texttt{jar} file. If the class file is found, the
\texttt{classbuffer} is filled with data collected from the class
file, including the class file size and the binary representation of
the class.

Before reading any byte of the binary representation with a
\texttt{suck\_*} function, the remaining bytes in the
\texttt{classbuffer} data array must be checked with the

\begin{verbatim}
        static inline bool check_classbuffer_size(classbuffer *cb, s4 len);
\end{verbatim}

function. If fewer bytes remain than the number of bytes to be read,
specified by the \texttt{len} argument, a
\texttt{java.lang.ClassFormatError} with the detail message
\textit{Truncated class file} (as mentioned before) is thrown.
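
The interplay of the size check and the read functions can be sketched as follows. The class file format stores multi-byte values in big-endian order and starts with the magic number followed by the minor and major version. The names prefixed with \texttt{sketch\_} are hypothetical, and a \texttt{false} return value stands in for throwing a \texttt{java.lang.ClassFormatError}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct sketch_buffer {
    const uint8_t *data;              /* start of the class file bytes */
    int32_t        size;              /* total number of bytes         */
    const uint8_t *pos;               /* next byte to read             */
} sketch_buffer;

/* True if at least `len` unread bytes remain. */
static bool sketch_check_size(const sketch_buffer *cb, int32_t len)
{
    int32_t used = (int32_t) (cb->pos - cb->data);

    return len >= 0 && len <= cb->size - used;
}

static uint8_t sketch_u1(sketch_buffer *cb)
{
    assert(cb->pos < cb->data + cb->size);   /* caller checked the size */
    return *cb->pos++;
}

static uint16_t sketch_u2(sketch_buffer *cb)
{
    uint16_t hi = sketch_u1(cb);      /* most significant byte first */
    uint16_t lo = sketch_u1(cb);

    return (uint16_t) ((hi << 8) | lo);
}

static uint32_t sketch_u4(sketch_buffer *cb)
{
    uint32_t hi = sketch_u2(cb);
    uint32_t lo = sketch_u2(cb);

    return (hi << 16) | lo;
}

/* Read magic number and version; false stands for a ClassFormatError. */
static bool sketch_read_header(sketch_buffer *cb,
                               uint16_t *minor, uint16_t *major)
{
    if (!sketch_check_size(cb, 4 + 2 + 2))
        return false;                 /* "Truncated class file" */
    if (sketch_u4(cb) != 0xCAFEBABEu)
        return false;                 /* "Bad magic number" */

    *minor = sketch_u2(cb);
    *major = sketch_u2(cb);
    return true;
}
```

Checking the size once for a group of reads keeps the individual read functions small, at the price that every caller must remember the check.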

The following subsections describe the individual loading steps of a
class or interface from its binary representation in greater detail
and in chronological order.

\subsection{Constant pool loading}

The class' constant pool is loaded via

\begin{verbatim}
        static bool class_loadcpool(classbuffer *cb, classinfo *c);
\end{verbatim}

from the \texttt{constant\_pool} table in the binary representation of
the class or interface. The constant pool needs to be parsed in two
passes. In the first pass the information loaded is saved in temporary
structures, which are further processed in the second pass, when the
complete constant pool has been traversed. Only when all constant
pool entries have been loaded can any constant pool entry be
completely resolved, and this resolving can only be done in a specific
order:

\begin{enumerate}
 \item \texttt{CONSTANT\_Class}

 \item \texttt{CONSTANT\_String}

 \item \texttt{CONSTANT\_NameAndType}

 \item \texttt{CONSTANT\_Fieldref}, \texttt{CONSTANT\_Methodref} and
 \texttt{CONSTANT\_InterfaceMethodref}: these are combined into one
 structure
\end{enumerate}

The remaining constant pool types \texttt{CONSTANT\_Integer},
\texttt{CONSTANT\_Float}, \texttt{CONSTANT\_Long},
\texttt{CONSTANT\_Double} and \texttt{CONSTANT\_Utf8} can be
completely resolved in the first pass and need no further processing.

The temporary structures, shown in Figure
\ref{constantpoolstructures}, are used to \textit{forward} the data
from the first pass into the second.

\begin{figure}
\begin{verbatim}
    /* CONSTANT_Class entries */
    typedef struct forward_class {
        struct forward_class *next;
        /* ... */
    } forward_class;

    /* CONSTANT_String */
    typedef struct forward_string {
        struct forward_string *next;
        /* ... */
    } forward_string;

    /* CONSTANT_NameAndType */
    typedef struct forward_nameandtype {
        struct forward_nameandtype *next;
        /* ... */
    } forward_nameandtype;

    /* CONSTANT_Fieldref, CONSTANT_Methodref or CONSTANT_InterfaceMethodref */
    typedef struct forward_fieldmethint {
        struct forward_fieldmethint *next;
        /* ... */
        u2 nameandtype_index;
    } forward_fieldmethint;
\end{verbatim}
\caption{temporary constant pool structures}
\label{constantpoolstructures}
\end{figure}
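
The two-pass scheme can be sketched for the simplest case, a \texttt{CONSTANT\_Class} entry that refers to a \texttt{CONSTANT\_Utf8} entry appearing later in the pool. The sketch makes several assumptions: it works on already decoded entries instead of raw bytes, the member names of the forward structure are hypothetical, and the resolved class slot simply holds the name string instead of a \texttt{classinfo} pointer. The tag values 1 and 7 are those of the class file format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TAG_UTF8 = 1, TAG_CLASS = 7 };   /* tag values of the class file format */

typedef struct raw_entry {              /* simplified, already decoded input */
    uint8_t     tag;
    const char *utf8;                   /* TAG_UTF8: the string itself     */
    uint16_t    name_index;             /* TAG_CLASS: index of its name    */
} raw_entry;

typedef struct forward_class_sketch {
    struct forward_class_sketch *next;
    uint16_t this_index;                /* slot to fill in the second pass */
    uint16_t name_index;                /* slot that holds the class name  */
} forward_class_sketch;

/* Fills tags[] and infos[]; returns 0 on success, -1 on a bad reference. */
static int resolve_pool(const raw_entry *raw, uint16_t count,
                        uint8_t *tags, const void **infos)
{
    forward_class_sketch *fwd = NULL;
    forward_class_sketch *f;
    int      rc = 0;
    uint16_t i;

    for (i = 0; i < count; i++) {
        tags[i]  = 0;
        infos[i] = NULL;
    }

    /* Pass 1: store self-contained entries, queue the others.
       Slot 0 of a constant pool is unused. */
    for (i = 1; i < count && rc == 0; i++) {
        if (raw[i].tag == TAG_UTF8) {
            tags[i]  = TAG_UTF8;
            infos[i] = raw[i].utf8;
        } else if (raw[i].tag == TAG_CLASS) {
            f = malloc(sizeof(*f));
            if (f == NULL) {
                rc = -1;
            } else {
                f->this_index = i;
                f->name_index = raw[i].name_index;
                f->next = fwd;
                fwd = f;
            }
        }
    }

    /* Pass 2: all Utf8 slots are known now, so forward references work. */
    while (fwd != NULL) {
        f   = fwd;
        fwd = f->next;
        if (rc == 0 && f->name_index > 0 && f->name_index < count &&
            tags[f->name_index] == TAG_UTF8) {
            tags[f->this_index]  = TAG_CLASS;
            infos[f->this_index] = infos[f->name_index];
        } else {
            rc = -1;
        }
        free(f);
    }

    return rc;
}
```

The forward list is consumed and freed in the second pass, so the temporary structures never outlive constant pool loading.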

The \texttt{classinfo} structure has two pointers to arrays which
contain the class' constant pool information, namely \texttt{cptags}
and \texttt{cpinfos}. \texttt{cptags} contains the type of each
constant pool entry. \texttt{cpinfos} contains a pointer to the
constant pool entry itself. In the second pass the references are
resolved and the runtime structures are created. In detail, this
involves the following for each entry type:

\begin{itemize}
 \item \texttt{CONSTANT\_Class}: get the UTF8 name string of the
 class, store type \texttt{CONSTANT\_Class} in \texttt{cptags}, create
 a class in the class hashtable with the UTF8 name and store the
 pointer to the new class in \texttt{cpinfos}

 \item \texttt{CONSTANT\_String}: get the UTF8 string of the
 referenced string, store type \texttt{CONSTANT\_String} in
 \texttt{cptags} and store the UTF8 string pointer in
 \texttt{cpinfos}

 \item \texttt{CONSTANT\_NameAndType}: create a
 \texttt{constant\_nameandtype} (Figure \ref{constantnameandtype})
 structure, get the UTF8 name and descriptor strings of the field or
 method and store them in the \texttt{constant\_nameandtype}
 structure, store type \texttt{CONSTANT\_NameAndType} in
 \texttt{cptags} and store a pointer to the
 \texttt{constant\_nameandtype} structure in \texttt{cpinfos}

\begin{figure}
\begin{verbatim}
    typedef struct {            /* NameAndType (Field or Method)       */
        utf *name;              /* field/method name                   */
        utf *descriptor;        /* field/method type descriptor string */
    } constant_nameandtype;
\end{verbatim}
\caption{\texttt{constant\_nameandtype} structure}
\label{constantnameandtype}
\end{figure}

 \item \texttt{CONSTANT\_Fieldref}, \texttt{CONSTANT\_Methodref} and
 \texttt{CONSTANT\_InterfaceMethodref}: create a
 \texttt{constant\_FMIref} (Figure \ref{constantFMIref}) structure;
 get the referenced \texttt{constant\_nameandtype} structure, which
 contains the name and descriptor resolved in a previous step, and
 store the name and descriptor in the \texttt{constant\_FMIref}
 structure; get the pointer of the referenced class, which was created
 in a previous step, and store it in the \texttt{constant\_FMIref}
 structure; store the type of the current constant pool entry in
 \texttt{cptags} and a pointer to the \texttt{constant\_FMIref}
 structure in \texttt{cpinfos}

\begin{figure}
\begin{verbatim}
    typedef struct {   /* Fieldref, Methodref and InterfaceMethodref     */
        classinfo *class;       /* class containing this field/method/interface */
        utf       *name;        /* field/method/interface name                  */
        utf       *descriptor;  /* field/method/interface type descriptor string */
    } constant_FMIref;
\end{verbatim}
\caption{\texttt{constant\_FMIref} structure}
\label{constantFMIref}
\end{figure}
\end{itemize}

Any UTF8 strings, \texttt{constant\_nameandtype} structures or
referenced classes are resolved with the

\begin{verbatim}
        voidptr class_getconstant(classinfo *c, u4 pos, u4 ctype);
\end{verbatim}

function. This function checks for type equality and then returns the
requested \texttt{cpinfos} slot of the specified class.
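
A tag-checked lookup of this kind can be sketched as follows. The structure \texttt{sketch\_pool} and the function name are hypothetical, the index range check is an addition of the sketch, and returning \texttt{NULL} stands in for whatever error handling the real function performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sketch_pool {
    uint32_t       count;             /* number of slots, slot 0 unused */
    const uint8_t *tags;              /* entry type per slot            */
    void * const  *infos;             /* resolved entry per slot        */
} sketch_pool;

/* Return slot `pos` if it holds an entry of type `ctype`, else NULL. */
static void *sketch_getconstant(const sketch_pool *p,
                                uint32_t pos, uint32_t ctype)
{
    assert(p != NULL);

    if (pos == 0 || pos >= p->count)
        return NULL;                  /* index outside the pool */
    if (p->tags[pos] != ctype)
        return NULL;                  /* slot holds another entry type */

    return p->infos[pos];
}
```

Comparing the stored tag with the expected type catches class files whose indexes point at entries of the wrong kind before the untyped pointer is used.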

\subsection{Interface loading}

Interface loading is simple and straightforward. After reading
the number of interfaces, a \texttt{u2} constant pool index is read
for every referenced interface from the class or interface currently
being loaded. This index is used to resolve the class from the class'
constant pool via the \texttt{class\_getconstant} function. This means
that interface \textit{loading} is more interface \textit{resolving}
than loading.

\subsection{Field loading}

The number of fields of the class or interface is read as a
\texttt{u2} value. For each field the function

\begin{verbatim}
        static bool field_load(classbuffer *cb, classinfo *c, fieldinfo *f);
\end{verbatim}

is called. The \texttt{fieldinfo *} argument is a pointer to a
\texttt{fieldinfo} structure allocated by the class loader. The
field's \texttt{name} and \texttt{descriptor} are resolved from the
class constant pool via \texttt{class\_getconstant}. If the verifier
option is turned on, the field's \texttt{flags}, \texttt{name} and
\texttt{descriptor} are checked for validity, which can result in a
\texttt{java.lang.ClassFormatError}.

Each field can have some attributes. The number of attributes is read
as a \texttt{u2} value from the binary representation. If the field has
the \texttt{ACC\_FINAL} bit set in the flags, the
\texttt{ConstantValue} attribute is available. This is the only
attribute processed by \texttt{field\_load} and it can occur only once,
otherwise a \texttt{java.lang.ClassFormatError} is thrown. The
\texttt{ConstantValue} entry in the constant pool contains the value
for the \texttt{final} field. Depending on the field's type, the
proper constant pool entry is resolved and assigned.

\subsection{Method loading}

As for the fields, the number of methods of the class or interface is
read from the binary representation as a \texttt{u2} value. For each
method the function

\begin{verbatim}
        static bool method_load(classbuffer *cb, classinfo *c, methodinfo *m);
\end{verbatim}

is called. The beginning of the method loading code is nearly the same
as the field loading code. The \texttt{methodinfo *} argument is a
pointer to a \texttt{methodinfo} structure allocated by the class
loader. The method's \texttt{name} and \texttt{descriptor} are
resolved from the class constant pool via
\texttt{class\_getconstant}. With the verifier turned on, some method
checks are carried out. These include \texttt{flags}, \texttt{name}
and \texttt{descriptor} checks and an argument count check.

The method loading function has to distinguish between a
\texttt{native} and a ``normal'' Java method. Depending on the
\texttt{ACC\_NATIVE} flag, a different stub is created.

For a Java method, a \textit{compiler stub} is created. The purpose of
this stub is to call the CACAO JIT compiler with a pointer to the byte
code of the Java method as argument to compile the method into machine
code. During code generation a pointer to this compiler stub routine
is used as a temporary call target if the method is not compiled
yet. After the target method is compiled, the new entry point of the
method is patched into the generated code and the compiler stub is
no longer needed, so it is freed.
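
The compile-on-first-call idea behind the compiler stub can be sketched with a function pointer. This is only an analogy under simplifying assumptions: the real stub is machine code and the patching happens in generated code, whereas here the entry point is a field in a hypothetical \texttt{sketch\_method} structure and the ``compiled'' code is an ordinary C function.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_method sketch_method;
typedef int (*entry_fn)(sketch_method *m, int arg);

struct sketch_method {
    entry_fn entry;                   /* current entry point: stub or code */
    int      compile_count;           /* how often the "compiler" ran      */
};

/* Stands in for machine code produced by the JIT compiler. */
static int compiled_body(sketch_method *m, int arg)
{
    (void) m;
    return arg * 2;
}

/* Compiler stub: compile on the first call, patch the entry point and
   continue in the compiled code. */
static int compiler_stub(sketch_method *m, int arg)
{
    m->compile_count++;
    m->entry = compiled_body;         /* later calls bypass the stub */
    return m->entry(m, arg);
}

static int call_method(sketch_method *m, int arg)
{
    assert(m != NULL && m->entry != NULL);
    return m->entry(m, arg);
}
```

A method starts with \texttt{entry} pointing at \texttt{compiler\_stub}; after the first call through \texttt{call\_method} the pointer refers to the compiled body, so the compilation cost is paid once and only for methods that are actually executed.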

If the method is a \texttt{native} method, the loader tries to find
the native function. If the function is found, a \textit{native stub}
is generated. This stub is responsible for adapting the method's
arguments to suit the \texttt{native} function called. This
includes inserting the \textit{JNI environment} pointer as first
argument and, if the \texttt{native} method has the
\texttt{ACC\_STATIC} flag set, inserting a pointer to the method's
class as second argument. If the \texttt{native} method is
\texttt{static}, the native stub also checks whether the method's
class is already initialized. If the method's class is not yet
initialized when the native stub is generated, code calling
\texttt{asm\_check\_clinit} is generated as well.

Each method can have some attributes. The method loading function
processes two of them: \texttt{Code} and \texttt{Exceptions}. The
\texttt{Code} attribute is the byte code of the Java method itself. If
the method is either \texttt{native} or \texttt{abstract}, it must not
have a \texttt{Code} attribute, otherwise it must have exactly one
\texttt{Code} attribute. In addition to the byte code, the
\texttt{Code} attribute contains the exception table and the
attributes of the \texttt{Code} attribute itself. One exception table
entry contains the \texttt{start\_pc}, \texttt{end\_pc} and
\texttt{handler\_pc} of the \texttt{try-catch} block, each read as a
\texttt{u2} value, plus a reference to the class of the
\texttt{catch\_type}. Currently there are two attributes of the
\texttt{Code} attribute defined in the JVM specification:
\texttt{LineNumberTable} and \texttt{LocalVariableTable}. CACAO only
processes the \texttt{LineNumberTable} attribute. A
\texttt{LineNumberTable} entry contains the \texttt{start\_pc} and the
\texttt{line\_number}. Any attributes which are not processed by the
CACAO class loading system are skipped using

\begin{verbatim}
        static bool skipattributebody(classbuffer *cb);
\end{verbatim}

which skips one attribute, or

\begin{verbatim}
        static bool skipattributes(classbuffer *cb, u4 num);
\end{verbatim}

which skips a specified number \texttt{num} of attributes. If any
problem occurs in the method loading function, a
\texttt{java.lang.ClassFormatError} with a specific detail message is
thrown.
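
Skipping unknown attributes relies on the uniform attribute layout of the class file format: a \texttt{u2} name index, a \texttt{u4} length and then that many bytes of body. The following sketch uses hypothetical names and assumes that the body-skipping function is called after the name index has been consumed; a \texttt{false} return value stands in for a \texttt{java.lang.ClassFormatError}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct sketch_buffer {
    const uint8_t *data;              /* start of the bytes    */
    size_t         size;              /* total number of bytes */
    const uint8_t *pos;               /* next byte to read     */
} sketch_buffer;

static bool has_bytes(const sketch_buffer *cb, uint32_t len)
{
    size_t used = (size_t) (cb->pos - cb->data);

    return len <= cb->size - used;
}

/* Skip the u4 length field and the body it announces. Assumes the u2
   name index of the attribute was consumed already. */
static bool sketch_skip_body(sketch_buffer *cb)
{
    uint32_t len;

    if (!has_bytes(cb, 4))
        return false;
    len = ((uint32_t) cb->pos[0] << 24) | ((uint32_t) cb->pos[1] << 16) |
          ((uint32_t) cb->pos[2] << 8)  |  (uint32_t) cb->pos[3];
    cb->pos += 4;

    if (!has_bytes(cb, len))
        return false;                 /* body runs past the end of the file */
    cb->pos += len;
    return true;
}

/* Skip `num` complete attributes: name index plus body each. */
static bool sketch_skip_attributes(sketch_buffer *cb, uint32_t num)
{
    uint32_t i;

    assert(cb != NULL);
    for (i = 0; i < num; i++) {
        if (!has_bytes(cb, 2))
            return false;
        cb->pos += 2;                 /* name index is not needed */
        if (!sketch_skip_body(cb))
            return false;
    }
    return true;
}
```

Because the length is part of every attribute, a loader can step over attributes it does not understand without knowing their internal structure.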

\subsection{Attribute loading}

Attribute loading is done via the

\begin{verbatim}
        static bool attribute_load(classbuffer *cb, classinfo *c, u4 num);
\end{verbatim}

function. The currently loading class or interface can contain some
additional attributes which have not already been loaded. The CACAO
system class loader processes two of them: \texttt{InnerClasses} and
\texttt{SourceFile}.

The \texttt{InnerClasses} attribute is a \textit{variable-length}
attribute in the \texttt{attributes} table of the binary
representation of the class or interface. An \texttt{InnerClasses}
entry contains the \texttt{inner\_class} constant pool index itself,
the \texttt{outer\_class} index, the \texttt{name} index of the inner
class' name and the inner class' flags bitmask. All these values are
read in \texttt{u2} chunks. The constant pool indexes are used with
the

\begin{verbatim}
        voidptr innerclass_getconstant(classinfo *c, u4 pos, u4 ctype);
\end{verbatim}

function call to resolve the classes or UTF8 strings.

The other attribute, \texttt{SourceFile}, is just one \texttt{u2}
constant pool index value to get the reference of the class'
\texttt{SourceFile} name.

Both attributes must occur only once. Attributes other than these two
are skipped with the earlier mentioned \texttt{skipattributebody}
function.

After the attribute loading is done and no error occurred, the
\texttt{class\_load\_intern} function returns the \texttt{classinfo}
pointer to signal that there was no problem. If \texttt{NULL} is
returned, there was an exception.


\section{Dynamic class loader}


\section{Eager - lazy class loading}


\section{Initialization}