\documentclass[twocolumn,a4paper]{article} % -*- latex -*- \author{ Mark Probst\\ {\tt http://www.unix.cslab.tuwien.ac.at/\char'176schani/} } \title{Threads in Cacao} \newcommand{\globvarslabel}[1]{\mbox{\texttt{#1}}\hfil} \newenvironment{globvars} {\begin{list}{}% {\renewcommand{\makelabel}{\globvarslabel}% }% }% {\end{list}} \newcommand{\structdesclabel}[1]{\mbox{\texttt{#1}}\hfil} \newenvironment{structdesc} {\begin{list}{}% {\renewcommand{\makelabel}{\structdesclabel}% }% }% {\end{list}} \begin{document} \maketitle \section{Overview} \label{sec:overview} To fulfill its role as a Java runtime environment, Cacao has to provide support for threads. A thread in Java is what is generally understood as a 'thread' in computer science: a context of execution within a process that shares resources (virtual memory, file descriptors, \dots ) with all other threads in the same process. The Java Virtual Machine Specification does not specify how threads should be implemented, so we had two alternatives to choose from: \begin{enumerate} \item Implementing Java threads using the corresponding thread implementation of the underlying operating system. \item Using just one operating system thread and implementing Java threads by doing everything 'by hand'. \end{enumerate} A third alternative, namely using processes to emulate threads, is not viable, because the sharing of all resources between processes (at least UNIX processes) is almost impossible. Both alternatives have their own advantages and drawbacks. Using operating system threads avoids implementing all the gory details of threads and gives scheduling control to the operating system, meaning that it is not only preemptive but also multi-processor-aware. Rolling it our own, on the other hand, does not waste operating system resources (imagine a Java application that spawns over a hundred threads!) and gives us much more control over thread-switching, which makes it a lot easier to avoid synchronization problems. Furthermore, it makes it possible to port Cacao to platforms which do not support threads at the operating system level. Since there already was a user-level thread implementation for a Java virtual machine available as free software, we chose to build upon it, thereby avoiding all the nasty coding details, without giving up the advantages of user-level threads. We did, however, leave open the possibility of replacing the user-level threads with an operating system implementation. As outlined in the previous paragraph, the threading code of Cacao is based on the code of Tim Wilkinson's Kaffe (version 0.7), which has been released under a BSD-style license. We had to make a few adjustments to the code, mainly making it '64-bit-aware'. \section{Main Parts of the System} \label{sec:mainparts} The bulk of the threading system is located in the \texttt{threads/} directory of the Cacao distribution. \texttt{threads/thread.c} contains the thread management code (starting, suspending, killing, \dots ), \texttt{threads/locks.c} contains the implementation of synchronization primitives, which are discussed in more detail in section~\ref{sec:synchronization}. \texttt{threads/threadio.c} implements thread-aware I/O, which is detailed in section~\ref{sec:io}. The platform dependent parts of the system (which are pretty few) are to be found in the files \texttt{sysdep/threads.h} and \texttt{sysdep/asmpart.c}. They are discussed in section~\ref{sec:threadswitching}. There are a few other locations where changes had to be made for threading support which are described in sections~\ref{sec:criticalsections} and~\ref{sec:gc}. \section{Data Structures} \label{sec:datastructures} The \texttt{java.lang.Thread} class, as defined in Sunsoft's Java Runtime Library, provides a private instance variable \texttt{PrivateInfo} of pointer type, which is reserved for the purposes of the runtime system. Cacao uses the integer value of \texttt{PrivateInfo} as an index into the array \texttt{contexts[]} (see section~\ref{sec:globalvars}), which contains context information (most notably priority and stack information) of all live threads: \begin{verbatim} typedef struct _ctx { bool free; u1 status; u1 priority; u1* restorePoint; u1* stackMem; u1* stackBase; u1* stackEnd; u1* usedStackTop; s8 time; java_objectheader *exceptionptr; struct _thread *nextlive; u1 flags; } ctx; \end{verbatim} \begin{structdesc} \item[free] Is \texttt{true} if this \texttt{ctx} is associated with a thread. \item[status] One of \texttt{THREAD\_SUSPENDED}, \texttt{THREAD\_RUNNING} and \texttt{THREAD\_DEAD}. \item[priority] The priority of the thread. Must be in the range \texttt{MIN\_PRIORITY} (1) to \texttt{MAX\_PRIORITY} (10). \item[restorePoint] Top of the stack of the corresponding thread. \item[stackMem] Begin of the region of memory allocated for the stack, including the guarding pages. \item[stackBase] Begin of the region of memory that can actually be used for the stack (i.e. the memory right after the guarding pages). \item[stackEnd] Pointer to the first byte of memory after the region allocated for the stack. \item[usedStackTop] Pointer to the actual top of the stack. Only valid if the corresponding thread is not running. \item[time] Not used yet. \item[exceptionptr] Saves the \texttt{exceptionptr} of the corresponding thread if it is not running. \item[nextlive] Maintains a link to the next thread in the \texttt{livethreads} list (see section~\ref{sec:globalvars}). \item[flags] Bitwise combination of several flags. If \texttt{THREAD\_FLAGS\_NOSTACKALLOC} is set, then the thread stack should not be dealloced upon thread death. \texttt{THREAD\_FLAGS\_USER\_SUSPEND} is set after the thread has been suspended by the user (i.e. the thread was not suspended by a blocking operation). \texttt{THREAD\_FLAGS\_KILLED} seems not to be used. \end{structdesc} \section{Scheduling} \label{sec:scheduling} Thread scheduling is performed by the routine \texttt{reschedule()}. It checks the lists \texttt{threadQhead[]}, beginning with the highest priority. If a thread is found, it is immediately switched to. This means that higher-priority threads always take precedence over lower-priority ones. A low-priority thread will only run if all higher-priority ones are suspended (e.g. through blocking). \section{Thread-Switching} \label{sec:threadswitching} Thread-switching is the only platform-dependent part of the Cacao threading system. Currently, only an Alpha implementation is available. The thread-switch is a function (written in assembly language), which does nothing more than to push all callee-saved registers to the stack, switch to the stack of the new thread, pop all callee-saved registers (from the switched-to stack) and return. The only problem is the setting up of a new thread stack, which needs to contain values for the callee-saved registers and the address of the thread-entry routine (\texttt{firstStartThread()}) as the return address of the function call. See the files \texttt{alpha/threads.h} and \texttt{alpha/asmpart.c} for how this is managed in the Alpha implementation. \section{Critical Sections} \label{sec:criticalsections} Although the threading system is in its current implementation not preemptive, this is planned. It will be utilizing alarm timer signals to preempt the running thread at certain points in time. In order to write reliable code in a preemptive environment, it is necessary to make it possible to prohibit preemption during certain critical sections. For this purpose two macros (\texttt{intsDisable()} and \texttt{intsRestore()}) are defined that can be used to enter and to exit critical sections which are guaranteed not to be preempted. Critical sections can be nested. Apart from the code for mutexes and conditions (see section~\ref{sec:synchronization}) the whole just-in-time compiler is not to be preempted since it is not only not reentrant but does also alter tables that may crash other threads if not in a consistent state. This is one of the main problems that must be solved when attempting to use operating-system threads in Cacao. The other critical part outside of the threading system is the memory management. Allocation on the heap is regarded as a critical section as well as garbage collection which is further described in section~\ref{sec:gc}. \section{Thread Stacks} \label{sec:threadstacks} UNIX provides for each process one stack that dynamically grows when its top is reached. Since there can be more than one thread running in a Cacao process, there must be a way of providing stacks for the other threads. Cacao does this by simply allocating a region of memory and using it as the stack for a thread. This stack has, unfortunately, not the capability of growing dynamically. Worse yet, since it is allocated on the heap, there is not even a way of checking the stack for overflow (except from checking upon each entry to a subroutine, which is too much a penalty). The solution to the latter problem is the allocation of additional guarding pages above the top of the stack (i.e. before the stack). These pages are then (by virtue of the \texttt{mprotect(2)} system call) made inaccessible to the process, resulting in a \texttt{SEGV} signal when the stack has over-flown. The size of the thread stacks can be altered with the \texttt{-tss} command line option. The default size is 32k. The number of guarding pages is by default 1 and can be changed with the \texttt{-ngp} command line option (not yet implemented). \section{Mutexes and Conditions} \label{sec:synchronization} In Java, each and every object has a mutex (used for the \texttt{synchronized} directive) and a condition variable (used by the \texttt{wait()}, \texttt{notify()} and \texttt{notifyAll()} methods) attached. Since we did not want to include these two data structures in each object, as they take up rather a lot of space on a 64-bit architecture, we decided to create them on demand and keep them in a hash table that is indexed by the address of the object the mutex/condition belongs to. This way, we are able to allocate these structures on-demand: an object's mutex is allocated not upon object creation but when the mutex is locked. When the mutex is unlocked, it is freed again. For detailed information on this topic, see the paper `Monitors and Exceptions: How to efficiently implement Java'. Following are the structures of mutexes and conditions: \begin{verbatim} typedef struct _iMux { struct _thread* holder; int count; struct _thread* muxWaiters; } iMux; \end{verbatim} \begin{structdesc} \item[holder] Points to the thread that has currently locked the mutex. If the mutex is not locked, \texttt{holder} is \texttt{0}. \item[count] How often the mutex is locked by the current thread. It is incremented each time the mutex is locked and decremented when the mutex is unlocked. \texttt{0} if the mutex is unlocked. \item[muxWaiters] List of threads that are waiting on the mutex to become unlocked. \end{structdesc} \begin{verbatim} typedef struct _iCv { struct _thread* cvWaiters; struct _iMux* mux; } iCv; \end{verbatim} \begin{structdesc} \item[cvWaiters] List of threads waiting on the condition to be signalled. \item[mux] Mutex which was used to wait on the condition. \end{structdesc} \section{I/O} \label{sec:io} UNIX I/O is by default blocking, which is suboptimal for user-level threads, since a blocking I/O call would not only block its own thread but the whole process. For this purpose Cacao makes all file descriptors use non-blocking I/O and, whenever a thread wants to make an I/O call that would block, suspends this thread and does not resume it until the call would not require blocking. In order to check that condition, Cacao does a \texttt{select()} with all pending file descriptors (in function \texttt{checkEvents()}) whenever a thread switch is requested and resumes all threads that can continue. \section{Garbage Collection} \label{sec:gc} Garbage collection requires special attention in a threaded environment and more so in Cacao. Cacao uses a recursive mark-and-sweep garbage collector, which means that it is not reentrant and may use up a lot of stack. This means not only that it has to run without been preempted but also that it needs to run in a thread with a lot of stack available, which is not the case for a standard thread. The best solution is obviously to run it on the stack of the main thread which grows automatically. This is exactly what Cacao does. Since the stack tops of all threads are known at any time, it is a trivial task to switch over to another stack. This is handled by the system dependent function \texttt{asm\_switchstackandcall} (defined in \texttt{sysdep/asmpart.c}). Another problem that needs to be taken care of is that all objects that are referenced by any live thread need to be marked. Furthermore, \texttt{Thread} objects that correspond to live threads must also be marked, even if they are not referenced by any thread (which is possible). To serve both purposes the list \texttt{liveThreads} contains all live threads. \section{Global Variables} \label{sec:globalvars} \begin{globvars} \item[int blockInts] Indicates whether execution is within a critical section. \texttt{0} if not, otherwise \texttt{>0}, depending on the nesting level. \item[bool needReschedule] Is \texttt{true} if a thread switch should occur upon exit of the currently executed critical section, otherwise \texttt{false}. \item[thread *currentThread] The currently executed thread. \item[ctx contexts[\texttt{]}] Array containing \texttt{MAXTHREADS} thread contexts. Elements with \texttt{free==false} are not occupied. The fact that this array has a fixed size means that there is maximum number of threads. This may change in the future. \item[thread *liveThreads] List of all live threads. The list is linked through the \texttt{nextlive} member of the corresponding contexts of the threads. \item[thread *threadQhead[\texttt{]}] Array of lists of not-suspended live threads. Indexed by thread priority. \end{globvars} \section{To Be Done} \begin{itemize} \item Cross-check threading code with newest release of Kaffe. \item Preemptive multi-threading through alarm-timers. \item Implement thread-table more efficiently (maintain free-list). \item Implement some missing command-line switches (\texttt{getopt} would be cool). \end{itemize} \end{document}