+\documentclass[11pt,a4paper]{article}\r
+\r
+\usepackage{gwa}\r
+\usepackage{graphicx}\r
+\usepackage{url}\r
+\usepackage{ngerman}\r
+\usepackage[latin1]{inputenc}\r
+\usepackage{textcomp}\r
+\usepackage{multicol}\r
+\r
+\begin{document}\r
+\r
+\title{Optimization of Array Bounds Checking in the Context of Cacao Java Virtual Machine}\r
+\r
+\author{\r
+ \authN{Bräutigam Klaus}\\\r
+ \matNr{0825965}\\\r
+ \authI{Institute of Computer Languages}\\\r
+ \authU{Technical University of Vienna}\\\r
+ \email{e0825965@student.tuwien.ac.at}\\\\\r
+ \authN{Bachelor Thesis BSc. in Informatics - Software and Information\r
+ Engineering 033 534 }\\\r
+ \authN{supervised by Univ.Prof. Dr. Andreas Krall}\r
+}\r
+\r
+\maketitle\r
+\r
+\begin{multicols}{2}\r
+\r
+%% Abstract -------------------------------------------------------------------\r
+\r
+\abstract{\r
+This document deals with array bounds check optimizations in a very general and historical\r
+way but also strongly relates to the details of implementation within the \r
+Cacao Java Virtual Machine that was developed by the Institute of Computer Languages\r
+of the Technical University of Vienna. Good and sustainable problem solutions \r
+can only be found if previous ideas are studied to improve own considerations\r
+and even more important if failures that happened in the past are not repeated.\r
+So the first main part is dedicated to a study of previous\r
+approaches concerning array bounds check optimization and their environments \r
+to obtain a solid knowledge base for the general problem. \r
+The second part describes details and relations important\r
+to understand the technique choosen for the Cacao Java VM. So the document should\r
+also be interesting for people only searching for a comprehensive overview of \r
+the abstract array bounds check optimization problem.\r
+\r
+}\r
+\r
+%% Introduction ---------------------------------------------------------------\r
+\r
+\section{Introduction}\r
+Array bounds checking and the optimizations necessary for efficient\r
+realization were exhaustively researched and discussed in the past four\r
+or five decades. The rise of high level computer programming languages led\r
+directly to several improvements of the background and environments of them.\r
+For example array accesses have always been common malfunction sources and\r
+security holes. If an array access index is not within the bounds of the array,\r
+memory space is accessed illegally. This can lead to dramatic problems.\r
+Therefore computer language architects recognized very fast that\r
+these array accesses must be validated before execution to increase the safety and\r
+stability of applications. So in most cases array access indices are compared \r
+to the array bounds before executing them. But this introduces some\r
+kind of overhead during compilation and or execution. Due to the fact that\r
+processed information amount increases much faster than our technological\r
+calculation possibilities any type of overhead must be reduced or completely\r
+removed from applications in any thinkable way. As will be seen in the\r
+following chapters the approaches and techniques for reducing overhead of\r
+array bounds checks can be very different. To completely understand the\r
+problem of array bounds checks optimization (short ABCs) all related aspects \r
+must be taken into account and all topics must also be seen from a more\r
+abstract view.\r
+\r
+%% General aspects of ABCs ----------------------------------------------------\r
+\r
+\section{A Generic Reflection of ABCs\label{sec:structure}}\r
+This chapter provides a solid overview for ABC optimizations. The\r
+most common forms of implementation are explained and the neccesity for\r
+improvements of ABC optimizations are motivated. Furthermore some background\r
+information will be provided to get a feeling for the relations of the topic.\r
+But it must be kept in mind that an exhaustive explanation of the topic is\r
+far out of the documents scope so we will only touch on the general ideas.\r
+\r
+ \subsection{Implementation of Array Bounds Checking}\r
+ Before designing new optmization techniques for ABCs it's important to know\r
+ exactly how these checks are implemented. There are many ways to realize\r
+ ABCs. Elaborateness and performance are important to keep in mind when\r
+ choosing a technique and also the insertion method is interesting because\r
+ it is greatly connected to the language architecture. Should tests\r
+ be integreated automatically without developers knowledge or should\r
+ ABCs only be integrated selectively by programmers or quality assurance.\r
+ \r
+ \subsubsection{Compare Instruction Strategy}\r
+ Most implementations are done using compare instructions of the\r
+ processor and a conditional jump to react to failed comparisons. For this\r
+ technique the array inception and length are obtained and compared to\r
+ the index. For a more detailed description and good examples please\r
+ consult\r
+ \cite{AComparisonOfArrayBoundsCheckingOnSuperscalarAndVLIWArchitectures:2002}.\r
+ For some languages like Java that assure that arrays only start\r
+ with index null only one check is necessary to check lower and \r
+ upper bounds using an unsigned compare instruction. This assures that\r
+ indices are always within the specified array bounds and that exceptions\r
+ or other reactions are done on the right point of program execution.\r
+\r
+ \subsubsection{Hardware Implementation}\r
+ It is also possible to implement ABCs directly and completely in\r
+ hardware. This means some more effort in processor architecture but\r
+ the resulting run-time overhead of ABC execution can be reduced \r
+ practically to 0.0\%. There have been several approaches from Sun\r
+ Microsystems, Inc. to design a new generation of processors that are\r
+ capable of this. \r
+ \cite{ProcessorWithAcceleratedArrayAcessBoundsChecking:2000}\r
+ for example describes a processor that uses two hardware comparators and\r
+ a disjunction to inform the instruction execution unit about the validness\r
+ of the array access instruction.\r
+ \cite{ApparatusAndMethodForArrayBoundsCheckingWithAShadowFile:2001} is an\r
+ even more improved way. In addition it uses a shadow register to further\r
+ boost and integrate ABCs into the general instruction flow of the processor.\r
+\r
+ \subsubsection{Other Approaches}\r
+ Other more exotic approaches of ABC implementation also exist.\r
+ An interesting one can be found in \r
+ \cite{ArrayBoundsCheckEliminationUtilizingAPageProtectionMechanism:2003}\r
+ where the processor page protection mechanism is used to identify ABCs and\r
+ even improve their effectiveness by eliminating unnecessary ones.\r
+ \r
+ \subsection{The Real Overhead Introduced by ABCs}\r
+ Before crying out for costly and time-consuming optimizations for a specific\r
+ mechanism we will analyze the overhead caused by this. Is it\r
+ really worth while studying and researching the topic? This is in many cases a difficult\r
+ question because future trends must also be considered to get a clear result.\r
+ Concerning ABCs it depends on the ABC implementation, the language\r
+ design and architecture and on the application itself.\r
+ \r
+ \subsubsection{Overhead Reduction through Hardware}\r
+ For decades information processing machines were mainly designed as single\r
+ processor systems that didn't really considered modern programming techniques\r
+ and patterns. Nearly all higher-level improvements that were reached in\r
+ computer programming were done in an abstract way using software\r
+ (virtual machines, multiprocessing, null pointer checks, ABCs\ldots). In the past\r
+ there were some approaches that tried to completely integrate ABCs in hardware, but\r
+ these were refused very fast. Nevertheless we can\r
+ view something in connection to ABCs by comparing benchmarks of the same\r
+ application on the same virtual machines running on different processor\r
+ generations. For example take overhead benchmark results from\r
+ \cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009}\r
+ and compare them to that of\r
+ \cite{SafeMultiphaseBondsCheckEliminationInJava:2010} and it's easy to\r
+ recognize that the same application on the same virtual machine with the same\r
+ configuration produces a different overhead. In this case the reason can be\r
+ found very fast. The first benchmark was calculated on a single processor\r
+ system where the second uses a multicore processor. This new processor\r
+ generation is able to the reduce the overhead of ABCs without changing the\r
+ technique or implementation of ABCs. So the question if ABC optimization pays\r
+ not only depends on the software but also on the hardware it is running on.\r
+ Newer processor generations will be able to furthermore reduce the overhead\r
+ raised by ABCs implemented as processor instructions but it is very unlikely\r
+ that the approaches of implementing ABCs completely in hardware will be\r
+ realized in the near future.\r
+ \r
+ \subsubsection{Overhead in Different Types of Applications}\r
+ To reduce overhead of ABCs the application where the optimization operates on\r
+ must use arrays frequently otherwise the overhead produced is insignificant\r
+ and not really measureable\cite{SafeMultiphaseBondsCheckEliminationInJava:2010}.\r
+ An important question is if these optimizations really pay for many\r
+ applications. For example Java is mainly used due to it's simple and clear\r
+ object oriented and abstract nature. Most developers use Java only in\r
+ conjunction with powerful frameworks like Spring and many of them only for\r
+ web-based applications. The resulting ABC overhead in these strong object\r
+ oriented applications is practically not measurable even if many arrays\r
+ are used. The overhead is mainly caused by nested method calls (invokes) which\r
+ are also very difficult to reduce. ABC optimizations only really pay on\r
+ applications with simple object relations that make excessive use of arrays\r
+ and matrix calculations like scientific research and development. But for\r
+ these cases it is maybe better and easier to use extremely optimized data\r
+ structures that are recognized by machines and executed in a special way.\r
+ \cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002}\r
+ mentions some libraries and also explain some new special data structures\r
+ that follow this approach.\r
+ \r
+ \subsubsection{Necessarity for ABC Optimization}\r
+ It's difficult to argue pro/contra to ABC optimization and the approach that\r
+ should be taken to implement it because this strongly depends on the language\r
+ architecture and many other factors discussed here. But in general it should\r
+ be the architects and implementors target to evolve the language to the most\r
+ effective, efficient and clear state. Even if hardware is capable of executing\r
+ ABCs without any resulting overhead it is better to eliminate unnecessary code\r
+ to improve low-level code quality cleanness and clearance. Program code at\r
+ any abstraction layer should not contain any superfluous instructions or code\r
+ parts even if they have no impact on execution time or security. Moreover some\r
+ slight processor architectures (embedded systems) maybe will never be capable\r
+ of implementing ABC in hardware!\r
+\r
+ \subsection{Important Relations to ABC Optimization}\r
+ This chapter explains some important basic conditions that should\r
+ be noticed before starting to think about ABC optimization techniques but\r
+ some of them are valid for nearly all kinds of optimizations. At first the array\r
+ access index types are important to be studied in detail to identify possible\r
+ limitations raised by them. The next important fact is that optimizations\r
+ only pay if their benefits outperform the generated overhead. It should\r
+ also be kept in mind that the elaborateness and form of implementation\r
+ dramatically influences the effectiveness of the optimization.\r
+ \r
+ \subsubsection{Array Access Index Types}\r
+ The array access index type must also be taken into consideration for\r
+ several reasons. Most computer programming languages (from now on referred\r
+ to as languages) of even all programming paradigms accept the type integer\r
+ as array index type. The word length of this type varies in most of the cases and\r
+ furthermore signs of integral numerical types can also be treated differently.\r
+ In C/C++ integer types exist as signed or unsigned and the length is not specified\r
+ and depends mainly on the instruction length of the underlying hardware. \r
+ On the other side Java declares it's integer type explicitly signed and\r
+ 32bit wide (range from -2.147.483.648 to 2.147.483.647 in 2's complement).\r
+ The Java language furthermore declares that overflowing or underflowing values\r
+ are wrapped around which is not desired for array access index types because\r
+ there are hardly reasonable applications that access arrays using overflowing index\r
+ expressions. \cite{JavaLanguageSpecificationv3.0:2005}\r
+ This source of malfunctions and security leaks must also be tackled precisely\r
+ while designing optimization mechanisms.\r
+ \r
+ \subsubsection{Benefits to Execution Time Ratio}\r
+ The ABC optimization benefits must always outperform the execution\r
+ time necessary to calculate them. It is not acceptable that an optimization is\r
+ implemented that saves execution time for some applications and slows others\r
+ down. There is nothing to gain.\r
+ \cite{AComprehensiveApproachToArrayBoundsCheckEliminationForJava:2002}\r
+ for example shows that activated ABC optimization together with others\r
+ slows down some applications of the used benchmark. This is extremely important\r
+ to consider for optimization techniques that work during Just-In-Time\r
+ compilation and execution where expensive, elaborate and in the first\r
+ instance disturbing processes must be omitted as far as possible. As\r
+ mentioned in the next chapter this is maybe best accomplished by splitting\r
+ optimization techniques into different phases to locate them where they are\r
+ unproblematic to run.\r
+\r
+ \subsubsection{Intermediate Representation}\r
+ Which intermediate representation should the optimization operate on? This\r
+ has to be considered too. Most compilers fulfill their\r
+ compilation tasks in several phases and for this reason different intermediate\r
+ representations of the programs form. These depend completely on the compiler\r
+ architecture and the language itself especially on the programming paradigm \r
+ \cite{ModernCompilerImplementationInC:1998}.\r
+ But not only during the compilation process do new program representations\r
+ arise. For interpreted languages some kind of IR is necessary to transport\r
+ code between compiler and interpreter. For example the Java Byte Code is such\r
+ an IR that is delivered to the Java Virtual Machine and executed. If the\r
+ interpreter wants to apply some optimizations to the code it is forced to\r
+ operate on the intermediate representation delivered to him or to do an\r
+ extremely expensive transformation into an other desired form which slows down\r
+ execution even more. Java Byte Code for example is not very adequate to serve\r
+ as IR because it suffers from many problems concerning size, security\ldots \r
+ and is not really practical for exhaustive optimizations. Finally it should\r
+ not be forgotten that also high level code (Java, C++) can be a very\r
+ suitable representation for obtaining some optimization constraints!\par\r
+ \r
+ A lot of effort has been spent in using Java Byte Code annotations for\r
+ integrating optimization information into conventional class files that are\r
+ ignored by most compilers and allowing some to boost execution performance.\r
+ But these approaches have also a lot of disadvantages because the annotations\r
+ are abused, the usage is not standardized, further increased size and process\r
+ amount\ldots.\r
+ \cite{AFrameworkForOptimizingJavaUsingAttributes:2000} describes an example\r
+ framework that follows the previously discussed one. Also\r
+ \cite{VerifiableAnnotationsForEmbeddedJavaEnvironments:2005} can be an\r
+ interesting source for the beginning because it focuses on especially on \r
+ three important optimizations and on low cost checks. Furthermore\r
+ \cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}\r
+ and \cite{SafeBoundsCheckAnnotations:2008} deal with annotations specifically\r
+ for ABC optimizations. In the historic review many more approaches will be\r
+ mentioned that are based on this construct.\par\r
+ \r
+ Another way to deal with the problem of IR is to use a completely new approach\r
+ and in the past several forms based on Static Single Assignment (SSA) where\r
+ researched and proven to be the commonly used ones in the future. The form is based on\r
+ the constraint that every variable can have only one point of assignment and\r
+ not more or less. But many more constraints and ideas were integrated into\r
+ different modern SSA forms that were adapted and optimized for several\r
+ reasons and techniques. For getting a general idea of SSA forms consult\r
+ \cite{AVerifiableSSAProgramRepresentationForAgressiveCompilerOptimization:2006}.\r
+ \cite{SafeTSAATypeSafeAndReferentiallySecureMobileCodeRepresentationBasedOnStaticSingleAssignmentForm:2001}\r
+ explains SafeTSA which is a type safe and size optimized SSA form and used in\r
+ \cite{UsingTheSafeTSARepresentationToBoostThePerformanceOfAnExistingJavaVirtualMachine:2002}\r
+ to optimize execution performance of a Java Virtual Machine. Another approach\r
+ is covered by\r
+ \cite{AVerifiableSSAProgramRepresentationForAgressiveCompilerOptimization:2006}\r
+ that focuses on clean and efficient embedding of range checks into other aggressive\r
+ optimization phases.\r
+ \cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000} uses a new representation\r
+ by introducing new constraints for dealing efficiently with program branches\r
+ (if-else, while\ldots) called Extended SSA form (eSSA).\r
+ Another SSA form named HSSA is introduced in conjunction with a new\r
+ optimization algorithm ABCE in\r
+ \cite{ArrayBoundsCheckEliminationForJavaBasedOnSparseRepresentation:2009}. \r
+ Finally\r
+ \cite{TowardsArrayBoundCheckEliminationInJavaTMVirtualMachineLanguage:1999}\r
+ even introduces a new Java Byte Code representation called JVMLa that is also\r
+ suitable for several optimization tasks. Nearly all of these systems are not\r
+ commonly used and are therefore not standardized but show how future\r
+ intermediate representations could be designed or influenced by different\r
+ ideas.\r
+\r
+ \subsubsection{Positioning of Optimizations}\r
+ Being generally aware of all previously discussed circumstances another very\r
+ important aspect must be recognized regarding positioning of optimizations.\r
+ The main differentiation can be seen between compiled and interpreted\r
+ languages.\par\r
+ \r
+ Compiled languages are restricted to optimizations that can be calculated at\r
+ compile time (mainly static optimizations). This imposes really big\r
+ limitations on the possible and suitable techniques but expensive\r
+ computations do not therefore really mean big problems. High compile\r
+ time is widely accepted with special optimization level\r
+ parameters (for example -O3 for most C compilers). For development and\r
+ debugging the optimizations are not really necessary so they are omitted and\r
+ for deployment one optimized version is created where expensive and long\r
+ running optimizations are not seen problematic.\par\r
+ \r
+ For interpreted language architectures new aspects must be taken into account.\r
+ On the one hand the possibility for powerful highly dynamic optimizations\r
+ arise that use information only available at run time but on the other hand\r
+ it is unacceptable to compute expensive optimizations because it would slow\r
+ down execution dramatically. So the most successful approach would probably be\r
+ to split up dynamic optimizations to compile- and run time parts.\r
+ Expensive precalculations are delegated to the compiler and fast and effective\r
+ optimizations that use information obtained at compile- and run time are\r
+ calculated by the interpreter (virtual machine). This approach was\r
+ exhaustively researched by most recently published ABC optimization\r
+ techniques. At compile time some constraints and equalities/unequalities are\r
+ infered from high level code and intermediate representations through\r
+ theorem proving and even speculation and special optimizing virtual machines\r
+ interpret these constraints and annotations to calculate very effective\r
+ optimizations efficiently.\r
+ \cite{SafeMultiphaseBondsCheckEliminationInJava:2010} and all previous\r
+ versions of the same authors\r
+ \cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009},\r
+ \cite{SpeculativeImprovementsToVerfifiableBoundsCheckElimination:2008}\ldots \r
+ are recommended for understanding the complete idea. Also most previously\r
+ mentioned approaches that use annotations in any form (Soot framework)\r
+ basically follow it.\par\r
+ \r
+ One must also consider that optimizations must fit into the compilation and\r
+ interpretation process. Optimizations should not be an isolated process that\r
+ is quickly added to the compiler/interpreter architecture at the end of the\r
+ design process. Moreover optimizations and related techniques and\r
+ especially used intermediate representations should be taken into\r
+ account very early in language architecture design to get clean, clear,\r
+ effective and efficient compilation and interpretation processes overall \r
+ \cite{ModernCompilerImplementationInC:1998}.\r
+ Nearly all mechanisms researched in the context of Java and the Java Virtual\r
+ Machines suffer from the drawback that many optimizations are added to the\r
+ constructs too late and are therefore not really suitable for the complete\r
+ systems.\r
+ \r
+ \subsection{Types of ABCs}\r
+ ABCs can always be found before array access instructions. In Java the\r
+ instructions are provided for every primitive type as xaload and xastore where\r
+ x represents the array type (integer, float, signed byte, character\ldots).\r
+ But not only the array access instructions itself are important for ABCs but\r
+ also the context were they appear. Therefore some placements of\r
+ ABCs are of special significance. These are now described a little bit in\r
+ detail to understand which forms of interest can arise. The first three\r
+ classify if ABCs can be proven statically, dynamic or never as really\r
+ necessary during execution for keeping to the Java conventions. The second\r
+ three deal with the code structure where these ABCs occure. A good explanation\r
+ concerning raw classification of ABCs in general can be found in \r
+ \cite{AFreshLookAtOptimizingArrayBoundChecking:1990} and more recently by the\r
+ same author in\r
+ \cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}.\r
+ \r
+ \subsubsection{Redundant ABCs}\r
+ Some ABCs that are inserted by primitive insertion algorithms are completely\r
+ redundant and can be statically proven unnecessary at all. For example three\r
+ sequential array accesses a[i] and a[i+1] and a[i+2] lead to three inserted\r
+ ABCs but the first two of them are completely redundant because the third\r
+ dominates them. The overall number of these checks is very low and the\r
+ execution time amount occupied by them even lower. So there\r
+ are hardly reasonable optimizations possible through elimination of them but\r
+ they are very easy and cheap to compute and remove so it should be done for\r
+ reasons of cleaness and completeness. These are only important to be\r
+ removed if they occur within loops.\r
+ \r
+ \subsubsection{Partial Redundant ABCs}\r
+ Some ABCs are related through different program branches that arise because of\r
+ if-else or while statements that lead to ABCs that are partially redundant.\r
+ Sometimes they can be simplified or partially eliminated. These ABCs also arise\r
+ relatively often in programs and therefore optimization through\r
+ elimination or code motion can really pay. For a detailed explanation\r
+ on this topic please consult the good considerations in\r
+ \cite{GlobalOptimizationBySuppressionOfPartialRedundancies:1979}.\r
+ \r
+ \subsubsection{Necessary ABCs}\r
+ There are ABCs in existence that exactly represent what they are invented for.\r
+ They are placed in front of array access instructions that can't be proven to\r
+ be valid so there are ABCs that can not be eliminated but sometimes simplified\r
+ or moved using code motion procedures. A good and simple example for this are\r
+ array access indices read from command line or from an external data source\r
+ like files or a database!\r
+ They occur in dependence on the application from seldom to very often. Whether an ABC\r
+ can be proven superfluous or necessary depends stronly on the power of the\r
+ used optimization technique. Sometimes array accesses that deal with\r
+ indirection or interprocedural array access program structures can't be proven\r
+ unnecessary although they are. This will also be explained in the chapter about global ABCs.\r
+ Advanced techniques that are capable of proving these exotic kinds of\r
+ redundant or partial redundant ABCs can sometimes eliminate or simplify big\r
+ amounts of elaborate and unnecessary ABCs.\r
+ \r
+ \subsubsection{Local ABCs}\r
+ Local ABCs are found in straight program structures without branches, loops\r
+ and other conditional statements. These code parts are executed\r
+ sequentially without possible interruptions and are generally very simple and\r
+ short. Local ABCs are in most cases redundant and can be completely eliminated\r
+ but don't have much impact on program execution time.\r
+ \r
+ \subsubsection{Global ABCs}\r
+ Global ABCs are much more difficult. We must at first distinguish between\r
+ interprocedural and intraprocedural global ABCs. If several method invokes are\r
+ integrated into one array access it is hard to automatically analyze the\r
+ structure and obtain constraints especially in languages like Java with\r
+ dynamic class loading. These are called interprocedural. Only very few\r
+ techniques have the power to prove such elaborate ABCs unnecessary. If\r
+ ABCs are only analyzed in the context of one method without\r
+ additional method invokes they are called intraprocedural. Also array accesses\r
+ where indices are again placed in arrays (indirection) can be very challenging\r
+ to prove redudant. A technique that pays special attention to ABCs in the\r
+ presence of indirection is\r
+ \cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002} that\r
+ gives a solid insight into the difficulties raised by this topic. Due to the\r
+ frequent occurence in combination with matrix calculations and other costly\r
+ procedures optimizations of global ABCs can reduce execution time of\r
+ programs.\r
+ \r
+ \subsubsection{ABCs in Loops}\r
+ ABCs that are placed in loops deserve special attention because here\r
+ the biggest runtime overheads of all possible ABC types arise. If unnecessary ABCs\r
+ are placed in loops they are often repeated and result in huge\r
+ fractions of execution time. Sometimes Local and Global ABCs are found in\r
+ loops and can be eliminated or simplified and sometimes ABCs can even be moved\r
+ out of the loop using code motion techniques. This allows dramatic optimizations.\r
+ One problem that arises is that some optimizations change the program behavior\r
+ and especially the exception throw conventions of the Java language can\r
+ introduce a big additional effort. All this must be taken into account. ABC loop\r
+ optimization is so important and powerful that some techniques reach good\r
+ improvements only by analyzing some common loop structures/patterns. For\r
+ example\r
+ \cite{ArrayBoundsCheckEliminationInTheContextOfDeoptimization:2009} describes\r
+ the mechanisms implemented in the HotSpot Java Virtual Machine that is based\r
+ on this idea.\r
+\r
+%% Historic review of ABCs ----------------------------------------------------\r
+\r
+\section{A Historic Review of ABCs and Related Optimization}\r
+The presentation of history concerning ABC optimizations will be done in a\r
+chronological order to underline the whole evolution. To get a quick insight into\r
+the different documents some kind of classification will always be provided that\r
+keeps as far as possible to parameters explained in the past chapter so that it\r
+should be possible to quickly understand the raw idea described.\r
+\r
+ \subsection{The Early Beginning}\r
+ The first approaches for dealing with ABC optimization all operate on high level\r
+ code representation (no additional intermediate representation necessary)and\r
+ use data flow graphs to represent program parts in an abstract manner. The real\r
+ ideas of ABCs and optimizations in this direction can't be explicitly\r
+ attributed to one person.\par\r
+ \r
+ In the authors opinion William H. Harrison was the\r
+ first that directly covered the problem. In May 1977 he describes in \r
+ \cite{CompilerAnalysisOfTheValueRangesForVariables:1977} the more\r
+ general problem of value range analysis of variables. Array bound checks are a\r
+ subproblem of this because array indices are only numerical variables that must\r
+ keep to value ranges (0..arraylength). He introduced in his paper an algorithm\r
+ called Range Propagation that obtains variable value ranges starting\r
+ with [-Inf:+Inf] by analyzing the related programing structures for each\r
+ variable [0..100] for example. Another algorithm Range Analysis even\r
+ covers the problem of induction with variables and therefore mainly concerns\r
+ loops. The algorithm builds through derivations linear equations that are then\r
+ solved based on a data flow analysis. He also introduced the idea of loop\r
+ counting, general program diagnostic and program termination that can identify\r
+ several mistakes and malfunctions to the programmer. This can be seen as the\r
+ real origin of the idea how to insert and optimize ABCs effectively. His\r
+ algorithms are inserted into the conventional compilation process and he even\r
+ considered overall execution time of compilation in his algorithms.\par \r
+ \r
+ In 1977 nearly at the same time Norihisa Suzuki and Kiyoshi Ishihata composed\r
+ \cite{ImplementationOfAnArrayBoundChecker:1977} where they explicitly insert\r
+ ABCs in high level code in the form of assert statements in front of\r
+ array accesses to provide comfortable error information. This is done by\r
+ deducing formulas from the program and using a theorem prover to decide if an\r
+ ABC (assertion) is necessary for an access instruction or not. Because only static\r
+ analysis is done no real powerful optimizations can be applied. Especially the\r
+ intraprocedural form limits the general success but most local and even some\r
+ global ABCs are tackled. All things considered this paper is the first really\r
+ modern approach for dealing with theorem proving of ABCs and definitly a good\r
+ starting point for studying the topic in depth.\par\r
+ \r
+ In 1979 E. Morel and C. Renvoise published\r
+ \cite{GlobalOptimizationBySuppressionOfPartialRedundancies:1979} that\r
+ introduced the idea of formal consideration of partial redundancies that are\r
+ also necessary for more elaborte analyzing of Global ABCs. For really\r
+ understanding partial redundant ABCs this paper is a must.\par\r
+\r
+ In 1982 Victoria Markstein, John Cocke and Peter Markstein presented a new\r
+ approach for optimizing ABCs in \r
+ \cite{OptimizationOfRangeChecking:1982}. For the first time the idea is\r
+ documented that code parts with high execution frequencies (loops) deserve\r
+ special attention (analysis effort). They introduced usage of subexpression\r
+ elimination and code motion for optimizing ABCs. The technique operates on the\r
+ code by deriving Strongly Connected Regions SCRs using USE and DEF\r
+ functions. These were also used previously by\r
+ \cite{CompilerAnalysisOfTheValueRangesForVariables:1977} albeit the\r
+ explanations were much more complicated and detailed. The technique is based on\r
+ data flow analysis and meant to be integrated into a multiphase compiler after\r
+ all previous optimizations were accomplished. The same authors even obtained an\r
+ USPatent in 1987 for their ABC optimization technique\r
+ \cite{OptimizationOfRangeChecking:1987}.\par\r
+ \r
+ In 1990 Rajiv Gupta published results of his researches on ABC\r
+ optimization in \r
+ \cite{AFreshLookAtOptimizingArrayBoundChecking:1990}. He used a\r
+ so called Minimal Control Flow Graph MCFG to implement some new\r
+ revolutionary approaches like elimination of redudant ABCs, combination of\r
+ partial redundant ABCs and propagation of ABCs out of loops (code motion). It\r
+ exactly describes how the MCFG is built from source code and also the\r
+ algorithms to perform elimination, propagation and combination of ABCs. This\r
+ gives very precise guidance for first experiments with ABC optimization\r
+ algorithms and abstract representations.\r
+ \r
+ \r
+\r
+ In 1992 Jonathan M. Asuru presented in\r
+ \cite{OptimizationOfArraySubscriptRangeChecks:1992} an approach that started dealing\r
+ more precisely with Global ABCs that definitely allow more important\r
+ optimizations. He claimed that previous techniques where not able to deal with\r
+ elaborating loops (nested loops, nonconstant induction variables\ldots) and he\r
+ was right. He described two new mechanisms that were able to deal with much\r
+ more complicated program constructs than all others before.\r
+ \begin{description}\r
+ \item[Inner-Loop Guard Elimination ILGE] deals generally with conventional\r
+ known local and global ABC types.\r
+\r
+ \item[Conservative Expression Substitution CES] deals with nested loops and\r
+ partial order for related range checks. Also important with CES is Local Range\r
+ Check Optimization where the greatest lower bound and least upper bounds are\r
+ used to eliminate range checks that are redundant or partially redundant.\r
+ Furthermore Loop Relative Global Range Check Optimization builds up a system\r
+ of structures to improve nested loops checking and finally Range Check\r
+ Propagation uses range check motion to move ABCs to more efficient program\r
+ places.\r
+ \end{description} \par\r
+\r
+ In 1993 Rajiv Gupta again published his new insights into the topic in\r
+ \cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}. Many refinements to\r
+ his previous ideas were added and the knowledge about the general problem of ABC\r
+ optimization increased. He assumed that readers are familiar with his last work \r
+ \cite{AFreshLookAtOptimizingArrayBoundChecking:1990}.\r
+ \begin{description}\r
+ \item[Local Optimization] deals with Identical Checks, Subsumed Checks with\r
+ Identical Bounds and Subsumed Checks with Identical Subscript Expressions\r
+\r
+ \item[Global Optimizations] are implemented using at first an algorithm for\r
+ modifying checks to create Redundant Checks and secondly an algorithm for the\r
+ reelimination of Redundant Checks. This important idea of first inserting new\r
+ checks and eliminating them is brilliant and will be used by further\r
+ techniques in the future. It allows the identification and resolution of difficult\r
+ redundancies.\r
+\r
+ \item[Propagation of Checks Out of Loops] is done using a refined algorithm\r
+ for propagation.\r
+ \end{description} \par\r
+\r
+ \subsection{The Chaotic Middle Ages}\r
+ Until 1995 (ca. the first two decades) only a very small group of visioners\r
+ researched ABCs and related optimization techniques. All these approaches\r
+ operated for performance and technological reasons directly on high level\r
+ program code like J. Welsh for Pascal\r
+ \cite{EconomicRangeChecksInPascal:1978} and some others for C and Fortran.\r
+ All things considered all previous approaches were based on the same general\r
+ idea of building flow graphs and applying data flow analysis. But since 1995\r
+ many scientists started researching ABC optimization and therefore a chaotic\r
+ amount of new techniques and ideas appeared. From 1995 on most approaches are\r
+ based on SSA program intermediate representations and algorithms suitable for\r
+ processing optimizations on it but also some exotic and completely new ideas\r
+ were invented and researched. \par\r
+ \r
+ In 1995 Priyadarshan Kolte and Michael Wolfe published\r
+ \cite{EliminationOfRedundantArraySubscriptRangeChecks:1995} where they explain\r
+ their ABC optimization techniques in the Nascent Fortran compiler. They use an\r
+ SSA intermediate form for induction analysis and furthermore a Check\r
+ Implication Graph CIG for global ABCs. They tried different approaches but also\r
+ mentioned that a combination of them would deliver the best results. Also some\r
+ new and more detailed information about Partial Redundant Elimination PRE is\r
+ provided that directly covers optimization of partial redundant ABCs.\r
+ \begin{description}\r
+ \item[No-Insertion NI] means optimization with check elimination without adding\r
+ any new checks that are simpler.\r
+ \item[Safe-Earliest SE] achieves a little bit more eliminations than LNI and NI\r
+ but not much and is the first of two PRE placement checks.\r
+ \item[Latest-Not-Isolated LNI] is only a second PRE placement check.\r
+ \item[Check Strengthening CS] is already described in\r
+ \cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}.\r
+ \item[Preheader-Insertion LI] with only loop invariant checks.\r
+ \item[Preheader-Insertion Loop Limit Substitution LLS] is the clear winner if\r
+ compile time and elimination number are taken into account.\r
+ \end{description} \par\r
+ This paper proves in the end that optimized check insertion techniques are much\r
+ better than only elimination techniques. \par\r
+ \r
+ In 1998 Hongwei Xi and Frank Pfenning dealt in \r
+ \cite{EliminatingArrayBoundCheckingThroughDependentTypes:1998} with ABC\r
+ optimization in ML. So this time ABC optimization also gets very interesting\r
+ for other programming paradigms like functional programming.\par\r
+ But also S. P. Midkiff and J. E. Moreira and M. Snir published a new work\r
+ concerning this topic. \r
+ \cite{OptimizingArrayReferenceCheckingInJavaPrograms:1998} is a\r
+ very voluminous paper that directly covers the problem of ABC optimization for\r
+ the Java language. At the beginning a very formal theory about the topic is\r
+ presented and then all optimization techniques are explained in detail and\r
+ always presented with good examples on high level Java source code. This work\r
+ gives really comprehensive insight into the topic in relation to the Java\r
+ language. \par\r
+ \r
+ In 1999 the paper\r
+ \cite{TowardsArrayBoundCheckEliminationInJavaTMVirtualMachineLanguage:1999}\r
+ from Xi and Xia introduced a new idea. They did not agree with the standard\r
+ Java Virtual Machine Language JVML (Java Byte Code) and wanted to cope with\r
+ ABCs through a new system of dependent types to improve safeness of the\r
+ language and allow at the same time successful optimizations. So they\r
+ architectured a new intermediate code representation called JVMLa that should\r
+ be created by the Java Compiler and that should only contain ABCs that can't be\r
+ proven unnecessary at compile-time. This approach is questionable because the\r
+ Java Compiler output (JVML) is standardized and wide spread and a very\r
+ complicated ABC optimization part (at compile time) is assumed to function\r
+ already properly in the Java Compiler. This is very inpractical but a theoretic\r
+ idea that may by taken into account by other approaches or languages in the\r
+ future. \par\r
+ \r
+ In 2000 the previously mentioned USPatent\r
+ \cite{ProcessorWithAcceleratedArrayAcessBoundsChecking:2000} was registered and\r
+ Bodik, Gupta and Sarkar presented a new algorithm for optimized ABC execution in\r
+ \cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000}. This algorithm is an\r
+ exotic one because it uses a transformation process at run time that converts\r
+ from Java Byte Code to a slightly extended SSA form (eSSA) which is normaly a\r
+ very expensive procedure and applies a constraint analyzer and solver on it that\r
+ really achieves performance gain. The used constraint analyzer adds\r
+ constraints to the eSSA instructions and build up an inequality graph IG\r
+ where only a shortest path finder (constraint solver) is applied to optimize\r
+ ABCs. This is very impressive because the whole approach is very costly but it\r
+ really pays like depicted in the results. Array Bound Check elimination on\r
+ Demand (ABCD) was really able to improve execution time of some applications in\r
+ the used benchmark. \par\r
+ \r
+ In 2001, one year after the last USPatent concerning hardware implementation of\r
+ ABCs,\r
+ \cite{ApparatusAndMethodForArrayBoundsCheckingWithAShadowFile:2001} was\r
+ registered as a new idea. Also Chin, Khoo and Xu published in \r
+ \cite{DerivingPreconditionsForArrayBoundCheckElimination:2001} some further\r
+ ideas how to derive preconditions for ABCs. This very formal approach and\r
+ documentation gives some more detailed insights into the problem of program\r
+ code analysis for optimizing ABCs. \par\r
+ \r
+ In 2002 Bentley, Watterson and Lowenthal compared execution of ABC execution on\r
+ Superscalar and VLIW hardware architecture in\r
+ \cite{AComparisonOfArrayBoundsCheckingOnSuperscalarAndVLIWArchitectures:2002}\r
+ which gives good explanations concerning implementation of ABCs using processor\r
+ compare instructions. \par\r
+\r
+ Qian, Hendren and Verbrugge present three algorithms\r
+ they implemented using the Soot framework previously mentioned in IBMs HPCJ in\r
+ \cite{AComprehensiveApproachToArrayBoundsCheckEliminationForJava:2002}. The\r
+ first algorithm Variable Constraint Analysis (VCA) works on a simplified\r
+ directed weighted graph that reflects the relations and behavior of relevant\r
+ variables (array access indices). Array Field Analysis AFA and Rectangular\r
+ Array Analysis deal with special forms of array structures that are hard to\r
+ analyze but can in some circiumstances really allow good optimizations. \par\r
+ \r
+ Gurd and Freeman et al. presented a further paper \r
+ \cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002} that\r
+ exactly captures the problem of ABCs in the presence of indirection what\r
+ simply means that array access indices are stored in an other array.\r
+ Optimizations in this scope are not easy and require a new technique.\r
+ The solution is not embedded in the Java Compiler or VM itself\r
+ but implemented directly as Java classes that must be used by programmers.\r
+ Three classes with different approaches are presented but only one of them\r
+ seems really useable.\r
+ \begin{description}\r
+ \item[ImmutableIntMultiarray:] The array is checked for invalid bounds and is\r
+ not allowed to be modified during index access operations!\r
+ \item[MutableImmutableStateIntMultiarray]: The array can be in two states, in\r
+ the first (mutable) bounds are checked and in the second (immutable) no checks\r
+ occur.\r
+ \item[ValueBoundedIntMultiarray:] This array forces that all entries (indices)\r
+ are within the defined range and so no exception is possible.\r
+ \end{description}\r
+ The idea and all of the explanations and details about matrix\r
+ calculations and related ABCs are really good but generally the practical\r
+ benefits are rare because Java especially wants to permit that program style of\r
+ classes has impact on run time and performance because this is exactly what\r
+ Java wants to abstract from. \par\r
+ \r
+ In 2002 Gupta, Pratt, Midkiff and Moreira registered the USPatent \r
+ \cite{MethodForOptimizingArrayBoundsChecksInPrograms:2002} where they describe a\r
+ very general method for optimizing ABCs in many facets. \par\r
+\r
+ In 2003 Motohiro Kawahito published in an interesting and very exotic idea for ABC\r
+ \cite{ArrayBoundsCheckEliminationUtilizingAPageProtectionMechanism:2003}\r
+ elimination by exploiting the hardware processors page protection mechanism.\r
+ It is based on the idea to place an array to the end/beginning of a page and\r
+ inspect if an access leaves the valid page. If the page protection throws an\r
+ exception it can be derived that the array bounds were violated. \par\r
+ \r
+ In 2005 Nguyen and Irigoin presented with\r
+ \cite{EfficientAndEffectiveArrayBoundChecking:2005}\r
+ a document that implements in a FORTRAN compiler (PIPS) some\r
+ optimizations of ABCs. The general ideas are well known but the detailed\r
+ and formal descriptions give further insights into the details of the\r
+ whole problem. The first idea is based on Guptas approach. Insert ABCs and\r
+ eliminate redudant ones. Furthermore insert ABCs that can't be proven\r
+ unnecessary. And the last important thing is an algorithm for array size\r
+ checking and analysis to prove array access vioaltions over different procedure\r
+ calls. So the technique allows interprocedural ABC optimization that is a real\r
+ challenge and rarely implemented anywhere. \par\r
+ \r
+ Another large document \r
+ \cite{SymbolicBoundsAnalysisOfPointersArrayIndicesAndAccessedMemoryRegions:2005}\r
+ was publisehd by Rugina and Rinard where they describe a new framework that\r
+ deals with general problems like access memory regions but also with bounds\r
+ analysis of pointer array indices. Due to the extensiveness of it any\r
+ explanation about this is widely out of this documents scope. \par\r
+\r
+ In 2007 two new directions were started that are explained more precisely in the\r
+ next chapter. \par\r
+ \r
+ In 2008 Popeea, Xu and Chin presented in\r
+ \cite{APracticalAndPreciseInferenceAndSpecializerForArrayBoundChecksElimination:2008}\r
+ theory and experiments that use forward relational analysis to obtain\r
+ postconditions and backward precondition derivation to inference precise\r
+ (path, context sensitive) and practical pre/postconditions for an inference\r
+ algorithm to reduce ABCs. Two phases inference and specialization (inserts\r
+ necessary checks) are used to obtain the right pre/postconditions. Weak and\r
+ strong derivation processes lead to the results. In the analysis phase\r
+ Presburger algorithm is used that has high complexity and therefore the number\r
+ of formulas must be reduced as far as possible. Flexivariants of methods mean\r
+ that due to their usage the conditions can be weaker or stronger, so different\r
+ condition strengths are calculated and stored for maximum of efficiency and\r
+ minimum of memory effort. The technique also deals with array indirections.\r
+ It works on a high level artificial language IMP that allows no loops but\r
+ recursion (fix-point analysis) and also has some other restrictions. The inference\r
+ algorithm must be run at compile time and some cases were really expensive\r
+ (\(>\)60min)! So it is not suitable for run-time checks! The approach takes interprocedural\r
+ behavior into account but without dynamic class loading, because the analysis\r
+ works bottom-up. The work is in general an extremely formal approach that makes\r
+ hard use of formulas, foreign symbols and hard to understand mathematical\r
+ definitions so the contribution is in best case of theoretical nature but not\r
+ really suitable in practice.\r
+ \r
+ \subsection{State-Of-The-Art Methodologies}\r
+ The previous path through history was very chaotic but since 2007 two\r
+ real main directions arise that have the potential to be, in the end, satisfactory for\r
+ solving the problem of ABC optimization. The first approach is implemented\r
+ already in the HotSpot Java Virtual Machine and uses a fast transformation from\r
+ Java Byte Code to an SSA based intermediate representation to apply low cost\r
+ but extremely effective analysis techniques for well known code patterns and\r
+ code structures to improve ABC optimization quality quite good and precise. The\r
+ second approach can be summarized under the name Multiphase Bound Check\r
+ Elimination (MBCE) where Java Compiler and Java Virtual Machine process separated\r
+ parts of the ABC optimization process to get high quality and highly efficient results.\r
+ The benchmarks and measurements are also suitable to reveal the real impact and\r
+ efficiency of the presented and tested techniques. \par\r
+ \r
+ In 2007 WĂ¼rthinger, Wimmer and Mössenböck published \r
+ \cite{ArrayBoundsCheckEliminationForTheJavaHotSpotTMClientCompiler:2007} were\r
+ they present insights into the general functionality of the HotSpot Java VM and\r
+ the ABC optimization technique they use. Their algorithm optimizes some well\r
+ known programming patterns in Java to eliminate redundant checks and to move\r
+ partial redundant checks out of loops but keep in mind deopimization for not\r
+ provable check eliminations. This means that if checks can't be proven valid\r
+ the VM stops JIT compilation usage and goes back (ORS) to interpretation mode\r
+ (deoptimization) that is slower but easily allows keeping to the Java language\r
+ conventions like throwing exceptions at the right place\ldots. But results show\r
+ no significant increase in optimization, because the part of execution time is\r
+ not very high in the used benchmarks! A maximum of 5\% was possible! The\r
+ solution is clean, keeps to widely accepted standards and optimizes ABCs if\r
+ they occur very often. \par\r
+ \r
+ Another publication \r
+ \cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}\r
+ from 2007 is the beginning of the most recently researched\r
+ technique of von Ronne, Psarris and Niedzielski. This document describes\r
+ verifiable annotations based on SafeTSA for optimizing ABCs by shifting slow\r
+ optimizations to a Java to SafeTSA compiler and let the JIT compiler do fast\r
+ and also efficient optimizations based on the created annotations. This allows\r
+ usage of expensive optimization techniques in time critical virtual machines\r
+ (JIT compilers). The interesting problem of integer\r
+ overflow/underflow that is not taken into account by many other techniques but\r
+ means a security and safety leak is also mentioned. This should be taken into consideration for\r
+ any implementation to react accordingly to over- and underflows. The theory of\r
+ the build constraints and linear inequality/equalities and the prover are\r
+ strongly related to \r
+ \cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000} but provide\r
+ some improvements and are also meant to be only integrated into low level code\r
+ (SafeTSA) as annotations and therefore use much more expensive techniques than\r
+ ABCD can. \par\r
+ \r
+ In 2008 Nguyen and Keryell published\r
+ \cite{EfficientIntraproceduralArrayBoundChecking:2008} that describes a\r
+ further framework that is based on deriving expensive linear equalities at\r
+ compile-time and integrating them as annotations into a special form of\r
+ intermediate representation like SafeTSA for example. This approach is strongly\r
+ connected to the previous paper\r
+ \cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}\r
+ (same introduction). They implemented a verifyable annotation generator that\r
+ is based on Jikes RVM 2.2.0. Its use extended Fourier-Motzkin's variable\r
+ elimination (FMVE) to derive proofs. The framework presented is called VBCE\r
+ Verifiable Bound Check Elimination and can be directly compared to VDF (Verifiable\r
+ Data Flow Analysis). The intraprocedural approach that is not able to deal with\r
+ interprocedural relations, works on high level code at compile time and\r
+ annotations and is therefore not suitable for direct JIT optimization.\r
+ Loop invariants are not taken into account. \par\r
+ \r
+ Furthermore in this year Gampe, von Ronne and Niedzielski et al. published two\r
+ updates\r
+ \cite{SafeBoundsCheckAnnotations:2008}\r
+ and\r
+ \cite{SpeculativeImprovementsToVerfifiableBoundsCheckElimination:2008}\r
+ of their previous work\r
+ \cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}\r
+ that introduce speculation and newer mechanisms for annotations into the\r
+ optimization technique. The papers improves the previous framework further.\r
+ It also works on high level code at compile but now takes loop invariants into\r
+ account.\r
+ \r
+ To get the relationship of their publications on the\r
+ topic these papers should be studied at least at a glance.\r
+ \r
+ In 2009 WĂ¼rthinger, Wimmer and Mössenböck revealed a further update of their\r
+ approach in \r
+ \cite{ArrayBoundsCheckEliminationInTheContextOfDeoptimization:2009} that\r
+ introduces further improvements and optimizations in the HotSpot Java VM.\r
+ The document is logically strongly related to\r
+ \cite{ArrayBoundsCheckEliminationForTheJavaHotSpotTMClientCompiler:2007}\r
+ and explains that code duplication (loop versioning) is avoided using the\r
+ interpretation mode of the VM to retain exception semantics for PRCs.\r
+ It provides a good explanation of all parts especially PRC in loops and why loops\r
+ with additional exit conditions can't be optimized. Furthermore conditions are\r
+ not managed in an equality/inequality graph but for every instruction.\r
+ Interprocedural optimizations are not taken into account due to dynamic class\r
+ loading mechanism of Java. Loading new classes leads to more information about\r
+ array acessing and can lead to deoptimization (OSR at safepoint back to stack)\r
+ if the new conditions can't prove the previously optimizations. But several\r
+ ABCs can be grouped if proveable which leads to further optimizations. All\r
+ things considered a really suitable ABC optimization technique clearly\r
+ embedded in the VM and very effective goind by the test results. \par\r
+ \r
+ \cite{ArrayBoundsCheckEliminationForJavaBasedOnSparseRepresentation:2009} from\r
+ Yang, Huang and Yang is another publication from 2009 that introduces new\r
+ algorithm ABCE that is based on HSSA form and can be used in a Java static\r
+ compiler. It builds an inequality graph and eliminates FRC (fully redudant\r
+ checks) and PRC (partially redundant checks) on it in two phases. The work is\r
+ based on the Openjc java compiler that is restricted to very few platforms and\r
+ not widespread. This compiler uses a five phase generation style where in the\r
+ fourth phase the optimization is done on code that was transformed to HSSA \r
+ form and back. That means a lot of time effort. The quality and embedding of\r
+ the algorithm into the compiler is suitable but not useable for run time\r
+ optimizations and therefore too restricted. Furthermore the article only uses\r
+ percentage of ABC elimination as benchmark and a very intransparent\r
+ performance score graphic! \par\r
+ \r
+ In the same year Gampe, von Ronne and Niedzielski et al. published another\r
+ update\r
+ \cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009}\r
+ of their previous work\r
+ \cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}\r
+ that further improves their ABC optimization technique. In the document a new\r
+ more modern technique is presented that is also based on eSSA of the ABCD\r
+ algorithm and claimed to be 10\% more efficient than on SafeTSA. Then CAS\r
+ (constraint analysis system) a verifiable symbolic program constraint analyser\r
+ is presented. The approach is like previous ones also based on multiphase and\r
+ FVME. \par\r
+ \r
+ From 2010 the last known publication about ABC optimization is\r
+ \cite{SafeMultiphaseBondsCheckEliminationInJava:2010} again from Gampe, von\r
+ Ronne and Niedzielski et al. They refined the details of their\r
+ multiphase ABC optimization technique based on SafeTSA with annotations, CAS\r
+ and elaborate speculations. The results and also the additional provided\r
+ information about the topic show that the solution is practically satisfying.\r
+ The most challenging problems\r
+ connected to ABCs are solved and the solution is well embedded into a Java\r
+ Compiler and Java VM. They are not widespread but solve the problem of ABCs as\r
+ far as possible to be both efficient and convenient.\r
+\r
+ \subsection{Historical Conclusion}\r
+ Lastest research in ABC optimizations prove the assumption that only an early and\r
+ clean integration into the compilation and or interpretation process together\r
+ with suitable intermediate representations and multiple optimization phases \r
+ lead to practically satisfying results that really pay and tackle all newly\r
+ introduced problems. For Java the lately improved approach of the HotSpot Java \r
+ VM and the multiphase bound check elimination MBCE promise practical success\r
+ because all other previous techniques suffer from many drawbacks that can't\r
+ really be solved due to the overall environment of the language architecture,\r
+ hardware, future developments and other aspects of reality.\r
+\r
+%% ABCs in the Cacao Java VM --------------------------------------------------\r
+\r
+\section{ABCs in Cacao Java Virtual Machine}\r
+\r
+ \subsection{Introduction to Cacao Java Virtual Machine}\r
+ The work on the Cacao Java VM was started in 1996 by Rainhard Grafl and\r
+ Andreas Krall. They wanted to create a free, efficient and stable Java VM for\r
+ a wide variety of processor architectures and platforms. In the following years\r
+ many people joined the development team and implemented several extensions,\r
+ refinements and more and more architectures were provided. The implementation\r
+ is mostly done in C and some parts were already refactored to C++. The VM is partitioned\r
+ into modules (*.h/*.hpp with *.c/*.cpp) and keeps to the relevant Java standards \r
+ concerning behavior and general structure. For more detailed information about the\r
+ Cacao Java VM consult\r
+ \cite{HandbookForJavaCacaoVirtualMachine:1996} and\r
+ \cite{CacaoA64BitJavaVMJustInTimeCompiler:1997}.\r
+\r
+ \subsection{Array Bounds Checking in Cacao Java Virtual Machine}\r
+ In this chapter we will describe in general all necessary aspects concerning ABCs.\r
+ The Cacao Java VM uses integer compare instructions of the processor to implement\r
+ ABCs. They are generated together with all other code parts during JIT compilation\r
+ by the code generators from a special internal intermediate representation whose form\r
+ is strongly related to quadruples and derived from the Java Byte Code provided by \r
+ Java class files. ABCs are positioned in front of array accesses. \r
+ The JIT compilation is done locally on every method invocation and therefore only\r
+ allows intraprocedural optimizations, so it's impossible to solve intermediate ABC\r
+ optimization because all underlying and parallel methods must be analyzed. Some\r
+ experiments with example applications showed that execution with ABCs turned on leads\r
+ to significant execution time overhead. For some applications this overhead is\r
+ enormous so in consideration of all properties of the Cacao Java VM, its environment and\r
+ implementation an ABC optimization would be able to drastically reduce execution\r
+ time for some types of applications. In the past this motivated the creation of\r
+ an ABC optimizer in the Cacao Java VM.\r
+ \r
+ \subsection{Array Bounds Check Optimization in Cacao Java Virtual Machine}\r
+ In the year 1998 Christopher KrĂ¼gel added an ABC optimizer for loops to the Cacao Java VM. This led to\r
+ improvements concerning execution time of array intensive applications. For the\r
+ reasons previously explained the optimization algorithm design was limited by \r
+ many constraints so the author only concentrated on the most important topics.\r
+ A useful generic introduction into the optimizer can be obtained from\r
+ \cite{HandbookForJavaCacaoVirtualMachine:1996} in chapter 5.7 Loop Optimization.\r
+ \r
+ \subsubsection{Power and Properties of the ABC Optimizer}\r
+ The intermediate representation (IR) used within the Cacao Java VM is not really suitable\r
+ for complicated high performance optimization like for example SSA based IRs are. This\r
+ has historical reasons, because these representation formats got famous in the past decade\r
+ and also have disadvantages so that their usage must be carefully taken into account\r
+ during the whole design process of an abstract machine or compiler.\r
+ For this reason the algorithm design was in the first instance limited by the complexity raised \r
+ by analysis and transformations of this quadruple related IR. The time consumption of the \r
+ optimizer must be recognized.\r
+ Secondly the JIT compiler architecture limited the optimization scope to one local method.\r
+ This further constraints possibilities because no global ABCs can be removed. \r
+ Due to the internal architecture of the Cacao Java VM\r
+ the structures and algorithms grew very complicated and so the developer only focused on the\r
+ most important facts concerning ABC optimization. The author concentrated only on optimization within \r
+ loops and even furthermore he restricted his algorithm to loops with induction \r
+ variables with constant grow factors (up or down). Maybe\r
+ the algorithm designer used the argument that many empirical tests in the past showed that these\r
+ kinds of loops introduce most of the execution time overhead because they are very often and commonly\r
+ used by programmers in many programming languages. The results he presented proved the concept to\r
+ be generally suitable. So he did not analyze detailed theoretical differences on redudant/\r
+ partial redudant or necessary/local/global ABCs. He simply concentrated on ABCs that seem to \r
+ cause the most overhead in loops within normal applications.\r
+\r
+ \subsubsection{Generic Design Issues of the ABC Optimizer}\r
+ This chapter will provide a generic overview over the design of the ABC optimizer \r
+ and the added improvements. A detailed explanation is, due to the amount of information,\r
+ not possible within the document's scope. For a further functional explanation please consult\r
+ \cite{HandbookForJavaCacaoVirtualMachine:1996}\r
+ and for the implementational details directly the comments in the Cacao Java VM source code. \r
+ \par\r
+ The general idea of the algorithm is to analyze a method during its JIT compilation process\r
+ for loops. This is accomplished by a depth first search of code blocks (basic blocks), then\r
+ calculating the domintators of the blocks using the Lengauer-Tarjan algorithm \r
+ \cite{AFastAlgorithmForFindingDominatorsInAFlowGraph:1979}\r
+ and detecting all loops by searching for back edges\r
+ \cite{ModernCompilerImplementationInC:1998}. These are very time consuming tasks because all\r
+ nodes (basic blocks) of the method must be iterated several times one every method compilation.\r
+ After this happend identical loops are merged, nested loops resolved and all together they are topologically sorted.\r
+ The loops are optimized in their sorted order by analyzing their loop headers and the variables/constants\r
+ involved in array access instructions. If properties are found that allow optimization the\r
+ loop and its header are adapted to suit optimal execution behavior. If nested loops exist their\r
+ combinatorical behavior must be considered.\r
+ The loop header must be adapted in such a way that new dynamic and static constraints introduced by code motion\r
+ out of the loop are inserted. In general only loops can be optimized where array index variables\r
+ are constant (full), only increment (lower bound) or only decrement (upper bound) by constant values.\r
+ Loop headers with or statements can't be optimized by the algorithm as cases\r
+ where array index variables are modified within exception handling (catch) blocks.\r
+ The original loops are replaced by the optimized (memory layout) with a conditional\r
+ jump to the originals if some dynamic constraints don't hold.\r
+ Also important for the optimization design is to notice that the original version of the loop is\r
+ held in memory parallel to the optimized one to allow correct execution with ABCs in the case\r
+ that some dynamic constraints don't hold or can't be proved. This introduces a lot of time\r
+ and space overhead during jit compilation because of the copy procedure. The correct patching\r
+ of links between code blocks and an exact exception handling add even more complexity to the\r
+ solution. So every optimized loop exists two times within the memory.\r
+\r
+ \subsubsection{Implementation of the ABC Optimizer}\r
+ \r
+ This chapter is dedicated to give an abstract but fundamental introduction\r
+ to the structure of the ABC optimization algorithm. At first\r
+ the most important data structures are explained to get an idea what kind\r
+ of data is necessary for the processes (temporary and in the end). Then we will\r
+ have a clear look at the algorithm in form of an abstract pseudocode that represents\r
+ generally all important functions and the behavior overall. The functions are\r
+ named to be self explanatory to express their meaning but for understanding the\r
+ ideas behind it maybe a good beginning would be to first consult the good generic \r
+ textual description found in\r
+ \cite{HandbookForJavaCacaoVirtualMachine:1996} in chapter 5.7 Loop Optimization.\r
+ This could give some useful introducing information before studying the real behavior.\r
+\r
+\end{multicols}\r
+\r
+ The most important data structures to understand the ABC optimization algorithm:\r
+ \r
+ \begin{description}\r
+\r
+ \item[LoopData (ld)] is a data structure that contains all necessary\r
+ intermediate information and states necessary during the ABC\r
+ optimization like detected loops, exception graphs, auxiliary\r
+ information to improve algorithm performance...\r
+\r
+ \item[JITData (jd)] contains all information about the just-in-time\r
+ compiled method that is currently optimized. This structure\r
+ contains for example the methods code blocks (basic blocks),\r
+ the number of codeblocks, exception table...\r
+\r
+ \item[LoopContainer (lc)] that holds all information about a specific\r
+ loop, like the contained code blocks for example.\r
+\r
+ \item[BasicBlocks (bb)] is a container that contains code of one\r
+ atomic code part of the method (no branch/jump) instruction\r
+ contained. So this means that a basic block is garanteed to\r
+ be executed in one exact sequence.\r
+\r
+ \end{description}\r
+ \r
+ So the most important data structures are now identified and the algorithm\r
+ can be described in pseudocode.\\\r
+\r
+ Preconditions are that the method must be completely parsed, analyzed\r
+ and all information must be correctly stored within the related\r
+ jitdata structure.\r
+\r
+ %% Codesection -----------------------------------------------------------------\r
+\r
+ \begin{verbatim}\r
+\r
+ depthFirstSearchOnMethodBasicBlocks (jd)\r
+ exploreDominators (ld)\r
+ detectLoopsInMethod (jd, ld)\r
+ analyzeDoubleLoopHeaders (ld)\r
+ analyzeNestedLoops (jd, ld)\r
+ for all LoopContainers (lc) {\r
+ optimizeSingleLoop (jd, ld, lc)\r
+ }\r
+ redirectMethodEntryBlock (jd, ld)\r
+\r
+ optimizeSingleLoop (jd, ld, lc) {\r
+ analyzeLoopForArrayAccesses (jd, ld, lc)\r
+ if ( analyzeLoopForOrAndExceptions (jd, ld, lc) == true ) {\r
+ initializeLoopHeaderConstraints (jd, ld, lc)\r
+ removeHeaderBoundChecks (jd, ld, lc)\r
+ for all BasicBlocks (bb) {\r
+ removeBoundChecks (jd, ld, lc)\r
+ }\r
+ insertChecksIntoLoopHeader (jd, ld)\r
+ }\r
+ }\r
+\r
+ initializeLoopHeaderConstraints (jd, ld, lc) {\r
+ if ( last Instruction in header is comparison against constant )\r
+ obtainLeftArgumentDetails (ld, lc)\r
+ } else if ( last Instruction in header is standard comparison ) {\r
+ obtainLeftArgumentDetails (ld, lc)\r
+ obtainRightArgumentDetails (ld, lc)\r
+ } else {\r
+ // unknown constraints for these conditions\r
+ }\r
+ if ( both sides contain changing or both have no index variables ) {\r
+ // no constraints can be proven valid\r
+ } else {\r
+ force index variable to be on left side\r
+ ForceRightSideToBeConstant (ld, lc)\r
+ obtainDynamicConstraints (ld, lc)\r
+ }\r
+ }\r
+\r
+ removeBoundChecks (jd, ld, lc) {\r
+ mergeConstraintsWithNestedLoops (ld, lc)\r
+ for all Instructions of BasicBlock (bb) {\r
+ if ( Instruction is accessing array ) {\r
+ obtainIndexVariableDetails (ld, bb)\r
+ calculateBoundConstraints (jd, ld, bb)\r
+ createLoopHeaderCodeMotionChecks (jd, ld, bb)\r
+ }\r
+ }\r
+ for all successors of BasicBlock (bb) {\r
+ removeBoundChecks (jd, ld, bb)\r
+ }\r
+ }\r
+\r
+ insertChecksIntoLoopHeader (jd, ld) {\r
+ copyCompleteLoop (jd, ld)\r
+ createBasicBlockForCodeMotionChecks (jd, ld)\r
+ insertGotoInstructionsForHandlingNewLoopHeaderBasicBlock (jd, ld)\r
+ adaptExceptionHandlingRegions (jd, ld)\r
+ informParentLoopsAboutNewHeader (jd, ld)\r
+ for all LoopHeaderCodeMotionChecks {\r
+ createInstructionsForConstraintChecks (jd, ld)\r
+ insertInstructionsIntoNewLoopHeader (jd, ld)\r
+ }\r
+ updateJumpTargetsOfLoops (jd, ld)\r
+ updateExceptionHandlingForNewAndDuplicatedLoop (jd, ld)\r
+ }\r
+ \r
+ \end{verbatim}\r
+ \r
+ %% Codesection end -------------------------------------------------------------\r
+\r
+ Postconditions are that the methods jitdata structure holds correct, executable\r
+ and optimized loop code blocks that have no semantic differences to the original form.\newline\r
+ \r
+ Keep in mind that this pseudocode representation is due to the documents scope\r
+ very generic so for more detailed information it's maybe best to directly consult\r
+ source code with its comments because it's hard to go further into details and\r
+ preserving overview.\r
+ \r
+ \subsubsection{Results and Performance of the ABC Optimizer}\r
+ This chapter presents results that were taken from an application that sieves prime numbers.\r
+\r
+ \begin{table}[!ht]\r
+ \caption{Benchmark Results of ABCs in Cacao}\r
+ \label{BenchResABCsInCacao}\r
+ \begin{center}\r
+ \begin{tabular}{lccc}\r
+ Configuration&Light&Normal&Hard\\\r
+ \hline\r
+ Runs&10&5&2\\\r
+ Normal&5,563&55,195&551,355\\\r
+ without ABCs -cb&5,562&55,192&551,414\\\r
+ with -oloop&5,560&55,182&551,350\\\r
+ \hline\r
+ \end{tabular}\r
+ \par\medskip\footnotesize\r
+ The measurements are selected to rise by a factor of 10 every level starting from 50000/10000.\r
+ \end{center}\r
+ \end{table}\r
+ \r
+ From this data with polynomic behavior a fixed per centage of speed acceleration should\r
+ be proven but it's easy to see that there are no differences between these configurations.\r
+ This encourages the assumption that ABCs are always created within the Cacao Java VM whatever\r
+ arguments are passed. The proof for this assumption will be done using assembler code output\r
+ from the machine application.\r
+ \r
+\begin{multicols}{2}\r
+ \r
+ \subsubsection{Possible Improvements for the ABC Optimizer}\r
+ The algorithm is due to its constraints of the virtual machine architecture a very conservative approach\r
+ and has no really complicated mechanisms like speculation/probability calculations or exhaustive\r
+ stack exploration. The implementation is done in C (structured programming technique) so\r
+ it's not really likely to find exhaustive performance improvements in its implementation, but like\r
+ always some possibilities remain that could lead to better results. A very expensive\r
+ part of the ABC optimization algorithm includes the necessarity for loop detection at the beginning. \r
+ This is accomplished by usage of the Lengauer-Tarjan algorithm presented in\r
+ \cite{AFastAlgorithmForFindingDominatorsInAFlowGraph:1979}. \r
+ This algorithm has good properties concerning asymptotic complexity and is especially \r
+ used by mathematicians therefore the preferred solution for very big graphs (\(>\) 30.000 nodes)\r
+ and also presented for example in \r
+ \cite{ModernCompilerImplementationInC:1998}. \r
+ But in practice methods and nested loops are in general much smaller and own only\r
+ tens or hundreds of nodes in their control flow graphs. Cooper, Harvey and Kennedy showed in\r
+ \cite{ASImpleFastDominanceAlgorithm:2001} \r
+ that an other algorithm that has higher asymptotic complexity than the Lengauer-Tarjan performs much \r
+ better for smaller graphs because the complexity issue is only really valid for very big (asymptotic) graphs \r
+ that normaly do not occur in most other programs. The authors claim that their algorithm is for\r
+ conventional method and loop graphs 2,5 times faster than the Lengauer-Tarjan algorithm. If the\r
+ experiments are valid, there exists a possibility to improve the whole ABC optimization algorithm by\r
+ using the simpler algorithm of Cooper, Harvey and Kennedy. The algorithm and especially its proof \r
+ are also much simpler to understand and therefore to implement. \par\r
+ One fact must be kept in mind. To really allow drastic improvements in algorithm performance \r
+ the complete Cacao Java Virtual Machine architecture must be reconsidered and this demanding\r
+ task can only be done in an iterative way over a long period of time.\r
+\r
+%% Conclusion -----------------------------------------------------------------\r
+\r
+\section{Conclusion}\r
+The document presented general information about the topic of ABCs overall and also preserved\r
+historical aspects that show up the evolvement. The main part dealed with ABC optimization in the\r
+Cacao Java VM. All things considered the task of successful array bounds check optimization is a complicated and\r
+challenging quest with many facets and relations. A lot of details must be considered in design and\r
+also implementation phases. But it is definitly a problem that must be solved satisfactorily for all\r
+future developments in computer languages and informatics in general because it is simply not bearable\r
+to waste computing time for any superfluous calculations for any reasons. The ABC optimization within loops in\r
+the Cacao Java Virtual Machine is very conservative due to its environment and also for historical reasons.\r
+So we are definitly far away from the perfect solution but the algorithm proved to be useful in some cases\r
+(programs) and this is the first step towards evolvement of a practical really satisfying result.\r
+\r
+%% Bibliography ---------------------------------------------------------------\r
+\r
+\bibliographystyle{alpha}\r
+\bibliography{bibfile}\r
+\r
+\end{multicols}\r
+\r
+\end{document}\r
+\r
+%% End ------------------------------------------------------------------------\r