\documentclass[11pt,a4paper]{article}
\usepackage{graphicx}
%\usepackage{ngerman}
\usepackage[latin1]{inputenc}
\usepackage{textcomp}
\usepackage{multicol}

\title{Optimization of Array Bounds Checking in the Context of Cacao Java Virtual Machine}
\authN{Bräutigam Klaus}\\
\authI{Institute of Computer Languages}\\
\authU{Technical University of Vienna}\\
\email{e0825965@student.tuwien.ac.at}\\\\
\authN{Bachelor Thesis BSc. in Informatics - Software and Information Engineering 033 534}\\
\authN{supervised by Univ.Prof. Dr. Andreas Krall}
\begin{multicols}{2}

%% Abstract -------------------------------------------------------------------

This document deals with array bounds check optimizations in a very general and historical way, but it also strongly relates to the details of the implementation within the Cacao Java Virtual Machine that was developed by the Institute of Computer Languages of the Technical University of Vienna. Good and sustainable problem solutions can only be found if previous ideas are studied to improve one's own considerations and, even more importantly, if failures that happened in the past are not repeated. The first main part is therefore dedicated to a study of previous approaches concerning array bounds check optimization and their environments, to obtain a solid knowledge base for the general problem. The second part describes details and relations important to understand the technique chosen for the Cacao Java VM. So the document should also be interesting for people only searching for a comprehensive overview of the abstract array bounds check optimization problem.

%% Introduction ---------------------------------------------------------------

\section{Introduction}
Array bounds checking and the optimizations necessary for its efficient realization have been exhaustively researched and discussed over the past four or five decades. The rise of high level programming languages led directly to several improvements in their background and environments. Array accesses, for example, have always been a common source of malfunctions and security holes. If an array access index is not within the bounds of the array, memory is accessed illegally, which can lead to dramatic problems. Computer language architects therefore recognized very quickly that array accesses must be validated before execution to increase the safety and stability of applications. So in most cases array access indices are compared to the array bounds before the access is executed. But this introduces some kind of overhead during compilation and/or execution. Because the amount of processed information increases much faster than our technological calculation capabilities, any type of overhead must be reduced or completely removed from applications in any thinkable way. As will be seen in the following chapters, the approaches and techniques for reducing the overhead of array bounds checks can be very different. To completely understand the problem of optimizing array bounds checks (ABCs for short), all related aspects must be taken into account and all topics must also be seen from a more general point of view.

%% General aspects of ABCs ----------------------------------------------------

\section{A Generic Reflection of ABCs\label{sec:structure}}
This chapter provides a solid overview of ABC optimizations. The most common forms of implementation are explained and the necessity for improvements of ABC optimizations is motivated. Furthermore, some background information will be provided to get a feeling for the relations of the topic. But it must be kept in mind that an exhaustive explanation of the topic is far outside the document's scope, so we will only touch on the general ideas.

\subsection{Implementation of Array Bounds Checking}
Before designing new optimization techniques for ABCs it is important to know exactly how these checks are implemented. There are many ways to realize ABCs. Elaborateness and performance are important to keep in mind when choosing a technique, and the insertion method is also interesting because it is closely connected to the language architecture: should tests be integrated automatically without the developer's knowledge, or should ABCs only be integrated selectively by programmers or quality assurance?

\subsubsection{Compare Instruction Strategy}
Most implementations use compare instructions of the processor and a conditional jump to react to failed comparisons. For this technique the array start address and length are obtained and compared to the index. For a more detailed description and good examples please consult
\cite{AComparisonOfArrayBoundsCheckingOnSuperscalarAndVLIWArchitectures:2002}.
For languages like Java that assure that arrays always start at index zero, a single unsigned compare instruction is sufficient to check both the lower and the upper bound. This assures that indices are always within the specified array bounds and that exceptions or other reactions occur at the right point of program execution.
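The single-compare trick can be sketched in plain Java, where \texttt{Integer.compareUnsigned} stands in for the unsigned machine compare instruction (a hypothetical helper for illustration, not Cacao's actual implementation):

```java
public class BoundsCheck {
    // Because the lower bound of a Java array is always zero, treating the
    // signed index as unsigned lets one comparison cover both bounds: a
    // negative index wraps to a huge unsigned value and fails the test too.
    static boolean inBounds(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        int length = 10;
        System.out.println(inBounds(3, length));   // valid access: true
        System.out.println(inBounds(10, length));  // above upper bound: false
        System.out.println(inBounds(-1, length));  // below lower bound: false
    }
}
```

One unsigned compare thus replaces the two signed comparisons (index >= 0 and index < length) that a naive implementation would emit.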

\subsubsection{Hardware Implementation}
It is also possible to implement ABCs directly and completely in hardware. This means some more effort in processor architecture, but the resulting run-time overhead of ABC execution can be reduced to practically 0.0\%. There have been several approaches from Sun Microsystems, Inc.\ to design a new generation of processors that support such checks natively.
\cite{ProcessorWithAcceleratedArrayAcessBoundsChecking:2000}
for example describes a processor that uses two hardware comparators and a disjunction to inform the instruction execution unit about the validity of the array access instruction.
\cite{ApparatusAndMethodForArrayBoundsCheckingWithAShadowFile:2001} describes an even further improved way. In addition it uses a shadow register to further accelerate ABCs and integrate them into the general instruction flow of the processor.

\subsubsection{Other Approaches}
Other, more exotic approaches to ABC implementation also exist. An interesting one can be found in
\cite{ArrayBoundsCheckEliminationUtilizingAPageProtectionMechanism:2003},
where the processor's page protection mechanism is used to identify ABCs and even improve their effectiveness by eliminating unnecessary ones.

\subsection{The Real Overhead Introduced by ABCs}
Before crying out for costly and time-consuming optimizations of a specific mechanism, we should analyze the overhead it actually causes. Is the topic really worth studying and researching? This is in many cases a difficult question, because future trends must also be considered to get a clear result. Concerning ABCs it depends on the ABC implementation, the language design and architecture, and on the application itself.

\subsubsection{Overhead Reduction through Hardware}
For decades information processing machines were mainly designed as single processor systems that did not really consider modern programming techniques and patterns. Nearly all higher-level improvements that were reached in computer programming were achieved in an abstract way using software (virtual machines, multiprocessing, null pointer checks, ABCs\ldots). In the past there were some approaches that tried to completely integrate ABCs in hardware, but these were rejected very fast. Nevertheless we can observe something interesting in connection with ABCs by comparing benchmarks of the same application on the same virtual machine running on different processor generations. For example, take the overhead benchmark results from
\cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009}
and compare them to those of
\cite{SafeMultiphaseBondsCheckEliminationInJava:2010}: it is easy to recognize that the same application on the same virtual machine with the same configuration produces a different overhead. In this case the reason is found very fast. The first benchmark was calculated on a single processor system, while the second uses a multicore processor. This new processor generation is able to reduce the overhead of ABCs without changing the technique or implementation of ABCs. So the question whether ABC optimization pays off depends not only on the software but also on the hardware it is running on. Newer processor generations will be able to further reduce the overhead raised by ABCs implemented as processor instructions, but it is very unlikely that the approaches of implementing ABCs completely in hardware will be realized in the near future.

\subsubsection{Overhead in Different Types of Applications}
For ABC optimization to reduce overhead noticeably, the application it operates on must use arrays frequently; otherwise the overhead produced is insignificant and not really measurable \cite{SafeMultiphaseBondsCheckEliminationInJava:2010}.
An important question is whether these optimizations really pay off for many applications. For example, Java is mainly used due to its simple, clear, object-oriented and abstract nature. Most developers use Java only in conjunction with powerful frameworks like Spring, and many of them only for web-based applications. The resulting ABC overhead in these strongly object-oriented applications is practically not measurable even if many arrays are used. The overhead is mainly caused by nested method calls (invokes), which are also very difficult to reduce. ABC optimizations only really pay off for applications with simple object relations that make excessive use of arrays and matrix calculations, as in scientific research and development. But for these cases it may be better and easier to use extremely optimized data structures that are recognized by the machine and executed in a special way.
\cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002}
mentions some libraries and also explains some new special data structures that follow this approach.

\subsubsection{Necessity for ABC Optimization}
It is difficult to argue for or against ABC optimization and the approach that should be taken to implement it, because this strongly depends on the language architecture and many other factors discussed here. But in general it should be the architect's and implementor's goal to evolve the language to the most effective, efficient and clear state. Even if hardware were capable of executing ABCs without any resulting overhead, it would still be better to eliminate unnecessary code to improve low-level code quality and cleanness. Program code at any abstraction layer should not contain superfluous instructions or code parts, even if they have no impact on execution time or security. Moreover, some lightweight processor architectures (embedded systems) may never be capable of implementing ABCs in hardware!

\subsection{Important Relations to ABC Optimization}
This chapter explains some important basic conditions that should be considered before starting to think about ABC optimization techniques, although some of them are valid for nearly all kinds of optimizations. First, the array access index types should be studied in detail to identify possible limitations raised by them. The next important fact is that optimizations only pay off if their benefits outperform the overhead they generate. It should also be kept in mind that the elaborateness and form of implementation dramatically influence the effectiveness of the optimization.

\subsubsection{Array Access Index Types}
The array access index type must also be taken into consideration for several reasons. Most computer programming languages (from now on referred to as languages) of all programming paradigms accept the type integer as array index type. The word length of this type varies in most cases, and furthermore the signedness of integral numerical types can be treated differently. In C/C++ integer types exist as signed or unsigned and their width is not specified but depends mainly on the word length of the underlying hardware. Java, on the other hand, declares its integer type as explicitly signed and 32 bit wide (range from -2,147,483,648 to 2,147,483,647 in two's complement). The Java language furthermore declares that overflowing or underflowing values are wrapped around, which is not desired for array access index types because there are hardly any reasonable applications that access arrays using overflowing index expressions \cite{JavaLanguageSpecificationv3.0:2005}.
This source of malfunctions and security leaks must also be tackled precisely when designing optimization mechanisms.
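The wrap-around behavior can be observed directly in Java. The following hypothetical example shows an index expression silently overflowing into a negative value, which only the runtime bounds check catches:

```java
public class IndexOverflow {
    public static void main(String[] args) {
        int[] a = new int[4];
        int i = Integer.MAX_VALUE; // 2147483647

        // Java integer arithmetic wraps around silently:
        // MAX_VALUE + 1 becomes Integer.MIN_VALUE, a large negative number.
        int wrapped = i + 1;
        System.out.println(wrapped); // prints -2147483648

        try {
            a[wrapped] = 42; // only the implicit bounds check stops this
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught out-of-bounds access");
        }
    }
}
```

No overflow exception is raised by the addition itself; without the bounds check the wrapped index would silently address foreign memory.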

\subsubsection{Benefits to Execution Time Ratio}
The benefits of an ABC optimization must always outweigh the execution time necessary to calculate it. It is not acceptable to implement an optimization that saves execution time for some applications and slows others down; there is nothing to gain.
\cite{AComprehensiveApproachToArrayBoundsCheckEliminationForJava:2002}
for example shows that activated ABC optimization together with others slows down some applications of the used benchmark. This is extremely important to consider for optimization techniques that work during Just-In-Time compilation and execution, where expensive, elaborate and disturbing processes must be omitted as far as possible. As mentioned in the next chapter, this is perhaps best accomplished by splitting optimization techniques into different phases and locating them where they are unproblematic to run.

\subsubsection{Intermediate Representation}
Which intermediate representation (IR) should the optimization operate on? This has to be considered too. Most compilers fulfill their compilation tasks in several phases, and for this reason different intermediate representations of the program arise. These depend completely on the compiler architecture and the language itself, especially on the programming paradigm \cite{ModernCompilerImplementationInC:1998}.
But not only during the compilation process do new program representations arise. For interpreted languages some kind of IR is necessary to transport code between compiler and interpreter. Java Byte Code, for example, is such an IR that is delivered to the Java Virtual Machine and executed. If the interpreter wants to apply some optimizations to the code, it is forced to operate on the intermediate representation delivered to it, or to perform an extremely expensive transformation into another desired form, which slows down execution even more. Java Byte Code, for example, is not very adequate as an IR because it suffers from many problems concerning size, security\ldots{} and is not really practical for exhaustive optimizations. Finally it should not be forgotten that high level code (Java, C++) can also be a very suitable representation for obtaining some optimization constraints!\par
A lot of effort has been spent on using Java Byte Code annotations for integrating optimization information into conventional class files; these annotations are ignored by most compilers but allow some to boost execution performance. But these approaches also have a lot of disadvantages: the annotations can be abused, their usage is not standardized, and they further increase class file size and processing effort.
\cite{AFrameworkForOptimizingJavaUsingAttributes:2000} describes an example framework that follows the previously discussed approach. Also
\cite{VerifiableAnnotationsForEmbeddedJavaEnvironments:2005} can be an interesting starting point because it focuses especially on three important optimizations and on low cost checks. Furthermore,
\cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}
and \cite{SafeBoundsCheckAnnotations:2008} deal with annotations specifically for ABC optimizations. In the historic review many more approaches based on this construct will be mentioned.\par
Another way to deal with the problem of IR is to use a completely new approach: in the past, several forms based on Static Single Assignment (SSA) were researched and have proven likely to become the commonly used ones in the future. The form is based on the constraint that every variable has exactly one point of assignment. But many more constraints and ideas were integrated into different modern SSA forms that were adapted and optimized for several reasons and techniques. For a general idea of SSA forms consult
\cite{AVerifiableSSAProgramRepresentationForAgressiveCompilerOptimization:2006}.
\cite{SafeTSAATypeSafeAndReferentiallySecureMobileCodeRepresentationBasedOnStaticSingleAssignmentForm:2001}
explains SafeTSA, a type safe and size optimized SSA form, which is used in
\cite{UsingTheSafeTSARepresentationToBoostThePerformanceOfAnExistingJavaVirtualMachine:2002}
to optimize the execution performance of a Java Virtual Machine. Another approach is
\cite{AVerifiableSSAProgramRepresentationForAgressiveCompilerOptimization:2006},
which focuses on the clean and efficient embedding of range checks into other aggressive optimization phases.
\cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000} uses a new representation called Extended SSA form (eSSA), which introduces new constraints for dealing efficiently with program branches (if-else, while\ldots).
Another SSA form named HSSA is introduced in conjunction with a new optimization algorithm, ABCE, in
\cite{ArrayBoundsCheckEliminationForJavaBasedOnSparseRepresentation:2009}.
\cite{TowardsArrayBoundCheckEliminationInJavaTMVirtualMachineLanguage:1999}
even introduces a new Java Byte Code representation called JVMLa that is also suitable for several optimization tasks. Nearly all of these systems are not commonly used and are therefore not standardized, but they show how future intermediate representations could be designed or influenced by different research directions.
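The single-assignment constraint can be illustrated with a minimal sketch, written here as ordinary Java for readability (real SSA is a compiler-internal representation with phi-functions at control flow joins, not source code):

```java
public class SsaSketch {
    // Original form: the variable x is assigned twice.
    static int original(int a) {
        int x = a + 1;
        x = x * 2;
        return x;
    }

    // SSA-like form: every variable has exactly one point of assignment,
    // so each redefinition of x becomes a fresh name (x1, x2). This makes
    // value ranges and definitions unambiguous for an optimizer.
    static int ssaForm(int a) {
        final int x1 = a + 1;
        final int x2 = x1 * 2;
        return x2;
    }

    public static void main(String[] args) {
        System.out.println(original(3) == ssaForm(3)); // same result
    }
}
```

Both forms compute the same value; the renaming only changes the representation, which is exactly why SSA-based IRs can carry optimization constraints so cleanly.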

\subsubsection{Positioning of Optimizations}
Being generally aware of all previously discussed circumstances, another very important aspect must be recognized: the positioning of optimizations. The main differentiation can be seen between compiled and interpreted languages.\par
Compiled languages are restricted to optimizations that can be calculated at compile time (mainly static optimizations). This imposes really big limitations on the possible and suitable techniques, but expensive computations are not a big problem in this setting. High compile times are widely accepted and controlled by special optimization level parameters (for example -O3 for most C compilers). For development and debugging the optimizations are not really necessary, so they are omitted; for deployment one optimized version is created, where expensive and long running optimizations are not seen as problematic.\par
For interpreted language architectures new aspects must be taken into account. On the one hand, the possibility of powerful, highly dynamic optimizations arises that use information only available at run time; on the other hand, it is unacceptable to compute expensive optimizations because they would slow down execution dramatically. So the most successful approach is probably to split dynamic optimizations into compile-time and run-time parts. Expensive precalculations are delegated to the compiler, and fast and effective optimizations that use information obtained at compile and run time are calculated by the interpreter (virtual machine). This approach was exhaustively researched by most recently published ABC optimization techniques. At compile time some constraints and equalities/inequalities are inferred from high level code and intermediate representations through theorem proving and even speculation, and special optimizing virtual machines interpret these constraints and annotations to calculate very effective optimizations efficiently.
\cite{SafeMultiphaseBondsCheckEliminationInJava:2010} and all previous versions by the same authors,
\cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009},
\cite{SpeculativeImprovementsToVerfifiableBoundsCheckElimination:2008}\ldots{}
are recommended for understanding the complete idea. Most previously mentioned approaches that use annotations in any form (Soot framework) basically follow it as well.\par
One must also consider that optimizations must fit into the compilation and interpretation process. Optimizations should not be an isolated process that is quickly added to the compiler/interpreter architecture at the end of the design process. Rather, optimizations and related techniques, and especially the intermediate representations used, should be taken into account very early in language architecture design to get clean, clear, effective and efficient compilation and interpretation processes overall
\cite{ModernCompilerImplementationInC:1998}.
Nearly all mechanisms researched in the context of Java and the Java Virtual Machine suffer from the drawback that many optimizations are added to the constructs too late and are therefore not really suitable for the complete architecture.

\subsection{Types of ABCs}
ABCs can always be found before array access instructions. In Java these instructions are provided for every primitive type as xaload and xastore, where x represents the array type (integer, float, signed byte, character\ldots). But not only the array access instructions themselves are important for ABCs, but also the context where they appear. Therefore some placements of ABCs are of special significance. These are now described in some detail to understand which forms of interest can arise. The first three categories classify whether an ABC can be statically proven redundant, proven partially redundant, or shown necessary during execution for keeping to the Java conventions. The remaining three deal with the code structure where these ABCs occur. A good explanation concerning the raw classification of ABCs in general can be found in
\cite{AFreshLookAtOptimizingArrayBoundChecking:1990} and, more recently, in
\cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}.

\subsubsection{Redundant ABCs}
Some ABCs that are inserted by primitive insertion algorithms are completely redundant and can be statically proven unnecessary. For example, three sequential array accesses a[i], a[i+1] and a[i+2] lead to three inserted ABCs, but the first two of them are completely redundant because the third dominates them. The overall number of these checks is very low and the amount of execution time occupied by them even lower. So hardly any reasonable optimizations are possible through their elimination, but they are very easy and cheap to detect and remove, so it should be done for reasons of cleanness and completeness. Removing them is only really important if they occur within loops.
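The situation can be sketched as follows; the explicit check stands in for the implicit ABCs a compiler inserts, and the combined form is a hypothetical hand-optimized variant (ignoring int overflow of i+2 at the extreme end for brevity):

```java
public class RedundantChecks {
    // Naive form: the runtime checks bounds at each of the three accesses.
    static int naiveSum(int[] a, int i) {
        return a[i] + a[i + 1] + a[i + 2]; // three implicit bounds checks
    }

    // Combined form: one check proves all three accesses safe, making the
    // individual per-access checks redundant and eligible for elimination.
    static int checkedSum(int[] a, int i) {
        if (i < 0 || i + 2 >= a.length)
            throw new ArrayIndexOutOfBoundsException(i);
        return a[i] + a[i + 1] + a[i + 2]; // provably in bounds
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(checkedSum(a, 1)); // prints 9
    }
}
```

Checking the widest access (a[i+2] against the upper bound, a[i] against the lower bound) subsumes the others, which is exactly the domination argument above.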

\subsubsection{Partially Redundant ABCs}
Some ABCs are related through different program branches arising from if-else or while statements, which leads to ABCs that are partially redundant. Sometimes they can be simplified or partially eliminated. These ABCs arise relatively often in programs, and therefore optimization through elimination or code motion can really pay off. For a detailed explanation of this topic please consult the good considerations in
\cite{GlobalOptimizationBySuppressionOfPartialRedundancies:1979}.
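A minimal hypothetical example of such a situation: the check before the final access is redundant on one branch but not on the other, so it is only partially redundant.

```java
public class PartiallyRedundant {
    static int pick(int[] a, int i, boolean flag) {
        int x;
        if (flag) {
            x = a[i]; // the check on this path makes the later one redundant
        } else {
            x = 0;    // no access here: the later check is still needed
        }
        // Partially redundant bounds check: already proven on the true
        // branch, unproven on the false branch. Hoisting one combined check
        // above the if would make this access check-free on both paths.
        return x + a[i];
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30};
        System.out.println(pick(a, 1, true));  // prints 40
        System.out.println(pick(a, 1, false)); // prints 20
    }
}
```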

\subsubsection{Necessary ABCs}
Some ABCs represent exactly what they were invented for. They are placed in front of array access instructions that cannot be proven valid, so these ABCs cannot be eliminated, but they can sometimes be simplified or moved using code motion procedures. A good and simple example are array access indices read from the command line or from an external data source like a file or a database!
Depending on the application they occur rarely or very often. Whether an ABC can be proven superfluous or necessary depends strongly on the power of the optimization technique used. Sometimes array accesses that involve indirection or interprocedural array access program structures cannot be proven unnecessary although they are; this will also be explained in the section about global ABCs. Advanced techniques that are capable of proving these exotic kinds of redundant or partially redundant ABCs can sometimes eliminate or simplify large amounts of elaborate and unnecessary ABCs.

\subsubsection{Local ABCs}
Local ABCs are found in straight program structures without branches, loops and other conditional statements. These code parts are executed sequentially without possible interruptions and are generally very simple and short. Local ABCs are in most cases redundant and can be completely eliminated, but they do not have much impact on program execution time.

\subsubsection{Global ABCs}
Global ABCs are much more difficult to handle. We must first distinguish between interprocedural and intraprocedural global ABCs. If several method invokes are involved in one array access, it is hard to automatically analyze the structure and obtain constraints, especially in languages like Java with dynamic class loading. These ABCs are called interprocedural. Only very few techniques have the power to prove such elaborate ABCs unnecessary. If ABCs are only analyzed in the context of one method without additional method invokes, they are called intraprocedural. Array accesses where indices are themselves read from arrays (indirection) can also be very challenging to prove redundant. A technique that pays special attention to ABCs in the presence of indirection is
\cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002},
which gives a solid insight into the difficulties raised by this topic. Due to their frequent occurrence in combination with matrix calculations and other costly procedures, optimizations of global ABCs can reduce the execution time of applications considerably.
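A small hypothetical sketch of indirection: proving a[b[i]] in bounds requires knowledge about the value range of the elements of b, not only about i, which is why such checks are hard to eliminate statically.

```java
public class Indirection {
    // Two implicit bounds checks are needed here: one for b[i] and one for
    // a[b[i]]. The second can only be removed if the analysis can bound the
    // *values* stored in b, e.g. 0 <= b[k] < a.length for all k.
    static int gather(int[] a, int[] b, int i) {
        return a[b[i]];
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30};
        int[] b = {2, 0, 1};  // a permutation of valid indices into a
        System.out.println(gather(a, b, 0)); // prints 30
    }
}
```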

\subsubsection{ABCs in Loops}
ABCs that are placed in loops deserve special attention, because here the biggest runtime overheads of all possible ABC types arise. If unnecessary ABCs are placed in loops they are executed repeatedly and inflate execution time considerably. Sometimes local and global ABCs found in loops can be eliminated or simplified, and sometimes ABCs can even be moved out of the loop using code motion techniques, which allows dramatic optimizations. One problem that arises is that some optimizations change the program behavior; especially the exception throw conventions of the Java language can introduce a big additional effort. All this must be taken into account. ABC loop optimization is so important and powerful that some techniques reach good improvements only by analyzing some common loop structures/patterns. For example,
\cite{ArrayBoundsCheckEliminationInTheContextOfDeoptimization:2009} describes
the mechanisms implemented in the HotSpot Java Virtual Machine, which are based on deoptimization.
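The hoisting idea can be sketched as follows (a simplified hypothetical example; a real implementation must additionally preserve the exact exception point demanded by the Java language, which is what makes production techniques harder):

```java
public class LoopHoisting {
    // Inside the loop every a[i] access carries an implicit bounds check,
    // repeated on every iteration.
    static long sum(int[] a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i]; // checked each iteration
        return s;
    }

    // Hoisted form: one comparison outside the loop proves all accesses
    // safe, so the loop body could be emitted without per-iteration checks.
    static long sumHoisted(int[] a, int n) {
        if (n > a.length)
            throw new ArrayIndexOutOfBoundsException(a.length);
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i]; // provably in bounds: 0 <= i < n <= a.length
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumHoisted(a, 4)); // prints 10
    }
}
```

Note that the hoisted check throws before any iteration runs, whereas the original throws only when the offending iteration is reached; bridging this semantic gap is precisely where deoptimization-based approaches come in.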

%% Historic review of ABCs ----------------------------------------------------

\section{A Historic Review of ABCs and Related Optimization}
The history of ABC optimizations will be presented in chronological order to underline the whole evolution. To give a quick insight into the different documents, some kind of classification will always be provided that keeps as far as possible to the parameters explained in the previous chapter, so that it should be possible to quickly understand the rough idea described.

\subsection{The Early Beginning}
The first approaches for dealing with ABC optimization all operate on a high level code representation (no additional intermediate representation necessary) and use data flow graphs to represent program parts in an abstract manner. The real ideas of ABCs and optimizations in this direction cannot be explicitly attributed to one person.\par
In the author's opinion William H. Harrison was the first who directly covered the problem. In May 1977 he described in
\cite{CompilerAnalysisOfTheValueRangesForVariables:1977} the more general problem of value range analysis of variables. Array bounds checks are a subproblem of this, because array indices are only numerical variables that must keep to value ranges (0..arraylength). In his paper he introduced an algorithm called Range Propagation that obtains variable value ranges, starting with [-Inf:+Inf], by analyzing the related programming structures for each variable ([0..100] for example). Another algorithm, Range Analysis, even covers the problem of induction variables and therefore mainly concerns loops. The algorithm builds linear equations through derivations, which are then solved based on a data flow analysis. He also introduced the ideas of loop counting, general program diagnostics and program termination analysis that can identify several mistakes and malfunctions for the programmer. This can be seen as the real origin of the idea of how to insert and optimize ABCs effectively. His algorithms are inserted into the conventional compilation process, and he even considered the overall execution time of compilation in his algorithms.\par
\r
In 1977, at nearly the same time, Norihisa Suzuki and Kiyoshi Ishihata composed
\cite{ImplementationOfAnArrayBoundChecker:1977}, where they explicitly insert
ABCs into high level code in the form of assert statements in front of
array accesses to provide comfortable error information. This is done by
deducing formulas from the program and using a theorem prover to decide whether an
ABC (assertion) is necessary for an access instruction or not. Because only static
analysis is done, no really powerful optimizations can be applied. The
intraprocedural form especially limits the general success, but most local and even some
global ABCs are tackled. All things considered, this paper is the first really
modern approach to theorem proving of ABCs and definitely a good
starting point for studying the topic in depth.\par
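The assertion-insertion idea can be illustrated as follows. This is only a sketch of the concept, with hypothetical names; the original work inserted assertions into high level source and then tried to discharge them with a theorem prover.

```python
# Illustrative sketch of the Suzuki/Ishihata idea: an assertion is inserted
# in front of every array access; a prover then tries to show each assertion
# always holds, in which case it can be removed statically.
def sum_first(a, n):
    total = 0
    for i in range(n):
        # inserted bound-check assertion; provable here whenever the
        # caller guarantees n <= len(a), so a prover could drop it
        assert 0 <= i < len(a), "array index out of range"
        total += a[i]
    return total

print(sum_first([1, 2, 3, 4], 3))  # -> 6
```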
\r
In 1979 E. Morel and C. Renvoise published
\cite{GlobalOptimizationBySuppressionOfPartialRedundancies:1979}, which
introduced a formal treatment of partial redundancies, also necessary for more
elaborate analysis of global ABCs. For really
understanding partially redundant ABCs this paper is a must.\par
\r
In 1982 Victoria Markstein, John Cocke and Peter Markstein presented a new
approach for optimizing ABCs in
\cite{OptimizationOfRangeChecking:1982}. For the first time the idea is
documented that code parts with high execution frequencies (loops) deserve
special attention (analysis effort). They introduced the use of subexpression
elimination and code motion for optimizing ABCs. The technique operates on the
code by deriving Strongly Connected Regions (SCRs) using USE and DEF
functions. These were also used previously by
\cite{CompilerAnalysisOfTheValueRangesForVariables:1977}, albeit with much more
complicated and detailed explanations. The technique is based on
data flow analysis and meant to be integrated into a multiphase compiler after
all previous optimizations have been accomplished. The same authors even obtained a
US patent in 1987 for their ABC optimization technique
\cite{OptimizationOfRangeChecking:1987}.\par
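The code-motion idea introduced here can be sketched by contrasting a per-iteration check with one hoisted in front of the loop. This is a conceptual illustration with hypothetical function names, not the Markstein/Cocke/Markstein algorithm itself.

```python
# Hedged sketch of check motion out of a loop: instead of checking the
# index on every iteration, a single check before the loop verifies the
# whole index range 0..n-1 at once.
def naive_sum(a, n):
    total = 0
    for i in range(n):
        if not (0 <= i < len(a)):   # check executed n times
            raise IndexError(i)
        total += a[i]
    return total

def hoisted_sum(a, n):
    if n > len(a):                  # one check covers all iterations
        raise IndexError(n - 1)
    total = 0
    for i in range(n):
        total += a[i]               # body runs check-free
    return total

assert naive_sum([1, 2, 3], 3) == hoisted_sum([1, 2, 3], 3) == 6
```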
\r
In 1990 Rajiv Gupta published the results of his research on ABCs in
\cite{AFreshLookAtOptimizingArrayBoundChecking:1990}. He used a
so-called Minimal Control Flow Graph (MCFG) to implement some revolutionary
new approaches like elimination of redundant ABCs, combination of
partially redundant ABCs and propagation of ABCs out of loops (code motion). The
paper describes exactly how the MCFG is built from source code and also the
algorithms that perform elimination, propagation and combination of ABCs. This
gives very precise guidance for first experiments with ABC optimization
algorithms and abstract representations.
\r
In 1992 Jonathan M. Asuru presented in
\cite{OptimizationOfArraySubscriptRangeChecks:1992} an approach that started dealing
more precisely with global ABCs, which allow the more important
optimizations. He claimed that previous techniques were not able to deal with
elaborate loops (nested loops, nonconstant induction variables\ldots), and he
was right. He described two new mechanisms that could handle much
more complicated program constructs than all others before.
\begin{description}
\item[Inner-Loop Guard Elimination (ILGE)] deals generally with the conventionally
known local and global ABC types.
\item[Conservative Expression Substitution (CES)] deals with nested loops and
a partial order for related range checks. Also important within CES is Local Range
Check Optimization, where the greatest lower bound and least upper bound are
used to eliminate range checks that are redundant or partially redundant.
Furthermore, Loop Relative Global Range Check Optimization builds up a system
of structures to improve checking in nested loops, and finally Range Check
Propagation uses range check motion to move ABCs to more efficient program
locations.
\end{description} \par
\r
In 1993 Rajiv Gupta again published his new insights into the topic in
\cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}. Many refinements to
his previous ideas were added, and the knowledge about the general problem of ABC
optimization increased. He assumed that readers are familiar with his previous work
\cite{AFreshLookAtOptimizingArrayBoundChecking:1990}.
\begin{description}
\item[Local Optimization] deals with Identical Checks, Subsumed Checks with
Identical Bounds and Subsumed Checks with Identical Subscript Expressions.
\item[Global Optimizations] are implemented using first an algorithm that
modifies checks to create redundant checks and secondly an algorithm that then
eliminates the redundant checks. This important idea of first inserting new
checks and then eliminating them is brilliant and will be used by further
techniques in the future. It allows the identification and resolution of difficult
cases of partial redundancy.
\item[Propagation of Checks Out of Loops] is done using a refined algorithm.
\end{description} \par
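The local part, elimination of identical checks, can be sketched very simply. The list representation of checks below is an assumption made for this illustration, not Gupta's data structure.

```python
# Minimal sketch of local elimination of identical checks: a later check is
# redundant if exactly the same check was already performed earlier on the
# same path. Representation is hypothetical.
def eliminate_identical(checks):
    """checks: (index_expr, bound_expr) tuples in execution order."""
    kept, seen = [], set()
    for chk in checks:
        if chk in seen:      # identical check already performed -> drop
            continue
        seen.add(chk)
        kept.append(chk)
    return kept

checks = [("i", "a.length"), ("i", "a.length"), ("j", "a.length")]
print(eliminate_identical(checks))  # -> [('i', 'a.length'), ('j', 'a.length')]
```

Subsumed checks (for example a check on `i+1` making a later check on `i` unnecessary) would need symbolic comparison of the subscript expressions rather than plain equality.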
\r
\subsection{The Chaotic Middle Ages}
Until 1995 (roughly the first two decades) only a very small group of visionaries
researched ABCs and related optimization techniques. For performance and
technological reasons all these approaches operated directly on high level
program code, like J. Welsh for Pascal
\cite{EconomicRangeChecksInPascal:1978} and some others for C and Fortran.
All things considered, all previous approaches were based on the same general
idea of building flow graphs and applying data flow analysis. But from 1995 on
many scientists started researching ABC optimization, and therefore a chaotic
amount of new techniques and ideas appeared. Since then most approaches have been
based on SSA program intermediate representations and algorithms suitable for
processing optimizations on them, but some exotic and completely new ideas
were invented and researched as well. \par
\r
In 1995 Priyadarshan Kolte and Michael Wolfe published
\cite{EliminationOfRedundantArraySubscriptRangeChecks:1995}, where they explain
their ABC optimization techniques in the Nascent Fortran compiler. They use an
SSA intermediate form for induction analysis and furthermore a Check
Implication Graph (CIG) for global ABCs. They tried different approaches but also
mentioned that a combination of them would deliver the best results. Some
new and more detailed information about Partial Redundancy Elimination (PRE) is
also provided that directly covers the optimization of partially redundant ABCs.
\begin{description}
\item[No-Insertion (NI)] means optimization with check elimination without adding
any new, simpler checks.
\item[Safe-Earliest (SE)] achieves slightly more eliminations than LNI and NI,
but not many, and is the first of two PRE placement variants.
\item[Latest-Not-Isolated (LNI)] is only a second PRE placement variant.
\item[Check Strengthening (CS)] is already described in
\cite{OptimizingArrayBoundChecksUsingFlowAnalysis:1993}.
\item[Preheader-Insertion (LI)] with only loop invariant checks.
\item[Preheader-Insertion with Loop Limit Substitution (LLS)] is the clear winner if
compile time and the number of eliminations are taken into account.
\end{description} \par
In the end this paper proves that optimized check insertion techniques are much
better than elimination-only techniques. \par
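The winning LLS variant can be illustrated as follows: the loop limit is substituted for the induction variable, so one preheader check replaces n per-iteration checks. The counting functions below are hypothetical and only demonstrate the effect.

```python
# Hedged sketch of Preheader-Insertion with Loop Limit Substitution (LLS):
# for a loop running i = 0..n-1, the per-iteration check on i is replaced
# by a single preheader check with the loop limit n-1 substituted for i.
def loop_with_checks(a, n):
    checks_executed = 0
    for i in range(n):
        checks_executed += 1          # per-iteration bound check
        assert 0 <= i < len(a)
    return checks_executed

def loop_with_lls(a, n):
    checks_executed = 1               # one check in the preheader
    assert n == 0 or 0 <= n - 1 < len(a)
    for i in range(n):
        pass                          # body runs check-free
    return checks_executed

print(loop_with_checks([0] * 10, 10))  # -> 10 checks
print(loop_with_lls([0] * 10, 10))     # -> 1 check
```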
\r
In 1998 Hongwei Xi and Frank Pfenning dealt in
\cite{EliminatingArrayBoundCheckingThroughDependentTypes:1998} with ABC
optimization in ML. So this time ABC optimization also became very interesting
for other programming paradigms like functional programming.\par
S. P. Midkiff, J. E. Moreira and M. Snir also published a new work
concerning this topic.
\cite{OptimizingArrayReferenceCheckingInJavaPrograms:1998} is a
very extensive paper that directly covers the problem of ABC optimization for
the Java language. At the beginning a very formal theory about the topic is
presented, and then all optimization techniques are explained in detail, always
with good examples in high level Java source code. This work
gives a really comprehensive insight into the topic in relation to the Java
language. \par
\r
\cite{TowardsArrayBoundCheckEliminationInJavaTMVirtualMachineLanguage:1999}
from Xi and Xia introduced a new idea. They did not agree with the standard
Java Virtual Machine Language JVML (Java byte code) and wanted to cope with
ABCs through a new system of dependent types, to improve the safeness of the
language and at the same time allow successful optimizations. So they
designed a new intermediate code representation called JVMLa that should
be created by the Java compiler and that should only contain ABCs that can't be
proven unnecessary at compile time. This approach is questionable, because the
Java compiler output (JVML) is standardized and widespread, and a very
complicated ABC optimization part (at compile time) is assumed to already
function properly in the Java compiler. This is very impractical, but a theoretical
idea that may be taken into account by other approaches or languages in the
future. \par
\r
In 2000 the previously mentioned US patent
\cite{ProcessorWithAcceleratedArrayAcessBoundsChecking:2000} was registered, and
Bodik, Gupta and Sarkar presented a new algorithm for optimized ABC execution in
\cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000}. This algorithm is an
exotic one, because it uses a transformation process at run time that converts
Java byte code to a slightly extended SSA form (eSSA), normally a
very expensive procedure, and applies a constraint analyzer and solver on it that
really achieves a performance gain. The constraint analyzer adds
constraints to the eSSA instructions and builds up an inequality graph (IG)
on which only a shortest path finder (constraint solver) is applied to optimize
ABCs. This is very impressive, because the whole approach is very costly but
really pays off, as depicted in the results. Array Bounds Check elimination on
Demand (ABCD) was really able to improve the execution time of some applications in
the used benchmark. \par
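The core of ABCD, proving a check redundant via path weights in the inequality graph, can be sketched in miniature. This toy version explores paths recursively over a hand-built graph; the representation and names are assumptions of this example, not the paper's implementation.

```python
# Loose sketch of the ABCD idea: difference constraints (x - y <= c) become
# weighted edges y -> x, and an upper-bound check "i < a.length" is provable
# if the minimal total weight of a path from a.length to i is <= -1.
def best_weight(graph, src, dst, visited=frozenset()):
    """Minimal path weight from src to dst in a difference-constraint graph."""
    if src == dst:
        return 0
    best = float("inf")
    for nxt, w in graph.get(src, []):
        if nxt not in visited:
            best = min(best, w + best_weight(graph, nxt, dst, visited | {src}))
    return best

# Constraints for: "j = a.length - 1; i <= j"
graph = {
    "a.length": [("j", -1)],  # j - a.length <= -1
    "j": [("i", 0)],          # i - j <= 0
}
# the check "i < a.length" is redundant iff the path weight is <= -1
print(best_weight(graph, "a.length", "i") <= -1)  # -> True
```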
\r
In 2001, one year after the last US patent concerning hardware implementation of
ABCs, \cite{ApparatusAndMethodForArrayBoundsCheckingWithAShadowFile:2001} was
registered as a new idea. Also Chin, Khoo and Xu published in
\cite{DerivingPreconditionsForArrayBoundCheckElimination:2001} some further
ideas on how to derive preconditions for ABCs. This very formal approach and
documentation gives some more detailed insights into the problem of program
code analysis for optimizing ABCs. \par
\r
In 2002 Bentley, Watterson and Lowenthal compared ABC execution on
superscalar and VLIW hardware architectures in
\cite{AComparisonOfArrayBoundsCheckingOnSuperscalarAndVLIWArchitectures:2002},
which gives good explanations concerning the implementation of ABCs using processor
compare instructions. \par
\r
Qian, Hendren and Verbrugge presented three algorithms,
implemented using the previously mentioned Soot framework and IBM's HPCJ, in
\cite{AComprehensiveApproachToArrayBoundsCheckEliminationForJava:2002}. The
first algorithm, Variable Constraint Analysis (VCA), works on a simplified
directed weighted graph that reflects the relations and behavior of the relevant
variables (array access indices). Array Field Analysis (AFA) and Rectangular
Array Analysis deal with special forms of array structures that are hard to
analyze but can in some circumstances really allow good optimizations. \par
\r
Gurd, Freeman et al. presented a further paper
\cite{EliminationOfJavaArrayBoundsChecksInThePresenceOfIndirection:2002} that
exactly captures the problem of ABCs in the presence of indirection, which
simply means that array access indices are stored in another array.
Optimizations in this scope are not easy and require a new technique.
The solution is not embedded in the Java compiler or VM itself
but implemented directly as Java classes that must be used by programmers.
Three classes with different approaches are presented, but only one of them
seems really usable.
\begin{description}
\item[ImmutableIntMultiarray:] The array is checked for invalid bounds and is
not allowed to be modified during index access operations!
\item[MutableImmutableStateIntMultiarray:] The array can be in two states; in
the first (mutable) bounds are checked, and in the second (immutable) no checks
are performed.
\item[ValueBoundedIntMultiarray:] This array forces all entries (indices) to be
within the defined range, so no exception is possible.
\end{description}
The idea and all the explanations and details about matrix
calculations and related ABCs are really good, but in general the practical
benefit is small, because this approach makes the run-time performance of a
program depend on the style of classes used, which is exactly what
Java wants to abstract away from. \par
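The idea behind the last of the three classes can be sketched as follows. The class and method names here are hypothetical stand-ins for the paper's Java classes, reduced to the essential invariant.

```python
# Illustrative sketch of the ValueBoundedIntMultiarray idea: the wrapper
# enforces at write time that every stored value lies inside the index
# range of the target array, so later indirect accesses a[b[i]] need no
# bound check on the inner index.
class ValueBoundedIntArray:
    def __init__(self, size, bound):
        self._data = [0] * size
        self._bound = bound            # all entries must stay in [0, bound)

    def set(self, i, value):
        if not (0 <= value < self._bound):
            raise ValueError("value outside permitted index range")
        self._data[i] = value

    def get(self, i):
        return self._data[i]           # guaranteed to be in [0, bound)

target = [10, 20, 30]
idx = ValueBoundedIntArray(2, len(target))
idx.set(0, 2)
print(target[idx.get(0)])  # indirect access, provably in bounds -> 30
```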
\r
In 2002 Gupta, Pratt, Midkiff and Moreira registered the US patent
\cite{MethodForOptimizingArrayBoundsChecksInPrograms:2002}, where they describe a
very general method for optimizing ABCs in many facets. \par
\r
In 2003 Motohiro Kawahito published in
\cite{ArrayBoundsCheckEliminationUtilizingAPageProtectionMechanism:2003}
an interesting and very exotic idea for ABC elimination that exploits the
page protection mechanism of the hardware processor.
It is based on the idea of placing an array at the end or beginning of a page and
detecting whether an access leaves the valid page. If the page protection raises an
exception, it can be concluded that the array bounds were violated. \par
\r
In 2005 Nguyen and Irigoin presented with
\cite{EfficientAndEffectiveArrayBoundChecking:2005}
a document that implements some ABC optimizations in a FORTRAN compiler (PIPS).
The general ideas are well known, but the detailed
and formal descriptions give further insights into the details of the
whole problem. The first idea is based on Gupta's approach: insert ABCs and
eliminate redundant ones; furthermore, insert only ABCs that can't be proven
unnecessary. The last important contribution is an algorithm for array size
checking and analysis that proves array access violations across different procedure
calls. So the technique allows interprocedural ABC optimization, which is a real
challenge and rarely implemented anywhere. \par
\r
Another large document,
\cite{SymbolicBoundsAnalysisOfPointersArrayIndicesAndAccessedMemoryRegions:2005},
was published by Rugina and Rinard, where they describe a new framework that
deals with general problems like accessed memory regions but also with bounds
analysis of pointers and array indices. Due to its extensiveness, any
explanation of it is well beyond this document's scope. \par
\r
In 2007 two new directions were started that are explained more precisely in the
following subsection. \par
\r
In 2008 Popeea, Xu and Chin presented in
\cite{APracticalAndPreciseInferenceAndSpecializerForArrayBoundChecksElimination:2008}
theory and experiments that use a forward relational analysis to obtain
postconditions and a backward precondition derivation to infer precise
(path and context sensitive) and practical pre/postconditions for an inference
algorithm that reduces ABCs. Two phases, inference and specialization (which
inserts the necessary checks), are used to obtain the right pre/postconditions;
weak and strong derivation processes lead to the results. In the analysis phase
a Presburger arithmetic solver is used, which has high complexity, so the number
of formulas must be reduced as far as possible. Flexivariants of methods mean
that, depending on their usage, the conditions can be weaker or stronger, so
conditions of different strengths are calculated and stored for maximum efficiency
at minimum memory cost. The technique also deals with array indirections.
It works on a high level artificial language, IMP, that allows no loops but
recursion (fix-point analysis) and also has some other restrictions. The inference
algorithm must be run at compile time, and some cases were really expensive
(\(>\)60min)! So it is not suitable for run-time checks! The approach takes
interprocedural behavior into account, but without dynamic class loading, because
the analysis works bottom-up. The work is in general an extremely formal approach
that makes heavy use of formulas, unfamiliar symbols and hard to understand
mathematical definitions, so the contribution is at best of theoretical nature
and not really suitable in practice.
\r
\subsection{State-Of-The-Art Methodologies}
The previous path through history was very chaotic, but since 2007 two
real main directions have arisen that have the potential to be, in the end,
satisfactory for solving the problem of ABC optimization. The first approach is
already implemented in the HotSpot Java Virtual Machine and uses a fast
transformation from Java byte code to an SSA based intermediate representation
to apply low cost but extremely effective analysis techniques for well known code
patterns and code structures, improving ABC optimization quite well and precisely.
The second approach can be summarized under the name Multiphase Bound Check
Elimination (MBCE), where the Java compiler and the Java Virtual Machine process
separate parts of the ABC optimization process to get high quality and highly
efficient results. The benchmarks and measurements are also suitable to reveal
the real impact and efficiency of the presented and tested techniques. \par
\r
In 2007 Würthinger, Wimmer and Mössenböck published
\cite{ArrayBoundsCheckEliminationForTheJavaHotSpotTMClientCompiler:2007}, where
they present insights into the general functionality of the HotSpot Java VM and
the ABC optimization technique it uses. Their algorithm optimizes some well
known programming patterns in Java to eliminate redundant checks and to move
partially redundant checks out of loops, but relies on deoptimization for
check eliminations that cannot be proven. This means that if checks can't be
proven valid, the VM stops using the JIT-compiled code and goes back (via OSR)
to interpretation mode (deoptimization), which is slower but easily allows
keeping to the Java language conventions like throwing exceptions at the right
place\ldots{} But the results show no significant improvement, because ABCs
account for only a small part of execution time in the used benchmarks; a
maximum gain of 5\% was possible! The solution is clean, keeps to widely
accepted standards and optimizes ABCs where they occur very often. \par
\r
Another publication from 2007,
\cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007},
is the beginning of the most recently researched
technique of von Ronne, Psarris and Niedzielski. This document describes
verifiable annotations based on SafeTSA for optimizing ABCs by shifting slow
optimizations to a Java-to-SafeTSA compiler and letting the JIT compiler do fast
and still efficient optimizations based on the created annotations. This allows
the use of expensive optimization techniques for time critical virtual machines
(JIT compilers). The interesting problem of integer
overflow/underflow, which is not taken into account by many other techniques but
constitutes a security and safety hole, is also mentioned. Any implementation
should take this into consideration and react accordingly to over- and
underflows. The theory of the built constraints, the linear
equalities/inequalities and the prover are strongly related to
\cite{ABCDEliminatingArrayBoundsChecksOnDemand:2000} but provide
some improvements; they are also meant to be integrated only into low level code
(SafeTSA) as annotations and can therefore use much more expensive techniques
than a JIT compiler could afford. \par
\r
In 2008 Nguyen and Keryell published
\cite{EfficientIntraproceduralArrayBoundChecking:2008}, which describes a
further framework that is based on deriving expensive linear equalities at
compile time and integrating them as annotations into a special form of
intermediate representation such as SafeTSA. This approach is strongly
connected to the previous paper
\cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}
(same introduction). They implemented a verifiable annotation generator
based on Jikes RVM 2.2.0. It uses extended Fourier-Motzkin variable
elimination (FMVE) to derive proofs. The framework presented is called VBCE
(Verifiable Bound Check Elimination) and can be directly compared to VDF
(Verifiable Data Flow Analysis). The approach is intraprocedural and not able to
deal with interprocedural relations; it works on high level code at compile time
with annotations and is therefore not suitable for direct JIT optimization.
Loop invariants are not taken into account. \par
\r
Furthermore, in this year Gampe, von Ronne, Niedzielski et al. published two
extensions,
\cite{SafeBoundsCheckAnnotations:2008} and
\cite{SpeculativeImprovementsToVerfifiableBoundsCheckElimination:2008},
of their previous work
\cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}
that introduce speculation and newer annotation mechanisms into the
optimization technique. The papers improve the previous framework further.
It still works on high level code at compile time but now takes loop invariants
into account.
To understand the relationship of their publications on the
topic, these papers should be studied at least at a glance.
\r
In 2009 Würthinger, Wimmer and Mössenböck revealed a further update of their
technique in
\cite{ArrayBoundsCheckEliminationInTheContextOfDeoptimization:2009}, which
introduces further improvements and optimizations in the HotSpot Java VM.
The document is logically strongly related to
\cite{ArrayBoundsCheckEliminationForTheJavaHotSpotTMClientCompiler:2007}
and explains that code duplication (loop versioning) is avoided by using the
interpretation mode of the VM to retain exception semantics for PRCs.
It provides a good explanation of all parts, especially PRCs in loops and why loops
with additional exit conditions can't be optimized. Furthermore, conditions are
not managed in an equality/inequality graph but per instruction.
Interprocedural optimizations are not taken into account due to the dynamic class
loading mechanism of Java: loading new classes yields more information about
array accesses and can lead to deoptimization (OSR at a safepoint back to the
interpreter) if the new conditions can't prove the previous optimizations. But
several ABCs can be grouped if provable, which leads to further optimizations. All
things considered, this is a really suitable ABC optimization technique, clearly
embedded in the VM and very effective going by the test results. \par
\r
\cite{ArrayBoundsCheckEliminationForJavaBasedOnSparseRepresentation:2009} from
Yang, Huang and Yang is another publication from 2009 that introduces a new
algorithm, ABCE, that is based on HSSA form and can be used in a Java static
compiler. It builds an inequality graph and eliminates FRCs (fully redundant
checks) and PRCs (partially redundant checks) on it in two phases. The work is
based on the Openjc Java compiler, which is restricted to very few platforms and
not widespread. This compiler uses a five phase generation style where, in the
fourth phase, the optimization is done on code that was transformed to HSSA
form and back, which costs a lot of time. The quality and embedding of
the algorithm into the compiler are suitable, but it is not usable for run time
optimizations and therefore too restricted. Furthermore, the article only uses
the percentage of eliminated ABCs as benchmark and a very opaque
performance score graphic! \par
\r
In the same year Gampe, von Ronne, Niedzielski et al. published another
refinement,
\cite{AVerifiableControlFlowAwareConstraintAnalyzerForBoundsCheckElimination:2009},
of their previous work
\cite{VerifiableRangeAnalysisAnnotationsForArrayBoundsCheckElimination:2007}
that further improves their ABC optimization technique. In the document a new,
more modern technique is presented that is also based on the eSSA form of the ABCD
algorithm and claimed to be 10\% more efficient than on SafeTSA. Then CAS
(constraint analysis system), a verifiable symbolic program constraint analyzer,
is presented. Like the previous ones, the approach is also based on multiple
phases and annotations. \par
\r
The last known publication on ABC optimization, from 2010, is
\cite{SafeMultiphaseBondsCheckEliminationInJava:2010}, again from Gampe, von
Ronne, Niedzielski et al. They refined the details of their
multiphase ABC optimization technique based on SafeTSA with annotations, CAS
and elaborate speculation. The results, and also the additional information
provided about the topic, show that the solution is practically satisfying.
The most challenging problems
connected to ABCs are solved, and the solution is well embedded into a Java
compiler and Java VM. These are not widespread, but they solve the problem of
ABCs as far as possible while being both efficient and convenient.
\r
\subsection{Historical Conclusion}
The latest research in ABC optimization supports the assumption that only an
early and clean integration into the compilation and/or interpretation process,
together with suitable intermediate representations and multiple optimization
phases, leads to practically satisfying results that really pay off and tackle
all newly introduced problems. For Java, the lately improved approach of the
HotSpot Java VM and the multiphase bound check elimination (MBCE) promise
practical success, because all other previous techniques suffer from many
drawbacks that can't really be solved due to the overall environment of the
language architecture, hardware, future developments and other aspects of reality.
\r
901 %% ABCs in the Cacao Java VM --------------------------------------------------
\r
\section{ABCs in Cacao Java Virtual Machine}

\subsection{Introduction to Cacao Java Virtual Machine}
Work on the Cacao Java VM was started in 1996 by Reinhard Grafl and
Andreas Krall. They wanted to create a free, efficient and stable Java VM for
a wide variety of processor architectures and platforms. In the following years
many people joined the development team and implemented several extensions and
refinements, and more and more architectures were supported. The implementation
is mostly done in C, and some parts have already been refactored to C++. The VM
is partitioned into modules (*.h/*.hpp with *.c/*.cpp) and keeps to the relevant
Java standards concerning behavior and general structure. For more detailed
information about the Cacao Java VM consult
\cite{HandbookForJavaCacaoVirtualMachine:1996} and
\cite{CacaoA64BitJavaVMJustInTimeCompiler:1997}.
\r
\subsection{Array Bounds Checking in Cacao Java Virtual Machine}
In this chapter we describe in general all necessary aspects concerning ABCs.
The Cacao Java VM uses integer compare instructions of the processor to implement
ABCs. They are generated, together with all other code parts, during JIT compilation
by the code generators from a special internal intermediate representation whose form
is strongly related to quadruples and derived from the Java byte code provided by
Java class files. ABCs are positioned in front of array accesses.
JIT compilation is done locally on every method invocation and therefore only
allows intraprocedural optimizations, so it is impossible to perform
interprocedural ABC optimization, because all called and related methods would
have to be analyzed. Some experiments with example applications showed that
execution with ABCs turned on leads to significant execution time overhead. For
some applications this overhead is enormous, so considering all properties of the
Cacao Java VM, its environment and implementation, an ABC optimization could
drastically reduce execution time for some types of applications. In the past this
motivated the creation of an ABC optimizer in the Cacao Java VM.
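A bounds check built from compare instructions can often be reduced to a single unsigned comparison, because a negative index reinterpreted as unsigned becomes a huge value. The sketch below illustrates that equivalence only; it is not Cacao's actual code generator output.

```python
# Illustrative sketch (not Cacao's code generator): the two signed tests
# "index >= 0" and "index < length" are equivalent to one unsigned compare.
def in_bounds_two_compares(index, length):
    return index >= 0 and index < length

def in_bounds_unsigned(index, length, bits=32):
    # reinterpret the signed index as an unsigned machine word
    return (index & ((1 << bits) - 1)) < length

for idx in (-1, 0, 4, 5):
    assert in_bounds_two_compares(idx, 5) == in_bounds_unsigned(idx, 5)
print("equivalent for all tested indices")
```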
\r
\subsection{Array Bounds Check Optimization in Cacao Java Virtual Machine}
In 1998 Christopher Krügel added an ABC optimizer for loops to the Cacao Java VM.
This led to improvements in the execution time of array intensive applications.
For the reasons explained above, the optimization algorithm design was limited by
many constraints, so the author concentrated only on the most important topics.
A useful generic introduction to the optimizer can be obtained from
\cite{HandbookForJavaCacaoVirtualMachine:1996}, chapter 5.7 Loop Optimization.
\r
\subsubsection{Power and Properties of the ABC Optimizer}
The intermediate representation (IR) used within the Cacao Java VM is not really
suitable for complicated high performance optimizations, as for example SSA based
IRs are. This has historical reasons: these representation formats became popular
only in the past decade and also have disadvantages, so their usage must be
considered carefully during the whole design process of an abstract machine or
compiler. For this reason the algorithm design was in the first instance limited
by the complexity of analyzing and transforming this quadruple related IR; the
time consumed by the optimizer must be kept in mind.
Secondly, the JIT compiler architecture limited the optimization scope to one
local method. This further constrains the possibilities, because no global ABCs
can be removed. Due to the internal architecture of the Cacao Java VM
the structures and algorithms grew very complicated, so the developer focused
only on the most important facts concerning ABC optimization. The author
concentrated on optimization within loops, and furthermore restricted his
algorithm to loops whose induction variables have constant growth factors (up or
down). The algorithm designer presumably relied on the argument that many
empirical tests in the past showed that these kinds of loops introduce most of
the execution time overhead, because they are very commonly used by programmers
in many programming languages. The results he presented proved the concept to
be generally suitable. So he did not analyze detailed theoretical differences
between redundant/partially redundant or necessary/local/global ABCs; he simply
concentrated on the ABCs that seem to cause the most overhead in loops within
normal applications.
\r
\subsubsection{Generic Design Issues of the ABC Optimizer}
This chapter provides a generic overview of the design of the ABC optimizer
and the added improvements. A detailed explanation is, due to the amount of information,
not possible within the document's scope. For a further functional explanation please consult
\cite{HandbookForJavaCacaoVirtualMachine:1996},
and for the implementation details consult the comments in the Cacao Java VM source code directly.
\r
The general idea of the algorithm is to analyze a method for loops during its JIT compilation.
This is accomplished by a depth-first search over the code blocks (basic blocks), then
calculating the dominators of the blocks using the Lengauer-Tarjan algorithm
\cite{AFastAlgorithmForFindingDominatorsInAFlowGraph:1979}
and detecting all loops by searching for back edges
\cite{ModernCompilerImplementationInC:1998}. These are very time-consuming tasks, because all
nodes (basic blocks) of the method must be iterated several times on every method compilation.
Afterwards identical loops are merged, nested loops are resolved, and all loops are sorted topologically.
The loops are then optimized in sorted order by analyzing their loop headers and the variables/constants
involved in array access instructions. If properties are found that allow optimization, the
loop and its header are adapted to achieve optimal execution behavior. If nested loops exist, their
combined behavior must be considered.
\r
The loop header must be adapted in such a way that the new dynamic and static constraints introduced by code motion
out of the loop are inserted there. In general only those loops can be optimized whose array index variables
are constant (full check removal), only incremented (lower-bound check) or only decremented (upper-bound check) by constant values.
Loop headers containing \emph{or} conditions cannot be optimized by the algorithm, nor can cases
where array index variables are modified within exception handling (catch) blocks.
The original loop is replaced (in the memory layout) by the optimized one, with a conditional
jump back to the original if some dynamic constraint does not hold.
Also important for the optimization design is that the original version of the loop is
kept in memory in parallel to the optimized one, to allow correct execution with ABCs in case
some dynamic constraints do not hold or cannot be proven. This introduces considerable time
and space overhead during JIT compilation because of the copy procedure. Correctly patching
the links between code blocks and exact exception handling add even more complexity to the
solution. Thus every optimized loop exists twice in memory.
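This loop-versioning idea can be sketched as follows. The sketch is a simplified assumption of the scheme described above, not Cacao's actual generated code; the hoisted header test stands in for the dynamic constraints, and the error return stands in for the Java exception:

```c
/* Loop versioning: the dynamic constraints are tested once in the new
   loop header; if they hold, the check-free copy of the loop runs,
   otherwise the original, fully checked copy is executed. */
int sum_range(const int *a, int len, int lo, int hi) {
    int sum = 0;
    if (lo >= 0 && hi <= len) {
        /* optimized copy: bounds proven by the hoisted header test */
        for (int i = lo; i < hi; i++)
            sum += a[i];
    } else {
        /* original copy: every access keeps its bounds check */
        for (int i = lo; i < hi; i++) {
            if (i < 0 || i >= len)
                return -1;  /* stands in for ArrayIndexOutOfBoundsException */
            sum += a[i];
        }
    }
    return sum;
}
```

The price of this scheme is visible here as well: both loop bodies exist side by side, which is exactly the duplication overhead mentioned above.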
\r
\subsubsection{Implementation of the ABC Optimizer}

This chapter gives an abstract but fundamental introduction
to the structure of the ABC optimization algorithm. First
the most important data structures are explained, to give an idea of what kind
of data is necessary for the process (temporarily and in the end). Then the
algorithm is presented as abstract pseudocode that captures
all important functions and the overall behavior. The functions are
named to be self-explanatory, but for understanding the
ideas behind them a good starting point is the generic
textual description in
\cite{HandbookForJavaCacaoVirtualMachine:1996}, chapter 5.7 (Loop Optimization).
It provides useful introductory information before studying the real behavior.
\r
The most important data structures for understanding the ABC optimization algorithm are:

\begin{description}

\item[LoopData (ld)] contains all intermediate information and states
	necessary during the ABC optimization, such as detected loops,
	exception graphs and auxiliary information that improves
	algorithm performance.

\item[JITData (jd)] contains all information about the just-in-time
	compiled method that is currently optimized, for example the
	method's code blocks (basic blocks), the number of code blocks
	and the exception table.

\item[LoopContainer (lc)] holds all information about a specific
	loop, for example the contained code blocks.

\item[BasicBlock (bb)] is a container for one atomic code part of
	the method (containing no branch/jump instruction), which means
	a basic block is guaranteed to be executed in one exact sequence.

\end{description}
\r
The most important data structures are now identified, and the algorithm
can be described in pseudocode.\\
The precondition is that the method has been completely parsed and analyzed,
and that all information is correctly stored in the related
jitdata structure.
\r
%% Codesection -----------------------------------------------------------------

\begin{verbatim}
depthFirstSearchOnMethodBasicBlocks (jd)
exploreDominators (ld)
detectLoopsInMethod (jd, ld)
analyzeDoubleLoopHeaders (ld)
analyzeNestedLoops (jd, ld)
for all LoopContainers (lc) {
    optimizeSingleLoop (jd, ld, lc)
}
redirectMethodEntryBlock (jd, ld)

optimizeSingleLoop (jd, ld, lc) {
    analyzeLoopForArrayAccesses (jd, ld, lc)
    if ( analyzeLoopForOrAndExceptions (jd, ld, lc) == true ) {
        initializeLoopHeaderConstraints (jd, ld, lc)
        removeHeaderBoundChecks (jd, ld, lc)
        for all BasicBlocks (bb) {
            removeBoundChecks (jd, ld, lc)
        }
        insertChecksIntoLoopHeader (jd, ld)
    }
}

initializeLoopHeaderConstraints (jd, ld, lc) {
    if ( last instruction in header is comparison against constant ) {
        obtainLeftArgumentDetails (ld, lc)
    } else if ( last instruction in header is standard comparison ) {
        obtainLeftArgumentDetails (ld, lc)
        obtainRightArgumentDetails (ld, lc)
    } else {
        return  // unknown constraints for these conditions
    }
    if ( both sides contain changing or both have no index variables ) {
        return  // no constraints can be proven valid
    }
    // force index variable to be on left side
    ForceRightSideToBeConstant (ld, lc)
    obtainDynamicConstraints (ld, lc)
}

removeBoundChecks (jd, ld, lc) {
    mergeConstraintsWithNestedLoops (ld, lc)
    for all Instructions of BasicBlock (bb) {
        if ( instruction is accessing an array ) {
            obtainIndexVariableDetails (ld, bb)
            calculateBoundConstraints (jd, ld, bb)
            createLoopHeaderCodeMotionChecks (jd, ld, bb)
        }
    }
    for all successors of BasicBlock (bb) {
        removeBoundChecks (jd, ld, bb)
    }
}

insertChecksIntoLoopHeader (jd, ld) {
    copyCompleteLoop (jd, ld)
    createBasicBlockForCodeMotionChecks (jd, ld)
    insertGotoInstructionsForHandlingNewLoopHeaderBasicBlock (jd, ld)
    adaptExceptionHandlingRegions (jd, ld)
    informParentLoopsAboutNewHeader (jd, ld)
    for all LoopHeaderCodeMotionChecks {
        createInstructionsForConstraintChecks (jd, ld)
        insertInstructionsIntoNewLoopHeader (jd, ld)
    }
    updateJumpTargetsOfLoops (jd, ld)
    updateExceptionHandlingForNewAndDuplicatedLoop (jd, ld)
}
\end{verbatim}

%% Codesection end -------------------------------------------------------------
\r
The postcondition is that the method's jitdata structure holds correct, executable
and optimized loop code blocks that show no semantic differences to the original form.\newline
Keep in mind that this pseudocode representation is, due to the document's scope,
very generic; for more detailed information it is best to consult the
source code and its comments directly, because it is hard to go into further detail
while preserving the overview.
\r
\subsubsection{Results and Performance of the ABC Optimizer}
This chapter presents results that were taken from an application that sieves prime numbers.

\end{multicols}

\begin{table}[!ht]
\caption{Benchmark Results of ABCs in Cacao}
\label{BenchResABCsInCacao}
\centering
\begin{tabular}{lccc}
\hline
Configuration&Light&Normal&Hard\\
\hline
Normal&5,563&55,195&551,355\\
without ABCs -cb&5,562&55,192&551,414\\
with -oloop&5,560&55,182&551,350\\
\hline
\end{tabular}
\par\medskip\footnotesize
The measurements are selected to rise by a factor of 10 at every level, starting from 50000/10000.
\end{table}
\r
Given this data with polynomial growth, a fixed percentage of speedup should be
observable, but it is easy to see that there are no differences between the configurations.
This encourages the assumption that ABCs are always created within the Cacao Java VM, whatever
arguments are passed. The proof of this assumption will be carried out using the assembler
code output of the compiled application.
\r
\begin{multicols}{2}
\r
\subsubsection{Possible Improvements for the ABC Optimizer}
Due to the constraints of the virtual machine architecture the algorithm is a very conservative approach
and contains no really complicated mechanisms like speculation/probability calculations or exhaustive
stack exploration. The implementation is done in C (structured programming technique), so
it is not very likely that major performance improvements can be found in the implementation itself, but as
always some possibilities remain that could lead to better results. A very expensive
part of the ABC optimization algorithm is the loop detection required at the beginning.
This is accomplished using the Lengauer-Tarjan algorithm presented in
\cite{AFastAlgorithmForFindingDominatorsInAFlowGraph:1979}.
This algorithm has good properties concerning asymptotic complexity and is therefore
the preferred solution for very big graphs (\(>\) 30.000 nodes);
it is also presented, for example, in
\cite{ModernCompilerImplementationInC:1998}.
In practice, however, methods and nested loops are in general much smaller and have only
tens or hundreds of nodes in their control flow graphs. Cooper, Harvey and Kennedy showed in
\cite{ASImpleFastDominanceAlgorithm:2001}
that another algorithm with higher asymptotic complexity than Lengauer-Tarjan performs much
better on smaller graphs, because the complexity advantage only matters for the very big (asymptotic) graphs
that normally do not occur in most programs. The authors claim that for
conventional method and loop graphs their algorithm is 2.5 times faster than the Lengauer-Tarjan algorithm. If the
experiments are valid, there is a possibility to improve the whole ABC optimization algorithm by
using the simpler algorithm of Cooper, Harvey and Kennedy. The algorithm, and especially its proof,
is also much simpler to understand and therefore to implement. \par
\r
One fact must be kept in mind: to really allow drastic improvements in algorithm performance,
the complete Cacao Java Virtual Machine architecture would have to be reconsidered, and this demanding
task can only be done iteratively over a long period of time.
\r
%% Conclusion -----------------------------------------------------------------

\section{Conclusion}
This document presented general information about the topic of ABCs and also covered
historical aspects that show their evolution. The main part dealt with ABC optimization in the
Cacao Java VM. All things considered, successful array bounds check optimization is a complicated and
challenging task with many facets and relations. A lot of details must be considered in the design as
well as in the implementation phase. But it is definitely a problem that must be solved satisfactorily for all
future developments in computer languages and informatics in general, because it is simply not acceptable
to waste computing time on superfluous calculations. The ABC optimization within loops in
the Cacao Java Virtual Machine is very conservative, due to its environment and also for historical reasons.
We are therefore still far away from a perfect solution, but the algorithm proved useful for some
programs, and this is the first step towards a practical, really satisfying result.
\r
%% Bibliography ---------------------------------------------------------------

\bibliographystyle{alpha}
\bibliography{bibfile}

%% End ------------------------------------------------------------------------
\r