We need to switch to a new register allocator. The current one is split into a global and a local register allocator. The global one can assign only callee-saved registers and runs on the tree-based internal representation: it assigns local variables to hardware registers. The local one runs on the linear representation on a per-basic-block basis and assigns hardware registers to virtual registers (which hold temporary values during expression evaluation); it also deals with the platform-specific issues (fixed registers, calling conventions). Moving to a different register allocator will help solve some of the performance issues introduced by this split, make the allocator easier to port and solve some of the issues caused by working on trees.

The general design ideas are below. The new allocator should have a global view of the whole method, so that it can also assign variables to some of the volatile (caller-saved) registers when possible, even across basic blocks (this would improve performance). The allocator would be driven by per-arch declarative data, so porting should be easier: an architecture needs to specify register classes, the calling convention and instruction requirements (similar to the gcc code). The allocator should operate on the linear representation, where register usage can be tracked more precisely and more cheaply. We need to assign virtual registers on a per-method basis instead of per basic block. We can assign virtual registers to variables, too. Note that since we fix the stack offsets of local vars only after this step (which happens after the burg rules are run), some of the burg rules that try to optimize the code won't apply anymore: the peephole code may need to be enhanced to do these optimizations instead. We need to handle floating point registers in the global allocator, too. The new allocator also needs to keep precise track of which registers contain object references or managed pointers, so that we can move to a precise GC.
It may be worth using a single increasing set of integers for the virtual registers, with the class of each register stored separately (unlike the current local allocator, which keeps integer and fp registers separate).

Since this is a large task, we need to do it in steps as much as possible. The first step is to run the register allocator _after_ the burg rules: this also requires a rewrite of the liveness code to use linear indexes instead of basic-block/tree number combinations. This can be done by:

*) allocating virtual regs to all the locals that can be register allocated
*) running the burg rules (some may require adjustments): the local virtual registers are assigned starting from global-virt-regs+1, instead of the current hardware-regs+1, so we can tell global and local virt regs apart
*) running the liveness (or whatever other) code is needed to allocate the global registers
*) allocating the rest of the local variables to stack slots
*) continuing with the current local allocator

This work could take 2-3 weeks.

The next step is to define the kind of declarative data an architecture needs, assign virtual regs to all the registers and make the allocator assign from the volatile registers, too. Note that some of the code that is currently emitted in the arch-specific code will need to be emitted as instructions that the reg allocator can inspect. Think of a method that returns its first argument, which is received in a register: the current code copies it to either a local slot or a global reg in the prolog and copies it back to the return register in the basic block, but since neither the reg allocator nor the peephole code knows about the prolog code, the first store cannot be optimized away. The gcc code has some examples of how to specify register classes in a declarative way.