## The future of compiler optimization

I came across this bit of speculation on the future of compiler optimization today, and I thought I’d add a few thoughts (as well as refer all one of my readers to it.)

Verification and Optimization Will Join Forces. If you read his description of this, it may sound rather surprising that no one has done this yet (to my knowledge).  I think the reason this hasn’t yet happened is that there have been no really suitable languages.  Getting this information to the optimizer isn’t trivial.

Agda (and friends, like Epigram) are dependently typed programming languages that are just appearing on the scene relatively recently.  Currently they’re still in their infancy, but the goal is to have a programming language that allows you to specify (and thus, force you to prove) as much or as little about your program as you like.  The ideal, I think, is to take something written very much like an ordinary Haskell program, and augment it bit by bit with more precise types.  (There’s still a long way to go!  Really, we’re just beginning to learn how to write dependently typed programs.)

These kinds of languages seem to me to be the best bet for trying to augment optimization with verification.  I envision a library of definitions of various predicates (and their associated theories) that the optimizer is aware of.  I think a particularly compelling demonstration could be made with just one predicate: Terminating.  There are a host of optimizations that cannot be done to Haskell programs because of $\bot$.

This has been today’s free research idea!

Optimizations Will Emit Proof Witnesses. I thought I was going to disagree with this, but I think it was just phrased poorly. A better title might be “Optimizers Will Be Proven Correct.” I mean, we could emit proof witnesses for generated code, but why? Just knowing the optimizer is correct seems like enough. (It doesn’t seem like it’d be needed for proof carrying code, nor does it seem like that’s what he’s getting at.)

I’d further add a link to Xavier Leroy on this stuff, who I got to see give a series of lectures this summer about CompCert, a formally verified optimizing C compiler, and formally verifying compilers in general.

A Few IRs Will Win. Agreed, but haven’t they already?  The interesting question is whether GCC will be a major player in this.  Currently, LLVM, JVM, and CLR are the only major IRs I’ve heard of.  GCC may have fallen too far behind on this to catch up, I don’t know.  (It’s almost like letting political considerations overrule good technical decisions is a bad thing.)

Jit vs static optimizations. I disagree ever so slightly and think there is a fundamental, if small, difference: profile-directed optimization will probably always be a bit cumbersome to use, while JITs can do it transparently.

And there’s one I’d like to add:  domain specific optimizers will splash onto the stage.  I think when this starts to happen, we’ll see a major burst in progress.  But I’m not sure anyone can convincingly claim to have the story on how this will happen yet.  Domain specific languages, active libraries, staged or generative programming, or various other forms of meta-programming?  I’m going to go with language extension, personally, but it will surprise no one to know that’s what I work on! ;)

To give you a little idea of what I mean by this, imagine you are writing a program that involves intense, complicated matrix computation as one small part of the whole.  What are the best ways to write this, today?  Well, you can write it using Matlab or Mathematica (the DSL approach), but this is cumbersome to integrate with a larger program.  You could carefully write it all out using a matrix library in C or Fortan, but this requires you to manually choose appropriate functions from the library.  (If this doesn’t sound horrifying, please consult this table.)  There exist C++ template libraries, but this comes complete with all the fragility and incomprehensibility of C++ template meta-programming.

So as far as I know, there’s really no good way to do this, yet. We just have to suffer and get the job done in less than ideal ways. But, if we could simply write down what we mean at a higher level, and leave an optimizer to deal with things like selecting the correct function from the underlying library implementation, that would be a huge leap forward.

Maybe that’s another prediction: optimizers will be slightly less about making code faster, and a little more about letting us write better code that still has the same performance.  To some extent, we may see this already with Haskell.