From spoons+ at cs.cmu.edu Wed Aug 9 20:23:33 2006 From: spoons+ at cs.cmu.edu (Daniel Spoonhower) Date: Wed Aug 9 20:23:37 2006 Subject: [icfp-discuss] thanks; teams page on-line Message-ID: <44DA7C85.3060202@cs.cmu.edu> Greetings, contestants! First, thanks for the many kind words that you've sent to us and posted to the discussion mailing list. It is gratifying to know that our work has been (and continues to be) appreciated. Please feel free to continue to send feedback to the discussion list or to icfpcontest-organizers directly. Second, we've updated the contest web site to include a list of all scoring teams. http://icfpcontest.org/teams.shtml It currently includes team web pages and geographic locations. We plan to update the page with more information after the winners are announced. Finally, prize winners will be announced at the ICFP meeting in about six weeks. Our presentation is scheduled for Tuesday, September 19th. At that time, we will also release an updated Codex (with more puzzles as well as hidden goodies), the full details of how we put the contest together, and many of the tools and source code used to create the Codex. The 2006 ICFP Programming Contest Organizers From clive at xtra.co.nz Sat Aug 12 22:38:27 2006 From: clive at xtra.co.nz (Clive Gifford) Date: Sat Aug 12 22:38:40 2006 Subject: [icfp-discuss] The "Eigenratio" of Self-Interpreters References: <44DA7C85.3060202@cs.cmu.edu> Message-ID: <000c01c6be81$91f2b9a0$0301a8c0@Owhango1> Greetings I'd like to share a small "discovery" that I have made in the last few days. I'm not sure if I am the first to "discover" this particular result or not. Perhaps it is already well known in some circles, or "obvious" to others (in which case I apologise in advance), but it is new for me, and I haven't been able to find anything else on the Net that appears similar. Therefore any feedback you may be willing to provide on its originality or otherwise, and any references to descriptions of similar material elsewhere would be appreciated! My "discovery" concerns what I will refer to (for reasons that will become clear later) as the "eigenratio" of a self-interpreter (or "meta-circular interpreter" if you prefer that terminology). It came about from some experiments I did after the contest ended, triggered initially by the organisers' releasing their reference implementation of an interpreter for the Universal Machine, in the form of a binary file (available from http://icfpcontest.org/um.um). This interpreter is also a "self-interpreter", able to run itself as well as other UM programs. In some earlier postings (rather grandiosely entitled "Clive's Conjecture") I mentioned some interesting (to me) things that I had noticed when timing how long it took for a stack of multiple copies of um.um to interpret some other top level programme, with some other interpreter at the very bottom of this "stack". To recap briefly, you can construct a stack of N um.um interpreters with some other program on top of that and then run the whole thing with some other valid interpreter (your own C implementation for example). If we then look at the ratio between the time taken to run the stack with N+1 interpreters to the time taken to run the stack with just N interpreters, it appears that the ratio converges to a value of around 24.5 as N increases. I.e. As N increases, N+1 levels of the interpreter will take about 24.5 times as long as N levels to run the same top level program. It was also apparent that the limit value of this ratio didn't change when a different top level program was used (on the very top of the stack). Later I added a few lines of code to my C interpreter so it would report exactly how many times each instruction in the UM instruction set was being executed at the lowest level (at the level where things were "really" happening!). I collected those numbers with different numbers of um.um in the stack, then put those values into a spreadsheet where I could experiment with assign more or less arbitrary, but still fixed positive "costs" to each instruction. Once again, it seemed that no matter what costs I assigned to individual instructions, in the end the ratio of the total "cost" for N+1 to just N levels of um.um running some other top level program still converged to the same value of around 24.5! I decided to implement my own simple virtual machine and instruction set, and then to write a self-interpreter for it (in the same style as um.um) but with the specific goal of trying to achieve a lower ratio than the 24.5 (in the limiting case) that um.um produced. I did this with a very simple machine with (initially) only 6 instructions and finally got my own self-interpreter running. It showed the same kind of behaviour as um.um when run as a stack of multiple copies of itself, but with a ratio of about 15.2. After examining the code for this I added another specialised instruction to replace a commonly used sequence of three others, and the updated VM and self-interpreter achieved a ratio of about 11.5. (Still a long way to go to get anywhere near the "Golden Ratio" that I plucked out of thin air for my earlier "conjecture" but at least some progress was being made in the right direction!) Another subsequent smaller tweak to the instruction set reduced the ratio to about 11.2. At this stage I decided to try a different approach, and after spending some time with pen and paper found a tidy way to describe and directly calculate the number of times each instruction would be executed in order to run a stack of N self-interpreters plus some other program on top. The solution was to use matrices. The structure of my own various self-interpreters was simple. A very short section of code executed once at start-up, and then a loop was entered that on each iteration decoded the op code for the next instruction from the program being interpreted (next level up) before branching to some code to emulate that instruction (adjusting memory addresses etc., along the way). Therefore for any instruction being interpreted, I could follow the execution path through the interpreter and determine exactly which instructions it executed and also how many times. From this I built a matrix M with m rows and m columns (each row and column corresponding to one of the m instructions in the instruction set). Each column of the matrix corresponds to the interpretation of one instruction from the instruction set, and the values in that column (reading the rows from top to bottom) are exactly the number of times each instruction is executed in the current level of the interpreter in order to handle that task. I.e. the value in row i and column j is the number of times instruction i has to be executed during the main loop in order to handle one instance of instruction j in the code being interpreted. Next, we need to construct an m by 1 vector (I'll call it P) where each value is a count of times each instruction is executed during a full run of the top level program. (The order of the values in this vector must match the order used for the instructions in the rows and columns in M also. In my case the op codes were integers from 0 to m-1 and the same ordering was used in rows/columns of the matrices.) Thirdly, construct one more vector (S) with the same dimensions as P where the values are again the numbers of times the various instructions are executed, but this time covering any code in the start-up section of the interpreter (before it enters its main loop to handle instructions from the program being interpreted). Now (using juxtaposition to represent matrix multiplication), we can say: P is the number of times instruction is executed to run the top level program. MP + S is the number of times each instruction is executed by a self-interpreter running the top level program. M(MP + S) + S = MMP + MS + S is the number of times each instruction is executed by a stack of two self-interpreters running the top level program. And so on for higher levels... So far, so good. Now I was able to use the Numpy package for Python (which provides support for various matrix operations) and after fixing a couple of errors in my matrices was able to reproduce exactly the same figures for counts of instructions executed as I had previously achieved but only by having my own C interpreter track them. I was pretty happy at that point, having found a nice tidy way to correctly describe with matrices how to predict the number of times each instruction would be executed for a given stack of N copies of some self-interpreter plus some other program on top of that. But now I found myself wondering what characteristic of the matrix M (ignoring S because it is very small in comparison and therefore has very little effect on the big picture) produced the apparently fixed ratios observed as described earlier. Now it's been a long time since I've had any need to use matrix determinants and so on but I at least I did vaguely remember the names of a few things of that nature, if not the actual uses of them! So more or less on a whim I googled "eigenvalues", found an online calculator, copied and pasted the contents of my own 'M' matrix, and clicked "calculate". Imagine my surprise when the first listed eigenvalue matched the (11.20...) ratio I had observed - to some number of decimal digits!! To cut a long story slightly shorter, I have now refreshed my memory on eigenvalues and eigenvectors and have also found the description of the "Power iterator" algorithm (http://en.wikipedia.org/wiki/Power_iteration) which seems to provide at least a basis for understanding why repeated multiplication by the matrix M eventually generates results that are essentially identical to multiplying by the magnitude of the dominant eigenvalue of M instead. In summary, as we let N increase towards infinity, the ratio of total instructions executed to run N+1 interpreters (plus a some other fixed program as the very top level), to the total instructions executed when running just N interpreters (plus the same top-level program), is very close to the magnitude of the dominant eigenvector for the matrix M that describes the internals of the interpreter (as described earlier). The eigenvalue equation (see: http://en.wikipedia.org/wiki/Eigenvalues#Eigenvalue_equation) tells us that: M.(eigenvector k) = (eigenvalue k).(eigenvector k) So (disregarding the effect of S) we can see that as we pile on more and more interpreters onto the stack, the vector holding the current number of times each instruction has been executed converges towards the "dominant eigenvector" for the matrix M (which describes the heart of the interpreter) and the ratio converges to the magnitude of the corresponding "dominant eigenvalue". So far I've just been talking about counts of instructions executed. We can however easily include different costs for each different instruction in another 1 by m vector. The dot product of this and the final counts vector gives the total "cost". But when you calculate the ratio using a formula that includes the costs it turns out to be easy to see that the terms cancel and the final ratio is still the same as what got earlier using just the number of instructions executed. Where to from here? At one point I was hoping that when I understood the relationship between a particular matrix M and the corresponding ratio, that it might help me find the smallest possible ratio - and therefore (in some sense) describe the "most efficient" self-interpreter possible. However, clearly M could (at least in a purely hypothetical sense) be the identity matrix and in that case the ratio is 1. Adding new levels of the interpreter wouldn't make anything run more slowly because if M equals the identity matrix (with ones down the leading diagonal and zeroes elsewhere) then we are also saying each instruction from level N+1 of our stack of programs can be emulated by a single instruction at level N. That doesn't make any sense to me without appealing to some kind of magic or perhaps some not yet discovered type of computing device that can do something for nothing. So the question of what the lower limit is on the "eigenratio" for any self-interpreter (and any "computing machine") is still an open question as far as I can see. Sorry for my verbosity! And thank you for your attention if you made it this far. :-) Clive Gifford From michaelw+icfp2006 at foldr.org Sun Aug 13 05:54:03 2006 From: michaelw+icfp2006 at foldr.org (Michael Weber) Date: Sun Aug 13 05:54:23 2006 Subject: [icfp-discuss] Re: A faster UM. In-Reply-To: <20060801094000.GA19452@panix.com> Message-ID: <20060813095403.GA2431@roadkill.foldr.org> * Jed Davis wrote: > At some point on the Friday of the contest, I thought about trying to > do dynamic generation of '386 machine code; and then I decided that it > wouldn't be useful enough and I shouldn't get sidetracked on it then. Jed, is your work available online somewhere? FWIW, as an experiment I wrote a UM in Common Lisp, and although the generated code is not as good as gcc's, it seems not very far off compared to various C(++) implementations I have seen: http://foldr.org/~michaelw/log/programming/lisp/icfp-contest-2006-vm I am solely relying on Garbage Collection for allocation. The next logical step to improve speed would be to get rid of the interpreter loop by compiling UM code to CL. From prior experience I'd guess this will give another 20%-30%, perhaps a little more if the compiler manages to keep things in registers during inner loops. I don't think I can get down to your times easily, though. Cheers, Michael