Unlike me, my wife uses computers for serious calculations to answer important questions. One of these problems involves taking the cross-correlations of the continuous seismic data of hundreds of measuring stations. A computation that can take days or even weeks. Interestingly, it turns out that a large part of this time is spent loading data from disks into memory. To better understand the problem, let us take a high-level look at the calculation of cross-correlations in Pythonesque pseudo-code:

```# given is some list of stations, say a list of names of the data files
stations = [..]

# correlate, one by one, each unique pair of stations
for i in range(len(stations)) :
for j in range(i+1, range(len(stations))) :
# the following correlates both the pair (i,j) and (j,i)
computation(stations[i], stations[j])
```

The pseudo-code traverses all possible combinations of two stations and computes the so called cross-correlation. Each station’s signal consists of a large data-file that is stored on disk and must be loaded into memory to compute the correlation. Loading memory from disk is an expensive operation, because even modern disks are slow compared to the processor and memory. Luckily the system caches data, i.e., it reuses data that is still stored in memory in stead of reloading it. But the system has limited and varying (on a shared system) memory capacity. When the memory is full, data that is not yet in the memory will replace the least recently used data.

Thus the order in which we access the station’s data – the so called schedule – determines the number of disk accesses. The question is, what schedule minimizes the number of disk reads and maximize the use of the cache. Moreover, what can we do without knowing the size of the memory?

As I found out two years ago the Sierpinski fractal curve provides a good, and probably asymptotically optimal, schedule. For my and (hopefully) your amusement, I’ve now rewritten some of the experiments in the programming language Elm to present the elegant Sierpinsky curve and its discretized brother the H-index.

## Computing the Cost of a Schedule

A schedule can be visualized as a path through a square matrix where each cell consists of a combination of two stations. Such a combination is called a job. Only the lower (or upper) triangle of the matrix need to be visited as we consider (3,4) a duplicate of (4,3). The pseudo-code above defines the following schedule that we call the scanline schedule.

The displayed schedule thus starts correlating station 1 with itself, then correlates 1 with 2, 2 with 2, etc. The cost – in terms of the number of disk accesses or cache misses – of this schedule can be computed by simulating the execution while keeping track of the cache’s contents and counting the number of times we load a station’s data that is not yet in memory. Running such a simulation with memory capacity for five elements gives the following cumulative cost for the successive executions.

The cumulative cost shows that the scanline schedule initially (up to 15) performs quite well, but once the horizontal stretches of the schedule are longer than 5 each new job results in a cache miss. In fact, one can do quite a bit better by making shorter stretches.

Give it a try in the scheduler below! Click on the circles in the matrix to schedule a job and on + and – in the upper right corner to increase or decrease memory capacity. The goal is to minimize total cost of the schedule.

To help you make a path, unscheduled elements in the path have been marked and colored with their cost.

## The Sierpinsky Curve or H-Index

It turns out to be quite challenging to find the best schedule with the least disk accesses. Some experimenting also shows that the performance of a schedule changes quite a bit for different memory sizes. That complicates our task, especially considering that memory capacity may change during execution. For example, when the correlations are done on a supercomputer that is shared by other scientists running similar experiments.

However, one quickly notices that it is generally better not to make large jumps. That is, to make a schedule where the most recent jobs are close together in the matrix. This observation leads to a divide and conquer strategy, where we subdivide the triangle of jobs into smaller triangles and schedule the jobs in the first triangle before the second. Applying this pattern recursively, we get the following sequence.

In addition to the subdivision of triangles, a line is drawn that visits each of the triangles exactly once. The line is the (lower triangle of the) Sierpinsky curve a fractal curve that, when the subdivision is repeated infinitely, will visit each point in the triangle without every crossing itself.

Now, consider the triangle of jobs to be scheduled and subdivide it recursively in smaller triangles until all triangles consist of exactly one job. The Sierpinsky curve through the triangles now describes a schedule!

Well almost: the jobs cannot properly be subdivided in triangles because jobs cannot be split. Therefore, we use the discrete incarnation of the Sierpinski curve that is called the H-index. In this context “index” is another word for schedule and the H stands for the shape the pattern takes: there’s an H on its side in the lower right corner. The H-index defines the following schedule:

The strength of the H-index is that it has very good locality: any two jobs that are scheduled closely together (in time) will also be close together (in distance) in the matrix of jobs. Locality is important because jobs that are close in the matrix are more likely to be in the cache.

How much better does the H-index do? The following graph plots the cost in termes of cache misses of each schedule for a case with 64 stations and therefore 576 jobs for increasing memory capacity.

The range is limited from 2 to 64: one needs to be able to store at least two and if memory has capacity for 64 stations all schedules will have cost 64 because every job has to be loaded exactly once. The cost of the scanline algorithm is dominated by a linear relation also seen in the last part (after job 16) in the previous plot. One may also note that the H-index performs best when the memory capacity is a power of two, note in particular the low points at 4, 8 and 16. In between those spots the H-index becomes slightly irregular but generally the plotted curve quicly approaches the minimum number of loads.

## Conclusion

Hopefully the final plot convinces you that the Sierpinski curve, or more accurately, H-index performs much better than the scanline schedule. However, I have not proven that the H-index is the optimal schedule. In fact, one can easily show that for certain memory sizes, such as in the manual scheduling exercise, it is possible to do (slightly) better than the H-index. Rolf Niedermeiera, Klaus Reinhardt and Peter Sanders, who invented the H-index, have given strong evidence that it is asymptotically optimal.

Does this mean we should remove all matrix traversals based on nested for loops – the scanline schedule – and replace them with with a H-index based traversel? Probably not. Much has been done to optimize such loops in compilers to fine-tune their caching behavior. The experiment presented here only took into account a very special case and has many assumptions on the behavior of the cache that will not hold for many systems.
For example, cpu caches typically use different replacement strategies; they do not simply replace the least recently used element if it is full. Yet, the Sierpinsky curve and H-index have found their way into many computing-intensive applications, from image processing to databases, and may also benefit you.

This post is too small to introduce a practical incarnation of the curve. An attentive reader may have noticed that I’ve only used examples where the number of stations is a power of two. This is no coincidence: the H-index is really only defined for those sizes. One can, however, adapt the index for larger sizes in ways that most of the schedule will be according to the h-index. Practical applications and their empirical evaluation is left to future work.

## Post Scriptum

As for my previous project in Elm, I had fun writing this. Working with such a new language is an interesting experience: one one hand you are exploring new ways to do things, but on the other hand you have to reinvent the wheel all the time. It certainly makes you appreciate the time and effort spent by others on libraries in more mature languages. For example, I spent an inordinate amount of time on the plots, at times resulting in atrocious code.

You are free to look at/steal the code but you’re going against doctor’s orders.

Last year I visited family in Vienna. One rainy afternoon we decided to play a very German dice-game called mensch ärgere dich nicht, which means something like: don’t work yourself up, man. Despite her wheelchair my grandmother beat us three times in a row and I was annoyed.

I decided I should figure out this game and find out if there is a simple strategy that works. Understand that this is an exceedingly simple game. And, as you will see, it is not even clear if there is much of a strategy; the outcome may be determined solely by the dice. My idea is as follows: to devise a number of simple strategies and let the computer play them to figure out the best ones. It also provided me with an excuse to try the new programming language Elm.

## The rules of the game

The game is played by four players (red, blue, green and yellow) each with four pieces. A player wins if she manages to move all her tokens to the home row from the starting position, traversing the full board in a clockwise direction. Without further ado, play a game.

You are the red player and are lucky to start with a six. You have to click on the board to let the AI move.

The rules are as follows:

• Players play in turns in clockwise order starting with the person that wins a first roll.
• A turn starts with a die roll.
• If the player rolls six she can move a piece onto the game. Otherwise she may move one token the exact number of steps in clockwise direction. If a token is moved on top of token, the other token is returned to the starting position and has to start over.
• In the home row, tokens ‘bounce’ off the end, reversing direction for the remaining steps in a throw
• If possible, one should always move.
• Whenever the player rolls a six, she may do another turn.
• If no token can be moved, the turn ends without moving.

## Looking for safe havens

My first thought was to see if there are places which are safer than others. To this end we simulate a game with AI players and count the number of times each a place is occupied by a token.
This leads to the following heat-map (redder if there are more combinations). Click to start the simulation.

The colors have been normalized such that red has the highest count and green the lowest. The simulation is executed with a different random seed every time this page is loaded and I cannot point to specific results.The heatmap is, however, likely so show that most places are equally likely to be occupied and in many cases that the last place before each player’s home row is very much occupied. The latter is to be expected because of tokens being blocked by others in the homerow or bounced back.

Many more simulations could be done to acquire more information, e.g., where are tokens more likely to be hit, but I’d rather just go on to devise strategies first.

## Towards a human playable strategy

As there is so little choice in the game, especially since there usually are no more than two tokens on the board, it is hard to devise any strategy. Moreso to devise one simple enough for a casual player. Computers can even have (meaningful) strategies for rock-paper-scissors but humans are awfully limited. What I am looking for is a simple strategy that is easy to remember and execute. I especially do not want to count the odds of every move while playing.

To compose a strategy, I propose to combine a tactic and a style. The tactic determines the position on the board by choosing the token to move. The playing style determines if I play aggressively or defensively. For the fist, I propose the following simple tactics:

• The eager tactic where, if possible, we always advance the foremost token and otherwise the second token, etc.
• The hedge tactic where we always advance the hindmost token if possible, and otherwise the before-hindmost token, etc.

These three tactics aim for different positions on the board: the eager tactic tries to reach home as fast as possible by investing in only one token, and the hedge tactic is to spread investment over all your tokens so that one token being hit costs less.

The tactics completely disregard the opponent’s tokens: it only proposes an order by which to consider moving your own token. Therefore, I propose to combine these with the two natural playing styles:

• The aggressive style where we choose to hit an opponent whenever we can.
• The defensive style where we first consider the tokens that are in immediate danger of being hit but would be safe if moved and then the tokens that will be safe after moving.

We can now compose a strategy by choosing a tactic and style: The style determines what tokens are under first consideration for a move and the tactic is used to choose one of the moves. If the style has no good candidates, we consider all tokens.

For example, with a defensive style and eager tactic, we first consider all tokens that are in danger and would become safe after moving them. Of those tokens, we move the foremost (as dictated by the runner tactic). If, however, there are no tokens in danger, we move the foremost of all tokens that will be safe after moving. If there are no tokens whatsoever, we move the foremost of all tokens.

We can combine two styles with a priority. With a aggressive-defensive style, we first consider the tokens that can hit an opponent and be safe, if there are no such tokens we choose among the tokens that hit an opponent, if there are no such moves either, we choose according to the defensive style. With a defensive-aggressive style, we first consider the safe and aggressive moves, but then the safe ones first and only then the aggressive moves.

Thus we have defined four strategies, but who will rule them all? Only one way to find out…

## Battle-testing the strategies

It is no coincidence we have four AIs: this way we can conveniently let them play against each other and tally their respective wins.The following chart shows the result of running a hundred games with random starting locations with

• the aggressive-defensive eager strategy in blue;
• the defensive-aggressive eager strategy in yellow;
• the aggressive-defensive hedge strategy in red; and
• the defensive-aggressive hedge strategy in green.

Click to start the simulations. They make take a while!

Again, this test is executed with a different random seed on every page-load and you can refresh until you obtain a result that fits your beliefs. In most simulations the blue and yellow eager strategy with either playing style wins more often than the hedging strategy and the defensive-aggressive style usually trumps the aggressive-defensive style.

## Conclusion

Overall the yellow defensive-aggressive eager player wins the most games and seems to be the most advisable strategy. It also suggests that playing defensively rather than aggressively is better for both the eager and hedge strategies. That said, you’re not really likely to notice the difference when playing one or two games.

Of course this test is rather unsophisticated and statistically unsound. Not to mention the fact that there are many more possible strategies that could be tested. Much future work ™ is required to get any definitive answer. First and foremost, one should test the strategies against an AI that plays random moves and see if they are significantly better.

## Post Scriptum

Despite the utter pointlessness of this game, I had fun writing the game and the experiments.
Along the way I learned much about

The code is available on github, but beware it was written with the diligence of your average research project.

Now, if you would excuse me, I’ve got to play some games.

The following article gives a taste of the work that I’ve done for my master’s thesis at the University of Utrecht under the supervision of Andres Löh. If you are interested, feel free to have a look at the source files.

## Introduction

Programming languages are often specified only in an informal manner; in the available documentation, the language behaviour is described by examples and text. Only the implementation, a compiler or interpreter, describes the exact semantics of constructs.

Python is no different. It is described by an informal manual and a number of implementations. Notably the original implementation CPython and PyPy, an implementation of Python in Python. No systematic, formal descriptions of its semantics are available.

We developed a formal semantics for a comprehensive subset of Python called minpy. The semantics are described in literate Haskell, which are compiled to an interpreter as well as a formal specification.

## Rewriting an abstract machine

The operational semantics are defined by state transitions of an abstract machine. A machine state consist of a heap Θ, an environment stack Γ, a control stack S, and an instruction ι. State transitions are defined by rewrite rules that transform the machine state. These rewrite rules have the following shape.

The heap represents the memory of the abstract machine. It is a mapping of addresses to values. Many different kinds of values are stored on the heap, for example integers, strings, lists and objects.

Even the environment is partly stored on the heap. The environment Γ consists of a stack of addresses, which correspond with the scoping blocks that have been entered. The addresses point to mappings on the heap, which in turn map variables to addresses.

The control stack is a stack of continuation frames, i.e., frames that indicate what to do next. One of the traditional functions of a control stack (also known as call stack) is to store the return pointer of a function call that is being executed.

The instruction is the abstract equivalent of a program counter. The instruction can be a piece of a program, such as an expression, statement, or a complete block. But, for example, when an expression has been evaluated it can also hold the resulting address.

## Executing an assignment

To show what the rewrite rules look like we will have a look at the semantics of variable assignments. Starting with the execution of a variable assignment statement x = e; as instruction, a heap Θ, an environment Γ and a stack S.

The state is rewritten to a new state that is virtually the same, except that a stack frame x = o is pushed onto the control stack and the expression e is the new instruction. The circle o in the new stack frame indicates that the expression which place it took, will be evaluated.

Once the expression has been evaluated, we end up with a state that has an address instruction a, a stack the assignment frame on top, an environment stack with an address γ₁ on top.

The address γ₁ on top of the environment stack points to the variables bound in the current scope. In the new state, the mapping that contains the bindings is changed to include the new assignment. The environment stack however, remains unchanged. The top stack frame is removed and the result of the assignment statement is None, or rather an address pointing to None.

The changes to the heap are not very intuitive, so let us be more precise: the existing environment mapping bound at γ₁ is overwritten with the old mapping Θ(γ₁) that itself has been extended with the mapping [x → a]. The notation of Θ(γ₁) represents the lookup of γ₁ on the heap Θ. The circled addition operator extends the mapping on its left hand side with the mapping on its right hand side. If the both sides of the circled addition operator have a mapping of the same variable or address, the right hand side takes precedence.

## Literate programming

The semantics of minpy are described in literate Haskell, a syntax extension of Haskell that allows us to mix Haskell and LaTeX code in the same file. The Haskell compiler GHC will, when confronted with a literate Haskell file, simply ignore all the Latex code. Latex however, is not natively aware of the literate Haskell code, but can be instructed to include the Haskell code verbatim in the document.

To exploit the advantages of LaTeX typesetting, we preprocessed the literate Haskell sources with lhs2TeX. This tool transforms Haskell source code into a nicely formatted LaTeX equation. It handles horizontal alignment across multiple lines of code and allows us to customize the transformation through simple macros.

In order to hide all traces of Haskell from the formatted rewrite rules, we have pushed lhs2TeX to its limits. To achieve the current results we introduced an additional preprocessing stage, implemented by a simple Perl script. Furthermore, we had to accept certain restrictions on the Haskell vocabulary and adhere to a number of coding conventions. For example, every state has to be on a single line to get proper alignment. Some rules do not even fit on a 30″ monitor.

The resulting source code is nevertheless quite readable, especially when compared to any equivalent LaTeX code. For instance, the above rule for variable binding is defined by the following code.

``````rewrite  (State  ( heap                                              )  ( envs :|: env_1 )  ( stack :|: AssFrame x )  ( BwResult a      )) =
(state  ( heap <+> [env_1 |-> get heap env_1 <+> [x |-> a]] )  ( envs :|: env_1 )  ( stack                )  ( BwResult a_None ))``````

Note that, while compiling the document requires several steps, GHC can still compile the source code in a single step. This allows us to experiment with the semantics, while ignoring the presentation.

## The interpreter

The sources of the thesis and some supporting code that implements a parser, pretty printer, and an interactive interpreter, can be compiled by the Haskell compiler GHC to produce an interpreter. Like the interpreter of CPython, this interpreter has both a command-line modus and the ability to interpret files.

The following example shows a session of the interpreter in its command-line modus. First, we declare the factorial function in a single function definition statement. The statement is immediately executed if parsing succeeds. Next, we call the factorial function without an argument, which results in a TypeError exception. Finally we call it with a correct argument and exit the interpreter.

```gideon@gideon-desktop:~\$ minpy
>>> def fac(x) :
...     if x <= 0 :
...         return 1
...     else :
...         return fac(x-1) * x
...
>>> fac()
TypeError
>>> fac(4)
24
>>> exiting minpy```

All this will be more than familiar to a Python programmer, except for the rather meager error message. The minpy interpreter can also print a trace of the abstract machine revealing any changes after every rewrite step. This has proven to be very useful when debugging the semantic rules.

The interpreter, and by proxy the semantics, was systematically tested by a test suite written in pure Python, or actually, minpy. Executing the suite with CPython results in only one failed test, as it is the reference for our semantics. Our interpreter still fails for 13 tests.

## Scope of the semantics

As mentioned in the introduction, minpy includes a significant number of Python constructs. Functions, generators, objects and classes, operators, exceptions, if, while, for, exec, and import statements are supported, although most only in a simplified form. Class declarations, for instance, must always specify the bases of the class.

Currently, minpy does not have any garbage collection, support for concurrency or foreign functions. It should also be noted that, while minpy does support multiple inheritance, the implementation of metaclasses is still incomplete.