hpc

Fast Interstellar Travel with NumPy and C

Stefan Seemayer

Sep 25, 2015 • 7 min read

featured-image

I absolutely love Python (and NumPy!) for the flexibility it gives me in quickly building up complex data structures and programs to arrive at interesting solutions. Still, sometimes handling larger amounts of data is too slow even after profiling my code and subsequent performance tuning. In those cases, going more low-level by using the NumPy ctypes interface can help reduce the runtimes by orders of magnitude.

I recently came upon an example where this runtime improvement by going to native C code was especially drastical and the amount of work necessary to arrive at this improvement was extremely low. Since this example also involves internet spaceships, I thought it might be interesting for people to read about.

Internet spaceships and competitive advantage

In the MMORPG Eve Online, players pilot a vast variety of spaceships through a huge universe, looking for small to gigantic fights or to profit from the completely player-driven economy. What I love especially about Eve is that the developers make a lot of data about the universe available through static database dumps and an API, so a vast ecosystem of player-created tools has sprung up around the game. As a programmer playing the game, it's hard to resist the temptation to roll your own tools to give you an advantage, so I soon started working on my own projects.

Fast travel planning through space

A crucial functionality of many Eve programs that deal with the universe as a whole is to compute the shortest possible connections between different solar systems, e.g. to compute travel times. In computer science terms, we want to be able to compute the all-pair shortest paths between all ~5000 solar systems in "known space" (i.e. every non-wormhole system).

A classical algorithm for solving this problem is the Floyd-Warshall algorithm which has a runtime complexity in O(N^3). A simple back-of-the-envelope calculation of (5.0 * 10^3)^3 = 1.25 * 10^11 operations says that this is still in a very manageable order of magnitude (a Sandy Bridge Core i7 Quad-core processor can do up to 5.0 * 10^10 floating point operations per second), so I started implementing the Floyd-Warshall algorithm in Python.

Preparing the data

As input, I've processed the static database dumps to create two files:

systems.csv - a list of all solar systems renumbered so that the first system has an index of 0. Additionally, all wormhole systems are filtered out since their connections change all the time and are not publically known.
jumps.csv - a list of all connections between solar systems, denoted as a tuple of two solar system indices.

Using the jumps list, we can initialize two two-dimensional matrices of 16 bit unsigned integers distance and traceback. The matrix distance[i, j] will hold the shortest distance (in jumps) between solar systems with index i and j while traceback[i, j] will give us an index k of a solar system that lies on the shortest path, information that can be used to reconstruct the shortest path after finishing the algorithm. First, we need to fill these matrices with the immediate jump information from jumps:

import numpy as np

# the maximum value for an uint16 denotes that two solar systems are not connected
NO_ROUTE = 2 ** 16 - 1 

def init_nav(n_systems, jumps):
	"""Initialize the navigation matrices"""
    
    # start out with all systems having no connection
	distance = np.empty((n_systems, n_systems), dtype=np.uint16)
    traceback = np.empty((n_systems, n_systems), dtype=np.uint16)
    
	distance[:] = NO_ROUTE	 
    traceback[:] = -1
	
    # fill direct jumps into navigation matrices
    for from_system, to_system in jumps:
    	distance[from_system, to_system] = 1
        distance[to_system, from_system] = 1
        traceback[from_system, to_system] = to_system
    	traceback[to_system, from_system] = from_system
    
	return distance, tracelsback

Version 1: Python+NumPy

With the data in a form that is convenient for the actual Floyd-Warshall algorithm, we can now compute shortest paths between all of the systems by successively updating the distance and traceback matrices with paths that go over intermediate solar systems:

def floyd_warshall_v1(distance, traceback):
	n_systems = distance.shape[0]
    
    for k in range(n_systems):
    	for i in range(n_systems):
        	for j in range(n_systems):
            	if distance[i, k] == NO_ROUTE: continue
                if distance[k, j] == NO_ROUTE: continue
                
                dist_ikj = distance[i, k] + distance[k, j]
            
            	if distance[i, j] > dist_ikj:
                	# it is shorter to go i->k->j than directly from i->j
                    distance[i, j] = dist_ikj
                    traceback[i, j] = k

Looks nice, and we're even using NumPy data structures for good performance, right? Let's see how long this thing will need to compute the navigation data!

(Bioinformaticians might want to replace "compiling code" with "running analyses". Comic from xkcd)

After a while of doing different things to occupy myself, I've actually stopped the benchmark since less than 1% of the computation was done and I didn't feel like waiting any longer. Extrapolating the necessary runtimes, though, it would have taken more than 15 days to compute the optimal routes on my system.

You might say that it would be OK to leave the computer running for a few days and then only use the cached results from then on. By doing this, though, we're sacrificing a lot of flexibility since avoiding certain solar systems or connections would require a re-computation of the navigation network. Besides, even if we're just computing the navigation graph once it would still be worth the time to implement something faster since we can easily be done before the original, slow implementation finishes computing. So let's see if C can help us compute the navigation data more quickly!

Version 2: Python+Numpy+C

Since we've seen that version 1 is too slow to run in practice, let's see if we can get faster by implementing the Floyd-Warshall algorithm as a simple C program:

#include <stdio.h>;
#define NO_ROUTE 65535

void floydwarshall(int n, unsigned short* distance, unsigned short* traceback) {
	for(int k = 0; k < n; k++) {
		for(int i = 0; i < n; i++) {
			for(int j = 0; j < n; j++) {
				unsigned short ik = distance[i * n + k];
				unsigned short kj = distance[k * n + j];
				if(ik != NO_ROUTE && kj != NO_ROUTE) {
					if(distance[i * n + j] > ik + kj) {
						distance[i * n + j] = ik + kj;
						traceback[i * n + j] = k;
					}
				}
			}
		}
	}
}

18 lines of code. Pretty straightforward, right? Since NumPy uses C arrays internally to store the array contents, no conversion of data is necessary and we only need to do a super-simple translation of the main Python algorithm to C.

We compile the C code into a shared library, e.g. using gcc:

gcc -shared -Wl,-soname,libfloydwarshall.so -o libfloydwarshall.so floydwarshall.c

Now we can use the NumPy ctypes interface to pass our distance and traceback matrices directly to the C code. We only need to let ctypes know the interface of our C method:

import numpy as np
import numpy.ctypeslib as npct
import ctypes
import os.path

# define a new type for a 2D array of uint16
array_2d_uint16 = npct.ndpointer(dtype=np.uint16, ndim=2, flags='C_CONTIGUOUS')

# load shared library
libfw = npct.load_library('libfloydwarshall', os.path.dirname(__file__))

# define result and argument types for the C floydwarshall function
libfw.floydwarshall.restype = ctypes.c_voidp
libfw.floydwarshall.argtypes = [
    ctypes.c_int,               # n
    array_2d_uint16,            # distance
    array_2d_uint16             # traceback
]

# lightweight python wrapper for the C floydwarshall function
def floyd_warshall_v2(distance, traceback):
    libfw.floydwarshall(distance.shape[0], distance, traceback)

The numpy arrays will be passed by reference directly to the C code and their contents will be updated in place. After the C code returns, all navigation data is calculated and stored in distance and traceback.

Computing the all-pairs shortest paths for all of known space using this code takes 3 minutes on my system so it's approximately 7700 times faster than the Python code. Not bad :)

Note: there is still a lot of potential here for easy optimizations e.g. by using SIMD intrinsics (check out the awesome Intrinsics Guide by Intel and don't forget about AVX/AVX512!) or by running it on multiple CPU cores using e.g. OpenMP. For me, 3 minutes is good enough, though.

Calculating actual paths

In order to be able to use this as a complete Eve Online navigation system, a few more pieces of code are needed and I will give them here for the sake of completeness.

After the Floyd-Warshall algorithm, we already know the distance between all solar systems in jumps as given by the distance[i, j] matrix. In many cases, however, the actual optimal route might be of interest. We can calculate it by using the traceback matrix and the following recursive code:

def get_path(traceback, from_system, to_system):
	def inner(frm, to):
    	mid = traceback[frm, to]
        if mid == to:
        	return [to]
        else:
        	return inner(frm, mid) + inner(mid, to)
    return [from_system] + inner(from_system, to_system)

Giving this function the traceback matrix and two indices for a start and end system will return a list of all systems on the way. Now we can open up our interstellar travel agency!

Summary

Python is an amazingly productive programming language, but for cases where fast runtimes are important, even libraries such as NumPy might not be fast enough. Instead of re-implementing the whole analysis in a faster lower-level language, however, it can help to replace only the performance bottlenecks with fast C code and we still get most of the speed benefits. Doing this is a lot easier as you might think when using the wonderful NumPy and ctypes interfaces.