Pierre de Buyl's homepage

Developing a Cython library

For some time, I have used Cython to accelerate parts of Python programs. One stumbling block in going from Python/NumPy code to Cython code is the fact that one cannot access NumPy's random number generator from Cython without explicit declaration. Here, I give the steps to make a pip-installable 'cimportable' module, using the threefry random number generator as an example.

(Note: a draft of this article appeared online in June 2017 by mistake. This version is complete.)

The aim

The aim is that, starting with a Python code reading

import numpy as np

N=100
x=0
for i in range(N):
    x = x + np.random.normal()

one can end up with a very similar Cython code

cimport numpy as np

cdef int i, N
cdef double x
N = 100
x = 0
for i in range(N):
    x = x + np.random.normal()

The obvious benefit is that the same module provides the random number generator (RNG) in both cases, with a simple interface.

This is impossible with the current state of NumPy, even though there is work in that direction (ng-numpy-randomstate). This post remains relevant anyway for other contexts where Cython is involved.

The challenge

Building a cimportable module just depends on having a corresponding .pxd file available in the path. The idea behind .pxd files is that they contain C-level (or cdef-level) declarations, whereas the implementation goes in the .pyx file with the same basename.

A consequence of this is that Python-type (regular def) functions do not appear in the .pxd file but only in the .pyx file, and cannot be cimported in another Cython file. They can of course be imported from Python.

The challenge lies in a proper organization of these different parts and of a seamless packaging and installation via pip.

Organization of the module

The module is named threefry after the corresponding Threefry RNG from random123. It contains my implementation of the RNG as a C library, together with a Cython wrapper.

Below, I review the steps that I found via the documentation and quite a lot of trial and error.

Enable cimporting

To enable the use of the Cython wrapper from other Cython code, it is necessary to write a .pxd file (see the documentation on Sharing Declarations). .pxd files can exist on their own but, in the present situation, we will use them with the same base name as the .pyx file. This way, the .pxd file is automatically read by Cython when compiling the extension, as if its content were written in the .pyx file itself.

The .pxd file can only contain plain C, cdef or cpdef declarations. Pure Python declarations must go in the .pyx file.

Note: The .pxd file must be packaged with the final module, see below.

The file threefry.pxd contains the following declarations

from libc.stdint cimport uint64_t

cdef extern from "threefry.h":
    ...

cdef class rng:
    ...

meaning that the extension type threefry.rng will be accessible via a cimport from other modules. The implementation is stored in threefry.pyx.

With the aim of hiding the implementation details, I wrote an __init__.pxd file containing the following:

from threefry.threefry cimport rng

so that the user code looks like

cimport threefry
cdef threefry.rng r = threefry.rng(seed)

and I am free to refactor the code later if I wish to do so.

Compilation information

To cimport my module, there is one more critical step: providing the compiler flag needed for the C declarations, that is, the include path for threefry.h (which must be read when compiling user code).

For this purpose, I define a utility routine get_include that can be called from the user's setup.py file as:

from setuptools import setup, Extension
from Cython.Build import cythonize
import threefry

setup(
    ext_modules=cythonize(Extension('use_threefry', ["use_threefry.pyx"], include_dirs=[threefry.get_include()]))
)

Note: the argument include_dirs is given to Extension and not to cythonize.
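On the library side, get_include can be very small. The following is only a minimal sketch, in the spirit of numpy.get_include, and not necessarily what the threefry package does: it assumes that the function lives in the package's __init__.py and that threefry.h is installed in the same directory. The helper include_dir_for is a hypothetical name introduced here so that the path logic can be tried on its own.

```python
import os


def include_dir_for(module_file):
    """Return the absolute directory that contains module_file.

    Hypothetical helper, split out so it can be tried with any path.
    """
    return os.path.dirname(os.path.abspath(module_file))


def get_include():
    """Return the directory to pass to include_dirs.

    Assumption: this code sits in the package's __init__.py and the
    header threefry.h is installed next to it.
    """
    return include_dir_for(__file__)
```

Returning an absolute path avoids surprises when the user's setup.py runs from a different working directory.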

Packaging

The .h and .pxd files must be added via the package_data argument to setup.
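As an illustration, the packaging-related arguments could look like the sketch below. The package name, the glob patterns and the function name packaging_kwargs are assumptions made for this example; the actual setup.py of the threefry package may differ. zip_safe is a regular setuptools option, set to False here so that the files end up as plain files on disk.

```python
# Sketch of the packaging-related arguments for the library's own setup.py.
# The package name and glob patterns are assumptions for illustration.

def packaging_kwargs(package="threefry"):
    """Return the setup() keyword arguments that ship the .pxd and .h files."""
    return {
        "packages": [package],
        # The patterns are resolved relative to the package directory.
        "package_data": {package: ["*.pxd", "*.h"]},
        # A zipped install would hide the files from Cython and the C compiler.
        "zip_safe": False,
    }

# In setup.py, these would be passed alongside the extension modules, e.g.
#     setup(name="threefry", ext_modules=..., **packaging_kwargs())
```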

Wrapping up

In short, to make a cimport-able module

  1. Move the shared declarations to a .pxd file.
  2. The implementation goes in the .pyx file, which will be installed as a compiled module.
  3. The .pxd and .h files must be added to package_data.
  4. A convenient way to obtain the include directories must be added.

All of this can be found in my random number generator package https://github.com/pdebuyl/threefry

The algorithm is from the paper by Salmon et al., Parallel Random Numbers: As Easy as 1, 2, 3, and their code is distributed as random123. I wrote about it earlier in a blog post.
