For some time, I have used Cython to accelerate parts of Python programs. One stumbling block in going from Python/NumPy code to Cython code is the fact that one cannot access NumPy's random number generator from Cython without explicit declaration. Here, I give the steps to make a pip-installable 'cimportable' module, using the threefry random number generator as an example.
(Note: a draft of this article appeared online by mistake in June 2017; this version is complete.)
The aim
The aim is that, starting from Python code such as
import numpy as np
N = 100
x = 0
for i in range(N):
    x = x + np.random.normal()
one can end up with very similar Cython code
cimport numpy as np
cdef int i, N
cdef double x
N = 100
x = 0
for i in range(N):
    x = x + np.random.normal()
The obvious benefit is that the same module provides the random number generator (RNG) in both versions, with a simple interface.
This is impossible with the current state of NumPy, even though there is work in that direction (ng-numpy-randomstate). This post remains relevant for other contexts where Cython is involved.
The challenge
Building a cimport-able module only requires a corresponding .pxd file to be available in the path. The idea behind .pxd files is that they contain C-level (or cdef-level) declarations, whereas the implementation goes in the .pyx file with the same basename.
A consequence is that Python-level (regular def) functions do not appear in the .pxd file, only in the .pyx file, and cannot be cimported from another Cython file. They can, of course, be imported with a regular Python import.
The challenge lies in organizing these different parts properly and in packaging them for a seamless installation via pip.
Organization of the module
The module is named threefry after the Threefry RNG from random123. It contains my implementation of the RNG as a C library, together with a Cython wrapper.
Below, I review the steps, which I worked out from the documentation and quite a lot of trial and error.
Enable cimporting
To enable the use of the Cython wrapper from other Cython code, it is necessary to write a .pxd file; see the documentation on Sharing Declarations. A .pxd file can exist on its own, but in the present situation we use one with the same base name as the .pyx file. This way, the .pxd file is automatically read by Cython when compiling the extension, as if its content were written in the .pyx file itself.
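The pairing is purely by file name. As an illustration only (this is not Cython's own lookup code, which also searches its include path), the hypothetical Python helper companion_pxd below finds the .pxd file that shares a .pyx file's base name in the same directory:

```python
from pathlib import Path
from typing import Optional


def companion_pxd(pyx_path: str) -> Optional[Path]:
    """Return the .pxd file with the same base name as pyx_path, if present.

    Illustration of the same-directory, same-basename rule only; Cython's
    real search over its include path is more involved.
    """
    candidate = Path(pyx_path).with_suffix(".pxd")
    return candidate if candidate.is_file() else None
```

For example, companion_pxd("threefry/threefry.pyx") gives Path("threefry/threefry.pxd") when that file exists, and None otherwise.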
The .pxd file can only contain plain C, cdef or cpdef declarations; pure Python declarations must go in the .pyx file.
Note: the .pxd file must be packaged with the final module; see below.
The file threefry.pxd contains the following declarations:
from libc.stdint cimport uint64_t
cdef extern from "threefry.h":
...
cdef class rng:
...
This means that the extension type threefry.rng is accessible via cimport from other modules. Its implementation is stored in threefry.pyx.
To hide the implementation details, I wrote an __init__.pxd file containing the following:
from threefry.threefry cimport rng
so that the user code looks like
cimport threefry
cdef threefry.rng r = threefry.rng(seed)
and I am free to refactor the code later if I wish to do so.
Compilation information
To cimport my module, one more step is critical: providing the compiler flag needed for the C declarations, that is, the include path for threefry.h, which must be read when compiling user code.
For this purpose, I define a utility function get_include that can be called from the user's setup.py file as follows:
from setuptools import setup, Extension
from Cython.Build import cythonize
import threefry
setup(
ext_modules=cythonize(Extension('use_threefry', ["use_threefry.pyx"], include_dirs=[threefry.get_include()]))
)
Note: the argument include_dirs is given to Extension and not to cythonize.
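The implementation of get_include is not shown here. A minimal sketch, in the spirit of numpy.get_include(), assuming threefry.h is installed inside the threefry package directory next to __init__.py, simply returns that directory. The helper include_dir_for is hypothetical and only split out so the logic can be checked without an installed package:

```python
import os


def include_dir_for(module_file: str) -> str:
    """Return the absolute directory that contains module_file."""
    return os.path.dirname(os.path.abspath(module_file))


def get_include() -> str:
    """Return the include directory for threefry.h.

    Assumption: threefry.h ships in the threefry package directory,
    next to this __init__.py, so that directory is the include path.
    """
    return include_dir_for(__file__)
```

Because the user's setup.py imports threefry to call get_include, threefry has to be installed before the user's extension is built.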
Packaging
The .h and .pxd files must be added via the package_data argument to setup, so that they are installed along with the compiled module.
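package_data maps a package name to glob patterns that are relative to the package directory. A sketch of the keyword argument, assuming the sources live in a threefry/ directory, together with a hypothetical helper shipped_files that lists which files the patterns would select (a simplification of what setuptools actually does):

```python
from pathlib import Path
from typing import List

# Value for setuptools.setup(package_data=...); patterns are relative
# to the threefry/ package directory (layout assumed).
PACKAGE_DATA = {"threefry": ["*.pxd", "*.h"]}


def shipped_files(package_dir: str, patterns: List[str]) -> List[str]:
    """Return the sorted names of files in package_dir matching the patterns.

    Simplified, non-recursive stand-in for setuptools' package_data matching.
    """
    root = Path(package_dir)
    matches = set()
    for pattern in patterns:
        matches.update(p.name for p in root.glob(pattern) if p.is_file())
    return sorted(matches)
```

In setup.py, this would be passed as setup(..., packages=["threefry"], package_data=PACKAGE_DATA); the .pyx source does not need to be listed since it is compiled into the extension module.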
Wrapping up
In short, to make a cimport-able module:
- Move the shared declarations to a .pxd file.
- Put the implementation in the .pyx file, which will be installed as a compiled module.
- Add the .pxd and .h files to package_data.
- Provide a convenient way to obtain the include directories.
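These points can be turned into a quick sanity check on an installed package directory. A sketch, assuming the layout used in this post (file names from the text, plus an __init__.py so that import threefry and get_include work); REQUIRED and missing_files are hypothetical names, while importlib.machinery.EXTENSION_SUFFIXES is the standard-library list of platform suffixes for compiled modules:

```python
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path
from typing import List

# Non-compiled files needed for `cimport threefry` (layout assumed).
REQUIRED = ["__init__.py", "__init__.pxd", "threefry.pxd", "threefry.h"]


def missing_files(package_dir: str) -> List[str]:
    """Return the required files that are absent from package_dir."""
    root = Path(package_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    # The compiled threefry.pyx: its file name carries a platform/ABI suffix.
    if not any((root / ("threefry" + s)).is_file() for s in EXTENSION_SUFFIXES):
        missing.append("threefry<extension suffix>")
    return missing
```

An empty list means the .pxd and .h files were shipped next to the compiled module.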
All of this can be found in my random number generator package https://github.com/pdebuyl/threefry
The algorithm is from the paper by Salmon et al., Parallel Random Numbers: As Easy as 1, 2, 3; their code is distributed as random123. I wrote about it earlier in a blog post.