License: CC-BY
ActivePapers is a technology developed by Konrad Hinsen to store code, data and documentation with several benefits: storage in a single HDF5 file, internal provenance tracking (what code created what data/figure, with a Make-like conditional execution) and a containerized execution environment.
Implementations for the JVM and for Python are provided by the author. In this article, I go over the first steps of creating an ActivePaper. Being a regular user of Python, I cover only this language.
An overview of ActivePapers
First, a "statement of fact": an ActivePaper is an HDF5 file. That is, it is a binary, self-describing, structured and portable file whose content can be explored with generic tools provided by the HDF Group.
The ActivePapers project is developed by Konrad Hinsen as a vehicle for the publication of computational work. This description is a bit short and does not convey the depth of thought that has gone into the design of ActivePapers; the ActivePapers paper provides more information.
ActivePapers come, by design, with restrictions on the code that is executed. For instance, only Python code (in the Python implementation) can be used, with the scientific computing module NumPy. All data is accessed via the h5py module. The goals behind these design choices are related to security and to a good definition of the execution environment of the code.
Creating an ActivePaper
The tutorial on the ActivePapers website starts by looking at an existing ActivePaper. I'll go the other way around, as I found it more intuitive. Interactions with an ActivePaper are channeled through the aptool program (see the installation notes).
Currently, ActivePapers lack a "hello, world" program, so here is mine. ActivePapers work best when you dedicate a directory to a single ActivePaper. You may enter the following in a terminal:
mkdir hello_world_ap # create a new directory
cd hello_world_ap # enter it
aptool -p hello_world.ap create # create a new file "hello_world.ap"
mkdir code # create the "code" directory, where you write
# the programs that will be stored in the AP
echo "print('hello, world')" > code/hello.py # create a program (this form runs under Python 2 and 3)
aptool checkin -t calclet code/hello.py # store the program in the AP
That's it, you have created an ActivePaper!
You can observe its content by issuing
aptool ls # inspect the AP
And execute it
aptool run hello # run the program in "code/hello.py"
This command looks into the ActivePapers file and not into the directories visible in the filesystem. The filesystem acts more like a staging area.
A basic computation in ActivePapers
The "hello, world" program above did not perform a computation of any kind. An introductory example for science is the computation of the number $\pi$ by the Monte Carlo method.
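For reference, the estimator behind this method can be written down as follows. This is a sketch assuming that the points $(x_i, y_i)$ are drawn uniformly in the unit square, so that the fraction falling inside the quarter disc of radius 1 approaches the ratio of the two areas:

$$
P\left(x^2 + y^2 < 1\right) = \frac{\pi}{4},
\qquad
\hat{\pi}_N = \frac{4}{N} \sum_{i=1}^{N} \mathbf{1}\left[x_i^2 + y_i^2 < 1\right]
$$

Here $\mathbf{1}[\cdot]$ is 1 when the condition holds and 0 otherwise, and $N$ is the number of samples.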
I will now create a new ActivePaper (AP) and comment on the specific ways to define parameters, store data and create plots. The dependency on the plotting library matplotlib has to be declared when creating the ActivePaper:
mkdir pi_ap
cd pi_ap
aptool -p pi.ap create -d matplotlib
To generate a repeatable result, I store the seed for the random number generator, along with the number of samples N, in the AP:
aptool set seed 1780812262
aptool set N 10000
The lines above store two data elements of type integer in the AP. Their values can be accessed in the Python code of the AP.
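As a small illustration of why storing the seed makes the result repeatable, here is a sketch that uses Python's standard `random` module as a stand-in for NumPy's generator. The helper `draw` is a hypothetical name introduced only for this example.

```python
import random

def draw(seed, n=3):
    """Return n pseudo-random numbers from a generator initialized with seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.random() for _ in range(n)]

first = draw(1780812262)
again = draw(1780812262)   # same seed: same sequence
other = draw(1780812263)   # different seed: different sequence

print(first == again, first == other)
```

The same seed always yields the same sequence, which is what makes a stored seed sufficient to regenerate the samples.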
I will create several programs to mimic the workflow of more complex problems: one to generate the data, one to analyze the data and one for generating a figure.
The first program is generate_random_numbers.py
import numpy as np
from activepapers.contents import data
seed = data['seed'][()]
N = data['N'][()]
np.random.seed(seed)
data['random_numbers'] = np.random.random(size=(N, 2))
Apart from importing the NumPy module, I have also imported the ActivePapers data object:
from activepapers.contents import data
data is a dict-like interface to the content of the ActivePaper, and so only works in code that is checked into the ActivePaper and executed with aptool. data can be used to read values, such as the seed and the number of samples, and to store data, such as the samples here. The [()] indexing returns the value of a scalar dataset in HDF5. For more information on this, see the dataset documentation of h5py.
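To see what `[()]` and `[...]` do outside of an ActivePaper, here is a minimal sketch with plain h5py. It uses an in-memory file as a stand-in for the `data` object, so the file name `scratch.h5` and the dataset name `samples` are placeholders of mine, not part of ActivePapers.

```python
import h5py
import numpy as np

# In-memory HDF5 file: nothing is written to disk, the name is arbitrary.
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    f["seed"] = 1780812262            # stored as a scalar dataset, shape ()
    f["samples"] = np.zeros((5, 2))   # stored as an array dataset, shape (5, 2)

    seed_dataset = f["seed"]          # an h5py Dataset object, not a number
    seed_shape = seed_dataset.shape
    seed = seed_dataset[()]           # the stored scalar value itself
    samples = f["samples"][...]       # the full content as a NumPy array

print(seed_shape, seed, samples.shape)
```

Indexing with `[()]` is the scalar counterpart of `[...]`: both read the dataset's content into memory rather than returning a handle to it.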
The second program is compute_pi.py
import numpy as np
from activepapers.contents import data
xy = data['random_numbers'][...]
radius_square = np.sum(xy**2, axis=1)
N = len(radius_square)
data['estimator'] = np.cumsum(radius_square < 1) * 4 / np.linspace(1, N, N)
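The last line of compute_pi.py packs the whole running estimate into one NumPy expression. To make the steps explicit, here is a sketch of the same computation in plain Python, runnable outside of an ActivePaper. The function name `running_pi_estimates` is hypothetical, and the standard `random` module stands in for NumPy's generator, so the numbers will not match those stored in the AP.

```python
import random
from itertools import accumulate

def running_pi_estimates(points):
    """Return the estimate of pi after 1, 2, ..., N points of a list of (x, y)."""
    # 1 if the point falls inside the quarter disc of radius 1, else 0
    inside = [1 if x * x + y * y < 1 else 0 for x, y in points]
    # cumulative number of hits, the plain-Python analogue of np.cumsum
    hits = accumulate(inside)
    # after n samples the estimate is 4 * hits / n
    return [4 * h / n for n, h in enumerate(hits, start=1)]

rng = random.Random(1780812262)
points = [(rng.random(), rng.random()) for _ in range(10000)]
estimates = running_pi_estimates(points)

print(estimates[-1])  # final estimate after all samples
```

Each entry of `estimates` uses only the samples seen so far, which is why plotting it shows the estimate settling as the number of samples grows.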
And the third is plot_pi.py
import numpy as np
import matplotlib
matplotlib.use('PDF')
import matplotlib.pyplot as plt
from activepapers.contents import data, open_documentation
estimator = data['estimator']
N = len(estimator)
plt.plot(estimator)
plt.xlabel('Number of samples')
plt.ylabel(r'Estimation of $\pi$')
plt.savefig(open_documentation('pi_figure.pdf', 'w'))
Notice:
- The setting of the PDF driver for matplotlib before importing matplotlib.pyplot.
- The use of open_documentation. This function provides file descriptors that can read and write binary blobs.
Now, you can check in and run the code
aptool checkin -t calclet code/*.py
aptool run generate_random_numbers
aptool run compute_pi
aptool run plot_pi
Concluding words
That's it, we have created an ActivePaper and run code with it.
For fun: issue the command
aptool set seed 1780812263
(or any number of your choosing that is different from the previous one) and then
aptool update
ActivePapers handle dependencies! That is, everything that depends on the seed will be updated. That includes the random numbers, the estimator for pi and the figure. To see the update, check the creation times in the ActivePaper
aptool ls -l
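To illustrate the idea behind this Make-like update, here is a toy sketch of how the items affected by a change can be found from a dependency graph. This is not how ActivePapers implements it: the `DEPENDS_ON` table and the `stale_items` function are hypothetical, and only the item names are taken from the example above.

```python
from collections import deque

# Hypothetical dependency table: each item maps to the items it is computed from.
DEPENDS_ON = {
    "random_numbers": ["seed", "N"],
    "estimator": ["random_numbers"],
    "pi_figure.pdf": ["estimator"],
}

def stale_items(changed, depends_on):
    """Return every item that depends, directly or indirectly, on `changed`."""
    # Invert the table: for each input, which items use it directly?
    used_by = {}
    for item, inputs in depends_on.items():
        for source in inputs:
            used_by.setdefault(source, []).append(item)
    # Breadth-first walk from the changed item through its dependents
    found, seen, queue = [], {changed}, deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                found.append(dependent)
                queue.append(dependent)
    return found

print(stale_items("seed", DEPENDS_ON))
```

For this linear chain, the order in which the items are found is also a valid order in which to recompute them. A graph with branches that join again would need a proper topological sort.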
It is good to know that ActivePapers have been used as companions to research articles! See Protein secondary-structure description with a coarse-grained model: code and datasets in ActivePapers format for instance.
You can have a look at the resulting files that I uploaded to Zenodo: doi:10.5281/zenodo.55268
References
ActivePapers paper: K. Hinsen, ActivePapers: a platform for publishing and archiving computer-aided research, F1000Research 3, 289 (2015).
ActivePapers website: The website for ActivePapers