## Mesoscopic simulations of nanomotors with OpenCL

The simulation of nanomotors is an integral part of my research. For this blog article, I have chosen to present nano-dimer, a simulation software written by my friend Peter Colberg. nano-dimer generates OpenCL code on the fly to study chemically powered dimer nanomotors with a hybrid algorithm that couples Molecular Dynamics to MPCD (multiparticle collision dynamics).

## Overview

I explain how the program nano-dimer by Peter Colberg is structured and how it benefits from a mixed Lua-OpenCL programming approach. More precisely, I take a look at the Lua code and explain what the variables represent and how the OpenCL code is called. nano-dimer is MIT licensed.

I am not reviewing the science behind the simulation. The homepage of the software has good references for the methods.

A bit of motivation though: Lua is a "powerful, fast, lightweight, embeddable scripting language" (from the homepage of Lua). It has simple yet powerful data structures and interfaces very well with C. The just-in-time implementation LuaJIT executes loops quickly, which is useful for scientific codes.

More motivation: nano-dimer is the only open-source and documented implementation of chemically powered motors with an MPCD solvent. My own code is open but not documented, and it is currently undergoing a large refactoring and rewriting.

## Installation and program structure

The installation page of nano-dimer contains all the needed information. To make it easier to follow, here are a few guidelines:

1. Use a package manager to install: a recent version of GCC, LuaJIT, git, OpenCL, HDF5.
2. Install LuaRocks, as advised in the documentation of nano-dimer. You must install LuaRocks yourself so that it uses LuaJIT; you cannot rely a priori on the system-provided version. If you lack administrator access to your machine, luarocks accepts the --local option to install in your home directory.
3. Use luarocks to install: opencl, hdf5, templet, ljsyscall, lua-cjson.
4. You are ready to download nano-dimer! (see the installation page). At the time of writing, the current git version was 1.0.0-57-g65987aa.

In the nano-dimer directory, type make test to check that everything is running. If that is ok, you may proceed to the next paragraph. It is good to know that make is not used to compile anything! There is nothing to compile prior to execution, a real convenience for development.

The code comes with examples, which you may run as directed in the README:

```shell
source examples/env.sh
cd examples/single_dimer/equilibration/
luajit single_dimer.lua config.lua
```

All the code that is common to dimer nanomotors is found in the nanomotor directory. This library is then used in several applications corresponding to different physical setups.

## What is going on?

What happens when the code is run? Since most of the code is organised in the library nanomotor, the file single_dimer.lua is really short. It starts by loading a number of Lua modules and reads the file config.lua.

Then, "physical" components are initialized. The object box contains the size of the box and the minimum image function that is used for the periodic boundary conditions. Inspection of the file box.lua reveals a pattern that will be used repeatedly. The file defines a function that returns an object embedding data (e.g. L for the box size) and functions (e.g. mindist). When the code box = nm.box(args) is executed, the main function defined in box.lua is executed with arguments args and what is returned is the Lua table with name self in box.lua.
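To make the pattern concrete, here is a minimal sketch of a function that returns a table holding both data and functions. The names make_box, L and mindist follow the text above, but the body is my own assumption for illustration and is not the code of box.lua.

```lua
-- Hypothetical sketch of the "function returning a table" pattern.
-- Not the actual content of box.lua.
local function make_box(args)
  local self = { L = args.L }  -- box edge lengths, e.g. {10, 10, 10}

  -- Minimum image convention: shortest displacement between two points
  -- in a periodic box, computed component by component.
  function self.mindist(r1, r2)
    local d = {}
    for i = 1, #self.L do
      local x = r1[i] - r2[i]
      d[i] = x - self.L[i] * math.floor(x / self.L[i] + 0.5)
    end
    return d
  end

  return self
end

local box = make_box({ L = {10, 10, 10} })
-- Two points close to opposite faces are neighbours through the boundary.
local d = box.mindist({9.5, 0, 0}, {0.5, 0, 0})
print(d[1], d[2], d[3])
```

The two points are 9 apart along x inside the box, but only 1 apart through the periodic boundary, so the first component of the result is -1.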

The object dom contains the data structure for the particles: positions, velocities, etc. As with box.lua, the file domain.lua defines a single function that returns data and functions in a single object.

The extreme flexibility of Lua is used to define the data structure, and there is very little syntactical noise. One data structure of particular interest is the table: a simple declaration allows you to collect objects into a containing table.

Consider a table initialized as

```lua
mytable = { key1 = 3, key2 = 'a' , [5] = 17}
```

Here, key1 = 3 defines an entry with the string key 'key1', which can be retrieved via a dot-based syntax, and [5] = 17 is the general form, in which the table's key is 5. Any Lua value except nil (and NaN) can be used as a key. You can test this in a luajit console, and retrieve the content via

```lua
print(mytable.key1);
print(mytable['key1']);
print(mytable[5]);
```

Lua regards the keys 1, 2, 3, etc. as special. They form a sequence whose length is returned by #mytable. Accessing an undefined table key returns nil. Only string keys that are valid identifiers can be accessed via the dot syntax (i.e. mytable.5 does not work).
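The sequence part and the nil behaviour can be checked with a few lines. This is a small standalone sketch; the table t and its content are made up for the example.

```lua
-- A table mixing a sequence (keys 1, 2, 3) and a string key.
local t = { 10, 20, 30, name = "solvent" }

print(#t)          -- length of the sequence part only
print(t.name)      -- same as t["name"]
print(t.missing)   -- an undefined key gives nil, not an error

-- Appending at index #t + 1 extends the sequence.
t[#t + 1] = 40

-- ipairs walks the sequence in order and ignores the string key.
for i, v in ipairs(t) do
  print(i, v)
end
```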

Given the layered architecture of nano-dimer, it is good to know where the data is actually stored. Take, for instance, the solvent positions. They are stored in dom.rs as a chunk of memory of type cl_double3. The memory is managed by the FFI library of LuaJIT. A corresponding buffer dom.d_rs is created for the OpenCL device. Data is moved between the two locations only if necessary.
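As an illustration of host-side storage with the LuaJIT FFI, here is a minimal sketch. It needs LuaJIT, since the ffi module is not part of standard Lua. The type name my_double3 is a stand-in I define myself, with four doubles because OpenCL gives 3-component vectors the size of 4-component ones; how nano-dimer actually declares and allocates dom.rs may differ.

```lua
-- Requires LuaJIT: the ffi module is not available in standard Lua.
local ffi = require("ffi")

-- Assumption: a hand-written stand-in for cl_double3 (x, y, z plus padding).
ffi.cdef[[
typedef struct { double x, y, z, w; } my_double3;
]]

local N = 1000
-- Variable-length array of N structs, zero-initialized by ffi.new.
local rs = ffi.new("my_double3[?]", N)

-- cdata arrays are indexed from 0, unlike Lua tables.
for i = 0, N - 1 do
  rs[i].x, rs[i].y, rs[i].z = i * 0.1, 0, 0
end

-- Size in bytes of the host array, i.e. what a device buffer would need.
print(ffi.sizeof(rs))
```

Because the array is plain C memory, it can be handed to a C library such as OpenCL without conversion, which is what makes the host/device pairing of dom.rs and dom.d_rs cheap.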

The dimer and the solvent are then placed at random in the simulation domain. As the code is compiled just in time (JIT), this Lua code runs very fast!

Next, the integrate object is created. Again, this is a well-organized collection of data and functions. Jumping a few lines, we get to the observables table. This is a list of coroutines that are evaluated during the integration of the system.

The observables are run at regular intervals. This logic lies in the file observe.lua, which is documented in detail. The function observe integrates the system and lets the observables be computed, with their inner state preserved between evaluations thanks to the coroutine machinery.
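The idea of a coroutine that keeps its state between evaluations can be sketched in a few lines. Everything here is hypothetical: running_average, the step counter and the interval are my own illustration and not the interface of observe.lua.

```lua
-- Hypothetical observable: a running average of a sampled quantity.
-- count and sum are locals of the coroutine and survive between calls.
local function running_average(sample)
  return coroutine.wrap(function()
    local count, sum = 0, 0
    while true do
      count = count + 1
      sum = sum + sample()
      coroutine.yield(sum / count)
    end
  end)
end

-- Stand-in for the integrator: "step" is just a counter here.
local step = 0
local observables = { running_average(function() return step end) }

local interval = 10
local last
for i = 1, 50 do
  step = i                      -- the real code would integrate the system
  if i % interval == 0 then
    for _, obs in ipairs(observables) do
      last = obs()              -- resume the coroutine until its next yield
    end
  end
end
print(last)
```

The observable is sampled at steps 10, 20, 30, 40 and 50, so the final average is 30. No explicit object is needed to hold count and sum: the suspended coroutine is the state.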

## So, this runs with OpenCL?

So far, we have only touched on the Lua part of the code. But this code is advertised for OpenCL (thus targeting multicore CPUs, GPUs and other accelerator devices), so how does that work? To understand this part, I have chosen a short part of the code: species.lua and species.cl.

species.lua is structured similarly to other Lua files in the program: it defines a function that returns structured data and functions. One of the functions contains an OpenCL program (see the bit below).

```lua
local program = compute.program(context, "nanomotor/species.cl", {
  dom = dom,
  species = species,
})
-- ...
local kernel = program:create_kernel("species_sum")
-- ...
kernel:set_arg(0, dom.d_sps)
```

This call loads the OpenCL code in species.cl and runs it through a templating engine (lua-templet, also by Peter Colberg) to fill in some variables, namely those that start with a dollar sign. The templating engine also allows the execution of any Lua code within a template, using a pipe at the beginning of a line. The code below, from species.cl, executes a for loop in Lua to generate several OpenCL statements, one per species.

```
|for i = 0, #species-1 do
if (sp == ${i}) count.s${i}++;
|end
```

This feature is used for the conditional generation of optional algorithms, so that only the code that is relevant for the parameter set is compiled to OpenCL. In lj.cl, for instance, in the absence of the parameter wall, the code for the wall potential is not even sent to the OpenCL compiler. More complex examples are found in random.cl and hilbert.cl.
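To see how such a template can turn into plain OpenCL source, here is a toy expander written for this article. It is not lua-templet and does not reproduce its API: the function expand, the wall_force line and the handling of ${...} are assumptions that only mimic the two rules described above (pipe lines are Lua, ${...} is substituted). It also does not support quotes inside ${...}.

```lua
-- Toy template expander (NOT lua-templet): translates the template into
-- Lua code that builds the output line by line, then runs that code.
local function expand(template, env)
  local code = { "local out = {}" }
  for line in (template .. "\n"):gmatch("(.-)\n") do
    if line:sub(1, 1) == "|" then
      code[#code + 1] = line:sub(2)          -- pipe line: verbatim Lua
    else
      -- Quote the text, then splice each ${expr} in as a concatenation.
      local quoted = string.format("%q", line)
      quoted = quoted:gsub("%${(.-)}", '" .. tostring(%1) .. "')
      code[#code + 1] = "out[#out + 1] = " .. quoted
    end
  end
  code[#code + 1] = 'return table.concat(out, "\\n")'

  -- Template variables come from env, everything else from the globals.
  local sandbox = setmetatable({}, { __index = function(_, k)
    local v = env[k]
    if v == nil then v = _G[k] end
    return v
  end })

  local src = table.concat(code, "\n")
  local chunk
  if setfenv then                            -- Lua 5.1 and LuaJIT
    chunk = assert(loadstring(src))
    setfenv(chunk, sandbox)
  else                                       -- Lua 5.2 and later
    chunk = assert(load(src, "template", "t", sandbox))
  end
  return chunk()
end

local template = [[
|for i = 0, #species-1 do
if (sp == ${i}) count.s${i}++;
|end
|if wall then
f += wall_force(r);
|end]]

-- Two species, no wall parameter: the wall line is never generated.
local result = expand(template, { species = { "A", "B" } })
print(result)
```

With two species and no wall, the output consists of the two lines `if (sp == 0) count.s0++;` and `if (sp == 1) count.s1++;`. Passing wall = true in the second argument makes the wall line appear as well, which is the conditional generation at work.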

The code is packaged into a kernel whose arguments must be set explicitly. Here, for instance, dom.d_sps, the array containing the species of the solvent particles, is given as argument number 0. The kernel can then be enqueued for execution on the OpenCL device, e.g. a GPU.

This is where the magic happens: Lua is used to manage the data and organization of the code and to prepare OpenCL kernels with only the minimum amount of code necessary.