Pierre de Buyl's homepage

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article is itself short, and its publication helps the work on the software gain recognition. I share here my point of view on what makes some software tools more ready to be published in JOSS. I do not comment on the size or the relevance for research, which are both documented on JOSS' website.

What are the requirements?

JOSS lists the following requirements:

  • The software should be open source as per the OSI definition.
  • The software should have an obvious research application.
  • You should be a major contributor to the software you are submitting.
  • Your paper should not focus on new research results accomplished with the software.
  • Your paper (paper.md and BibTeX files, plus any figures) must be hosted in a Git-based repository together with your software (although they may be in a short-lived branch which is never merged with the default).

and a few others about how the software is hosted.

Beyond those formal requirements, the documentation also states that JOSS submissions should represent a significant scholarly effort. When necessary, that criterion is assessed by the editorial board.

It is difficult to have guidelines that cover 100% of software types: straightforward algorithm implementations, packages suited to a specific field or application, generic data-processing tools, simulation codes, etc. To be transparent about the criteria, the reviewing process revolves around a checklist. Make sure to read it if you consider submitting an article to JOSS.

Our documentation does not go into the detail of code coverage, unit testing, integration testing, API design, extent of the documentation, package managers, etc. This is for a good reason: those depend on the type of code, on the programming language, and on the foundational nature of the code. Regarding this last item, I recommend Konrad Hinsen's article about software collapse (Hinsen 2019). Hinsen defines the layers of the scientific software stack. The first layer consists of the operating system, the compilers, etc. The fourth and uppermost layer is the project-specific software, including scripts and analysis notebooks. Before getting the submission into review, or during the review, the editors and reviewers will think of how your code will be used and include that consideration in their assessment.

As an example of how I would take the depth of a scientific code into account: I would be less strict about good practices for a code performing a specific task in a subfield of research than for NumPy or a generic Monte Carlo package, for instance.

Good practices for Python

One generic piece of advice I would give to all scientists developing software is to adopt the best practices of the chosen programming language. I use Python as an example here, as I know it well. I deliberately give principles and do not enter the discussion of whether you should use a pyproject.toml or setup.py file, for instance. Consider the description below a guide more than a set of strict rules whose violation would lead to rejection.

An overall introduction to good practices can be found in the article "Good enough practices in scientific computing" (Wilson et al 2017).

Code organisation

In the code repository, the package "P" will typically live under the directory P that contains the full module. Apart from single-file modules, the entry point for the package is P/__init__.py.

The root directory contains the setup file (here setup.py is the most common choice but not the only one), the requirements.txt file for pip, etc.
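As an illustration, a repository for a hypothetical package "P" might be laid out as follows; all file and function names here are invented for the example, a sketch rather than a mandated template:

```python
# Hypothetical layout for a package "P" (all names are illustrative):
#
#   repository-root/
#   ├── P/                  # the package itself
#   │   ├── __init__.py     # entry point of the package
#   │   └── core.py         # implementation modules
#   ├── tests/
#   │   └── test_core.py    # unit tests, discovered by pytest
#   ├── setup.py            # or pyproject.toml
#   ├── requirements.txt    # dependencies for pip
#   └── README.md
#
# P/__init__.py typically exposes the public API, e.g.:
#
#   from .core import simulate
#   __version__ = "0.1.0"
```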


Testing

For scientific code, one can use two types of testing: unit testing and validation against known results. Unit tests should ideally use a common tool (pytest or unittest, for instance). Validation can be run as dedicated programs that also illustrate how to use the library or tool. Testing is a means for others to know why they should trust your code; consider it a good thing, not a burden!
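As a minimal sketch of both kinds of tests, here is a hypothetical pytest file for an imaginary midpoint integration routine (the routine and test names are invented for the example):

```python
import math


def midpoint_rule(f, a, b, n):
    """Integrate f over [a, b] with n midpoint evaluations (illustrative code)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def test_constant():
    # Unit test: the midpoint rule is exact for a constant integrand.
    assert math.isclose(midpoint_rule(lambda x: 1.0, 0.0, 2.0, 10), 2.0)


def test_known_result():
    # Validation against a known analytical value:
    # the integral of sin(x) over [0, pi] equals 2.
    assert math.isclose(midpoint_rule(math.sin, 0.0, math.pi, 1000),
                        2.0, rel_tol=1e-5)
```

Running `pytest` in the repository discovers and runs both tests: the first checks an exact property, the second validates the implementation against an analytical result.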


Documentation

There are two types of documentation for a scientific code: the API documentation, including module-level and function-level documentation, and a "user guide", which I also call narrative documentation. The latter is too often missing but constitutes a good entry point for users. It should give an overview of the code's features, installation instructions, the scientific rationale (references to algorithms, for instance), and maybe a tutorial.
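For the API side, a docstring on each public function goes a long way; tools such as Sphinx can render them into reference pages. A hypothetical function with a NumPy-style docstring (the function is invented for the example):

```python
def running_mean(values, window):
    """Return the running mean of a sequence.

    Parameters
    ----------
    values : sequence of float
        Input data.
    window : int
        Number of consecutive values to average.

    Returns
    -------
    list of float
        The running means; the result has ``len(values) - window + 1`` entries.

    Examples
    --------
    >>> running_mean([1.0, 2.0, 3.0, 4.0], 2)
    [1.5, 2.5, 3.5]
    """
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

The `Examples` section doubles as a doctest, so the documentation itself can be checked for correctness.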

What about language X?


C++

C++ makes up a non-negligible fraction of JOSS articles, although mostly in combination with a higher-level language. C++ code should be installable on almost any computer. Depending on the complexity of the code, a Makefile can be sufficient to build and install it, or it may be best to use a standard solution such as autotools or CMake. In the case of header-only libraries, dropping the header files into a project might be sufficient.

The use of templates and inheritance should also be proportional to the generic character of the code. There is no need for layers of abstraction in simple projects; they will only confuse the users and the reviewers.


R

R packages must follow a specific organization: an R directory, the DESCRIPTION and NAMESPACE files, etc. Make sure to follow the relevant documentation. Some projects create a website for the documentation; others rely on the .Rd or .Rmd files. Make sure to have both a "user guide" type of documentation and function documentation.


Whatever the language, the guiding principles remain the same. Reviewers and developers familiar with language X should be able to easily install, use, and read your code. The code should be well organized and commented where necessary, functions should be documented, etc.

All of the rest

The usability of a code depends on other, smaller things, among them the ease of installation, the number and quality of the dependencies, and the readability of the code. Commented-out code, leftover .o object files, and a poorly written or absent README file all give a sense of incompleteness or neglect to a codebase. As obvious as it may seem, many editors, reviewers, and future users will first read the README file of your code.


This blog post does not cover all the content of a JOSS article. I wrote it to describe a requirement that we have not yet written out in full, because finding a set of rules to enforce strictly for submissions is not a solved problem. I hope that, by going over examples of good practice, prospective JOSS authors can assess in advance whether their code will appear well built for its purpose. If yes, there remain questions of scholarly effort, size, licensing, and research suitability. If no, either the editors will reject the paper (sometimes asking the authors to come back once due diligence has taken place) or the reviewers will, as often happens, insist that more effort be put into the code's "good practices", in relation to the type of code and the field of research.

In line with JOSS' goal of improving the quality of the software submitted, keep in mind that the effort to achieve JOSS-readiness is not only about getting the article published: it is a good investment in your codebase!

Thanks to Dan Katz for feedback on a draft of this article.


  • (Hinsen 2019) Konrad Hinsen. Dealing with software collapse, Computing in Science and Engineering 21(3), pp 104-108 (2019). preprint

  • (Wilson et al 2017) Greg Wilson, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt and Tracy K. Teal. Good enough practices in scientific computing, PLoS Comput Biol 13(6): e1005510 (2017).

