Pierre de Buyl's homepage

JupyterHub on Ubuntu

JupyterHub serves Jupyter notebooks to several users through a web interface. Its website recommends two main deployment methods: /The Littlest JupyterHub/ (TLJH) and a Kubernetes deployment. I tried the former and had trouble diagnosing some configuration issues. In the following, I show how to deploy JupyterHub from pip on Ubuntu 22.04. I must add that I got into this setup after discovering, on the Jupyter Discourse forum, that there is a .deb package for JupyterHub, and I thought it would work all right. I failed to debug it: since the package comes fully configured, the error I got (a 404 after logging in) could come from the systemd service, the hub configuration, or the spawner configuration.

Note: the nginx configuration was updated on 2024-01-11.

Goals, requirements, design choices

My goals and requirements, mixed together (some are really specific to my workplace):

  • Provide a self-hosted JupyterHub, with user login (so, no simpler JupyterLab deployment).
  • Ability to mount institutional NFS filesystems.
  • Ability to mount the team NFS shares as well.
  • Limit local disk usage, as the machine does not have much storage.
  • Users must be on the LDAP system; the default PAM authentication works well for this purpose.
  • Run on stock Ubuntu 22.04.
  • Avoid Kubernetes: I only have a single machine, so a Kubernetes deployment would make no sense.
  • Avoid TLJH, as its documentation reports that the installation is not fully reversible. Here, almost all the configuration lives in the home directory of the jupyterhub user.

The order of priority for the installation is to rely first on apt packages available in the Ubuntu repository, then on a pip install in a virtual environment. In my experience, this gives an easy-to-maintain system for the core functionality.

For their environments, users will use a Miniforge installation whose base installation is shared.

For reference:

Installation of JupyterHub and sudospawner

First install some dependencies using apt (lines starting with a dollar sign are meant to be typed in the terminal, without the leading dollar sign).

$ sudo apt install node-configurable-http-proxy jupyterhub python3-jupyterlab-server python3.10-venv
$ sudo apt remove python3-jupyterlab-server

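Add a dedicated jupyterhub user. This user will have sudo rights for the spawner, and its home directory will contain the pip-installed JupyterHub. Also create a group, jupyterhub_users, for the accounts allowed to use the hub.

$ sudo useradd --create-home jupyterhub
$ sudo groupadd jupyterhub_users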

I chose to install the virtual environment for JupyterHub under the jupyterhub user, so that everything related to JupyterHub sits in the same location on the filesystem (apart from the sudoers file, the nginx configuration, and the systemd service file). Run the following commands as the jupyterhub user, from its home directory (for instance after sudo -iu jupyterhub):

$ python3 -m venv JH01
$ . ~/JH01/bin/activate
$ pip install -U pip
$ pip install wheel jupyterhub jupyterlab notebook
$ git clone https://github.com/jupyterhub/sudospawner
$ cd sudospawner
$ pip install .
$ cd
$ jupyterhub --generate-config

The last line generates the configuration file /home/jupyterhub/jupyterhub_config.py, which I will use later.

To make the hub listen on the loopback interface only (nginx, configured below, exposes it to the network), add the following to the configuration file:

c.JupyterHub.bind_url = 'http://127.0.0.1:8000'
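
Only the bind address is shown here. For the hub to start single-user servers through sudospawner, the configuration file presumably also needs to select that spawner. The fragment below is a sketch of such lines, assuming the paths used in this article: spawner_class and sudospawner_path are settings documented by sudospawner, while allowed_groups is an optional extra that rejects logins from accounts outside jupyterhub_users. It is meant to be pasted into jupyterhub_config.py, where JupyterHub provides the c object, not run on its own.

```python
# Sketch of additional lines for /home/jupyterhub/jupyterhub_config.py
# (assumed, not copied from the setup described in this article).
# `c` is injected by JupyterHub when it loads this file.

# Use sudospawner instead of the default LocalProcessSpawner, which can only
# start servers for other users when the hub itself runs as root.
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'

# Same path as the Cmnd_Alias in the sudoers file.
c.SudoSpawner.sudospawner_path = '/home/jupyterhub/JH01/bin/sudospawner'

# Optional: refuse logins from accounts outside the jupyterhub_users group.
c.PAMAuthenticator.allowed_groups = {'jupyterhub_users'}
```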

Execute the following for every user who needs access to the JupyterHub service. The -a flag appends the group; without it, usermod -G replaces the user's whole list of supplementary groups.

$ sudo usermod -aG jupyterhub_users USERNAME
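
Since the sudoers rule below grants spawning rights to members of this group, it can be worth verifying that an account really is a member. Below is a minimal sketch using Python's standard grp and pwd modules; the file name check_group.py and the function user_in_group are hypothetical. Because these modules query the system name service, LDAP accounts should show up as well, assuming NSS is configured for LDAP on the machine.

```python
"""Hypothetical check_group.py: is an account in the jupyterhub_users group?"""
import grp
import pwd
import sys


def user_in_group(username: str, groupname: str) -> bool:
    """Return True if groupname is the primary or a supplementary group of username."""
    try:
        group = grp.getgrnam(groupname)
        user = pwd.getpwnam(username)
    except KeyError:
        # Unknown user or unknown group.
        return False
    # The primary group is stored in the passwd entry; supplementary
    # memberships (what usermod -aG adds) are listed in the group entry.
    return user.pw_gid == group.gr_gid or username in group.gr_mem


if __name__ == "__main__":
    # Usage: python3 check_group.py alice bob ...
    for name in sys.argv[1:]:
        status = "yes" if user_in_group(name, "jupyterhub_users") else "no"
        print(f"{name}: {status}")
```

For example, python3 check_group.py alice bob prints one yes/no line per account.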

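Put the following in a sudoers file; I chose /etc/sudoers.d/jupyterhub. Editing it with sudo visudo -f /etc/sudoers.d/jupyterhub checks the syntax before the file is saved.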

# the command(s) the Hub can run on behalf of members of jupyterhub_users without needing a password
# the exact path may differ, depending on how sudospawner was installed
Cmnd_Alias JUPYTER_CMD = /home/jupyterhub/JH01/bin/sudospawner

# actually give the Hub user permission to run the above command on behalf
# of members of the jupyterhub_users group without prompting for a password

jupyterhub ALL=(%jupyterhub_users) NOPASSWD:JUPYTER_CMD
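
A syntax error in a file under /etc/sudoers.d can break sudo altogether, which is why visudo is the usual way to edit these files. For scripted setups, here is a sketch of an alternative (not what I used): it renders the rule above, writes it to a temporary file whose name contains a dot (sudo skips such files in its include directory), validates it with visudo -c -f, and only then moves it into place with mode 0440. The script name install_sudoers.py, its function names and the --install flag are hypothetical; installing requires root.

```python
"""Hypothetical install_sudoers.py: validate, then install /etc/sudoers.d/jupyterhub."""
import os
import subprocess
import sys
import tempfile

SUDOERS_DIR = "/etc/sudoers.d"
SUDOSPAWNER = "/home/jupyterhub/JH01/bin/sudospawner"


def render_sudoers(hub_user: str, user_group: str, spawner_path: str) -> str:
    """Rule letting hub_user run sudospawner as any member of user_group."""
    return (
        f"Cmnd_Alias JUPYTER_CMD = {spawner_path}\n"
        f"{hub_user} ALL=(%{user_group}) NOPASSWD:JUPYTER_CMD\n"
    )


def install_sudoers(content: str, name: str = "jupyterhub") -> None:
    """Check content with visudo, then atomically move it into SUDOERS_DIR."""
    # sudo ignores files whose name contains a '.', so the temporary file
    # stays inactive until it is renamed.
    fd, tmp_path = tempfile.mkstemp(prefix=f".{name}-", suffix=".tmp", dir=SUDOERS_DIR)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o440)  # the conventional mode for sudoers files
        # visudo exits non-zero on bad syntax, raising CalledProcessError.
        subprocess.run(["visudo", "-c", "-f", tmp_path], check=True)
        os.replace(tmp_path, os.path.join(SUDOERS_DIR, name))
    except BaseException:
        # Leave nothing behind if writing or validation failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    text = render_sudoers("jupyterhub", "jupyterhub_users", SUDOSPAWNER)
    if "--install" in sys.argv[1:]:
        install_sudoers(text)  # requires root
    else:
        print(text, end="")  # dry run: show what would be written
```

Running it without arguments only prints the rule, which makes it easy to compare with the file above before using --install.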

Finally, I installed nginx

$ sudo apt install nginx

I wrote a minimal nginx configuration at /etc/nginx/sites-available/jupyterhub-reverse-proxy. Edit it if you want to add SSL certificates, configure a virtual host, etc.

Note: the first version of this article contained an incorrect nginx configuration file.

map $http_upgrade $connection_upgrade {
  default upgrade;
  ''      close;
}

server {
  client_max_body_size 0;

  location / {
    proxy_pass       http://localhost:8000;
    proxy_set_header Host      $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    # websocket headers
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Scheme $scheme;

    proxy_buffering off;
  }
}

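Enable the site by linking this file into /etc/nginx/sites-enabled, check the configuration, and restart nginx:

$ sudo ln -s /etc/nginx/sites-available/jupyterhub-reverse-proxy /etc/nginx/sites-enabled/
$ sudo nginx -t
$ sudo systemctl restart nginx

As this server block has no server_name, requests for http://localserver/ only reach it if the default site (/etc/nginx/sites-enabled/default, which is marked default_server) is disabled; otherwise, add a server_name.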
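
The map block at the top of the nginx configuration is what lets WebSocket connections, which the notebook interface uses to talk to kernels, pass through the proxy. Upgrade and Connection are hop-by-hop headers that nginx does not forward on its own, hence the explicit proxy_set_header lines, and proxy_http_version 1.1 is needed because HTTP/1.0 has no Upgrade mechanism. Purely as an illustration, the decision made by the map block can be written as a Python function (connection_header is a hypothetical name, not part of nginx):

```python
from typing import Optional


def connection_header(upgrade: Optional[str]) -> str:
    """Return the Connection header value nginx sends to JupyterHub.

    Mirrors `map $http_upgrade $connection_upgrade`: an empty or missing
    Upgrade header gives 'close', any other value gives 'upgrade'.
    """
    if not upgrade:
        return "close"
    return "upgrade"


# A kernel WebSocket request versus an ordinary page load.
print(connection_header("websocket"))  # upgrade
print(connection_header(None))         # close
```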

Configuration of a systemd service

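Using sudospawner and a systemd unit file, we can keep JupyterHub running as user jupyterhub and have it started automatically. Put the content below in a file named /etc/systemd/system/jupyterhub.service, then run sudo systemctl daemon-reload and sudo systemctl enable jupyterhub so that it starts at boot (enabling creates the link in /etc/systemd/system/multi-user.target.wants/).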

[Unit]
Description=JupyterHub

[Service]
Environment=PATH=/home/jupyterhub/JH01/bin:/usr/sbin:/usr/bin:/sbin:/bin VIRTUAL_ENV=/home/jupyterhub/JH01
Type=simple
Restart=on-failure
ExecStart=/home/jupyterhub/JH01/bin/jupyterhub --config /home/jupyterhub/jupyterhub_config.py

RestartSec=3
User=jupyterhub
Group=jupyterhub
WorkingDirectory=/home/jupyterhub

[Install]
WantedBy=multi-user.target

Configuration of additional kernels for users

By default, users only get access to the kernels installed in the environment of the hub (or in an environment that the spawner can access).

I like the idea of keeping the JupyterHub install separate from the users' environments.

Following the same principle as for the virtual environment, I installed conda (here, Miniforge in the home directory of jupyterhub).

After activating an environment, make sure to install the relevant packages. I started with:

(my-env)$ mamba install numpy matplotlib scipy

For JupyterHub to use a Python environment, install the ipykernel package in it and register the kernel in the hub's virtual environment (run this as the jupyterhub user, so that ${HOME} is /home/jupyterhub):

(my-env)$ mamba install ipykernel
(my-env)$ python -m ipykernel install --prefix ${HOME}/JH01 --name default-py3.12 --display-name "Default Python 3.12"
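
With --prefix, ipykernel writes the kernel specification to <prefix>/share/jupyter/kernels/<name>/kernel.json; since the single-user servers run with the Python of JH01, Jupyter searches that directory. The standard way to see registered kernels is jupyter kernelspec list from the JH01 environment. The sketch below reads the kernel.json files directly, mainly to show where they live; list_kernels is a hypothetical name and the default prefix is the one used in this article.

```python
"""List kernel specs registered under a prefix (hypothetical helper)."""
import json
from pathlib import Path


def list_kernels(prefix: str = "/home/jupyterhub/JH01") -> dict:
    """Map kernel directory name -> display name for <prefix>/share/jupyter/kernels."""
    kernels = {}
    kernel_root = Path(prefix) / "share" / "jupyter" / "kernels"
    # A missing directory simply yields no matches.
    for spec_file in sorted(kernel_root.glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        kernels[spec_file.parent.name] = spec.get("display_name", "")
    return kernels


if __name__ == "__main__":
    for name, display in list_kernels().items():
        print(f"{name}: {display}")
```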

Once this is done, you can start the jupyterhub program as user jupyterhub.

$ VIRTUAL_ENV=/home/jupyterhub/JH01 /home/jupyterhub/JH01/bin/jupyterhub --config /home/jupyterhub/jupyterhub_config.py

The command above starts JupyterHub manually with the configuration file mentioned earlier, in the virtual environment JH01; the Miniforge environment is available through the kernel registered in JH01.
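
Once the hub runs, one way to check both the hub and the nginx proxy is to query the root of JupyterHub's REST API, /hub/api, which returns the JupyterHub version as JSON without authentication. A minimal sketch with the standard library, to be run on the server itself; the name hub_version is hypothetical, and the two URLs assume the bind_url and nginx setup described above.

```python
"""Query JupyterHub's API root directly and through nginx (hypothetical check)."""
import json
import urllib.request
from typing import Optional


def hub_version(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the version reported at <base_url>/hub/api, or None on failure."""
    url = base_url.rstrip("/") + "/hub/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response).get("version")
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # a non-JSON answer (e.g. an nginx error page) raises ValueError.
        return None


if __name__ == "__main__":
    # The proxy started by JupyterHub (bind_url), then nginx on port 80.
    for base in ("http://127.0.0.1:8000", "http://localhost"):
        print(f"{base}: {hub_version(base) or 'no answer'}")
```

If the first URL answers but the second does not, the problem lies in the nginx configuration rather than in JupyterHub.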

Result and caveats

The server is available on the local network as http://localserver/. Any user registered in LDAP or with a local account on the computer, and part of the jupyterhub_users group, can log in with their existing credentials.

There are a number of limitations with the configuration presented here:

  1. There is no resource management. The server has 40 cores and about 100 GB of RAM, shared by all users without any prioritization.
  2. There is no possibility to "scale out".
  3. I did not set up SSL here. The JupyterHub documentation provides useful information on setting up a proxy, which I won't repeat: https://jupyterhub.readthedocs.io/en/stable/howto/configuration/config-proxy.html
  4. Users don't have write permissions to the conda install and depend on me for installing packages and managing the environments.
  5. There is no user isolation in the sense expected from a web service: users can open a console, access mount points, etc. This is, of course, part of how Jupyter works, but it is worth mentioning as other setups provide better security. Here, all users could access the machine via SSH anyway, so it does not matter much.

There are a few benefits though:

  1. Only the following files go into the system configuration: /etc/nginx/sites-available/jupyterhub-reverse-proxy, /etc/sudoers.d/jupyterhub, and the systemd unit file.
  2. Apart from that, there are the jupyterhub user and the jupyterhub_users group definition.
  3. The configuration can be reset without affecting the operating system configuration.
  4. It is possible to add Jupyter kernels easily.

Feedback welcome (via email or https://fediscience.org/@pdebuyl while I figure out how to set up fediverse comments).
