How to run JupyterHub in a Docker Swarm environment using SwarmSpawner

I recently received a request to run JupyterHub in a Docker environment. The goal was to scale the platform horizontally, taking into account that each customer gets a new container and each container consumes resources. JupyterHub is a great project, but because it is community-driven, its support is very limited. To run JupyterHub with SwarmSpawner you need a Docker Swarm environment. If you don't have one, you can find here a tutorial on how to set it up.

Let's get started. As usual, I am using a Debian environment for this project. First, pull the JupyterHub image from Docker Hub into your swarm cluster:

# docker pull jupyterhub/jupyterhub:0.9.2

If you don't have it yet, install docker-compose on your swarm master. The fastest way is to install it via pip, which requires a Python 3 environment.

# apt update
# apt install python3 python3-pip -y
# python3 -m pip install docker-compose

First we need to create a Dockerfile, because we need to add some modules to the jupyterhub container image:

# base image: jupyterhub
# this is built by docker-compose
# from the root of this repo
ARG JUPYTERHUB_VERSION=0.9.2
FROM jupyterhub/jupyterhub:${JUPYTERHUB_VERSION}
# install dockerspawner from PyPI
RUN pip install --no-cache dockerspawner
# install dummyauthenticator
RUN pip install --no-cache jupyterhub-dummyauthenticator
# load example configuration
ADD examples/swarm/jupyterhub_config.py /srv/jupyterhub/jupyterhub_config.py

Now we can create the docker-compose.yml with the following content. The build context is ../.., so the Dockerfile path and the ADD path above are resolved relative to that directory. The env_file entry also expects a .env file next to docker-compose.yml (it may be empty):

version: "3"
services:
  hub:
    # build an image with SwarmSpawner and our jupyterhub_config.py
    env_file: .env
    build:
      context: "../.."
      dockerfile: "Dockerfile"
    # mount the docker socket
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py"
    networks:
      - jupyterhub-net
    ports:
      - "80:8000"
networks:
  jupyterhub-net:
    driver: overlay

The next step is to create a configuration file, jupyterhub_config.py, with the following content:

c.ConfigurableHTTPProxy.should_start = True
c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
#c.Spawner.default_url = '/lab'
c.JupyterHub.spawner_class = 'dockerspawner.SwarmSpawner'
c.SwarmSpawner.network_name = "swarm_jupyterhub-net"
c.SwarmSpawner.extra_host_config = { 'network_mode': "swarm_jupyterhub-net" }
#c.SwarmSpawner.extra_create_kwargs.update({ 'volume_driver': 'local' })
c.SwarmSpawner.remove_containers = True
c.SwarmSpawner.debug = True
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_port = 8081
c.SwarmSpawner.host_ip = "0.0.0.0"

# TLS config

c.JupyterHub.port = 8000
#c.JupyterHub.ssl_key = os.environ['SSL_KEY']
#c.JupyterHub.ssl_cert = os.environ['SSL_CERT']

c.SwarmSpawner.http_timeout = 300
c.SwarmSpawner.start_timeout = 300

This configuration uses the dummyauthenticator that we installed earlier via the Dockerfile. By default JupyterHub uses the PAM authenticator, which means the users must exist, with passwords, inside the hub container before they can authenticate. For testing purposes dummyauthenticator is convenient because you can log in with any string.
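
If logging in with any string is too open even for a test setup, recent releases of jupyterhub-dummyauthenticator accept one shared password through the password option. The following is only a sketch: the variable name DUMMY_PASSWORD and the helper shared_password are made up for this example, and the value could come from the .env file that the hub service loads.

```python
import os
from typing import Mapping, Optional


def shared_password(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the shared login password, or None when it is unset or blank.

    DUMMY_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    value = env.get("DUMMY_PASSWORD", "").strip()
    return value or None


# In jupyterhub_config.py, where JupyterHub provides the `c` object:
# c.DummyAuthenticator.password = shared_password()
```

When the helper returns None the authenticator keeps its default behaviour and accepts any password.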

Now it is time to build the image.

# docker-compose build

Once the build process is done, you can see the image by running docker images; its name will be swarm_hub, with the latest tag.

Now we need to create another file, deployment.yml, which is almost the same as the docker-compose.yml created before. The only difference is that instead of the build option it uses image to reference the previously built image. Alternatively, you can modify docker-compose.yml to match the file below:

version: "3"
services:
  hub:
    # use the image built earlier, with SwarmSpawner and our jupyterhub_config.py
    env_file: .env
    image: swarm_hub:latest
    # mount the docker socket
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py"
    networks:
      - jupyterhub-net
    ports:
      - "80:8000"
networks:
  jupyterhub-net:
    driver: overlay

Now we are ready to deploy our stack to docker swarm.

# docker stack deploy -c deployment.yml swarm
Creating network swarm_jupyterhub-net
Creating service swarm_hub

To check the running services you can use the following command:

# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
t8pbx342jch4        swarm_hub           replicated          1/1                 swarm_hub:latest    *:80->8000/tcp

To see the logs of the service you can use:

# docker service logs swarm_hub

To follow the logs, add the -f switch to the command above.

Now you can open the browser and point it to the public IP of the master node.
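
If you prefer to confirm from a script that the hub answers before opening the browser, a small check like the one below may help. It is a minimal sketch that assumes the hub serves its login page at /hub/login on the published port; the function name hub_is_up and the example address are placeholders, not part of JupyterHub.

```python
import urllib.error
import urllib.request


def hub_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the JupyterHub login page answers with HTTP 200."""
    url = base_url.rstrip("/") + "/hub/login"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    # Replace the placeholder with the public IP of your master node.
    print(hub_is_up("http://127.0.0.1"))
```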

Use any string as the username to log in to JupyterHub. No password is required at this step. If all goes well, you will be logged in and a notebook server will be started for you.

If the notebook server did not start automatically, click the "Start my server" button in the middle of your JupyterHub dashboard. If we run docker service ls in the console, we will see that a new service has been started:

# docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                       PORTS
x5e8kbevmwcm        jupyter-z           replicated          1/1                 jupyterhub/singleuser:0.9   
t8pbx342jch4        swarm_hub           replicated          1/1                 swarm_hub:latest            *:80->8000/tcp

But if we run docker ps on the master, we can see that the notebook is not running there. Checking the workers shows that the notebook container has actually started on a worker machine:

root@sworker:~# docker ps
CONTAINER ID        IMAGE                       COMMAND                  CREATED             STATUS              PORTS               NAMES
d7cc01ffafd4        jupyterhub/singleuser:0.9   "tini -g -- start-no…"   40 seconds ago      Up 34 seconds       8888/tcp            jupyter-z.1.64trd1wjz1mobpm0fcf8o7g48
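
The docker service ls check can also be done from a script. The sketch below parses the column-aligned text that the command prints and lists the notebook services. It assumes that notebook services are named with a jupyter- prefix, as in the output shown earlier; the function names are made up for this example. Splitting on runs of spaces only works while the columns to the left of a row's last value are non-empty, which holds for the output shown earlier.

```python
import re
from typing import Dict, List


def parse_service_ls(output: str) -> List[Dict[str, str]]:
    """Turn the text printed by `docker service ls` into one dict per service."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # Columns are separated by runs of two or more spaces.
    header = re.split(r"\s{2,}", lines[0])
    return [dict(zip(header, re.split(r"\s{2,}", line))) for line in lines[1:]]


def notebook_services(output: str, prefix: str = "jupyter-") -> List[str]:
    """Return the names of services that look like single-user notebooks."""
    return [
        row["NAME"]
        for row in parse_service_ls(output)
        if row.get("NAME", "").startswith(prefix)
    ]


SAMPLE = """\
ID                  NAME                MODE                REPLICAS            IMAGE                       PORTS
x5e8kbevmwcm        jupyter-z           replicated          1/1                 jupyterhub/singleuser:0.9
t8pbx342jch4        swarm_hub           replicated          1/1                 swarm_hub:latest            *:80->8000/tcp
"""

print(notebook_services(SAMPLE))  # prints ['jupyter-z']
```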

Note: when you deploy the swarm stack you have to name it swarm. If you give it any other name, JupyterHub will run on the network named <swarm stack name>_jupyterhub-net, while the configuration still points the notebook at swarm_jupyterhub-net, so the notebook will try to connect to a different network. If that network does not exist, that is the better case, because the service logs will at least show that it cannot connect to swarm_jupyterhub-net. If the network does exist, the notebook will connect to it, but because it is a different network from the one JupyterHub is running on, the two will not be able to communicate.
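
One way to avoid hard-coding the stack name is to build the network name in jupyterhub_config.py from a variable. This is a minimal sketch: it relies on docker stack deploy prefixing network names with the stack name and an underscore, as seen in the "Creating network swarm_jupyterhub-net" output. STACK_NAME is a hypothetical variable that you would have to set yourself, for example in the .env file, and overlay_network_name is a helper invented for this example.

```python
import os


def overlay_network_name(stack_name: str, network: str = "jupyterhub-net") -> str:
    """Return the name docker stack deploy gives a network declared in the stack file."""
    return f"{stack_name}_{network}"


# STACK_NAME is a hypothetical variable; it falls back to the name used in this post.
network_name = overlay_network_name(os.environ.get("STACK_NAME", "swarm"))

# In jupyterhub_config.py, where JupyterHub provides the `c` object:
# c.SwarmSpawner.network_name = network_name
# c.SwarmSpawner.extra_host_config = {"network_mode": network_name}
```

With this in place, the stack name passed to docker stack deploy and the value of STACK_NAME have to match, but the configuration file itself no longer needs editing.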