How to run JupyterHub in a Docker Swarm environment using SwarmSpawner.
Recently I was asked to run JupyterHub in a Docker environment and to provide horizontal scaling for the platform, taking into account that a new container is started for each customer and each container consumes resources. JupyterHub is a great project, but since it is community-driven, support for this kind of setup can be hard to find. In order to run JupyterHub this way you need a Docker Swarm environment; if you don't have one, you can find here a tutorial on how to set it up.
Let's get started. As usual, I am using a Debian environment for this project. First, pull the JupyterHub image from Docker Hub onto your swarm cluster:
# docker pull jupyterhub/jupyterhub
If you don't have it already, you need to install docker-compose on your swarm master. The fastest way is via pip, but in order to use pip you need a Python 3 environment installed:
# apt update
# apt install python3 python3-pip -y
# python3 -m pip install docker-compose
Before writing the docker-compose.yml, we first need to create a Dockerfile, because we have to add some modules on top of the JupyterHub container image:
# base image: jupyterhub
# this is built by docker-compose
# from the root of this repo
ARG JUPYTERHUB_VERSION=0.9.2
FROM jupyterhub/jupyterhub:${JUPYTERHUB_VERSION}
# install dockerspawner
RUN pip install --no-cache-dir dockerspawner
# install dummyauthenticator
RUN pip install --no-cache-dir jupyterhub-dummyauthenticator
# load example configuration
ADD examples/swarm/jupyterhub_config.py /srv/jupyterhub/jupyterhub_config.py
Now we can create the docker-compose.yml with the following content:
version: "3"
services:
  hub:
    # build an image with SwarmSpawner and our jupyterhub_config.py
    env_file: .env
    build:
      context: "../.."
      dockerfile: "Dockerfile"
    # mount the docker socket
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py"
    networks:
      - jupyterhub-net
    ports:
      - "80:8000"
networks:
  jupyterhub-net:
    driver: overlay
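One detail worth noting: the compose file references env_file: .env, and docker-compose will refuse to build if that file does not exist. An empty file is enough to satisfy it; any variables you do put there are passed into the hub container (the variable below is purely illustrative, not required by this setup):

```
# .env — read by docker-compose; may be left empty
# Example (illustrative only) variable passed into the hub container:
# HUB_ENVIRONMENT=testing
```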
The next step is to create the configuration file jupyterhub_config.py with the following content:
c.ConfigurableHTTPProxy.should_start = True
c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
#c.Spawner.default_url = '/lab'
c.JupyterHub.spawner_class = 'dockerspawner.SwarmSpawner'
c.SwarmSpawner.network_name = "swarm_jupyterhub-net"
c.SwarmSpawner.extra_host_config = { 'network_mode': "swarm_jupyterhub-net" }
#c.SwarmSpawner.extra_create_kwargs.update({ 'volume_driver': 'local' })
c.SwarmSpawner.remove_containers = True
c.SwarmSpawner.debug = True
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_port = 8081
c.SwarmSpawner.host_ip = "0.0.0.0"
# TLS config
c.JupyterHub.port = 8000
#c.JupyterHub.ssl_key = os.environ['SSL_KEY']
#c.JupyterHub.ssl_cert = os.environ['SSL_CERT']
c.SwarmSpawner.http_timeout = 300
c.SwarmSpawner.start_timeout = 300
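Since the whole point of this setup is that every customer gets a container that consumes resources, you may also want to cap those resources per user. A hedged addition to jupyterhub_config.py — mem_limit and cpu_limit are standard Spawner options that dockerspawner translates into swarm resource limits, but verify the behavior against your dockerspawner version:

```python
# Optional: per-user resource caps, applied to each single-user service
c.SwarmSpawner.mem_limit = '2G'   # memory limit per notebook server
c.SwarmSpawner.cpu_limit = 1.0    # CPU limit per notebook server
```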
This configuration uses the dummyauthenticator that we installed earlier via the Dockerfile. By default, JupyterHub uses the PAM authenticator, which means the users (with passwords) have to exist on the filesystem of the Docker container in order to authenticate. For testing purposes the dummyauthenticator is convenient because you can log in with any string as a username.
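If you want at least a shared password in front of the dummy authenticator, it supports one. A hedged example — DummyAuthenticator.password is an option provided by the jupyterhub-dummyauthenticator package, and the secret below is a placeholder:

```python
# Optional: require a shared password (any username + this password logs in)
c.DummyAuthenticator.password = 'some-shared-secret'
```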
Now it is time to build the image.
# docker-compose build
Once the build process is done you will be able to see the image by running docker images; its name will be swarm_hub with the latest tag.
Now we need to create another file, deployment.yml, which is almost the same as the docker-compose.yml created before; the only difference is that instead of the build option it uses image to reference the previously built Docker image. Optionally, you can modify the docker-compose.yml to match the file below:
version: "3"
services:
  hub:
    # use the previously built image with SwarmSpawner and our jupyterhub_config.py
    env_file: .env
    image: swarm_hub:latest
    # mount the docker socket
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
      - "./jupyterhub_config.py:/srv/jupyterhub/jupyterhub_config.py"
    networks:
      - jupyterhub-net
    ports:
      - "80:8000"
networks:
  jupyterhub-net:
    driver: overlay
Now we are ready to deploy our stack to Docker Swarm.
# docker stack deploy -c deployment.yml swarm
Creating network swarm_jupyterhub-net
Creating service swarm_hub
To check the running services you can use the following commands:
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
t8pbx342jch4 swarm_hub replicated 1/1 swarm_hub:latest *:80->8000/tcp
To see the logs of the service you can use:
# docker service logs swarm_hub
In order to follow the logs you can add the -f
switch to the command above.
Now you can open a browser and point it to the public IP of the master node.
Use any string as the username to log in to JupyterHub; the password is not required at this step. If all goes well you will be logged in to your account and a notebook will be started for you.
In case the notebook did not start automatically, you can click the "Start My Server" button in the middle of your JupyterHub dashboard. If we take a look at the console by running docker service ls
we will see that a new service has been started:
# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
x5e8kbevmwcm jupyter-z replicated 1/1 jupyterhub/singleuser:0.9
t8pbx342jch4 swarm_hub replicated 1/1 swarm_hub:latest *:80->8000/tcp
But if we run docker ps
on the master we can see that the notebook is not there. Checking one of the workers, we see that the notebook actually started on the worker machine:
root@sworker:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d7cc01ffafd4 jupyterhub/singleuser:0.9 "tini -g -- start-no…" 40 seconds ago Up 34 seconds 8888/tcp jupyter-z.1.64trd1wjz1mobpm0fcf8o7g48
Note: when you deploy the swarm stack you have to name it swarm,
because the stack name is prefixed to the network name: the overlay network is created as <swarm stack name>_jupyterhub-net.
The jupyterhub_config.py hardcodes swarm_jupyterhub-net,
so with any other stack name the notebook will try to connect to a different network than the one that was created. If that network does not exist, that is actually the easy case, because at least the error shows up in the service logs; but if a network with that name does exist, the notebook will connect to it and, because it is a different network than the one JupyterHub is running on, it will not be able to communicate with the hub.
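The naming rule above can be sketched in a few lines of Python (illustrative only; the prefixing is done by docker stack deploy itself):

```python
def qualified_network(stack_name: str, network: str) -> str:
    """Docker Swarm prefixes stack resources with '<stack name>_'."""
    return f"{stack_name}_{network}"

# jupyterhub_config.py hardcodes "swarm_jupyterhub-net", so only a stack
# deployed as "swarm" produces a matching network name:
print(qualified_network("swarm", "jupyterhub-net"))    # swarm_jupyterhub-net
print(qualified_network("mystack", "jupyterhub-net"))  # mystack_jupyterhub-net (mismatch)
```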