Set up JupyterHub on EC2
Having used Jupyter Notebook almost exclusively to run my Python code for quite some time, I started to realize that it’d be great if, instead of hosting Jupyter on my poor laptop, there were a server that did the hosting. I could, for example, kick off a machine learning job before I leave work and check the results when I get home. Oops, I absolutely meant when I get to work the next morning; who works extra hours anyway?
Some quick googling told me that not only can you host a single-user Jupyter Notebook, there is also a multi-user option called JupyterHub. I immediately decided to build a JupyterHub server on Amazon EC2 for our data science team, which could be great for:
- having a unified data science environment (e.g., Anaconda)
- presenting code and results to colleagues
- accessing S3 (where we store our data) much faster
- prototyping Spark code before submitting it to EMR (with Apache Toree)
What you need
- An Ubuntu EC2 instance; I ended up launching Ubuntu 14.04 because 16.04 didn’t work well with JupyterHub for me.
Installation steps
- ssh to your newly launched EC2 instance
- Install Anaconda for Python 2, which will be the main data science environment
wget https://repo.continuum.io/archive/Anaconda2-4.2.0-Linux-x86_64.sh
sudo bash Anaconda2-4.2.0-Linux-x86_64.sh
When prompted, you might want to change the Anaconda installation prefix to /anaconda2; the kernel step later on assumes that path.
- Install the libraries JupyterHub needs (Note: JupyterHub runs on Python 3; keep it separate from the Anaconda Python 2 installed in the previous step)
sudo apt-get update
sudo apt-get install python3-pip
sudo apt-get install npm nodejs-legacy
sudo npm install -g configurable-http-proxy
sudo pip3 install jupyterhub
sudo pip3 install --upgrade notebook
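Before moving on, it can be worth confirming that the two executables the later steps rely on actually ended up on the PATH. Here is a minimal sketch; the helper name `missing_commands` is made up for illustration, and it only inspects the PATH of the user running it, which may differ from the PATH that sudo uses.

```python
import shutil


def missing_commands(names):
    """Return the commands from `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


# Executables installed by the npm and pip3 commands above
required = ["jupyterhub", "configurable-http-proxy"]
not_found = missing_commands(required)
if not_found:
    print("Not on PATH: " + ", ".join(not_found))
else:
    print("All required commands found")
```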
- Create an SSL key and a certificate for the HTTPS connection to the server.
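# 2048-bit key; 1024-bit RSA is considered too weak today
openssl genrsa 2048 > host.key
chmod 400 host.key
# self-signed certificate, valid for 365 days, signed with SHA-256 instead of SHA-1
openssl req -new -x509 -nodes -sha256 -days 365 -key host.key -out host.cert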
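Because the certificate is only valid for 365 days, it helps to know when it runs out. Running `openssl x509 -noout -enddate -in host.cert` prints a line such as `notAfter=Dec 31 23:59:59 2017 GMT` (that date is made up). Below is a small sketch that turns such a line into the number of days remaining, using `ssl.cert_time_to_seconds` from the Python standard library; the function name `days_until_expiry` is hypothetical.

```python
import ssl
import time


def days_until_expiry(enddate_line, now=None):
    """Whole days left before the certificate expires (negative once it has expired).

    `enddate_line` is one line of output from `openssl x509 -noout -enddate`,
    for example "notAfter=Dec 31 23:59:59 2017 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    cert_time = enddate_line.strip().split("=", 1)[1]
    expiry = ssl.cert_time_to_seconds(cert_time)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return int((expiry - now) // 86400)
```

Feeding it the real output once a month or so should be enough of a reminder to regenerate the certificate in time.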
- Configure JupyterHub
# generate the config file
jupyterhub --generate-config
This generates a file named jupyterhub_config.py; edit this file by appending the following to the end:
c.LocalAuthenticator.create_system_users = True
#c.Authenticator.whitelist = set()
c.Authenticator.admin_users = {'rex'}
c.Spawner.notebook_dir = '~/notebooks'
c.JupyterHub.ssl_cert = 'host.cert'
c.JupyterHub.ssl_key = 'host.key'
c.JupyterHub.port = 443
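Note that `host.cert` and `host.key` are given as relative paths, so, as far as I can tell, they are looked up relative to the directory you launch JupyterHub from. A quick pre-flight check can catch a missing file or an overly permissive key before the hub fails to start. This is only a sketch; `check_tls_files` is a hypothetical helper, not part of JupyterHub.

```python
import os
import stat


def check_tls_files(cert_path, key_path):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for label, path in (("certificate", cert_path), ("key", key_path)):
        if not os.path.isfile(path):
            problems.append("{} not found: {}".format(label, os.path.abspath(path)))
    if os.path.isfile(key_path):
        mode = stat.S_IMODE(os.stat(key_path).st_mode)
        # The key was created with chmod 400, so group/other bits should be unset
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append("key is accessible by group/others: mode {:o}".format(mode))
    return problems


# Run from the directory where you plan to start JupyterHub
for problem in check_tls_files("host.cert", "host.key"):
    print(problem)
```

If the check prints nothing, both files were found and the key is private to its owner. Switching the two config entries to absolute paths would remove the dependency on the working directory altogether.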
- Create new users
JupyterHub users are essentially Linux users who log in to JupyterHub with their Linux credentials. New accounts can be set up by running:
# create a new user; you need to fill in the password and so on
sudo adduser rex
su rex
# cd to the home directory and make a directory for JupyterHub
cd
mkdir notebooks
exit
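With more than a couple of teammates, typing these commands once per person gets tedious. Here is a minimal sketch that prints the equivalent commands for a list of usernames so you can review them before running anything. It assumes home directories live under /home (the adduser default on Ubuntu) and swaps the su/cd/mkdir sequence for a single `sudo -u <user> mkdir -p`; `user_setup_commands` and the usernames are placeholders.

```python
import shlex


def user_setup_commands(username, home_root="/home"):
    """Return the argv lists that create `username` and their notebooks directory."""
    notebook_dir = "{}/{}/notebooks".format(home_root, username)
    return [
        ["sudo", "adduser", username],  # interactive: prompts for the password
        ["sudo", "-u", username, "mkdir", "-p", notebook_dir],  # owned by the new user
    ]


# 'alice' and 'bob' are placeholder usernames
for name in ["alice", "bob"]:
    for argv in user_setup_commands(name):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The notebooks directory matters because `c.Spawner.notebook_dir` points at `~/notebooks` for every user.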
- Add Anaconda Python 2 as a Jupyter kernel
# check existing kernels
sudo jupyter kernelspec list
This should return:
Available kernels:
python3 /usr/local/lib/python3.4/dist-packages/ipykernel/resources
Add Anaconda Python 2 by running:
# /anaconda2/bin/python2 is not in the PATH of sudo; use its absolute path
sudo /anaconda2/bin/python2 -m ipykernel install
# check existing kernels again
sudo jupyter kernelspec list
This should now return:
Available kernels:
python3 /usr/local/lib/python3.4/dist-packages/ipykernel/resources
python2 /usr/local/share/jupyter/kernels/python2
- Launch JupyterHub
sudo jupyterhub
Enjoy!