There are times when it makes sense to offload work from a local laptop to a remote machine. When the data set already resides in AWS, it's much faster to download it to an EC2 instance than to your own machine. Occasionally a bit more RAM or disk space is needed, a problem easily solved by spinning up a high-end instance. In this post I'll show how I use Jupyter notebooks on remote Linux machines, typically AWS EC2 instances.
I don't use the built-in notebook server feature. I don't feel comfortable exposing all the privileges of my account behind a single password on a webpage (anyone who accesses the Jupyter notebook server can run arbitrary code as your user). My method is to run the standard server, which listens only on localhost, and connect to it securely through an SSH tunnel.
1. Setting up the remote Linux machine
- Launch an AWS EC2 instance (see this page for help). I usually launch a spot instance for lower cost.
- SSH into the instance. All the commands below should be run on the remote machine.
- Install anaconda. The below commands work as of now, but the link may change in the future (I got the location from the anaconda downloads page).
    wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.1-Linux-x86_64.sh
    # follow the prompts, answer "yes" to the question about prepending the path
    bash Anaconda2-2.4.1-Linux-x86_64.sh
    # need to update PATH for this session
    source ~/.bashrc
- Start tmux, which will allow your notebook server to continue to run even after you log out. I do all my terminal work within tmux (Terminal Multiplexer) and I highly recommend learning to use it.
    tmux
- Start jupyter notebook within your tmux session. The --no-browser option prevents jupyter from automatically opening a browser window.
    jupyter notebook --no-browser
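The last two steps can also be combined so the server starts in a detached tmux session with one command. This is only a sketch: the `run` and `start_notebook` helpers, the session name `jupyter`, and the `DRY_RUN` switch are names I made up for illustration, and the explicit `--port=8888` simply restates the default port.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Start the notebook server inside a detached tmux session.
# Session name and port are assumptions; adjust them to taste.
start_notebook() {
    session="${1:-jupyter}"
    port="${2:-8888}"
    run tmux new-session -d -s "$session" \
        "jupyter notebook --no-browser --port=$port"
}

# Example (not run here):
#   start_notebook              # start the server in the background
#   tmux attach -t jupyter      # look at the server log later
```

Setting `DRY_RUN=1` before calling `start_notebook` prints the tmux command instead of running it, which is a cheap way to check what would be launched.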
2. Connecting to the remote notebook server
The next step is to use an SSH tunnel to forward a port on your local machine to the remote machine. You can think of this as connecting port 8157 on your local machine to port 8888 (the default jupyter notebook port) on the remote machine.
# run from your local machine
ssh -i /path/to/ssh/key -NL 8157:localhost:8888 ubuntu@your-remote-machine-public-dns
You should now be able to point your browser to http://localhost:8157 and see the jupyter notebook startup screen.
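If you open tunnels often, the port mapping can be wrapped in a small function. The sketch below is one way to do it, not the only one: `forward_spec` and `open_tunnel` are hypothetical names, and the defaults 8157 and 8888 are just the ports used above. Note that the `localhost` in the forwarding argument is resolved on the remote machine, which is why the server never has to listen on a public interface.

```shell
#!/bin/sh
# Build the argument for ssh -L: local_port:localhost:remote_port.
forward_spec() {
    printf '%s:localhost:%s\n' "$1" "$2"
}

# Open the tunnel in the foreground; Ctrl-C closes it.
# Arguments: key file, user@host, optional local port, optional remote port.
open_tunnel() {
    key="$1"
    host="$2"
    local_port="${3:-8157}"
    remote_port="${4:-8888}"
    ssh -i "$key" -N -L "$(forward_spec "$local_port" "$remote_port")" "$host"
}

# Example (not run here):
#   open_tunnel /path/to/ssh/key ubuntu@your-remote-machine-public-dns
```

Here `-N` tells ssh not to run a remote command and `-L` sets up the local port forward, the same two flags as in the one-liner above.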
3. Saving your work locally
I like to automatically save my notebooks locally so I don't lose any work. The method uses rsync and will sync notebooks every 30 seconds.
- I have the following lines in my ~/.ssh/config file, which allows all of the instances of "ubuntu@your-remote-machine-public-dns" above to be replaced by "tmpaws". I paste the DNS of the machine I'm working on into this file each time it changes.
    Host tmpaws
        HostName your-remote-machine-public-dns
        User ubuntu
        IdentityFile /path/to/ssh/key
- Run the below command to continually sync your notebooks. It's an eyesore but the least cumbersome option I've found.
    while true; do \
        rsync -avz --include='*.ipynb' --exclude='*' tmpaws:/path/to/notebooks/ /path/to/local/dir/; \
        sleep 30; \
    done
- As always, keep your local copy of the notebooks under version control.
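The config entry and the sync loop from this section can be turned into two small shell functions. Treat this as a sketch under assumptions: `host_block` and `sync_notebooks` are names I invented, and the user and key path are the same placeholders as above. `host_block` only prints the entry, so you still decide where it goes in your config file.

```shell
#!/bin/sh
# Print an ssh config entry for the given public DNS name.
# The alias defaults to "tmpaws"; user and key path are placeholders.
host_block() {
    dns="$1"
    alias_name="${2:-tmpaws}"
    printf 'Host %s\n' "$alias_name"
    printf '    HostName %s\n' "$dns"
    printf '    User ubuntu\n'
    printf '    IdentityFile /path/to/ssh/key\n'
}

# Copy *.ipynb files from the remote directory every $interval seconds.
# With this filter pair only notebooks at the top level of the remote
# directory are copied, because --exclude='*' also excludes subdirectories.
sync_notebooks() {
    remote="$1"
    dest="$2"
    interval="${3:-30}"
    while true; do
        rsync -avz --include='*.ipynb' --exclude='*' "$remote" "$dest"
        sleep "$interval"
    done
}

# Example (not run here):
#   host_block your-remote-machine-public-dns
#   sync_notebooks tmpaws:/path/to/notebooks/ /path/to/local/dir/
```

The loop runs until interrupted with Ctrl-C, just like the one-liner, so it fits well in its own local terminal or tmux window.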
4. The result