# Installing the Infovis Jupyter Environment

One of the tools we will be using this semester is JupyterLab with Altair. JupyterLab is a browser-based coding environment that gives us an interactive Python notebook for data exploration (among many other things). Altair is a Python visualization library built on Vega and Vega-Lite (more on that later). Right now, it is enough to know that it gives us a straightforward way of quickly visualizing data in Python.

You should do this install on the computer you will be able to use in class, whether that is your laptop or one of the lab machines (note that because of our shared filesystem, installing it on one lab machine installs it on all of them).

# Miniconda

In order to run JupyterLab, we need a Python runtime. My preferred technique is to use Miniconda, which runs on any major operating system and allows you to create custom Python environments with just the libraries you need for your current project (and without being root). It is both a package manager and a virtual environment manager. If you have ever used virtualenv, this is the same principle, and you can skip to the installation part. If this is all new to you, keep reading, because it is an important software development concept.

The problem with Python (and similar languages that make extensive use of external libraries) is that you can rapidly accumulate a huge number of different libraries as you tackle different projects. This is problematic for a number of reasons. First, you will build up a huge collection of crufty old libraries that you don't actively use. It is difficult to prune them because of the complex web of dependencies between libraries. You install three libraries and find that the dependency checker has added 50 more, and a library you don't recognize might be critical to one you use all the time. A second problem is managing library versions. You may find two great libraries for your project, but each relies on a different version of the same library.
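To make the version problem concrete, here is a minimal sketch in Python. The package names (`plotlib_a`, `stats_b`, `numlib`) and the version numbers are hypothetical, invented purely for illustration. Real dependency resolvers, conda's included, work with version ranges and are far more involved than this.

```python
# Hypothetical requirements: each top-level library pins the exact version of
# the shared library it was built against. All names and versions are made up.
requirements = {
    "plotlib_a": {"numlib": "1.2"},
    "stats_b": {"numlib": "2.0"},
}


def find_conflicts(requirements):
    """Return {dependency: sorted versions} for every dependency that is
    requested at more than one version."""
    wanted = {}
    for deps in requirements.values():
        for dep, version in deps.items():
            wanted.setdefault(dep, set()).add(version)
    return {
        dep: sorted(versions)
        for dep, versions in wanted.items()
        if len(versions) > 1
    }


conflicts = find_conflicts(requirements)
print(conflicts)  # {'numlib': ['1.2', '2.0']}
```

With a single shared installation, only one of those two versions can be present at a time, so one of the two libraries ends up running against a version it was not written for.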

These problems all become magnified when you are doing serious software development. What happens when you need to deploy the code to a production machine, or when you need to work on the project with another developer? Which of the installed libraries are actually part of the project, and which did you install for something else? What if you wrote the code against an old, incompatible version of a library? This is why we need something to manage environments for us. Ideally, it should allow us to specify an environment, a collection of libraries and their versions. We should be able to take a description of this environment to any machine and recreate it so that our project works the same everywhere. Miniconda is one solution to this problem. It allows us to create purpose-built environments, it manages them centrally (unlike virtualenv), and it includes an installer to make the process of constructing environments really easy.

# Installing Miniconda

Installing Miniconda is quite simple. You just need to download the appropriate installer script for your computer (make sure to use the Python 3.7 installer). This is just a script that you will run on the command line, and it will walk you through the installation process. For more guidance, check out the installation guide.

# Create the environment

So that we all have the same environment, I created a YAML file listing all of the required packages. Download infovis_environment.yml. Then run the following command:


    conda env create -f infovis_environment.yml

This will create a new environment called infovis. This will take a little time to finish. When it is finally done, it should tell you how to activate and deactivate the new environment with a message that looks like:


    #
    # To activate this environment, use
    #
    #     $ conda activate infovis
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

Activate the environment. You should see that your prompt now tells you that you are using the infovis environment. To make sure, run the command which python; it should return a path that looks like <your home>/miniconda3/envs/infovis/bin/python. If you are using Windows, the which command is not available and you are somewhat on your own, though where python in the Command Prompt is the closest equivalent.
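If which is not available on your system, you can ask Python itself where it is running from. This is only a rough sketch: the helper name `active_env_name` is mine, and it assumes (as in the path shown above) that the environment lives in a directory named after the environment. Start python from the activated environment and run:

```python
import sys
from pathlib import Path


def active_env_name(prefix=sys.prefix):
    """Return the last path component of the interpreter's install prefix.

    For a conda environment this is normally the environment's name.
    """
    return Path(prefix).name


print("Interpreter:", sys.executable)
print("Environment directory:", active_env_name())
```

If the second line does not say infovis, the environment is probably not active in that terminal.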

Windows users: Some Windows users have had issues using my environment file. The reported error is that a collection of packages cannot be installed. You can create an equivalent environment by walking through these steps:

    conda create --name infovis
    conda activate infovis
    conda install -c conda-forge jupyterlab altair vega_datasets
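Whichever route you used to build the environment, a quick way to confirm that the three packages named above actually landed in it is to check that Python can find them. The following is a small sketch, not an official check: the helper `missing_packages` is my own name, and it assumes that each package's import name matches its conda package name (which is the case for these three).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["jupyterlab", "altair", "vega_datasets"]
missing = missing_packages(required)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the infovis environment active; if anything is reported as not importable, the install step did not complete.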

# Using JupyterLab

To start JupyterLab, make sure the infovis environment is active, and then type jupyter lab. This starts a small web server and will probably open your default web browser to the JupyterLab launcher. If it doesn't, look carefully at the terminal output to find the URL you should enter into your browser to connect to the server.

You should take a few minutes to familiarize yourself with the Notebook, as it will be a somewhat different environment (though it bears some similarity to MATLAB and Mathematica). Rather than typing up a program, we create cells. A code cell can contain an arbitrary number of lines of Python. The cell can then be run using the run button, the Run menu, or by typing Shift-Enter. The output of any print statements, along with the value of the last expression in the cell, will appear under the cell, and you will be given a new cell. Any variables you create will be available to every cell you run afterwards.
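As a small illustration of how cells behave, here is what two consecutive code cells might contain. It is written as a single block with comments marking where the cell boundaries would be, and the data is made up.

```python
# Cell 1: define a variable. Nothing is displayed under this cell, because
# an assignment has no value to show.
scores = [3, 5, 7]

# Cell 2: a later cell can still see `scores` from the cell above.
total = sum(scores)
print("Number of scores:", len(scores))
total  # in a notebook, the value of a cell's last expression is displayed
```

Run as a plain script, the final bare `total` line shows nothing; it is the notebook that displays the value of the last expression for you.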

Cells can also contain Markdown. You can switch a cell to Markdown by typing m while in command mode (press Esc first if you are editing the cell); type y to return it to code mode. In a Markdown cell, you can write Markdown, and when you run the cell it will be formatted for you. This allows you to freely intermix text and code in the same document, which is great for making notes about your process or for presentation purposes. You will see more examples as we use this in class.

Play around with it, and check out the documentation.

# Condensed version

I put this at the bottom to encourage you to read the whole document, but for reference here is the sequence of steps without all of the lengthy explanations.

  • Install the appropriate Miniconda for your machine (Python 3.7 version)
  • Download infovis_environment.yml
  • Run conda env create -f infovis_environment.yml
  • Activate the environment: conda activate infovis
  • Run jupyter lab
  • Play with the notebook
Last Updated: 9/13/2018, 11:18:08 AM