CS 465 Information Visualization

CS 465 - Tutorial Six

Objectives

  • Learn how to create a scrolling narrative visualization
  • Get a taste of using D3 outside of Observable
  • Learn how to build a histogram in D3
  • Learn some more patterns for working with transitions and making flexible visualizations in D3

Hold on to your hats, folks: there is a lot going on in here...

Getting started

  1. Accept the assignment from GitHub Classroom.
  2. This will create a new GitHub repository for you. Clone the repository to your computer with git clone and the path provided by GitHub.
  3. Use VSCode to open the directory (as opposed to the individual files).

What are we building?

We are building a very short narrative about Radiohead. Why Radiohead? I happen to like Radiohead, and someone (Charlie Thompson) took the time to put together a dataset about every one of the songs on their studio albums (and he wrote a blog post about it).

The data looks like this:

Column name           Interpretation
track_name            track name
album_name            album name
album_release_year    album release year
album_img             link to cover art
lyrics                full lyrics of the song
duration_ms           duration in ms
valence               Spotify ranking of the sound's affect; high valence is happier, low is sadder [0-1]
pct_sad               % of the words in the lyrics that are classified as "sad" (stored as a fraction [0-1] in the CSV)
word_count            number of words in the lyrics
lyrical_density       rough measure of the importance of lyrics to the song: word count/duration [0-1]
gloom_index           measure of the track's "gloominess", with lower scores indicating more gloom [1-100]

As you will have gathered, we are focusing on a scrolling, author-controlled approach. It is certainly not the only way to do narrative visualization, but it is a popular one that is fairly effective for presenting a narrative that doesn't feel like a PowerPoint slide deck.

For background reading, I suggest Mike Bostock's "How to Scroll" article. It doesn't cover the technical implementation details; instead, it discusses how to design a good scrolling interface.

Working outside of Observable

This is our first foray outside of the safety of Observable. This is where having some knowledge of HTML and CSS is going to come in handy. Don't worry if you don't understand everything here -- there is a lot going on. I'll try to provide an overview of everything, but focus on the parts you need to complete the tutorial and then implement your own scrolling storytelling visualization.

When you clone the repository, you will find that you have a directory with an HTML file (index.html), some JavaScript, a CSS file, some data, and a "favicon.ico" (the little icon that is shown in your browser's tab).

For everything to work, these pages need to be served by a web server. While we can open an HTML file in the browser directly, browsers restrict what a page loaded from a file:// URL is allowed to fetch, so requests for other resources (such as our CSV data) will typically fail. There are a lot of ways to serve these pages, but the easiest approach is to run a local web server that serves the contents of this directory.

One of Python's cute party tricks is that you can spin up a simple web server in a directory with a single instruction on the command line.

  • Navigate to your project in the terminal (in VSCode, you can bring up the integrated terminal, which will already have the right working directory)
  • For Python 3, the command is python3 -m http.server (your default Python may be v3, in which case you can use python instead of python3)
  • For Python 2, the command is python -m SimpleHTTPServer

No matter which one you used, you should now have a new web server running. It is probably on port 8000 (this is configurable, so read the message the server prints in the terminal to confirm the port). In your browser, go to http://localhost:8000 (or whichever port you are using), and you should see the rendered contents of index.html.

My advice is to keep the server running while you work. I also suggest that you open your browser's developer tools so you can watch the console.

Understanding index.html

We aren't going to spend any time editing the content of index.html, but it is important to understand what is contained inside.

The critical piece to understand is the structure. If you look at the HTML, you will see something like this:

<div id="content">
  <div id="sections">
    <div class="step">
      some content
    </div>
    <div class="step">
      some more content
    </div>
    ...
  </div>
  <div id="vis"></div>
</div>

In other words, we have a container called content, which holds two things: sections and vis. In sections, we have a collection of <div>s that all have the class of step.

Through the magic of CSS, this will form a two column layout, with sections on the left and vis on the right. The steps are styled so that only one of them appears on the screen at a time, and we can scroll through them. The vis container will hold the visualization, and it doesn't move at all. I've given sections a little bit of a background color so you can see the difference, but this isn't otherwise an important stylistic choice.

I've already written the copy for you, but as you can hopefully see, we can put any HTML we want inside of the steps. We can have multiple paragraphs of text, images, even visualizations. We just want to make sure it isn't too wide or it might overlap the main visualization (though maybe you want that!).

Scrolling

The basic idea behind the scrolling is that as each section scrolls into view, we will detect it and trigger a change to the visualization.

You will find the code for doing this in scroller.js. Much of the code comes from Jim Vallandingham's Scroller example. I have just simplified it a bit and updated it for D3 v7.

You can read through Jim's post for a long description, but here are the broad strokes. We register to listen for scroll and resize events from the browser. On a resize event, we query all of the steps and record where their tops are. On a scroll event, we check which part of the page is in view and compute which step is currently being viewed. If it is a new step, we broadcast our own event called step-change.
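To make the scroll-event logic concrete, here is a plain-JavaScript sketch of the core calculation. It assumes we already know each step's top offset in pixels (sorted top to bottom) and the current scroll offset; activeStepIndex and createStepTracker are hypothetical names, and this is a simplification of the idea rather than the code in scroller.js, which also handles the DOM and D3's event dispatch.

```javascript
// Hypothetical helper (not part of scroller.js or D3).
// Return the index of the last step whose top is at or above the scroll offset.
// stepTops must be sorted ascending, as the steps are on the page.
function activeStepIndex(stepTops, scrollOffset) {
  // Binary search for how many tops are <= scrollOffset (a "right" bisect).
  let lo = 0;
  let hi = stepTops.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (stepTops[mid] <= scrollOffset) lo = mid + 1;
    else hi = mid;
  }
  // The last of those tops is the active step; clamp so that positions above
  // the first step still report step 0 and positions below report the last.
  return Math.max(0, Math.min(stepTops.length - 1, lo - 1));
}

// Hypothetical wrapper: call onStepChange only when the active step changes,
// mirroring "if it is a new step, broadcast step-change".
function createStepTracker(stepTops, onStepChange) {
  let current = -1;
  return function handleScroll(scrollOffset) {
    const step = activeStepIndex(stepTops, scrollOffset);
    if (step !== current) {
      current = step;
      onStepChange(step);
    }
  };
}

// Usage: three steps whose tops sit 0, 600, and 1200px down the page.
const seen = [];
const onScroll = createStepTracker([0, 600, 1200], (step) => seen.push(step));
[0, 100, 650, 700, 1300, 200].forEach(onScroll);
console.log(seen); // [0, 1, 2, 0]
```

Note that scrolling within a step (0 then 100, or 650 then 700) does not fire the callback again, which is what keeps the visualization from redrawing on every scroll event.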

To use the scroller, we just need to tell it what elements to watch, and provide an event handler that updates the view when a step changes.

Hook up the event handler

All of your work will be inside of visualization.js. Open that up and find the manageVisualizations() function at the bottom. You can see that I've started you off with a few constants. Also, at the very bottom of the file, you can see that this function is called, so this is the code that will run as soon as the page is loaded.

I would like you to add three lines to this function:

const scroll = scroller();
scroll(d3.selectAll(".step"));
scroll.on("step-change", (step) => { console.log(step); });

The first line gets our scroll function (and initializes the scroller).

The second line tells the scroller which elements on the page to use as the steps.

The third line registers our event listener. We have given it a function that just logs the current step. Go ahead and run this to make sure that as you scroll the step count increments and decrements appropriately (you have the browser developer tools open, right?).

Visualizations

Now we can worry about the visualizations. Our narrative just has four steps, so we need to build four visualizations (in order).

  • a histogram showing the distribution of song valence
  • a histogram showing the distribution of songs based on percentage of sad lyrics
  • a histogram showing the distribution of songs based on the gloom index
  • a bar chart showing the average gloom index for each album

To make our narrative more visually interesting, we will use animation to transition between the visualizations.

Getting the data

The first step will be to get the data loaded.

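Our tool will be d3.csv(). We can provide it with a URL, and it will perform the fetch for us and then parse the result into an array of JavaScript objects, one per row, keyed by the column names. Every value is parsed as a string, which is why we convert values with + before treating them as numbers. (In case you are wondering, there is a similar d3.json().)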
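To see the shape of what d3.csv() hands back, here is a plain-JavaScript imitation for a toy two-row file. parseSimpleCsv is a hypothetical helper, and it assumes no field contains a comma, quote, or newline. That is not true of the real lyrics column, so use d3.csv() for the actual data; this is only to show that the result is an array of row objects with string values.

```javascript
// Hypothetical helper (not part of D3): naive CSV parsing for illustration only.
// Produces an array of row objects keyed by the header names, with every value
// left as a string, like the array d3.csv() resolves to.
function parseSimpleCsv(text) {
  const [headerLine, ...rowLines] = text.trim().split("\n");
  const columns = headerLine.split(",");
  const rows = rowLines.map((line) => {
    const fields = line.split(",");
    const row = {};
    columns.forEach((name, i) => {
      row[name] = fields[i];
    });
    return row;
  });
  rows.columns = columns; // D3 also attaches the header names as .columns
  return rows;
}

// Toy input with made-up values, not rows from radiohead.csv.
const toy = "track_name,valence,gloom_index\nSong A,0.25,9.5\nSong B,0.8,100";
const rows = parseSimpleCsv(toy);
console.log(rows[0].valence, typeof rows[0].valence); // 0.25 string
```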

This process can take a while, so the function returns a Promise, which will eventually resolve into our data. There are a couple of different ways to handle Promises, but we are going to use the async/await keywords, which allow us to write asynchronous code in a synchronous style.

I've already added async in front of function manageVisualizations(){. This declares the function to be asynchronous.

Add const data = await d3.csv("data/radiohead.csv"); into the function before the scroller code. This requests our data, waits for the processing to finish and then assigns it to the variable data.

Immediately after, add the following code.

data.forEach((d) => {
  d.pct_sad *= 100; // convert pct_sad from a fraction to a percentage
});

This isn't required -- it just converts the pct_sad values from fractions to percentages, which will make the view a little nicer.

Create the SVG region

The next thing to do is to create the SVG.

const svg = d3.select("#vis")
  .append("svg")
  .attr("viewBox", [0, 0, width, height]) // SVG attribute names are case-sensitive: viewBox, not viewbox
  .style("height", `${height}px`)
  .style("width", `${width}px`);

This selects the vis <div> and appends an SVG component into it (configuring it in the usual way).

We also could have just created the SVG tag in the HTML in the first place, but I wanted you to see that we can use D3 to arbitrarily update the DOM of the page on the fly (this is by no means a unique ability, but it is handy).

Note that you now know how to add D3 visualizations to arbitrary web pages. From here you can do everything we have done before to create a visualization, except that you no longer need to return svg.node() at the end, because the SVG is already part of the page.

Creating the histograms

We need three separate histograms. It should be obvious that we are going to use a little abstraction so we don't actually code three separate histograms. In truth, we are going to go a little further than that: we are going to make a single reconfigurable histogram, so that as we transition from one step of the story to the next, we are just updating it.

At the top of visualizations.js you will find the createHistogram function, which I've already written for you. It should look pretty familiar. It sets up the margins, creates some linear scales, creates the inner g region that will hold the actual visualization, and creates some axes. We have done all of this before. The differences are:

  • there is no data, so I didn't set the domain for the scales
  • similarly, I didn't create any visual marks
  • the function returns references to the scales, axes, and the g where we can put the visualization so we can configure it later

In truth, this shell will work for any visualization with two linear scales.
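The shell works because D3 scales are configurable objects: the range can be fixed up front and the domain swapped in later, and every call to the scale uses whatever the current settings are. The sketch below is a hypothetical, stripped-down stand-in for d3.scaleLinear() (linearScale is not a D3 function) that shows this getter/setter pattern; D3's real scales do much more, such as ticks, clamping, and interpolators.

```javascript
// Hypothetical stand-in for d3.scaleLinear(), for illustration only.
// Calling scale(value) maps a value from the domain to the range.
// domain() and range() are setters (returning the scale, for chaining)
// when given an argument, and getters when called with no argument.
function linearScale() {
  let domain = [0, 1];
  let range = [0, 1];

  function scale(value) {
    const [d0, d1] = domain;
    const [r0, r1] = range;
    const t = d1 === d0 ? 0 : (value - d0) / (d1 - d0); // fraction of the way through the domain
    return r0 + t * (r1 - r0);
  }
  scale.domain = function (d) {
    if (d === undefined) return domain.slice();
    domain = [+d[0], +d[1]];
    return scale;
  };
  scale.range = function (r) {
    if (r === undefined) return range.slice();
    range = [+r[0], +r[1]];
    return scale;
  };
  return scale;
}

// Set the range once (as a shell like createHistogram would), then change the domain later.
const x = linearScale().range([40, 640]);
x.domain([0, 1]);
console.log(x(0.5)); // 340
x.domain([0, 100]);
console.log(x(50)); // 340
```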

Add const histogram = createHistogram(svg); to manageVisualizations. You should now see the axes appear in the visualization.

The important work is done in updateHistogram, which takes in our new graph object, the data, and the metric to visualize (as well as the title for the graph and the transition speed).

Update the scales

The first thing we will do is set the domains for the scales based on the data and the metric.

The x domain is pretty straightforward:

graph.x.domain([0, d3.max(data, (d) => +d[metric])]);

Note that we are just setting the domain -- the range was already set.
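The unary plus in +d[metric] matters because d3.csv() leaves every value as a string, and d3.max compares values in their natural order, so strings are compared character by character. This plain-JavaScript sketch (maxBy is a hypothetical helper that compares values the same way) shows how the maximum goes wrong without the conversion.

```javascript
// Hypothetical helper (not part of D3): largest accessor(d) across data,
// compared in natural order, so strings compare alphabetically and
// numbers compare numerically. Missing values and NaN are skipped.
function maxBy(data, accessor) {
  let max;
  for (const d of data) {
    const value = accessor(d);
    // value >= value is false only for NaN
    if (value != null && value >= value && (max === undefined || value > max)) {
      max = value;
    }
  }
  return max;
}

// Made-up rows; gloom_index is a string, as d3.csv() would leave it.
const songs = [{ gloom_index: "100" }, { gloom_index: "9.5" }, { gloom_index: "42" }];

console.log(maxBy(songs, (d) => d.gloom_index));  // prints 9.5 (the string): "9" sorts after "4" and "1"
console.log(maxBy(songs, (d) => +d.gloom_index)); // 100
```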

For the y, things are a little more complicated because we want to make a histogram. We've not yet made a histogram in D3. Recall that for our histogram, we need to "bin" the data, and the height of the bar is the number of items in each bin.

As you might imagine, there is a bin generator tool in D3.

const makeBins = d3.bin()
  .value((d) => +d[metric])
  .domain(graph.x.domain())
  .thresholds(20);

const bins = makeBins(data);

The value method tells the generator which value to look at for binning purposes. The domain and thresholds determine where the breaks between bins fall. The number we pass to thresholds is only a suggestion: D3 rounds the break points to "nice" values, so in this case we get about 20 bins between 0 and the max of our metric.

When we call the generator with our data, we get back an array of bins. Each bin is itself an array of the binned items, so the length of the bin is the value we want. In addition, each bin has two attributes: x0, the lower bound of the bin (inclusive), and x1, the upper bound of the bin (exclusive, except for the last bin).
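To see what a bin looks like, here is a simplified, hypothetical equal-width binner in plain JavaScript (binValues is not a D3 function). Unlike d3.bin, which treats the threshold count as a hint and picks "nice" break points, this sketch splits the domain into exactly count equal bins. It reproduces the output shape described above: each bin is an array of the items that fell into it, with x0 and x1 bounds, and this simplified version keeps empty bins as zero-length arrays.

```javascript
// Hypothetical helper (not part of D3): split data into `count` equal-width
// bins over [lo, hi] using valueOf(d). Each bin is an array of data items
// with x0 (inclusive) and x1 (exclusive, except the last bin, which includes hi).
// Items outside [lo, hi] are ignored.
function binValues(data, valueOf, [lo, hi], count) {
  const width = (hi - lo) / count;
  const bins = [];
  for (let i = 0; i < count; i++) {
    const bin = [];
    bin.x0 = lo + i * width;
    bin.x1 = i === count - 1 ? hi : lo + (i + 1) * width;
    bins.push(bin);
  }
  for (const d of data) {
    const v = valueOf(d);
    if (!(v >= lo && v <= hi)) continue; // also skips NaN
    // floor picks the bin; a value equal to hi goes into the last bin
    const i = Math.min(count - 1, Math.floor((v - lo) / width));
    bins[i].push(d);
  }
  return bins;
}

// Toy values on a 0-100 scale (made up for illustration), stored as strings.
const toySongs = [{ g: "0" }, { g: "10" }, { g: "25" }, { g: "60" }, { g: "100" }];
const bins = binValues(toySongs, (d) => +d.g, [0, 100], 4);
console.log(bins.map((b) => [b.x0, b.x1, b.length]));
// [[0, 25, 2], [25, 50, 1], [50, 75, 1], [75, 100, 1]]

// The tallest bin sets the top of the y domain.
const tallest = Math.max(...bins.map((b) => b.length)); // 2
```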

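An important thing to realize is that the number of bins is not fixed. Empty bins are kept (they are just arrays of length 0), but because D3 chooses the break points from the domain, switching to a different metric can give us more or fewer bins than we had before -- this is important to remember for data binding.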

So, the domain for the y scale is just

graph.y.domain([0, d3.max(bins, d=>d.length) ]);

Making the bars

At this point, the process for making bars with rect elements should be fairly familiar.

There are some subtleties here, however.

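We are going to use bins as the data source. The three snippets that follow are the enter, update, and exit functions that we pass, in that order, to .join() on this selection.

graph.g.selectAll(".bin")
  .data(bins)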

As such, we will use x0 and x1 for the x and width attributes of the bars, and the length of the bin for the height. To make our histogram look a little nicer, we will pull in both sides of each bar by one pixel so that neighboring bars have a small gap between them.

We are also going to animate the bar creation. To do that, we will set the initial y value to the baseline (graph.y(0)) and the height to 0, and then add a transition to grow the bars to their correct values.


enter => enter
  .append("rect")
  .attr("x", (d) => graph.x(d.x0) + 1)
  .attr("y", graph.y(0))
  .attr("width", (d) => graph.x(d.x1) - graph.x(d.x0) - 2)
  .attr("height", 0)
  .attr("class", "bin")
  .style("fill", "steelblue")
  .transition()
    .duration(speed)
    .attr("y", (d) => graph.y(d.length))
    .attr("height", (d) => graph.y(0) - graph.y(d.length))
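The y and height expressions can look backwards at first. SVG's y axis points down, so (assuming, as usual, that the y range runs from the bottom of the plot to the top) the y scale maps a count of 0 to a large pixel value and larger counts to smaller ones. The plain-JavaScript sketch below computes the final attributes of one bar with two hypothetical scale functions (xScale and yScale, simple linear maps with made-up pixel ranges) so you can check the arithmetic of the enter code above.

```javascript
// Hypothetical scales: x maps [0, 100] to pixels [40, 640]; y maps counts
// [0, 10] to pixels [360, 20], so larger counts sit higher on the screen.
const xScale = (v) => 40 + (v / 100) * 600;
const yScale = (count) => 360 - (count / 10) * 340;

// Hypothetical helper: final geometry of one histogram bar, using the same
// expressions as the enter code: inset 1px per side, top at y(count),
// bottom resting on the baseline y(0).
function barAttributes(bin, x, y) {
  return {
    x: x(bin.x0) + 1,
    width: x(bin.x1) - x(bin.x0) - 2,
    y: y(bin.length),
    height: y(0) - y(bin.length),
  };
}

// A bin from 25 to 50 holding 5 items (bins are arrays, so length is the count).
const bin = Object.assign(["a", "b", "c", "d", "e"], { x0: 25, x1: 50 });
console.log(barAttributes(bin, xScale, yScale));
// { x: 191, width: 148, y: 190, height: 170 }
```

An empty bin gets a height of 0 with its top on the baseline, which is exactly the state the enter code starts every bar from.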

As we transition between histograms, what we are actually doing is swapping out the data. So, we need to include an update, which will just transition the bars to their new values.

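update => update
  .transition()
  .duration(speed)
  .attr("x", (d) => graph.x(d.x0) + 1) // bin boundaries change with the metric,
  .attr("width", (d) => graph.x(d.x1) - graph.x(d.x0) - 2) // so reposition and resize too
  .attr("y", (d) => graph.y(d.length))
  .attr("height", (d) => graph.y(0) - graph.y(d.length))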

Finally, as noted above, not every histogram has the same number of bins. When the new data has fewer bins than there are bars on screen, we need to remove the extra bars with an exit selection.

exit => exit
  .transition()
    .duration(speed)
    .attr("y", graph.y(0))
    .attr("height", 0)
    .remove()
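Because we call .data(bins) without a key function, D3 matches bins to existing bars by index. The plain-JavaScript sketch below (joinByIndex is a hypothetical helper, not D3's implementation) spells out which indices land in each of the three selections when the number of bins changes between metrics.

```javascript
// Hypothetical helper (not part of D3): given how many bars are on screen and
// how many bins the new data has, list the indices that end up in the enter,
// update, and exit selections of an index-based join (no key function).
function joinByIndex(oldCount, newCount) {
  const indices = (from, to) =>
    Array.from({ length: Math.max(0, to - from) }, (_, i) => from + i);
  const shared = Math.min(oldCount, newCount);
  return {
    update: indices(0, shared),       // existing bars that receive new data
    enter: indices(shared, newCount), // new bins with no bar yet
    exit: indices(shared, oldCount),  // leftover bars with no bin
  };
}

console.log(joinByIndex(0, 3));   // first render: indices 0-2 all enter
console.log(joinByIndex(20, 18)); // fewer bins: bars 18 and 19 exit
```

It also means an updated bar can end up representing a different range of values than it did before, so its x position and width may need to change, not just its height.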

Update the axes

The final piece is to update the axes and the label. We can do this by calling the axis creation functions again. We can add another transition here as well to animate the change in the axis, which looks better than just switching to a new one.

graph.xAxis
  .transition()
  .duration(speed)
  .call(d3.axisBottom(graph.x));

graph.yAxis
  .transition()
  .duration(speed)
  .call(d3.axisLeft(graph.y));

graph.xLabel.text(title);

Switching between histograms

Now that you have the update function, it is time to put it to use.

Remove the console.log from the event handler and instead call your completed updateHistogram function. For testing, start by just creating the graph for valence and make sure the graph looks okay.

When it is working, add some conditional code that shows valence for step 0, pct_sad for step 1, and gloom_index for step 2. This is a reasonable moment to use switch, but you are welcome to use if-else if you are more comfortable with that.
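One way to write that conditional is a switch that maps each step to a metric and an axis title. In this sketch, configForStep is a hypothetical helper and the titles are placeholders; in your handler you would pass the result to updateHistogram along with the histogram, the data, and a speed (check the function for its exact parameter order).

```javascript
// Hypothetical helper: map a step index to the histogram's metric and title.
// Returns null for steps that do not show a histogram (e.g. step 3).
function configForStep(step) {
  switch (step) {
    case 0:
      return { metric: "valence", title: "Valence" };
    case 1:
      return { metric: "pct_sad", title: "Sad lyrics (%)" };
    case 2:
      return { metric: "gloom_index", title: "Gloom index" };
    default:
      return null;
  }
}

console.log(configForStep(1).metric); // pct_sad
```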

Another reasonable design is to write separate functions for each stage and store them in an array. Then switching just involves using the current step to pick the right function out of the array. I recommend this approach if your visualizations get more complex.
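The array-of-functions design might look like the sketch below: each entry handles one step, and the event handler just indexes into the array. To keep it runnable on its own, the step bodies only record what they would do; the log array and its messages are hypothetical stand-ins for calls such as updateHistogram and the fade transitions.

```javascript
// Stand-in for real drawing work, so the sketch runs on its own.
const log = [];

// Hypothetical: one function per step, in the same order as the .step elements.
const stepFunctions = [
  () => log.push("histogram: valence"),
  () => log.push("histogram: pct_sad"),
  () => log.push("histogram: gloom_index"),
  () => log.push("bar chart: mean gloom_index per album"),
];

// The step-change handler: pick the function for this step, if there is one.
function onStepChange(step) {
  const show = stepFunctions[step];
  if (show) show();
}

[0, 1, 2, 3, 2].forEach(onStepChange);
console.log(log.length); // 5
```

Adding a step then means adding one function to the array rather than another branch to a growing conditional.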

As you scroll through the first three sections, the graph should animate between the three metrics.

Add a bar chart

If you look in createBarchart, you will see that I have given you the code for producing the final bar chart. There are only a couple of things to note in here:

  • I am using d3.rollups to group the songs by album_name and then d3.mean to compute the mean gloom index for the album.
  • I followed the same pattern as the reusable histogram, but only for consistency -- we aren't going to change this graph.
  • The <g> element called innerGraph has its opacity set to 0 (more on this in a moment)
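If d3.rollups is unfamiliar: it groups rows by a key and reduces each group to a single value, returning an array of [key, value] pairs. Below is a plain-JavaScript equivalent of a call along the lines of d3.rollups(data, (v) => d3.mean(v, (d) => +d.gloom_index), (d) => d.album_name). The helper name meanByGroup is hypothetical and the rows are made up.

```javascript
// Hypothetical helper (not part of D3): group rows by keyOf(row) and average
// valueOf(row) within each group, returning [key, mean] pairs in the order
// each key first appears. Missing values and NaN are skipped, as d3.mean does.
function meanByGroup(rows, keyOf, valueOf) {
  const groups = new Map(); // key -> { sum, count }
  for (const row of rows) {
    const key = keyOf(row);
    if (!groups.has(key)) groups.set(key, { sum: 0, count: 0 });
    const value = valueOf(row);
    if (value != null && !Number.isNaN(value)) {
      const group = groups.get(key);
      group.sum += value;
      group.count += 1;
    }
  }
  return Array.from(groups, ([key, { sum, count }]) => [key, count ? sum / count : undefined]);
}

// Made-up rows, with gloom_index as a string like d3.csv() would give us.
const rows = [
  { album_name: "Album A", gloom_index: "20" },
  { album_name: "Album B", gloom_index: "50" },
  { album_name: "Album A", gloom_index: "40" },
];
console.log(meanByGroup(rows, (d) => d.album_name, (d) => +d.gloom_index));
// [ [ 'Album A', 30 ], [ 'Album B', 50 ] ]
```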

To use the bar chart, call createBarchart in manageVisualizations right after you create the histogram. Importantly, we are going to call it on the same SVG element. Each graph will be in its own <g>, but they will be layered on top of one another. Because the bar chart is invisible, you will still only see the histogram.

Now, in the event handler, when you reach step 3, use transitions to fade the opacity of the histogram to 0 and the opacity of the bar chart to 1.

Of course, when you scroll back up again, you will only see the bar chart. To fix this, add some more transitions to step 2 that fade the two graphs in the opposite direction.
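The rule for both scrolling directions boils down to: step 3 shows the bar chart, and every earlier step shows the histogram. Here it is as a small pure function (layerOpacities is a hypothetical name). In the real handler you would transition each chart's <g> to these target opacities rather than setting them immediately. Computing the targets for every step, not only steps 2 and 3, also covers the case where the scroller jumps from step 3 straight to an earlier step (for example, after pressing the Home key) and never reports step 2.

```javascript
// Hypothetical helper: target opacities for the two layered charts at a step.
// Step 3 (the last step) shows the bar chart; every earlier step shows the histogram.
function layerOpacities(step) {
  const showBars = step >= 3;
  return { histogram: showBars ? 0 : 1, barChart: showBars ? 1 : 0 };
}

// Scrolling down to step 3, then jumping straight back to step 0.
console.log(layerOpacities(3)); // { histogram: 0, barChart: 1 }
console.log(layerOpacities(0)); // { histogram: 1, barChart: 0 }
```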

Test it out! You should now have a scrolly narrative structure that you can use for your projects.

Submission

Make sure to commit your changes and push them to GitHub. Submit the URL of your GitHub repository on Canvas.


Last updated 11/09/2021