CS 465 - Tutorial Three

Published

October 8, 2025

Objectives

  • Get some hands-on practice creating a familiar visualization in D3
  • Learn some more about scales
  • Become more mindful of visualization structure
  • Learn how to add tooltips in D3

Prerequisites

  1. Create the git repository for your tutorial by accepting the assignment from GitHub Classroom. This will create a new repository for you with a bare bones npm package already set up for you.

  2. Clone the repository to your computer with git clone (get the name of the repository from GitHub).

  3. Open the directory with VSCode. You should see all of the files down the panel on the left in the Explorer.

  4. In the VSCode terminal, type pnpm install. This will install all of the necessary packages.

  5. In the terminal, type pnpm dev to start the development server.

Overview

For this tutorial, you are going to build a basic bubble plot, just as we did in Plot.

Specifically, we will be building a version of the bubble plot we looked at when we introduced bubble plots.

countryData = FileAttachment("countryData.csv").csv({ typed: true });

Plot.plot({
  color: {
    legend: true,
    type: "categorical"
  },
  x:{
    label:"Life Expectancy"
  },
  y:{
    label:"Fertility",
    domain: [0,8]
  },
  marks: [
    Plot.dot(countryData, {
      filter: (d) => d.year === 2005,
      x: "life_expect",
      y: "fertility",
      r: "pop",
      title: "country",
      opacity:.6,
      fill: "cluster"
    })
  ]
})

Create the SVG

The first step will be to create the SVG region that you will be using to draw your graph.

If you look in index.md of the starter code, you will see that I left this for you:

import {scatterplot} from "./components/visualization.js";

display(scatterplot(countryData));

You will do all of your work in src/components/visualization.js.

Just as we did in the bar chart example in class, set up variables for width, height, and margin. Your margin should be an object with left, right, top and bottom attributes. Make the visualization \(640 \times 400\). You can set all of the margins to 10 initially.

You can then create the SVG with the d3.create() function. It should look like this:

  const svg = d3.create("svg")
  .attr("viewBox", [0, 0, width, height])

Then, replace the placeholder string with svg.node() to load the SVG onto the page.

This creates a big white rectangle, so… it won’t look like much. Add another style to the svg that sets the “border” to “1px lightgray solid”.

Data filtering

We don’t have a convenient data filtering option, so we will just have to do it by hand.

Add this line before you create the svg:

const data05 = data.filter((d) => d.year === 2005);

This will create a new local data source in our cell that just has the data for 2005. Note that the discriminator function is the same one we used in the filter option above.

Scales

The next step will be to create some scales. How many scales do we need?

How many scales do we need? (think before looking)

Hopefully you answered four:
  • life_expect (x)
  • fertility (y)
  • pop (circle size)
  • cluster (color)

We are just going to focus on the position scales right now. What type of data do we have in life_expect and fertility?

Hopefully, your answer to both was “quantitative”. So, for these two, our best bet is a linear scale, which we can construct with d3.scaleLinear().

As a reminder, the scale will set up a mapping from the input domain (values in our data) to the output range (values we use to display).

The structure looks like:

const x = d3.scaleLinear()
  .domain([domainStart, domainEnd])
  .range([rangeStart, rangeEnd]);
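To make the mapping concrete, here is a minimal plain-JavaScript sketch of what a linear scale computes. The linearScale helper and the numbers are hypothetical and only illustrate the idea; this is not D3's implementation, which does considerably more (clamping, ticks, inversion, and so on).

```javascript
// Hypothetical helper: build a function that maps a domain value to a range value.
function linearScale([d0, d1], [r0, r1]) {
  // Find how far along the domain the value is (0 to 1),
  // then move the same fraction of the way along the range.
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Assumed example: data values from 0 to 100 drawn between pixels 10 and 630.
const toPixels = linearScale([0, 100], [10, 630]);

console.log(toPixels(0));   // 10
console.log(toPixels(50));  // 320
console.log(toPixels(100)); // 630
```

A scale built with d3.scaleLinear() is called the same way: you hand it a data value and it returns the display value.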

Let’s start by thinking about the output range. You want to have this picture in your head of the graph, with the “Content” region being where the visualization itself will live, and the axes and other annotations hanging out in the outer margin area.

So, the start of the output range for the X encoding will be at the left margin, and the end will be at the width minus the right margin (you can apply the same logic for the Y encoding).
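As a rough sketch of that arithmetic, assuming the 640 by 400 drawing and the initial 10 pixel margins suggested earlier (the names xRange and yRange are just for illustration):

```javascript
// Dimensions and margins as suggested in the tutorial.
const width = 640;
const height = 400;
const margin = { top: 10, right: 10, bottom: 10, left: 10 };

// Left and right edges of the content region, in pixels.
const xRange = [margin.left, width - margin.right];
// Top and bottom edges of the content region, in pixels.
const yRange = [margin.top, height - margin.bottom];

console.log(xRange); // [10, 630]
console.log(yRange); // [10, 390]
```

Because the ranges are computed from width, height, and margin, they update automatically when you tweak the margins later.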

What about the input domain?

Let’s focus on the X encoding. If you look at the Plot graph, you can see the domain is roughly [25,85]. You could just use this.

However, I want you to try out a D3 utility function: d3.extent(). This takes in an array and returns a two element array with the minimum and maximum values (in that order).

example

d3.extent([5,3,8,2,89,34,42]) // returns [2,89]

The function also takes in an optional accessor function which helps it to extract the right value if we have an array of objects. In the following example, we have a list of objects that have a and b properties. If we want to find the extent of the b property, we need to provide an accessor function that tells the function how to extract the correct value. So, we can use (d)=>d.b.

example

d3.extent([{a:3, b:6}, {a:1, b:42}, {a:7, b:7}, {a:34, b: 21}], d=>d.b) // returns [6,42]

I encourage you to play around with this, try returning different extents from your function and see what they do.

Why d?

As you hopefully recall, the d in (d)=>d.b is the parameter of our little anonymous function. There is no reason it has to be d. Just as with any function parameter we can name it whatever we like and should make choices that lead to readable code (which single letter variables usually are not). These accessor functions are so short that we tend to favor these terse variables that would be less readable in other contexts.

The d is the standard throughout the D3 documentation so you will see it everywhere in example code. I usually think of it as short hand for “data” or “datum” to reflect that we are typically providing the function with a row of the data (or one data item).

Parentheses around the function parameters or not?

We talked about this briefly in class. If we have a single input parameter, the parentheses are optional and are frequently left off. Again, this comes down to being very terse. We are usually including these as arguments of another function, and reducing visual noise and extra characters can make our code more readable. You can make your own choices, but get used to reading it both ways.

For more than one parameter, you should always use parentheses.
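A quick illustration of the three forms, using a small made-up array (the variable names are hypothetical):

```javascript
const rows = [{ a: 3, b: 6 }, { a: 1, b: 42 }];

// One parameter, with parentheses.
const withParens = rows.map((d) => d.b);
// One parameter, parentheses left off. Same function.
const withoutParens = rows.map(d => d.b);
// Two parameters: the parentheses are required.
const withIndex = rows.map((d, i) => `${i}: ${d.b}`);

console.log(withParens);    // [6, 42]
console.log(withoutParens); // [6, 42]
console.log(withIndex);     // ["0: 6", "1: 42"]
```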

Add the X scale

Use these ideas to create a new linear scale for the X axis. Declare it before you create the SVG and call it x.

Add the Y scale

Repeat the same process for the y scale, using ‘fertility’. This time, however, I would like you to start the scale at 0 (our lowest value is essentially there). So, instead of using d3.extent, use d3.max, which works exactly the same way but only returns the maximum value.

Caution

domain and range expect arrays. d3.extent returns an array. d3.min and d3.max do not.
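In practice this means a single maximum has to be wrapped in an array yourself. The sketch below uses plain Math.max on made-up rows to stand in for d3.max(rows, d => d.fertility); the rows and names are hypothetical, and unlike d3.max this stand-in does not skip missing values.

```javascript
// Made-up rows standing in for the real data.
const rows = [{ fertility: 1.3 }, { fertility: 7.1 }, { fertility: 2.4 }];

// A single number, not an array (the same shape d3.max returns).
const maxFertility = Math.max(...rows.map((d) => d.fertility));

// The domain needs both ends, so build the array by hand.
const yDomain = [0, maxFertility];

console.log(maxFertility); // 7.1
console.log(yDomain);      // [0, 7.1]
```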

Making circles

Now that we have the foundation for our visualization, we need to create the dots.

We will create the dots with SVG’s <circle />. The circle has three important attributes: cx (x position of the center of the circle), cy (y position of the center of the circle), and r (the circle’s radius).

As we did with the bar chart, we will start by creating a new <g> tag with const dots = svg.append("g").

Then use a forEach loop on the data creating a new circle for each observation.

We can then set their attributes and styles using attr and style.

Setting attributes

Now we want to set the three attributes of the circles. Recall that the attr() function takes two arguments, the first is the attribute to change (as a string) and the second a value.

For the r attribute, set this to 5 so we can see the dots.

For cx and cy we can use our new scales to generate the right positions for our dots. Set cx to x(d.life_expect), which runs the value we care about (the life expectancy) through the scale to get the right position. Do the same for the cy attribute.

Fix the Y scale (if needed)

Are you seeing a positive relationship between fertility and life expectancy (the line of dots rises to the right)? If so, that isn’t right. If you refer back to the original Plot graph, you will see there is actually a negative correlation.

As we talked about in class, the origin (i.e., the point (0,0)) is in the upper left hand corner and y increases down.

Our preferred solution for this graph will be to swap the ends of the Y scale’s range.
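To see why swapping the ends works, here is a plain-JavaScript sketch with a hypothetical linearScale helper standing in for d3.scaleLinear (it is not D3's implementation). The domain [0, 8] and the pixel values are assumed for illustration.

```javascript
// Hypothetical stand-in for a linear scale.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Range listed top to bottom: larger values land lower on the screen.
const yDown = linearScale([0, 8], [10, 390]);
// Range swapped: larger values land higher on the screen.
const yUp = linearScale([0, 8], [390, 10]);

console.log(yDown(0), yDown(8)); // 10 390
console.log(yUp(0), yUp(8));     // 390 10
```

Nothing about the data or the domain changes; only the direction of the pixel range does.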

Adding Axes

The next thing we will do is add the axes.

You can basically copy what we did in class:

  const xAxis = svg.append("g")
    .attr("transform", `translate(0, ${height - margin.bottom})`)
    .call(d3.axisBottom(x));

  const yAxis = svg.append("g")
    .attr("transform", `translate(${margin.left}, 0)`)
    .call(d3.axisLeft(y));

This adds a new g to the SVG and then calls one of the four axis generators on the component (axisLeft, axisRight, axisTop, and axisBottom).

You will need to tweak the margins to give yourself room for the axis.

Add label titles

We should label our axes.

In class I showed you that we can just create a new text element and move it where we want it.

Here are the two examples from class for the X and Y axes:

svg.append("text")
  .attr("text-anchor", "middle")
  .attr("transform", `translate(${(width-margin.right + margin.left)/2}, ${height-5})`)
  .style("font-weight", "bold")
  .style("font-size","10px")
  .text("Data");

svg.append("text")
  .attr("text-anchor", "middle")
  .attr("transform", `translate(${margin.left - 20},${height/2}) rotate(-90)`)
  .style("font-weight", "bold")
  .style("font-size","10px")
  .text("Index");

Duplicate this for your plot, with “Life Expectancy” and “Fertility Rate” for the labels.

Don’t just take the numbers here as read – move the labels around until they look right. You will probably also want to tweak the margins again.

Note

At this point you don’t really need to see where the bounds of the SVG are, and you can comment out the line that adds the border.

Add a color encoding

In the original graph, we colored the bubbles by cluster. We can do the same here.

Again, our tool of choice will be a d3 scale. The cluster variable, however, is not quantitative, so we want something that will give us a nominal mapping. We also don’t want to use the banded scale we used to make the bar chart, as that doesn’t make much sense for colors.

We are going to use d3.scaleOrdinal. Technically our data is nominal, not ordinal, but d3 got rid of their nominal/categorical scale. The truth is that both nominal and ordinal scales are scales with discrete domains and ranges. The difference really comes down to the choice of range. If we pick a range that doesn’t imply an ordering, we have a nominal encoding, despite the name.

So, near your other scales, create a new one called color using d3.scaleOrdinal.

For the domain, we want to provide a list of discrete values rather than a min and max. The fastest way to do this is to actually create a new empty array and then iterate over our data adding the cluster to the list if the list doesn’t already include it (using the includes method that is built into JavaScript Arrays).
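The approach described above might look something like this on a handful of made-up rows (the rows are hypothetical; your real loop would run over your own data):

```javascript
// Made-up rows standing in for the real data.
const rows = [
  { cluster: 3 },
  { cluster: 0 },
  { cluster: 3 },
  { cluster: 1 },
  { cluster: 0 },
];

// Start with an empty list and add each cluster the first time we see it.
const clusters = [];
rows.forEach((d) => {
  if (!clusters.includes(d.cluster)) {
    clusters.push(d.cluster);
  }
});

console.log(clusters); // [3, 0, 1]
```

Notice that the values come out in the order they are first encountered, not in sorted order.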

For the range, we want a list of discrete colors. We could just make some up, but d3 comes with a collection of pre-built color schemes. We want one of the categorical schemes. For simplicity’s sake, we can just use d3.schemeCategory10.

To color our circles, we need to add a style(), and set the “fill” to the color we want. To use our new scale, we would write .style("fill", color(d.cluster)).
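If it helps to picture what the ordinal scale is doing, here is a simplified plain-JavaScript sketch: each domain value is paired with the range entry at the same position. The ordinalScale helper, the cluster values, and the colors are all hypothetical, and the real d3.scaleOrdinal handles more cases (for example, values that are not in the domain).

```javascript
// Hypothetical stand-in: pair domain[i] with range[i], wrapping around
// if there are more domain values than colors.
function ordinalScale(domain, range) {
  return (value) => {
    const index = domain.indexOf(value);
    return index === -1 ? undefined : range[index % range.length];
  };
}

// Made-up cluster list and palette.
const clusters = [3, 0, 1];
const palette = ["steelblue", "orange", "seagreen"];
const color = ordinalScale(clusters, palette);

console.log(color(3)); // "steelblue"
console.log(color(0)); // "orange"
console.log(color(1)); // "seagreen"
```

The pairing depends purely on position, so the order of the domain array decides which cluster gets which color.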

Add a size encoding

Now that we have color, it is time to add our final variable: pop encoded by dot size.

This will largely follow the pattern we have already established. Create a scale for the variable and then use the scale to set the attribute.

We want to encode the population as size, but the only control over the size that we have is r, the radius of the circle. If we are talking about “size” then we really mean “area”. The area of a circle changes with the square of the radius (\(area = \pi r^{2}\)). So if we just use the radius directly, the size won’t grow linearly, it will grow with the square of the input value.

The key to the solution is to realize that r is proportional to the square root of the area, so we want a linear scale based off the square root of the input value. D3 provides this with d3.scaleSqrt. This scale otherwise behaves like a continuous quantitative scale, so we can set up the domain and range the same way we created the positional scales, using min and max values for the domain and range.

To show you the difference in using the two scales, I have a simple data set: [1,2,4,8,16,32]. At each step we should see the value double. In the visualization on the left, where we are just directly scaling the radius, the circles grow too fast, making the larger values look disproportionately large.

function scaleDemo(){
  const data = [1,2,4,8,16,32];

  const radiusScale = d3.scaleLinear().domain([0, 32]).range([0, 25]);
  const sqrtScale = d3.scaleSqrt().domain([0, 32]).range([0, 25]);
    
  return htl.html`<svg viewBox="0 0 300 400" style="width:300px; height:400px">

  <g transform="translate(75,0)" style="text-anchor:middle">
  ${data.map((d, i) => htl.svg`<text x=50 y=${(i + 1) * 50} dy=10>${d}</text>`)}
  </g>

  <g fill="darkslateblue">
  ${data.map(
    (d, i) => htl.svg`<circle cx=50 cy=${(i + 1) * 50} r=${radiusScale(d)} />`
  )}
  </g>

  <g transform="translate(150,0)" fill="darkslateblue">
  ${data.map(
    (d, i) => htl.svg`<circle cx=50 cy=${(i + 1) * 50} r=${sqrtScale(d)} />`
  )}
  </g>
  <text x=50 y=375 style="text-anchor:middle">scaleLinear</text>
  <text x=200 y=375 style="text-anchor:middle">scaleSqrt</text>
  </svg>`;
}


scaleDemo();
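The same comparison can be checked with numbers instead of pictures. This plain-JavaScript sketch mimics the two scales from the demo (zero-based domain up to 32, range up to 25) with hand-written formulas rather than D3 itself, and compares the area of each circle to the one before it.

```javascript
const values = [1, 2, 4, 8, 16, 32];
const maxValue = 32;
const maxRadius = 25;

// Radius proportional to the value (the scaleLinear column of the demo).
const linearRadius = (v) => (v / maxValue) * maxRadius;
// Radius proportional to the square root of the value (the scaleSqrt column).
const sqrtRadius = (v) => Math.sqrt(v / maxValue) * maxRadius;

// Area is proportional to r squared, so compare r^2 between neighbours.
const areaRatios = (radius) =>
  values.slice(1).map((v, i) => radius(v) ** 2 / radius(values[i]) ** 2);

console.log(areaRatios(linearRadius)); // every ratio is about 4
console.log(areaRatios(sqrtRadius));   // every ratio is about 2
```

Each value is double the previous one, so only the square root version, where the area also doubles, shows the data honestly.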

For the domain, follow the pattern we used for the Y axis and use 0 to the maximum value (since this is a Sqrt scale, we don’t want to have to figure out what appropriate sizes are for both ends of the scale if we use extent).

The range can extend from 0 to width/20. This gives us some reasonable size circles that will scale if we change the size of the graph.

Once you have your scale, change the r attribute to use your new scale on the pop variable.

You will have some significant overplotting when we grow the circles. To combat that, we can make the circles slightly translucent. Add another .style() and set “opacity” to 0.6 (60%).

Tooltips

There are a number of different approaches to creating tooltips in D3. We are going to make use of a quick and dirty approach that won’t necessarily work in all situations (like mobile devices) and will require us to hover over the target for a second or two, but it will be good enough for right now.

We are going to add <title> elements to our circles. These allow us to add “title” data to any SVG element, which we can view when the mouse hovers over it for a short period (i.e., a tooltip).

The way we get these will feel a little strange. We are going to add an append("title") on to the end of the method chain that creates the circle.

.append("title")
.text(d.country);

This appends a title on to each of the circles. We then use the text function to set the text of the title.

Note

Since we have an append call here, we have ended the chain of functions that all return the circle. Any further chained methods will refer to the title element now.

One of the things you will notice when you are playing with the tooltips is that some small countries are completely behind larger ones, and the tooltip doesn’t work.

To solve this, we can sort the data by population so that the smaller countries end up on top of the larger ones.

Right after you make the new data05 variable, add this line:

data05.sort((a,b) => b.pop - a.pop);

Sorting in JavaScript is in-place, so we don’t have to assign the return value to anything.
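Here is a small sketch of that behavior on made-up rows (the countries and numbers are hypothetical). It also shows that, because filter returns a new array, sorting data05 leaves the original data alone.

```javascript
// Made-up rows standing in for the real data.
const data = [
  { country: "A", year: 2005, pop: 5 },
  { country: "B", year: 2000, pop: 90 },
  { country: "C", year: 2005, pop: 300 },
  { country: "D", year: 2005, pop: 40 },
];

// filter builds a new array containing only the 2005 rows.
const data05 = data.filter((d) => d.year === 2005);

// Sort largest population first, so big circles are drawn first
// and smaller ones end up on top of them.
data05.sort((a, b) => b.pop - a.pop);

console.log(data05.map((d) => d.country)); // ["C", "D", "A"]
console.log(data.map((d) => d.country));   // ["A", "B", "C", "D"]
```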

Note

This will probably also result in all of your colors changing. This happens because the color assignments were done based on the order of the array passed to the domain of the color scale.

Matching the Plot version better

We won’t try to totally match the Plot version, but there are a couple of cosmetic things we can do to make it a little closer.

Adjusting the X scale

You may have noticed that the Plot version of this visualization has a lot more open space on the left side. For some reason the X scale looks like it goes down to 25 instead of the 44ish of our version. The reason is that we let the Plot version see the entire dataset, even though we are only plotting a small piece of it (recall that this is the difference between using the filter option and passing Plot the already filtered data).

We can do the same by using data instead of data05 in our positional scales. Go ahead and make that change now.

Adjusting the axes

Our D3 plot has more tick marks than the Plot version. There are a variety of things we can do to change the ticks. In this case, we will just change the number of tick marks. So, for example, there are 6 tick marks along the X axis in the Plot version. We can duplicate that with:

.call(d3.axisBottom(x).ticks(6))

The Y axis can be left alone.

Getting rid of the axis line itself is more of a challenge. Here it helps to know the actual structure of the axis being generated.

Here is the HTML that is generated to produce that bottom axis.

<g transform="translate(0, 355)" fill="none" font-size="10" font-family="sans-serif" text-anchor="middle">
  <path class="domain" stroke="currentColor" d="M55,6V0H590V6"></path>
  <g class="tick" opacity="1" transform="translate(113.03903125211852,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">30</text></g>
  <g class="tick" opacity="1" transform="translate(203.71085011185684,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">40</text>
  </g>
  <g class="tick" opacity="1" transform="translate(294.3826689715952,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">50</text>
  </g>
  <g class="tick" opacity="1" transform="translate(385.0544878313335,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">60</text>
  </g>
  <g class="tick" opacity="1" transform="translate(475.72630669107184,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">70</text>
  </g>
  <g class="tick" opacity="1" transform="translate(566.3981255508102,0)">
    <line stroke="currentColor" y2="6"></line>
    <text fill="currentColor" y="9" dy="0.71em">80</text>
  </g>
</g>

The piece we want to hide is the path with the class “domain”. Everything else in there is the tick marks and their labels.

We want to restyle that path. We could do it with a CSS file, using the “domain” class.

We are going to do it with code, however.

All selections in D3 have a nice method called selectAll(selector). The selectAll method will poke around inside of the selection and look for elements that match the selector. The selector will be a CSS Selector string. In this case, the element we want has the class “domain”, so our selector can be ".domain" (the “.” indicates it is a class).

We could call selectAll on the axes (e.g., xAxis.selectAll(".domain")), but we want to change both path elements, so we will call it on svg, which will return all of the matching elements in the entire visualization.

svg.selectAll(".domain")

Once we have a selection, we can use any of the methods we have already seen. So, we will use style to remove the stroke color.

svg.selectAll('.domain')
    .style("stroke", "none");

Warning

If it didn’t work, make sure that you add this to your code after both axes are already on the page.

(Optional) Challenge: Add a legend

We do not have a tool for automatically generating legends¹. The typical approach is to just think of it as another visualization.

To imagine how we might do this, we can think about what the structure might be. The swatches are easy – those are just <rect> tags. The text is just <text>. Beyond that, there are a lot of possibilities.

The model that I like is to wrap each swatch / text pair in its own g tag. That way, we can always put the swatch at (0,0) and the text at a fixed distance away. We then use translate on the g to lay out the row of swatches.

Then, we wrap all of those in a g so we can move it around the page as a single unit (if we want to).

Of course, to build this, we do it in reverse order. Append a g to the svg. Then, for each cluster, we create a new g and append it to the main g for the legend. Inside of this g, we create the rect and the text.

Where do we get the colors and cluster names? From the color scale! We can get the clusters with color.domain(), which when called with no arguments will return the domain². We can then just use the scale the same way we did before to get the color associated with each cluster.
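As a hint for the layout arithmetic only, here is a plain-JavaScript sketch that works out where each swatch / text group would go. It assumes the entries are stacked vertically; the clusters array stands in for whatever color.domain() returns, and rowHeight is a made-up spacing you would tune by eye.

```javascript
// Stand-in for the array returned by color.domain().
const clusters = [0, 1, 2];
// Hypothetical vertical spacing between legend entries, in pixels.
const rowHeight = 20;

// One entry per cluster: the label to show and the transform for its g.
const legendRows = clusters.map((cluster, i) => ({
  label: String(cluster),
  transform: `translate(0, ${i * rowHeight})`,
}));

console.log(legendRows.map((row) => row.transform));
// ["translate(0, 0)", "translate(0, 20)", "translate(0, 40)"]
```

Each transform string is the kind of value you would pass to .attr("transform", ...) on the g for that entry; creating the rect and text inside it is still up to you.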

I’ve been vaguer about the actual steps here, but it is the same process we did earlier to get the dots (and even earlier when we made the bar chart). See if you can figure it out.

Reflection

Before you submit, make sure to answer the questions in reflection.md.

Submitting your work

Footnotes

  1. In truth there are packages out there that we can install, but they aren’t from the D3 core. They are just from folks who generalized this process.

  2. Alternatively, we also have the clusters array we made earlier to set the domain. That works as well, but getting it from the domain() is cleaner and is a good practice to adopt for moments when it is more work to recover the original domain.