CS 465 Information Visualization

Project Two - Exploratory Data Analysis

Objectives

  • Gain experience performing exploratory analysis using visualizations
  • Build up your Vega-Lite visualization skills
  • Learn about presenting findings as a mixture of text, tables and visualizations

Exploratory Data Analysis

Your goal with this assignment is to gain an understanding of a dataset and develop insights about it. An insight is a form of synthesis: connecting pieces of information in a way that is not necessarily obvious at first. A minor insight we had during our exploration of the employment data was that everyone who scored over 60% in their training got promoted, and no one who had been with the company more than 20 years scored that high. This, of course, raises more questions, many of which we can't answer with the given dataset.

Here is another example of an exploratory analysis, this one looking at a movie database: https://observablehq.com/@uwdata/a2-example-movies-data.

For the dataset, I would like you to explore the Spotify Top 200 Charts dataset. I am leaving it to you to download the data and load it into Observable (there is a tab on the right that looks like a paperclip -- that is a good place to start).

Teamwork

This will be a team project. You should find a partner to work with (one group will need to have three members, as we now have an odd number of students in the class). As the results from my poll were split right down the middle, we will try self-selected groups for this assignment. If you don't have someone to work with, post a message on Slack in #general. When you have your group, please go to Canvas and add yourselves to one of the Project 2 groups (you will find the groups under the 'People' tab).

You should use a pair programming approach to this project. In other words, you should never be working on the assignment alone. In pair programming, there is only one keyboard. The person in front of the keyboard is the driver. The second person is responsible for maintaining the bigger picture as well as catching typos, etc. There should be constant communication between the two. This should be a noisy process. If it is quiet, it means the driver has taken over and is not letting the other person participate. Every fifteen to twenty minutes, you should switch roles.

In this age of Covid, I suggest actually using two laptops, and making use of the Share feature in Observable to share the notebook. This will allow you to watch the other person type in real time, and will allow you to switch roles very rapidly.

Process

As in the tutorial, you will use your notebook both as a tool for exploration and as a presentation tool to report your findings.

After importing the data, start by getting a feel for the data. What are the variables? What are their ranges? What is an observation in this dataset?
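As a rough sketch of that first pass, the function below reports, for each variable, how many values are missing, how many distinct values appear, and the numeric range. It assumes the data is available as an array of row objects, which is how tabular data commonly arrives in a notebook. The sample rows and the field names `track`, `rank`, and `streams` are made up for illustration and are not the actual columns of the Spotify dataset. It is written as plain JavaScript, so you may need to adapt it to your notebook's cell syntax.

```javascript
// Hypothetical sample rows; the real dataset's columns will differ.
const sampleRows = [
  { track: "Song A", rank: 1, streams: 500 },
  { track: "Song B", rank: 2, streams: 100 },
  { track: "Song C", rank: 3, streams: null }
];

// Treat null, undefined, empty strings, and NaN as missing.
function isMissing(v) {
  return v === null || v === undefined || v === "" || Number.isNaN(v);
}

// Build a per-column summary: missing count, distinct count,
// and min/max over the numeric values (null if there are none).
function summarizeColumns(rows) {
  // Collect every column name that appears in any row.
  const columns = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) columns.add(key);
  }

  const summary = {};
  for (const col of columns) {
    let missing = 0;
    let min = null;
    let max = null;
    const distinct = new Set();
    for (const row of rows) {
      const v = row[col];
      if (isMissing(v)) {
        missing += 1;
        continue;
      }
      distinct.add(v);
      if (typeof v === "number") {
        if (min === null || v < min) min = v;
        if (max === null || v > max) max = v;
      }
    }
    summary[col] = { missing, distinct: distinct.size, min, max };
  }
  return summary;
}

console.log(summarizeColumns(sampleRows));
```

A summary like this also helps answer what an observation is: if a column you expected to identify each row has fewer distinct values than there are rows, each row probably represents something more specific than you assumed.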

Then, before you go any further, write down three questions you would like to investigate (these are just a starting point -- you may not answer any of them, or you may end up doing a deep dive on one that leads to more questions). Next, try some quick visualizations to give you a sense of the shape of the data. What is present? What is missing? Look for relationships between variables and places where the data may be missing or otherwise messy.
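The quick visualizations mentioned above might look something like the following sketch: two small helper functions that build Vega-Lite specifications as plain JavaScript objects, one for a histogram of a single variable and one for a scatter plot of two variables. The helper names `histogramSpec` and `scatterSpec`, the sample rows, and the field names are all hypothetical. How you render a specification depends on how your notebook is set up, so treat this as one possible starting point rather than the required approach.

```javascript
// Hypothetical sample rows; substitute the data you loaded.
const exampleRows = [
  { track: "Song A", rank: 1, streams: 500 },
  { track: "Song B", rank: 2, streams: 100 },
  { track: "Song C", rank: 3, streams: 250 }
];

// Histogram: bin one quantitative field and count rows per bin.
// Shows the distribution (shape) of a single variable.
function histogramSpec(rows, field) {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: { values: rows },
    mark: "bar",
    encoding: {
      x: { field: field, bin: true, type: "quantitative" },
      y: { aggregate: "count", type: "quantitative" }
    }
  };
}

// Scatter plot: one point per row, for spotting relationships
// between two quantitative fields.
function scatterSpec(rows, xField, yField) {
  return {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    data: { values: rows },
    mark: "point",
    encoding: {
      x: { field: xField, type: "quantitative" },
      y: { field: yField, type: "quantitative" }
    }
  };
}

const streamsHistogram = histogramSpec(exampleRows, "streams");
const rankVsStreams = scatterSpec(exampleRows, "rank", "streams");
console.log(JSON.stringify(streamsHistogram.encoding));
console.log(JSON.stringify(rankVsStreams.encoding));
```

Keeping these helpers generic makes it cheap to sweep through every variable at this stage; the polished, titled charts can come later.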

Once you have a reasonable overview, start to dive into each of your questions. Start making a visualization that might address your question and then see where it takes you. Did it not answer the question? Did it raise any new questions? If you find that your questions weren't detailed enough, or you want to revise your questions, that is fine.

At the end, I would like you to produce a minimum of six insights you gained from the dataset. These should be explained in text, with visualizations used as evidence.

All of your work should be in your notebook. I expect to see a textual description of why you are creating each visualization, explanations for what you found, and the new questions that it raises.

The nice feature of the notebooks is that you can experiment and refine visualizations. Please make use of this facility. Don't leave unfinished charts around unless they are an important part of your analysis.

Deliverables

Your notebook is your deliverable. I expect to see a few things in the notebook:

  • At the top, I should see your names.

  • During the investigation I expect to see clear documentation of the thought process. It is okay to go back and flesh this out if you are in the flow, but I advise writing something to capture what you were thinking at the point when you made a visualization. Insights can be ephemeral, and fade away with time.

  • At the bottom of the notebook, collect your six or more insights together. This section should be clearly marked off (use a horizontal line, heading text, etc.). In this section, clearly state each insight and provide the evidence (in the form of visualizations). It is fine to copy visualizations from above. However, these should be fully polished, with intelligible titles and labels. To reiterate, I expect to see a somewhat messy process at the start of the notebook (akin to the tutorial), with a clear report at the end where you present your insights with the visualizations you used to arrive at them.

As with our other assignments, notebooks should be shared with me, and the URL submitted on Canvas.


Last updated 10/28/2021