Project Two - Exploratory Data Analysis

Published

September 29, 2025

Due
October 13, 2025, 11:59 PM

Objectives

  • Gain experience performing exploratory analysis using visualizations
  • Build up your visualization skills
  • Learn about presenting findings as a mixture of text, tables and visualizations

Exploratory Data Analysis

Your goal with this assignment is to gain an understanding of a dataset and develop insights about it. An insight is a form of synthesis, the connecting together of pieces of information in a way that is not necessarily initially obvious. A minor insight we had during our exploration of the employment data was that everyone who scored over 60% in their training got promoted, and no one who had been with the company more than 20 years scored that high. This, of course, raises more questions, many of which we can’t answer with the given data set.
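
An insight like this one is a claim that can be checked directly against the rows. The following is a minimal sketch of such a check, not the code used in the tutorial: the rows are made up, and the field names (trainingScore, promoted, yearsAtCompany) are hypothetical stand-ins for whatever the employment data actually calls them.

```js
// Made-up rows with hypothetical field names, for illustration only.
const employees = [
  { trainingScore: 72, promoted: true, yearsAtCompany: 4 },
  { trainingScore: 65, promoted: true, yearsAtCompany: 12 },
  { trainingScore: 48, promoted: false, yearsAtCompany: 25 },
  { trainingScore: 55, promoted: true, yearsAtCompany: 22 },
];

// Claim 1: everyone who scored over 60% was promoted.
const highScorers = employees.filter((e) => e.trainingScore > 60);
const allHighScorersPromoted = highScorers.every((e) => e.promoted);

// Claim 2: no one with more than 20 years at the company scored over 60%.
const veterans = employees.filter((e) => e.yearsAtCompany > 20);
const noVeteranScoredHigh = veterans.every((e) => e.trainingScore <= 60);
```

Note that every returns true for an empty array, so it is worth confirming that highScorers and veterans actually contain rows before trusting either result.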

Here is another example exploratory analysis that looks at the movie database: https://observablehq.com/@uwdata/a2-example-movies-data.

For the dataset, I would like you to explore the Spotify Top 200 Charts dataset. I am leaving it to you to download the data (it should go in src/data – you will need to make the data directory).

Teamwork

This will be a team project. You should find a partner to work with you. If you don’t have someone to work with, post a message on CampusWire under General.

You should use a pair programming approach to this project. In other words, you should never be working on the assignment alone. In pair programming, there is only one keyboard. The person in front of the keyboard is the driver. The second person is responsible for maintaining the bigger picture as well as catching typos, etc. There should be constant communication between the two. This should be a noisy process. If it is quiet, it means the driver has taken over and is not letting the other person participate. Every fifteen to twenty minutes, you should switch roles.

Process

As in the tutorial, you will freely mix visualizations and commentary on the same page.

After importing the data, start by getting a feel for the data. What are the variables? What are their ranges? What is an observation in this dataset?
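
One way to answer the first two questions is to list each column along with the range of its numeric values. The sketch below assumes the data has been loaded as an array of row objects; in a Framework page that would typically come from something like FileAttachment(...).csv({typed: true}) pointed at your file under src/data. The sample rows and their column names are made up so the block runs on its own.

```js
// Made-up rows; the real chart data will have its own columns.
const sampleRows = [
  { track: "Song A", rank: 1, streams: 1200000 },
  { track: "Song B", rank: 2, streams: 950000 },
  { track: "Song C", rank: 3, streams: null },
];

// For each column, report whether it is numeric and, if so, its min and max.
// Missing values (null, undefined, empty string, NaN) are skipped.
function summarizeColumns(rows) {
  const names = Object.keys(rows[0] ?? {});
  return names.map((name) => {
    let min = Infinity;
    let max = -Infinity;
    let numeric = 0;
    let other = 0;
    for (const row of rows) {
      const value = row[name];
      if (value === null || value === undefined || value === "") continue;
      if (Number.isNaN(value)) continue;
      if (typeof value === "number") {
        numeric += 1;
        if (value < min) min = value;
        if (value > max) max = value;
      } else {
        other += 1;
      }
    }
    const isNumeric = numeric > 0 && other === 0;
    return {
      name,
      kind: isNumeric ? "number" : "other",
      min: isNumeric ? min : null,
      max: isNumeric ? max : null,
    };
  });
}

const summary = summarizeColumns(sampleRows);
```

The column names are read from the first row only, which is a simplifying assumption that holds for CSV data where every row has the same fields.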

Then, before you go any further, write down three questions you would like to investigate (these are just the starting place – you may not answer any of them, or you may end up doing a deep dive on one that leads to more questions). Next, try some quick visualizations to give you a sense of the shape of the data. What is present? What is missing? Look for relationships between variables and places where the data may be missing or otherwise messy.
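
Alongside quick visualizations, a per-column count of missing values is a cheap way to see where the data is messy. This is only a sketch: the rows and column names are made up, and what counts as "missing" here (null, undefined, empty string, NaN) is an assumption you may need to adjust for the real dataset.

```js
// Made-up rows with a few gaps, for illustration only.
const chartRows = [
  { track: "Song A", artist: "Artist X", streams: 1200000 },
  { track: "Song B", artist: "", streams: 950000 },
  { track: "Song C", artist: "Artist Y", streams: null },
  { track: "Song D", artist: null, streams: NaN },
];

// Assumed definition of a missing value; adjust to match the real data.
function isMissing(value) {
  return (
    value === null ||
    value === undefined ||
    value === "" ||
    Number.isNaN(value)
  );
}

// Count missing values in every column that appears in the rows.
function countMissing(rows) {
  const counts = {};
  for (const row of rows) {
    for (const [column, value] of Object.entries(row)) {
      counts[column] = (counts[column] ?? 0) + (isMissing(value) ? 1 : 0);
    }
  }
  return counts;
}

const missing = countMissing(chartRows);
```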

Once you have a reasonable overview, start to dive into each of your questions. Start making a visualization that might address your question and then see where it takes you. Did it not answer the question? Did it raise any new questions? If you find that your questions weren’t detailed enough, or you want to revise your questions, that is fine.
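
Many questions start with an aggregation that then feeds a chart. As a hedged illustration, the sketch below totals one column by another and sorts the result, which is the kind of table a bar chart could be drawn from. The rows, the column names, and the totalBy helper are all hypothetical and not part of the template.

```js
// Made-up rows; "artist" and "streams" are hypothetical column names.
const plays = [
  { artist: "Artist X", streams: 500 },
  { artist: "Artist Y", streams: 300 },
  { artist: "Artist X", streams: 200 },
  { artist: "Artist Z", streams: 100 },
];

// Sum valueColumn for each distinct value of keyColumn, largest total first.
function totalBy(rows, keyColumn, valueColumn) {
  const totals = new Map();
  for (const row of rows) {
    const key = row[keyColumn];
    totals.set(key, (totals.get(key) ?? 0) + row[valueColumn]);
  }
  return [...totals]
    .map(([key, total]) => ({ key, total }))
    .sort((a, b) => b.total - a.total);
}

// Keep only the two largest groups.
const topArtists = totalBy(plays, "artist", "streams").slice(0, 2);
```

If the resulting chart does not answer the question, changing the key column or the cutoff is often the quickest way to see what to ask next.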

At the end, I would like you to produce a minimum of six insights you gained from the dataset. These should be explained in text, with visualizations used as evidence.

Deliverables

Getting started

As usual, I have prepared a Framework template for you to start from.

Important

Only one of the members of the group should accept the assignment. You should share the repository.

  1. Create the git repository for your project by accepting the assignment from GitHub Classroom. You will have the option to create a team or join an existing one. Make sure to coordinate with your partner so you know who is going first (and thus creating the team).

  2. Clone the repository to your computer with git clone (get the name of the repository from GitHub).

  3. Open the directory with VSCode. You should see all of the files down the panel on the left in the Explorer.

  4. In the VSCode terminal, type pnpm install. This will install all of the necessary packages.

  5. Open package.json and fill in your names and email addresses under “contributors”.

  6. In the terminal, type pnpm dev to start the development server.

Project structure

I have provided you with two pages: index.md and exploration.md.

exploration

Start your work in exploration.md. This is your scratch space to conduct your exploration.

I expect to see your initial questions at the top of this page. The rest of the page should be similar to the tutorial. I expect to be able to see your thought process as you feel out the shape of the data and then start asking questions and using visualizations to answer them. I know that it is easy to get in the flow and just start making pictures and forget to maintain your thought process, but I advise staying on top of the writing. Insights can be ephemeral, and fade away with time.

If you want to start additional pages to pursue separate threads, or because the page has gotten too long, that is okay. Just create another file at the same level as the others and it will show up.

I advise against the two of you just carrying out your own separate investigations in different pages though. You should be working together.

report

Once your investigation is complete, I would like you to use index.md to write up a short report.

Start by putting your names at the top of the page.

Then, for each insight, state the insight and then present the evidence for it using visualizations and prose. You can use as many visualizations and as much text as you feel necessary to convince me that your insight is valid.

Use Markdown formatting to make the divisions between insights very clear.

Your insights do not need to be connected or part of some larger insight (though many insights are hierarchical, building on other insights, so you may find that one of your insights is really a combination of smaller insights). Ideally your insights should be non-trivial and not just variations on a theme.

As stated in the tutorial, you should focus more on the structure of the visualizations than on making them look perfect. Try not to leave ugly messes around during your investigation, but don’t spend too long trying to come up with the perfect color scheme either.

In the report portion, I expect clean visualizations. They should be easy to read, with no text clipped off by boundaries or too small to read. They can be aesthetically pleasing, but you should not spend time theming them.

Reflection

I would like a reflection from each of you, so I am asking you not to submit these with the repository.

Instead, create a blank markdown file and answer these questions individually:

  • Is your project complete?
  • If not, what remains to make it complete?
  • What was most challenging about this project?
  • What was the most interesting part of this process?
  • What questions do you have after doing this project?
  • What resources did you use (if any) to help you complete this project?
  • How was the experience of working with your partner?

Submit this file in the Project 2 reflections assignment.

Submitting your work

  • Commit your work to the repository (see the git basics guide if you are unfamiliar with this process)
  • Push the work to GitHub
  • Submit the repository on Gradescope (see the Gradescope submission guide). Only one member of the group needs to submit; just make sure both names are listed.