Age v. Total Rides

Gender v. Total Rides

This visualization was created by Grace Benz and Miriam Nielsen as the final project for Information Visualization, a Middlebury Computer Science Course.

Data Transformation:

One of the trickiest parts of this project was obtaining useful data. For the map of New York City that depicts all of the CitiBike station locations, we used a JSON file from Zillow, a real estate company, that divides the city into neighborhoods. Then we projected the station locations, taken from the CitiBike website, onto the map of the city. One problem we encountered was that the Zillow neighborhood data omits several areas. The neighborhoods that would be Brooklyn Heights and Greenpoint simply don't appear, so several of the stations seem to be floating in water.

Getting the data for the chord diagram required a series of meticulous data transformations. We needed a trip matrix that showed the starting and ending neighborhood for every trip. The CitiBike trip data only included the station IDs or the street address of each station, so the first challenge was transforming the exact location into a neighborhood. To do this, we made a dictionary with the station ID as the key and the neighborhood as the value (using Google's reverse geocoder). We were then able to run through each trip and assign it a starting and ending neighborhood. Finally, we organized the data into a matrix and were able to pass it into Bostock's code.
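The dictionary lookup and matrix step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code we actually ran: the function name build_trip_matrix, the station IDs, and the neighborhood assignments are all made up for the example, and trips whose station is missing from the dictionary are simply skipped, which is an assumption.

```python
from collections import Counter


def build_trip_matrix(trips, station_to_neighborhood):
    """Return (neighborhoods, matrix); matrix[i][j] counts trips from i to j.

    trips is an iterable of (start_station_id, end_station_id) pairs.
    """
    counts = Counter()
    for start_id, end_id in trips:
        start = station_to_neighborhood.get(start_id)
        end = station_to_neighborhood.get(end_id)
        if start is None or end is None:
            # Assumption: drop trips whose station has no known neighborhood.
            continue
        counts[(start, end)] += 1

    # Fix a stable ordering so rows and columns line up with the labels.
    neighborhoods = sorted({name for pair in counts for name in pair})
    index = {name: i for i, name in enumerate(neighborhoods)}

    matrix = [[0] * len(neighborhoods) for _ in neighborhoods]
    for (start, end), n in counts.items():
        matrix[index[start]][index[end]] = n
    return neighborhoods, matrix


# Hypothetical sample data: IDs and assignments are invented for illustration.
station_to_neighborhood = {1: "Chelsea", 2: "Tribeca", 3: "Williamsburg"}
trips = [(1, 2), (1, 2), (2, 1), (3, 3), (3, 2), (99, 1)]

neighborhoods, matrix = build_trip_matrix(trips, station_to_neighborhood)
for name, row in zip(neighborhoods, matrix):
    print(name, row)
```

A square matrix with one row and one column per neighborhood is the general shape a chord layout expects, so the list of labels and the matrix are returned together to keep their ordering in sync.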

Intended Audience:

We hope that this project will provide a useful model for Citibike, either to replicate or to use when determining which neighborhoods see the most traffic. We also provide simple visualizations showing the breakdown in age and self-identified gender of their subscribers, which could be useful for marketing and advertising. Additionally, our visualizations could prove useful or interesting to NYC commuters and tourists who want to know the most popular biking locations. City planners could take this a step further and use it to decide which areas to prioritize for better bike infrastructure. Finally, we hope that other cities will take note of our visualization and the current success of Citibike and consider implementing their own bike-share programs.


The purpose of our visualization is to look at commuter trends, identify commercial and commuter centers, and increase the visibility of Citibike. We decided on a multi-graphic visualization in order to improve on the visualizations that Citibike already provides. Currently they have a few difficult-to-read bar charts and a map of the station locations. We chose to create an interactive choropleth map and chord diagram, supplemented by two bar charts, in order to convey information that we deemed valuable.
The station map is a simple map of the station locations in lower Manhattan and Brooklyn, divided by neighborhood. Notice that a few stations look like they are in the 'ocean' or in unlabeled white space. This is because those areas do not have neighborhood designations in our data. Because very few stations were located in these areas, we decided to group them with the nearest reasonable neighborhood for the purpose of the chord diagram.
Our initial plan was to map bike routes from station to station, or to create an artistic rendering of the bike trips throughout the day. However, it proved too difficult to estimate each bike's route between two stations with the data that we had. We moved on to a chord diagram because it is an excellent way to visualize bikes moving between neighborhoods. We can easily see the percent of rides from one neighborhood to another. We chose to supplement the chord diagram with the choropleth map to give viewers who are less familiar with NYC and Brooklyn neighborhoods a reference, as well as an idea of how many stations are within each neighborhood.
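To make the "percent of rides" idea concrete, here is a small Python sketch that turns a matrix of trip counts into percentages. It assumes the percentage is taken per origin, so each row sums to 100; that is one reasonable reading, not necessarily the calculation the chord diagram code performs. The function name row_percentages and the sample counts are hypothetical.

```python
def row_percentages(matrix):
    """Convert each row of trip counts into percentages of that row's total."""
    result = []
    for row in matrix:
        total = sum(row)
        # Rows with no outgoing trips stay at zero to avoid dividing by zero.
        result.append([100.0 * value / total if total else 0.0 for value in row])
    return result


# Made-up counts: rows are origins, columns are destinations.
sample = [
    [0, 3, 1],
    [2, 0, 2],
    [0, 0, 0],
]

shares = row_percentages(sample)
for row in shares:
    print(row)
```

Normalizing by the grand total instead of the row total would show each flow's share of all rides, which may be the better choice when comparing neighborhoods against one another.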


We think our visualization is successful at showing an overview of Citibike activity during a given month. It currently displays the data for February 2014 because that month provides a slightly smaller dataset, but it could be configured with data from any other month. A viewer can easily assess the frequency of travel between different neighborhoods as well as the major age and (self-identified) gender trends. If we were to do this project again, we might shift away from neighborhoods and either define our own sections or go by zip code, which would remove the blank areas on our map. Additionally, it would have been neat to overlay a street map on our neighborhood map so viewers could have a more accurate view of the station locations. If we implemented this, we would also want to add zooming.

Data Sources:

Citibike Stations Data, Citibike February Ride Data, Neighborhoods JSON, and Google's Reverse Geocoder

Additionally, a substantial portion of the code used to draw the chord diagram was taken from Mike Bostock. He provides an excellent example, Uber Rides by Neighborhood, here.