In a previous post, I wrote about how I made a data visualization with Python, Cartopy, Matplotlib, and Imageio. Today, I’d like to write about my experience with Tableau, a major data analytics platform. After the break, follow along as I give Tableau a go and make a visualization of red light traffic camera violations in Chicago.
What You See…
Alright, first off, I’d like to apologize for how bad it probably looks on this page. It looks a lot better fullscreen on desktop, though I can’t make any guarantees about how it looks on other platforms or devices. I’ll talk more about layout later.
Second, what you’re seeing is a dashboard: a compilation of the worksheets I made in Tableau, put together in one view. The main visualization is the map, where each dot corresponds to an intersection and its size and color indicate how many violations that intersection saw within the date range. To the right, you can see the 10 intersections that saw the most tickets, as well as a boxplot of total ticket value. If you adjust the start and end dates, all the visualizations (the map, the list, and the boxplot) update. You can also highlight specific intersections and see them highlighted on the map, boxplot, and list (if applicable).
…Is What You Make of It
If you want to get started with Tableau, you can follow in my footsteps and use Tableau Public. With the Tableau Desktop Public Edition app, you can create visualizations, do your own data analysis, save your work to your profile, and share it with the world. If you see yourself getting into data analysis and visualization, I encourage you to give it a try.
Once you have that set up, you need some data to play with. I love Chicago a ton, and the city has a data portal where I snagged the red light camera violation and location data. You can also try Kaggle, a site for data scientists who want to share their work and data.
This is where I hit my first hurdle: cleaning the data. The violations data is pretty complete, but a good number of intersections had no geographic data. That’s why I grabbed the location data too. I used it to fill in the missing latitude and longitude values Tableau needs to plot each intersection on the map (and learned a little more about Excel in the process). Even so, I had to delete seven intersections because neither dataset had any geographic data for them. As I write this, I realize I could have Googled their locations, but I wanted to dive right in, and I may update the visualization in the future. Missing data is never fun, and how you remedy it is entirely situational, but I found this to be a decent solution.
Tableau is neat in how it connects to data. Aside from Excel spreadsheets, it can take CSV files, JSON files, Access databases, PDFs, and networked sources, to name a few.
The very first place I stopped on my Tableau journey was the Data Source tab. My initial impression is that Tableau is pretty drag-and-drop: find the sheet with your data and drag it in. I also noticed some database-style operations, like joins and unions, that you can do with multiple sheets of data. Depending on how your data was entered, you can also make some modifications on the fly. Two things I needed to do were split the Violation Date column (break up the data in a column on a delimiter) to keep only the date, and then parse the result as a date (right-click the new column created by the split, choose “Create Calculated Field…”, and use the DATEPARSE() function). Similarly, I created a column for ticket costs by making a calculated field that multiplies Violations by 100 (the cost of a red light camera ticket, in dollars). With that, I was set to create some visualizations.
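I did this step in Excel, but if you’d rather script it, here’s a minimal Python sketch of the same idea. It looks up missing coordinates in the location table by intersection name and drops anything that still has none. The column names (`INTERSECTION`, `LATITUDE`, `LONGITUDE`), the `fill_coordinates` helper, and the sample rows are all hypothetical. Matching on the exact intersection name is also an assumption, since real names may need some normalizing first.

```python
# Hypothetical field names and made-up sample rows; the real portal exports use their own schema.

def fill_coordinates(violations, locations):
    """Fill missing LATITUDE/LONGITUDE values from a location lookup table.

    Rows that still have no coordinates afterwards are dropped, and their
    intersection names are returned so you can check them by hand.
    """
    # Index the location table by intersection name, skipping entries without coordinates.
    lookup = {
        loc["INTERSECTION"]: (loc["LATITUDE"], loc["LONGITUDE"])
        for loc in locations
        if loc["LATITUDE"] is not None and loc["LONGITUDE"] is not None
    }
    cleaned, dropped = [], []
    for row in violations:
        row = dict(row)  # copy so the caller's data isn't modified
        if row["LATITUDE"] is None or row["LONGITUDE"] is None:
            coords = lookup.get(row["INTERSECTION"])
            if coords is None:
                dropped.append(row["INTERSECTION"])
                continue
            row["LATITUDE"], row["LONGITUDE"] = coords
        cleaned.append(row)
    return cleaned, dropped


violations = [
    {"INTERSECTION": "INTERSECTION A", "LATITUDE": 41.80, "LONGITUDE": -87.60},
    {"INTERSECTION": "INTERSECTION B", "LATITUDE": None, "LONGITUDE": None},
    {"INTERSECTION": "INTERSECTION C", "LATITUDE": None, "LONGITUDE": None},
]
locations = [
    {"INTERSECTION": "INTERSECTION B", "LATITUDE": 41.90, "LONGITUDE": -87.70},
]

cleaned, dropped = fill_coordinates(violations, locations)
print(len(cleaned), dropped)  # 2 ['INTERSECTION C']
```

Returning the dropped names, instead of silently discarding them, makes it easy to go back and look those intersections up later.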
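Outside of Tableau, those calculated fields boil down to a split, a date parse, and a multiplication. Here’s a rough Python analogue. It is not a reproduction of how Tableau’s split or DATEPARSE() behave internally. The `MM/DD/YYYY HH:MM:SS AM` timestamp layout and the helper names are assumptions for illustration.

```python
from datetime import date, datetime

TICKET_COST = 100  # dollars per red light camera ticket


def parse_violation_date(raw: str) -> date:
    """Keep only the date part of a 'date time' string and parse it.

    Assumes a hypothetical 'MM/DD/YYYY HH:MM:SS AM' layout; change the
    format string to match the export you actually have.
    """
    date_part = raw.split(" ", 1)[0]  # split on the first space, keep the first token
    return datetime.strptime(date_part, "%m/%d/%Y").date()


def ticket_value(violations: int) -> int:
    """Calculated-field equivalent: Violations multiplied by the ticket cost."""
    return violations * TICKET_COST


print(parse_violation_date("07/01/2014 12:00:00 AM"))  # 2014-07-01
print(ticket_value(37))  # 3700
```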
Here’s a pretty quick rundown of what I learned playing around in Tableau and Googling what’s what while making the map.
- Data fields are generated from the columns of your data. Each one is assigned a data type (e.g. integer, string, date) and a role (dimension or measure), and it is either continuous or discrete.
- Dimensions contain qualitative values that can be used to categorize, segment, and reveal details in data.
- Measures contain numeric, quantitative values that are measurable and can be aggregated.
- Discrete fields are shown in blue and continuous fields in green.
- You can drag and drop data onto the workspace, and Tableau should figure out what kind of plot you’re going for. If it doesn’t, the Show Me button in the top right lets you change the visualization Tableau generates. For a map-based visualization, drag the Longitude measure onto the Columns shelf and the Latitude measure onto the Rows shelf.
- If you have explicit Latitude and Longitude columns with decimal values, Tableau will automatically assign them geographic roles.
- You can drag measures and dimensions onto Color, Size, Detail, and the other properties on the Marks card. Tableau handles these automatically and adjusts the marks accordingly. You can tweak the colors and sizes later on.
- If you’re using maps, the Map menu gives you plenty of ways to tune the map for your visualization. In my case, I added streets and highways to the default map layers and adjusted the color.
- If you have time-based data, you can create a date range filter with user-set parameters to make the visualization more interactive.
- You can use highlighters to isolate specific points while keeping the context of the rest of the data. Here, I can highlight a specific intersection and compare it to the intersections around it, without hunting for it on the map or switching back and forth between marks.
- You have a lot of control over how everything is laid out as you build the visualization, so make it as pretty and user-friendly as you can, wherever your analyses end up.
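A map where each dot is an intersection, sized and colored by total violations within a date range, roughly amounts to filtering rows by date and summing a measure per dimension value. Here’s a minimal Python sketch of that idea. Tableau does all of this for you. The field names, the `violations_by_intersection` helper, and the rows are made up, and the `start`/`end` arguments stand in for the user-set date parameters.

```python
from collections import defaultdict
from datetime import date


def violations_by_intersection(rows, start, end):
    """Sum the Violations measure per Intersection dimension within [start, end]."""
    totals = defaultdict(int)
    for row in rows:
        # The date range filter: keep only rows inside the inclusive range.
        if start <= row["date"] <= end:
            totals[row["intersection"]] += row["violations"]
    return dict(totals)


rows = [
    {"intersection": "INTERSECTION A", "date": date(2015, 1, 1), "violations": 12},
    {"intersection": "INTERSECTION A", "date": date(2015, 1, 2), "violations": 8},
    {"intersection": "INTERSECTION B", "date": date(2015, 1, 2), "violations": 5},
    {"intersection": "INTERSECTION B", "date": date(2016, 6, 1), "violations": 40},
]

print(violations_by_intersection(rows, date(2015, 1, 1), date(2015, 12, 31)))
# {'INTERSECTION A': 20, 'INTERSECTION B': 5}
```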
I was also able to generate a boxplot and a list. Because all of them have highlighters, users can isolate specific intersections across every visualization simultaneously. For the list, I created a set that keeps only the top 10 earners within the selected period. The date range filter applies to all the sheets I made, which is nice when it all comes together.
Once you’ve made individual visualizations in sheets, you can compile them into a dashboard. Dashboards are where you pull everything together and tell your data story. When you save, your work goes to your Tableau Public account, where others can view it.
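The top 10 set is essentially a sort-and-slice over the per-intersection totals. Here’s a small Python sketch using a hypothetical `top_n` helper and made-up totals. Tableau may handle ties differently than this does.

```python
def top_n(totals, n=10):
    """Return the n intersections with the largest totals, largest first.

    sorted() stays stable even with reverse=True, so ties keep their input order.
    """
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


totals = {"INTERSECTION A": 120, "INTERSECTION B": 300, "INTERSECTION C": 45}
print(top_n(totals, n=2))  # [('INTERSECTION B', 300), ('INTERSECTION A', 120)]
```

If there are fewer than `n` intersections in the date range, the slice simply returns all of them.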
And Here’s What I Made of It
Let’s actually talk about what we can learn about Chicago’s red light cameras. I looked at 183 red light cameras in the dataset (190 in total, had I had time to include the ones I deleted). Together, these cameras generated almost $230 million in fines over the four years of data I have access to. However, the median amount of fines per camera is only around $900,000, and the middle 50% of cameras brought in somewhere between half a million and roughly $1.3 million each.
Probably the most interesting things to look at are the outliers. Eighteen cameras stand out from the rest, generating almost 36% of the total ticket value. For the most part, these outlying cameras are far apart from one another. Something inherent to these intersections and the way traffic moves through them makes them especially good at bringing in violations. Maybe these are older cameras, or maybe there’s something about traffic patterns in these areas. I don’t know what it is, but it’s worth exploring.
Overall, Tableau is definitely a valuable tool in this data-driven world, and it wasn’t too hard for me to pick up. And there’s clearly something going on with Chicago’s red light cameras, especially the ones far outside the interquartile range of the rest. While I have the basics down, I still have a ways to go before I can make the kinds of visualizations that make data look like art. But I’m not deterred. We don’t make mistakes – we make happy little visualizations.
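If you want to check numbers like these outside Tableau, Python’s `statistics` module can compute the total, median, and quartiles. The values below are made up, not the real camera data, and the `summarize` helper is hypothetical. Quartile methods vary a little between tools, so results may not match Tableau’s boxplot to the dollar.

```python
import statistics


def summarize(ticket_values):
    """Total, median, and quartiles of per-camera ticket value."""
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3 (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(ticket_values, n=4)
    return {
        "total": sum(ticket_values),
        "median": statistics.median(ticket_values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # the spread of the middle 50%
    }


values = [10, 20, 30, 40, 50, 60, 70, 80]  # made-up totals, e.g. in thousands of dollars
print(summarize(values))
# {'total': 360, 'median': 45.0, 'q1': 22.5, 'q3': 67.5, 'iqr': 45.0}
```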
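A common way to flag outliers on a boxplot is the 1.5 × IQR rule: anything below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. As far as I know, Tableau’s boxplot whiskers use the same cutoff by default, but its quartile method may differ, so treat this Python sketch as an approximation. It also computes the outliers’ share of the total. The `outliers_and_share` helper and the sample values are made up.

```python
import statistics


def outliers_and_share(ticket_values, k=1.5):
    """Flag values outside the k * IQR fences and return their share of the total."""
    q1, _, q3 = statistics.quantiles(ticket_values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # the fences
    outliers = [v for v in ticket_values if v < low or v > high]
    share = sum(outliers) / sum(ticket_values)
    return outliers, share


values = [10, 20, 30, 40, 50, 60, 70, 80, 500]  # made-up totals
outliers, share = outliers_and_share(values)
print(outliers, round(share, 3))  # [500] 0.581
```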