Posts in the ‘Data Visualization’ Category
I keep hearing about Density Design’s RAW tool and had it filed away in the back of my mind to try out sometime. As usual, necessity is the mother of actually doing something. So, I did something, and now I am a happy camper. What a great tool!
I wanted to show a client how an alluvial diagram would do a good job of expressing some relationships in their data and was not thrilled at the prospect of trying to mock it up in Illustrator. In my search for an easier way, I stumbled across RAW again and had a vector sample in minutes. Brilliant.
Here’s a sample output (not the actual client’s data):
Thanks Density Design! Sad I waited so long.
Why not use all the senses to understand data? It would be interesting to layer in a comparison set of data and hear how they interact.
What we’d like to see next: odor graphics. A market graph that stinks when it sinks.
I came across this visually arresting depiction of gun murders today, thanks to a tip from my friend, Kimball. It was created by the folks at Periscopic, whose mission is to “do good with data”. The animation packs a punch, and when it’s finished, you have a number of options for diving into the data, including getting a sense of the individuals affected.
The arcs in the graphic show how long the murdered person might have lived, trying to give a sense of “stolen years”. I was a bit skeptical when I saw some of the lines showing a life expectancy in the 90s, but reading the notes on methods and sources, I see that each individual line is based on the age distribution of deaths (not the average life expectancy). Meaning that there would logically be a few people that would make it into their 90s (and some that would die at 50).
I know this is a topic on a lot of minds right now. Not sure where I come down on gun control, but I do find that this exploration raises a lot of questions in my mind. Like a good data visualization should.
In an interesting coincidence, the original dataset was researched by Jérôme Cukier, who is helping me with a project right now.
Here’s a nice interactive example of making data friendly to the average human. What country would you like to live in? With the OECD Better Life Initiative, you can pick what you care about (Environment, Work-Life Balance, Health, etc.) and see which countries rise above the rest. Another interactive wonder from the brain of Moritz Stefaner. He’s the same designer who created the Notabilia Wikipedia project that I posted about a while back. Looks like it’s time for me to move to Australia.
Yesterday, Palo Alto Networks launched the data visualization that we created, timed with the release of their Application Usage & Threat Report. It’s a depiction of network traffic collected from 3,000+ organizations. The visualization gives you a sense of the applications that eat up the most bandwidth and represent the greatest risk. There are many ways to slice and filter the data, facilitated by the capabilities of the d3.js library. Many thanks to Jérôme Cukier for his coding expertise to bring the concept to life.
Another piece of the project was to create this related infographic.
I very much enjoyed delving into the world of moving spheres. What is it about us that is drawn toward playing with bubbles? Looking forward to more projects like this.
Came across this fascinating interaction from the New York Times while doing research for a client project. It was interesting just as a static image with a few rollovers, but then I clicked some of the links up top (types of spending, changes) and things started flying.
I like how it invites interaction. The playfulness of the motion may be a little distracting from the data, but I think it does make it more “sticky”. Try clicking back to the “all spending” tab after exploring the others – interesting to see that the individual bubbles don’t exactly fall back into their original places. I guess the budgeting process is messy like that.
Thanks to Jim Vallandingham for the link.
Fast Company magazine recently featured this beautiful rendition of the ocean’s currents on their blog. It was put together by the visualization geniuses at NASA’s Scientific Visualization Studio and can be viewed in a variety of formats (including an iPad app).
It’s worth downloading one of the high resolution video formats from the NASA site.
I recently discovered Data Stories, a podcast devoted to data visualization hosted by Moritz Stefaner and Enrico Bertini. I have been listening daily in the car on my way from here to there and have made it up to Episode #7 – Color. I think I’ve found my tribe.
Hans Rosling is an entertaining and compelling presenter. He uses the now-Google-owned Public Data Explorer technology (developed by his organization Gapminder) to take you on a journey that tests your concept of the developing world.
I recommend watching him in this TED talk delivered to the U.S. Department of State:
I was reading an article in Fast Company this week called The Visible Man by J.J. McCorvey about being black in Silicon Valley. A statement on the first page caught my attention: “…of [Google's] 46,000 employees, just 2%—and just 1% of its technology workforce—are black.” Followed by: “In case you were wondering, blacks make up 13% of the U.S. population.”
Comparing 1 or 2% of the Google workforce to 13% of the U.S. population felt dramatic, but incomplete. I determined to try to get more of the story. Where does the supply chain of black tech workers get pinched? Here’s what I dug up:
I recognize that these are imperfect proxies for the workforce pipeline that leads to Google, but think they are worth exploring nonetheless.
It appears that black high school graduates are well represented in their interest in STEM subjects, but not well prepared: that’s the biggest drop-off (from 13% to 5%). The numbers hold steady between black high school graduates who are prepared and those who get degrees in fields that might prepare them for a job at Google (5% in each case). Then comes the drop from 5% to 2% between grads and Google hires. Clearly, there’s work to be done.
The Asian and Hispanic stories are also interesting. Qualified Hispanic high school grads do not appear to go on to get as many degrees in CS, etc., while the Asian representation in tech grows with each step.
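To make the drop-offs concrete, here is a minimal sketch that computes how much of the black share carries over from one stage to the next, using only the percentages quoted above. The stage labels and the function name are shorthand introduced for this sketch, not the sources' wording.

```python
# Black share (%) at each stage, as quoted in the paragraph above.
# Stage labels are shorthand for this sketch, not the sources' wording.
stages = [
    ("HS grads interested in STEM", 13),
    ("HS grads meeting the math standard", 5),
    ("CS/CE/Information degrees", 5),
    ("Google", 2),
]

def stage_retention(stages):
    """Return (from_label, to_label, ratio) for each consecutive pair of stages."""
    out = []
    for (a_label, a), (b_label, b) in zip(stages, stages[1:]):
        out.append((a_label, b_label, b / a))
    return out

for src, dst, ratio in stage_retention(stages):
    print(f"{src} -> {dst}: {ratio:.0%} of the share carries over")
```

A ratio of 1.0 means the share holds steady between two stages; anything lower marks a pinch point in the pipeline.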
As with many explorations of data, this may lead you to ask more questions than it answers. It did for me. Google is not dissimilar to other tech companies who have shared the same data, but I’d like to see actual employment data for a broader group of tech companies. That data is only just beginning to be shared with the public. Open Diversity Data is a good source. What isn’t public — and isn’t likely to become public — is data on the ethnicity of job applicants. That’s an important piece of the puzzle and one I imagine Google and others concerned with workforce diversity examine internally.
Also interesting to look at the ethnicity of the population in the four counties within a commute-friendly distance of the Googleplex. I know people relocate to work at Google, but the ethnicity of those who live nearby has to be a factor. Looking at the local population of those counties (Santa Clara, San Mateo, San Francisco, Alameda), Asians are perfectly represented at Google at 30% of the population and 30% of the workforce. And whites are far over-represented (36% of the population, 62% of the Google workforce).
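One way to put a number on over- or under-representation is to divide a group's share of the workforce by its share of the local population. A minimal sketch, using only the percentages quoted above; the function name and the ratio itself are introduced here for illustration and do not come from any of the data sources.

```python
# Shares (%) quoted above: population of the four counties near the
# Googleplex, and the Google workforce.
local_population = {"Asian": 30, "White": 36}
google_workforce = {"Asian": 30, "White": 62}

def representation_ratio(workforce_share, population_share):
    """Workforce share divided by population share; 1.0 means proportional."""
    return workforce_share / population_share

for group in local_population:
    ratio = representation_ratio(google_workforce[group], local_population[group])
    print(f"{group}: {ratio:.2f}")
```

A ratio of 1.0 means a group's workforce share matches its share of the local population; values above 1.0 indicate over-representation.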
Some notes on data sources
I used 2013 U.S. workforce ethnicity numbers from the U.S. Bureau of Labor Statistics — slightly different than the total U.S. population that was referenced in the Fast Company article, but more relevant in this case.
Data on high school graduate interest levels in STEM (science, technology, engineering, mathematics) by ethnicity comes from the people that administer the ACT test. The latest data is from 2014 graduates. ACT tracks both expressed and measured interest. The numbers I include are expressed, measured or both. They also track which of the students interested in STEM meet standards in math and science, by ethnicity. Ideally, I’d have numbers for those meeting standards in math or science or both, but since I couldn’t discern where any overlap might be, I chose to use the math standard — it was met at a slightly higher level across the board than science.
Data on graduates with college degrees (bachelors, masters, PhD) in Computer Science, Computer Engineering, or Information comes from the 2012-2013 Taulbee Survey. Information degrees include Information Science, Information Systems, Information Technology, Informatics and related disciplines.
The way ethnicity is recorded is (thankfully) fairly consistent across the different data sources. Although there were some inconsistencies (some included options for “Other” or “race not stated”, and others didn’t, for example), the discrepancies don’t materially affect the numbers that we are examining here.
I’m interested in your comments, here or on Twitter.
Stumbled across this interesting interactive work of Ben Fry’s – a great example of visualizing large amounts of data in a cohesive way. He has visually shown all the changes made by Charles Darwin to his classic On the Origin of Species over six different editions. The book went from 150,000 words to 190,000 by the sixth edition, with some interesting edits along the way (including a significant addition to the closing paragraph).
I think what I like most about it is the clear illustration of how the scientific process can lead to continual learning and refinement of ideas. Keep your mind open. Take a look.
A data-heavy project I’ve been working intensely on the last week or so was released yesterday. It’s a statistical review of corporate governance practices since Sarbanes-Oxley, done by the law firm Fenwick & West.
I enjoyed wrestling with Excel and Illustrator to create histograms, box and whisker plots and a few original creations. And the client was great to work with – detail-oriented, appreciative of good design, understanding of complexity. You can download the full report here: Corporate Governance Practices and Trends
It’s amazing how much more understanding you get out of a well-designed visualization than a spreadsheet of numbers. We went from something like this:
Is it strange that I love graphs so much?
Here’s one person’s list of the “12 Great Visualizations that Made History”. I’m in general agreement, although I think #11 (the gold plaque on the Pioneer 11 spacecraft) can’t count until we hear back from the aliens.
Reading the raw data, or even a well-written description, doesn’t have the same impact on understanding as an effective visualization. Like in this famous image of how to pack a slave ship (#2 on the list):
Several people have pointed me to a well-crafted data visualization by Pitch Interactive showing data for U.S. drone attacks in Pakistan. Like a good visualization should, it answered some questions and sparked a few more. To answer some of the questions that came to me, I put together this graphic that gives another angle on the same data.
The bands of color represent total deaths by victim category. From the notes on Pitch Interactive’s site, it sounds like the victim categories can be a little fuzzy, with the definition of “other” depending on who is doing the defining. The Obama administration calls an able-bodied adult male a military combatant if it has not been proven that they are a civilian – here those are classified as “other”. The death totals are approximations too, as many reports include a range (e.g. “6-10 people were killed in a drone attack…”).
Assuming the primary targets of the drones are the so-called “high-profile” combatants, with other combatants as secondary targets, I calculated percentages comparing all civilian deaths (children + civilians) to non-civilian deaths (high-profile + other) to see how often the drones were on target. Those are the dotted lines. This data is up to date through a few days ago (March 22, 2013). I encourage you to check out the interactive visualization here: drones.pitchinteractive.com
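The percentage calculation can be sketched roughly as follows. This assumes the percentage is the civilian share of all deaths in a period; the graphic may have used a different denominator. The function name and the counts are placeholders for illustration, not figures from the dataset.

```python
def civilian_share(children, civilians, high_profile, other):
    """Fraction of all deaths that fall in the two civilian categories."""
    civilian_total = children + civilians
    total = civilian_total + high_profile + other
    if total == 0:
        return 0.0  # no deaths recorded in this period
    return civilian_total / total

# Placeholder counts for a single period, NOT real figures from the dataset.
share = civilian_share(children=1, civilians=2, high_profile=1, other=6)
print(f"civilian share: {share:.0%}")
```

Because many reports give a range rather than a single count, a fuller version would run this on the low and high ends of each range.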
This week, my sister tipped me off about The Atavist, a new take on multi-layered storytelling via iPhone or iPad apps (also available on Kindle and Nook). Threestory Studio got its name in part because of my interest in telling stories visually, so I was intrigued to see what The Atavist had to offer.
One of the first things I discovered was a rich infographic showing the events leading up to the fall of the regime in Egypt. It combines a timeline of events with web traffic data and social media engagement in Egypt.
It wasn’t immediately clear what the black bars rising from the bottom were – they appear to indicate numbers of people involved in protests or revolutionary activities. Otherwise, this graphic receives high marks.
Just discovered LinkedIn InMaps today. A good example of interactive information graphics that can lead to discovery. Interesting to find the connections that bridge groups. Like the “I didn’t know Tony knew Larry!” moment.
This fact would have been discoverable just browsing through my connections on the standard LinkedIn site, but seeing the whole network mapped in one place removes a lot of barriers to this kind of discovery.
The zoomed-out view shows an accurate picture of my circles – the smaller, clearly defined orange cluster is a networking group I’ve been closely connected to for over a dozen years, and the dense multicolored cluster opposite is my various church connections, with family mixed in. In between are various work and school connections that are scattered and less well-defined.
It’s not hard to create your own. Try it here. I’m curious to see what other people’s networks are shaped like.
Moritz Stefaner is a freelance designer in Europe creating some beautiful and data-rich visualizations. I came across his Notabilia project yesterday, after following a lead from someone at The Leonardo.
It maps the collective editing process for Wikipedia articles up for deletion. Right-leaning red segments are votes to delete; left-leaning green ones are votes to keep. The shape of each branch is an excellent mapping of the shape of the discussion. And the collection of 100 branches makes a lively, energetic whole that begs to be explored.
Projects like this excite me about the power of information design to bring things to light that aren’t easily discernible any other way.
“Things change. Now what?”
That’s the tagline for a new website examining statistical indicators of how life in the U.S. has changed over the last few decades: startlingstats.com. Threestory Studio created all the graphics and designed the site. The aim of the site is to wake people up and promote dialogue about life “as we know it”. I use quotes there because who is “we” and what do “we” really know are all part of the debate. I hope you’ll take a look at the site and leave a comment or two.
Threestory Studio’s second data visualization project with Silicon Valley law firm Fenwick & West was released to the public this week. This report looks at trends in venture-funded deals in the life sciences.
Though not as extensive or complex statistically as the first one (Corporate Governance Practices and Trends), this one posed some interesting challenges in presenting data clearly, accurately and concisely. I’m happy with the results.
Thanks to my watchful nephew Christopher P. for pointing me back to my alma mater to see this nice interactive visualization linking college majors to career choices.
It was put together by Williams College math students and their professor using CIRCOS visualization software. Rolling over the thumbnails allows you to isolate the paths from individual majors to careers. Nice use of colors to organize majors within larger groupings. I majored in Music (Composition) – not sure if I’m represented by the path to arts/entertainment, writing/communication or other. I’ve always preferred to be uncategorizable.
I’m guessing Williams, as a liberal arts school, may have a more evenly distributed set of careers than some other schools. I’d be interested to see how this same distribution looks for other schools of different types and sizes.
Check out this interactive look at the scale of the universe. Puts things in perspective, from a Planck length to the observable universe:
I’m impressed with two things about the World Bank’s approach to data: first, their commitment to openness in sharing data with the world, and second, their devotion to data visualization. They have also done a nice job inviting exploration with the way they have organized their data website.
Interesting to see that the mortality rate for children under 5 in the US is about 37% higher than the average of high-income OECD countries.
And the US spends way more on healthcare than almost any other country in the world. Maybe we’re not spending it on the young.
Note: The World Bank is the source of data for the sample I gave a while back using the Google Public Data Explorer.
I love this stunningly clear, easy-to-use, and information-rich interactive piece that encourages exploration of VC investments. It works well on many dimensions. Looks like it’s a collaboration between Accurat Studio, Ben Willers and Visual.ly, with all the data being drawn from the CrunchBase API, so it should stay as up to date as CrunchBase is.
Some detail of the interaction.
It’s much more interesting to play with it than read about it, so go take a look.
Reading Edward Tufte’s “Envisioning Information”, I came across a simple graphic, originally published in the Chinese mathematics treatise Zhou Bi Suan Jing (or Chou Pei Suan Ching), that impressed me with its simplicity. It’s a visual, geometric way of proving the Pythagorean Theorem that was published around 2,000 years ago. If you compare it with Euclid’s proof, this picture is worth about 500 words.
Any mathematicians out there know that there are many ways to prove the theorem attributed to Pythagoras – I like the simple elegance of this one. Here’s my redrawing of the Chinese original.
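For anyone who wants the algebra behind the picture, here is one common reading of a diagram of this kind. It assumes a right triangle with legs a and b (with b at least as long as a) and hypotenuse c, and a square of side c made up of four copies of the triangle around a small central square of side b − a. The symbols and this particular dissection are assumptions added for illustration; the original diagram carries no algebra.

```latex
% Assumed notation: legs a and b (b >= a), hypotenuse c.
% The square on the hypotenuse is split into four copies of the
% triangle plus a small central square of side (b - a).
\begin{align*}
c^2 &= 4 \cdot \tfrac{1}{2}ab + (b - a)^2 \\
    &= 2ab + b^2 - 2ab + a^2 \\
    &= a^2 + b^2
\end{align*}
```

For a 3-4-5 triangle, that works out to 4 × 6 + 1 = 25.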
With the presidential election fast approaching, interest in the predicted outcome is high. I’m impressed by the detailed data graphics on Nate Silver’s FiveThirtyEight blog for the New York Times, not only for their clarity but for his thoroughness in examining the data.
I guess we’ll see about the accuracy of all these predictions after November 6.
Trust the Olympics to inspire some innovative data visualization. Thanks to friend Peter F. for the tip on a nice series of visualizations from The New York Times that compare today’s winning sprinters, swimmers and jumpers with past medalists.
Interesting to see the steady march forward over the years in swimming and sprinting.
The promised infographic résumé tool that I mentioned a few posts back has launched at Vizualize.me. It’s a customizable infographic interpretation of your LinkedIn profile, to which you can add skills and other experience. Using LinkedIn to populate the infographic gives a jumpstart to the process. Seeing work experience in a timeline makes a lot of sense, though the scale of the education timeline differs from work experience in a way that gives a distorted view. See my full infographic CV:
I was thinking about the visual display of uncertainty today and came across this nice example from a weather site in Norway. It shows the probable range of future temperature and precipitation levels for the city of Oslo. This is a good solution for something I’ve puzzled about for a while: When I hear that there’s a 30% chance of rain, I’m always asking myself “a 30% chance of how much rain?” A 30% chance of a light sprinkle is a much different forecast than a 30% chance of a deluge.
It would be interesting to know what factors go into the variability of the forecast. I imagine that the further out in time the forecast is, the more uncertain it would be, but there are obviously other factors that affect probability as well.
I also like their “detailed meteogram” with an hour-by-hour view of precipitation, temperature, and pressure, enhanced by an elegant indication of wind speed and direction, and topped off with an artful visualization of cloud cover.
Makes me proud to be 1/8 Norwegian.
I recently completed an online class, Data Visualization and Infographics with D3, taught by two great teachers: Alberto Cairo and Scott Murray. I have worked on a few D3 projects from the design side before, but this was my first real foray into doing the code myself. For class exercises, I picked a dataset to work with that I cared about: youth suicides.
Clusters of suicides among young people in our community have, understandably, caused much concern. The school my children attend is highly competitive and full of students motivated to do well. One huge concern in the community is that school pressures are a major contributor to these tragic deaths. This has led to many discussions about homework, high expectations, class schedules, parental pressure, and more, with a strong undercurrent among parents and educators of a desperate need to change something.
The message I get from my son (a junior) is that the school is not the problem and the system shouldn’t be changed as drastically as some propose. He and many classmates feel like the proposed changes diminish the educational experience and are senseless.
All of this made me want to know if there really was an alarming trend here or not. How does our community compare to others? What does the data say?
I was initially relieved to see that our state and county were below the national average. Suicide data is not reported on the city level, so I tried extrapolating from what was available anecdotally for Palo Alto (a collection of publicly known cases). I was relieved again, until I realized I was extrapolating against the population of the whole city instead of the age-specific population that relates to the data. A more accurate estimate suggests that we are definitely on the high side.
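The denominator mistake is easy to see in a small sketch. The function name and every number below are placeholders chosen for illustration; they are not real figures for Palo Alto or any other city.

```python
def rate_per_100k(cases, population):
    """Cases per 100,000 people in the given population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Placeholder numbers for illustration only, NOT real figures for any city.
cases = 3
whole_city = 60_000   # everyone in the city (the wrong denominator)
age_group = 10_000    # only the age range the cases belong to

print(rate_per_100k(cases, whole_city))
print(rate_per_100k(cases, age_group))
```

The same number of cases yields a much higher rate once the denominator is limited to the age group the data actually describes.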
With the small sample size of a single city (or even some of the smaller states), the data gets jittery. Pretty soon you are looking at individual lives – probably helpful if you really want to understand causation, though less helpful for seeing trends. I may need to take a look at three-year moving averages to smooth out some of the jitter and see if that clarifies any trends.
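A moving average of that kind takes only a few lines. This is a minimal sketch of a trailing average; the function name and the annual rates are placeholders for illustration, not real data.

```python
def moving_average(values, window=3):
    """Trailing moving average; returns one value per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Placeholder annual rates per 100,000 (illustrative only, not real data).
annual_rates = [4.0, 10.0, 1.0, 7.0, 4.0]
print(moving_average(annual_rates))  # [5.0, 6.0, 4.0]
```

The smoothed series is shorter than the input by two years, since the first value needs three years of data behind it. That is the price of trading jitter for a clearer trend line.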
My son saw me working on this and encouraged me to pull in the comparisons to national and county data for a clearer picture. When he saw the graph, he said “You have to share this!” – the power of accurate data displayed clearly.
Part of the challenge for the community discussion here is that one suicide is too many, so talking about comparative data can feel cold and dehumanizing. What wouldn’t we do to save even one life? The potential problem comes when you change whole systems based on a handful of tragic cases, and then later realize that you damaged the system and didn’t solve the problem you thought you were solving. I hear echoes of this challenge in what I have been reading in Daniel Kahneman’s book Thinking, Fast and Slow regarding loss aversion and the way humans respond to risk.
As with many things, it’s complicated.
This has been a valuable, if painful, discussion in our community, causing us to examine what we really value and how that gets reflected in our education system. I hope some clear data can contribute positively to the conversation.
First impression: it’s a whole lot more windy over the oceans than it is over land. Intuitively obvious (even to the casual observer, as an old friend used to say), but the visualization drives the point home instantly. Now, what happens if we combine global wind data with ocean current data? It would be interesting to see how they interact.
Thanks to Andy Kirk at Visualising Data for pointing this out.
Looks like you may soon be able to create a visual version of your resume in “one click” with the help of vizualize.me. Resumes are certainly fertile ground for visual rethinking, and what job applicant doesn’t want their resume to stand out from the pack?
We’ll see how much customizing is possible once they launch. With the diversity of individual experiences and the differences among job opportunities, it seems like customized options are a must; I know I wouldn’t send the same resume to two different potential employers. If this catches on, it may make it easier for employers to compare resumes, but that would lead us back to people wanting to differentiate. Maybe that’s where vizualize.me starts up-charging for higher levels of customization. Sounds a little like Sylvester McMonkey McBean and the racket he pulled off on the Sneetches. Are there stars upon thars?