Expressing certainty in a probabilistic world
Visualization makes things concrete. It helps us to see patterns in data in a direct, comprehensible way while conveying certainty and precision. The sharp edges of lines and bar charts are something we take for granted. The values represented in the chart below are exact. When you see a chart like this, there is no question that the amount on the left is 16 and the amount on the right is 24.
This is part of what makes the bar chart an enduring fixture in data visualization. Once you get it, it just makes sense. We intuitively understand that a bigger pile of something has more stuff than a smaller pile. But data science doesn’t tell us with certainty that there are 24 things in that pile. Instead it says things like, “there are most likely 20 to 28 things in the pile.” The average number of things from our calculations is 24, but the real number could easily be a little more or a little less.
A probabilistic estimate is by definition less precise than an exact count. So how do you represent probability visually in a way that gives an intuitive sense of the uncertainty? There are some traditional chart formats that attempt this. The box and whiskers plot is an example familiar to statisticians. It shows at once the median value in a set of samples, where the middle half of the values land (the box), and the highest and lowest values in the set (the whiskers).
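To make the anatomy of that chart concrete, here is a minimal Python sketch of the five numbers a box and whiskers plot draws. The function name and the sample values are made up for illustration, and the whiskers are taken as the plain minimum and maximum; some charting tools place the whiskers differently and plot outliers separately.

```python
from statistics import quantiles


def five_number_summary(samples):
    """Return the values a simple box and whiskers plot draws.

    Hypothetical helper for illustration: the whiskers here are the plain
    minimum and maximum of the samples.
    """
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = quantiles(samples, n=4, method="inclusive")
    return {
        "min": min(samples),   # lower whisker
        "q1": q1,              # bottom edge of the box
        "median": median,      # line inside the box
        "q3": q3,              # top edge of the box
        "max": max(samples),   # upper whisker
    }


# Made-up sample values, not data from a real model.
summary = five_number_summary([16, 18, 20, 22, 24, 26, 28, 30, 32])
print(summary)
```

Even in this small form the summary carries five numbers for every bar on the chart.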
The box and whiskers data visualization communicates a lot, but for someone who isn’t thinking about their problem like a statistician, it over-communicates. It tells you things about quartiles and min/max values that may not actually help you solve your problem. As designers at Noodle, we want to bring people closer to the data science, but in a way that enables them to still think like a business person, not a statistician.
We recently designed an app for warehouse foremen that generates a forecast of product volume and uses it to recommend the headcount needed to handle that volume on a given day. Because our models tell us only with a certain probability how many workers will be needed in the warehouse on a given day, we decided to riff on the tried-and-true bar chart. Instead of the certainty of a sharp edge, we introduced a stepped gradient to show the range within which most of the values from our model fall. To keep the chart easy to read at a glance, we added a dot and labeled the average value. That way a person can scan the chart quickly and read the recommended number, but can also see the range of possibilities around it. For example, if our model has a higher degree of uncertainty on an upcoming day, the best decision may be to staff higher than the model’s recommendation (the average).
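As a rough illustration of the data behind such a chart, the Python sketch below turns a set of model outputs for one day into the labeled average (the dot) and a few nested ranges (the steps of the gradient). It is not the code behind the app: the function names, the 50/80/95 percent coverage levels, and the sample headcounts are all assumptions made for the example.

```python
from statistics import mean


def percentile(sorted_vals, p):
    """Linearly interpolated percentile of pre-sorted values, p in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def gradient_bands(samples, coverages=(50, 80, 95)):
    """Return the average plus one central range per gradient step.

    The coverage levels are an assumption for illustration; the article
    does not say which ranges the stepped gradient uses.
    """
    vals = sorted(samples)
    bands = []
    # Widest range first, so narrower (darker) steps can be drawn on top.
    for coverage in sorted(coverages, reverse=True):
        tail = (100 - coverage) / 2
        bands.append({
            "coverage": coverage,
            "low": percentile(vals, tail),
            "high": percentile(vals, 100 - tail),
        })
    return {"mean": mean(vals), "bands": bands}


# Made-up headcount outputs for a single day.
day = gradient_bands([20, 21, 22, 23, 24, 25, 26, 27, 28])
print(day["mean"])
for band in day["bands"]:
    print(band)
```

On a day when the widest range stretches well above the average, its upper edge gives a foreman a concrete number to consider when deciding whether to staff above the recommendation.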
At Noodle we try to bring people closer to the data science by clearly expressing the most relevant information on an initial read, while still exposing the complexity of the modeling in order to give our users confidence in the output. Starting with familiar chart types lets us lean on the built-in expectations people have and makes for more intuitive user experiences. By remixing those charts slightly and adding visual features that make the uncertainty clear, we hope to give people in the business world the benefits of advanced machine learning without requiring an advanced degree in mathematics.