Is the Answer to Data Bias More Data?
This will come as no surprise, but human beings aren’t very rational. Ingrained cognitive biases like recency bias, confirmation bias and the bandwagon effect can create reasoning errors that prevent us from seeing the world with complete clarity. And when we make decisions based on faulty vision, they’re understandably sometimes wrong. In business, that can be costly.
The Problem of Human Bias
The book that brought human cognitive bias and behavioral economics to a wide audience is still “Thinking, Fast and Slow,” by Daniel Kahneman, which draws on his decades of research with Amos Tversky. It called attention to the irrational nature of the human brain even when it tries to make a rational decision.
According to Kahneman, who won the Nobel Memorial Prize in Economic Sciences in 2002 for that work, we’re all unduly influenced by things like how similar a situation is to one we’ve already seen, or how much simpler one option is than another. Our brains are lazy, and they don’t want to do the work if they don’t have to. They’ll choose the simpler option over and over, even if it’s wrong. Especially when we’re trying to make a quick decision, we often rely on heuristics, or shortcuts, rather than take the time to think things through.
So, Is Data the Cure for Human Bias?
The discovery of human bias gave rise to an entire industry of data-driven decision-making. Most companies now employ data scientists, quants, or number crunchers, who have been hired to take massive amounts of data and use it to make better decisions. The solution, they say, is the algorithm.
Not so fast.
We’re coming to grips now with the reality that algorithms can themselves be biased if the data fed to them contains bias and that bias is not corrected in real time. This happens when certain fields of data are used (even inadvertently) as proxies, the way zip code can be a proxy for race, or the way student test scores can stand in for teacher quality in hiring and firing decisions. If systems are trained using data that is biased by humans, the systems will consistently replicate those errors and reflect those biases.
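One rough way to spot a proxy is to ask how well a field predicts a protected attribute all by itself. The sketch below is a minimal illustration, not a complete fairness audit: the function name `proxy_strength`, the field names, and the toy records are all hypothetical and invented for this example. It compares the accuracy of guessing the protected attribute from the field alone against the baseline of always guessing the most common value.

```python
from collections import Counter, defaultdict

def proxy_strength(records, feature, protected):
    """Return (baseline, accuracy) for guessing `protected` from `feature` alone."""
    # Baseline: always guess the most common protected-attribute value.
    overall = Counter(r[protected] for r in records)
    baseline = overall.most_common(1)[0][1] / len(records)

    # For each feature value, guess the protected value seen most often with it.
    by_value = defaultdict(Counter)
    for r in records:
        by_value[r[feature]][r[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / len(records)

# Hypothetical toy records, invented purely for illustration.
records = [
    {"zip_code": "10001", "group": "A"},
    {"zip_code": "10001", "group": "A"},
    {"zip_code": "10001", "group": "B"},
    {"zip_code": "20002", "group": "B"},
    {"zip_code": "20002", "group": "B"},
    {"zip_code": "20002", "group": "B"},
    {"zip_code": "30003", "group": "A"},
    {"zip_code": "30003", "group": "A"},
]

baseline, accuracy = proxy_strength(records, "zip_code", "group")
print(f"baseline={baseline:.3f} accuracy_from_zip={accuracy:.3f}")
```

If the second number is far above the first, the field is carrying information about the protected attribute, and dropping the protected attribute from the model will not be enough to keep it out of the predictions.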
And, quite frankly, there are some positives to human bias. We do learn from experience, and when we have to make quick decisions, it’s good to rely on previous experience as a shortcut. In a world where the number of details is constantly overwhelming, heuristics make life much easier and cover many common situations adequately.
So, what’s the answer?
Data ethics must move from the back burner to the front burner, and the focus must be on human-centered design. Human-centered design takes into account that presenting the same information in different ways leads people to make different decisions, because bias is built into the very presentation of information.
We also have to understand that once we have designed and deployed an algorithm, data ethics demands that we actively continue gathering information about our prediction errors and looping that information back into the system, so it can learn not to make those same errors again. To correct for data bias, each team on an AI project should have a data scientist, an engineer, and an industry specialist. Mixed teams are more likely to catch errors in assumptions as well as pure data modeling errors.
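To make the idea of looping errors back into the system concrete, here is a deliberately small sketch. It assumes a numeric prediction and a simple additive correction based on the average of recent errors; the class name `FeedbackCorrectedModel` and the toy base model are hypothetical. A production system would more likely retrain on the logged outcomes, but the shape of the loop is the same: predict, observe what actually happened, record the error, and use it next time.

```python
from collections import deque

class FeedbackCorrectedModel:
    """Wraps a base predictor and corrects it with the mean of recent errors."""

    def __init__(self, base_predict, window=50):
        self.base_predict = base_predict
        # Most recent (actual - base prediction) values; old ones fall off.
        self.errors = deque(maxlen=window)

    def predict(self, x):
        correction = sum(self.errors) / len(self.errors) if self.errors else 0.0
        return self.base_predict(x) + correction

    def record_outcome(self, x, actual):
        # Log the error of the uncorrected base model, so the correction
        # tracks the base model's systematic bias.
        self.errors.append(actual - self.base_predict(x))

# Hypothetical base model that systematically under-predicts by 2.
model = FeedbackCorrectedModel(lambda x: 2 * x - 2)
before = model.predict(10)

# Outcomes arrive later; in this toy example the true relationship is y = 2x.
for x in range(1, 6):
    model.record_outcome(x, actual=2 * x)
after = model.predict(10)

print(before, after)
```

The fixed-size window is a design choice: it lets the correction adapt if the bias drifts, at the cost of reacting to noise when the window is small.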
Biased outcomes are less likely in an industrial setting, where the data often comes from machines. But in industries such as fintech, consumers can be denied loans, mortgages, and insurance by algorithms trained on data that reflects unconscious bias.
Jenn Gamble, Senior Data Scientist at Noodle.ai, advises anyone who is thinking about AI to ask themselves seven questions:
- What are the most important decisions we’re making?
- What data are we using to make them?
- What actually affects the outcome, and is that being included?
- Are we measuring the right things?
- What are the consequences of false positives and false negatives?
- Will every prediction made have a continuous feedback loop that feeds the data back into the system to correct its errors?
- Given what we are optimizing for, what incentives could this create? Could this cause unintended consequences?
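The question about false positives and false negatives becomes easier to answer once the two error rates are measured separately for each group of people affected. The following is a minimal sketch under stated assumptions: the function name `error_rates_by_group`, the group labels, and the loan-style rows are hypothetical and made up for illustration. Here a "positive" prediction means the applicant is approved, so a false negative is a creditworthy applicant who was turned down.

```python
from collections import defaultdict

def error_rates_by_group(rows):
    """rows: iterable of (group, predicted, actual), with boolean predicted/actual.

    Returns {group: {"fpr": ..., "fnr": ...}}; a rate is None when undefined.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "pos": 0, "neg": 0})
    for group, predicted, actual in rows:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # would have repaid, but was denied
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # was approved, but did not repay
    return {
        g: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else None,
            "fnr": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }

# Hypothetical toy outcomes: (group, approved?, actually repaid?)
rows = [
    ("A", True, True), ("A", True, True), ("A", False, True),
    ("A", True, False), ("A", False, False),
    ("B", False, True), ("B", False, True), ("B", True, True),
    ("B", False, False), ("B", False, False),
]

rates = error_rates_by_group(rows)
for group, r in sorted(rates.items()):
    print(group, r)
```

In this invented data, group B is wrongly denied twice as often as group A, a gap that an overall accuracy figure would hide.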
Since AI is rolling out in more and more business applications, we have to be aware that it can hurt as well as help in some of them, because artificial intelligence isn’t as “artificial” as we formerly thought. It’s a prediction model, crafted by humans, based on past performance. It’s not just math; it’s math interpreted by humans. And it makes assumptions.
If you are considering a vendor for artificial intelligence solutions, make sure that vendor has a point of view toward data error. If anyone tells you data will not be biased, walk the other way. We can’t totally solve for human cognitive bias, and we probably never will, but we can create continuous learning machines that get better over time, rather than replicating and reinforcing human bias.