Can computers be racist? Big data, inequality, and discrimination

It seems like everyone is talking about the power of big data and how it is helping companies, governments, and organizations make better and more efficient decisions. But rarely do they mention that big data can actually perpetuate and exacerbate existing systems of racism, discrimination, and inequality.

Big data is supposed to make life better. Companies like Netflix use it to recommend movies you might like to watch based on what you’ve previously streamed. There are also broader public applications, such as predicting (and thus more quickly responding to) outbreaks of disease based on online search patterns of symptoms.

The problem with big data is that its application and use are not impartial or unbiased. Harvard professor Latanya Sweeney, who also directs the university’s Data Privacy Lab, conducted a cross-country study of 120,000 Internet search ads and found a repeated pattern of racial bias. Specifically, her study looked at Google AdWords buys made by companies that provide criminal background checks. The study found that when a search was performed on a name that was “racially associated” with the black community, the results were much more likely to be accompanied by an ad suggesting that the person had a criminal record—regardless of whether or not they did (see video below). This is just one of many research studies showing similar bias.

If an employer searched the name of a prospective hire, only to be confronted with ads suggesting that the person had a prior arrest, you can imagine how that could affect the applicant’s career prospects.



So while we’re led to believe that data doesn’t lie—and therefore, that algorithms that analyze the data can’t be prejudiced—that isn’t always true. The prejudice does not necessarily originate in the algorithm itself; rather, it lies in the models used to process massive amounts of available data and in the adaptive nature of the algorithm. As an adaptive algorithm is used, it can learn the societal biases it observes.
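To make that mechanism concrete, here is a hypothetical, heavily simplified sketch of an adaptive ad server that chooses between two ad templates based only on observed clicks. Everything below—the ad labels, click rates, and learning rule—is invented for illustration; it is not a description of Google’s actual system. Note that the code contains no notion of race at all: the bias enters entirely through the (simulated) user behavior the algorithm learns from.

```python
import random

random.seed(0)

# Two hypothetical ad templates a background-check company might buy.
ads = ["neutral", "arrest-record"]

def run(click_bias, rounds=5000):
    """Simulate an epsilon-greedy ad server for one name group.

    click_bias is the probability that users click the "arrest-record"
    ad when it is shown for this group; the "neutral" ad is clicked
    5% of the time regardless of group.
    """
    clicks = {ad: 1.0 for ad in ads}   # smoothed click counts
    shows = {ad: 2.0 for ad in ads}    # smoothed impression counts
    for _ in range(rounds):
        # Mostly show the ad with the best observed click rate;
        # occasionally explore at random.
        if random.random() < 0.1:
            ad = random.choice(ads)
        else:
            ad = max(ads, key=lambda a: clicks[a] / shows[a])
        shows[ad] += 1
        p_click = click_bias if ad == "arrest-record" else 0.05
        if random.random() < p_click:
            clicks[ad] += 1
    total = shows["neutral"] + shows["arrest-record"]
    return shows["arrest-record"] / total

# Same algorithm, two audiences: one whose users click "arrest-record"
# ads more often for a given name group, one whose users do not.
share_biased = run(click_bias=0.10)
share_neutral = run(click_bias=0.02)
print(f"arrest-record ad share, biased clicks:  {share_biased:.0%}")
print(f"arrest-record ad share, neutral clicks: {share_neutral:.0%}")
```

The identical, race-blind code ends up showing the stigmatizing ad far more often for one group than the other, simply because it optimizes for the clicks it sees.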

As Professor Alvaro Bedoya, executive director of the Center on Privacy and Technology at Georgetown University, explains, “any algorithm worth its salt” will learn from the external process of bias or discriminatory behavior. To illustrate this, Professor Bedoya points to a hypothetical recruitment program that uses an algorithm written to help companies screen potential hires. If the hiring managers using the program only select younger applicants, the algorithm will learn to screen out older applicants the next time around.
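Bedoya’s hypothetical can be sketched in a few lines. The applicants, scores, and single-threshold learner below are all invented for illustration—a stand-in for any adaptive model—and the key point is that the bias lives in the training labels (the managers’ past decisions), not in the code:

```python
# Past applicants: (age, skill score 0-100, hired by the managers?).
# The managers hired only younger applicants, regardless of skill.
history = [
    (24, 55, True), (27, 82, True), (29, 48, True), (31, 70, True),
    (45, 85, False), (52, 60, False), (48, 78, False), (58, 95, False),
]

def train(data):
    """Fit the single-feature threshold rule that best reproduces the
    past decisions -- a toy stand-in for any accuracy-maximizing learner."""
    best = None
    for feat in (0, 1):  # 0 = age, 1 = skill
        vals = sorted({row[feat] for row in data})
        for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            for below_is_hire in (True, False):
                acc = sum(
                    ((row[feat] < t) == below_is_hire) == row[2]
                    for row in data
                )
                if best is None or acc > best[0]:
                    best = (acc, feat, t, below_is_hire)
    return best[1:]  # (feature index, threshold, direction)

feat, t, below = train(history)

def screen(age, skill):
    """Recommend whether to advance an applicant, using the learned rule."""
    value = (age, skill)[feat]
    return (value < t) == below

print("feature the model keyed on:", "age" if feat == 0 else "skill")
print("50-year-old, skill 96:", screen(50, 96))
print("26-year-old, skill 50:", screen(26, 50))
```

Because age separates the managers’ past decisions perfectly and skill does not, the learner latches onto age: a highly qualified 50-year-old is screened out while a less qualified 26-year-old advances. The model faithfully reproduced the discrimination it was shown.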



As mathematician Cathy O’Neil said at the Personal Democracy Forum earlier this year, “many of these things are truly good intentions gone awry.” While algorithms and data have great potential to help move us toward a more just world, they are too often moving us in the opposite direction.

There is no easy fix. Instead, a broad coalition of civil society organizations must push for change in a number of directions at the same time. Sweeney and Bedoya outline a number of strategies, including:

  • Investing in the technical capacity of public interest lawyers, and developing a greater cohort of public interest technologists. With more engineers participating in policy debates and more policymakers who understand algorithms and big data, both government and civil society organizations will be stronger.
  • Pressing for “algorithmic transparency.” By ensuring that the algorithms underpinning critical systems like public education and criminal justice are open and transparent, we can better understand their biases and fight for change.
  • Exploring effective regulation of personal data. Current laws and regulations are outdated and provide relatively little guidance on how our data is used in the technologies we rely on every day. We can do better.

We often hear the Internet spoken about as a “great equalizer.” But while it certainly has the potential to transform governance and connect communities, it can also perpetuate inequality. As Professor Bedoya argues, “Across the board, vulnerable communities, the unpopular, the weak, lose when powerful entities decide what is and isn’t okay about their data.” Understanding the biases inherent in data and digital spaces makes it possible for us to push back, and to shape an Internet that reflects our ideals.

This post is based on a series of presentations given at the Ford Foundation, led by Latanya Sweeney from the Data Privacy Lab at Harvard University and Alvaro Bedoya from the Center on Privacy and Technology at Georgetown University. The presentations in their entirety can be viewed below.