We teach data science and programming skills to thousands of members of the Northwestern community. In these short workshops, we're often unable to address how the skills we're learning relate to questions of social justice, both at Northwestern and in the world more broadly. Yet without considering these issues as we put those skills into practice, we risk doing harm and perpetuating biases we would not actively support. Below are some of the resources I've found helpful in expanding my knowledge and exploring ways our Northwestern community can do better.
- Coded Bias: A documentary on how racial and gender bias in artificial intelligence systems and algorithms harms communities and threatens civil rights and democracy. The film features the founder and work of the Algorithmic Justice League, which works to raise public awareness of these issues, educate policymakers, and give a voice to those affected by AI algorithms. See AJL's Library page for a great list of additional resources.
- We All Count: Resources for identifying, understanding, and mitigating bias in data science processes. The project focuses on bringing non-Western perspectives to data collection and analysis. Their resource list is especially good for diverse perspectives on research methodology for social sciences.
- Machine Bias, by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, ProPublica: An investigation of racial bias in the COMPAS risk scores used to predict recidivism in the criminal justice system.
- Race After Technology, by Ruha Benjamin: Discusses the ways in which a human history of racial bias and racism is encoded into technological processes and products.
- Reading List for Fairness in AI Topics, by Catherine Yeo: I found this list when looking for new developments in detecting and addressing racial and gender bias in word embeddings, but it covers additional topics as well.
- Data Science as Political Action: Grounding Data Science in a Politics of Justice, by Ben Green: Why data science is inherently political, and options for reforming practices to address this reality.
- Mapping for accessibility: A case study of ethics in data science for social good, by Anissa Tanweer, et al.: A thoughtful look at how efforts to do “social good” require engagement with constituent communities and active examination of the ethical issues involved in data science projects.
- How to make a racist AI without really trying, by Robyn Speer: A good example of what can happen when techniques from tutorials and workshops are applied without further reflection and engagement.
- Data Ethics syllabus, from Rachel Thomas, covers disinformation, bias, privacy, and algorithmic colonialism, among other topics. The syllabus includes many readings not on other resource lists I’ve seen.
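To make the word-embedding issue mentioned above concrete, here is a minimal sketch of the kind of association test used to detect gender bias in embeddings. The vectors below are tiny, hand-constructed illustrations, not real embeddings; with actual pretrained vectors (e.g., word2vec or GloVe), the same comparison is what surfaces stereotyped associations.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-made "embeddings" (NOT real word vectors), constructed so that
# the first dimension loosely encodes gender, mimicking the bias pattern
# found in embeddings trained on real text.
emb = {
    "he":       np.array([ 1.0, 0.2, 0.1]),
    "she":      np.array([-1.0, 0.2, 0.1]),
    "engineer": np.array([ 0.8, 0.9, 0.3]),
    "nurse":    np.array([-0.7, 0.9, 0.3]),
}

def gender_association(word):
    """Positive means closer to 'he'; negative means closer to 'she'."""
    return cosine(emb[word], emb["he"]) - cosine(emb[word], emb["she"])

for w in ("engineer", "nurse"):
    print(w, round(gender_association(w), 3))
```

In real embeddings trained on web text, occupation words show exactly this kind of skew, which then leaks into downstream systems like the sentiment example in Robyn Speer's post.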
As a data scientist, another way I learn is by working with data directly. These data sources have helped me explore issues of racial bias raised by incidents of police brutality and the COVID-19 pandemic.
- "Why Statistics Don't Capture The Full Extent Of The Systemic Bias In Policing," by Laura Bronner, FiveThirtyEight, June 25, 2020.
- Data USA: Aggregates several sources of information necessary for putting the Evanston data in context.
- The Stanford Open Policing Project: Collects and standardizes data on traffic stops from across the United States. The site also includes their research and analysis.
- The COVID Racial Data Tracker, for state-level data on COVID-19 cases by race/ethnicity; county-level COVID-19 case and death data from USA Facts; and county-level demographic data from the US Census. The APM Research Lab provides a useful guide to what data exists on racial differences in COVID-19 cases and deaths, as well as issues that affect the analysis of this data.
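As a starting point for working with sources like these, here is a sketch of the basic comparison the Open Policing analyses make: stop counts and search rates by race, set against population shares. The records and population figures below are synthetic placeholders, though the column names (`subject_race`, `search_conducted`) follow the project's standardized format.

```python
import pandas as pd

# Synthetic example records (NOT real Open Policing data); the project's
# standardized files include columns like subject_race and search_conducted.
stops = pd.DataFrame({
    "subject_race":     ["white", "white", "black", "black", "black", "hispanic"],
    "search_conducted": [False,   True,    True,    True,    False,   False],
})

# Hypothetical population shares for the same jurisdiction (e.g., from
# Census data), used to put raw stop counts in context.
population_share = pd.Series({"white": 0.60, "black": 0.25, "hispanic": 0.15})

summary = (
    stops.groupby("subject_race")
    .agg(stops=("subject_race", "size"),          # number of stops per group
         search_rate=("search_conducted", "mean"))  # share of stops with a search
)
summary["stop_share"] = summary["stops"] / summary["stops"].sum()
# Ratio > 1 means a group is stopped more often than its population share
# alone would suggest.
summary["stop_vs_pop"] = summary["stop_share"] / population_share

print(summary)
```

Disparity ratios like `stop_vs_pop` are only a first step; the FiveThirtyEight piece above explains why such statistics understate systemic bias, since who gets stopped at all is itself shaped by biased decisions.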