Last Updated: February 2024
The implementation of data science algorithms may result in a variety of undesirable and unexpected social harms. Unfair algorithms are one area of concern, but larger questions of social justice, environmental damage, identity, and representation arise in a wide range of data analysis projects. Without considering these issues as we put our data science skills into practice, we risk doing harm and perpetuating biases we would not actively support. Below are some of the resources we have found helpful in expanding knowledge of these issues and exploring ways we can do better. This guide includes both general educational resources and a few technical guides on addressing these issues in your analysis.
If you are new to the topic
- Weapons of Math Destruction, by Cathy O’Neil. An excellent introduction to how algorithms can unexpectedly cause social harms. It contains many case examples and is highly readable. Strongly recommended if you are new to the field or you are unsure of how “math” can be harmful.
General Resources
- Machine Bias, by Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner (ProPublica, May 23, 2016): An investigation of racial bias in risk-assessment software used in the criminal justice system.
- Coded Bias: A documentary on how racial and gender bias in artificial intelligence systems and algorithms harms communities and threatens civil rights and democracy. The film features the founder and work of the Algorithmic Justice League, which works to raise public awareness of these issues, educate policymakers, and give a voice to those affected by AI algorithms. See AJL’s Library page for a great list of additional resources.
- How to make a racist AI without really trying, by Robyn Speer. A good example of what can happen when applying techniques from tutorials and workshops without further reflection and engagement (a brief illustrative sketch follows this list).
- Data Ethics syllabus, from Rachel Thomas. A crowdsourced list of materials to consider covering in tech ethics courses.
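To make Speer’s point concrete, here is a minimal sketch, our own construction rather than her code, of how a sentiment score built on off-the-shelf word embeddings can treat neutral personal names differently. The embedding model, word lists, and names below are assumptions chosen for illustration.

```python
# A minimal, illustrative sketch (not Speer's code): train a toy sentiment
# classifier on pre-trained GloVe vectors and then score personal names with it.
# Assumptions: gensim's "glove-wiki-gigaword-50" model and hand-picked word lists.
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

glove = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

# Tiny training lexicons; a real tutorial would use much larger sentiment lists.
positive = ["good", "great", "excellent", "happy", "love", "wonderful", "best", "nice"]
negative = ["bad", "terrible", "awful", "sad", "hate", "horrible", "worst", "nasty"]

X = np.array([glove[w] for w in positive + negative])
y = np.array([1] * len(positive) + [0] * len(negative))
clf = LogisticRegression(max_iter=1000).fit(X, y)

def sentiment(word):
    """Probability the toy classifier assigns to a single word being 'positive'."""
    return clf.predict_proba(glove[word].reshape(1, -1))[0, 1]

# Personal names carry no inherent sentiment, yet embedding-based scores often
# differ systematically across names associated with different demographic groups.
for name in ["emily", "greg", "lakisha", "jamal"]:
    if name in glove:
        print(name, round(sentiment(name), 3))
    else:
        print(name, "not in the embedding vocabulary")
```

Even in a toy setup like this, score gaps across names are a reminder that models inherit the biases of the text they were trained on.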
Books
- Data Feminism, by D’Ignazio and Klein. An excellent introduction to general principles behind harms in data collection and data science, and how to start thinking about them. It highlights epistemic, institutional, and technical challenges. Great for social scientists and those who want to engage critically with data ethics in general.
- Atlas of AI, by Kate Crawford. This book tackles broader problems arising from artificial intelligence practice, including, importantly, its material costs and environmental damage, as well as the historical biases that influence many of the mathematical models we use today.
- (Technical) Fairness and Machine Learning, by Barocas et al. This book focuses on the problem of algorithmic fairness, providing an overview of algorithms, theory, and hands-on examples for practitioners. It is a good starting point for researchers interested in fairness as a subdiscipline of data ethics (a small example of one basic fairness check appears after this list).
- The Costs of Connection, by Couldry and Mejías. A comprehensive study of the relationship between data practices and colonialism, in both historical and structural terms. For a much shorter introduction to the concept of data colonialism, you can read their short paper instead.
- Race After Technology, by Ruha Benjamin. Discusses the ways in which the history of racial bias and racism is encoded into technological processes and products.
- How Data Happened, by Wiggins and Jones. A historical overview of the origins of statistical and computational practices in relation to social control and discrimination. It contains many fun anecdotes from the history of data-related fields. The library provides access to the audiobook.
- The Rise of Big Data Policing, by Andrew G. Ferguson. A great study of data collection and algorithmic practices within police institutions. Its main case example is the city of Chicago, but the general principles apply broadly.
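For readers who want a concrete sense of what the fairness literature above formalizes, here is a minimal sketch of one basic check, demographic parity, which compares a model’s positive-prediction rates across groups. The data are synthetic and purely illustrative.

```python
# A minimal sketch of a demographic-parity check on synthetic data:
# compare the rate of positive predictions a model gives to each group.
import numpy as np

rng = np.random.default_rng(0)

# Toy model outputs (1 = favorable decision) and group membership labels.
y_pred = rng.integers(0, 2, size=1000)
group = rng.choice(["A", "B"], size=1000)

def positive_rate(pred, groups, g):
    """Share of favorable decisions received by members of group g."""
    return pred[groups == g].mean()

rate_a = positive_rate(y_pred, group, "A")
rate_b = positive_rate(y_pred, group, "B")

# A gap of 0 means both groups receive favorable decisions at the same rate.
print("P(pred=1 | A) =", round(rate_a, 3))
print("P(pred=1 | B) =", round(rate_b, 3))
print("parity gap    =", round(rate_a - rate_b, 3))
```

Demographic parity is only one of several competing criteria (equalized odds and calibration are others), and the books and surveys listed here discuss the trade-offs between them.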
Papers
- Anatomy of an AI system, by Kate Crawford. A classic and indispensable study of the creation pipeline of the Amazon Echo. It takes a materialist approach, dealing with labor exploitation and environmental damage.
- (Technical) Fair Risk Algorithms, by Berk, Kuchibhotla, and Tchetgen Tchetgen. A recent survey of fairness algorithms by leaders in the field. Suitable for those who prefer a more statistical approach, but limited to fairness in risk prediction.
- Datafication of Health, by Ruckenstein and Schüll. An excellent introduction to the concept of datafication and how it affects people’s lives. While the approach is anthropological, it is a highly recommended resource for anyone dealing with data.
- Data Science as Political Action: Grounding Data Science in a Politics of Justice, by Ben Green. Why data science is inherently political, and options for reforming practices to address this reality.
- Mapping for accessibility: A case study of ethics in data science for social good, by Anissa Tanweer et al. A thoughtful look at how efforts to do “social good” require engagement with constituent communities and active examination of the ethical issues involved in data science projects.
Organizations and Conferences
- Data & Society. A powerhouse of research related to data ethics and social impact. They produce a steady stream of technical reports, sociological and anthropological studies, talk series, etc.
- AI Now. Another powerhouse in the field. They focus on informing policymakers in support of better AI regulation and accountability, and they have a large body of work on labor and the environment.
- FAccT Conference. Currently the leading conference within the machine learning community for issues of fairness and accountability. It focuses on algorithms for fairness but has recently expanded to touch on larger sociological questions.
- AIES Conference. A general conference on ethics and artificial intelligence.
- We All Count: Resources for identifying, understanding, and mitigating bias in data science processes. The project focuses on bringing non-Western perspectives to data collection and analysis. Their resource list is especially good for diverse perspectives on research methodology in the social sciences.
- Human Rights Data Analysis Group (HRDAG). This organization uses data science tools to improve human rights across the world. It falls under the category of “data for good” rather than data ethics and accountability, but the two are neighboring disciplines. They share datasets and methods for data collection that may be of interest to those working in the area of human rights.
- DAIR Institute. A newly formed institute created by Timnit Gebru after her departure from Google. Its goal is to study the dangers of unregulated AI, free of the limits and constraints imposed by industry interests. They publish peer-reviewed research.