BACKGROUND
Course Listing: COMP_SCI 214- Data Structures and Algorithms
Course Description: The design, implementation, and analysis of abstract data types, data structures and their algorithms. Topics include: data and procedural abstraction, amortized data structures, trees and search trees, hash tables, priority queues, graphs, shortest paths, searching, and sorting. Required for computer science majors.
Collaborators: Kendall Clark, Frankie Lucco, Claire Schwartz, Avery Keare, Fai Poungpeth, Vincent St-Amour, Sruti Bhagavatula
Timeline: Development in Winter 2023, implementation in Spring 2023 and beyond
Overview: The Data Structures and Algorithms course, mandatory for computer science students, should serve as a step toward making diverse and intersectional perspectives heard in the decision making spaces, which drive technological innovation. Our materials prepare students to consider issues beyond their own needs in their program design, which is essential for students pursuing careers in the tech industry.
Goals:
- Illustrate the ways that curricular concepts and technologies have real-world social, political, and ethical implications.
- Encourage students to consider ethics (e.g. work towards limiting bias, protecting privacy, enabling inclusivity and accessibility, etc.) as an integral part of program design and implementation.
- Prompt critical thinking with respect to a student’s own values, biases, and positionality, in addition to practicing analytical and problem solving skills.
- Allow students the freedom to explore data structures and algorithms in the context of topics that are meaningful to them.
OVERARCHING ETHICAL CONCEPTS
- Privacy – How do we handle personal information safely and responsibly? Can we make this consideration a part of how we think about non-functional correctness? What responsibilities do developers have towards privacy beyond the letter of law?
- Accessibility, fairness and equity – Who can access the program and make use of it? Who is left out, and how could they be included? Does the program create better outcomes for some groups? How could this be fixed?
- Representation – How does data represent the world? What view does a program present of the world? Who might be left out?
- Personal Responsibility – When it comes to societal issues that can feel much bigger than us/out of our control, what can individuals do while working within these systems?
- Learn about more Ethical Concepts
ETHICAL CONCEPTS RELEVANT TO SPECIFIC COURSE TOPICS
- Program Design Process
- Sorting, Ranking, and Binary Heaps
- Graphs and Networks
- Dictionaries
- Classes + Data Representation
- Invariants
Description
- When designing a program, it is imperative to consider ethical decisions embedded within the program that might disproportionally help or harm individuals or communities.
Ethical Concepts
- Data collection
- What defines ethical data?
- How might you collect the data in an unethical way? In an ethical way?
- How might you use the data in an unethical way? In an ethical way?
- User experience
- How and when do you incorporate user experience into your design process?
- Accessibility
- Who will have access to your technology?
- How are you evaluating the accessibility of your program?
- What kinds of accessibility needs should you consider when writing a program?
- How can you meet accessibility needs in your program?
- What are possible consequences of not meeting accessibility needs in your program?
- Morality
- What are values that you want to maintain throughout the design process? Are there any red lines you don’t want to cross?
- Participation / Labor
- Who is represented on your team?
- If programming collaboratively, is anyone disproportionally receiving credit? Why might this be the case?
- What are some pros and cons of collaborating on a program?
- Liability
- Is your code protected by copyright laws? Who benefits the most from these copyright laws, and who benefits the least?
- Privacy
- What data might be ethical to collect from your users? What data is not ethical? Why?
- How might that data be used in beneficial or malicious ways? Which users benefit the most? Which users benefit the least?
- Ex: If a program is trying to model the spread of an infectious disease, it could be helpful to have more personal information about potentially sick people in order to more accurately predict where they might go and who they might infect. However, obtaining and using this information ethically might take a lot of time and resources, and wouldn’t always be legally possible due to patient privacy. (This is why epidemiologists were so excited about the Corrupted Blood Incident)
- Mis/interpretability or misuse
- How might your program be unintentionally misinterpreted?
- How might your program be intentionally misused?
- How might you prevent misinterpretation and misuse?
- Bias, accountability, transparency, and many more ethical concepts
Description
- Course concept: binary trees and heaps, sorting algorithms, custom comparators, priority queues
- How are elements sorted and what are the implications of these orderings? How can prioritization reflect bias?
Ethical Concepts
- Bias / Fairness
- Equity
- Inclusion / Participation
- Mis/interpretability
- Mis/representation
- Misuse
- Privacy
- Relativism
- Transparency
- More ethical concepts found here
College rankings systems and admissions decisions
- What are some pros and cons of ranking universities, and how might these pros and cons impact different students, faculty, universities, and others differently?
- What ranking value is the most important to you (E.g. sports team scores, average starting salary, acceptance rate, etc)? Why? Did college ranking systems influence your decision about where to study?
- Do you think others would prioritize the same ranking value you did? Are there certain groups of people who are more likely to agree? Who and why?
- Briefly explain one prospective student need which the ranking system addresses well and one prospective student need that isn’t addressed by the ranking system.
- E.g. personalized metrics like a specific student’s financial aid package; qualitative metrics like the inclusiveness of different identities on campus.
Search engine ranking
- When you use a search engine to find the “best” schools, how much do you trust the first website that appears? Why?
- Who might benefit the most vs. least from the way that a search engine pulls its first results?
- Further reading: Inside the Mind of PageRank
- Further reading: Google’s update to PageRank sparks controversy
Music recommendation algorithms
- When designing a music recommendation algorithm, which attributes would you prioritize (artist popularity, genre, tempo, etc)?
- If a recommendation algorithm sticks with “safe” choices that do not go far outside the listener’s normal taste, how would it affect the listener? How would it affect different artists, such as smaller artists?
- What are the downsides to making the algorithm a popularity contest i.e. if more users listen to a song, the song will be recommended more often? What genres are classically “popular”? Why?
- What are the repercussions (good and bad) of allowing music to be restricted for the user? What about for the artists?
Fairness and “jumping the line”
- In certain spaces, what demographics are more easily able to “jump the line”?
- Some examples with these lines are at the DMV, path to citizenship, access to food (and of what quality), etc.
- What are the socio-economic impacts of people being able to “jump the line”?
- How can you consider this when choosing and building data structures?
Implementation
- In-Class Survey Question: College Admissions
- You are an admissions officer and are writing a sorting algorithm to rank a set of students on their qualification to get into your university. You have ~20,000 records to sort and the sort cannot be influenced by socio-economic status (directly or indirectly).
- Why would in-place quick sort be a better option than in-place selection sort?
- Would you consider sorting on “number of extracurriculars” as the sort key to be ethical according to the above requirement? Why or why not?
- You are an admissions officer and are writing a sorting algorithm to rank a set of students on their qualification to get into your university. You have ~20,000 records to sort and the sort cannot be influenced by socio-economic status (directly or indirectly).
- Homework Addition/Extension: Students create their own college rankings by implementing the binary heap data structure and using heapsort with a custom comparison function of their own design. Implement with a data set of fictional colleges with differing parameter values (average SAT scores of accepted students, work-life balance score, graduation rate, quality of dining hall food, etc.)
- Homework self-reflection: Students are split into pairs and must review their partner’s sorting algorithm (NOTE:Sharing code may be an issue in certain class environments; this could be avoided by having students exchange written descriptions of their programs as well as their input and output data). Work through CIDER as a worksheet; students may be prompted as follows:
- Critique: List three differences and three similarities between your program and that of your partner. What should your partner have done differently? Are there any important considerations you think they left out? Who would be helped by this ranking and whose preferences/needs are left out? Explain.
- Imagine: Generate a list of ideas for how your partner might revise their program for a more just outcome. Generate a second list of ways that your partner’s program has inspired you to reflect upon and improve your own design.
- Design: Edit and test your program with these new considerations in mind.
- Expand: Regroup with your partner and discuss your critiques of their design as well as the ways it inspired you. Talk through the reasoning behind your custom comparison→ Why did you prioritize certain parameters over others? (Assess awareness of personal bias)
- Repeat: Have students change groups or swap iterations of their program in the same groups as many times as is convenient.
- Changing groups is helpful because it exposes students to many different perspectives – this activity might also be conducted in a larger group, if the class modality allows for that (would probably be better as a discussion based assignment on this scale).
- Producing multiple iterations is helpful because it reflects common industry design processes.
Description
- Just as social interactions influence social network/media platforms, these platforms influence our social interactions, and in ways that benefit some people more than others. Apply the concepts and questions below to existing networks or a new network modeled by the teaching staff or the students.
Ethical Concepts
- Access to technology
- Graph searches usually account for just the elements that are reachable from a starting point, but are there situations/representations where this might be an issue? Explain.
- Accessibility of technology
- Inclusion / participation
- Mis/information
- Fairness / Bias
- Privacy
- Surveillance
- Trust
- Transparency / Explainability
- Community Engagement
- Freedom of Speech
- Mis/interpretability
- Misuse
- More ethical concepts
Professional social networks
- How might a social networking platform use graphs to represent relationships and connections between users? Out of the four graph types covered (WD, WU, UD, UU), which one would be best for this kind of model?
- What does it mean for two people to be “connected” through a professional online network like LinkedIn?
- How do the way networks are built and used on these platforms (LinkedIn, Handshake, etc.) make opportunities/connections accessible and to whom?
- Who has access to these online networks? Who is included and who isn’t?
- How might these networks subvert or replicate traditional professional networking systems?
Social media platforms
- How representative are social media friends of social networks in real life? What are some benefits and consequences of making decisions based on social networks as determined by data found online rather than by directly asking people who their friends are?
- Who has access to these online networks? Even when someone has access, how might they be included or excluded?
- How might mis/information travel through a social media network? How might the programmer code their network to minimize misinformation?
- What is the significance of a “center” in a social network found through a social media site?
Maps
- What are the limitations of using a graph to make a map of a city?
- How might exclusionary program design occur when using a graph to map a city?
Single Source Shortest Path (SSSP)
- Does edge weight have the same meaning in all contexts? What are some other possible meanings that may come up in a SSSP problem? How might these meanings be more or less helpful to different users?
Course material
- The dictionary ADT, direct addressing, heaps
Ethical concepts
- Fairness
- Accuracy
- Inclusion
- Mis/representation
- Mis/interpretability
- More ethical concepts
Word dictionaries
- How might the dictionary assume a single language standard or reinforce stereotypes? How might we program the dictionary differently?
- When using a direct-addressing dictionary, you’re only able to assign a single meaning to each entry. In most natural-language dictionaries, many words have multiple definitions, which are given in a ranked list. How might you choose which definition is “ranked” first, or the one that remains in your direct-addressing dictionary?
- When a word already has a definition in the dictionary but the user wants to add a new one, what should the program do? For example, the program might throw an error, replace the value, add the value to the top of the list, add the value to the bottom of the list, do nothing, etc. What do you think is the best option and why?
- How might you consider/represent how the definitions of words change over time?
- What is the significance of deleting a word/key in a dictionary? When might this happen in our everyday lives?
- How do programmers choose what hash function to use with dictionaries, and how might their choice affect the user’s experience?
Spellcheck
- What are some of the social and ethical issues that could arise when software tells the user something is incorrect versus correct? Describe one issue you might face and a possible way you could address it. (Consider the following ideas: non-English-writing users, varying dialects of English used by different users, language that evolves over time, etc.)
- Some words (such as “repairman” vs. “repairwoman”) reflect gender roles. How might this be an issue for a spell check program? How does your text editor of choice (Google Docs, Word, Notepad, etc) handle “repairman” vs “repairwoman”?
- How might you code these considerations into the program?
Description
- Data representation often involves numbers and discrete categories, and not everyone will always fit in. Consider the ethical issues that might come with changing or reducing something or someone to fit your data scheme. On the other hand, a host of ethical issues arrive when we choose instead to not represent people at all.
Ethical concepts
- Fairness
- Bias
- Accessibility
- Accuracy
- Inclusion
- Mis/representation
- Privacy
- Trust
- More ethical concepts
Representation of names
- Say you define a name as having “first” and “last” variables. What/who is excluded by this design?
- Some examples include people with middle names, more than one first or last name, hyphenated names, and more (source). Different cultural backgrounds have different naming conventions.
- What are ways you can fix the design of a name to be more inclusive?
Erasure of non-binary identities
- Many people don’t fit into the binary categories of male or female, and often they are presented with the choice of choosing one category or the other, or sitting out entirely.
- Further reading: Counting the Countless (Keyes)
Implementation
- Discussion: Describe a program that involves a Name class/struct, such as a program for writing email greetings or taking class attendance. Have students discuss the issues they see with the program– who is left out?
- Activity: Ask students to discuss how to create a simple program involving names, such as the aforementioned suggestions. Have students start by considering (broad) technical solutions. Then, bring up the ethical considerations.
Description
- Since the programmer has control over invariants, it’s important to consider how they might affect users, and affect different users in helpful or harmful ways.
- How do invariants represent and define the relationship between the user and programmer?
Ethical concepts
- Bias / Fairness
- How are invariants decided? Will they always be “fair”?
- How might an invariant be constructed such that some people benefit the most from a given invariant, and some people have the potential to be most harmed?
- Misuse
- How might invariants be broken? By whom and why?
- Transparency
- Do users know the invariants and laws of a data structure you create? What steps or prior knowledge is necessary to access this information?
- Agency
- If programmers and designers get to define invariants/laws, to what extent do they hold authority over users?
- Accountability
- Who should be held responsible if an invariant is broken?
- More ethical concepts
IMPLEMENTATION
Explore various module types and frameworks for inspiration and ways to carry out these learning goals in your classroom.
LESSONS LEARNED FROM DEVELOPMENT TEAM
- Establish concrete goals and a timeline early on in the process.
- Use relatable examples and socially salient concepts to aid in explaining data structures and algorithms. Relevant examples promote engagement among your students.
- We encourage professors to explore/express their own personal perspective and positionality when discussing ethics with students, as they are comfortable. In doing so, professors put aside their academic authority (or “ethics expertise”) which may help students feel more comfortable exploring/expressing their own perspectives and values.
- Our response data from in-class survey questions revealed that students are interested and willing to consider the relationship between ethical concepts/issues and data structures and algorithms.