30-45 minutes
In this activity, students explore their online identities through the lens of a data broker. Acting as data brokers, students analyze fictional profiles of middle schoolers, group them, and then choose a product to target each one with. This activity is intended as a high-level introduction to data privacy and AI algorithms and can be used as part of a data privacy unit alongside the two other activities, Online Portrait and Online Footprint. It works as a class or family activity; in a classroom, students can work either as a whole class or in groups.
Key Vocabulary
- Data: information used by a computer to learn or make decisions
- Digital/Online Footprint: sometimes known as a “digital shadow”. “The unique trail of data that a person or business creates while using the internet.” Almost all online activity leaves a trace. Every trace that a person leaves behind forms their digital footprint. [1] “The five main categories of digital footprints are shopping, finances, health and fitness, news and reading, and social media” [2].
- Active Footprint: data that you share and leave behind through deliberate choices online, for example posting on social media platforms, sharing location data, writing online comments, uploading photos and videos, filling out online forms, or agreeing to install cookies [2]
- Passive Footprint: data that you leave behind without intending to, or sometimes, without knowing [2]
- Cookies: “Cookies are bits of data that are sent to and from your browser to identify you. When you open a website, your browser sends a piece of data to the web server hosting that website. This data usually appears as strings of numbers and letters in a text file. Every time you access a new website, a cookie is created and placed in a temporary folder on your device. From here, cookies try to match your preferences for what you want to read, see, or purchase. A common analogy for a cookie is a coat check ticket at a concert or event: It’s something you receive from a service, has no intrinsic value outside of the event, and is tailored exactly to you. However, you’ll need it if you want to get your coat back.” [3]
  - Two types of cookies:
    - First-party cookies: cookies created by the website you are visiting, generally considered as safe and reliable as the website itself
    - Third-party cookies: usually associated with the ads that appear on a website, these come from a domain other than the website you are actually visiting. They may contain tracking information that keeps tabs on your browsing history so that ad and analytics platforms can reach you with personalized ads. [3]
      - Example: if you search for pet supplies, the third-party ads on another website could show you dog food, even if that website has nothing to do with pets.
      - Because they are tied to ad and analytics platforms rather than to the websites themselves, these cookies are more susceptible to data breaches.
  - When visiting a new site, you may see an “Accept Cookies” pop-up asking you to consent to cookies or to opt out of all but the essential ones.
- Dataset: “collection of curated data” [4]
- Algorithm: “a set of instructions that turns something (an input) into another thing (an output). A sandwich-making algorithm, for example, would turn a bunch of ingredients (bread, peanut butter, and jelly) into a delicious lunch (a PB&J sandwich)” [4]
- Data Triangulation: using multiple datasets, methods, or theories to cross-check and combine information. This is discussed further in the Data Brokers section [5]
- Data Brokers: companies that collect personal information about you from many sources, build a picture of who you are, and then sell that information [6]
- Doxxing: the act of revealing identifying info about someone online, such as their real name or home address [7]
- Data Profiling: process of reviewing and summarizing data to better understand it [8]
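For teachers comfortable with code, the third-party cookie idea above can be made concrete with a small simulation. This is a minimal sketch, not how any real ad platform works: `AdNetwork`, `Browser`, the cookie name `tracker_id`, and the domains are all hypothetical, and the browser is reduced to a dictionary holding one cookie string per domain. It uses Python's standard `http.cookies.SimpleCookie` to read and write the cookie. The point it shows is the "coat check ticket": the ad server hands out an ID once, the browser sends it back whenever a page embeds that ad server's content, and the ad server can therefore link visits to unrelated websites.

```python
from http.cookies import SimpleCookie
import uuid


class AdNetwork:
    """Hypothetical third-party ad server whose ads are embedded on many websites."""

    def __init__(self):
        # tracker ID -> list of sites where this browser loaded one of our ads
        self.history = {}

    def serve_ad(self, cookie_header, site):
        """Handle one ad request and return the cookie the browser should store."""
        cookie = SimpleCookie(cookie_header)
        if "tracker_id" in cookie:
            # Returning browser: reuse the ID it sent back (the "coat check ticket").
            tracker_id = cookie["tracker_id"].value
        else:
            # First time we see this browser: issue a new random ID.
            tracker_id = uuid.uuid4().hex
        self.history.setdefault(tracker_id, []).append(site)
        reply = SimpleCookie()
        reply["tracker_id"] = tracker_id
        return reply["tracker_id"].OutputString()  # "tracker_id=<32 hex chars>"


class Browser:
    """Very simplified browser that stores one cookie string per domain."""

    def __init__(self):
        self.jar = {}  # domain -> "name=value" cookie string

    def visit(self, site, ad_network, ad_domain="ads.example"):
        # The page at `site` embeds an ad from `ad_domain`, so the browser sends the
        # cookie it holds for ad_domain (a third-party cookie), not one for `site`.
        sent = self.jar.get(ad_domain, "")
        self.jar[ad_domain] = ad_network.serve_ad(sent, site)


network = AdNetwork()
browser = Browser()
browser.visit("petsupplies.example", network)
browser.visit("news.example", network)
print(network.history)  # a single tracker ID linked to both sites
```

Neither website shares data with the other here; the link between the two visits comes entirely from the ad server seeing the same ID twice, which is why third-party cookies are useful to data brokers.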
Activity
1 Data Brokers YouTube Video
[25 mins] You could start or end the lesson by watching the Data Brokers video by John Oliver (also linked in the Further Discussion section). He discusses our online footprint, how much data brokers know about us, and what they do with our personal information. The video can help segue into the activity. For a shorter introduction, students can instead read The Guardian’s article summarizing the video.
You could also start or end the lesson by reading this article:
New York Times – Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret
2 Introducing the Profiles
As a class, students look through the profiles. Each profile consists of a person’s most recent Google searches and images of content they have engaged with.
3 Sorting and Grouping the Profiles
Encourage students to start sorting the profiles into groups to make targeting the ads easier, and encourage active discussion of their grouping method. Use the examples on the slide (e.g., Couples with Clout, Life of Leisure) as a guide.
Label the groups with sticky notes.
There is no right or wrong answer for creating the groups.
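To show students how a computer might automate the same profiling and grouping steps, here is a minimal Python sketch. It assumes a simple keyword-matching approach: the `CATEGORY_KEYWORDS` table, the function names, and the three example people are all hypothetical and are not taken from the activity’s 16 profiles. Each person’s searches are summarized into counts per interest category (data profiling), and each person is then placed in the group for their most common category.

```python
from collections import Counter

# Hypothetical keyword lists; in the activity, students use their own judgment instead.
CATEGORY_KEYWORDS = {
    "sports": {"soccer", "basketball", "cleats"},
    "music": {"concert", "tickets", "album"},
    "gaming": {"game", "console", "controller"},
}


def interest_counts(searches):
    """Data profiling: count how many search words fall into each interest category."""
    counts = Counter()
    for search in searches:
        for word in search.lower().split():
            for category, keywords in CATEGORY_KEYWORDS.items():
                if word in keywords:
                    counts[category] += 1
    return counts


def group_profiles(profiles):
    """Put each person in the group for their top category, or 'unknown' if nothing matches."""
    groups = {}
    for name, searches in profiles.items():
        counts = interest_counts(searches)
        label = counts.most_common(1)[0][0] if counts else "unknown"
        groups.setdefault(label, []).append(name)
    return groups


# Hypothetical profiles, for illustration only.
profiles = {
    "Person A": ["soccer cleats sale", "basketball scores"],
    "Person B": ["concert tickets near me", "new album release"],
    "Person C": ["weather today"],
}
print(group_profiles(profiles))
# {'sports': ['Person A'], 'music': ['Person B'], 'unknown': ['Person C']}
```

The sketch also shows the same weaknesses students run into: a person with a small footprint lands in "unknown", and keyword counts cannot tell that someone searching for a concert might actually dislike that artist.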
4 Choosing the products to market (1-6)
You can choose to market anywhere from 1 to 6 products.
Based on the profiles, discuss as a class or in groups which product to market to each profile, and justify each choice.
5 Who is the best broker?
The numbers used in this step are made up and do not represent actual ad costs or revenue per click.
Decide which product to market to each person, choosing one product per profile.
Have students write down their plan on the student handout to keep track of which products they are marketing to which person.
At the end of the activity, use the answer key to see which profiles ‘bought’ the product.
Some of the profiles are tricky, and the answer key includes notes on those profiles explaining how our online identities don’t always accurately represent our full selves. For example, Mateo has a small online footprint, so it’s hard to figure out his interests. Mia, on the other hand, clearly likes concerts and music, but it turns out that Bruno Bear is a rival of her favorite artist.
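If you want to tally results automatically once the answer key is revealed, the checking step can be sketched in Python. This sketch assumes a flat cost for every ad shown and a flat revenue for every purchase; `AD_COST`, `SALE_VALUE`, `score_plan`, the product names, and the people are hypothetical placeholders, so substitute the (also made-up) values from the activity slides and the real answer key.

```python
# Hypothetical placeholder values; replace with the made-up numbers from the activity slides.
AD_COST = 2      # cost of showing one ad to one profile
SALE_VALUE = 5   # revenue when a profile "buys" the product


def score_plan(plan, answer_key):
    """Return (purchases, profit) for one broker's plan.

    plan: profile name -> product marketed to that profile (one product per profile)
    answer_key: profile name -> set of products that profile actually bought
    """
    purchases = sum(
        1 for person, product in plan.items()
        if product in answer_key.get(person, set())
    )
    profit = purchases * SALE_VALUE - len(plan) * AD_COST
    return purchases, profit


# Hypothetical plan and answer key, for illustration only.
plan = {"Person A": "sneakers", "Person B": "headphones", "Person C": "video game"}
answer_key = {"Person A": {"sneakers"}, "Person B": {"concert tickets"}, "Person C": set()}
print(score_plan(plan, answer_key))  # (1, -1)
```

Because every ad costs money whether or not it leads to a sale, a broker who targets carefully can come out ahead of one who simply markets something to everyone.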
Optional Explanation of Human Reasoning vs AI
You can choose to insert this explanation within the activity or afterwards.
In this activity, humans (the students) are doing the sense-making.
You can mention that in practice, artificial intelligence algorithms would do this work by looking through vast quantities of data and finding patterns. No human could go through all the data that has been collected, which is why this activity is just a simulation game!
Humans can learn new concepts and ideas from just one or a few examples, which is known as one-shot learning. The students’ ability to form groups and make decisions from a handful of profiles is an example of this. In contrast, AI typically needs many examples to find patterns, which is known as multishot learning [TechTarget]. So AI would have a much harder time deciding which product to market to whom with only 16 profiles. However, we humans would have a hard time making decisions when faced with a much larger number of profiles!
Activity Constraints
There are 16 profiles in total and 6 ad products. The activity can be shortened by decreasing the number of profiles and ads (e.g., 8 profiles and 3 ads).
We recommend dividing the class into 4 groups and assigning each group 4 profiles, since working through all the profiles as a whole class may take longer. You can decide whether each group gets the same 4 profiles or whether every group gets different profiles.
Activity Discussion
Throughout the activity, encourage students to think about these questions and discuss amongst themselves.
Grouping
- How are you grouping the individuals?
- Is it easy or hard to group everyone/certain individuals?
- What kind of profile groups come to mind?
Profiling
- Are any of the profiles hard or easy? Why? For example, Luis was hard because he fits into many different categories: sports, events, and video games.
- The hardest profiles tend to be the edge cases: those with the most varied interests and those with sparse data.
- Share your thought process, especially for the harder profiles
Post Activity Discussion
10-15 minutes
After going through the answer key, you can use these questions to reflect on the activity and the information you found.
- Reflect on the activity as a whole. Was it easy or difficult to group the profiles? And then to choose who to market what product to?
- Which profiles were hard to categorize? Why?
- Which answers surprised you? Explain your thought process for why you chose to market a certain product to a certain profile.
- Discuss as a class the common advertisements students see and whether those ads match their interests.
- For example, not everyone interested in video games bought the Magic Mayhem game. Is that true of your interests too? If you like one video game, do you play many different games or stick to specific genres?
- If you had your own ‘profile collage’, what would be on it?
- Does your search history / online history reflect your full self?
Optional Ethics Extension
10-15 minutes
The class can extend the activity by reading and discussing any of these articles:
Business Insider – Phone location data from people who visited abortion clinics, including Planned Parenthood, was legally on sale for $160, report says
AP News – Marketing company to pay $150M for enabling fraud schemes
- What ethical issues do you see with selling data about targeted groups (e.g., people visiting abortion clinics, elderly individuals)?
AI Literacy Competencies
AI’s Strengths & Weaknesses, Data Literacy, Learning from Data, Critically Interpreting Data, Ethics
AI Literacy Design Considerations
Embodied Interactions, Promote Transparency, Critical Thinking, Identity, Values, & Backgrounds, Leverage Learners’ Interests