CS 496: Foundations of Data Economics

Required Textbook: None.

Latest Term: Fall 2025.

Synopsis:
As data transforms science and society, understanding the economics of data has become essential. Collecting data is costly, so it must be correctly incentivized. Possessing data confers market power, which can be leveraged through data-sharing policies. Sharing data, in turn, has fairness and privacy implications. The value of data for a decision problem depends on its quality. And using data to shape strategies can affect outcomes in unexpected ways, so careful market design is needed to mitigate undesirable outcomes. Course topics are drawn from recent and classic literature developing the theoretical foundations of data economics, including data elicitation, information design, differential privacy, fairness, calibration, social learning, and learning in games.

Format: Theorem-and-proof-based lectures, problem sets, and exams.

Prerequisites: Prior experience with algorithms, game theory, economic theory, or data science is recommended.

Homework Policy: Homework is done in pairs. Both students must contribute to the solution of every problem. Submit one copy of each assignment with both students' names on it; both students receive the same grade. You may consult the course notes when answering homework questions, but you must not consult the Internet or research papers.

Each of the modules below will be covered over 1-2 weeks.

Module 1: Decision Theory and Elicitation

  • Bayesian decision theory
  • Information structures
  • Proper scoring rules
  • Characterization of proper scoring rules
  • Scoring rules for statistics
  • Optimization of scoring rules
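A scoring rule is proper when a forecaster maximizes expected score by reporting their true belief. The Python sketch below, with hypothetical helper names, checks this numerically for a binary event: it grid-searches the report that maximizes expected score under the quadratic (Brier) and logarithmic rules, and contrasts them with a linear rule, which is not proper and pushes reports toward the extremes.

```python
import math
from typing import Callable

Score = Callable[[float, int], float]

def brier_score(report: float, outcome: int) -> float:
    """Quadratic (Brier) score for a binary event, oriented so higher is better."""
    return 1.0 - (outcome - report) ** 2

def log_score(report: float, outcome: int) -> float:
    """Logarithmic score for a binary event; requires 0 < report < 1."""
    return math.log(report if outcome == 1 else 1.0 - report)

def linear_score(report: float, outcome: int) -> float:
    """Linear score: NOT proper, included only for contrast."""
    return report if outcome == 1 else 1.0 - report

def expected_score(score: Score, report: float, belief: float) -> float:
    """Expected score of announcing `report` when the true probability is `belief`."""
    return belief * score(report, 1) + (1.0 - belief) * score(report, 0)

def best_report(score: Score, belief: float, grid_size: int = 999) -> float:
    """Grid search over reports 0.001, 0.002, ..., 0.999 for the best expected score."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return max(grid, key=lambda r: expected_score(score, r, belief))
```

For a belief of 0.3, `best_report` recovers 0.3 under both proper rules, while the linear rule's best report is the smallest grid point, since its expected score is linear in the report.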

Module 2: Information Design

  • Garblings, Blackwell ordering
  • Cheap talk
  • Disclosure
  • Bayesian persuasion, information design
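To make Bayesian persuasion concrete, the sketch below works through the standard binary example: a sender (prosecutor) commits in advance to a signal about a binary state (guilty or innocent), and a receiver (judge) takes the sender's preferred action (convict) only when the posterior probability of guilt reaches a threshold. Assumptions, all for illustration: the receiver breaks ties in the sender's favor, the sender wants conviction in both states, and the function names are hypothetical. When the prior is below the threshold, the optimal signal always recommends conviction in the guilty state and recommends it in the innocent state just often enough that the posterior after a recommendation equals the threshold.

```python
from dataclasses import dataclass

@dataclass
class Signal:
    p_convict_if_guilty: float    # P(recommend "convict" | guilty)
    p_convict_if_innocent: float  # P(recommend "convict" | innocent)

def optimal_signal(prior: float, threshold: float) -> Signal:
    """Sender-optimal signal when the judge convicts iff posterior >= threshold."""
    if prior >= threshold:
        # The judge already convicts on the prior, so revealing nothing is optimal.
        return Signal(1.0, 1.0)
    # Solve prior / (prior + (1 - prior) * p) = threshold for p.
    p = prior * (1.0 - threshold) / (threshold * (1.0 - prior))
    return Signal(1.0, p)

def posterior_after_convict(prior: float, sig: Signal) -> float:
    """Bayes' rule: posterior probability of guilt given a "convict" recommendation."""
    num = prior * sig.p_convict_if_guilty
    return num / (num + (1.0 - prior) * sig.p_convict_if_innocent)

def conviction_probability(prior: float, sig: Signal) -> float:
    """Ex ante probability that "convict" is recommended (and followed)."""
    return prior * sig.p_convict_if_guilty + (1.0 - prior) * sig.p_convict_if_innocent
```

With prior 0.3 and threshold 0.5, the judge convicts with probability 0.6 (in general prior / threshold), compared with 0.3 under full disclosure and 0 with no information.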

Module 3: Multi-Agent Elicitation

  • Informational substitutes
  • Prediction markets
  • Scoring-rule based automated market makers
  • Loss functions in ML
  • Value of data as reduction in expected loss
  • Data Shapley Value
  • Prediction markets for data procurement
  • Substitutes/complements and prediction markets
  • Cost-function-based automated market makers and connection to no-regret learning
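The logarithmic market scoring rule (LMSR) is the standard example of a cost-function-based automated market maker. Its cost function is C(q) = b * log(sum_i exp(q_i / b)), where q is the vector of outstanding shares per outcome and b > 0 is a liquidity parameter; a trader who moves the share vector from q to q' pays C(q') - C(q), and the instantaneous prices are the gradient of C. The sketch below, with hypothetical function names, implements these formulas directly, using a log-sum-exp shift for numerical stability.

```python
import math
from typing import List, Sequence

def lmsr_cost(q: Sequence[float], b: float) -> float:
    """C(q) = b * log(sum_i exp(q_i / b)), computed stably via log-sum-exp."""
    m = max(qi / b for qi in q)
    return b * (m + math.log(sum(math.exp(qi / b - m) for qi in q)))

def lmsr_prices(q: Sequence[float], b: float) -> List[float]:
    """Gradient of C: a softmax of q / b, so prices are positive and sum to 1."""
    m = max(qi / b for qi in q)
    weights = [math.exp(qi / b - m) for qi in q]
    total = sum(weights)
    return [w / total for w in weights]

def trade_cost(q: Sequence[float], delta: Sequence[float], b: float) -> float:
    """Amount a trader pays to change outstanding shares from q to q + delta."""
    new_q = [qi + di for qi, di in zip(q, delta)]
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)
```

The price vector has the same form as the weights maintained by the multiplicative-weights (Hedge) algorithm with learning rate 1/b, which is one route to the connection with no-regret learning listed above. The market maker's worst-case loss under LMSR is bounded by b * log(n) for n outcomes, so b trades off liquidity against subsidy.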

Module 4: Peer Prediction

  • Peer Prediction
  • Bayesian Truth Serum
  • Information-theoretic framework
  • Determinant-based Mutual Information Mechanism
  • Connection with learning in the presence of noisy labels
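The Determinant-based Mutual Information (DMI) mechanism pays agents who answer the same multiple-choice tasks according to determinants of their joint report-count matrices. The sketch below is a simplified version under stated assumptions: exactly two agents, tasks split into two fixed disjoint halves, reports encoded as integers 0..C-1, and no scaling constant. The reward is det(M1) * det(M2), where Mk[c][c2] counts tasks in half k on which the first agent reported c and the second reported c2. Function names are hypothetical, and determinants are computed exactly with fractions.

```python
from fractions import Fraction
from typing import List, Sequence

def det(matrix: Sequence[Sequence[int]]) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to serve as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def count_matrix(reports_a: Sequence[int], reports_b: Sequence[int],
                 tasks: Sequence[int], num_choices: int) -> List[List[int]]:
    """M[c][c2] = number of tasks where agent A reported c and agent B reported c2."""
    m = [[0] * num_choices for _ in range(num_choices)]
    for t in tasks:
        m[reports_a[t]][reports_b[t]] += 1
    return m

def dmi_reward(reports_a: Sequence[int], reports_b: Sequence[int],
               num_choices: int) -> Fraction:
    """Simplified DMI-style reward: product of determinants on two disjoint task halves."""
    n = len(reports_a)
    first, second = range(0, n // 2), range(n // 2, n)
    m1 = count_matrix(reports_a, reports_b, first, num_choices)
    m2 = count_matrix(reports_a, reports_b, second, num_choices)
    return det(m1) * det(m2)
```

Each half needs at least C tasks for its determinant to be nonzero. One property visible in this form: if one agent consistently relabels answers by a permutation, each determinant is multiplied by the same sign, so the product, and hence the reward, is unchanged. An uninformative agent who always gives the same answer produces a matrix with a single nonzero row and earns zero.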

Module 5: Online Learning

  • Online learning with full feedback
  • Online learning with partial feedback
  • Best-in-hindsight regret vs. swap regret
  • Blackwell approachability
  • Connection to game equilibria: correlated (CE) and coarse correlated (CCE) equilibria
  • Econometrics of learning agents
  • Manipulation of learning algorithms
  • Algorithmic collusion in pricing
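For online learning with full feedback, the Hedge (multiplicative-weights) algorithm plays each action with probability proportional to exp(-eta * cumulative loss). With losses in [0, 1], n actions, a known horizon T, and eta = sqrt(8 ln n / T), its best-in-hindsight regret is at most sqrt((T / 2) ln n). The sketch below, with hypothetical function names, runs Hedge on a given loss sequence and returns its regret against the best fixed action; it measures external regret only, not swap regret.

```python
import math
from typing import Sequence

def hedge_regret(losses: Sequence[Sequence[float]], eta: float) -> float:
    """Run Hedge with full feedback; return regret to the best fixed action in hindsight."""
    n = len(losses[0])
    cumulative = [0.0] * n  # cumulative loss of each action so far
    alg_loss = 0.0          # expected loss incurred by the randomized learner
    for loss in losses:
        # Weights proportional to exp(-eta * cumulative loss); subtracting the
        # minimum first keeps exp() from underflowing on long sequences.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        alg_loss += sum(w / total * l for w, l in zip(weights, loss))
        # Full feedback: the loss of every action is observed and recorded.
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return alg_loss - min(cumulative)

def default_eta(n: int, T: int) -> float:
    """Learning rate sqrt(8 ln n / T), tuned for losses in [0, 1] and horizon T."""
    return math.sqrt(8 * math.log(n) / T)
```

Because the regret bound holds for every loss sequence, it also applies when the losses are generated by other learning agents, which is the starting point for the connection to coarse correlated equilibria.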

Module 6: Calibration

  • Expected calibration error
  • Smooth calibration error
  • Calibration error for decision makers
  • Online calibration
  • Multi-calibration
  • Machine learning and calibration
  • Fairness
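Expected calibration error is commonly estimated by binning: assign predictions to equal-width bins on [0, 1], then average the gap between the mean prediction and the empirical outcome frequency in each bin, weighted by the fraction of samples in the bin. The sketch below, with a hypothetical function name and an assumed default of 10 bins, implements that binned estimator for binary outcomes.

```python
from typing import List, Sequence, Tuple

def expected_calibration_error(predictions: Sequence[float],
                               outcomes: Sequence[int],
                               num_bins: int = 10) -> float:
    """Binned ECE estimate for probability forecasts of binary outcomes."""
    if len(predictions) == 0 or len(predictions) != len(outcomes):
        raise ValueError("need at least one prediction and one outcome per prediction")
    n = len(predictions)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(num_bins)]
    for p, y in zip(predictions, outcomes):
        # Equal-width bins; a prediction of exactly 1.0 goes in the last bin.
        idx = min(int(p * num_bins), num_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_pred - freq)
    return ece
```

The estimate depends on the number of bins, and both it and ECE itself can jump when predictions move slightly across a bin edge or a prediction value; this sensitivity is one reason the smooth calibration error listed above is studied.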