Current Term: Fall 2017
Synopsis:
This interdisciplinary graduate seminar explores the science of peer grading systems. A peer grading system is an online tool that collects student submissions, assigns review tasks to students and graders, and aggregates the resulting reviews to produce assessments of both the submissions and the peer reviews themselves. Peer grading systems can improve learning outcomes and reduce the time and effort needed to give students high-quality feedback. Students in this seminar will read and present research papers on topics that include peer prediction, rubric design, scoring rules, auction design, human computation, machine learning, and the measurement of learning outcomes. These papers draw from the fields of algorithms, game theory, machine learning, human-computer interaction, and learning science. Students will complete a research project that is either a theoretical or an empirical study related to peer grading; empirical studies can be based on data collected in a peer grading system that is being developed and used in Northwestern CS classes.
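To make the aggregation step concrete, below is a minimal sketch (in Python) of how a system might combine several noisy peer scores for one submission into a single grade via a reliability-weighted mean. All names, scores, and weights here are hypothetical illustrations; the systems and models covered in the readings (e.g., Weeks 6 and 8) use considerably richer statistical machinery.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative sketch only: each review is (reviewer_id, submission_id, score),
# and each reviewer has a weight (e.g., an estimated reliability). Real peer
# grading systems estimate these weights from data rather than fixing them.

def aggregate_grades(
    reviews: List[Tuple[str, str, float]],
    reviewer_weight: Dict[str, float],
) -> Dict[str, float]:
    """Return a weighted-mean grade for every submission."""
    totals: Dict[str, float] = defaultdict(float)
    weights: Dict[str, float] = defaultdict(float)
    for reviewer, submission, score in reviews:
        w = reviewer_weight.get(reviewer, 1.0)  # default: trust all reviewers equally
        totals[submission] += w * score
        weights[submission] += w
    return {s: totals[s] / weights[s] for s in totals if weights[s] > 0}

# Hypothetical example: peers grade two submissions on a 0-10 scale.
if __name__ == "__main__":
    reviews = [
        ("alice", "hw1_bob", 8.0),
        ("carol", "hw1_bob", 6.0),
        ("dave",  "hw1_bob", 7.0),
        ("alice", "hw1_carol", 9.0),
        ("bob",   "hw1_carol", 10.0),
    ]
    weights = {"alice": 2.0, "bob": 1.0, "carol": 1.0, "dave": 1.0}
    print(aggregate_grades(reviews, weights))
```

In this toy example Alice's scores count twice as much as the other reviewers'; choosing and justifying such weights (or replacing the weighted mean with a probabilistic model) is the simplest version of the grade-aggregation questions studied later in the course.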
Prerequisites: This interdisciplinary graduate seminar is intended for Ph.D. students with background in an area relevant to peer grading: algorithms, game theory, machine learning, learning science, or human-computer interaction. Advanced undergraduates and master's students should consult the instructor before enrolling.
Schedule:
Week 0 (Sept. 19): Introductory lecture on peer grading [slides]
- (no readings)
Week 1 (Sept. 26): Peer grading systems:
- de Alfaro, L., & Shavlovsky, M. (2013). CrowdGrader: Crowdsourcing the evaluation of homework assignments. arXiv preprint arXiv:1308.5273.
- Wright, J. R., Thornton, C., & Leyton-Brown, K. (2015, February). Mechanical TA: Partially automated high-stakes peer grading. In Proceedings of the 46th ACM Technical Symposium on Computer Science Education (pp. 96-101). ACM.
- Reily, K., Finnerty, P. L., & Terveen, L. (2009, May). Two peers are better than one: aggregating peer reviews for computing assignments is surprisingly accurate. In Proceedings of the ACM 2009 international conference on Supporting group work (pp. 115-124). ACM.
Week 2 (Oct. 3): Peer prediction:
- Miller, N., Resnick, P., & Zeckhauser, R. (2005). Eliciting informative feedback: The peer-prediction method. Management Science, 51(9), 1359-1373.
- Shnayder, V., Agarwal, A., Frongillo, R., & Parkes, D. C. (2016, July). Informed truthfulness in multi-task peer prediction. In Proceedings of the 2016 ACM Conference on Economics and Computation (pp. 179-196). ACM.
- Gao, A., Wright, J. R., & Leyton-Brown, K. (2016). Incentivizing evaluation via limited access to ground truth: Peer-prediction makes things worse. arXiv preprint arXiv:1606.07042.
- [Supplemental] Dasgupta, A., & Ghosh, A. (2013, May). Crowdsourced judgement elicitation with endogenous proficiency. In Proceedings of the 22nd international conference on World Wide Web (pp. 319-330). ACM.
Week 3 (Oct. 10): Eliciting peer feedback:
- Shah, N. B., Bradley, J. K., Parekh, A., Wainwright, M., & Ramchandran, K. (2013, December). A case for ordinal peer-evaluation in MOOCs. In NIPS Workshop on Data Driven Education.
- Hicks, C. M., Pandey, V., Fraser, C. A., & Klemmer, S. (2016, May). Framing feedback: Choosing review environment features that support high quality peer assessment. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 458-469). ACM.
- Lin, S. S., Liu, E. Z. F., & Yuan, S. M. (2001). Web-based peer assessment: feedback for students with various thinking-styles. Journal of Computer Assisted Learning, 17(4), 420-432.
Week 4 (Oct. 17): Incentivizing effort and accuracy:
- Lambert, N. S. (2011). Elicitation and evaluation of statistical forecasts. Preprint.
- Chawla, S., Hartline, J. D., & Sivan, B. (2015). Optimal crowdsourcing contests. Games and Economic Behavior.
- Osband, K. (1989). Optimal forecasting incentives. Journal of Political Economy, 97(5), 1091-1112.
Week 5 (Oct. 24): Assigning Reviews:
- Kulkarni, C. E., Socher, R., Bernstein, M. S., & Klemmer, S. R. (2014, March). Scaling short-answer grading by combining peer assessment with algorithmic scoring. In Proceedings of the First ACM Conference on Learning @ Scale (pp. 99-108). ACM.
- Sheng, V. S., Provost, F., & Ipeirotis, P. G. (2008, August). Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 614-622). ACM.
- Karger, D. R., Oh, S., & Shah, D. (2014). Budget-optimal task allocation for reliable crowdsourcing systems. Operations Research, 62(1), 1-24.
Week 6 (Oct. 31): Cardinal grade aggregation:
- Dawid, A. P., & Skene, A. M. (1979). Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 28(1), 20-28.
- Hamer, J., Ma, K. T., & Kwong, H. H. (2005, January). A method of automatic grade calibration in peer assessment. In Proceedings of the 7th Australasian conference on Computing education - Volume 42 (pp. 67-72). Australian Computer Society, Inc.
- Zhang, Y., Chen, X., Zhou, D., & Jordan, M. I. (2016). Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. The Journal of Machine Learning Research, 17(1), 3537-3580.
Week 7 (Nov. 7): Accuracy of peer reviews:
- Falchikov, N., & Goldfinch, J. (2000). Student peer assessment in higher education: A meta-analysis comparing peer and teacher marks. Review of Educational Research, 70(3), 287-322.
- Cho, K., Schunn, C. D., & Wilson, R. W. (2006). Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. Journal of Educational Psychology, 98(4), 891.
- Sajjadi, M. S., Alamgir, M., & von Luxburg, U. (2016, April). Peer grading in a course on algorithms and data structures: Machine learning algorithms do not improve over simple baselines. In Proceedings of the Third (2016) ACM Conference on Learning @ Scale (pp. 369-378). ACM.
Week 8 (Nov. 14): Ordinal grade aggregation:
- Frankel, A. (2014). Aligned delegation. The American Economic Review, 104(1), 66-83.
- Raman, K., & Joachims, T. (2014, August). Methods for ordinal peer grading. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1037-1046). ACM.
- Waters, A. E., Tinapple, D., & Baraniuk, R. G. (2015, March). BayesRank: A Bayesian approach to ranked peer grading. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale (pp. 177-183). ACM.
Week 9 (Nov. 21): Evaluating learning outcomes:
- Cho, K., & Schunn, C. D. (2007). Scaffolded writing and rewriting in the discipline: A web-based reciprocal peer review system. Computers & Education, 48(3), 409-426.
- Sadler, P. M., & Good, E. (2006). The impact of self- and peer-grading on student learning. Educational Assessment, 11(1), 1-31.
- Gielen, S., Peeters, E., Dochy, F., Onghena, P., & Struyven, K. (2010). Improving the effectiveness of peer feedback for learning. Learning and Instruction, 20(4), 304-315.
Week 10 (Nov. 28): Student presentations
- (no readings)