Annual Report

Essay:
Driven by Data

Data play an increasingly important role in our lives these days. Partly the cause is vastly improved technology: We have new tools to collect massive amounts of data, ever-expanding storage capability, and much faster computers (along with sophisticated algorithms) for analysis. But partly the cause is cultural: Our society has chosen to emphasize data (and its fruits, usually some derived statistics) as evidence that is superior to ‘mere’ observation or judgment. This bias is apparent in the language we use to describe good decision making: it should be data-driven, involve metrics, and focus on ‘measurable outcomes.’ This last phrase was even codified into law by the Government Performance and Results Act of 1993 (GPRA).

But while society is fascinated by the power of data mining, some people worry about its unanticipated consequences. In a famous 1976 paper, the social scientist Donald Campbell made the observation that the very act of collecting data about human activities may change the data itself: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”¹ A prominent recent example comes from data on college acceptance rates, which U.S. News & World Report uses in ranking colleges. While a lower acceptance rate usually indicates a more prestigious institution, colleges can game the system by enticing large numbers of unqualified students to apply, which then artificially lowers their acceptance rate. Other examples of corruption abound, from crime statistics to bestseller lists and television sweeps. When we collect data on human activities, we frequently change the way humans behave.

Around the same time, the Nobel Prize-winning economist Herbert A. Simon noted that data and quantitative measures often do “not even remotely describe the processes that human beings use for making decisions in complex situations.”² Simon’s initial examples concerned public policy, where data and statistics alone cannot answer basic questions (for example, whether it is better to invest in new facilities or in additional staff for public parks). There are many examples of complex social policy decisions that cannot be settled by data, no matter how massive.

These two concerns, the corruption of quantitative measures and the inability of even massive amounts of data to capture social complexity, are particularly worrisome in our current obsession with data-driven education. We capture large amounts of data about standardized test scores for many students over many years, and we calculate various statistics based on that data. We use those statistics to measure the quality of teachers, the quality of schools, and the efficacy of education policies. But these measures are easily corrupted when educators focus only on the tests, or even when they engage in outright cheating. And any measure based on test scores alone captures only a small part of the actual goal of education. Education has suffered because of this. The lesson is not that tests are bad or useless, but rather that education is (and should be) more complex than test score data.

These concerns predate the recent explosion of ‘big data,’ but they are amplified by that explosion. The sophisticated mathematics we use to analyze vast collections of data sometimes disguises weak data. The elaborate algorithms we use to derive statistics are sometimes based on faulty assumptions or hidden biases. When policy makers rely uncritically on data but do not understand the mathematics used to analyze it, they can make decisions that appear to be objective and scientific without actually being so. While this is a particular problem in social science, it affects science itself when, for example, the practice of medicine is governed by doctor ‘report cards’ or the value of research is measured by mysterious numbers derived from citation counts.

Data are indispensable in both science and social science, but no matter how alluring, numbers are not a priori superior to other forms of evidence. Knowledge, understanding, and wisdom are much more than data.

John Ewing
President, Math for America


¹Donald T. Campbell, 1975. Assessing the impact of planned social change. In Social Research and Public Policies (G. M. Lyons, Ed.). Hanover, NH: Dartmouth College, Public Affairs Center.


²Herbert A. Simon, 1978. Rational decision-making in business organizations. Nobel Memorial Lecture, December 8. Pittsburgh, PA: Carnegie Mellon University.
