I should have read this when it came out. So much has happened since the book was published – it hasn’t rendered the book irrelevant so much as not detailed enough, not broad enough, and perhaps not nuanced enough. Also, I need to tell you that I finished this book on a plane 9 days ago and then travelled a bunch, so my memory of the book has already faded perhaps a little too much to be fair to it.
This is a fairly brief summary of some of the sociopolitical and economic issues with “Big Data” algorithms in the United States, circa 2015. It is one-sided, as you might expect from its title, but it’s passionate and the author is an actual expert, a mathematician, which is not always true of the authors of these types of critiques.
O’Neil’s criticisms are deserved and are usually based in either sound statistics or in legitimate concerns of fairness and justice. The latter critiques will likely appeal to fewer people, as O’Neil is unabashedly concerned with how these algorithms treat certain people (though she does mention at least one that actually treats the rich poorly as well). I happen to think these concerns are legitimate but, given that she rarely bothers to defend her ideas of fairness and justice, I can see how it could be a problem for others.
O’Neil musters some of the usual critiques about these algorithms that we’ve heard time and again – likely before she wrote the book. The most common one is about how these programs are often black boxes that nobody actually understands (or at least very few people do). Another good one is how, by using past data, the programs essentially give up on the idea that humans have free will and can change their behaviour, using your past history to predict that you will always behave the same way. Both are good criticisms and she does a good job of summarizing them, but I didn’t learn anything here.
For me, the more important point is how bad the algorithms usually are in terms of statistical best practices. Often due to sampling issues, the algorithms aren’t even good at what they do. And though I have heard such a criticism before, I think the real strength of this book is how much O’Neil hammers home this crucial point. If these programs aren’t even accomplishing our goals, why do we even use them? (It does seem like, more often than not, the goal of the program is to justify the existence of the program, not to actually make sound predictions or give sound appraisals.)
But I found the whole thing entirely too brief. And, though it is not O’Neil’s fault that this book is now 7 years old (arguably, it’s to her credit she published when she did), I found it not as useful for thinking about the current issues of machine learning as I had hoped. (This is mostly because I was already familiar with most of the arguments against using algorithms to assess people.)
I also found a few of her fairness concerns a little trumped up, though unfortunately I didn’t write down which ones and so I cannot now remember them. I found the tone a little too polemical at times. Maybe the issue justifies it, but I also think you need to write something like this in part for the people who are making these programs, selling them, or buying them, and not just for the average reader, who might already be convinced “Big Data is bad” but isn’t in any kind of position to do anything about it.
It’s very readable and accessible and I think she mostly makes her case. I wish I had read it in 2016 or 2017.