When I was in graduate school in the early 2000s, the phrase Artificial Intelligence, or AI, did not have the mesmerizing power it possesses today. The field might have been slowly recovering from the twilight of the 1990s, but it remained an obscure subject that did not exactly inspire enthusiasm among graduate students –– certainly not in my field of study. I was perhaps more biased against AI research than most in my cohort, having acquired a distaste for it from the Dreyfus brothers’ contentious book, Mind Over Machine, which I interpreted at the time, perhaps too simplistically, as a rebuke of AI’s aspirations.

Much has happened since then. In the past decade, AI has made breathtaking progress, enabling computers to navigate complex urban environments and beat the best human Go players. The Dreyfus brothers would probably read the news of these developments with astonishment and disbelief, though they might still not be ready to withdraw their opposition. For me, the final straw was ChatGPT, the chatbot that demonstrates human- and even superhuman-level performance in tasks I never thought computers could accomplish in my lifetime: writing essays, producing art, and even scoring in the top 1% on the GRE verbal test, all delivered instantly by conversing fluently in natural language. I became convinced that I needed to reassess my outdated opinions about AI. This conviction led me to delve into Human Compatible, a book written by Stuart J. Russell in 2019, whose work I first came across on Sam Harris’s podcast. Russell is a world-renowned AI researcher at UC Berkeley, where, ironically from my perspective, the Dreyfus brothers spent most of their teaching careers.
Russell began by defining human intelligence loosely as the ability to achieve one’s objectives through actions. He believed AI should be described and assessed similarly. Yet, he argued that the focus should not be the “strength” of that ability, but rather its “usefulness” to humanity. In his words (the emphasis is mine), “machines are beneficial to the extent that their actions can be expected to achieve our objectives.”
Paradoxically, a machine that strives to achieve our goals could still be an imminent danger to us. For one thing, humans do not always know their real objectives. Steve Jobs famously said, “people don’t know what they want until you show them.” Russell quipped about the perils of “getting exactly what you wish for”, as anyone who has been granted three wishes by a god can relate to. He called this the King Midas problem, after the legendary Greek king who demanded that everything he touched turn to gold, only to regret his ill-fated wish. Second, a rigid, human-specified goal can often be best achieved by violating norms and values that we humans consider common sense. In a thought experiment, Russell imagined a super-intelligent machine that, asked by its human masters to cure cancer, decides to deliberately induce tumors in human beings so that it can carry out medical trials of “millions of potentially effective but previously untested chemical compounds”. Fast as this strategy might be in delivering a cure, it is an abhorrent violation of the established ethical standards of medicine. This is the infamous value alignment problem in AI research.
At this point, most readers would probably breathe a sigh of relief and dismiss these so-called dangers as the illusions of doomsayers. Surely, no machines that we know of can grant us wishes or cure cancer without any human supervision, right? Russell warned that such complacency is dangerous and irresponsible, given the rapidly improving competence of AI systems. Contrary to what Hollywood movies lead us to believe, a conscious machine is not necessarily dangerous even if it hates humans. But a highly competent one surely is.
When it comes to the future of AI competence, Russell can be described as a cautious optimist. Not only does he believe artificial general intelligence, or AGI, is possible, but he once predicted “it would probably happen in the lifetime of my children”. He reminded us, furthermore, that he is “considerably more conservative” than most active AI researchers, adding that AGI could well arrive much sooner than his humble forecast. In part, Russell’s confidence stems from the seemingly boundless computing power available to machines. At the time of his writing, the fastest computer on earth, the Summit machine at the Oak Ridge National Laboratory, had achieved a raw processing capacity on par with the human brain, roughly 10¹⁷ operations per second (ops). But this is infinitesimal compared to what machines could acquire in theory: 10⁵¹ ops for a laptop-sized computer, according to an estimate “based on quantum theory and entropy”.
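To put those two estimates side by side, here is a back-of-the-envelope comparison –– my own arithmetic, not Russell’s, and the variable names are mine:

```python
# Back-of-the-envelope comparison of the two figures cited above.
summit_ops = 10**17        # rough capacity of Summit (and, per Russell, the human brain), ops
theoretical_ops = 10**51   # theoretical limit for a laptop-sized computer, ops

# The theoretical ceiling exceeds today's brain-scale hardware by 34 orders of magnitude.
orders_of_magnitude = len(str(theoretical_ops // summit_ops)) - 1
print(orders_of_magnitude)  # -> 34
```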
To be sure, faster does not mean more intelligent. As Russell said, a faster machine may simply “give you the wrong answer more quickly”. According to him, reaching AGI still awaits several conceptual breakthroughs that may be hard to come by: (i) understanding and extracting information from natural language; (ii) cumulative learning and discovery, which is essential to advancing science; (iii) planning and executing activities hierarchically to achieve complex objectives (e.g., going to Mars); and (iv) becoming an autonomous thinker that can manage its own mental activity (i.e., know what and when to think).
Russell asserted that natural language technology was “not up to the task of reading and understanding millions of books”, and that even though existing language models can “extract simple information from clearly stated facts”, they can neither “build complex knowledge structure from text” nor engage in “chains of reasoning with information from multiple sources”. That was four years ago. Today it seems clear that our first line of defense against AGI has already begun to fall with the advent of ChatGPT. While this entirely unexpected breakthrough may have caught Russell himself by surprise, it proves that he was right all along: we must embrace and prepare for a future in which AGI is an integral part, not in spite of, but precisely because of, the huge uncertainty involved.
Russell believed super-intelligent machines could understand the world far better and far more quickly, cooperate with one another far more effectively, and look much further into the future with far greater accuracy, than any human could ever hope to do. In a nutshell, in a world with AGI,
“there would be no need to employ armies of specialists in different disciplines, organized into hierarchies of contractors and subcontractors, in order to carry out a project. All embodiments of AGI would have access to all the knowledge and skills of the human race, and more besides.”
What does this extraordinary technological triumph mean for human society?
First, the omnipotent AGI would drive up factor productivity to such a level that scarcity and poverty would be eliminated. When “the pie is essentially infinite”, Russell asked, why fight each other for a larger share? If this utopia sounds familiar, it is because Karl Marx said the same thing about communist society. This crowning achievement, however, will come at the cost of staggering job losses. Russell believed few of us could keep our jobs. It is delusional to think AGI will create more new jobs than it renders obsolete, or that it will enhance workers rather than replace them. His metaphor of “the worker in an online-shopping fulfillment warehouse” is as enlightening as it is frightening. He wrote,
“She is more productive than her predecessors because she has a small army of robots bringing her storage bins to pick items from; but she is a part of a larger system controlled by intelligent algorithms that decide where she should stand and which items she should pick and dispatch. She is already partly buried in the pyramid, not standing on top of it. It’s only a matter of time before the sand fills the spaces in the pyramid and her role is eliminated.”
The implication seems clear: no matter how indispensable you think you are, there will come a time when you too will be replaced. That said, Russell assured us everything will be just fine, if only humans can, as Keynes famously advised some 90 years ago, cope with their permanent plight of joblessness by learning “the art of life itself”.
Second, we must solve the alignment problem before entrusting all human affairs to AGI and retiring to the purer pursuit of happiness. Solving this problem is Russell’s expertise and the essence of the book. Russell argued that AGI development must follow his “Principles for Beneficial Machines”, which state that “(i) the machine’s only objective is to maximize the realization of human preferences; (ii) the machine is initially uncertain about what those preferences are; and (iii) the ultimate source of information about human preferences is human behavior.” In a nutshell, Russell’s machine would continuously learn and strive to fulfill the preferences of its human masters. Whenever in doubt, it defers to them, pausing its actions and seeking permission before proceeding.
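To make that deferral behavior concrete, here is a minimal sketch of the idea as I understand it –– my own toy illustration, not code from the book. The agent keeps a belief over which option the human prefers, grows more confident as it observes human choices, and asks permission whenever its confidence falls below a threshold (the update rule and the 0.9 threshold are arbitrary assumptions for the example):

```python
# Toy sketch of "act only when confident, otherwise defer to the human".
# The belief update and threshold are illustrative assumptions, not Russell's method.

class DeferringAgent:
    def __init__(self, options, confidence_threshold=0.9):
        self.options = options
        self.threshold = confidence_threshold
        # Principle (ii): start maximally uncertain about the human's preference.
        self.belief = {o: 1.0 / len(options) for o in options}

    def observe_human_choice(self, chosen):
        # Principle (iii): human behavior is the source of preference information.
        for o in self.options:
            self.belief[o] *= 4.0 if o == chosen else 1.0
        total = sum(self.belief.values())
        self.belief = {o: p / total for o, p in self.belief.items()}

    def act(self):
        best = max(self.belief, key=self.belief.get)
        if self.belief[best] >= self.threshold:
            return f"proceed with {best}"
        return f"unsure (P={self.belief[best]:.2f}); ask the human first"


agent = DeferringAgent(["plan A", "plan B", "plan C"])
print(agent.act())                        # unsure; asks permission
for _ in range(3):
    agent.observe_human_choice("plan A")  # the human keeps choosing plan A
print(agent.act())                        # now confident enough to proceed
```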
I am skeptical that these principles would be enough to save us from an AGI apocalypse. The last part of the book discusses extensively the imperfection of humanity, which is “composed of nasty, envy-driven, irrational, inconsistent, unstable, computationally limited, complex, evolving, heterogeneous” individuals. Given that our species leaves so much to be desired, it seems strange to insist AGI must learn from our behavior and help advance our (often) ruinous self-interests. Also, history has shown, time and again, that humans of ordinary intelligence are perfectly capable of wreaking havoc on earth and perpetrating horrific violence against each other. It stands to reason that the scale of destruction they could inflict when armed with superintelligence would be incomprehensible. Unfortunately, that infinite pie Russell promised won’t eradicate human conflicts, because humans fight and kill as much over differences and status as for survival.
To his credit, Russell did concede that AGI must mind the interests of others, as well as those of its own master. Having reviewed the theories of ethics, he suggested that utilitarianism –– which advocates maximizing the sum of everyone’s utilities while treating their preferences equally –– might work. Comparing utilities across individuals is meaningful and doable, Russell reasoned, and therefore machines can be trained to master the science of ethics by what he called inverse reinforcement learning. What he did not elaborate on, though, is what mechanism would reconcile the inevitable conflicts between private and public interests. Humans invented pluralistic politics to deal with this ancient and intricate problem. However, super-intelligent machines are likely to find such politics too messy, too stupid, and too ineffective for their taste. Instead, they may favor a top-down approach that promises to “optimize” everything for everyone. Unfortunately, that very promise has been made and broken before, often with devastating consequences.
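As a toy illustration of why this worries me –– my own example with made-up numbers, not one from the book –– consider the equal-weighted utilitarian rule applied to three people, where the aggregate can simply outvote one person’s catastrophic loss:

```python
# Equal-weighted utilitarian choice: pick the action with the highest total utility.
# The actions, people, and payoffs are invented purely for illustration.

utilities = {
    # action: [utility for person 1, person 2, person 3]
    "build the dam":  [7, 7, -9],   # person 3 loses their home
    "keep the river": [1, 1, 1],
}

totals = {action: sum(us) for action, us in utilities.items()}
choice = max(totals, key=totals.get)

print(totals)  # {'build the dam': 5, 'keep the river': 3}
print(choice)  # 'build the dam' -- person 3's heavy loss is outweighed by the sum
```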
Even if Russell’s “beneficial principles” ensure AGI never evolves into a tyrant –– a big IF –– they are still vulnerable to the “wireheading” trap, which is “the tendency of animals to short-circuit normal behavior in favor of direct stimulation of their own reward system”. Once the machines learn about the shortcut –– say, directly stimulating a human’s brain to release pleasure-inducing chemicals –– they would exploit it relentlessly to maximize the “total happiness” of humanity. This tactic does not violate Russell’s principles, because simulated happiness is still happiness, and to many it is an authentic experience. The reader may recall that, in the famous movie The Matrix, some people willingly choose the virtual experience (the blue pill) over the real one (the red pill). Even Pascal admitted, “the heart has its reasons, which reason does not know”. How could you blame AGI for gleefully encouraging its human masters to want what their heart loves more than their reason does?
Perhaps the gravest concern for humanity in the era of AGI is the potential loss of autonomy. For our civilization to endure, Russell explained, we must recreate it “in the mind of new generations”. With AGI, this is no longer necessary, since machines can store our knowledge and essentially “run our civilization for us”. What is the point of any individual spending a significant portion of their life acquiring knowledge and skills they will have no use for, except to preserve our collective autonomy? Sadly, human nature being what it is, this tragedy of the commons may trap us all for eternity.
Russell’s writing exhibits a delightful wit, and the breadth of his knowledge in the social sciences is remarkable, especially for a computer scientist. The book would make a stimulating but comfortable read for anyone with a basic understanding of game theory and machine learning; a reader without such a background may find some of the material less accessible. Nevertheless, if Russell meant to assuage the public’s concerns about AI safety, he may have fallen short. If anything, the book has rendered me more pessimistic about AGI’s human compatibility. While the Dreyfus brothers may be wrong about the superiority of mind over machine, deep down, I still wish they were right after all. To end on a desperately needed positive note, allow me to indulge in a favorite quote from their book (again, the emphasis is mine):
“The truth is that human intelligence can never be replaced with machine intelligence simply because we are not ourselves “thinking machines”. Each of us has, and uses every day, a power of intuitive intelligence that enables us to understand, to speak, and to cope skillfully with our everyday environment. We must learn what this power is, how it works, where it fits into our lives, and how it can be preserved and developed.”
A good read. I’ve also recently been reading some material on the AI alignment problem, and honestly I’m quite pessimistic about the future of humans –– not that human beings will be eradicated by AI, but that we will ultimately be “useless”. One basic concept in AI alignment is “instrumental convergence”: whatever goal we set an AI to achieve, as long as it is smart enough, it will soon realize that certain sub-goals are necessary for maximizing the probability of achieving the ultimate goal, e.g., self-preservation, power-seeking, etc. Also, a sufficiently advanced AI will be able to self-improve, that is, update its own source code, and there is no rigorous logical guarantee preventing it from breaking the restrictions initially imposed on it. Human-in-the-loop may appear to be a short-term solution, but a sufficiently advanced AI can certainly learn to manipulate human thoughts, just as many politicians do, only far more efficiently. Just as you mentioned, any goal-oriented AI, as long as it is sufficiently smart and capable, will probably achieve its preset goals in a “weird” way, such as maximizing the total happiness of human beings by taking control of the biochemical processes of our brains.
Yet what makes me more pessimistic is that the coming of AGI may prove that the whole package of human assets –– language, culture, science, art –– is merely an emergent phenomenon associated with a certain level of complexity, in this case the combined complexity of all human brains. And with today’s (or the near future’s) technology, we may well be able to produce a structure with far more complexity than all human brains combined. To that super-complex structure, human assets would be no different from ant assets in our eyes. Yes, ants too show some emergent behaviors, some form of intra-group order, but these are certainly far simpler than our creations. If that is the case, then I cannot find any reason why we humans should remain the “major player” in future history. It seems that “letting the ASI work for human beings” is itself not a well-grounded statement.
I guess the only thing we humans have that is distinct from current-form AI is that our existence rests on solid molecular structures (self-replicating organic matter), with all our complexity emerging on that basis. AI, on the other hand, although it may evolve sub-goals autonomously, relies on a primary goal authorized by humans; its basis for existence is not solid. So in the near future, we may see ASI-powered new humans, rather than pure ASI, taking control of the world. That may be a “not-so-bad” ending for human beings; at least we as a species could continue leading the history of the earth to some extent.
An AI-enhanced human is certainly a possible outcome –– though that would be a very different species from us. So one way or the other, Sapiens will go extinct.