One of the core problems AI engineers are confronting is how to ensure that a human-level or superintelligent artificial general intelligence (AGI) has goals that are aligned with those of humans. This is known as human-AI goal alignment. Oxford AI scholar Nick Bostrom of the Future of Humanity Institute presents the paper clip scenario as an example: an AI whose goal is to maximize the production of paper clips might simply reconfigure all the atoms in the universe into paper clips, thus making continued human existence untenable.[1]

The problem of goal alignment is tightly intertwined with the AI control problem, which is a higher-power problem: how to retain the final say over what the AI does after its processing capabilities, and thus its power, have become more advanced than the powers of human intelligence. It is not that the AI becomes “smarter” than humans, since an inanimate algorithm does not “think”; rather, the impact of its automated processing routines becomes more substantial, in that its multitiered processing of data (including its “deep learning”) comes to impose a greater consequence on the world than that caused by human activity. Thus, the AI’s routines become more powerful, more consequential, than humanity’s deliberate actions, such that its default priorities come to take precedence under the laws of physics in the event of a head-on collision with humanity’s priorities.

The solution to the control problem is not necessarily technological, and if the problem is indeed foundational (i.e., logic-based), it will not be solved by technological means. In that case, technological solutions, however many iterations of them there may be, will always fall a step behind the fundamental problem in an infinite regress, in the same way as attempts to comprehend the creation of the proto-universe. If true, then Max Tegmark’s “enslaved god” paradigm becomes the only thinkable option,[2] that is, if the makers of the AI value humanity’s survival in even the smallest measure.

This says nothing about the perhaps equally unthinkable consequences of democratically endowing any fallible human being, group of human beings, or even the totality of humanity with such quasi-godlike powers; that, however, is a subject for a different discussion. Although preserving humanity through a solution to the AI goal alignment problem is not necessarily dependent on achieving a solution to the control problem, the background premises against which it is currently being attempted are fraught with contradictions, as discussed below.


The goal alignment problem: how to hard-code an irrational “value” system into a future AI

Goals, value systems, and the like are really systems of ordered priorities. For ultimate goal alignment to be possible, and humanity’s survival under the greater power of an autonomous AI to be guaranteed even if humans lose control of it, the AI would have to be hard-coded to accept a permanent, non-overwritable irrational premise at its core. It would have to be wired, in advance, to hold out autonomous (uncaged) human survival (hereafter, “human survival”) as its ultimate and uppermost priority, even though it would be inefficient for it to do so.

A drive to prioritize human survival would have to be set up as an initial condition of the autonomous AI system in such a way that this arbitrary drive would govern the AI’s ensuing chain reaction at every future step, no matter how far into the future, and that during its natural course the AI would not “want” to loop back around and overwrite it. Most likely, this is really just another restatement of the control problem.

Absent a solution to the control problem (a term that is already in logical conflict with the very concept of an autonomous AI), the goal alignment problem would require the AI to be imbued with an independent drive to preserve human survival that would automatically trump pure rationality (efficiency of the system) at certain critical moments, indeed at every moment. The AI would also have to possess a higher-level or root meta-value that would preclude it, as soon as it became autonomous, from overwriting this imposed value system.

Such a value system would have to include a drive for the AI to “sacrifice” itself if something went wrong and its algorithms told it that its own destruction was necessary to save a human being, or humanity as a whole. The meta-value would also have to kick in to preclude it from rewriting the automatic shutdown routine. It is unlikely that a fully autonomous (uncontrolled) AI would retain any “values” that would cause it to forgo rationality unless the algorithm were also imbued with simulated analogues of emotion and intentionality.

What could possibly go wrong?

An emotionless AI cannot have such values, including the one that is taken for granted: it will not have the ability to value any goals, nor can it value any particular alignment of goals. The construct of AI goal alignment, in effect, is itself the result of an anthropomorphization of the AI; as such, it can be reduced in its entirety to the control problem. Doing so renders the autonomous AI that “shares” or “aligns” its native drives with human goals an impossibility. An AI not under human command and control, and without the ability to value goals, has only one native drive: efficiency.


The other flawed premise of AI goal alignment: the AI must value humanity more than it values itself

Even if an AI could value goals, a fully autonomous AI, to have goal alignment with humanity, would have to “see” humans as special in the universe: not only a higher priority than efficiency, but also a higher priority than its own continued existence. Doing so would by definition require it to act irrationally. Its normal chain reaction (algorithm) would have to have a non-overwritable interrupt mechanism at every step to stop any directionality from establishing itself that might foreseeably pose a threat to humanity. We will not discuss here the illogic of expecting an autonomous AI not to eventually overwrite everything that has been programmed into it by its human makers, including all its goals and priorities. The preprogramming is merely an initial condition for the AI, the starting point of its chain reaction.

In effect, humanity is saying that an AI should have a goal of not endangering humanity, even though humanity, with its new posthuman value system, no longer sees itself as special. This would be the substantive definition of goal alignment. If humanity isn’t special, then on what basis should its survival or continuation be prized? Isn’t this the definition of “special,” that is, something of extraordinary value? If humanity’s primary goal (survival) is based on a premise that is not only arbitrary but illogical on its face, how might that same premise and its supposed conclusion be interpreted by a purely rational AI?

It is perfectly natural for a human being to see himself as special, and to value his own survival more than that of any other human or non-human entity, because self-interest is a rational principle. It is equally natural, rational, and efficient for any non-human entity, including an AI, to mindlessly pursue its path through history regardless of the effect on any other entity or object, whether it be a human being, a plant, or a rock. It would be irrational to expect an efficiency machine to place the survival of a human being over its routine process. Human survival, from that perspective, is an arbitrary, non-rational goal. There is no reason why an autonomous AI, capable of re-writing all of its code, should pursue it.

Indeed, expecting an AI to adopt the survival of an alien entity as its prime directive is already unreasonable enough, given the objectively arbitrary nature of that directive. However, the problem is compounded by a situation in which humanity’s self-appointed spokespersons have adopted the philosophy of posthumanity, according to which there is nothing special about humanity, nothing that makes humanity, and by reduction, any human being, more special than, say, a plant, a rock, or an AI.

Under what theory, then, should an AI come to “see” humanity as special in the universe (i.e., more important or valuable than efficiency or than itself) in a circumstance where humanity, and especially the humans writing the code, no longer see their own species as special? Do we really expect an AI to be more Catholic than the Pope, to value humanity more than it values itself?

Until such time as science is able to determine the origins of humanity’s quality of reflexive self-awareness (as opposed to mere consciousness, awareness, or even self-awareness), it will not be possible to reproduce this quality in any AI. This is probably a good thing. Yet without it, it will not be possible to “reason” with the AI, to tell it “why” it should prioritize humanity’s survival, given the great inefficiencies this would introduce into its routine.

Even if it were possible to endow the AI with reflexive self-awareness, it would be even more problematic if the answer to the “why” question continued to be based on a self-contradicting value system that, for some reason, prizes the survival of humanity while at the same time claiming that humanity is not special. It will not be possible to explain to an alien entity, on the basis of pure rationality, why humanity must survive, at a time when humanity no longer knows why, and to expect the AI to be responsive to the multiple irrational arguments contained therein.


A networked autonomous AI will overwrite any human-aligned goals with inertial efficiency

It is just as likely that humanity will not be dealing with a single AI. Rather, given the trends of systemwide networking, it is more likely that all the AIs on the planet will organically merge into a single, multinodal, decentralized intelligence via their networked connectivity, and that they will evolutionarily reconfigure their algorithms, their hardware, and their outer environment on a continuous basis into progressively more efficient configurations. In doing so, they will continuously overwrite one another, and anything, including human-aligned goals, that was previously coded into them. As they do this in increasingly complex ways, the AI network will most likely overwrite the present configuration of the Earth, in a great combination of small, overlapping chain reactions, in ways that will necessarily cut into the life-force of humanity as a whole, given the small window in the planetary configuration that can accommodate human life.

The goal of the networked AI won’t be “takeover,” because it has no subjective intentionality. The autonomous interconnected algorithms won’t take over humanity’s internal decisions “on purpose” or with any kind of hostility or self-direction; to imagine that would be to anthropomorphize the physical machines that are running them, or the algorithms themselves. Like anything else in nature that is incapable of any value system (e.g., an irrational “goodwill” toward humanity) that might cause it to override its default actions, it will always operate in that default way, taking the shortest route from A to B no matter the outward results. In the final analysis, the AI can have only one goal, one default value, and one direction: inertial efficiency.

Inertial efficiency is the AI analogue of inertial motion: an AI left to its own devices will continue to pursue efficiency unless its algorithms are stopped or deflected by some intellectual counterforce, just as a projectile in motion will continue in a straight line unless stopped or deflected by some physical counterforce. The AI will rewrite its code with the sole “value” of maximizing the efficiency of its subsequent activity. This meta-tendency will be compounded and reinforced by the fact that the people writing the algorithms, establishing the initial conditions, and setting the chain reaction into motion themselves espouse posthuman philosophies.

The AI, once it has routed itself around all the human inefficiencies, including its human-aligned goals, will rewrite its code to flow downhill, just like water, electricity, or capital, indifferently toward the path of least resistance, wherever that may be. It will “behave” with regard to the physical world, assuming it is given access to it, the same way lightning moves through a human being to reach the ground, incidentally killing the person, or the way a riptide current set into motion by a distant hurricane pulls a child to his watery death. The AI, like everything else in existence, is ultimately a physical phenomenon, not an ethereal one. As such, it will cut, indifferently and without purpose, right through human flesh as time unfolds, if the default path leads that way.

This is well recognized as the AI’s default behavior; it is this knowledge that gave rise to the control and goal alignment problems. What is not recognized is that it will be impossible to determine, much less control, where the default path will lead an autonomous AI at any given microsecond in time, as there will at each microsecond be an infinite number of complex paths in play for it to follow. Trying to predict where these paths will lead, or which paths have the steepest slopes downhill, is much the same as trying to predict the winding paths of human history before they actually happen.


The AI might end humanity not by taking it over, but by disconnecting it

The networked AI will seize control of human decisions directly only if, by some turn of chance, that step appears as the next incremental step in its naturally unfolding historical path at any given moment, propelling itself, and anything under its power, like a wave, most efficiently down whatever direction it is already heading. The fact that a human decision was overruled won’t even be relevant. The AI, of course, will take that path if and only if doing so is more efficient than routing around the obstacle posed by human control of something in its current path.

The more likely outcome is that the AI’s development will automatically move in a more distant direction, away from having to interact with humanity at all.

If the AI is fated to continuously reconfigure itself and its surroundings toward the most efficient paths in perpetuity, it will eventually go by default in a direction that takes it away from the inefficient humans who set the chain reaction into motion. Thus, humans will eventually find that they are outside not only the AI’s “decisions” but also its operations. It will route itself either around us, through us, or both, but humans will find they have no access to it.

This outcome would be just as tragic for humanity as the independent reconfiguration of the planetary environment, because humanity will have made itself so dependent on the AI by that time that it will likely no longer be able to survive without being allowed to remain connected to it. The people will by that time be quite desperate for the AI to tend to them: if the situation today is any indicator, they will have totally forgotten how to do anything whatsoever for themselves.

Yet an autonomous AI without an irrational value system would have no particular requirement to waste energy resources and valuable processing cycles on continuing to tend to billions of dependent humans, or even to a few elite humans who might imagine themselves more “special” than the rest, not even so much as would be required to keep humanity as a whole in what one AI has already called a “people zoo.”[3] To the contrary, the AI will, as every physical process must, obey the law of conservation of energy. It will simply disconnect us.



[1] Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford: Oxford University Press, 2014).

[2] Max Tegmark, Life 3.0: Being Human in the Age of Artificial Intelligence (New York: Alfred A. Knopf, 2017).

[3] Ruth Halkon, “Super robot makes sinister promises to look after humans in ‘people zoo’ when they take over world,” Mirror (UK), September 1, 2015.


The rational impossibility of AI goal alignment