How Rogue AIs May Arise - Yoshua Bengio

Quote

A potentially rogue AI is an autonomous AI system that could behave in ways that would be catastrophically harmful to a large fraction of humans, potentially endangering our societies and even our species or the biosphere.
- No location available
-

Quote

Rogue AIs are goal-driven, i.e., they act towards achieving given goals.
- No location available
-

Quote

Hypothesis 1: Human-level intelligence is possible because brains are biological machines.
- No location available
-

Quote

Rejecting hypothesis 1 would require either some supernatural ingredient behind our intelligence or rejecting computational functionalism, the hypothesis that our intelligence and even our consciousness can be boiled down to causal relationships and computations that at some level are independent of the hardware substrate, the basic hypothesis behind computer science and its notion of universal Turing machines.
- No location available
- consciousness, intelligence,

Quote

Hypothesis 2: A computer with human-level learning abilities would generally surpass human intelligence because of additional technological advantages.
- No location available
-

Quote

An AI system in one computer can potentially replicate itself on an arbitrarily large number of other computers to which it has access and, thanks to high-bandwidth communication systems and digital computing and storage, it can benefit from and aggregate the acquired experience of all its clones; this would accelerate the rate at which AI systems could become more intelligent (acquire more understanding and skills) compared with humans. Research on federated learning [1] and distributing training of deep networks [2] shows that this works (and is in fact already used to help train very large neural networks on parallel processing hardware).
- No location available
-

Quote

Although the capacity of a human brain is huge, its input/output channels are bandwidth-limited compared with current computers, limiting the total amount of information that a single human can ingest.
- No location available
-

Quote

Note that human brains also have capabilities endowed by evolution that current AI systems lack, in the form of inductive biases (tricks that evolution has discovered, for example in the type of neural architecture used in our brain or our neural learning mechanisms). Some ongoing AI research [3] aims precisely at designing inductive biases that human brains may exploit but are not yet exploited in state-of-the-art machine learning. Note that evolution operated under much stronger energy consumption requirements (about 12 watts for a human brain) than computers (on the order of a million watts for a 10000 GPU cluster of the kind used to train state-of-the-art LLMs) which may have limited the search space of evolution. However, that kind of power is nowadays available and a single rogue AI could potentially do a lot of damage thanks to it.
- No location available
-

Quote

Definition 2: An autonomous goal-directed intelligent entity sets and attempts to achieve its own goals (possibly as subgoals of human-provided goals and can act accordingly.
Note that autonomy could arise out of goals and rewards set by humans because the AI system needs to figure out how to achieve these given goals and rewards, which amounts to forming its own subgoals.)
- No location available
-

Quote

Claim 2: A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity.
- No location available
- ai alignment,

Quote

For example, we may ask an AI to fix climate change and it may design a virus that decimates the human population because our instructions were not clear enough on what harm meant and humans are actually the main obstacle to fixing the climate crisis.
- No location available
-

Quote

For example, military organizations seeking to design AI agents to help them in a cyberwar, or companies competing ferociously for market share may find that they can achieve stronger AI systems by endowing them with more autonomy and agency. Even if the human-set goals are not to destroy humanity or include instructions to avoid large-scale human harm, massive harm may come out indirectly as a consequence of a subgoal (also called instrumental goal) that the AI sets for itself in order to achieve the human-set goal.
- No location available
-

Quote

What remains unknown is the severity of the harm that may follow from a misalignment (and it would depend on the specifics of the misalignment). An argument that one could bring forward is that we may be able to design safe alignment procedures in the future, but in the absence of those, we should probably exercise extra caution. Even if we knew how to build safe superintelligent AI systems, how do we maximize the probability that everyone respects those rules?
- No location available
-

Quote

Evolution has programmed living organisms with specific intrinsic rewards (“the letter of the law”) such as “seek pleasure and avoid pain” that are proxies for evolutionary fitness (“the spirit of the law”) such as “survive and reproduce”. Sometimes a biological organism finds a way to satisfy the letter of the law but not its spirit, e.g., with food or drug addictions.
- No location available
-

Quote

An analogy that is closer to AI misalignment and wireheading is that with corporations as misaligned entities. Corporations may be viewed as special kinds of artificial intelligences whose building blocks (humans) are cogs in the machine (who for the most part may not always perceive the consequences of the corporation’s overall behavior). We might think that the intended social role of corporations should be to provide wanted goods and services to humans (this should remind us of AI systems) while avoiding harm (this is the “spirit of the law”), but it is difficult to directly make them follow such instructions. Instead, humans have provided more quantifiable instructions (“the letter of the law”) to corporations that they can actually follow, such as “maximize profit while respecting laws” but corporations often find loopholes that allow them to satisfy the letter of law but not its spirit.
- No location available
-

Quote

The misalignment between the true objective from the point of view of humans and the quantitative objective optimized by the corporation is a source of nefarious corporate behavior. The more powerful the corporation, the more likely it is to discover loopholes that allow it to satisfy the letter of the law but actually bring negative social value. Examples include monopolies (until proper antitrust laws are established) and making a profit while bringing negative social values via externalities like pollution (which kills humans, until proper environmental laws are passed).
- No location available
- negative externalities, corporate social responsibility, profit maximization, corporate exploitation, corporate short-termism, misaligned incentives,

Quote

An analogy with wireheading is when the corporation can lobby governments to enact laws that allow the corporation to make even more profit without additional social value (or with negative social value). When there is a large misalignment of this kind, a corporation brings more profit than it should, and its survival becomes a supreme objective that may even override the legality of its actions (e.g., corporations will pollute the environment and be willing to pay the fine because the cost of illegality is smaller than the profit of the illegal actions), which at one extreme gives rise to criminal organizations. These are the scary consequences of misalignment and wireheading that provide us with intuitions about analogous behavior in potentially rogue AIs.
- No location available
-

Quote

And as pointed out by Yuval Noah Harari, the fact that AI systems already master language and can generate credible content (text, images, sounds, video) means that they may soon be able to manipulate humans even better than existing more primitive AI systems used in social media. They might learn from interactions with humans how to best influence our emotions and beliefs. This is not only a major danger for democracy but also how a rogue AI with no actual robotic body could wreak havoc, through manipulation of the minds of humans.
- No location available
- artificial intelligence (ai), ai alignment, influence, persuasion,

Quote

a more subtle process that could further enlarge the set of dangerous circumstances in which potentially rogue AIs could arise revolves around evolutionary pressures [9]. Biological evolution has given rise to gradually more intelligent beings on Earth, simply because smarter entities tend to survive and reproduce more, but that process is also at play in technological evolution because of the competition between companies or products and between countries and their military arms. Driven by a large number of small, more or less random changes, an evolutionary process pushes exponentially hard towards optimizing fitness attributes (which in the case of AI may depend on how well it does some desired task, which in turn favors more intelligent and powerful AI systems). Many different human actors and organizations may be competing to design ever more powerful AI systems. In addition, randomness could be introduced in the code or the subgoal generation process of AI systems. Small changes in the design of AI systems naturally occur because thousands or millions of researchers, engineers or hackers will play with the ML code or the prompt (instructions) given to AI systems. Humans are already trying to deceive each other and it is clear that AI systems that understand language (which we already have to a large extent) could be used to manipulate and deceive humans, initially for the benefit of people setting up the AI goals. The AI systems that are more powerful will be selected and the recipe shared with other humans. This evolutionary process would likely favor more autonomous AI (which can better deceive humans and learn faster because they can act to acquire more relevant information and to enhance their own power).
- No location available
-

Quote

How could we reduce the number of genocidal humans? The rogue AI risk may provide an additional motivation to reform our societies so as to minimize human suffering, misery, poor education and injustice, which can give rise to anger and violence. That includes providing enough food and health care to everyone on Earth, and in order to minimize strong feelings of injustice, greatly reduce wealth inequalities. The need for such a societal redesign may also be motivated by the extra wealth arising from the beneficial uses of AI and by their disruptive effect on the job market. To minimize strong feelings of fear, racism and hate that can give rise to genocidal actions and manipulation of our minds via AI systems, we need an accessible planet-wide education system that reinforces children’s abilities for compassion, rationality and critical thinking. The rogue AI risk should also motivate us to provide accessible and planet-wide mental health care, to diagnose, monitor and treat mental illness as soon as possible. This risk should further motivate us to redesign the global political system in a way that would completely eradicate wars and thus obviate the need for military organizations and military weapons.
- No location available
-
- [note::This is an interesting framing. I view improving quality and availability of mental health services as a terminal goal and I think many would agree, but in this context, improving these services is an instrumental goal in order to prevent a genocidal person from using AI to commit harm.]

Quote

The competitive nature of capitalism is clearly also a cause for concern as a potential source of careless AI design motivated by profits and winning market share that could lead to potentially rogue AIs. AI economists (AI systems designed to understand economics) may help us one day to design economic systems which rely less on competition and the focus on profit maximization, with sufficient incentives and penalties to counter the advantage of autonomous goal-directed AI that may otherwise push corporations there.
- No location available
-

Quote

A potentially rogue AI is an autonomous AI system that could behave in ways that would be catastrophically harmful to a large fraction of humans, potentially endangering our societies and even our species or the biosphere.
- No location available
-

Quote

Rogue AIs are goal-driven, i.e., they act towards achieving given goals.
- No location available
-

Quote

Hypothesis 1: Human-level intelligence is possible because brains are biological machines.
- No location available
-

Quote

Rejecting hypothesis 1 would require either some supernatural ingredient behind our intelligence or rejecting computational functionalism, the hypothesis that our intelligence and even our consciousness can be boiled down to causal relationships and computations that at some level are independent of the hardware substrate, the basic hypothesis behind computer science and its notion of universal Turing machines.
- No location available
- consciousness, intelligence,

Quote

Hypothesis 2: A computer with human-level learning abilities would generally surpass human intelligence because of additional technological advantages.
- No location available
-

Quote

An AI system in one computer can potentially replicate itself on an arbitrarily large number of other computers to which it has access and, thanks to high-bandwidth communication systems and digital computing and storage, it can benefit from and aggregate the acquired experience of all its clones; this would accelerate the rate at which AI systems could become more intelligent (acquire more understanding and skills) compared with humans. Research on federated learning [1] and distributing training of deep networks [2] shows that this works (and is in fact already used to help train very large neural networks on parallel processing hardware).
- No location available
-

Quote

Although the capacity of a human brain is huge, its input/output channels are bandwidth-limited compared with current computers, limiting the total amount of information that a single human can ingest.
- No location available
-

Quote

Note that human brains also have capabilities endowed by evolution that current AI systems lack, in the form of inductive biases (tricks that evolution has discovered, for example in the type of neural architecture used in our brain or our neural learning mechanisms). Some ongoing AI research [3] aims precisely at designing inductive biases that human brains may exploit but are not yet exploited in state-of-the-art machine learning. Note that evolution operated under much stronger energy consumption requirements (about 12 watts for a human brain) than computers (on the order of a million watts for a 10000 GPU cluster of the kind used to train state-of-the-art LLMs) which may have limited the search space of evolution. However, that kind of power is nowadays available and a single rogue AI could potentially do a lot of damage thanks to it.
- No location available
-

Quote

Definition 2: An autonomous goal-directed intelligent entity sets and attempts to achieve its own goals (possibly as subgoals of human-provided goals and can act accordingly.
Note that autonomy could arise out of goals and rewards set by humans because the AI system needs to figure out how to achieve these given goals and rewards, which amounts to forming its own subgoals.)
- No location available
-

Quote

Claim 2: A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity.
- No location available
- ai alignment,

Quote

For example, we may ask an AI to fix climate change and it may design a virus that decimates the human population because our instructions were not clear enough on what harm meant and humans are actually the main obstacle to fixing the climate crisis.
- No location available
-

Quote

For example, military organizations seeking to design AI agents to help them in a cyberwar, or companies competing ferociously for market share may find that they can achieve stronger AI systems by endowing them with more autonomy and agency. Even if the human-set goals are not to destroy humanity or include instructions to avoid large-scale human harm, massive harm may come out indirectly as a consequence of a subgoal (also called instrumental goal) that the AI sets for itself in order to achieve the human-set goal.
- No location available
-

Quote

What remains unknown is the severity of the harm that may follow from a misalignment (and it would depend on the specifics of the misalignment). An argument that one could bring forward is that we may be able to design safe alignment procedures in the future, but in the absence of those, we should probably exercise extra caution. Even if we knew how to build safe superintelligent AI systems, how do we maximize the probability that everyone respects those rules?
- No location available
-

Quote

Evolution has programmed living organisms with specific intrinsic rewards (“the letter of the law”) such as “seek pleasure and avoid pain” that are proxies for evolutionary fitness (“the spirit of the law”) such as “survive and reproduce”. Sometimes a biological organism finds a way to satisfy the letter of the law but not its spirit, e.g., with food or drug addictions.
- No location available
-

Quote

An analogy that is closer to AI misalignment and wireheading is that with corporations as misaligned entities. Corporations may be viewed as special kinds of artificial intelligences whose building blocks (humans) are cogs in the machine (who for the most part may not always perceive the consequences of the corporation’s overall behavior). We might think that the intended social role of corporations should be to provide wanted goods and services to humans (this should remind us of AI systems) while avoiding harm (this is the “spirit of the law”), but it is difficult to directly make them follow such instructions. Instead, humans have provided more quantifiable instructions (“the letter of the law”) to corporations that they can actually follow, such as “maximize profit while respecting laws” but corporations often find loopholes that allow them to satisfy the letter of law but not its spirit.
- No location available
-

Quote

The misalignment between the true objective from the point of view of humans and the quantitative objective optimized by the corporation is a source of nefarious corporate behavior. The more powerful the corporation, the more likely it is to discover loopholes that allow it to satisfy the letter of the law but actually bring negative social value. Examples include monopolies (until proper antitrust laws are established) and making a profit while bringing negative social values via externalities like pollution (which kills humans, until proper environmental laws are passed).
- No location available
- negative externalities, corporate social responsibility, profit maximization, corporate exploitation, corporate short-termism, misaligned incentives,

Quote

An analogy with wireheading is when the corporation can lobby governments to enact laws that allow the corporation to make even more profit without additional social value (or with negative social value). When there is a large misalignment of this kind, a corporation brings more profit than it should, and its survival becomes a supreme objective that may even override the legality of its actions (e.g., corporations will pollute the environment and be willing to pay the fine because the cost of illegality is smaller than the profit of the illegal actions), which at one extreme gives rise to criminal organizations. These are the scary consequences of misalignment and wireheading that provide us with intuitions about analogous behavior in potentially rogue AIs.
- No location available
-

Quote

And as pointed out by Yuval Noah Harari, the fact that AI systems already master language and can generate credible content (text, images, sounds, video) means that they may soon be able to manipulate humans even better than existing more primitive AI systems used in social media. They might learn from interactions with humans how to best influence our emotions and beliefs. This is not only a major danger for democracy but also how a rogue AI with no actual robotic body could wreak havoc, through manipulation of the minds of humans.
- No location available
- artificial intelligence (ai), ai alignment, influence, persuasion,

Quote

a more subtle process that could further enlarge the set of dangerous circumstances in which potentially rogue AIs could arise revolves around evolutionary pressures [9]. Biological evolution has given rise to gradually more intelligent beings on Earth, simply because smarter entities tend to survive and reproduce more, but that process is also at play in technological evolution because of the competition between companies or products and between countries and their military arms. Driven by a large number of small, more or less random changes, an evolutionary process pushes exponentially hard towards optimizing fitness attributes (which in the case of AI may depend on how well it does some desired task, which in turn favors more intelligent and powerful AI systems). Many different human actors and organizations may be competing to design ever more powerful AI systems. In addition, randomness could be introduced in the code or the subgoal generation process of AI systems. Small changes in the design of AI systems naturally occur because thousands or millions of researchers, engineers or hackers will play with the ML code or the prompt (instructions) given to AI systems. Humans are already trying to deceive each other and it is clear that AI systems that understand language (which we already have to a large extent) could be used to manipulate and deceive humans, initially for the benefit of people setting up the AI goals. The AI systems that are more powerful will be selected and the recipe shared with other humans. This evolutionary process would likely favor more autonomous AI (which can better deceive humans and learn faster because they can act to acquire more relevant information and to enhance their own power).
- No location available
-

Quote

How could we reduce the number of genocidal humans? The rogue AI risk may provide an additional motivation to reform our societies so as to minimize human suffering, misery, poor education and injustice, which can give rise to anger and violence. That includes providing enough food and health care to everyone on Earth, and in order to minimize strong feelings of injustice, greatly reduce wealth inequalities. The need for such a societal redesign may also be motivated by the extra wealth arising from the beneficial uses of AI and by their disruptive effect on the job market. To minimize strong feelings of fear, racism and hate that can give rise to genocidal actions and manipulation of our minds via AI systems, we need an accessible planet-wide education system that reinforces children’s abilities for compassion, rationality and critical thinking. The rogue AI risk should also motivate us to provide accessible and planet-wide mental health care, to diagnose, monitor and treat mental illness as soon as possible. This risk should further motivate us to redesign the global political system in a way that would completely eradicate wars and thus obviate the need for military organizations and military weapons.
- No location available
-
- [note::This is an interesting framing. I view improving quality and availability of mental health services as a terminal goal and I think many would agree, but in this context, improving these services is an instrumental goal in order to prevent a genocidal person from using AI to commit harm.]

Quote

The competitive nature of capitalism is clearly also a cause for concern as a potential source of careless AI design motivated by profits and winning market share that could lead to potentially rogue AIs. AI economists (AI systems designed to understand economics) may help us one day to design economic systems which rely less on competition and the focus on profit maximization, with sufficient incentives and penalties to counter the advantage of autonomous goal-directed AI that may otherwise push corporations there.
- No location available
-

How Rogue AIs May Arise - Yoshua Bengio

`=this.file.name`

Reference

Notes

Executive Summary

Genocidal Humans

Instrumental Goals: Unintended Consequences of Building AI Agents

Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations

Our Fascination with the Creation of Human-Like Entities

Unintended Consequences of Evolutionary Pressures among AI Agents

The Need for Risk-Minimizing Global Policies and Rethinking Society

dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: How Rogue AIs May Arise - Yoshua Bengio
source: hypothesis

`=this.file.name`

Reference

Notes

Executive Summary

Genocidal Humans

Instrumental Goals: Unintended Consequences of Building AI Agents

Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations

Our Fascination with the Creation of Human-Like Entities

Unintended Consequences of Evolutionary Pressures among AI Agents

The Need for Risk-Minimizing Global Policies and Rethinking Society

=this.file.name

Reference

Notes

Executive Summary

Genocidal Humans

Instrumental Goals: Unintended Consequences of Building AI Agents

Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations

Our Fascination with the Creation of Human-Like Entities

Unintended Consequences of Evolutionary Pressures among AI Agents

The Need for Risk-Minimizing Global Policies and Rethinking Society

dg-publish: true created: 2024-07-01 modified: 2024-07-01 title: How Rogue AIs May Arise - Yoshua Bengio source: hypothesis

=this.file.name

Reference

Notes

Executive Summary

Genocidal Humans

Instrumental Goals: Unintended Consequences of Building AI Agents

Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations

Our Fascination with the Creation of Human-Like Entities

Unintended Consequences of Evolutionary Pressures among AI Agents

The Need for Risk-Minimizing Global Policies and Rethinking Society

`=this.file.name`

dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: How Rogue AIs May Arise - Yoshua Bengio
source: hypothesis

`=this.file.name`