How Rogue AIs May Arise - Yoshua Bengio
!tags:: #litâ/đ°ď¸article/highlights
!links:: ai alignment, ai governance, artificial intelligence (ai),
!ref:: How Rogue AIs May Arise - Yoshua Bengio
!author:: yoshuabengio.org
=this.file.name
Reference
=this.ref
Notes
A potentially rogue AI is an autonomous AI system that could behave in ways that would be catastrophically harmful to a large fraction of humans, potentially endangering our societies and even our species or the biosphere.
- No location available
-
Rogue AIs are goal-driven, i.e., they act towards achieving given goals.
- No location available
-
Hypothesis 1: Human-level intelligence is possible because brains are biological machines.
- No location available
-
Rejecting hypothesis 1 would require either some supernatural ingredient behind our intelligence or rejecting computational functionalism, the hypothesis that our intelligence and even our consciousness can be boiled down to causal relationships and computations that at some level are independent of the hardware substrate, the basic hypothesis behind computer science and its notion of universal Turing machines.
- No location available
- consciousness, intelligence,
Hypothesis 2: A computer with human-level learning abilities would generally surpass human intelligence because of additional technological advantages.
- No location available
-
An AI system in one computer can potentially replicate itself on an arbitrarily large number of other computers to which it has access and, thanks to high-bandwidth communication systems and digital computing and storage, it can benefit from and aggregate the acquired experience of all its clones; this would accelerate the rate at which AI systems could become more intelligent (acquire more understanding and skills) compared with humans. Research on federated learning [1] and distributing training of deep networks [2] shows that this works (and is in fact already used to help train very large neural networks on parallel processing hardware).
- No location available
-
Executive Summary
Although the capacity of a human brain is huge, its input/output channels are bandwidth-limited compared with current computers, limiting the total amount of information that a single human can ingest.
- No location available
-
Note that human brains also have capabilities endowed by evolution that current AI systems lack, in the form of inductive biases (tricks that evolution has discovered, for example in the type of neural architecture used in our brain or our neural learning mechanisms). Some ongoing AI research [3] aims precisely at designing inductive biases that human brains may exploit but are not yet exploited in state-of-the-art machine learning. Note that evolution operated under much stronger energy consumption requirements (about 12 watts for a human brain) than computers (on the order of a million watts for a 10000 GPU cluster of the kind used to train state-of-the-art LLMs) which may have limited the search space of evolution. However, that kind of power is nowadays available and a single rogue AI could potentially do a lot of damage thanks to it.
- No location available
-
Definition 2: An autonomous goal-directed intelligent entity sets and attempts to achieve its own goals (possibly as subgoals of human-provided goals and can act accordingly.
Note that autonomy could arise out of goals and rewards set by humans because the AI system needs to figure out how to achieve these given goals and rewards, which amounts to forming its own subgoals.)
- No location available
-
Claim 2: A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity.
- No location available
- ai alignment,
For example, we may ask an AI to fix climate change and it may design a virus that decimates the human population because our instructions were not clear enough on what harm meant and humans are actually the main obstacle to fixing the climate crisis.
- No location available
-
For example, military organizations seeking to design AI agents to help them in a cyberwar, or companies competing ferociously for market share may find that they can achieve stronger AI systems by endowing them with more autonomy and agency. Even if the human-set goals are not to destroy humanity or include instructions to avoid large-scale human harm, massive harm may come out indirectly as a consequence of a subgoal (also called instrumental goal) that the AI sets for itself in order to achieve the human-set goal.
- No location available
-
Genocidal Humans
What remains unknown is the severity of the harm that may follow from a misalignment (and it would depend on the specifics of the misalignment). An argument that one could bring forward is that we may be able to design safe alignment procedures in the future, but in the absence of those, we should probably exercise extra caution. Even if we knew how to build safe superintelligent AI systems, how do we maximize the probability that everyone respects those rules?
- No location available
-
Evolution has programmed living organisms with specific intrinsic rewards (âthe letter of the lawâ) such as âseek pleasure and avoid painâ that are proxies for evolutionary fitness (âthe spirit of the lawâ) such as âsurvive and reproduceâ. Sometimes a biological organism finds a way to satisfy the letter of the law but not its spirit, e.g., with food or drug addictions.
- No location available
-
Instrumental Goals: Unintended Consequences of Building AI Agents
An analogy that is closer to AI misalignment and wireheading is that with corporations as misaligned entities. Corporations may be viewed as special kinds of artificial intelligences whose building blocks (humans) are cogs in the machine (who for the most part may not always perceive the consequences of the corporationâs overall behavior). We might think that the intended social role of corporations should be to provide wanted goods and services to humans (this should remind us of AI systems) while avoiding harm (this is the âspirit of the lawâ), but it is difficult to directly make them follow such instructions. Instead, humans have provided more quantifiable instructions (âthe letter of the lawâ) to corporations that they can actually follow, such as âmaximize profit while respecting lawsâ but corporations often find loopholes that allow them to satisfy the letter of law but not its spirit.
- No location available
-
The misalignment between the true objective from the point of view of humans and the quantitative objective optimized by the corporation is a source of nefarious corporate behavior. The more powerful the corporation, the more likely it is to discover loopholes that allow it to satisfy the letter of the law but actually bring negative social value. Examples include monopolies (until proper antitrust laws are established) and making a profit while bringing negative social values via externalities like pollution (which kills humans, until proper environmental laws are passed).
- No location available
- negative externalities, corporate social responsibility, profit maximization, corporate exploitation, corporate short-termism, misaligned incentives,
An analogy with wireheading is when the corporation can lobby governments to enact laws that allow the corporation to make even more profit without additional social value (or with negative social value). When there is a large misalignment of this kind, a corporation brings more profit than it should, and its survival becomes a supreme objective that may even override the legality of its actions (e.g., corporations will pollute the environment and be willing to pay the fine because the cost of illegality is smaller than the profit of the illegal actions), which at one extreme gives rise to criminal organizations. These are the scary consequences of misalignment and wireheading that provide us with intuitions about analogous behavior in potentially rogue AIs.
- No location available
-
Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations
And as pointed out by Yuval Noah Harari, the fact that AI systems already master language and can generate credible content (text, images, sounds, video) means that they may soon be able to manipulate humans even better than existing more primitive AI systems used in social media. They might learn from interactions with humans how to best influence our emotions and beliefs. This is not only a major danger for democracy but also how a rogue AI with no actual robotic body could wreak havoc, through manipulation of the minds of humans.
- No location available
- artificial intelligence (ai), ai alignment, influence, persuasion,
a more subtle process that could further enlarge the set of dangerous circumstances in which potentially rogue AIs could arise revolves around evolutionary pressures [9]. Biological evolution has given rise to gradually more intelligent beings on Earth, simply because smarter entities tend to survive and reproduce more, but that process is also at play in technological evolution because of the competition between companies or products and between countries and their military arms. Driven by a large number of small, more or less random changes, an evolutionary process pushes exponentially hard towards optimizing fitness attributes (which in the case of AI may depend on how well it does some desired task, which in turn favors more intelligent and powerful AI systems). Many different human actors and organizations may be competing to design ever more powerful AI systems. In addition, randomness could be introduced in the code or the subgoal generation process of AI systems. Small changes in the design of AI systems naturally occur because thousands or millions of researchers, engineers or hackers will play with the ML code or the prompt (instructions) given to AI systems. Humans are already trying to deceive each other and it is clear that AI systems that understand language (which we already have to a large extent) could be used to manipulate and deceive humans, initially for the benefit of people setting up the AI goals. The AI systems that are more powerful will be selected and the recipe shared with other humans. This evolutionary process would likely favor more autonomous AI (which can better deceive humans and learn faster because they can act to acquire more relevant information and to enhance their own power).
- No location available
-
Our Fascination with the Creation of Human-Like Entities
How could we reduce the number of genocidal humans? The rogue AI risk may provide an additional motivation to reform our societies so as to minimize human suffering, misery, poor education and injustice, which can give rise to anger and violence. That includes providing enough food and health care to everyone on Earth, and in order to minimize strong feelings of injustice, greatly reduce wealth inequalities. The need for such a societal redesign may also be motivated by the extra wealth arising from the beneficial uses of AI and by their disruptive effect on the job market. To minimize strong feelings of fear, racism and hate that can give rise to genocidal actions and manipulation of our minds via AI systems, we need an accessible planet-wide education system that reinforces childrenâs abilities for compassion, rationality and critical thinking. The rogue AI risk should also motivate us to provide accessible and planet-wide mental health care, to diagnose, monitor and treat mental illness as soon as possible. This risk should further motivate us to redesign the global political system in a way that would completely eradicate wars and thus obviate the need for military organizations and military weapons.
- No location available
-
- [note::This is an interesting framing. I view improving quality and availability of mental health services as a terminal goal and I think many would agree, but in this context, improving these services is an instrumental goal in order to prevent a genocidal person from using AI to commit harm.]
Unintended Consequences of Evolutionary Pressures among AI Agents
The competitive nature of capitalism is clearly also a cause for concern as a potential source of careless AI design motivated by profits and winning market share that could lead to potentially rogue AIs. AI economists (AI systems designed to understand economics) may help us one day to design economic systems which rely less on competition and the focus on profit maximization, with sufficient incentives and penalties to counter the advantage of autonomous goal-directed AI that may otherwise push corporations there.
- No location available
-
The Need for Risk-Minimizing Global Policies and Rethinking Society
dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: How Rogue AIs May Arise - Yoshua Bengio
source: hypothesis
!tags:: #litâ/đ°ď¸article/highlights
!links:: ai alignment, ai governance, artificial intelligence (ai),
!ref:: How Rogue AIs May Arise - Yoshua Bengio
!author:: yoshuabengio.org
=this.file.name
Reference
=this.ref
Notes
A potentially rogue AI is an autonomous AI system that could behave in ways that would be catastrophically harmful to a large fraction of humans, potentially endangering our societies and even our species or the biosphere.
- No location available
-
Rogue AIs are goal-driven, i.e., they act towards achieving given goals.
- No location available
-
Hypothesis 1: Human-level intelligence is possible because brains are biological machines.
- No location available
-
Rejecting hypothesis 1 would require either some supernatural ingredient behind our intelligence or rejecting computational functionalism, the hypothesis that our intelligence and even our consciousness can be boiled down to causal relationships and computations that at some level are independent of the hardware substrate, the basic hypothesis behind computer science and its notion of universal Turing machines.
- No location available
- consciousness, intelligence,
Hypothesis 2: A computer with human-level learning abilities would generally surpass human intelligence because of additional technological advantages.
- No location available
-
An AI system in one computer can potentially replicate itself on an arbitrarily large number of other computers to which it has access and, thanks to high-bandwidth communication systems and digital computing and storage, it can benefit from and aggregate the acquired experience of all its clones; this would accelerate the rate at which AI systems could become more intelligent (acquire more understanding and skills) compared with humans. Research on federated learning [1] and distributing training of deep networks [2] shows that this works (and is in fact already used to help train very large neural networks on parallel processing hardware).
- No location available
-
Executive Summary
Although the capacity of a human brain is huge, its input/output channels are bandwidth-limited compared with current computers, limiting the total amount of information that a single human can ingest.
- No location available
-
Note that human brains also have capabilities endowed by evolution that current AI systems lack, in the form of inductive biases (tricks that evolution has discovered, for example in the type of neural architecture used in our brain or our neural learning mechanisms). Some ongoing AI research [3] aims precisely at designing inductive biases that human brains may exploit but are not yet exploited in state-of-the-art machine learning. Note that evolution operated under much stronger energy consumption requirements (about 12 watts for a human brain) than computers (on the order of a million watts for a 10000 GPU cluster of the kind used to train state-of-the-art LLMs) which may have limited the search space of evolution. However, that kind of power is nowadays available and a single rogue AI could potentially do a lot of damage thanks to it.
- No location available
-
Definition 2: An autonomous goal-directed intelligent entity sets and attempts to achieve its own goals (possibly as subgoals of human-provided goals and can act accordingly.
Note that autonomy could arise out of goals and rewards set by humans because the AI system needs to figure out how to achieve these given goals and rewards, which amounts to forming its own subgoals.)
- No location available
-
Claim 2: A superintelligent AI system that is autonomous and goal-directed would be a potentially rogue AI if its goals do not strictly include the well-being of humanity and the biosphere, i.e., if it is not sufficiently aligned with human rights and values to guarantee acting in ways that avoid harm to humanity.
- No location available
- ai alignment,
For example, we may ask an AI to fix climate change and it may design a virus that decimates the human population because our instructions were not clear enough on what harm meant and humans are actually the main obstacle to fixing the climate crisis.
- No location available
-
For example, military organizations seeking to design AI agents to help them in a cyberwar, or companies competing ferociously for market share may find that they can achieve stronger AI systems by endowing them with more autonomy and agency. Even if the human-set goals are not to destroy humanity or include instructions to avoid large-scale human harm, massive harm may come out indirectly as a consequence of a subgoal (also called instrumental goal) that the AI sets for itself in order to achieve the human-set goal.
- No location available
-
Genocidal Humans
What remains unknown is the severity of the harm that may follow from a misalignment (and it would depend on the specifics of the misalignment). An argument that one could bring forward is that we may be able to design safe alignment procedures in the future, but in the absence of those, we should probably exercise extra caution. Even if we knew how to build safe superintelligent AI systems, how do we maximize the probability that everyone respects those rules?
- No location available
-
Evolution has programmed living organisms with specific intrinsic rewards (âthe letter of the lawâ) such as âseek pleasure and avoid painâ that are proxies for evolutionary fitness (âthe spirit of the lawâ) such as âsurvive and reproduceâ. Sometimes a biological organism finds a way to satisfy the letter of the law but not its spirit, e.g., with food or drug addictions.
- No location available
-
Instrumental Goals: Unintended Consequences of Building AI Agents
An analogy that is closer to AI misalignment and wireheading is that with corporations as misaligned entities. Corporations may be viewed as special kinds of artificial intelligences whose building blocks (humans) are cogs in the machine (who for the most part may not always perceive the consequences of the corporationâs overall behavior). We might think that the intended social role of corporations should be to provide wanted goods and services to humans (this should remind us of AI systems) while avoiding harm (this is the âspirit of the lawâ), but it is difficult to directly make them follow such instructions. Instead, humans have provided more quantifiable instructions (âthe letter of the lawâ) to corporations that they can actually follow, such as âmaximize profit while respecting lawsâ but corporations often find loopholes that allow them to satisfy the letter of law but not its spirit.
- No location available
-
The misalignment between the true objective from the point of view of humans and the quantitative objective optimized by the corporation is a source of nefarious corporate behavior. The more powerful the corporation, the more likely it is to discover loopholes that allow it to satisfy the letter of the law but actually bring negative social value. Examples include monopolies (until proper antitrust laws are established) and making a profit while bringing negative social values via externalities like pollution (which kills humans, until proper environmental laws are passed).
- No location available
- negative externalities, corporate social responsibility, profit maximization, corporate exploitation, corporate short-termism, misaligned incentives,
An analogy with wireheading is when the corporation can lobby governments to enact laws that allow the corporation to make even more profit without additional social value (or with negative social value). When there is a large misalignment of this kind, a corporation brings more profit than it should, and its survival becomes a supreme objective that may even override the legality of its actions (e.g., corporations will pollute the environment and be willing to pay the fine because the cost of illegality is smaller than the profit of the illegal actions), which at one extreme gives rise to criminal organizations. These are the scary consequences of misalignment and wireheading that provide us with intuitions about analogous behavior in potentially rogue AIs.
- No location available
-
Examples of Wireheading and Misalignment Amplification: Addiction and Nefarious Corporations
And as pointed out by Yuval Noah Harari, the fact that AI systems already master language and can generate credible content (text, images, sounds, video) means that they may soon be able to manipulate humans even better than existing more primitive AI systems used in social media. They might learn from interactions with humans how to best influence our emotions and beliefs. This is not only a major danger for democracy but also how a rogue AI with no actual robotic body could wreak havoc, through manipulation of the minds of humans.
- No location available
- artificial intelligence (ai), ai alignment, influence, persuasion,
a more subtle process that could further enlarge the set of dangerous circumstances in which potentially rogue AIs could arise revolves around evolutionary pressures [9]. Biological evolution has given rise to gradually more intelligent beings on Earth, simply because smarter entities tend to survive and reproduce more, but that process is also at play in technological evolution because of the competition between companies or products and between countries and their military arms. Driven by a large number of small, more or less random changes, an evolutionary process pushes exponentially hard towards optimizing fitness attributes (which in the case of AI may depend on how well it does some desired task, which in turn favors more intelligent and powerful AI systems). Many different human actors and organizations may be competing to design ever more powerful AI systems. In addition, randomness could be introduced in the code or the subgoal generation process of AI systems. Small changes in the design of AI systems naturally occur because thousands or millions of researchers, engineers or hackers will play with the ML code or the prompt (instructions) given to AI systems. Humans are already trying to deceive each other and it is clear that AI systems that understand language (which we already have to a large extent) could be used to manipulate and deceive humans, initially for the benefit of people setting up the AI goals. The AI systems that are more powerful will be selected and the recipe shared with other humans. This evolutionary process would likely favor more autonomous AI (which can better deceive humans and learn faster because they can act to acquire more relevant information and to enhance their own power).
- No location available
-
Our Fascination with the Creation of Human-Like Entities
How could we reduce the number of genocidal humans? The rogue AI risk may provide an additional motivation to reform our societies so as to minimize human suffering, misery, poor education and injustice, which can give rise to anger and violence. That includes providing enough food and health care to everyone on Earth, and in order to minimize strong feelings of injustice, greatly reduce wealth inequalities. The need for such a societal redesign may also be motivated by the extra wealth arising from the beneficial uses of AI and by their disruptive effect on the job market. To minimize strong feelings of fear, racism and hate that can give rise to genocidal actions and manipulation of our minds via AI systems, we need an accessible planet-wide education system that reinforces childrenâs abilities for compassion, rationality and critical thinking. The rogue AI risk should also motivate us to provide accessible and planet-wide mental health care, to diagnose, monitor and treat mental illness as soon as possible. This risk should further motivate us to redesign the global political system in a way that would completely eradicate wars and thus obviate the need for military organizations and military weapons.
- No location available
-
- [note::This is an interesting framing. I view improving quality and availability of mental health services as a terminal goal and I think many would agree, but in this context, improving these services is an instrumental goal in order to prevent a genocidal person from using AI to commit harm.]
Unintended Consequences of Evolutionary Pressures among AI Agents
The competitive nature of capitalism is clearly also a cause for concern as a potential source of careless AI design motivated by profits and winning market share that could lead to potentially rogue AIs. AI economists (AI systems designed to understand economics) may help us one day to design economic systems which rely less on competition and the focus on profit maximization, with sufficient incentives and penalties to counter the advantage of autonomous goal-directed AI that may otherwise push corporations there.
- No location available
-