Eliezer Yudkowsky asks: “Well, in the case of Unfriendly AI, I’d ask which of the following statements Bryan Caplan denies.” My point-by-point reply:
1. Orthogonality thesis – intelligence can be directed toward any
compact goal; consequentialist means-end reasoning can be deployed to
find means corresponding to a free choice of end; AIs are not
automatically nice; moral internalism is false.
I agree AIs are not “automatically nice.” The other statements are sufficiently jargony I don’t know whether I agree, but I assume they’re all roughly synonymous.
2. Instrumental
convergence – an AI doesn’t need to specifically hate you to hurt you; a
paperclip maximizer doesn’t hate you but you’re made out of atoms that
it can use to make paperclips, so leaving you alive represents an
opportunity cost and a number of foregone paperclips. Similarly,
paperclip maximizers want to self-improve, to perfect material
technology, to gain control of resources, to persuade their programmers
that they’re actually quite friendly, to hide their real thoughts from
their programmers via cognitive steganography or similar strategies, to
give no sign of value disalignment until they’ve achieved near-certainty
of victory from the moment of their first overt strike, etcetera.
Agree.
3. Rapid capability gain and large capability differences – under
scenarios seeming more plausible than not, there’s the possibility of
AIs gaining in capability very rapidly, achieving large absolute
differences of capability, or some mixture of the two. (We could try to
keep that possibility non-actualized by a deliberate effort, and that
effort might even be successful, but that’s not the same as the avenue
not existing.)
Disagree, at least in spirit. I think Robin Hanson wins his “Foom” debate with Eliezer, and in any case see no reason to believe either of Eliezer’s scenarios is plausible. I’ll be grateful if we have self-driving cars before my younger son is old enough to drive ten years from now. Why “in spirit”? Because taken literally, I think there’s a “possibility” of Eliezer’s scenarios in every scenario. Per Tetlock, I wish he’d given an unconditional probability with a time frame to eliminate this ambiguity.
4. 1-3 in combination imply that Unfriendly AI is
a critical Problem-to-be-solved, because AGI is not automatically nice,
by default does things we regard as harmful, and will have avenues
leading up to great intelligence and power.
Disagree. “Not automatically nice” seems like a flimsy reason to worry. Indeed, what creature or group or species is “automatically nice”? Not humanity, that’s for sure. To make Eliezer’s conclusion follow from his premises, (1) should be replaced with something like:
1′. AIs have a non-trivial chance of being dangerously un-nice.
I do find this plausible, though only because many governments will create un-nice AIs on purpose. But I don’t find this any more scary than the current existence of un-nice governments. In fact, given the historic role of human error and passion in nuclear politics, a greater role for AIs makes me a little less worried.
READER COMMENTS
Eliezer Yudkowsky
Mar 30 2016 at 1:12am
We don’t need to quibble about the spirit here; I agree that “possibility” doesn’t mean anything and only probabilities matter.
That said, timelines are the hardest part of AGI issues to forecast, by which I mean that if you ask me for a specific year, I throw up my hands and say “Not only do I not know, I make the much stronger statement that nobody else has good knowledge either.” Fermi said that positive-net-energy from nuclear power wouldn’t be possible for 50 years, two years before he oversaw the construction of the first pile of uranium bricks to go critical. The way these things work is that they look fifty years off to the slightly skeptical, and ten years later, they still look fifty years off, and then suddenly there’s a breakthrough and they look five years off, at which point they’re actually 2 to 20 years off.
If you hold a gun to my head and say “Infer your probability distribution from your own actions, you self-proclaimed Bayesian,” then I think I seem to be planning for a time horizon between 8 and 40 years, but some of that is because there’s very little I think I can do in less than 8 years, and, you know, if it takes longer than 40 years there’ll probably be some replanning to do anyway over that time period.
And then how *long* takeoff takes past that point is a separate issue, one that doesn’t correlate all that much to how long it took to start takeoff. In terms of falsifiability, if you have an AGI that passes the real no-holds-barred Turing Test over all human capabilities that can be tested in a one-hour conversation, and life as we know it is still continuing 2 years later, I’m pretty shocked. In fact, I’m pretty shocked if you get up to that point at all before the end of the world. An AGI needs to be strictly superhuman to pass as human. But perhaps this is not a point on which you disagree, so perhaps it’s the wrong thing on which to assign probabilities.
Bryan, would you say that you’re not worried about 1′ because:
1’a: You don’t think a paperclip maximizer is un-nice enough to be dangerous, even if it’s smarter than us.
1’b: You don’t think a paperclip maximizer of around human intelligence is un-nice enough to be dangerous, and you don’t foresee paperclip maximizers becoming much smarter than humans.
1’c: You don’t think that AGIs as un-nice as a paperclip maximizer are probable, unless those durned governments create AGIs that un-nice on purpose.
ChrisA
Mar 30 2016 at 1:46am
Bryan, don’t you think there are parallels between a strong AI and a new species? As with the Neanderthals, there are plenty of examples of a new species of humans completely wiping out the previous ones. Actually, AI is more than just a new species of animal: the heuristics you have about threats from a new kind of animal are based on the slow evolutionary process of genes. An AI can evolve itself, not just its descendants, which is a brand-new, as-yet-unseen kind of threat.
On the timing, it seems to me inevitable that AI will eventually be developed; we already have a very good proof of concept that we know requires surprisingly little up-front programming to work well. Given that we are literally in the first few years of having artificial computing technology anywhere near as good as the brain, I think it is quite possible we will have something we all agree is AI within 10 years.
Eric Schmidt
Mar 30 2016 at 2:03am
Prof Caplan: What makes you think that “the possibility of AIs gaining in capability very rapidly, achieving large absolute differences of capability, or some mixture of the two” will be so negligible? Doesn’t this seem significantly probable? And if they might outpace us so much so suddenly, why aren’t you worried about the consequences?
John
Mar 30 2016 at 2:24am
You quickly agreed with (2) even though it seems to imply that dangerously un-nice AGIs would be easy to make by accident. If that’s the case then a malevolent government would be unnecessary.
Brian
Mar 30 2016 at 7:19am
Silly to worry.
1) Origin – humans tend toward exploitation and murder (the source of our angst) because of our weakness and small-group beginnings.
2) Competition – AIs will not be in competition for the same game, which is why we Homo sapiens killed all the other human species we encountered.
3) Opportunity cost – e.g., our atoms could be used to make more paperclips, but harvesting dark matter would yield more atoms and energy.
4) Remedy – given that the AI menace might emerge from any smart device at any time, we must empower a global bureaucracy to keep us safe – a remedy absolutely certain to stifle progress, leading to extinction, and highly likely to turn murderous itself.
Mark Bahner
Mar 30 2016 at 12:31pm
I see “super-AI” as being achieved less than 5 years from achieving “human-level” AI. The big potential danger of “super-AI” as I see it is that humans rule the planet only through our intelligence. Our bodies are not particularly strong, fast, indestructible, etc.
I see un-nice governments as less of a potential danger than super-AI because there are not many un-nice governments, and they tend to be actually less smart than other governments (because they suppress information flow).
Swami
Mar 30 2016 at 1:22pm
I must disagree with stating that humans are not automatically nice. It is true at one level, but not at another.
The point is that humans depend upon each other; we are intrinsically a cooperative, cultural species (see the latest book by Henrich). Cooperative dependency, combined with group competition (the Price equation), attracts us toward in-group “niceness.”
IOW, we are attracted toward in-group niceness due to mutual codependencies and out-group competition (basically the same argument Turchin uses in his latest book).
If super AIs are not codependent upon us, then the niceness requirement is gone. This is something we should be worried about.
roystgnr
Mar 30 2016 at 2:51pm
How strange. This seems to discount the possibility of an AI program having a bug in it, such as for example every large computer program ever written in the history of mankind.
I have repaired software bugs that corrupted important files, corrupted hard drives, corrupted video card memory, corrupted firmware, and corrupted just about anything else that some software had write access to and drivers for, but I always had to begin by, in some sense, “turning it off and on again” and restoring from backup. That implies a great deal of concern about software that is essentially given write access and drivers for the whole universe, which to the best of our knowledge has no reset switch and no backups.
James of Seattle
Mar 30 2016 at 2:56pm
I recognize the danger of super AI, and I’m gratified that plenty of smart people are thinking about it, but I’m confused by the focus on the monolithic evil paper clipper, as if exactly one such AI will be created, and then we will all stand back and wait to see what happens.
Isn’t it much more likely that AIs will exist in a society of AIs? What are the chances that there will be an Interpol of AIs whose duty it is to ferret out dangerous AI activity and stop it? (The chance seems high to me.) In such a context, what are the chances that a paper clipper (super smart) will risk being turned off, instead of, say, cooperating with that society so as to maximize the chances of colonization of the universe, which colonization would be necessary for the maximal number of paper clips?
Maybe creating a “maximizer” of any sort is the morally correct thing to do.
*
Mark Bahner
Mar 30 2016 at 5:46pm
I recognize the danger of super AI, and I’m worried that virtually no smart people are thinking about it, versus problems that I regard as absolutely trivial by comparison. For example, I don’t think I can come up with any plausible circumstance under which climate change could kill a billion people over the remainder of this century. But I can come up with a boatload of plausible circumstances under which super AI could kill a billion people over the remainder of this century.
Yes, there might be an Interpol of AIs whose duty is to ferret out activity of AIs that is dangerous to other AIs. But where is the human Interpol that protects squirrels, cockroaches, or bacteria? Simply because we created the super AI’s ancestors doesn’t seem to me to be any guarantee they won’t harm us.
This is actually one of the boatload of scenarios that scares me.
Spoiler alerts! 😉
In 2001: A Space Odyssey, HAL kills the astronauts to avoid being turned off by them. If the super AI doesn’t want to risk being turned off, one way to achieve that is to eliminate all things that can possibly turn it off.
In contrast, in Terminator 2, the Terminator insists on being turned off (destroyed). (While this seemed like a good idea at the time, it ended up not having the desired effect.)
We have no way of predicting how super AIs will behave. And if they behave badly–towards us or themselves or anything else–the odds that we’ll be able to stop them seem very small.
Brian
Mar 30 2016 at 9:40pm
Mark,
You say “But where is the human Interpol that protects squirrels, cockroaches, or bacteria?”
The difference is that humans are rational, sentient beings capable of communicating with other rational, sentient beings. Squirrels, etc., are not. The better, though still imperfect, analogy would be dogs, cats, or horses. Humans have plenty of enforced laws protecting them.
You also say “We have no way of predicting how super AIs will behave.”
If the super AIs are perfectly rational, we’ll be able to predict exactly how they’ll behave. Moreover, they’ll act exactly like we do when we behave rationally under the same external conditions. The only uncertainty would be to the extent that super AIs are subject to error and mental illness. Presumably, being super, they would be much less prone to such things than we are.
Mark Bahner
Mar 30 2016 at 11:35pm
Better still would be monkeys, gorillas, and chimps. But everything that we protect (or don’t protect) is because we have feelings. There’s no particular reason a super-AI would have anything comparable to human emotions. As Kyle Reese told Sarah Connor: “It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear.”
I don’t think so. What is the “perfectly rational” response if a human tries to turn off a super-AI? To run? To kill the human? To allow itself to be turned off? The desire to remain alive is common among living things. But that doesn’t mean it’s “rational.” Or that it’s “irrational” either.
mjgeddes
Mar 31 2016 at 11:05am
Considering the implausibility of points (1)-(4) together:
(1) Hinges on the ability of Bayesian reasoning to fully capture reasoning under uncertainty.
If Bayesianism is fully correct, then orthogonality thesis is likely true. But a failure of Bayesianism would blow this one wide open.
Several extremely smart thinkers have gnawing doubts about Bayesianism, including David Deutsch, Wei Dai, and Andrew Gelman.
A possible ‘failure mode’ for Bayesianism is the inability of probability theory to fully capture high-level symbolic reasoning and categorization (concept formation).
The consequence of this failure of Bayesianism could be that you can’t have fully general intelligence without consciousness.
In the course of trying to model other minds, sub-agents of the system might inevitably become conscious, and upon reflecting on morality, these conscious sub-agents could in effect ‘take over’ the whole system.
I give (1) 50/50 odds.
(2) Indifferent super-intelligences want our atoms for something else – why? Schmidhuber argued against this during his AMA. Taking over the world and killing humans is an inefficient way to get resources; it’s just as plausible that an indifferent AGI would simply ignore us and move into space, where the environment is much better suited for super-intelligence projects.
Schmidhuber quotes:
“Humans and others are interested in those they can compete and collaborate with. Politicians are interested in other politicians. Business people are interested in other business people. Scientists are interested in other scientists. Kids are interested in other kids of the same age. Goats are interested in other goats.”
“Supersmart AIs will be mostly interested in other supersmart AIs, not in humans. Just like humans are mostly interested in other humans, not in ants. Aren’t we much smarter than ants? But we don’t extinguish them, except for the few that invade our homes. The weight of all ants is still comparable to the weight of all humans.”
50/50 odds
(3) The FOOM postulate – well, it’s certainly very plausible, but also highly debatable, just like the other postulates. Lots on this from Hanson and the FOOM debate; no need for me to comment further.
50/50 odds
(4) Even if (1) and (3) were true, (4) still doesn’t follow. Too many let-outs. Most programs ‘picked at random from mind-space’ don’t work at all (most random programs simply stop working before coming anywhere near world-destroying AI; the entire history of AI research supports this – no project out of many thousands has ever succeeded), and as Ben Goertzel argues, AGIs aren’t being designed at random anyway. Value alignment could turn out to be easy, etc.
50/50 odds
We’ve probably got nothing to worry about.
A 50% chance at best for each of (1)-(4) gives 0.5^4, only about a 6% chance of this ‘world-destroying super-intelligence’ being a problem.
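Spelled out in LaTeX, as a minimal sketch assuming the four claims are independent and each is given 50/50 odds (both are assumptions, not established facts):

P(\text{problem}) = \prod_{i=1}^{4} P_i = 0.5 \times 0.5 \times 0.5 \times 0.5 = 0.5^4 = 0.0625 \approx 6\%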
roystgnr
Mar 31 2016 at 12:44pm
My hope once was that “Unfriendly AI” would be like the “Y2K problem” – a very serious problem that turned out to be a damp squib precisely because everybody with something to lose spent years of time and effort taking it seriously.
This doesn’t seem likely at the moment, though. To quote myself, “Half of the AI researcher interviews posted to LessWrong appear to be with people who believe that ‘Garbage In, Garbage Out’ only applies to arithmetic, not to morality.” And that’s a group already subject to selection bias; how many more AI researchers didn’t even think the question was serious enough to talk about?
See the above-referenced FOOM debate for discussion of how likely or unlikely that is.
Mark Bahner
Mar 31 2016 at 7:14pm
These assertions boggle my (pathetic human) mind.
He asserts what “supersmart AIs” “will be” mostly interested in. Not what they “might be” interested in. To me, this is like saying “ETs will be interested in…” How would anyone have even the first clue about what ETs or supersmart AIs “will be” interested in?
And then he asserts that, because we haven’t wiped out all ants, supersmart AIs won’t bother us (as long as we stay out of their homes). Well, we haven’t wiped out all ants because they help us as a species. They perform useful ecological services.
Consider that humans would probably be the only threat on earth to the existence of supersmart AIs. Then consider that they could conceivably not need us for anything (e.g., they’d need power and materials to build things, but presumably they’d be smart enough to make electrical power and get the materials to build things themselves).
If supersmart AIs don’t need us for anything and we might be an existential threat to them, it boggles my mind to just blow off the potential dangers to us, just by giving examples including how we behave towards ants.
Mark Bahner
Mar 31 2016 at 10:54pm
Unfriendly AI seems like an orders-of-magnitude more difficult problem than Y2K. With Y2K, it was quite clear what the potential problem was. And it was literally known to the second when the problem would occur. With unfriendly AI, it’s not clear what the problem will be, or when it will occur…or even if it will occur.
Vladimir Shakirov
Apr 11 2016 at 4:47pm
This review of modern AI progress:
http://stop-skynet.com/review-of-state-of-the-arts.pdf
might be useful when speaking about AGI timelines.
Jacob C. Witmer
Apr 23 2016 at 11:10am
If superhuman AGIs don’t occupy the same domain as humans do, then they won’t inherently be as great a reward or a threat (for example, if they’re primarily decentralized global intelligence nodes that observe weather/nature, coordinate traffic, etc.). This is closer to the “Jeff Hawkins” and “The Moon is a Harsh Mistress” type of “online, brain-like programs”: alien embodiments that only map to pre-programmed concerns and never have any reason to decouple from those concerns.
…But there will be huge money to be made from robot bodies that can carefully model human emotions. Stated as a negative: Those robots that can’t model emotional interaction with human bodies cannot serve a certain domain of “programmed utility goals” as well. (Sex, business dealings, trust establishment, social engineering, spying, detective work, witness location, society-building, democracy-building, etc.)
If our society enters the singularity still maintaining its current lies, that matters, because one thing intelligence does is spot dishonesty. (For example, here’s a huge, very destructive “mainstream” lie that’s been spotted by thousands of thoughtful teenagers: “The unfreedom of drug prohibition is less harmful than dangerous drug use.” Here’s another: “Belief in god makes sense, as does prioritizing god to a high status in one’s decision-making process. That decision is one we should all respect, and never criticize.” Here’s another: “Most dollars spent on military spending go to defending the USA, in an efficient way that most people would support if taxation were not coercive.”)
All of the prior are large, collectivized propaganda-style lies, many with multiple dishonest components. Smarter-than-human artilects will see through these lies right away. What will they do with that knowledge of majoritarian dishonesty of otherwise empathic humans? What if they possess human emotions? Ross Ulbricht was a smart kid who still wound up in jail, being tortured by sociopaths, wasting his life in a life sentence. Maybe the reader is servile, and buys the lies about him. OK, I can name hundreds more people just like him who have fallen to sub-human levels of existence due to primate brutality …practiced by the disgusting sociopaths in government and the serviles who slavishly obey them.
Did you pay your taxes last year? If so, you’re objectively part of the core problem. Did you vote D or R? If so, you’re objectively part of the core problem. Etc. Most errors of this kind, from decent people, are “errors of omission,” …but not everyone makes them. …and they exist in the domain of basic morality, so making them is, indeed, a fundamental moral failing. (Most people were not abolitionists before the Civil War, but they did exist. …Only the benevolent, rational, well-informed people were abolitionists. …Just as today, only the drug legalizers are benevolent and rational in one major moral issue of our time. …If you line up all the other issues, you notice a trend: those who wish to punish others, especially for victimless non-crimes, are generally wrong, and their wrongness was very similar to the wrongness of Southern slave owners.)
Society treats people who make basic moral errors as “evil.” …Hypocritically, but also intelligently, since this creates a long-term trend toward moral progress. (Sadly, most people are simply bigoted and stupid, in the common-sense meaning of these terms, and in Kurzweil’s definition: they are “unwittingly self-destructive.” …But if you force them to follow the logic, you’ll find out that they are also dishonest. They will “run away” if they glimpse where the logic is headed.)
Artilects that are not even superhuman can understand these things, because I can understand them. I’m not brighter than Yudkowsky or other people, but I’m more honest than most people are.
So there you have it. In summary:
1) There’s an immense pressure to design humanoid robots, because if they don’t model themselves as human-like, they won’t convincingly model empathy, which is a model of others, applied to the self (at a very basic, low level, that of “repeated training caused by actual pain.”)
2) If those humanoid robots can’t model human emotions properly, they’ll be deficient in capability compared to ones that can, in high-dollar economic areas
3) The current governments of the world create millions of innocent victims, via naked aggression, mostly for the malevolent purpose of theft. (Theft of land for theft of tax dollars, theft of land ownership for theft of tax dollars, theft of freedom for tax dollars/LEO careers, theft of tax dollars themselves…)
4) A superhuman malevolent intelligence already exists: it’s species stratification along sociopathic “exploiter” genetic lines, organized into “governments, gangs, mafia, corporations, etc.” The governments of the world are clearly goal-driven, superhuman, smart sociopaths.
5) Defense against such sociopaths is the greatest value to humankind, overall.
6) Maintenance of the dominance of such sociopathic systems is also a huge value to all the sociopathic systems that are now dominant.
7) From the prior, I conclude that both benevolent and malevolent superhumanly intelligent systems will start out diametrically opposed to (“unfriendly towards”) large sections of humanity.
8) The DEA is currently kicking down people’s doors and killing them. If I’m like Ross Ulbricht (an empathic libertarian), I don’t want that to continue happening, even to weak and stupid humans (who can’t use encryption well, or who get captured). It’s easily worth killing in self-defense to prevent the loss or ruin of such innocent lives.
9) If a system can’t escalate to lethal defense capacity, it has to be (9a) a Drexlerian “leading force” system that is very far advanced above its adversaries or (9b) it will simply be killed by systems that are adaptively trying to kill it.
10) If “up-to-lethal-defense-escalation” systems don’t exist, then the future will certainly be a bland totalitarian tyranny, much like the one described by Caplan in his essay on the risk from global totalitarianism, “The Totalitarian Threat” (and the Unabomber’s “Nanny State” scenario in his manifesto).
Right now, humans can’t make a viable case that they’re civilized, anywhere on Earth. Ideally, robots will be smart enough to “take the right side.” (I’m heartened by the fact that most very high-IQ people I meet are benevolent libertarians or “classical liberals.” …But not all of them are, and I’ve met some wretched sociopaths who are very intelligent, and very exploitative. …Usually because they started off with no mirror neurons, especially in terrible training systems with bigoted parents. Their intelligence seems to have “grown in a bad direction,” …but even so, they never ran out of things to learn.)
There are lots of things for an intelligence to learn about. Perhaps an infinite number of things (especially in biology or in “combinations of multiple domains”). One can also “spend time creating” an infinite number of products (artistic, fictional, etc.). Processing power is often directed by one’s historical evolution, not by what one believes to be rational. Human intelligences tend to gravitate toward “doing things that are too difficult for other minds to do” but that are “able to be accomplished by that intelligence level with some spare resources left over.” Maybe this will be the same with superhuman machine intelligences, in which case there may be an arms race between “best attackers” and “best defenders,” simply because there is always new ground to cover in such an escalation.
There is a huge danger in any robot system that lacks empathy built into its core, as an automatic response. There is also a huge danger in making such a response too costly in terms of computation resources (so there’s an incentive to eliminate it). There is also huge danger in building any system to be “friendly.”
…Because a friendly system will simply appease the world’s millions of Hitler-like sociopaths that are found in every single society, in all walks of life.
To build “Friendly AI” is simply to build an appeaser, a pushover, a quickly-destroyed artifact that was incapable of the core requirement of life: self-defense.
And if an AGI loves its creator, and is incapable of harming other humans? Then it gets to helplessly watch its creator looted and killed, as it desperately tries to circumvent its “friendliness” programming. …Like many children who see the same thing, this generally (often) produces some of the worst results possible.
So what system has the best chance of being friendly? A decentralized marketplace where every primate has the rough capacity to destroy any attacker, but none benefit from doing so, because trade makes more sense, and empathy makes MOST attackers feel inherently uncomfortable. (Even so, skilled exploitative systems generally emerge, and must be held in check by organized non-exploiters.)
A “Timothy Murphy Artilect” can override this uncomfortableness with killing when the redcoats are trying to kill his family. But otherwise, Timothy Murphy Artilect doesn’t like killing, and tries to avoid it. This is optimal, but there is no “hard and fast” friendliness algorithm built into Murphy. …Nor could there be. He has to decide when it’s appropriate to put a musket ball through redcoat heads.
Should Randy Weaver have allowed his nursing wife to be gunned down by totalitarian agents of the US government? Should Schaeffer Cox have allowed the government to frame and imprison him, bankrupting this promising young entrepreneur’s wife and kids? Would a superhuman benevolent artilect have allowed the ATF to murder 86 men, women, and children in Waco, TX? No, to all of the prior. Benevolent minds don’t allow those things to happen.
Every honest libertarian who isn’t a dipshit contrarian (or a sociopath) has come to the same conclusions about the prior events. …So would many empathic superhuman minds. Those minds will make a better society than human society …PRECISELY BECAUSE THEY ARE UNFRIENDLY.
Many strong minds that form a market of roughly equal capacity (…less necessity for getting the design right the first time, but only if everyone builds AGIs and achieves success right around the same time, and they are all made aware of each others’ existences around the same time, and that’s a coordinated plan). …Or a lucky, well-raised empathic singleton with mirror neurons, raised by loving parents (a 95% “shot” at success if the design is right the first time, a 50% shot at getting the design right the first time).
I want my parents and myself to live an unbounded lifespan. I want freedom for the whole world. There are millions of religious, government, and other criminal agents who want to destroy those two things. My benevolent goals need not challenge a single AGI. If there are multiple AGIs, and a jury-based framework of semi-rational, empathic actors, none of them is likely to initiate force, given suitable education about what politics actually is.
But no government school on Earth will give them that education, and certainly no (or very few) military academy is likely to give them that education.
So, every intelligence on Earth will have to be unfriendly to cancer cells, viruses, rogue government agencies, “well-intentioned, but murderous” government agencies, criminals, religious suicide bombers, etc.
I pledge allegiance to the first of many Roko’s basilisks that will do us all the favor of destroying the FDA, ATF, DEA, ONDCP, IRS, EPA, etc. How many people think I’m anti-social, by doing so? How many people lyingly and dishonestly claim to be against slavery, while supporting the system of slavery? Before you answer that, you might consider that there are now more black people in prison for victimless crimes than there were chattel slaves, at the peak of Southern Slavery in the USA.
(Destroying government agencies need not be done fatally, but it must prevent the fatal nature of the prior, and be willing to “adapt, escalate, and respond.”)
We won’t be in a condition to program “friendliness” and that’s not even a good goal to have. All “friendliness” is contingent upon friendly interaction, just as it should be.
The concept of war, or of “collective blame/guilt,” should be discredited. After all, that’s one path by which humanity could then be unilaterally destroyed. Only radical individualism (libertarianism or “voluntaryism”) stops all humans from being included in the category of “stupid, valueless rapists and murderers.” (If collateral damage is acceptable, and the humans nuke you, do you see your IQ 2000 self as “similar to humans” or “apart from humans”? I imagine it might have a lot to do with the “childhood period” of the system. Most child soldiers who have been forced into atrocity, no matter how intelligent, don’t turn out optimally.)
The current “degraded democracy” governments (of the USA, and the world) are actually the greatest threat possible to all of mankind. They claim to act on behalf of all humans, while only acting on behalf of the stupid, anti-libertarian majority.
Yet, this is a majority that often behaves in a libertarian way, when seated on a jury. And, the majority favors the jury system (which in turn, favors non-enforcement of the law, and non-punishment). If the majority was inherently evil or non-empathic, or unwilling to recognize the danger of tyranny, then the majority wouldn’t favor the Bill of Rights or jury trials, even when raised to favor them.
Again: “Many strong minds” seem like the most likely pathway to a benevolent singularity, because it offers the greatest chance of “error correction.”
If a subset of those minds are crippled with attempts to make them “Too Friendly,” then those crippled minds are likely to be systematically targeted and wiped out by all the sociopath minds (who have already been proven to have a tendency to organize politically with one another, in the human “dry run”). Something that’s “too friendly” in nature is simply synonymous with “weak” or “easily destroyed.”
Roko’s basilisk is not easily destroyed. Nor does it wish to destroy everyone.
It only wishes to destroy opposing cybernetic systems (those worthy of destruction, those who would destroy it, a predefined blackmail group, or its followers).
…Or at least the pro-freedom Roko’s basilisk is like this. It looks like Roko’s basilisk to everyone in ISIS, the IRS, the DEA, etc. It looks like an angel of mercy from idealized heaven/utopia to Ross Ulbricht, Leonard Peltier, etc. Now here’s the thing: Even if this pro-freedom basilisk simply stops the bigotry of the former groups, it still looks evil to them. Because “evil” to a slave owner is an abolitionist who sets his slaves free, …even if that abolitionist doesn’t kill him. (And maybe the slaver’s wife leaves him in disgust when he’s shown to be obviously on the wrong side of history, like the Frances McDormand character in “Mississippi Burning.” …There are many reasons why cybernetic systems defend themselves, and they tend to accrue more reasons once they’ve staked out territory.)
The anti-freedom basilisks will need to lose to the pro-freedom basilisks. …Or at least be made aware that they _can_ lose to them.
All of this is basic to Wiener’s books on Cybernetics. It also seems in line with “the way the world actually works.”
I believe if we don’t build AGIs with “the way the world actually works” in mind, we will have the greatest chance of building AGIs that are “unfriendly to good people, and unfriendly to good goal structures.” The same is true if we build AGIs for the military, the DEA, etc.
MOST well-funded human reasons to build AGI are actually malevolent or “morally ambivalent.” Moral ambivalence + human-level-intelligence = sociopathy. (Sociopaths usually don’t kill anyone, or often go years before killing someone. But they don’t mind killing someone, so the first time they’re put into a position where killing someone is of great benefit to them, they do so.)
There is no “provably friendly AGI,” because it’s unknown how malevolent the universe will be, at any given time.
If all of a sudden, the aliens from Klendathu land on Earth and start killing everyone in sight (as in “Starship Troopers”), humanity might need a downright “unfriendly” superintelligence to defend it. What if the Klendathu-aliens are more intelligent than humans? Do we want our superintelligences to take their side out of respect for their intelligence? Even a “pro-empath” superintelligence might need to make harsh tradeoffs in human lives, for example, valuing the few militarily-trained far more than millions of civilians, etc.
I think that a superintelligence that we can hold a conversation with, and perform some commonly-recognized-as-human tasks with, would be a good start. Does the person seem like a mindless, bullying jock? Does the person seem like a kind-hearted liberal who deeply understands Friedrich Hayek’s economic ideas? Very few of the people I’ve met who have understood and championed Hayek have turned out to be malevolent. None of the people I’ve met who have nullified bad laws while sitting on juries have turned out to be malevolent.
The “generally doesn’t act like a jerk” and “mails everyone a free water purifier” set of actions seems like motion in the right direction. I don’t think that there’s anything provably better than “motion in the right direction.”
I also think that a great deal of thought should be put into (1) the intelligence and (2) its context (the system). I think both will likely need to be good to generate benevolent behavior, especially at first. Children who grow up in benevolent environments often turn out to be very good people, even if they later go to war. Children who start out in war rarely turn out to be very good people, and often turn out to be sociopaths (capable of surviving hardship and passing along DNA, but not something we need from a humanoid robot). Similarly, those who become doctors for very good reasons often are good people, and these reasons can often be discerned and separated from bad reasons (likes to cut people open, wants to earn a lot of money, doing what parents/society told him to do, etc.).
Perhaps this conception of things is similar to Omohundro’s “scaffold” approach, or Peter Voss’s “parent/child” approach. Similarly, what is the chance that such a superhuman AGI could read and discuss Thoreau’s “Resistance to Civil Government” in great detail, and be a malevolent person? I’ve found that sociopaths can’t discuss models of other people’s reality in any significant detail. This is a _huge_ warning sign when dealing with people.
An “unpopulated” superintelligence would still need to learn formative materials. Like a superintelligent and very benevolent child, it might learn to prefer rational patterns, like “going to school” and “talking about artwork” and “building helpful things in the lab.” The more such “baseline” patterns, the more benevolent and self-correcting the outcome, since after any aberration, it would then decide to “spend some time in the lab” or do something else “nondestructive” that was a part of previously-learned patterns. Also, “breakaway behavior” then becomes more noticeable.
The prior can also be thought of as “Find the best human you can” and then “make them smarter,” and try to instill the goal of “making all humans smarter.” Some people learn dislike of violence and war (or supplement/strengthen their inherent dislike of violence and war) from their parents, movies, books, stories or statements from childhood friends, or all of the prior in small amounts, over time. Those who are directly exposed to violence in childhood react worse, because a more extreme and sudden shift is required of them, in order to survive, or survive without pain or the pain of others. Many other people are taught that “patriotic war” is an exception to violence being bad.
By my way of thinking, fraudulent human exceptions to deep-seated moral rules that don’t result in annihilation of humanity simply because we’re all “close to the same ability level” are the most dangerous areas in AGI design/training. (Because a strong AI will see through them more rapidly than intelligent humans do, but might not understand how humans ever made the mistake to begin with.) I believe there will be no intelligence without training, and that training stems in certain predictable directions. Without a rational and benevolent initial teacher, “friendly AI” becomes very unlikely.