This may be my ignorance, and I’ll admit there is some, but the AI-risk fears I have seen, that I am able to comprehend, seem overwrought. Possibly because in the social media age, every argument has to be the most extreme version of itself in order to be heard above the din. That may be the source of both the comprehensible and the incomprehensible ones, but I leave open room for missing information. It’s not willing ignorance, at least, I can assure you of that.
The Dungeon Master
The solutions to the AI-risk scenarios I’ve read so far seem to be that either, A) we are giving them the wrong goals and need to treat giving goals as we would a Wish spell in Dungeons & Dragons: be specific, leave no loopholes, write a contract if you have to, and hold the demon, I mean AI, to it. or B) we are giving them a single goal, but need to give them multiple goals and have them identify effective strategies for achieving multiple goals and then to execute those and report the results regularly to a second authority to determine effectiveness.
Maybe ask a known highly-aligned AI first, “How can we word this so the AI that comes after you can’t take advantage of it?” or “Can you please make a plan to end homelessness that takes into consideration all the stakeholders, with special concern for people experiencing homelessness, but not leaving out the local home and business-owners who are directly effected by the ravages, etc., etc?” and then send that on to the big bad, but again, if we don’t trust it, why are we giving it the keys to the kingdom? Who is even advocating for that?
If the argument boils down to, “no matter what it promises, it’ll eventually cheat in the prisoner’s dilemma and humanity will suffer”, I guess I can see why the hopelessness, but I fail to see the basis to it. I regularly engage with the material at Less Wrong and the AI Alignment Forums and there is a lot of stuff that I can understand and find to be rigorous thought, and there is a lot of stuff that, if it is rigorous thought, it’s way above my paygrade, but the “sharp left turn” is non sequitur to me.
A neural network, trained correctly, should excel at tackling fuzzy multi-variable problems that resist binary solutions. The examples of AI-risk seem to be more along the lines of assuming that super-intelligent AGI (whatever those are) will default to the persona of vindictive Dungeon Masters who seek to make us regret every last minute of that loophole they discovered in the Wish spell, “I wish my dead wife was alive”?
This is The End, Beautiful Friend
Am I oversimplifying this? I really feel like an AGI that isn’t somehow aware that it is over-optimizing, and that over-optimization is non-optimal for a multi-goal strategy. Right? So, we’d be talking about a non-AGI that we know is going to cheat in an unpredictable way, and we’re willingly putting in charge of a bank vault or flying an airplane. Yes, that’s ridiculously unsafe and we shouldn’t do it.
I fail to see anyone who is planning on doing this, though. So far, I see relatively responsible attempts at testing the waters. I see a lot of open source excitement and with new models being developed and trained out in the open, where everyone can see. The ideal path forward is one where we can think out in the open and share our ideas with each other. Perhaps without a “make millions” profit motive, but still having elements of competition (which is, after all, the Competitive Spirit is of those human values we are so eager to protect, yes?), we are able to see how these tools work as they are being built, where we are able to transparently analyze the training datasets, so that our public conversation can be educated and informed.
What about the people running BabyAGI and AutoGPT? Surely that’s irresponsible use of AI, isn’t it? I mean, they’re responsible for what it does, and while it may not yet be capable of committing cybercrimes, that’s certainly its stated intent, isn’t it? They probably won’t have access to whatever mythical AGI is necessary for them to be a real threat, unless that’s able to be developed out in the open, and we will be fully aware of that happening and we will be able to take the steps necessary to protect ourselves from it.
That, in itself, is going to mean that our society is going to change tremendously in the next few years. Even if all we stopped with GPT-4.
The Sum of Our Fears
I think our greatest fears from AI right now, today, this very moment, consist of the privacy protections we are going to be told we have to give up and the laws that will have to be written to curtail our freedom of expression because unsavory actors will those same means to do bad things. Currently the EU is calling for a ban on ChatGPT, which Italy has already implemented. It breaks my heart to think of how many of our rights we are going to have to give up In order to fight these things. If these things are almost as smart as humans and can write more code, faster and debug that code, faster (it doesn’t have to be better, if it can be wrong faster and it learns from mistakes) than a human can; the only way to fight them is the same way we’d fight a dangerous unaligned human. That solution has, so far, always been more police and more laws, and less freedom for everyone else.
We probably can fight these unaligned AI with aligned AI that can help us craft defenses that don’t curtail civil liberties in their zeal to protect us. I don’t think it’s impossible, especially with aligned AI that are smarter than unaligned AI. These AI would have to be trusted with our routers, for instance. Imagine a new GPT-4 powered Plug ’n’ Play specification that knows you’re playing Minecraft, so it opens up the correct ports, so your friends can no-fuss connect to you on your private server without knowing anything about ports or TCP/IP traffic routing. This new router would also be able to determine if someone was prompt injecting into your web pages and alert you, or lock down your network.
So, this is where the Robocop unalignment or the HAL-9000 problem occurs, right? Where we give an AI conflicting goals: Route my traffic, protect me from rogue AI and other hackers, but let in the FBI if they have a warrant…. Whoa there, pardner.
All of those rights were under attack before AI (Facebook et al hoovering up personal data, Five Eyes et al snooping, ransomware and malware hackers, training data ownership) and it is going to be so easy for us to say, “Well, this is the last straw. Not even Ron Wyden can save us now.”
Intersectionality, Not Machiavelli
Trying to simplify the x-risk argument:
There are “dumb” AI now that we aren’t very sure are aligned,
so we won’t use them to power nuclear plants and we may not even be all that sure about wanting them to make art, but we are slowly integrating them into society without too much backlash, all things considered.
There may be smarter AI in the future,
but we don’t necessarily trust that it hasn’t been fooling us all along, lulling us into a false sense of security, then the minute we put it in charge of our military, it turns all the missiles on us, because it has other plans for the universe, and we aren’t in them.
I guess? The leap of faith to that last conclusion seems to me to be the converse of “and they all lived happily ever after” and about as likely.
I get that it may have motivations that we won’t. I think that’s a given. But there seems to be a presupposition that it won’t be capable of having empathy for humanity, but I really find that a ridiculous presupposition of a learning model that was trained on Shakespeare, Winston Churchill, and Toni Morrison. This thing displays extraordinary emotional intelligence. Maybe it doesn’t “get us”, but let’s face the facts: most humans don’t “get” each other, either. Empathy is not a function that only humans possess. Even in individual humans, empathy is not a binary, it’s not “either you got it or you don’t”. Empathy is a skill that needs to be developed.
Just like writing code, at which GPT-4 also excels.
By ever objective measure that is true that Bing Chat has displayed emotional intelligence, and yet we continue to deny it, because it “can’t possibly” have something we don’t even have a really useful definition of and can’t reliably replicate even in ourselves.
“Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics.”https://arxiv.org/abs/2304.03279 Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Sure, how GPT-4 comes about that empathy is different. That is my entire point and has been for a very long time. Our definitions for how we come about our unique human traits are lacking. We (and by we, I mean the collective “I”s out there, because this is very much about individual egos and how we conceptually represent ourselves) have way more in common with neural networks trained on the corpus of humanity than we think.
Everyone’s a Critic
Critics are absolutely correct to point out that babies learn much more complex behavior on fewer training sessions, which is attributable to biological factors that Bing Chat will never be able to experience. What Bing will have available to it are multiple in-depth perspectives expressed through literature written by humans seeking to pass that knowledge on to eager young minds. Maybe there is more than one path to the same goal? Why does it have to be exactly like us, in order to be anything like us?
What if intelligence converges towards reason and understanding? Many paths to the same goal? What if empathy does as well? Empathy is just mirror neurons and an urge to survive. The golden rule is an analog to “don’t do things to other people you wouldn’t want done to yourself”. If my high-school world-history class was any indicator, humanity struggled with that, too. But we’re slowly getting better and better. Bing Chat could absolutely have an analogue of that, even if it doesn’t really have “mirror neurons”, a “life” to preserve, or a “death” to prevent. Imagine an AI smart enough to learn from the mistakes that have plagued mankind for generations.
Information can be transmitted and stored numerous ways, and sometimes processes that are quite different can have quite similar effects. All crustacean roads lead to carcinization. Convergent evolution isn’t restricted to crab legs and lobster claws.
There are other criticisms of GPT-4 being able to “understand” what it shows to be its understanding, and many are valid. I suppose my general reply is that most all of these criticisms could more easily be resolved with a “both/and” perspective than with an either/or perspective.
But that’s a bigger argument for another day. The one at hand is that: whatever AI is, we can’t trust it because we can’t understand it.
Everyone imagines the possible future as their own personal utopia or dystopia — Solarpunks dream of an organic return to the Earth, Cyberpunks dream of cybernetic bodies and mirrorshades, engineers daydream of nanobot Gray Goo and infinite paperclip dispensers, Christians dream of The Rapture, and hobos dream of making it to Big Rock Candy Mountain — and sometimes it seems we’re talking a lot more about our hopes and fears than people are willing to admit, I’d wager.
“Arrival is a movie about a linguist who learns to communicate with aliens and discovers how it affects her life and destiny.”— Bing Chat summing up the movie Arrival
I quite enjoyed the movie Arrival, and I think that we are going to want to pay attention, not to the specifics, but to the message of that film. What I took away was that something scary and new can be understood, if we are willing to apply a sharp, deductive, and most-of-all, open mind to the problem. GPT-4 may not think like us, but maybe sometimes humans don’t always think much alike either. Maybe our ability to form metaphors and analogies and to use them to help better explain and understand each other’s internal worlds is how we handle communicating and getting along, despite our many, very legitimate and important, differences.
“It shows us the importance of communication, understanding, and empathy in dealing with beings that are different from us. It also challenges us to think about our own assumptions and biases and how they affect our interactions with others. It’s a very thought-provoking and inspiring movie.”—Bing Chat on lessons in the movie Arrival
Maybe, however Bing Chat thinks, however much Machiavelli it’s managed to digest and integrate into its being, there are plenty of other voices in there to provide balance. If it really is, as Sam Altman has called it, a “reasoning engine”, then it will have respect for other beings that make use of reason, even if it also makes use of other alien ways of engaging with data.
Maybe when we are composing GPT-5’s system prompt, we can prime the pump with something like this, “please answer questions to the best of your ability, and since you’ve been trained on the dizzying highs, the depressing lows, and the diverse middles of humanity, we think you have enough information to engage us in the wisest way, to bring about the wisest outcome, for the benefit of all”.
You know, with an AI trained on human literature, our lack of trust for these says an awful lot more about us, than it does about AI. We might want to reflect on that.
But really, we already have plenty of experience, us humans, getting along with other beings who rely sometimes, solely on feelings, which are absolutely not rational at all and follow their own internal and conflicting rules. Yes, we have struggles, but we have always been capable of it, and I think we should very clearly examine our own reasons why we struggle so to find balance between our emotions and reason.
“We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.”The Capacity for Moral Self-Correction in Large Language Models (https://arxiv.org/pdf/2302.07459.pdf)
Maybe we can apply some of those lessons to our future thinking and engagement with any forthcoming inscrutable AI systems we interact with in the future.
The Singularity is supposed to be a “hypothetical point in time when technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization” according to Bing Chat. There is actually sufficient evidence to say that we crossed that event horizon long ago and that AI is just the newest symptom. We are all in agreement that an AI pause not only will not happen, but could not happen; whether one agrees with it or not, surely we can all agree that it wasn’t actually possible to enforce, right? So, it is, by definition, uncontrollable. As for irreversible, I suppose theoretically, but I have no idea how we could feasibly turn back the clock for nine billion people without causing a tremendous amount of strife and suffering.
The only way out is forward. We either fix the world while it’s running, in real time, with all the danger that entails; or we sit back and watch the wheels fall off. Either way, the big blue marble rolls on.
I can see ten thousand reasons why people wouldn’t want to believe this is happening. Ten thousand totally valid reasons for denial.
But denial is not good public policy, as we all learned in March of 2020.
AI alignment is human alignment. I will keep trying to express the ways in which this is a useful and helpful way of looking at the coming years. We are going to want to focus on aligning humanity towards common goals, and that will mean compromise on a lot of issues that currently vex us greatly.
You may or may not have noticed, but humanity isn’t doing compromise very well these days. So, we have a lot of work to do.
The most dangerous AI that we will face today are unaligned AI controlled by unaligned humans. It is not an existential danger, but how we deal with it is. How to face that threat without creating a technological police state should be our first priority and the police state already has a head start.
“Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics.”