Why care about AI safety?

In recent years, we’ve seen AI exceed our expectations in a wide variety of domains, including playing Go, composing human-like text, writing code, and modeling protein folding. It may not be long before we create AI systems that are much more capable than humans at solving most cognitive problems.

Such powerful systems could bring great benefits, but if their goals don’t line up with human values, they could also cause unprecedented disasters, up to and including human extinction.

Rapid progress in the capabilities of current AI systems has pushed the topic of existential risk from AI into the mainstream. The abilities that GPT-4 and other recent systems display once seemed out of reach for the foreseeable future. The leading AI labs are now explicitly aiming to create “artificial general intelligence” in the not-too-distant future, and many top researchers are warning about its dangers.

Even once AI becomes as smart as humans in most domains, there’s no known impediment to it getting smarter still: just as current AI vastly outperforms us at arithmetic, future AI will vastly outperform us in science, technology, economic competition, and strategy. And once AI can do most of the work involved in AI research itself, that research will accelerate, potentially resulting in a “superintelligence” within a short time.

A superintelligent AI could be incredibly useful in the quest for human flourishing, if its actions are in line with human values. But it’s not guaranteed that they will be. A central concern of AI safety is making sure that AI systems try to do what we want, and that they keep doing so even if their circumstances change fundamentally – for example, if their cognitive capabilities exceed those of humans. This is called the “AI alignment problem”, and it’s widely regarded as unsolved and difficult.

AI alignment researchers haven’t yet figured out how to take an objective and ensure that a powerful AI will reliably pursue it. Worse, the way the most capable systems are trained today makes it hard to understand how they work at all. The research community is working on these problems, developing techniques and concepts for building safe systems.

It’s unclear whether these problems can be solved before a misaligned system causes an irreversible catastrophe. However, success becomes more likely if more people make well-informed efforts to help. We made this site to help people understand the challenges at hand and the solutions being worked on. The related questions below are a good place to start learning more, or you can enter your questions into the search bar.