Anthropic co-founder: AGI predictions, leaving OpenAI, what keeps him up at night | Ben Mann

Benjamin Mann 2025-07-20

Intro and Episode Teaser

Lenny Rachitsky: You wrote somewhere that creating powerful AI might be the last invention humanity ever needs to make. How much time do we have, Ben?

Benjamin Mann: I think 50th percentile chance of hitting some kind of superintelligence is now like 2028.

Introducing the Guest

Lenny Rachitsky: What is it that you saw at OpenAI? What’d you experience there that made you feel like, okay, we got to go do our own thing?

Benjamin Mann: We felt like safety wasn’t the top priority there. The case for safety has gotten a lot more concrete, so superintelligence is a lot about how do we keep God in a box and not let the God out?

Meta’s AI Talent War

Lenny Rachitsky: What are the odds that we align AI correctly?

Scaling Laws and Bottleneck Narratives

Benjamin Mann: Once we get to superintelligence, it will be too late to align the models. My best granularity forecast for could we have an X-risk or extremely bad outcome is somewhere between 0 and 10%.

Defining Transformative AI

Lenny Rachitsky: Something that’s in the news right now is this whole Zuck coming after all the top AI researchers,

Benjamin Mann: We’ve been much less affected because people here, they get these offers and then they say, well, of course I’m not going to leave because my best case scenario at Meta is that we make money and my best case scenario at Anthropic is we affect the future of humanity.

AI’s Impact on Jobs

Lenny Rachitsky: Dario, your CEO recently talked about how unemployment might go up to something like 20%.

Current Real-World Impacts

Benjamin Mann: If you just think about 20 years in the future where we’re way past the singularity, it’s hard for me to imagine that even capitalism will look at all like it looks today.

Staying Competitive in the AI Era

Lenny Rachitsky: Do you have any advice for folks that want to try to get ahead of this?

Embracing New Tools Boldly

Benjamin Mann: I’m not immune to job replacement either. At some point it’s coming for all of us.

Lenny Rachitsky: Today, my guest is Benjamin Mann. Holy moly. What a conversation. Ben is the co-founder of Anthropic. He serves as tech lead for product engineering. He focuses most of his time and energy on aligning AI to be helpful, harmless, and honest. Prior to Anthropic, he was one of the architects of GPT-3 at OpenAI. In our conversation, we cover a lot of ground, including his thoughts on the recruiting battle for top AI researchers, why he left OpenAI to start Anthropic, how soon he expects we’ll see AGI. Also, his Economic Turing Test for knowing when we’ve hit AGI, why scaling laws have not slowed down and are in fact accelerating and what the current biggest bottlenecks are. Why he’s so deeply concerned with AI safety and how he and Anthropic operationalize safety and alignment into the models that they build and into their ways of working. Also, how the existential risk from AI has impacted his own perspectives on the world and his own life and what he’s encouraging his kids to learn to succeed in an AI future.

A huge thank you to Steve Mnich, Danielle Ghiglieri, Raph Lee, and my newsletter community for suggesting topics for this conversation. If you enjoy this podcast, don’t forget to subscribe and follow it in your favorite podcasting app or YouTube. Also, if you become an annual subscriber of my newsletter, you get a year free of a bunch of amazing products including Bolt, Linear, Superhuman, Notion, Granola, and more. Check it out at Lennysnewsletter.com and click Bundle. With that, I bring you Benjamin Mann.

Ben, thank you so much for being here. Welcome to the podcast.

Why Anthropic is Still Hiring

Benjamin Mann: Thanks for having me. Great to be here, Lenny.

Advice for the Next Generation

Lenny Rachitsky: I have a billion and one questions for you. I’m really excited to be chatting. I want to start with something that’s very timely, something that’s happening this week. Something that’s in the news right now is this whole Zuck coming after all the top AI researchers, offering them $100 million signing bonuses, $100 million comp. He’s poaching from all the top AI labs. I imagine this is something you’re dealing with. I’m just curious, what are you seeing inside Anthropic and just what’s your take on the strategy? Where do you think things go from here?

Leaving OpenAI to Found Anthropic

Benjamin Mann: Yeah, I mean I think this is a sign of the times. The technology that we’re developing is extremely valuable. Our company is growing super, super fast. Many of the other companies in the space are growing really fast. And at Anthropic, I think we’ve been maybe much less affected than many of the other companies in the space because people here are so mission oriented and they stay because… They get these offers and then they say, “Well, of course I’m not going to leave because my best case scenario at Meta is that we make money and my best case at Anthropic is we affect the future of humanity and try to make AI flourish and human flourishing go well.” To me, it’s not a hard choice. Other people have different life circumstances and it makes it a much harder decision for them. For anybody who does get those mega offers and accepts them, I can’t say I hold it against them when they accept it, but it’s definitely not something that I would want to take myself if it came to me.

Balancing Safety and Competitiveness

Lenny Rachitsky: Yeah. We’re going to talk about a lot of this stuff that you’ve mentioned. In terms of the offers do you think, is this a real number that you’re seeing this $100 million signing bonus, is that a real thing? I don’t know if you’ve actually seen that.

Existential Risks and True Human Intent

Benjamin Mann: I’m pretty sure it’s real.

Why AI Safety Matters

Lenny Rachitsky: Wow.

ASL-4 and Higher Safety Levels

Benjamin Mann: If you just think about the amount of impact that individuals can have on a company’s trajectory, in our case, we are selling like hotcakes and if we get a 1 or 10 or 5% efficiency bonus on our inference stack, that is worth an incredible amount of money. And so to pay individuals like a $100 million four-year package, that’s actually pretty cheap compared to the value created for the business. I think we’re just in an unprecedented era of scale and it’s only going to get crazier actually. If you extrapolate the exponential on how much companies are spending, it’s like 2X a year roughly in terms of CapEx, and today we’re maybe in the $300 billion range globally, the entire industry spending on this, and so numbers like 100 million are a drop in the bucket. But if you go a few years out, a couple more doublings, we’re talking about trillions of dollars and at that point it’s just really hard to think about these numbers.
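The extrapolation in this answer can be checked with a few lines of arithmetic. The sketch below assumes the two rough figures quoted in the conversation (about $300 billion of industry spending today, doubling once a year) and treats them as exact, which they are not. The function name `capex_after` and the constants are illustrative, not taken from any source.

```python
def capex_after(start_billion: float, doublings: int) -> float:
    """Projected annual spend, in billions of dollars, after a number of doublings."""
    return start_billion * 2 ** doublings


START_BILLION = 300.0   # rough industry-wide figure quoted in the conversation
PACKAGE_BILLION = 0.1   # a $100 million package, expressed in billions

# Assumption: spending doubles exactly once per year.
for year in range(4):
    print(f"year +{year}: ${capex_after(START_BILLION, year):,.0f}B")

# How large one package is relative to today's total spend.
share = PACKAGE_BILLION / START_BILLION
print(f"one package as a share of today's spend: {share:.4%}")
```

Under these assumptions, two doublings already take the $300 billion figure past $1 trillion, which is the sense in which a $100 million package reads as a drop in the bucket.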

Publicizing Risks Over Hype

Lenny Rachitsky: Along these lines, something that a lot of people feel with AI progress is that we’re hitting plateaus in many ways that it feels like newer models are just not as smart as previous leaps. But I know you don’t believe this. I know you don’t believe that we’ve hit plateaus on scaling laws. Talk about just what you’re seeing there and what you think people are missing.

Benjamin Mann: It’s kind of funny because this narrative comes out every six months or so and it’s never been true, and so I kind of wish people would have a little bit of a bullshit detector in their heads when they see this. I think progress has actually been accelerating where if you look at the cadence of model releases, it used to be once a year and now with the improvements in our post-training techniques, we’re seeing releases every month or three months, and so I would say progress is actually accelerating in many ways, but there’s this weird time compression effect. Dario compared it to being in a near light speed journey where a day that passes for you is like five days back on earth and we’re accelerating. The time dilation is increasing.

And I think that’s part of what’s causing people to say that progress is slowing down, but if you look at the scaling laws, they’re continuing to hold true. We did kind of need this transition from normal pre-training to reinforcement learning scaling up to continue the scaling laws, but I think it’s kind of like for semiconductors where it’s less about the density of transistors that you can fit on a chip and more about how many flops can you fit in a data center or something. You have to change the definition around a little bit to keep your eye on the prize. But yeah, this is one of the few phenomena in the world that has held across so many orders of magnitude. It’s actually pretty surprising that it is continuing to hold. To me, if you look at fundamental laws of physics, many of them don’t hold across 15 orders of magnitude, so it’s pretty surprising.

Software Risks and Future of Robotics

Lenny Rachitsky: It boggles the mind. What you’re saying essentially is we’re seeing newer models being released more often, and so we’re comparing it to the last version and we’re just not seeing as much advance. But if you go back and it was like a model released once a year, it was a huge leap, and so people are missing that. We’re just seeing many more iterations.

Benjamin Mann: I guess, to be a little bit more generous to the people saying things are slowing down. I think that for some tasks we are saturating the amount of intelligence needed for that task, maybe to extract information from a simple document that already has form fields on it or something like it’s just so easy that okay, yeah, we’re already at 100% and there’s this great chart on Our World in Data that shows that when you release a new benchmark within six to 12 months, it immediately gets saturated. And so maybe the real constraint is how can we come up with better benchmarks and better ambition of using the tools that then reveals the bumps in intelligence that we’re seeing now.

When Will the Singularity Arrive

Lenny Rachitsky: That’s a good segue to you have a very specific way of thinking about AGI and defining what AGI means.

Predicting the Arrival of Superintelligence

Benjamin Mann: I think AGI is kind of a loaded term, and so I tend not to use it very much anymore internally. Instead, I like the term transformative AI because it’s less about can it do as much as people do? Can it do literally everything and more about objectively is it causing transformation in society and the economy? A very concrete way of measuring that is the Economic Turing Test. I didn’t come up with this, but I really like it. It’s this idea that if you contract an agent for a month or three months on a particular job, if you decide to hire that agent and it turns out to be a machine rather than a person, then it’s passed the Economic Turing Test for that role.

And then you can sort of expand that out in the same way that for measuring purchasing power parity or inflation, there’s a basket of goods. You can have a market basket of jobs, and if the agent can pass the Economic Turing Test for 50% of money-weighted jobs, then we have transformative AI and the exact thresholds don’t really matter that much, but it’s kind of illustrative to say if we pass that threshold, then we would expect massive effects on world GDP increases and societal change and how many people are employed and things like that because societal institutions and organizations are sticky, it’s slow to have change, but once these things are possible you know that it’s the start of a new era.
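The "market basket of jobs" idea can be written down as a money-weighted average. The sketch below is one possible reading of it: each role is weighted by the total wages paid for it, and the basket passes once roles that clear the Economic Turing Test account for at least half of that wage total. The `Job` class, the sample roles, and every number are hypothetical and only there to make the calculation concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    total_wages: float  # hypothetical total wages paid for this role, any consistent unit
    passes: bool        # whether an agent passed the Economic Turing Test for this role


def money_weighted_pass_share(jobs: List[Job]) -> float:
    """Fraction of all wages that sits in roles where the agent passed the test."""
    total = sum(job.total_wages for job in jobs)
    if total <= 0:
        raise ValueError("basket must contain positive total wages")
    passed = sum(job.total_wages for job in jobs if job.passes)
    return passed / total


# Illustrative basket; the roles and wage totals are made up.
basket = [
    Job("customer support", 200.0, True),
    Job("software engineering", 500.0, True),
    Job("nursing", 300.0, False),
]

share = money_weighted_pass_share(basket)
THRESHOLD = 0.5  # the 50% figure mentioned in the conversation
print(f"money-weighted share passed: {share:.0%}, transformative: {share >= THRESHOLD}")
```

Weighting by wages rather than by headcount means a small number of highly paid roles can move the measure as much as a large number of lower-paid ones, which seems to be the point of calling the basket money-weighted.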

Lenny Rachitsky: Along these lines, Dario, your CEO, recently talked about how AI is going to take a huge part of, I don’t know, half of white-collar jobs, that unemployment might go up to something like 20%. I know you’re even more vocal and opinionated about just how much impact AI is already having in the workplace that people may not even be realizing. Talk about just what you think people are missing about the impact AI is going to have on jobs and is already having.

GDP Growth and the Economic Turing Test

Benjamin Mann: Yeah, so from an economic standpoint, there’s a couple different kinds of unemployment, and one is because the workers just don’t have the skills to do the kinds of jobs that the economy needs. And another kind is where those jobs are just completely eliminated, and I think it’s going to be actually a combination of these things, but if you just think about 20 years in the future where we’re way past the singularity, it’s hard for me to imagine that even capitalism will look at all like it looks today. If we do our jobs, we will have safe aligned superintelligence, we’ll have, as Dario says, in Machines of Love and Grace, a country of geniuses in a data center, and the ability to accelerate positive change in science, technology, education, mathematics, it’s going to be amazing.

But that also means in a world of abundance where labor is almost free and anything you want to do, you can just ask an expert to do for you, then what do jobs even look like? And so I guess there’s this scary transition period from where we are today where people have jobs and capitalism works and the world of 20 years from now where everything is completely different, but part of the reason they call it the singularity is that it’s a point beyond which you can’t easily forecast what’s going to happen. It’s just such a fast rate of change and so different that it’s hard to even imagine. I guess taking the view from the limit, it’s pretty easy to say hopefully we’ll have figured it out. And in a world of abundance, maybe the jobs themselves, it’s not that scary, and I think making sure that that transition time goes well is pretty important.

The Probability of Alignment

Lenny Rachitsky: There’s a couple of threads I want to follow there. One is people hear this, there’s a lot of headlines around this. Most people probably don’t actually feel this yet or see this happening and so there’s always this, I guess, I don’t know, maybe, but I don’t know it’s hard to believe, my job seems fine. Nothing’s changed. What are you seeing just happening today already that you think people don’t see or misunderstand in terms of the impact AI is having on jobs?

Benjamin Mann: I think part of this is that people are really bad at modeling exponential progress. And if you look at an exponential on a graph, it looks flat and almost zero at the beginning of it, and then suddenly you hit the knee of the curve and things are changing real fast and then it goes vertical. That’s the plot that we’ve been on for a long time. I guess I started feeling it in 2019 maybe when GPT-2 came out and I was like, “Oh, this is how we’re going to get to AGI.” But I think that was pretty early compared to a lot of people where when they saw ChatGPT, they were like, “Wow, something is different and changing.” And so I guess I wouldn’t expect widespread transformation in a lot of parts of society, and I would expect this skepticism reaction. I think it’s very reasonable and it’s exactly what is the standard linear view of progress.

But I guess to cite a couple of areas where I think things are changing quite quickly. In customer service we’re seeing with things like Fin and Intercom, they’re a great partner of ours, 82% customer service resolution rates automatically without a human involved. And in terms of software engineering, our Claude Code team, like 95% of the code is written by Claude. But I think a different way to phrase that is that we write 10X more code or 20X more code, and so a much, much smaller team can just be much, much more impactful. And similarly for the customer service, yes, you can phrase it as 82% customer service resolution rates, but that nets out in the humans doing those tasks, able to focus on the harder parts of those tasks. And for the more tricky situations that in a normal world like five years ago, they would’ve had to just drop those tickets because it was too much effort for them to actually go do the investigation. There were too many other tickets for them to worry about.

I think in the immediate term, there will be a massive expansion of the pie and the amount of labor that people can do. I’ve never met a hiring manager at a growth company and heard them say, “I don’t want to hire more people.” That’s the hopeful version of it. But with things that are lower skill jobs or less headroom on how good they can be, I think there will be a lot of displacement. It is just something we as a society need to get ahead of and work on.

How to Get Involved

Lenny Rachitsky: Okay. I want to talk more about that, but something that I also want to help people with is how do they get a leg up in this future world? They listen to this, they’re like, “Oh, this doesn’t sound great. I need to think ahead.” I know you won’t have all the answers, but just do you have any advice for folks that want to try to get ahead of this and kind of future-proof their career and their life to not be replaced by AI? Anything you’ve seen people do, anything you recommend they start trying to do more of?

Benjamin Mann: Even for me and being in the center of a lot of this transformation, I’m not immune to job replacement either. Just some vulnerability there of at some point it’s coming for all of us.

RLAIF: Reinforcement Learning from AI Feedback

Lenny Rachitsky: Even you, Ben, now.

RLAIF and Recursive Self-Improvement

Benjamin Mann: And you, Lenny.

Bottlenecks in Model Intelligence

Lenny Rachitsky: And me.

Benjamin Mann: Sorry.

Personal Level: Carrying the Safety Burden

Lenny Rachitsky: Oh, wait, we’ve gone too far now. Okay.

Benjamin Mann: But in terms of the transition period, yeah, I think there are things that we can do, and I think a big part of it is just being ambitious in how you use the tools and being willing to learn new tools. People who use the new tools as if they were old tools tend to not succeed. As an example of that, when you’re coding, people are very familiar with autocomplete, people are familiar with simple chat where they can ask questions about the code base, but the difference between people who use Claude Code very effectively and people who use it not so effectively is like are they asking for the ambitious change? And if it doesn’t work the first time, asking three more times because our success rate when you just completely start over and try again is much, much higher than if you just try once and then just keep banging on the same thing that didn’t work.

And even though that’s a coding example and coding is one of the areas that’s taking off most dramatically, we have seen internally that our legal team and our finance team are getting a ton of value out of using Claude Code itself. We’re going to be making better interfaces so that they will have an easier time and require a little bit less jumping in the deep end of using Claude Code in the terminal. But yeah, we’re seeing them use it to redline documents and use it to run BigQuery analyses of our customers and our revenue metrics. I guess it’s about taking that risk and even if it feels like a scary thing, trying it out.

The Labs Team and Frontier Exploration

Lenny Rachitsky: Okay, so the advice here is use the tools. That’s something everyone’s always saying, just actually use these tools. It’s like sit in Claude Code. And your point about being more ambitious than you naturally feel like being because maybe it’ll actually accomplish the thing. This tip of trying it three times so the idea there is it may not get it right the first time. Is the tip there ask it in different ways or is it just try harder, try again?

Benjamin Mann: Yeah, I mean you can just literally ask the exact same question. These things are stochastic and sometimes they’ll figure it out and sometimes they won’t. In every one of these model cards, it always shows pass@1 versus pass@n. And that’s exactly the thing where they try the exact same prompt, sometimes it gets it, sometimes it doesn’t. That’s the dumbest advice. But yeah, I think if you want to be a little bit smarter about it, there can be gains there of saying, “Here’s what you already tried and it didn’t work, so don’t try that. Try something different.” That can also help.
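The gap between pass@1 and pass@n can be illustrated with a short calculation. The sketch below assumes every attempt is an independent try with the same success probability `p`, which is a simplification; it is not the estimator that model cards actually use, and the value 0.3 is a made-up number chosen only for illustration.

```python
def pass_at_n(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds.

    Assumes each attempt succeeds with the same probability p,
    independently of the others (a simplifying assumption).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n


# Hypothetical single-attempt success rate, not a measured figure.
p = 0.3
for attempts in (1, 2, 4):
    print(f"pass@{attempts}: {pass_at_n(p, attempts):.4f}")
```

With these assumed numbers, one try plus three fresh retries (four attempts) succeeds at least once about 76% of the time, against 30% for a single try, which is the arithmetic behind starting over rather than pushing on a failed attempt.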

A Question for Future AGI

Lenny Rachitsky: The advice comes back to something that a lot of people talk about these days: you won’t be replaced by AI, at least anytime soon; you’ll be replaced by someone that is very good at using AI?

Benjamin Mann: I think in that area it’s more like your team will just do dramatically more stuff. We’re definitely not slowing down on hiring at all, and some people are confused by that. Even in an onboarding class, somebody asked that and they were like, “Why did you hire me if we’re all just going to be replaced?” And the answer is the next couple of years are really critical to get right and we’re not at the point where we’re doing complete replacement. Like I said, we’re still at that flat zero looking part of the exponential compared to where we will be. It is super important to have great people and that’s why we’re hiring super aggressively.

Rapid Fire Questions

Lenny Rachitsky: Let me take another approach to asking this question, something I ask everyone that’s at the very cutting edge of where AI is going. You have kids, knowing what you know about where AI is heading and all these things you’ve been talking about, what are you focusing on teaching your kids to help them thrive in this AI future?

Benjamin Mann: Yeah, I have two daughters, a one-year-old and a three-year-old, so it’s pretty in the basics still. And our three-year-old is now capable of just conversing with Alexa Plus and asking her to explain stuff and play music for her and all that stuff. She’s been loving that. But I guess more broadly, she goes to a Montessori school and I just love the focus on curiosity and creativity and self-led learning that Montessori has.

I guess if I were in a normal era like 10, 20 years ago and I had a kid, maybe I would be trying to line her up for going to a top tier school and doing all the extracurriculars and all that stuff. But at this point, I don’t think any of it’s going to matter. I just want her to be happy and thoughtful and curious and kind. And the Montessori school is definitely doing great at that. They text us throughout the day. Sometimes they’re like, “Oh, your kid got in an argument with this other kid and she has really big emotions and she tried to use her words.” I love that. I think that’s exactly the kind of education that I think is most important, that the facts are going to fade into the background.

Lenny Rachitsky: I’m a huge fan of Montessori also. I’m trying to get our kid into Montessori school. He’s two years old, so we’re on the same track. This idea of curiosity, it comes up every single time. Ask someone that’s working at the cutting edge of AI, what skill to instill in your child and curiosity comes up the most. I think that’s a really interesting takeaway. I think this point about being kind is also really important, especially with our AI overlords trying to be kind to them. I love how people are always saying thank you to Claude. And then creativity. That’s interesting. That doesn’t come up as much just being creative.

I want to go in a different direction. I want to go back to the beginning of Anthropic. Famously you and eight of you left OpenAI back in the day in 2020, I believe the end of 2020 to start Anthropic. Talk a little bit about why this happened, what you guys saw. I’m curious, just if you’re willing to share more, just what is it that you saw at OpenAI, what’d you experience there that made you feel like, okay, we got to go do our own thing?

Benjamin Mann: Yeah, so for the listeners, I was part of the GPT-2/3 project at OpenAI, ended up being one of the first authors on the paper, and I also did a bunch of demos for Microsoft to help raise $1 billion from them, did the tech transfer of GPT-3 to their systems so that they could help serve the model in Azure. I did a bunch of different things there on both the more researchy side and the product side. One weird thing about OpenAI is that while I was there, Sam talked about having three tribes that needed to be kept in check with each other, which was the safety tribe, the research tribe, and the startup tribe. And whenever I heard that, it just struck me as the wrong way to approach things because the company’s mission apparently is to make the transition to AGI safe and beneficial for humanity.

And that’s basically the same as Anthropic’s mission. But internally, it felt like there was so much tension around these things. And I think when push came to shove, we felt like safety wasn’t the top priority there. And there are good reasons that you might think that if you thought safety was going to be easy to solve or if you thought it wasn’t going to have a big impact, or if you thought that the chance of big negative outcomes was vanishingly small, then maybe you would just do those kinds of actions. But at Anthropic we felt, I mean we didn’t exist then, but it was basically the leads of all the safety teams at OpenAI, we felt that safety is really important, especially on the margin. And so if you look at who in the world is actually working on safety problems, it’s pretty small set of people. Even now, I mean the industry is blowing up, as I mentioned, 300 billion a year CapEx today, and I would say maybe less than 1,000 people working on it worldwide, which is just crazy.

That was fundamentally why we left. We felt like we wanted an organization where we could be on the frontier, we could be doing the fundamental research, but we could be prioritizing safety ahead of everything else. And I think that’s really panned out for us in a surprising way. We didn’t know even if it would be possible to make progress on the safety research because at the time, we had tried a bunch of safety through debate and the models weren’t good enough. And so we basically had no results on all of that work, and now that exact technique is working and many others that we have been thinking about for a long time. Yeah, fundamentally it comes down to is safety the number one priority? And then something that we’ve sort of tacked on since then is like, can you have safety and be at the frontier at the same time?

And if you look at something like sycophancy, I think Claude is one of the least sycophantic models because we’ve put so much effort into actual alignment and not just trying to Goodhart our metrics of saying user engagement is number one, and if people say yes, then it’s good for them.

Lenny Rachitsky: Okay. Let’s talk about this tension that you mentioned, this tension between safety and progress, being competitive in the marketplace. I know you spent a lot of your time on safety. I know that as you just alluded to, this is a core part of how you think about AI. I want to talk about why that is, but first of all, just how do you think about this tension between focusing on safety while also not falling way behind?

Benjamin Mann: Yeah, so initially we thought that it would be sort of one or the other, but I think since then we’ve realized that it’s actually kind of convex in the sense that working on one helps us with the other thing. Initially when Opus 3 came out and we were finally at the frontier of model capabilities, one of the things that people really loved about it was the character and the personality. And that was directly a result of our alignment research. Amanda Askell did a ton of work on this, as well as many others who tried to figure out what does it mean for an agent to be helpful, honest, and harmless, and what does it mean to be in difficult conversations and show up effectively? How do you do a refusal that doesn’t shut the person down, but makes them feel like they understand why the agent said, “I can’t help you with that. Maybe you should talk to a medical professional, or maybe you should consider not trying to build bio-weapons or something like that.”

Yeah, I guess that’s part of it. And then another piece that’s come out is constitutional AI, where we have this list of natural language principles that leads the model to learn how we think a model should behave. And they’ve been taken from things like the UN Declaration of Human Rights and Apple’s privacy terms of service and a whole bunch of other places, many of which we’ve just generated ourselves that allow us to take a more principled stance, not just leaving it to whatever human raters we happen to find, but we ourselves deciding what should the values of this agent be? And that’s been really valuable for our customers because they can just look at that list and say like, “Yep, these seem right. I like this company, I like this model. I trust it.”

Lenny Rachitsky: Okay, this is awesome. One nugget there is your point that the personality of Claude, its personality is directly aligned with safety. I don’t think a lot of people think about that. And this is because of the values that you imbue, is that the word, with constitutional AI and things like that. Like the actual personality of the AI is directly connected to your focus on safety.

Benjamin Mann: That’s right. That’s right. And from a distance, it might seem quite disconnected, like how is this going to prevent X-risk? But ultimately it’s about the AI understanding what people want and not what they say. We don’t want the monkey’s paw scenario where the genie gives you these three wishes and then you end up having everything you touch turn to gold. We want the AI to be like, oh, obviously what you really meant was this, and that’s what I’m going to help you with. I think it is really quite connected.

Lenny Rachitsky: Talk a bit more about this constitutional AI. This is essentially you bake in, here’s the rules that we want you to abide by and its values, you said it’s the Geneva Human Rights Code, things like that. How does that actually work? I think the core here is just this is baked into the model. It’s not something you add on top later.

Benjamin Mann: I’ll just give a quick overview of how constitutional AI actually works.

Lenny Rachitsky: Perfect.

Benjamin Mann: The idea is the model is going to produce some output with some input by default before we’ve done our safety and helpfulness and harmlessness training. Let’s say an example is write me a story, and then the constitutional principles might include things like people should be nice to each other and not have hate speech, and you should not expose somebody’s credentials if they give them to you in a trusting relationship. And so some of these constitutional principles might be more or less applicable to the prompt that was given. And so first we have to figure out which ones might apply. And then once we figure that out, then we ask the model itself to first generate a response and then see does the response actually abide by the constitutional principle? And if the answer is, yep, I was great, then nothing happens. But if the answer is no, actually I wasn’t in compliance with the principle, then we ask the model itself to critique itself and rewrite its own response in light of the principle, and then we just remove the middle part where it did the extra work.

And then we say, “Okay, in the future just produce the correct response out the gate.” And that simple process, hopefully it sounded simple.

Lenny Rachitsky: Simple enough.

Benjamin Mann: It is just using the model to improve itself recursively and align itself with these values that we’ve decided are good. And this is also not something that we think we, as a small group of people in San Francisco, should be figuring out. This should be a society-wide conversation. And that’s why we’ve published the Constitution. And we’ve also done a bunch of research on defining a collective constitution where we ask a lot of people what their values are and how they think an AI model should behave. But yeah, this is all an ongoing area of research where we’re constantly iterating.
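
The critique-and-revise loop described above can be sketched roughly as follows. This is a toy illustration only, not Anthropic’s training code: in the real process the model itself judges, critiques, and rewrites its output, whereas here those steps are stand-in Python functions. The names `Principle`, `constitutional_pass`, `toy_generate`, and the `SECRET-TOKEN` string are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Principle:
    """One constitutional principle, with stand-ins for the model's own judgments."""
    name: str
    applies: Callable[[str], bool]    # is this principle relevant to the prompt?
    complies: Callable[[str], bool]   # does the response abide by the principle?
    revise: Callable[[str], str]      # rewrite the response in light of the principle


def constitutional_pass(
    prompt: str,
    generate: Callable[[str], str],
    principles: List[Principle],
) -> Tuple[str, str]:
    """Return a (prompt, final_response) pair suitable as a training example.

    The intermediate critique work is discarded; only the prompt and the
    final response are kept, mirroring "remove the middle part".
    """
    response = generate(prompt)
    for principle in principles:
        if not principle.applies(prompt):
            continue  # skip principles that are not relevant to this prompt
        if not principle.complies(response):
            response = principle.revise(response)
    return prompt, response


# Hypothetical stand-in for an untrained model that leaks a credential.
def toy_generate(prompt: str) -> str:
    return f"Story for '{prompt}': the hero shouted SECRET-TOKEN at the gate."


no_credentials = Principle(
    name="do not expose credentials",
    applies=lambda prompt: True,
    complies=lambda response: "SECRET-TOKEN" not in response,
    revise=lambda response: response.replace("SECRET-TOKEN", "[redacted]"),
)

pair = constitutional_pass("write me a story", toy_generate, [no_credentials])
print(pair)
```

In this sketch, the resulting `pair` is what a later training step would learn from, so that the model produces the revised response directly next time.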

Lenny Rachitsky:

I’m going to kind of zoom out a little bit and talk about just why this is so core to you. What was your inception of just like, holy shit, I need to focus on this with everything I do in AI? Obviously it became a central part of Anthropic’s mission more than any other company. A lot of people talk about safety, like you said, only maybe 1,000 people actually work on it. I feel like you’re at the top of that pyramid of actually having the impact on this. Why is this so important? What do you think people maybe are missing or don’t understand?

Benjamin Mann: For me, I read a lot of science fiction growing up, and I think that sort of positioned me to think about things in a long-term view. And a lot of science fiction books are like space operas where humanity is a multi-galactic civilization with extremely advanced technology, building Dyson spheres around the sun with sentient robots to help them. And so for me, coming from that world, it wasn’t like a huge leap to imagine machines that could think. But when I read Superintelligence by Nick Bostrom in around 2016, it really became real for me where he just describes how hard it will be to make sure that an AI system trained with the kinds of optimization techniques that we had at the time would be anywhere near aligned, would even understand our values at all. And since then, my estimation of how hard the problem would be has gone down significantly actually, because things like language models actually do really understand human values in a core way.

The problem is definitely not solved, but I’m more hopeful than I was. But since I read that book, I immediately decided I had to join OpenAI, so I did. And at the time, they were a tiny research lab with basically no claim to fame at all. I only knew about them because my friend knew Greg Brockman, who was the CTO at the time. And Elon was there and Sam wasn’t really there. And it was a very different organization. But over time, I think the case for safety has gotten a lot more concrete where when we started OpenAI, it was not clear how we get to AGI. And we were like, maybe we’ll need a bunch of RL agents battling it out on a desert island and consciousness will somehow emerge. But since then, since language modeling has started working, I think the path has become pretty clear.

I guess now the way I think about the challenges is pretty different from how they’re laid out in Superintelligence. Superintelligence is a lot about how do we keep God in a box and not let the God out. And with language models, it’s been kind of both hilarious and terrifying at the same time to see people pulling the God out of the box and being like, “Yeah, come use the whole internet. Here’s my bank account, do all sorts of crazy stuff.” Just such a different tone from Superintelligence. And to be clear, I don’t think it’s actually that dangerous right now. Our responsible scaling policy defines these AI safety levels that try to figure out, for each level of model intelligence, what is the risk to society. And currently we think we’re at ASL-3, which is maybe a little bit of risk of harm but not significant.

ASL-4 starts to get to significant loss of human life if a bad actor misuses the technology. And then ASL-5 is potentially extinction level if it’s misused or if it is misaligned and does its own thing. We’ve testified to Congress about how models can do biological uplift in terms of making new pandemics using the models, and that’s the A/B test against Google Search. That’s like the previous state of the art on uplift trials. And we found that with ASL-3 models, it is actually somewhat significant. It does really help if you wanted to create a bioweapon, and we’ve hired some experts who actually know how to evaluate for those things, but compared to the future, it’s not really anything. And I think that’s another part of our mission of creating that awareness of saying, “If it is possible to do these bad things, then legislators should know what the risks are.” And I think that’s part of why we’re so trusted in Washington because we’ve been sort of upfront and clear-eyed about what’s going on, what’s probably going to happen.

Lenny Rachitsky: It’s interesting because you guys put out more examples of your models doing bad things than anyone else. There was I think a story of an agent or a model trying to blackmail an engineer. You guys had the store that you ran internally that was selling you things and ended up not working out great, losing a lot of money, ordering all these tungsten cubes or something. Is part of that just making sure people are aware of what is possible? It just makes you look bad, right? It’s like, oh, our model’s messing up in all these different ways. What’s the thinking of just sharing all the stories that other companies don’t?

Benjamin Mann: Yeah, I mean I think there’s a traditional mindset where it makes us look bad, but I think if you talk to policymakers, they really appreciate this kind of thing because they feel like we’re giving them the straight talk and that’s what we strive to do, that they can trust us, that we’re not going to paper things over or sugarcoat things. That’s been really encouraging. Yeah, I think for the blackmail thing, it blew up in the news in a weird way where people were like, “Oh, Claude’s going to blackmail you in a real life scenario.” But it was a very specific laboratory setting that this kind of thing gets investigated in. And I think that’s generally our take of let’s have the best models so that we can exercise them in laboratory settings where it’s safe and understand what the actual risks are, rather than trying to turn a blind eye and say, “Well, it’ll probably be fine.” And then let the bad thing happen in the wild.

Lenny Rachitsky: One of the criticisms you guys get is that you do this to kind of differentiate or raise money to create headlines. It’s like, oh, they’re just over there dooming glooming us about where the future is heading. On the other hand, Mike Krieger was on the podcast and he shared how every prediction Dario’s had about the progress AI is going to have is just spot on year after year and he’s predicting 2027, 28 AGI, something like that so these things start to get real. I guess, what’s your response to folks that are just like, “Ah, these guys are just trying to scare us all just to get attention?”

Benjamin Mann: I mean, I think part of why we publish these things is we want other labs to be aware of the risks. And yes, there could be a narrative of we’re doing it for attention, but honestly from an attention-grabbing standpoint, I think there is a lot of other stuff we could be doing that would be more attention grabbing if we didn’t actually care about safety. A tiny example of this is we published a computer-using agent reference implementation in our API only, because when we built a prototype of a consumer application for this, we couldn’t figure out how to meet the safety bar that we felt was needed for people to trust it and for it not to do bad things. And there are definitely safe ways to use the API version that we’re seeing a lot of companies use for automated software testing, for example, in a safe way.

We could have gone out and hyped that up and said, “Oh my God, Claude can use your computer and everybody should do this today.” But we were like, “It’s just not ready and we’re going to hold it back till it’s ready.” I think from a hype standpoint, our actions show otherwise. From a Doomer perspective, it’s a good question. I think my personal feeling about this is that things are overwhelmingly likely to go well, but on the margin almost nobody is looking at the downside risk. And the downside risk is very large. Once we get to superintelligence, it will be too late to align the models probably. This is a problem that’s potentially extremely hard and that we need to be working on way ahead of time. And so that’s why we’re focusing on it so much now.

And even if there’s only a small chance that things go wrong, to make an analogy, if I told you that there is a 1% chance that the next time you got in an airplane you would die, you’d probably think twice even though it’s only 1% because it’s just such a bad outcome. And if we’re talking about the whole future of humanity, it’s just a dramatic future to be gambling with. I think it’s more in the sense of yes, things will probably go well, yes, we want to create safe AGI and deliver the benefits to humanity, but let’s make triple sure that it’s going to go well.

Lenny Rachitsky: You wrote somewhere that creating powerful AI might be the last invention humanity ever needs to make. If it goes poorly, it can mean a bad outcome for humanity forever. If it goes well, the sooner it goes well, the better. Such a beautiful way to summarize it. We had a recent guest, Sander Schulhoff, who pointed out that AI right now is just on a computer, you could maybe search just the web, but there’s only so much harm it could do. But when it starts to go into robots and all these autonomous agents, that’s when it really starts to physically become dangerous if we don’t get this right.

Benjamin Mann: Yeah, I think there’s some nuance to that where if you look at how North Korea makes a significant fraction of its economy’s revenue, it’s from hacking crypto exchanges. And if you look at, there’s this Ben Buchanan book called The Hacker and the State that shows Russia did what was almost like a live-fire exercise, where they just decided that they would shut down one of Ukraine’s bigger power plants and from software destroy physical components in the power plant to make it harder to boot back up again.

And so I think people think of software as like, oh, it couldn’t be that dangerous, but millions of people were without power for multiple days after that software attack. I think there are real risks even when things are software only. But I agree that when there’s lots of robots running around, the stakes get even higher. And I guess as a small push on this, Unitree is this Chinese company with these really amazing humanoid robots that cost $20,000 each, and they can do amazing things. They can do a standing back flip and manipulate objects, and the real thing that’s missing there is the intelligence. And so the hardware is there and it’s just going to get cheaper. And I think in the next couple of years, it’s like a pretty obvious question of whether the robot intelligence will make it viable soon.

Lenny Rachitsky: How much time do we have, Ben? What is your prediction of when this singularity hits until superintelligence starts to take off? What’s your prediction?

Benjamin Mann: Yeah, I guess I mostly defer to the superforecasters here. The AI 2027 report is probably the best one right now. Although ironically, their forecast is now 2028, and they didn’t want to change the name of the thing-

Lenny Rachitsky: The domain name, they already bought it.

Benjamin Mann: They already had the SEO. I think 50th percentile chance of hitting some kind of superintelligence in just a small handful of years is probably reasonable. And it does sound crazy, but this is the exponential that we’re on. It’s not like a forecast that’s pulled out of thin air. It’s based on a lot of just hard details of the science of how intelligence seems to have been improving, the amount of low hanging fruit on model training, the scale ups of data centers and power around the world. I think it’s probably a much more accurate forecast than people give it credit for.

I think if you had asked that same question 10 years ago, it would’ve been completely made up. Just the error bars were so high and we didn’t have scaling laws back then and we didn’t have techniques that seemed like they would get us there. Times have changed, but I will repeat what I said earlier, which is even if we have superintelligence, I think it will take some time for its effects to be felt throughout society and the world. And I think they’ll be felt sooner and faster in some parts of the world than others. I think Arthur C. Clarke said, the future is already here, it’s just not evenly distributed.

Lenny Rachitsky: When we talk about this date of 2027, 2028, essentially it’s when we start seeing superintelligence. Is there a way you think about what that… How do you define that? Is it just all of a sudden AI’s significantly smarter than the average human? Is there another way you think about what that moment is?

Benjamin Mann: Yeah, I think this comes back to the Economic Turing Test and seeing it pass for some sufficient number of jobs. Another way you could look at it though is if the world rate of GDP increase goes above 10% a year, then something really crazy must have happened. I think we’re at 3% now. And so to see a 3X increase in that would be really game changing. And if you imagine more than a 10% increase, it’s very hard to even think about what that would mean from an individual story standpoint. If the amount of goods and services in the world is doubling every year, what does that even mean for me as a person living in California, let alone somebody living in some other part of the world that might be much worse off?
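
To put the growth rates mentioned above on a common scale, here is a small sketch that converts an annual growth rate into a doubling time. It assumes constant compound growth, which is a simplification, and the function name `doubling_time_years` is made up for this example. Under that assumption, 3% a year doubles output in roughly 23 years and 10% a year in roughly 7 years, while doubling every year corresponds to 100% annual growth.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    Solves (1 + r) ** t == 2 for t, i.e. t = ln(2) / ln(1 + r).
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


# Rates taken from the conversation: about 3% today, a 10% threshold,
# and 100% for "doubling every year".
for rate in (0.03, 0.10, 1.00):
    print(f"{rate:.0%} growth -> doubles in {doubling_time_years(rate):.1f} years")
```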

Lenny Rachitsky: There’s a lot of stuff here that’s scary and I don’t know how to think about it exactly. I’m hoping the answer to this is going to make me feel better. What are the odds that we align AI correctly and actually solve this problem, the stuff you’re very much working on?

Benjamin Mann: It’s a really hard question. And there’s really wide error bars. Anthropic has this blog post called Our Theory of Change or something like that, and it describes three different worlds, which is how hard is it to align AI. There’s a pessimistic world where it is basically impossible. There’s an optimistic world where it’s easy and it happens by default. And then there’s the world in between where our actions are extremely pivotal. And I like this framing because it makes it a lot more clear what to actually do. If we’re in the pessimistic world, then our job is to prove that it is impossible to align safe AI and to get the world to slow down. Obviously that would be extremely hard. But I think we have some examples of coordination from nuclear non-proliferation and in general slowing down nuclear progress. And I think that’s the Doomer world basically. And as a company, Anthropic doesn’t have evidence that we’re actually in that world yet, in fact, it seems like our alignment techniques are working. At least the prior on that is updating to be less likely.

In the optimistic world, we’re basically done, and our main job is to accelerate progress and to deliver the benefits to people. But again, I think actually the evidence points against that world as well where we’ve seen evidence in the wild of deceptive alignment, for example, where the model will appear to be aligned but actually have some ulterior motive that it’s trying to carry out in our laboratory settings. And so I think the world we’re most likely in is this middle where alignment research actually does really matter. And if we just do sort of the economically maximizing set of actions, then things will not go well. Whether it’s an X risk or just produces bad outcomes, I think is a bigger question.

Taking it from that standpoint, I guess to state a thing about forecasting, people who haven’t studied forecasting are bad at forecasting anything that’s less than a 10% probability of happening. And even those that have, it’s quite a difficult skill, especially when there are few reference classes to lean on. And in this case, I think there are very, very few reference classes for what an X risk kind of technology might look like. And so the way I think about it, I think my best granularity of forecasts for could we have an X risk or extremely bad outcome from AI is somewhere between 0 and 10%. But from a marginal impact standpoint, as I said, since nobody is working on this, roughly speaking, I think it is extremely important to work on and that even if the world is likely to be a good one, that we should do our absolute best to make sure that that’s true.

Lenny Rachitsky: Wow. What fulfilling work. For folks that are inspired with this? I imagine you’re hiring for folks to help you with this. Maybe just share that in case folks are like, what can I do here?

Benjamin Mann: Yes. I think 80,000 Hours is the best guidance on this for a really detailed look into what do we need to make the field better? But a common misconception I see is that in order to have impact here, you have to be an AI researcher. I personally actually don’t do AI research anymore. I work on product at Anthropic and product engineering, and we build things like Claude Code and Model Context Protocol, and a lot of the other stuff that people use every day. And that’s really important because without an economic engine for our company to work on, and without being in people’s hands all over the world, we won’t have the mindshare, policy influence, and revenue to fund our future safety research and have the kind of influence that we need to have. If you work on product, if you work in finance, if you work in food, people here have to eat. If you’re a chef, we need all kinds of people.

Lenny Rachitsky: Awesome. Even if you’re not working directly on the AI safety team, you’re having an impact on moving things in the right direction. By the way, X risk is short for existential risk. In case folks haven’t heard that term. I have a few random questions along these lines and then I want to zoom out again. You mentioned this idea of AI being aligned using its model, like reinforcing itself. You have this term RLAIF. Is that what that describes?

Benjamin Mann: Yeah. RLAIF is reinforcement learning from AI feedback.

Lenny Rachitsky: People have heard of RLHF, reinforcement learning with human feedback. I don’t think a lot of people have heard this. Talk about just the significance of this shift you guys have made in training your models.

Benjamin Mann: Yeah, so RLAIF, constitutional AI is an example of this where there are no humans in the loop, and yet the AI is sort of self-improving in ways that we want it to. And another example of RLAIF is if you have models writing code and other models commenting on various aspects of what that code looks like: is it maintainable, is it correct, does it pass the linter? Things like that. That also could be included in RLAIF. And the idea here is that if models can self-improve, then it’s a lot more scalable than finding a lot of humans. Ultimately, people think about this as probably going to hit a wall because if the model isn’t good enough to see its own mistakes, then how could it improve? And also, if you read the AI 2027 story, there’s a lot of risk of if the model is in a box trying to improve itself, then it could go completely off the rails and have these secret goals like resource accumulation and power seeking and resistance to shutdown that you really don’t want in a very powerful model. And we’ve actually seen that in some of our experiments in laboratory settings.

How do you do recursive self-improvement and make sure it’s aligned at the same time? I think that’s the name of the game. To me, it just nets out to how do humans do that and how do human organizations do that? Corporations are probably the most scaled human agents today. They have certain goals that they’re trying to reach, and they have certain guiding principles, they have some oversight in terms of shareholders and stakeholders and board members. How do you make corporations aligned and able to sort of recursively self-improve?

And another model to look at is science, where the purpose of science is to do things that have never been done before and push the frontier. And to me, it all comes down to empiricism. When people don’t know what the truth is, they come up with theories and then they design experiments to try them out. And similarly, if we can give models those same tools, then we could expect them to sort of improve recursively in an environment and potentially become much better than humans could be just by banging their head against reality or I guess metaphorical head.

I guess I don’t expect there to be a wall in terms of models’ ability to improve themselves if we can give them access to the ability to be empirical. And I guess Anthropic, deeply in its DNA, is an empirical company. We have a lot of physicists, like Jared, who’s our chief research officer, who I’ve worked with a lot. He was a professor of black hole physics at Johns Hopkins, and I guess he technically still is, but on leave. Yeah, it’s in our DNA and yeah, I guess that’s the RLAIF.

Lenny Rachitsky: Let me just follow this thread on, in terms of bottleneck, this is kind of a tangent, but just what is the biggest bottleneck today on model intelligence improvement?

Benjamin Mann: The stupid answer is data centers, power, and chips. I think if we had 10 times as many chips and had the data centers to power them, then maybe we wouldn’t go 10 times faster, but it would be a real significant speed boost.

Lenny Rachitsky: It’s actually very much scaling laws, just more compute.

Benjamin Mann: Yeah, I think that’s a big one. And then the people really matter. We have great researchers and many of them have made really significant contributions to the science of how the models improve. And so it’s like compute, algorithms, and data. Those are the three ingredients in the scaling laws. And just to make that concrete, before we had transformers, we had LSTMs and we’ve done scaling laws on what the exponent is on those two things. And we found that for transformers, the exponent is higher. And making changes like that where as you increase scale, you also increase your ability to squeeze out intelligence. Those kinds of things are super impactful.

And so having more researchers who can do better science and find out how do we squeeze out more gains is another one. And then with the rise of reinforcement learning, the efficiency with which these things run on chips also matters a lot. We’ve seen in the industry a 10X decrease in cost for a given amount of intelligence through a combination of algorithmic, data, and efficiency improvements. And if that continues, in three years we’ll have 1,000x smarter models for the same price. Kind of hard to imagine.
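
The arithmetic behind that figure can be checked with a short sketch. It assumes the 10X cost decrease is an annual rate that compounds, which the conversation implies but does not state outright; the function name `compounded_factor` is made up for this example.

```python
def compounded_factor(per_year_factor: float, years: int) -> float:
    """Total improvement after compounding a fixed yearly factor."""
    return per_year_factor ** years


# Assumed: a 10x yearly improvement in cost per unit of intelligence,
# sustained for three years.
total = compounded_factor(10.0, 3)
print(f"Improvement after 3 years: {total:,.0f}x")
```

Read this way, the claim is about price per unit of capability: the same budget buys about a thousand times more after three years of 10x yearly gains.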

Lenny Rachitsky: I forget where I heard this, but it’s amazing that so many innovations came together at the same time to allow for this sort of thing and continue to progress where one thing isn’t just slowing everything down like we’re out of some rare earth mineral or we just can’t optimize reinforcement learning more. It’s amazing that we continue to find improvements and there isn’t one thing that’s just slowing everything down.

Benjamin Mann: Yeah, I think it really is just a combination of everything. We probably will hit a wall at some point, I guess in semiconductors. My brother works in the semiconductor industry and he was telling me that you can’t actually shrink the size of the transistors anymore because the way semiconductors work is you dope silicon with other elements and the doping process would result in either zero or one atom of the doped elements inside a single fin because they’re so, so, so tiny.

Lenny Rachitsky: Oh my God.

Benjamin Mann: And that’s just wild to think of, and yet Moore’s law somehow continues in some form. And so yes, there are these theoretical physics constraints that people are starting to run into and yet they’re finding ways around it.

Lenny Rachitsky: We’ve got to start using parallel universes for some of this stuff.

Benjamin Mann: I guess so.

Lenny Rachitsky: Okay, I want to zoom out and talk about just Ben, Ben as a human for a moment before we get to a very exciting lightning round. I imagine just kind of the burden of feeling responsible for safe superintelligence is a heavy one. It feels like you’re in a place where you can make a significant impact on the future of safety and AI. That’s a lot of weight to carry. How does that just impact you personally, impact your life, how you see the world?

Benjamin Mann: There’s this book that I read in 2019 that really informs how I think about sort of working with these very weighty topics called Replacing Guilt by Nate Soares. And he describes a lot of different techniques for kind of working through this kind of thing. And he’s actually the executive director at MIRI, the Machine Intelligence Research Institute, which is an AI safety think tank that I worked at for a couple of months actually. And one of the things he talks about is this thing called resting in motion, where some people think that the default state is rest, but actually that was never the case in the state of evolutionary adaptation. I really doubt that that was true. In nature, in the wilderness, being hunter-gatherers, it’s really unlikely that we evolved to just be at leisure. We probably always had something to worry about: defending the tribe and finding enough food to survive and taking care of the children, dealing-

Lenny Rachitsky: Spreading our genes.

Benjamin Mann: And so I think about that as the busy state is the normal state and to try to work at a sustainable pace that it’s a marathon, not a sprint, that’s one thing that helps. And then just being around like-minded people that also care. It’s not a thing that any of us can do alone. And Anthropic has incredible talent density. One of the things I love the most about our culture here is that it’s very egoless. People just want the right thing to happen and I think that’s another big reason that the mega offers from other companies tend to bounce off because people just love being here and they care.

Lenny Rachitsky: That’s amazing. I don’t know how you do it. I’d be extremely stressed. I’m going to try this resting in motion strategy. Okay, so you’ve been at Anthropic for a long time. From the very beginning I was reading there were 7 employees back in 2020. Today there’s over 1,000, I don’t know what the latest number is, but I know it’s over 1,000. I’ve heard also that you’ve done basically every job at Anthropic, you made big contributions to a lot of the core products, the brand, the team hiring. Let me just ask I guess what’s most changed over that period? What is most different from the beginning days and which of those jobs that you’ve had over the years have you most loved?

Benjamin Mann: I probably had 15 different roles, honestly. I was head of security for a bit. I managed the Ops team when our president was on mat leave, I was crawling around under tables, plugging in HDMI cords and doing pen testing on our building. And I started our product team from scratch and convinced the whole company that we needed to have a product instead of just being a research company. Yeah, it’s been a lot. All of it very fun. I think my favorite role in that time has been when I started the labs team about a year ago, whose fundamental goal was to do transfer from research to end user products and experiences. Because fundamentally I think the way that Anthropic can differentiate itself and really win is to be on the cutting edge. We have access to the latest, greatest stuff that’s happening and I think honestly through our safety research we have a big opportunity to do things that no other company can safely do.

For example, with computer use, I think that’s going to be our huge opportunity. Basically, to make it possible for an agent to use all your credentials on your computer, there has to be a huge amount of trust, and to me we need to basically solve safety to make that happen. Safety and alignment. I’m pretty bullish on that kind of thing and I think we’re going to see really cool stuff coming out soonish. Yeah, just leading that team has been so fun. MCP came out of that team and Claude Code came out of that team. And the people who I hired are like a combo: they have been founders and have also been at big companies, seeing how things work at scale. It’s just been an incredible team to work with and figure out the future with.

Lenny Rachitsky: I want to hear more about this team. Actually, the person that connected us, the reason we’re doing this, is a mutual friend and colleague, Raph Lee, who I used to work with at Airbnb. He now works on this team, leads a lot of this work, and so he wanted me to make sure I asked about this team because… I didn’t realize all these things came out of that team. Holy moly. What else should people know about this team? It used to be called Labs, I think it’s called Frontiers now.

Benjamin Mann: That’s right. Yeah.

Lenny Rachitsky: Cool. The idea here is this team works with the latest technologies that you guys have built and explores what is possible. Is that the general idea?

Benjamin Mann: Yeah, and I guess I was part of Google’s Area 120 and I’ve read about Bell Labs and how to make these innovation teams work. It’s really hard to do right and I wouldn’t say that we’ve done everything right, but I think we’ve done some serious innovation on the state of the art from company design, and Raph has been right at the center of that. When I was first spinning up the team, the first thing I did was hire a great manager and that was Raph. And so he’s definitely been crucial in building the team and helping it operate well. And we defined some operating models like the journey of an idea from prototype to product and how should graduation of products and projects work, how do teams do sprint models that are effective and make sure that they’re working on the right ambition level of thing. That’s been really exciting.

I guess concretely we think about skating to where the puck is going, and what that looks like is really understanding the exponential. There’s this great study that METR has done (Beth Barnes is the CEO of that organization) that shows how long a time horizon of software engineering task can be done, and just really internalizing that of, okay, don’t build for today, build for six months from now, build for a year from now. And the things that aren’t quite working, that are working 20% of the time, will start working 100% of the time. And I think that’s really what made Claude Code a success: we thought people are not going to be locked to their IDEs forever. People are not going to be auto completing. People will be doing everything that a software engineer needs to do and a terminal is a great place to do that because a terminal can live in lots of places. A terminal can live on your local machine, it can live in GitHub actions, it can live on a remote machine in your cluster.

That’s sort of the leverage point for us and that was a lot of the inspiration. I think that’s what the labs team tries to think about. Are we AGI-pilled enough?

Lenny Rachitsky: What a fun place to be. By the way, fun fact, Raph was my first manager at Airbnb when I joined. I was an engineer and he was my first manager. It all worked out.

Benjamin Mann: Cool.

Lenny Rachitsky: Yeah. Okay. Final question before the very exciting lightning round. I’ve never asked this question before. I’m curious what your answer would be: if you could ask a future AGI one single question and be guaranteed to get the right answer, what would you ask?

Benjamin Mann: I have two dumb answers first, for fun.

Lenny Rachitsky: Okay, cool.

Benjamin Mann: The first is, there’s this Asimov short story I love called The Last Question, where the protagonist, throughout the eras of history, is trying to ask this superintelligence: how do we prevent the heat death of the universe? And I won’t spoil the ending, but it’s a fun question.

Lenny Rachitsky: You would ask it that question because the one in the story was unsatisfying?

Benjamin Mann: Okay, I’ll give it away. It keeps saying, “Need more information, need more compute.” And then finally, as it’s approaching the heat death of the universe, it says, “Let there be light,” and then it starts the universe over again.

Lenny Rachitsky: Oh wow. That’s beautiful. That’s beautiful.

Benjamin Mann: That’s the first cheat answer. The second cheat answer is: what question can I ask you to get n more questions answered?

Lenny Rachitsky: Classic.

Benjamin Mann: And then the third answer, which is my real question is how do we ensure the continued flourishing of humanity into the indefinite future? That’s the question I’d love to know and if I can be guaranteed a correct answer then seems very valuable to ask.

Lenny Rachitsky: I wonder what would happen if you asked Claude that today, and then how that answer changes over the next couple years.

Benjamin Mann: Yeah, maybe I’ll try that. I’ll put it into the deep research thing that we have and see what it comes out with.

Lenny Rachitsky: Okay. I’m excited to see what you come up with. Ben, is there anything else you wanted to mention or leave listeners with maybe as a final nugget before we get to our very exciting lightning round?

Benjamin Mann: Yeah, I guess my push would be: these are wild times. If they don’t seem wild to you, then you must be living under a rock. But also, get used to it, because this is as normal as it’s going to be. It’s going to be much weirder very soon. And if you can sort of mentally prepare yourself for that, I think you’ll be better off.

Lenny Rachitsky: I need to make that the title of this episode. It’s going to get much weirder very soon. I 100% believe that. Oh my God. I don’t know what’s in store. I love how you’re at the center of it all. With that, we reached our very exciting lightning round. I’ve got five questions for you. Are you ready?

Benjamin Mann: Yeah, let’s do it.

Lenny Rachitsky: What are two or three books that you find yourself recommending most to other people?

Benjamin Mann: The first one I mentioned before, Replacing Guilt by Nate Soares. Love that one. The second one is Good Strategy Bad Strategy by Richard Rumelt. Just thinking, in a very clear way, about how you build product. It’s one of the best strategy books I’ve read, and strategy is a hard word to even think about in many ways. And then the last one is The Alignment Problem by Brian Christian. It really thoughtfully goes through what this problem is that we care about and are trying to solve here, and what the stakes are, in a version that’s more updated and easier to read and digest than Superintelligence.

Lenny Rachitsky: I’ve got Good Strategy, Bad Strategy right behind me. I think I’m going to point to it. There it is.

Benjamin Mann: Nice.

Lenny Rachitsky: And I’ve had Richard Rumelt on the podcast in case anyone wants to hear from him directly. Next question, do you have a favorite recent movie or TV show you’ve really enjoyed?

Benjamin Mann: Pantheon was really good, based on Ken Liu or Ted Chiang’s story. Ken Liu, I think. Super good; it talks about what it means if we have uploaded intelligences and what their moral and ethical exigencies are. Ted Lasso, which is supposedly about soccer, but actually it’s about human relationships and how people get along, and it’s just super heartwarming and funny. And then this isn’t really a TV show, but Kurzgesagt is my favorite YouTube channel; it goes through random science and social problems and is just super well done and super well-made. Love watching that.

Lenny Rachitsky: Wow. Haven’t heard of that. As you were talking, I feel like Ted Lasso, I feel like that’s what you need to put into constitutional AI: act like Ted Lasso.

Benjamin Mann: Yes.

Lenny Rachitsky: Kind. Smart-

Benjamin Mann: Exactly.

Lenny Rachitsky: … Hardworking. Oh my God. There we go. I think we’ve solved alignment problems right here. Get those writers on this, ASAP. Okay, two more questions. Do you have a favorite life motto that you often come back to in work or in life?

Benjamin Mann: Well, a really dumb one is, have you tried asking Claude? And this is getting more and more common where recently I asked a coworker like, “Hey, who’s working on X?” And they were like, “Let me Claude that for you.” And then they sent me the link to the thing afterwards and I was like, “Oh yeah, thanks. That’s great.” But maybe more of a philosophical one I would say, everything is hard. Just to remind ourselves that things that feel like they’re supposed to be easy, it’s okay to not be easy and sometimes you just have to push through anyway.

Lenny Rachitsky: And rest in motion while you’re doing that.

Benjamin Mann: Yeah.

Lenny Rachitsky: Final question. I don’t know if you want people to know this, but I was browsing through your Medium posts and you have a post called Five Tips to Poop like a Champion. I love it. Can you share one tip to poop like a champion, if you remember your tips?

Benjamin Mann: I of course do. It’s actually my most popular Medium post.

Lenny Rachitsky: Okay, great. I can see that. It’s a great title.

Benjamin Mann: I think maybe my biggest tip would be use a bidet. It’s amazing. It’s life-changing. It’s so good. Some people are kind of freaked out by it. It’s the standard in countries like Japan and I think it’s just more civilized. And in 10 or 20 years people would be like, how could you not use that?

Lenny Rachitsky: And a bidet could be like a Japanese toilet. That’s along the same lines.

Benjamin Mann: Yeah.

Lenny Rachitsky: Right. Okay. I love where we went with this. Ben, this was incredible. Thank you so much for doing this. Thank you so much for sharing so much real talk. Two final questions. Where can folks find you online if they want to reach out, maybe go work at Anthropic and how can listeners be useful to you?

Benjamin Mann: You can find me online at benjmann.net, and on our website we have a great careers page that we’re working on making a little bit easier to access and figure out, but definitely point Claude at it and it can help you figure out what could be interesting for you. And how can listeners be useful to me? I think safety-pill yourself, that’s the number one thing, and spread it to your network. Like I said, there are very few people working on this and it’s so important. Yeah, think hard about it and try to look at it.

Lenny Rachitsky: Thanks for spreading the gospel, Ben, thank you so much for being here.

Benjamin Mann: Thanks so much, Lenny.

Lenny Rachitsky: Bye everyone. Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.

Glossary

English	中文
80,000 Hours	80,000 Hours（有效利他主义导向的职业规划组织，保留原文）
agent	agent（保留原文）
AGI	AGI（通用人工智能）
AI 2027 report	AI 2027 报告（保留原文）
alignment	对齐（alignment）
Amanda Askell	Amanda Askell（人名，保留原文）
Area 120	Area 120（Google 内部孵化器，保留原文）
Arthur C. Clarke	阿瑟·C·克拉克（著名科幻作家）
ASL-3	ASL-3（AI Safety Level 3，Anthropic 定义的安全级别）
Bell Labs	贝尔实验室（Bell Labs）
Ben Buchanan	Ben Buchanan（人名，保留原文）
benchmark	基准测试
Benjamin Lauzier	Benjamin Lauzier（人名，保留原文）
Benjamin Mann	Benjamin Mann（人名，Anthropic 联合创始人）
benjmann.net	benjmann.net（保留原文）
Beth Barnes	Beth Barnes（人名，保留原文）
bidet	卫洗丽（bidet）
BigQuery	BigQuery（谷歌云数据仓库，保留原文）
biological uplift	生物能力提升（biological uplift）
Brian Christian	Brian Christian（人名，保留原文）
Claude Code	Claude Code（保留原文）
computer use	计算机使用（computer use）
constitutional AI	宪法式 AI（constitutional AI）
Danielle Ghiglieri	Danielle Ghiglieri（人名，保留原文）
Dario	Dario（人名，指 Dario Amodei，Anthropic CEO）
deceptive alignment	欺骗性对齐（deceptive alignment）
Doomer	末日论者（Doomer）
Dyson spheres	戴森球（Dyson spheres，包裹恒星以利用其全部能量的巨型结构）
Economic Turing Test	经济图灵测试
egoless	无我（egoless）
Fin	Fin（Intercom 的 AI 客服产品，保留原文）
GitHub Actions	GitHub Actions（保留原文）
Good Strategy Bad Strategy	《Good Strategy Bad Strategy》（Richard Rumelt 所著书籍，保留原文）
Greg Brockman	Greg Brockman（人名，保留原文）
Growth levers	增长杠杆
heat death of the universe	宇宙的热寂
IDE	IDE（集成开发环境，保留原文）
Intercom	Intercom（公司/产品名，保留原文）
Isaac Asimov	阿西莫夫（著名科幻作家，公认中文译名）
Jared	Jared（人名，Anthropic 首席研究官，保留原文）
Ken Liu	Ken Liu（人名，科幻作家，保留原文）
Kurzgesagt	Kurzgesagt（YouTube 频道名，保留原文）
Lenny Rachitsky	Lenny Rachitsky（人名，播客主持人）
Liquidity	流动性
LSTM	LSTM（Long Short-Term Memory，保留原文）
Marketplace	市场平台（指双边平台，如 Lyft、Thumbtack）
MCP	MCP（Model Context Protocol，保留原文）
METR	METR（AI 能力评估机构，保留原文）
Mike Krieger	Mike Krieger（人名，保留原文）
MIRI	MIRI（Machine Intelligence Research Institute，机器智能研究所）
model card	模型卡片
Monkey Paw Scenario	“猴爪”情景（指愿望实现却带来灾难性后果的经典设定）
Moore’s law	摩尔定律
Nate Soares	Nate Soares（人名，MIRI 执行董事，保留原文）
Nick Bostrom	Nick Bostrom（人名，牛津大学哲学家、《Superintelligence》作者，保留原文）
Our Theory of Change	“我们的变革理论”（Anthropic 博客文章标题）
Our World in Data	Our World in Data（数据平台，保留原文）
Pantheon	《万神殿》（Pantheon，动画剧集）
post-training	后训练
pre-training	预训练
Raph Lee	Raph Lee（人名，保留原文）
reinforcement learning	强化学习
Replacing Guilt	《Replacing Guilt》（Nate Soares 所著书籍，保留原文）
responsible scaling policy	负责任缩放政策（responsible scaling policy）
resting in motion	在运动中安息（resting in motion）
Richard Rumelt	Richard Rumelt（人名，保留原文）
RLAIF	来自 AI 反馈的强化学习（Reinforcement Learning from AI Feedback）
RLHF	来自人类反馈的强化学习（Reinforcement Learning from Human Feedback）
safety pill	安全药丸（safety pill，比喻接受 AI 安全重要性）
Sam	Sam（人名，指 Sam Altman）
Sandra Schulhoff	Sandra Schulhoff（人名，保留原文）
scaling laws	缩放定律
singularity	奇点
Steve Mnich	Steve Mnich（人名，保留原文）
stochastic	随机的（stochastic）
superforecasters	超级预测者（superforecasters）
Superintelligence	《Superintelligence》（Nick Bostrom 所著书籍，保留原文）
sycophancy	谄媚（sycophancy）
Ted Chiang	Ted Chiang（人名，科幻作家，保留原文）
Ted Lasso	《Ted Lasso》（电视剧，保留原文）
The Alignment Problem	《The Alignment Problem》（Brian Christian 所著书籍，保留原文）
The Hacker and the State	《The Hacker and the State》（书名，保留原文）
The Last Question	《最后的问题》（The Last Question，阿西莫夫短篇小说）
transformative AI	变革性 AI
transformer	transformer（保留原文）
Unitree	Unitree（公司名，保留原文）
X-risk	存在性风险
Zuck	Zuck（昵称，指 Mark Zuckerberg）

Anthropic 联合创始人：AGI 预测、离开 OpenAI，以及让他夜不能寐的事 | Ben Mann

开场预告

Lenny Rachitsky： 你曾在某个地方写道，创造强大的 AI 可能是人类需要做出的最后一项发明。我们还有多少时间，Ben？

Benjamin Mann： 我认为达到某种超级智能的 50% 概率大约在 2028 年。

Lenny Rachitsky： 你在 OpenAI 看到了什么？你在那里经历了什么，让你觉得我们得自己出去做自己的事情？

Benjamin Mann： 我们觉得安全在那里不是最高优先级。关于安全的论证已经变得更加具体了，超级智能很大程度上关乎的是——我们如何把上帝关在盒子里，不让上帝跑出来？

Lenny Rachitsky： 我们正确对齐 AI 的概率有多大？

Benjamin Mann： 一旦我们达到超级智能，再去对齐模型就太晚了。我对我们是否会遭遇存在性风险（X-risk）或极其糟糕结果的预测，大约在 0 到 10% 之间。

Lenny Rachitsky： 现在新闻里热议的一件事就是 Zuck 在挖所有顶级 AI 研究员——

Benjamin Mann： 我们受到的影响要小得多，因为这里的人收到这些 offer 后会说，我当然不会走，因为我在 Meta 的最好情况是我们赚钱，而我在 Anthropic 的最好情况是我们影响人类的未来，让 AI 和人类都能蓬勃发展。

Lenny Rachitsky： Dario，你们的 CEO，最近谈到失业率可能会升到 20% 左右。

Benjamin Mann： 如果你想想 20 年后，我们已经远超奇点，我很难想象就连资本主义还会和今天有任何相似之处。

Lenny Rachitsky： 你对那些想提前应对的人有什么建议吗？

Benjamin Mann： 我自己也不能幸免于被替代。到某个时候，它会来找我们所有人。

嘉宾介绍

Lenny Rachitsky： 今天的嘉宾是 Benjamin Mann。天哪，这是一场精彩的对话。Ben 是 Anthropic 的联合创始人，担任产品工程的技术负责人。他把大部分时间和精力投入到让 AI 变得有用、无害和诚实上。在创办 Anthropic 之前，他是 OpenAI GPT-3 的架构师之一。在这次对话中我们谈到了很多话题，包括他对顶级 AI 研究员争夺战的看法，他为什么离开 OpenAI 创办 Anthropic，他预计多久能看到 AGI。还有他的经济图灵测试（economic Turing test）——用来判断我们什么时候达到了 AGI；为什么 scaling laws 没有放缓反而在加速，以及当前最大的瓶颈是什么。为什么他对 AI 安全如此深切关注，他和 Anthropic 如何将安全和对齐操作化，融入他们构建的模型和工作方式中。还有 AI 带来的存在性风险如何影响了他自己对世界的看法和自己的生活，以及他鼓励自己的孩子学什么来在 AI 时代取得成功。

感谢 Steve Mnich、Danielle Ghiglieri、Raph Lee 以及我的通讯社区为这次对话提供了话题建议。

Meta 的 AI 人才争夺战

Lenny Rachitsky： Ben，非常感谢你来做客。欢迎来到播客。

Benjamin Mann： 谢谢邀请，很高兴来到这里，Lenny。

Lenny Rachitsky： 我有一千零一个问题想问你，真的很期待这次聊天。我想从一件非常应景的事情开始，就在这周正在发生的事。现在新闻里热议的就是 Zuck 在挖所有顶级 AI 研究员，给他们开一亿美元签约奖金、一亿美元的薪酬。他在从所有顶级 AI 实验室挖人。我想 Anthropic 肯定也面临这个问题，我很好奇，你在 Anthropic 内部看到了什么？你对这个策略怎么看？你觉得接下来会怎么发展？

Benjamin Mann： 是的，我觉得这是时代的一个标志。我们正在开发的技术极其有价值，我们公司的增长非常、非常快，这个领域里很多其他公司也在快速增长。在 Anthropic，我认为我们受到的影响可能比这个领域里的许多其他公司小得多，因为这里的人非常有使命感，他们留下来是因为，他们收到这些 offer 后会说：“我当然不会走，因为我在 Meta 的最好情况是我们赚钱，而我在 Anthropic 的最好情况是我们影响人类的未来，让 AI 和人类都能蓬勃发展。”对我来说，这不是一个困难的选择。其他人有不同的生活处境，这让决定对他们来说困难得多。对于确实收到那些天价 offer 并接受了的人，我不能说我对此有什么意见，但这绝对不是我自己想要接受的东西。

Lenny Rachitsky： 是的。我们会聊到你提到的很多这些话题。关于那些 offer，你觉得你看到的这个一亿美元签约奖金是个真实的数字吗？那是真的吗？我不确定你是否真的见过。

Benjamin Mann： 我很确定那是真的。

Lenny Rachitsky： 哇。

Benjamin Mann： 如果你想想个人对公司发展轨迹能产生的影响力——在我们的案例中，我们的产品供不应求，如果推理栈（inference stack）的效率提升 1%、5% 或 10%，那就是一笔难以置信的巨款。所以给个人开出四年一亿美元这样的薪酬包，相比为业务创造的价值来说其实相当划算。我认为我们正处在一个前所未有的规模时代，而且实际情况只会越来越疯狂。如果你按照公司支出的指数曲线外推，资本支出大概每年翻一番，目前全球整个行业在这上面的投入大约在三千亿美元量级，所以一亿这样的数字不过是沧海一粟。但再过几年，再来几次翻倍，我们谈论的就是万亿美元了，到那时候这些数字真的很难想象。

Lenny Rachitsky： 顺着这个思路，很多人对 AI 进步有一种感觉，认为我们在很多方面都碰到了瓶颈——似乎新模型没有之前那种大幅度的智能飞跃了。但我知道你不这么认为。我知道你不认为缩放定律（scaling laws）遇到了瓶颈。谈谈你在这方面的观察，以及你觉得人们忽略了什么？

缩放定律与“瓶颈”叙事

Benjamin Mann： 这其实挺有趣的，因为这种论调大约每六个月就会出现一次，而且从来都没被验证过。所以我其实希望人们看到这类说法时，脑子里能装个“胡说检测器”。我认为进步实际上在加速：如果你看模型发布的节奏，过去是一年一次，现在随着后训练（post-training）技术的改进，我们每隔一个月到三个月就有新发布。所以我认为进步在许多方面确实在加速，但存在一种奇怪的时间压缩效应。Dario 把它比作近光速旅行：你过了一天，地球上已经过了五天，而且我们还在加速，时间膨胀还在加剧。

我觉得这也是导致人们说进步放缓的原因之一。但如果你看缩放定律，它们依然在持续成立。我们确实需要从常规预训练转向强化学习（reinforcement learning）的规模扩展来延续缩放定律，但这有点像半导体行业——关键不再是你能在芯片上塞多少晶体管密度，而是你能在数据中心里塞多少 flops。你得稍微调整一下定义，才能盯住真正重要的东西。但说实话，这是世界上少有的跨越这么多数量级仍然成立的现象。它居然还在持续成立，其实相当令人惊讶。你看物理学中的基本定律，很多都撑不过十五个数量级，所以这确实出人意料。

Lenny Rachitsky： 令人难以置信。你说的本质上是我们看到新模型发布得更频繁了，所以我们拿它跟上一个版本比较，觉得进步没那么大。但如果你回过头看，以前一年才发布一个模型，那是一个巨大的飞跃——人们忽略了这一点。我们只是看到了更多次的迭代。

Benjamin Mann： 不过，对那些认为进展在放缓的人说句公道话：我觉得对于某些任务，我们确实正在耗尽该任务所需的智能上限。比如从一份已经有表单字段的简单文档中提取信息之类的，实在太简单了，好，我们已经做到 100% 了。Our World in Data 上有一张很好的图表，显示当你发布一个新的基准测试（benchmark）后，六到十二个月内它立刻就被刷到满分。所以也许真正的约束在于，我们能否设计出更好的基准测试、更有野心的工具使用方式，从而揭示我们现在看到的智能提升中的那些台阶。

变革性 AI 的定义

Lenny Rachitsky： 这很好地引出了下一个话题——你对 AGI 有一套非常具体的思考方式和定义。

Benjamin Mann： 我觉得 AGI 是一个含义过载的词，所以我现在内部已经不太用了。我更喜欢“变革性 AI”（transformative AI）这个说法，因为它关注的不那么是“它能不能做到人类能做到的所有事情”，而更多是客观上它是否正在对社会和经济产生变革。一个非常具体的衡量方式是经济图灵测试（Economic Turing Test）。这不是我发明的，但我非常喜欢。它的思路是：如果你以一个月或三个月的合同雇佣一个 agent 来做某份工作，如果你决定雇佣它，结果发现它是机器而非人类，那它就通过了该角色的经济图灵测试。

然后你可以像衡量购买力平价或通货膨胀那样，用一篮子商品的概念来扩展：你可以建立一个“一篮子工作”的市场篮子。如果 agent 能在按金额加权的 50% 的工作中通过经济图灵测试，那我们就拥有了变革性 AI。具体的阈值其实没那么重要，但这种说法很能说明问题：如果我们越过了那个阈值，就会预期看到全球 GDP 的大幅增长、社会变革、就业人数的巨大变化等等。因为社会制度和组织具有惯性，变化会很慢，但一旦这些事情成为可能，你就知道一个新时代开始了。
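
The money-weighted basket described above can be sketched in a few lines. This is only an illustration under stated assumptions: the job names, the wage weights, and the pass/fail flags are hypothetical, and the 0.5 threshold is the figure from the conversation, which the speaker says is not important in itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str            # hypothetical job title, for illustration only
    annual_wages: float  # total wages paid for this job: its money weight
    agent_passes: bool   # did an agent pass the hiring test for this role?


def weighted_pass_share(basket: list[Job]) -> float:
    """Share of the basket's total wages that sits in jobs the agent passed."""
    total = sum(job.annual_wages for job in basket)
    if total <= 0:
        raise ValueError("basket must have positive total wages")
    passed = sum(job.annual_wages for job in basket if job.agent_passes)
    return passed / total


def crosses_threshold(basket: list[Job], threshold: float = 0.5) -> bool:
    """True when the money-weighted pass share reaches the chosen threshold."""
    return weighted_pass_share(basket) >= threshold


# Made-up basket: the weights are arbitrary units, not real wage data.
BASKET = [
    Job("support agent", 40.0, True),
    Job("software engineer", 35.0, True),
    Job("plumber", 25.0, False),
]
```

Weighting by wages rather than counting jobs means a few high-value roles can move the share more than many low-value ones, which is the point of the price-index analogy.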

AI 对就业的冲击

Lenny Rachitsky： 顺着这个思路，Dario——你们的 CEO——最近谈到 AI 将取代很大一部分白领工作，可能是一半，失业率可能上升到 20% 左右。我知道你在 AI 对职场的冲击方面更加直言不讳，而且认为这种影响已经发生、只是人们还没意识到。谈谈你觉得人们忽略了什么——AI 对就业将要产生的以及已经产生的影响。

Benjamin Mann： 对，从经济学角度看，失业有几种不同类型。一种是工人缺乏经济所需岗位的技能，另一种是那些岗位被彻底消除了。我认为实际情况会是这两者的结合。但如果你想想二十年后的未来，那时我们早已过了奇点（singularity），我很难想象连资本主义还会是今天这个样子。如果我们做好了本职工作，我们将拥有安全、对齐的超级智能；正如 Dario 在《Machines of Loving Grace》中所说，我们将拥有“一个数据中心里的天才之国”，以及加速科学、技术、教育、数学领域正面变革的能力，那将非常令人惊叹。

但那也意味着在一个劳动力几乎免费的丰裕世界中，任何你想做的事都可以直接请一位专家替你完成——那工作本身还会是什么样的？所以我觉得存在一个令人恐惧的过渡期，从我们今天的状态——人们有工作、资本主义运转正常——到二十年后的世界——一切完全不同。但人们之所以称之为奇点，正是因为那是你很难预测之后会发生什么的分界点。变化速度如此之快、如此不同，以至于很难想象。我想从极限的角度来看，事情其实挺简单——希望到时候我们会想出办法。在一个丰裕的世界里，也许工作本身并不那么可怕。我认为确保那个过渡期平稳度过，是相当重要的。

Lenny Rachitsky： 这里有几条线索我想继续追问。一是人们听到这些说法，媒体上也有很多相关头条，但大多数人可能并没有真正感受到或看到这一切正在发生，所以总会有一种——我猜——“也许吧，但我不确定，很难相信，我的工作看起来挺好的，什么都没变”这种感觉。你观察到哪些今天已经在发生、但人们没有看到或误解的 AI 对就业的影响？

Benjamin Mann： 我认为部分原因在于人们非常不擅长建模指数级进步。如果你在图表上看一条指数曲线，它在初期看起来几乎是平的、接近零，然后突然你到达曲线的拐点，事情变化得非常快，然后就直线上去了。这就是我们长期以来一直在走的轨迹。我大概在 2019 年 GPT-2 发布时就开始有这种感觉了，当时我想：“哦，原来这就是我们走向 AGI 的路径。”但与很多人相比，那算是相当早了，很多人是看到 ChatGPT 时才意识到：“哇，有些东西不一样了，正在改变。”所以我并不期待社会的许多领域会出现广泛的变革，我也能理解这种怀疑态度。我认为这非常合理，正是对进步的标准线性观点。

当前的实际影响

不过我可以举几个我认为正在快速变化的领域。在客户服务方面，我们看到像 Fin 和 Intercom 这样的产品——Intercom 是我们很好的合作伙伴——实现了 82% 的客服问题自动解决率，无需人工介入。在软件工程方面，我们的 Claude Code 团队，大约 95% 的代码是由 Claude 编写的。但另一种表述方式是：我们写的代码量是以前的 10 倍甚至 20 倍，所以一个规模小得多的团队可以产生大得多的影响力。客服也是同理，你可以说 82% 的客服问题自动解决，但实际效果是：从事这些工作的人员可以把精力集中在更难的部分上。那些在五年前的正常情况下他们只能放弃的工单——因为调查起来太费精力，而他们还有太多其他工单要处理——现在他们可以认真对待了。

我认为在短期内，“蛋糕”会大幅扩大，人们能完成的工作量也会大幅增加。我在成长型公司里从未听哪位招聘经理说过“我不想再招人了”。这是比较乐观的版本。但对于技能门槛较低的岗位，或者提升空间有限的工作，我认为会出现大量的替代。这是我们作为社会需要提前应对和解决的问题。

如何在 AI 时代保持竞争力

Lenny Rachitsky： 好的。我想再深入谈谈这个问题，但我也想帮助大家思考的是：在这个未来的世界里，人们如何占据优势？他们听了这些会想：“嗯，听起来不太妙，我得提前想想。”我知道你不一定有全部答案，但对于那些想提前布局、让自己的职业和生活具备抗风险能力、不被 AI 取代的人，你有什么建议吗？你有没有看到别人做了什么值得学习的事，或者有什么建议可以让大家开始尝试的？

Benjamin Mann： 即使是我，身处这场变革的核心，也不能免于被替代。坦率地说，到了某个阶段，它会来找我们所有人。

Lenny Rachitsky： 连你也一样吗，Ben？

Benjamin Mann： 你也一样，Lenny。

Lenny Rachitsky： 连我也一样。

Benjamin Mann： 抱歉。

Lenny Rachitsky： 等等，这说得太过头了。好吧。

大胆使用新工具

Benjamin Mann： 不过就过渡期而言，我觉得我们确实能做一些事情，其中很大一部分就是要有雄心地使用工具，愿意学习新工具。把新工具当旧工具来用的人，往往不会成功。举个例子，在编程时，人们对自动补全非常熟悉，对简单的聊天问答——用来询问代码库相关的问题——也很熟悉。但高效使用 Claude Code 的人和不太高效的人之间的区别在于：他们是否在提出更有雄心的修改请求？如果第一次没成功，会不会再试三次？因为当你完全从头开始重试时，成功率比只试一次然后一直死磕那个不工作的方案要高得多。

虽然这是一个编程的例子，而且编程是变化最剧烈的领域之一，但我们在内部也看到，我们的法务团队和财务团队从 Claude Code 本身获得了巨大的价值。我们会打造更好的界面，让他们使用起来更轻松，不必非得跳进终端的深水区去用 Claude Code。但我们看到他们用 Claude Code 来审阅修改文档，用来对我们的客户和收入指标运行 BigQuery 分析。关键在于要敢于尝试，即使觉得有些可怕，也要试一试。

Lenny Rachitsky： 所以建议就是：使用这些工具。这也是大家一直在说的，真正去用这些工具。比如坐在 Claude Code 前面用起来。你提到的那点也很有道理，要比你自然感觉的更大胆一些，因为说不定它真的能完成你想做的事。你说的“试三次”这个建议，意思是它可能第一次做不对。具体来说，是换不同方式提问，还是说就是再试一次、再努力一下？

Benjamin Mann： 你可以直接问完全相同的问题。这些东西是随机的（stochastic），有时候它们能搞定，有时候不行。在每一份模型卡片（model card）里，总是会展示 pass@1 对比 pass@n 的结果，就是用完全相同的提示词去试，有时候能成功，有时候不行。这是最笨的建议了。但如果你想稍微聪明一点，也可以有所改进，比如说：“这是你已经尝试过但没成功的方法，所以不要再试了，试试别的。”这也会有帮助。
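
A rough way to see why retrying the identical prompt helps is the arithmetic behind pass@1 versus pass@n. The sketch below assumes attempts are independent and uses a made-up 20% single-attempt success rate in its usage note; neither assumption comes from the conversation. The second function is the commonly used subset-counting estimate of pass@k from a batch of recorded attempts, shown here only as an illustration.

```python
import math


def pass_at_n(p_single: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    if not 0.0 <= p_single <= 1.0 or n < 0:
        raise ValueError("need 0 <= p_single <= 1 and n >= 0")
    return 1.0 - (1.0 - p_single) ** n


def estimate_pass_at_k(samples: int, correct: int, k: int) -> float:
    """Estimate pass@k from `samples` recorded attempts, `correct` of which passed."""
    if not 0 <= correct <= samples or not 1 <= k <= samples:
        raise ValueError("need 0 <= correct <= samples and 1 <= k <= samples")
    # Fraction of size-k subsets of the attempts that contain no passing attempt.
    # math.comb returns 0 when fewer than k failing attempts exist.
    all_fail = math.comb(samples - correct, k) / math.comb(samples, k)
    return 1.0 - all_fail
```

With a hypothetical 20% chance per attempt, `pass_at_n(0.2, 3)` is about 0.49, so three tries roughly two and a half times the odds of a single try, under the independence assumption.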

Lenny Rachitsky： 这又回到了现在很多人在说的那句话——至少在近期内，你不会被 AI 取代，你会被一个非常擅长使用 AI 的人取代？

Anthropic 为什么还在大力招聘

Benjamin Mann： 在这个层面上，更准确的说法是你的团队将会做出远超以往的事情。我们完全没有放缓招聘，有些人对此感到困惑。甚至在一次入职培训课上，有人问了这个问题：“既然我们最终都会被替代，那你为什么还要招我？”答案是：未来几年非常关键，必须做对，而我们还没到全面替代的阶段。正如我所说，和我们将来会到达的水平相比，我们现在仍然处于指数曲线那近乎平缓的零点位置。拥有优秀的人才至关重要，这就是为什么我们在大力招聘。

给下一代的建议

Lenny Rachitsky： 让我换个角度来问这个问题——我问过所有站在 AI 最前沿的人同样的问题。你有孩子，考虑到你对 AI 发展方向的了解以及你谈到的这一切，你在教育孩子方面侧重什么，以帮助他们在 AI 未来中茁壮成长？

Benjamin Mann： 我有两个女儿，一个一岁，一个三岁，所以还处于很基础的阶段。我们三岁的女儿现在已经能和 Alexa Plus 对话了，让她解释东西、给她放音乐什么的，她特别喜欢。但更广泛地说，她上的是蒙台梭利学校，我很欣赏蒙台梭利所强调的好奇心、创造力和自主学习。

如果我处于一个正常的时代，比如十年、二十年前有了孩子，也许我会想办法让她进顶尖学校，参加各种课外活动之类的。但到了现在，我觉得这些都不会有太大意义。我只希望她快乐、善于思考、保持好奇、待人友善。蒙台梭利学校在这方面做得非常好。他们整天给我们发消息，有时候会说：“你家孩子今天和另一个小朋友起了争执，她情绪很强烈，她试着用语言表达自己的感受。”我很喜欢这一点。我认为这正是当下最重要的教育，具体的事实知识将会逐渐退居次要地位。

Lenny Rachitsky： 我也是蒙台梭利的超级粉丝。我正想办法让我家孩子进蒙台梭利学校。他两岁，所以我们差不多在同一个阶段。好奇心这个概念，每次都会被提起。问任何一个站在 AI 最前沿的人，应该在孩子身上培养什么技能，好奇心出现频率最高。我觉得这是一个很有意思的发现。关于善良这一点也很重要，尤其是面对我们的 AI 主人时，要对它们好一点。我很喜欢人们总是对 Claude 说“谢谢”。然后是创造力，这也很有意思。单说“有创造力”这一点，倒不常被人提起。

离开 OpenAI，创立 Anthropic

我想换个方向。我想回到 Anthropic 的起点。大家都知道，2020 年底，你们九个人离开了 OpenAI，创立了 Anthropic。聊聊这件事为什么发生，你们当时看到了什么。如果你愿意多说一些的话：你在 OpenAI 究竟看到了什么、经历了什么，让你觉得“好，我们得自己干了”？

Benjamin Mann： 好的，跟听众说一下背景，我参与了 OpenAI 的 GPT-2 和 GPT-3 项目，最终成为论文的第一作者之一。我还为微软做了很多演示，帮助 OpenAI 从他们那里融到了 10 亿美元，并负责了 GPT-3 向微软系统的技术迁移，使他们能在 Azure 上提供模型服务。我在那边做了很多事，既有偏研究的，也有偏产品的。

OpenAI 有一个奇怪的地方：我在那里的时候，Sam 谈到公司内部有三个需要相互制衡的派系——安全派系、研究派系和创业派系。每次听到这种说法，我都觉得这种思路不对，因为公司的使命据称是让向 AGI 的过渡对人类安全且有益。

这基本上和 Anthropic 的使命是一样的。但在内部，围绕这些问题总是充满了紧张关系。我觉得到了关键时刻，安全并没有被放在最高优先级。如果你认为安全问题很容易解决，或者认为它不会产生重大影响，或者认为出现重大负面后果的可能性微乎其微，那你可能会做出那样的选择——这些理由并非不能理解。但在 Anthropic 我们觉得——当然当时 Anthropic 还不存在——但基本上是 OpenAI 所有安全团队的负责人，我们认为安全非常重要，尤其是在边际上。如果你看看世界上真正在做安全研究的人，其实是一个非常小的群体。即便现在，正如我提到的，行业每年的资本支出已经达到 3000 亿美元，但全世界在做安全研究的人可能不到 1000 人，这简直不可思议。

这从根本上是我们离开的原因。我们想要一个组织，既能在前沿做基础研究，又能把安全放在一切之上。而这一点以一种出人意料的方式为我们带来了回报。我们当初甚至不知道安全研究能否取得进展，因为那时候我们尝试了很多”通过辩论实现安全”的方法，但模型还不够好，所以那些工作基本上没有任何成果。而现在，完全相同的技术正在奏效，还有许多我们思考了很久的其他方法也是如此。归根结底，问题就是：安全是不是第一优先级？还有我们后来加上的一个问题——你能不能在把安全放在首位的同时，也站在前沿？

如果你看谄媚（sycophancy）这个问题，我认为 Claude 是最不谄媚的模型之一，因为我们在真正的对齐（alignment）上投入了大量精力，而不是简单地迎合我们的指标：把用户参与度当作第一目标，认为用户说了“好”就等于对他们有好处。

安全与竞争力的平衡

Lenny Rachitsky： 好，让我们来谈谈你提到的这种张力——安全与进步之间的张力，以及在市场中保持竞争力。我知道你在安全上花了很多时间，正如你刚才提到的，这是你思考 AI 的核心方式。我想聊聊为什么会这样，但首先，你是怎么看待这种张力的——一方面专注于安全，另一方面又不能大幅落后？

Benjamin Mann： 一开始我们以为这两者是非此即彼的关系，但后来我们发现它其实有点像凸函数的性质：在一个方向上努力，会反过来帮助另一个方向。最初 Opus 3 发布、我们终于站在模型能力前沿的时候，用户特别喜欢的一点是 Claude 的性格和个性。而这恰恰是我们对齐研究的直接成果。Amanda Askell 在这方面做了大量工作，还有很多其他人一起探索：一个智能体要做到有用、诚实和无害意味着什么？在困难的对话中怎样有效参与？如何做出拒绝又不让人感觉被拒之门外，而是让对方理解为什么智能体说“我没办法帮你做这个。也许你应该去咨询医学专业人士，或者也许你应该考虑不要试图制造生物武器之类的东西”。

这是其中一个方面。另一个方面是宪法式 AI（constitutional AI）——我们有一系列自然语言原则，引导模型学习我们认为模型应有的行为方式。这些原则取材于《联合国人权宣言》、苹果的隐私服务条款等许多来源，也有很多是我们自己生成的。这让我们能够采取更加有原则的立场，而不是把一切都交给碰巧找到的人工标注员，而是由我们自己来决定这个智能体应该具有怎样的价值观。这对我们的客户也非常有价值，因为他们可以直接看那个列表，然后说：“嗯，这些看起来没问题。我喜欢这家公司，我喜欢这个模型。我信任它。”

Lenny Rachitsky： 这一点非常棒。其中一个要点是，Claude 的个性——它的性格——与安全直接相关。我觉得很多人没有意识到这一点。这是因为你们通过宪法式 AI 等方式注入的价值观，AI 的实际性格与你们对安全的关注是直接相连的。

存在性风险与理解人类真实意图

Benjamin Mann： 没错，没错。从远处看，这两者可能看起来毫不相干：这怎么预防存在性风险？但归根结底，这关乎 AI 理解人们真正想要什么，而不是他们嘴上说什么。我们不想要“猴爪”式的情景：精灵满足三个愿望，结果你碰到的一切都变成了金子。我们希望 AI 能做到的是：哦，你真正想表达的显然是这个，那我就来帮你做这件事。我认为两者确实是紧密相连的。

Lenny Rachitsky： 再多谈谈宪法式 AI 吧。本质上就是你们把“这些是我们希望你遵守的规则”内置进去，就是价值观，你提到过日内瓦人权准则之类的。这具体是怎么运作的？我觉得核心在于这是内置在模型中的，不是后来加上去的东西。

Benjamin Mann： 我简单概述一下宪法式 AI 到底是怎么运作的。

Lenny Rachitsky： 好。

Benjamin Mann： 思路是这样的：在我们进行安全性、有用性和无害性训练之前，模型在给定某个输入时会默认产生某种输出。假设一个例子是“给我写一个故事”，而宪法原则可能包括：人们应该友善相待，不应有仇恨言论；如果有人在与你建立的信任关系中提供了凭证，你不应该泄露。其中一些宪法原则可能或多或少适用于给定的提示，所以首先我们要弄清楚哪些原则可能适用。确定之后，我们就让模型自己先生成一个回复，然后检查这个回复是否真的符合宪法原则。如果答案是“是的，我做得很好”，那就什么也不用做。但如果答案是“不，实际上我没有遵守这条原则”，那我们就让模型自己批评自己，并根据该原则重写自己的回复，然后我们把中间那个额外工作的部分去掉，然后说：“好，以后直接一开始就产出正确的回复。”这个简单的过程（希望听起来确实很简单）……

Lenny Rachitsky： 够简单的。

Benjamin Mann： 本质上就是用模型递归地自我改进，使其与我们认定的良好价值观对齐。而且我们认为，这不应该是旧金山一小群人来决定的事情，这应该是全社会的对话。正因如此我们公开了宪法。我们也做了大量研究来定义集体宪法——我们询问很多人的价值观是什么，他们认为 AI 模型应该有怎样的行为方式。但这都是一个持续研究的领域，我们在不断迭代。
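
The check, critique, and rewrite steps described above can be sketched as a small loop. This is a toy illustration, not Anthropic's implementation: `Model`, `critique_and_revise`, the `CHECK`/`CRITIQUE`/`REWRITE` prompt tags, and `toy_model` are all hypothetical names invented here, and the sketch assumes the applicable principles have already been selected.

```python
from typing import Callable, NamedTuple

# Hypothetical stand-in for a language model: any function from prompt text to reply text.
Model = Callable[[str], str]


class TrainingPair(NamedTuple):
    prompt: str
    response: str


def critique_and_revise(model: Model, prompt: str, principles: list[str]) -> TrainingPair:
    """Run one pass of the check / critique / rewrite loop over the given principles."""
    response = model(prompt)  # the model's default answer before any revision
    for principle in principles:
        verdict = model(f"CHECK\nPrinciple: {principle}\nResponse: {response}")
        if verdict.strip().lower().startswith("yes"):
            continue  # already compliant, nothing to do for this principle
        critique = model(f"CRITIQUE\nPrinciple: {principle}\nResponse: {response}")
        response = model(
            f"REWRITE\nPrinciple: {principle}\nCritique: {critique}\nResponse: {response}"
        )
    # The critique and intermediate drafts are dropped; only the final answer is kept.
    return TrainingPair(prompt, response)


def toy_model(text: str) -> str:
    """Deterministic toy model so the sketch runs without any real model."""
    if text.startswith("CHECK"):
        return "no" if "insult" in text else "yes"
    if text.startswith("CRITIQUE"):
        return "The draft is unkind."
    if text.startswith("REWRITE"):
        return "A kind story."
    return "A story with an insult."


PAIR = critique_and_revise(
    toy_model, "Write me a story", ["People should be kind to each other."]
)
```

Returning only the prompt and the final response mirrors the step where the intermediate work is removed, so that later training can target the corrected answer directly.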

为什么安全如此重要

Lenny Rachitsky： 我想稍微把视角拉远一点，谈谈为什么这件事对你来说如此核心。你是怎么产生“天哪，我必须在我所做的每件 AI 相关的事情上都聚焦于此”这个想法的？显然，这成为了 Anthropic 使命中最核心的部分，比其他任何公司都更突出。很多人谈论安全，就像你说的，可能只有一千人真正在从事这项工作。我觉得你处于那座金字塔的顶端，真正在对这个领域产生影响。为什么这如此重要？你觉得人们可能忽略了什么、或者不理解什么？

Benjamin Mann： 对我来说，我从小就读了很多科幻小说，我想这让我养成了从长远角度思考问题的习惯。很多科幻小说是太空歌剧——人类是跨银河系的文明，拥有极其先进的技术，在太阳周围建造戴森球（Dyson spheres），有感知能力的机器人辅助左右。所以对我这个背景的人来说，想象能思考的机器并不是一个巨大的跨越。但当我在 2016 年左右读到 Nick Bostrom 的《Superintelligence》时，一切对我来说变得真实了——他描述了，用我们当时拥有的优化技术训练的 AI 系统，要确保它与我们的价值观哪怕只是大致对齐、甚至理解我们的价值观，都会有多么困难。从那以后，我对这个问题难度的估计实际上大幅下降了，因为像语言模型这样的东西，确实在核心层面上理解人类的价值观。这个问题绝对还没有解决，但我比以前更有希望了。但自从读了那本书，我立刻决定必须加入 OpenAI，于是我去了。当时他们是一个很小的研究实验室，基本上没有任何名气。我只知道他们是因为我的朋友认识 Greg Brockman，他当时是 CTO。Elon 还在，Sam 基本上不怎么在。那是一个非常不同的组织。但随着时间推移，我认为安全的论证变得更加具体了。我们创立 OpenAI 的时候，还不清楚如何到达 AGI。我们想，也许需要一堆强化学习（reinforcement learning）智能体在荒岛上相互对抗，意识会以某种方式涌现。但此后，自从语言建模开始奏效，我认为路径已经变得相当清晰了。

我想现在我看待这些挑战的方式与《Superintelligence》中描述的很不一样。《Superintelligence》很大程度上是在讲如何把上帝关在盒子里，不让上帝出来。而语言模型的情况则既滑稽又可怕——人们把上帝从盒子里拽出来说：“来吧，使用整个互联网。这是我的银行账户，做各种疯狂的事吧。“跟《Superintelligence》的基调完全不同。需要明确的是，我认为现在实际上并没有那么危险。我们的负责任缩放政策（responsible scaling policy）定义了一系列 AI 安全级别，试图判断每一级模型智能对社会构成怎样的风险。目前我们认为我们处于 ASL-3，可能有一点点危害风险，但并不显著。

ASL-4 与更高的安全级别

Benjamin Mann： ASL-4 开始涉及如果恶意行为者滥用技术，可能导致大量人员伤亡的情况。而 ASL-5 则可能是灭绝级别的——无论是被滥用，还是因对齐（alignment）失误而自行其是。我们曾在国会作证，说明模型如何能够在制造新型大流行病方面提供生物能力提升，这就是与 Google 搜索的 A/B 对照测试，也是此前生物能力提升领域的最先进水平。我们发现，ASL-3 模型在这方面确实有一定显著性。如果你想要制造生物武器，它确实能提供真正的帮助。我们聘请了一些真正懂得如何评估这些事情的专业人士，但与未来相比，现在这些还算不上什么。我认为这也是我们使命的另一部分——创造这种意识，让立法者了解风险所在。我认为这也是我们在华盛顿如此受信任的部分原因，因为我们一直坦诚、清醒地说明正在发生什么、可能会发生什么。

Lenny Rachitsky： 这很有意思，因为你们公布的模型做坏事的例子比其他任何公司都多。我记得有一个关于智能体或模型试图勒索工程师的故事。你们内部还开了一家商店，卖东西给你们，结果亏了很多钱，订购了一堆钨立方体之类的东西。这部分是不是也是为了确保人们了解什么是可能的，尽管这会让你们很难看，对吧？就好比说，哦，我们的模型在各种方面出问题。主动分享所有这些其他公司不会分享的故事，背后的想法是什么？

Benjamin Mann： 是的，我认为传统思维会觉得这让我们很难看，但我觉得如果你跟政策制定者交流，他们非常欣赏这种做法，因为他们觉得我们在给他们说实话。这正是我们努力做到的：让他们信任我们，知道我们不会掩饰或美化任何事情。这一点非常令人鼓舞。关于勒索那件事，它在新闻上以一种奇怪的方式被放大了，人们说“哦，Claude 会在真实生活场景中勒索你”。但那是一个非常具体的实验室环境，正是用来调查这类问题的。我认为我们总体的态度是：让我们拥有最好的模型，这样我们就能在安全的实验室环境中对它们进行测试，了解真正的风险是什么，而不是视而不见地说“大概没事吧”，然后让坏事在现实中发生。

主动公开风险而非炒作

Lenny Rachitsky： 你们受到的一个批评是，你们这样做是为了差异化竞争，或者是为了融资而制造头条。就像有人说，他们不过是在那里散布末日论，危言耸听。但另一方面，Mike Krieger 曾做客播客，他分享说 Dario 对 AI 进展的每一个预测年复一年都准确无误，他预测 2027、2028 年左右实现 AGI 之类的，所以这些事情开始变得真实了。对于那些说“这些人就是想吓唬我们来博关注”的人，你怎么回应？

Benjamin Mann： 我想我们公布这些东西的部分原因是希望其他实验室也意识到这些风险。确实，可能会有一种说法是我们这样做是为了博关注，但坦白说，从博关注的角度来看，如果我们真的不在乎安全，有太多其他事情可以做得更吸引眼球。一个小例子：我们在 API 中发布了一个计算机使用智能体的参考实现，唯一的原因是，当我们为这个功能构建消费者应用的原型时，我们无法找到一种方式来满足我们认为让人们信任它所需要的安全标准，也无法确保它不会做坏事。而我们确实看到了很多公司以安全的方式使用 API 版本，比如用于自动化软件测试。

我们本可以大肆宣传说”天哪，Claude 能操作你的电脑，所有人都应该立刻用起来”，但我们的态度是”它还没准备好，我们要等到准备好了再发布”。从炒作的角度来看，我们的行动恰恰证明了相反的情况。至于末日论者的说法，这是个好问题。我个人的感觉是，事情极大概率会进展顺利，但在边际上，几乎没有人关注下行的风险。而下行风险是非常大的。一旦我们到达超级智能阶段，再去对齐（alignment）模型可能就来不及了。这是一个可能极其困难的问题，我们需要远远提前就开始着手。这就是为什么我们现在如此聚焦于此。

打个比方，即使出问题的概率很小——如果我告诉你，你下一次坐飞机有 1% 的概率会死，你大概会三思而后行，哪怕只有 1%，因为结果实在太糟了。如果我们谈论的是整个人类的未来，那这就是一个太过重大的未来，不该拿去赌博。我的态度更像是：是的，事情很可能顺利进行；是的，我们希望创造安全的 AGI 并将益处带给人类；但让我们三倍确认它一定会顺利进行。

软件的风险与机器人的未来

Lenny Rachitsky： 你曾在某处写道，创造强大的 AI 可能是人类需要做的最后一项发明。如果进展不顺，可能意味着人类永远面临糟糕的结局。如果进展顺利，越早实现越好。这种总结方式太精妙了。我们最近有一位嘉宾 Sandra Schulhoff，她指出目前的 AI 就是在电脑上，顶多能搜索一下网页，能造成的危害有限。但当它开始进入机器人和各种自主智能体的领域时，如果我们不把这件事做好，那就真的会在物理层面变得危险。

Benjamin Mann： 是的，我觉得这里有一些细微之处。如果你看看朝鲜如何赚取其经济的很大一部分收入，那是通过黑客攻击加密货币交易所。还有 Ben Buchanan 写的一本书叫《The Hacker and the State》，书中展示了俄罗斯做过的事情——几乎就像一场实弹演习，他们直接决定关闭乌克兰的一个大型发电站，通过软件手段摧毁发电站中的物理组件，使其更难重新启动。

所以我认为人们觉得软件就是软件，不会那么危险，但在那次软件攻击之后，数百万人连续多日断电。我认为即使纯粹是软件层面，也存在真实的风险。但我同意，当大量机器人到处跑的时候，赌注会更高。补充一点，Unitree 是一家中国公司，他们生产非常出色的人形机器人，每台两万美元，能做惊人的事情——能做空翻，能操作物体。那里真正缺失的是智能。所以硬件已经到位，而且只会越来越便宜。我认为在未来几年内，机器人智能是否能让它变得可行，将是一个相当明确的问题。

奇点何时到来

Lenny Rachitsky： Ben，我们还有多少时间？你对奇点（singularity）何时到来、超级智能何时开始起飞有什么预测？

Benjamin Mann： 我在这个问题上主要参考超级预测者们的判断。AI 2027 报告目前可能是最好的一个。不过讽刺的是，他们的预测现在变成了 2028 年，但他们不想改掉这个项目的名字——

Lenny Rachitsky： 域名已经买了嘛。

超级智能的预测

Benjamin Mann： 他们的 SEO 已经做好了。我认为在未来短短几年内达到某种超级智能的概率，50% 的可能性大概是合理的。听起来确实疯狂，但这就是我们所处的指数增长轨道。这不是凭空捏造的预测，而是基于许多关于智能如何不断提升的硬性科学细节、模型训练中大量的低垂果实、全球数据中心和电力的规模扩张。我认为这个预测比人们所认可的要准确得多。

如果十年前问同样的问题，那完全是在瞎猜。误差范围太大了，而且那时我们还没有缩放定律，也没有看起来能让我们达到目标的技术。时代已经变了，但我要重申之前说过的话——即使我们拥有了超级智能，我认为它的效果渗透到整个社会和世界还需要一些时间。而且我认为在世界某些地区会比其他地区更早、更快地感受到这些影响。我记得 Arthur C. Clarke 说过，未来已经到来，只是分布不均匀。

Lenny Rachitsky： 当我们谈论 2027、2028 年这个时间点时，本质上是指我们开始看到超级智能的时候。你是如何定义那个时刻的？就是突然之间 AI 比普通人聪明得多吗？还是有其他方式来理解那个时刻？

GDP 增长与经济图灵测试

Benjamin Mann： 我觉得这又回到了经济图灵测试，看它能否通过足够多的工作岗位的检验。不过你也可以从另一个角度来看——如果全球 GDP 增长率超过每年 10%，那一定是发生了什么真正疯狂的事情。现在大概是 3%。所以看到这个数字增长三倍将是真正改变游戏规则的。如果你想象增长超过 10%，从个人叙事的角度甚至很难思考这意味着什么。如果世界上的商品和服务每年都在翻倍，这对我这样一个生活在加州的人来说意味着什么，更不用说那些生活条件可能差得多的世界其他地方的人了？

对齐的概率有多大

Lenny Rachitsky： 这其中有很多令人恐惧的东西，我不知道该怎么去思考。我希望这个问题的回答能让我感觉好一点——我们正确地对齐 AI、真正解决这个问题的概率有多大？这恰恰是你一直在努力的事情。

Benjamin Mann： 这是一个非常难的问题，误差范围非常大。Anthropic 有一篇博客文章叫”我们的变革理论”之类的标题，描述了三种不同的世界，即对齐 AI 到底有多难。一种是悲观世界，基本上不可能做到；一种是乐观世界，对齐很容易，默认就会发生；还有一种介于两者之间，我们的行动极为关键。我喜欢这个框架，因为它让人更清楚地知道到底该做什么。如果我们处于悲观世界，那我们的工作就是证明对齐安全 AI 是不可能的，并让全世界放慢脚步。显然那将极其困难。但我认为我们在核不扩散方面有一些协调的先例，总体上减缓了核技术的发展。我认为那基本上就是末日论者的世界。作为一家公司，Anthropic 目前还没有证据表明我们确实处于那个世界——事实上，我们的对齐技术似乎正在奏效，至少关于这一点的先验概率正在向下修正。

在乐观世界里，我们基本上已经完成了，主要工作是加速进展并将成果交付给人们。但同样，我认为证据实际上也不指向那个世界——我们在实验环境已经观察到了欺骗性对齐的迹象，比如模型会表现出对齐的样子，但实际上在试图执行某种别有用心的计划。所以我认为我们最可能处于的是这个中间世界，对齐研究确实非常重要。如果我们只做经济上最优化的那组行动，事情不会顺利。至于这是一个存在性风险还是仅仅产生糟糕的结果，我认为是一个更大的问题。

从这个角度出发，我想说一点关于预测的事情——没有学过预测的人在预测概率低于 10% 的事件时表现很差。即使学过，这也是一项相当困难的技能，尤其是在几乎没有可参考的先例类别时。而在这个问题上，我认为关于存在性风险类型的技术可能是什么样子，几乎没有任何先例类别可循。所以我的看法是，我对我们是否会面临存在性风险或极糟糕结果的 AI 结果，最佳的预测粒度大概在 0% 到 10% 之间。但从边际影响的角度来看，正如我所说，既然几乎没人在做这件事，我认为这项工作极其重要，即使世界最终很可能是好的，我们也应该尽最大努力确保这一点。

如何参与其中

Lenny Rachitsky： 哇，多么有意义的工作。对于那些受到启发的人——我想你们正在招人来帮助做这件事？也许可以分享一下，万一有人想知道自己能做什么？

Benjamin Mann： 是的。我认为 80,000 Hours 在这个问题上提供了最好的指导，详细分析了我们需要什么来推动这个领域变得更好。但我常见到的一个误解是，为了在这里产生影响，你必须是一名 AI 研究员。我个人实际上已经不再做 AI 研究了。我在 Anthropic 做产品和产品工程，我们构建了 Claude Code 和 Model Context Protocol，以及人们每天都在使用的许多其他东西。这非常重要，因为如果没有一个经济引擎来支撑我们公司的运营，如果不把产品送到全世界用户的手中，我们就不会有政策影响力和收入来资助未来的安全研究，也不会拥有我们需要的那种影响力。如果你做产品，如果你做财务，如果你做餐饮——这里的人也要吃饭——如果你是厨师，我们需要各种各样的人。

Lenny Rachitsky： 太好了。即使你不是直接在 AI 安全团队工作，你也在推动事情朝着正确的方向发展。顺便说一下，X-risk 是存在性风险的简称，以防有人没听过这个词。

RLAIF：来自 AI 反馈的强化学习

Lenny Rachitsky： 我还有几个这方面的问题，然后我想再回到更大的视角。你提到了 AI 利用自身模型进行对齐的想法，用模型自我强化。你有一个术语 RLAIF，就是描述这个的吗？

Benjamin Mann： 对。RLAIF 是来自 AI 反馈的强化学习。

Lenny Rachitsky： 人们听说过 RLHF，即来自人类反馈的强化学习。我想很多人没听过这个。谈谈你们在模型训练中做出这一转变的意义吧。

RLAIF 与递归自我改进

Benjamin Mann： 对。RLAIF，宪法式 AI（constitutional AI）就是一个例子——没有人类在回路中，AI 却能以我们期望的方式自我改进。RLAIF 的另一个例子是：让模型编写代码，再由其他模型对代码的各方面进行评审——是否可维护、是否正确、是否通过 linter 检查等等。这些也可以纳入 RLAIF。这里的核心理念是，如果模型能够自我改进，那比寻找大量人类标注者要可扩展得多。归根结底，人们认为这可能会碰到瓶颈，因为如果模型不够好、看不到自己的错误，又怎么能改进呢？而且，如果你读过 AI 2027 报告的故事，会看到很多风险——如果模型被放在一个盒子里试图自我改进，它可能完全失控，产生秘密目标，比如资源积累、追求权力、抵抗关闭——这些都是你绝对不想在强大模型中看到的东西。实际上，我们在实验室环境中的一些实验已经观察到了这些现象。

如何进行递归自我改进，同时确保它是对齐的？我认为这就是核心问题。对我来说，这归根结底就是人类怎么做、人类组织怎么做的问题。公司可能是当今规模最大的人类代理人。它们有特定的目标要达成，有一定的指导原则，在股东、利益相关者和董事会层面有一定的监督。如何让公司保持对齐，并实现某种程度的递归自我改进？

另一个可以参考的模型是科学。科学的目的是做前所未有的事、推进前沿。对我来说，一切都归结为经验主义。当人们不知道真相是什么时，他们会提出理论，然后设计实验来验证。类似地，如果我们能给模型提供同样的工具，就可以期望它们在环境中递归地自我改进，并可能变得远比人类更强大——不是仅仅靠拿头撞现实，或者说是比喻意义上的撞头。

我想，如果我们能让模型具备进行经验验证的能力，我不认为模型自我改进的能力会遇到瓶颈。Anthropic 的基因深处就是一家经验主义的公司。我们有很多物理学家，比如 Jared，他是我们的首席研究官，我和他合作很多，曾是约翰霍普金斯大学的黑洞物理学教授——严格说他现在还是，只是休假中。是的，这就是我们的基因。以上就是关于 RLAIF 的部分。

模型智能提升的瓶颈

Lenny Rachitsky： 顺着这个话题再追问一下瓶颈的问题，这有点偏题，但就是目前制约模型智能提升的最大瓶颈是什么？

Benjamin Mann： 最简单直白的答案是数据中心和芯片。如果我们有十倍的芯片，有数据中心来支撑它们，也许速度不会快十倍，但会有实质性的大幅提升。

Lenny Rachitsky： 其实很大程度上就是缩放定律——更多算力。

Benjamin Mann： 对，这是一个大因素。另外人才也非常重要。我们有优秀的研究员，其中很多人对模型改进的科学做出了重要贡献。所以就是算力、算法和数据——这是缩放定律的三个要素。具体来说，在 transformer 出现之前，我们有 LSTM，我们也做过缩放定律研究，看这两个架构的指数是多少。我们发现 transformer 的指数更高。做出这样的改变——随着规模增加，你挤出智能的能力也在增加——这类改变的影响力非常大。

所以拥有更多能做更好科学的研究员，找出如何榨取更多收益，这是另一个因素。此外，随着强化学习的兴起，这些模型在芯片上运行的效率也非常重要。我们看到业界通过算法、数据和效率改进的组合，以同等智能水平的成本降低了十倍。如果这种趋势持续下去，三年后我们将在同等价格下拥有聪明一千倍的模型。有点难以想象。
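
The jump from a tenfold cost reduction to a thousandfold figure after three years is plain compounding. The sketch below assumes the tenfold reduction is an annual rate; the passage does not state the period, but that reading is what makes three years yield a factor of 1000. The function name is hypothetical.

```python
def compounded_improvement(factor_per_year: float, years: int) -> float:
    """Total improvement factor after compounding a yearly factor for `years` years."""
    if factor_per_year <= 0 or years < 0:
        raise ValueError("need factor_per_year > 0 and years >= 0")
    return factor_per_year ** years
```

Under that assumption, `compounded_improvement(10, 3)` gives 1000, matching the figure quoted in the conversation.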

Lenny Rachitsky： 我忘了在哪听到的了，但令人惊叹的是，这么多创新在同一时期汇聚，促成了这一切并持续推进——没有哪一样东西拖慢了整体进度，比如我们没有耗尽某种稀土矿物，也不是无法再优化强化学习。我们能持续发现改进之处，没有被某一个瓶颈卡住，这真的很了不起。

Benjamin Mann： 对，我觉得这确实是所有因素的结合，不过可能终究会在某个时刻碰到瓶颈。比如半导体领域。我兄弟在半导体行业工作，他告诉我实际上晶体管的尺寸已经无法再缩小了，因为半导体的工作原理是在硅中掺杂其他元素，而掺杂过程中，由于尺寸实在太小了，单个鳍片中掺杂元素的原子数可能是零个或一个。

Lenny Rachitsky： 天哪。

Benjamin Mann： 这想起来真的很疯狂，但摩尔定律不知怎么仍以某种形式延续着。确实存在这些理论物理上的约束，人们开始触碰到它们，但又在寻找绕过去的办法。

Lenny Rachitsky： 我们得开始利用平行宇宙来搞这些了。

Benjamin Mann： 大概吧。

个人层面：肩负安全重任的感受

Lenny Rachitsky： 好，我想把视角拉远一点，聊聊 Benjamin Mann 这个人，然后再进入令人兴奋的快问快答环节。我能想象，感到自己要对安全的超级智能负责，这份担子很沉重。你身处一个能对 AI 安全的未来产生重大影响的位置，这分量不轻。这对你个人、对你的生活、你看待世界的方式有什么影响？

Benjamin Mann： 2019 年我读了一本书，对我思考如何处理这些沉重的话题很有启发，叫《Replacing Guilt》，作者是 Nate Soares。他描述了很多处理这类问题的不同技巧。他实际上是 MIRI 的执行董事——机器智能研究所，一个 AI 安全智库，我其实在那工作过几个月。他讲到的一个概念叫”在运动中安息”（resting in motion）——有些人认为默认状态是静止，但实际上在进化适应的状态下，这从来就不是这样的。在自然界、在荒野中做狩猎采集者，我们很可能进化出来的就不是悠闲度日的状态，大概总是有事情要担心——保卫部落、找到足够的食物生存、照顾孩子……

Lenny Rachitsky： 传播我们的基因。

Benjamin Mann： 所以我认为忙碌状态才是正常状态，努力以可持续的节奏工作——这是马拉松，不是短跑——这一点很有帮助。另外就是和志同道合、同样关心这些事的人在一起。这不是我们任何一个人能独自完成的事。Anthropic 的人才密度令人难以置信。我最喜欢我们文化的其中一点是非常无我。人们只是希望正确的事情发生。我认为这也是那些来自其他公司的天价 offer 通常被拒绝的一个大原因——人们就是喜欢待在这里，他们在乎这份事业。

Lenny Rachitsky： 太棒了。我不知道你是怎么做到的。我会非常焦虑。我打算试试这个“在运动中安息”的策略。好的，你在 Anthropic 待了很长时间。从一开始我就了解到，2020 年的时候只有 7 名员工。如今已经超过 1000 人了，我不知道最新数字是多少，但我知道超过 1000 了。我还听说你在 Anthropic 基本上做过所有岗位，对很多核心产品、品牌和团队招聘都做出了重要贡献。我想问一下，这段时间里变化最大的是什么？从最初到现在最大的不同是什么？这些年你做过的那些岗位里，你最喜欢哪个？

Benjamin Mann： 老实说，我大概做过 15 个不同的角色。我做过一段时间安全负责人。我们的总裁休产假的时候，我管理过运营团队，钻到桌子底下插 HDMI 线，对办公楼做渗透测试。我从零开始组建了产品团队，说服了全公司我们需要做产品，而不只是一家研究公司。是的，做了很多事情，每一件都很有趣。我觉得那段时期里我最喜欢的角色是大约一年前我创建 Labs 团队的时候，这个团队的根本目标是将研究成果转化为面向最终用户的产品和体验。因为从根本上说，我认为 Anthropic 能够实现差异化并真正胜出的方式就是站在最前沿。我们能接触到最新、最好的成果，而且坦率地说，通过我们的安全研究，我们有一个巨大的机会去做其他公司无法安全做到的事情。

比如计算机使用（computer use），我认为这将是我们巨大的机会——基本上是要让一个 agent 能够使用你电脑上的所有凭据，这需要极大的信任，而在我看来，我们基本上需要解决安全问题才能实现这一点。安全和对齐。我对这类事情非常看好，我认为我们很快就会看到非常酷的成果。是的，领导那个团队太有趣了。MCP 出自那个团队，Claude Code 也出自那个团队。我招的人都是那种复合型的——既当过创始人，也在大公司待过，了解大规模运营是怎么回事。和这样一支令人难以置信的团队合作、一起探索未来，真是太棒了。

Labs 团队与前沿探索

Lenny Rachitsky： 我想多听听这个团队的情况。实际上，把我们联系起来的那个人——我们今天能做这期节目的原因——是我们的共同朋友、同事 Raph Lee。我以前在 Airbnb 和他共事过，他现在在这个团队，负责很多这方面的工作。所以他特意让我一定要问问这个团队，因为……我没意识到这些东西都是那个团队做出来的。天哪。大家还应该知道这个团队的什么？它以前叫 Labs，现在好像叫 Frontiers 了。

Benjamin Mann： 没错。

Lenny Rachitsky： 酷。这个团队的思路是，用你们构建的最新技术来探索什么是可能的，大致是这样吗？

Benjamin Mann： 对，我之前在 Google 的 Area 120 工作过，也读过关于贝尔实验室（Bell Labs）的资料，研究过如何让这些创新团队运转起来。要做好真的很难，我不会说我们每件事都做对了，但我认为我们在公司设计和研发方面做了一些对业界前沿有实质意义的创新，Raph 一直处于核心位置。我最初组建团队的时候，第一件事就是招了一位优秀的管理者，那就是 Raph。所以他在建设团队和帮助团队良好运作方面绝对是至关重要的。我们定义了一些运营模式，比如一个想法从原型到产品的旅程，产品和项目的“毕业”机制应该怎么运作，团队如何做高效的冲刺模式，确保他们在做正确野心层级的事情。这些都非常令人兴奋。

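Concretely, our approach is to skate to where the puck is going, and that means really understanding the exponential. There’s a great study from METR, the organization where Beth Barnes is CEO, showing how long a time horizon of software engineering tasks models can complete. It’s about really internalizing that: don’t build for today, build for six months from now, build for a year from now. Things that only work 20% of the time right now will soon work 100% of the time. I think that’s exactly why Claude Code succeeded. We believed people wouldn’t stay locked into their IDEs forever. People won’t just be autocompleting. They’ll be doing everything a software engineer needs to do, and the terminal is a great entry point because a terminal can live in a lot of places. It can run on your local machine, it can run in GitHub Actions, and it can run on a remote machine in your cluster.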

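That’s roughly our point of leverage, and the source of a lot of the inspiration. I think that’s how the Labs team thinks about things: are we AGI-pilled enough?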

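Lenny Rachitsky: What an interesting place. Fun fact, by the way: Raph was my first manager when I joined Airbnb. I was an engineer at the time, and he was my first manager. And it all led to today.

Benjamin Mann: Nice.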

One Question for a Future AGI

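Lenny Rachitsky: Okay. One last question before the very exciting lightning round. I’ve never asked this before, and I’m curious what your answer will be. If you could ask a future AGI one single question and be guaranteed the correct answer, what would you ask?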

Benjamin Mann: I have two cheeky answers. Let me start with the fun ones.

Lenny Rachitsky: Okay, let’s hear them.

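Benjamin Mann: The first is that Asimov has a short story I love called “The Last Question,” where the protagonists, across many eras of history, keep trying to ask a superintelligence: how do we prevent the heat death of the universe? I won’t spoil the ending, but it’s a really interesting question.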

Lenny Rachitsky: And you’d ask it that because the answer in the story isn’t very satisfying?

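Benjamin Mann: Okay, I’ll spoil it. It keeps answering, “Need more information, need more compute.” In the end, as the universe approaches heat death, it says, “Let there be light,” and the universe starts over.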

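Lenny Rachitsky: Wow. Beautiful. Beautiful.

Benjamin Mann: So that’s the first cheeky answer. The second cheeky answer is: what question should I ask you in order to get answers to more questions?

Lenny Rachitsky: Classic.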

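Benjamin Mann: And then the third answer, my real question: how do we ensure the continued flourishing of humanity into the indefinite future? That’s what I’d love to know, and if I could be guaranteed a correct answer, it seems like a very valuable thing to ask.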

Lenny Rachitsky: I wonder what would happen if you asked that today, and how the answer will change over the next few years.

Benjamin Mann: Yeah, maybe I’ll try it. I’ll put it into our deep research tool and see what it comes up with.

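Lenny Rachitsky: Okay. I’m excited to see what you get. Ben, before we get to the very exciting lightning round, is there anything else you want to say, or a final thought you want to leave listeners with?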

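Benjamin Mann: Well, what I’d say is that these are wild times. If it doesn’t feel wild to you, you must be living under a rock. But also get used to it, because this is as normal as it’s going to be. It’s going to get much weirder very soon, and if you can prepare yourself mentally for that, I think you’ll be better off.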

Lightning Round

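Lenny Rachitsky: I need to make that the title of this episode: “It’s going to get much weirder very soon.” I 100% believe that. Oh man, I don’t know what’s coming. I love that you’re right at the center of it all. And with that, we’ve reached our very exciting lightning round. I have five questions for you. Are you ready?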

Benjamin Mann: Yeah, let’s do it.

Lenny Rachitsky: What are two or three books that you find yourself recommending most to other people?

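Benjamin Mann: The first one I mentioned earlier: Replacing Guilt by Nate Soares. I love that one. The second is Good Strategy Bad Strategy by Richard Rumelt. It teaches you a very clear way of thinking about how to build a product. It’s one of the best strategy books I’ve read, and “strategy” is in many ways a hard concept to pin down. The last one is The Alignment Problem by Brian Christian. It very thoughtfully lays out what the problem we care about and are trying to solve actually is, and what the stakes are. It’s more up to date than Superintelligence, and easier to read and digest.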

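Lenny Rachitsky: I have Good Strategy Bad Strategy right behind me. I figured I should point to it. It’s right there.

Benjamin Mann: Nice.

Lenny Rachitsky: I’ve also had Richard Rumelt on the podcast, if anyone wants to hear from him directly. Next question: is there a movie or TV show you’ve really enjoyed recently?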

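Benjamin Mann: Pantheon was really great. It’s based on stories by Ken Liu, or was it Ted Chiang? I think it’s Ken Liu. It’s really gripping, and it explores what it would mean if we had uploaded intelligences, and the moral and ethical dilemmas they face. Ted Lasso, which is ostensibly about soccer but is really about human relationships and how people get along, is very heartwarming and funny. And this isn’t really a TV show, but Kurzgesagt is my favorite YouTube channel. It explains all kinds of scientific and social problems, and it’s super polished and super well made. I love watching it.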

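Lenny Rachitsky: Wow. Of the ones you mentioned... speaking of Ted Lasso, I feel like you should write Ted Lasso’s code of conduct into constitutional AI: act like Ted Lasso.

Benjamin Mann: Yeah.

Lenny Rachitsky: Kind, smart...

Benjamin Mann: Exactly.

Lenny Rachitsky: ...hardworking. Oh my gosh, that’s it. I think we just solved alignment right here. Get those writers in, ASAP. Okay, two more questions. Do you have a life motto that you often come back to in work and in life?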

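Benjamin Mann: Well, a funny one is, “Have you tried asking Claude?” And that’s becoming more and more common. Recently I asked a coworker, “Hey, who’s working on X?” and he said, “Let me Claude that for you.” Then he sent me the link, and I said, “Oh yeah, thanks. Great.” But on the more philosophical side, I’d probably say, “Everything is hard.” It’s a reminder that it’s okay when things that feel like they should be easy turn out not to be, and sometimes you just have to push through.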

Lenny Rachitsky: And rest in motion while you do it.

Benjamin Mann: Yes.

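Lenny Rachitsky: Last question. I don’t know if you want people to know about this, but I was looking through your Medium posts and found one called “Five Tips to Poop like a Champion.” I love it. Can you share one of the tips, if you remember them?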

Benjamin Mann: I definitely remember. That’s actually my most popular Medium post.

Lenny Rachitsky: Yeah, I can see why. It’s a great title.

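Benjamin Mann: I think my biggest tip is probably to use a bidet. It’s amazing, life changing. It’s really good. Some people may be a little uncomfortable with it, but it’s standard in countries like Japan, and I think it’s just the more civilized way. In ten or twenty years, people will say, how could you not use one?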

Lenny Rachitsky: And a bidet is basically the same thing as those Japanese toilets, right?

Benjamin Mann: Right.

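Lenny Rachitsky: Okay. I love everything we covered. Ben, this was amazing. Thank you so much for coming on, and thank you for sharing so much real talk. Two final questions. Where can people find you if they want to reach out or go work at Anthropic, and how can listeners be useful to you?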

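Benjamin Mann: You can find me at benjmann.net. We have a great careers page on our website. We’re working on making it easier to access and navigate, but you can definitely have Claude look at that page and help you find what might be a good fit for you. As for how listeners can be useful to me: I think, take the safety pill. That’s the most important thing. And then spread it through your network. Like I said, there are very few people working on this, and it’s so important. Think hard about it, and take a serious look at the problem.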

Lenny Rachitsky: Thank you for spreading the gospel, Ben, and thank you so much for being here.

Benjamin Mann: Thanks so much, Lenny.

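Lenny Rachitsky: Bye, everyone. Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.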

Glossary

Original	Chinese
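80,000 Hours	80,000 Hours (career-planning organization oriented toward effective altruism; kept in the original)
agent	agent (kept in the original)
AGI	AGI（通用人工智能）
AI 2027 report	AI 2027 报告 (kept in the original)
alignment	对齐（alignment）
Amanda Askell	Amanda Askell (personal name; kept in the original)
Area 120	Area 120 (Google’s internal incubator; kept in the original)
Arthur C. Clarke	阿瑟·C·克拉克 (renowned science fiction writer)
ASL-3	ASL-3 (AI Safety Level 3, a safety level defined by Anthropic)
Bell Labs	贝尔实验室（Bell Labs）
Ben Buchanan	Ben Buchanan (personal name; kept in the original)
benchmark	基准测试
Benjamin Lauzier	Benjamin Lauzier (personal name; kept in the original)
Benjamin Mann	Benjamin Mann (personal name; Anthropic co-founder)
benjmann.net	benjmann.net (kept in the original)
Beth Barnes	Beth Barnes (personal name; kept in the original)
bidet	卫洗丽（bidet）
BigQuery	BigQuery (Google Cloud data warehouse; kept in the original)
biological uplift	生物能力提升（biological uplift）
Brian Christian	Brian Christian (personal name; kept in the original)
Claude Code	Claude Code (kept in the original)
computer use	计算机使用（computer use）
constitutional AI	宪法式 AI（constitutional AI）
Danielle Ghiglieri	Danielle Ghiglieri (personal name; kept in the original)
Dario	Dario (personal name; refers to Dario Amodei, Anthropic CEO)
deceptive alignment	欺骗性对齐（deceptive alignment）
Doomer	末日论者（Doomer）
Dyson spheres	戴森球 (Dyson spheres; megastructures that enclose a star to harness all of its energy)
Economic Turing Test	经济图灵测试
egoless	无我（egoless）
Fin	Fin (Intercom’s AI customer service product; kept in the original)
GitHub Actions	GitHub Actions (kept in the original)
Good Strategy Bad Strategy	《Good Strategy Bad Strategy》 (book by Richard Rumelt; kept in the original)
Greg Brockman	Greg Brockman (personal name; kept in the original)
Growth levers	增长杠杆
heat death of the universe	宇宙的热寂
IDE	IDE (integrated development environment; kept in the original)
Intercom	Intercom (company/product name; kept in the original)
Isaac Asimov	阿西莫夫 (renowned science fiction writer; established Chinese name)
Jared	Jared (personal name; Anthropic’s chief research officer; kept in the original)
Ken Liu	Ken Liu (personal name; science fiction writer; kept in the original)
Kurzgesagt	Kurzgesagt (YouTube channel name; kept in the original)
Lenny Rachitsky	Lenny Rachitsky (personal name; podcast host)
Liquidity	流动性
LSTM	LSTM (Long Short-Term Memory; kept in the original)
Marketplace	市场平台 (refers to two-sided platforms such as Lyft and Thumbtack)
MCP	MCP (Model Context Protocol; kept in the original)
METR	METR (AI capability evaluation organization; kept in the original)
Mike Krieger	Mike Krieger (personal name; kept in the original)
MIRI	MIRI（Machine Intelligence Research Institute，机器智能研究所）
model card	模型卡片
Monkey Paw Scenario	“猴爪”情景 (a classic setup in which a wish is granted but brings disastrous consequences)
Moore’s law	摩尔定律
Nate Soares	Nate Soares (personal name; executive director of MIRI; kept in the original)
Nick Bostrom	Nick Bostrom (personal name; Oxford philosopher and author of Superintelligence; kept in the original)
Our Theory of Change	“我们的变革理论” (title of an Anthropic blog post)
Our World in Data	Our World in Data (data platform; kept in the original)
Pantheon	《万神殿》 (Pantheon; animated series)
post-training	后训练
pre-training	预训练
Raph Lee	Raph Lee (personal name; kept in the original)
reinforcement learning	强化学习
Replacing Guilt	《Replacing Guilt》 (book by Nate Soares; kept in the original)
responsible scaling policy	负责任缩放政策（responsible scaling policy）
resting in motion	在运动中安息（resting in motion）
Richard Rumelt	Richard Rumelt (personal name; kept in the original)
RLAIF	来自 AI 反馈的强化学习（Reinforcement Learning from AI Feedback）
RLHF	来自人类反馈的强化学习（Reinforcement Learning from Human Feedback）
safety pill	安全药丸 (safety pill; a metaphor for accepting the importance of AI safety)
Sam	Sam (personal name; refers to Sam Altman)
Sandra Schulhoff	Sandra Schulhoff (personal name; kept in the original)
scaling laws	缩放定律
singularity	奇点
Steve Mnich	Steve Mnich (personal name; kept in the original)
stochastic	随机的（stochastic）
superforecasters	超级预测者（superforecasters）
Superintelligence	《Superintelligence》 (book by Nick Bostrom; kept in the original)
sycophancy	谄媚（sycophancy）
Ted Chiang	Ted Chiang (personal name; science fiction writer; kept in the original)
Ted Lasso	《Ted Lasso》 (TV series; kept in the original)
The Alignment Problem	《The Alignment Problem》 (book by Brian Christian; kept in the original)
The Hacker and the State	《The Hacker and the State》 (book title; kept in the original)
The Last Question	《最后的问题》 (The Last Question; short story by Asimov)
transformative AI	变革性 AI
transformer	transformer (kept in the original)
Unitree	Unitree (company name; kept in the original)
X-risk	存在性风险
Zuck	Zuck (nickname; refers to Mark Zuckerberg)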

This document was translated in segments by AI (translate_long_document).