Inside OpenAI: 2026 is the year of agents, AI’s biggest bottleneck, and why compute isn’t the issue
Working at OpenAI
Lenny Rachitsky: You lead work on Codex.
Alexander Embiricos: Codex is OpenAI’s coding agent. We think of Codex as just the beginning of a software engineering teammate. It’s a bit like this really smart intern that refuses to read Slack, doesn’t check Datadog unless you ask it to.
OpenAI’s Structure and Operations
Lenny Rachitsky: I remember Karpathy tweeted the gnarliest bugs that he runs into that he just spends hours trying to figure out nothing else has solved, he gives it to Codex, lets it run for an hour and it solves it.
Alexander Embiricos: Starting to see glimpses of the future where we’re actually starting to have Codex be on call for its own training. Codex writes a lot of the code that helps manage its training run, the key infrastructure. So we have a Codex code review that’s catching a lot of mistakes. It’s actually caught some pretty interesting configuration mistakes. One of the most mind-blowing examples of acceleration, the Sora Android app, like a fully new app, we built it in 18 days and then 10 days later, so 28 days total, we went to the public.
On Talent Density
Lenny Rachitsky: How do you think you win in this space?
Alexander Embiricos: One of our major goals with Codex is to get to proactivity. If we’re going to build a super assistant, it has to be able to do things. One of the learnings over the past year is that for models to do stuff, they’re much more effective when they can use a computer. It turns out the best way for models to use computers is simply to write code. And so we’re kind of getting to this idea where if you want to build any agent, maybe you should be building a coding agent.
What is Codex?
Lenny Rachitsky: When you think about progress on Codex, I imagine you have a bunch of evals and there’s all these public benchmarks.
Alexander Embiricos: A few of us are constantly on Reddit. There’s praise up there and there’s a lot of complaints. What we can do is as a product team just try to always think about how are we building a tool so that it feels like we’re maximally accelerating people rather than building a tool that makes it more unclear what you should do as the human?
Initiative and Product Vision
Lenny Rachitsky: Being at OpenAI, I can’t not ask about how far you think we are from AGI.
Alexander Embiricos: The current underappreciated limiting factor is literally human typing speed or human multitasking speed.
Growth Data and Metrics
Lenny Rachitsky: Today, my guest is Alexander Embiricos, product lead for Codex, OpenAI’s incredibly popular and powerful coding agent. In the words of Nick Turley, head of ChatGPT and former podcast guest, “Alex is one of my all time favorite humans I’ve ever worked with, and bringing him and his company into OpenAI ended up being one of the best decisions we’ve ever made.” Similarly, Kevin Weil, OpenAI’s CPO, said, “Alex is simply the best.”
In our conversation, we chat about what it’s truly like to build product at OpenAI, how Codex allowed the Sora team to ship the Sora app, which became the number one app in the app store in under one month. Also, the 20x growth Codex is seeing right now and what they did to make it so good at coding, why his team is now focused on making it easier to review code, not just write code, his AGI timelines, his thoughts on when AI agents will actually be really useful, and so much more. A huge thank you to Ed Bayes, Nick Turley, and Dennis Yang for suggesting topics for this conversation. If you enjoy this podcast, don’t forget to subscribe and follow it in your favorite podcasting app or YouTube. And if you become an annual subscriber of my newsletter, you get a year free of 19 incredible products, including a year free of Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin, PostHog, and Stripe Atlas. Head on over to lennysnewsletter.com and click Product Pass.
With that, I bring you Alexander Embiricos, after a short word from our sponsors.
Alexander, thank you so much for being here and welcome to the podcast.
Alexander Embiricos: Thank you so much. I’ve been following for ages and I’m excited to be here.
Key Growth Breakthroughs
Lenny Rachitsky: I’m even more excited. I really appreciate that. I want to start with your time at OpenAI. So you joined OpenAI about a year ago. Before that, you had your own startup for about five years. Before that, you were a product manager at Dropbox. I imagine OpenAI is very different from every other place you’ve worked. Let me just ask you this, what is most different about how OpenAI operates and what’s something that you’ve learned there that you think you’re going to take with you wherever you go, assuming you ever leave?
Alexander Embiricos: By far, I would say the speed and ambition of working at OpenAI are just dramatically more than what I could have imagined. And I guess it’s kind of an embarrassing thing to say because everyone who’s a startup founder thinks like, “Oh yeah, my startup moves super fast and the talent bar is super high and we’re super ambitious.” But I have to say, working at OpenAI just made me reimagine what that even means.
Intelligence, Training Data, and Model Progress
Lenny Rachitsky: We hear this a lot. It feels like the reaction to every AI company is just, “Oh my God, I can’t believe how fast they’re moving.” Is there an example of just like, “Wow, that wouldn’t have happened this quickly anywhere else”?
Long-Running Tasks and Compaction
Alexander Embiricos: The most obvious thing that comes to mind is just the explosive growth of Codex itself. I think it’s been a while since we bumped our external number, but the 10x-ing of Codex’s scale was just super fast, in a matter of months, and it’s well more since then. And once you’ve lived through that, or at least speaking for myself, having lived through that now, I feel like anytime I’m going to spend my time on building tech product, there’s that speed and scale that I now need to meet.
If I think of what I was doing in my startup, it moved way slower and there’s always this balance with startups of how much do you commit to an idea that you have versus find out that it’s not working and then pivot. But I think one thing I’ve realized at OpenAI is the amount of impact that we can have and, in fact, need to have to do a good job is so high that I have to be way more ruthless with how I spend my time now.
Competitive Landscape and Winning Path
Lenny Rachitsky: Before we get to Codex, is there a way that they’ve structured the org or, I don’t know, the way that OpenAI operates that allows the team to move this quickly? Because everyone wants to move super fast. I imagine there’s a structural approach to allowing this to happen.
Alexander Embiricos: I mean, so one thing is just the technology that we’re building with has just transformed so many things from both how we build, but also what kinds of things we can enable for users. And we spend most of our time talking about the sort of improvements within the foundation models, but I believe that even if we had no more progress today with models, which is absolutely not the case, but even if we had no more progress, we are way behind on product. There’s so much more product to build. So I think just the moment is ripe, if that makes sense.
But I think there’s a lot of counterintuitive things that surprised me when I arrived as far as how things are structured. One example that comes to mind is when I was working on my startup and before that, when I was at Dropbox, it was very important, especially as a PM to always rally the ship and it was like make sure you’re pointed in the right direction and then you can accelerate in that direction. But here, I think because we don’t exactly know what capabilities will even come up soon and we don’t know what’s going to work technically, and then we also don’t know what’s going to land even if it works technically, it’s much more important for us to be very humble and learn a lot more empirically and just try things quickly. And the org is set up in that way to be incredibly bottoms up.
This is, again, one of those things that, as you were saying, everyone wants to move fast. I think everyone likes to say that they’re bottoms up, or at least a lot of people do, but OpenAI is truly, truly bottoms up. And that’s been a learning experience for me that now it’ll be interesting if I ever work at… I don’t think it’ll even make sense to work at a non-AI company in the future. I don’t even know what that means. But if I were to imagine it or go back in time, I think I would run things totally new.
Agent Configurability and Script Reuse
Lenny Rachitsky: What I’m hearing is this ready, fire, aim is the approach more than ready, aim, fire. And there’s something, and as you process that, because that may not come across well, but I actually have heard this a lot at AI companies is because you don’t know, and Nick Turley shared I think the same sentiment, because you don’t know how people will use it, it doesn’t make sense to spend a lot of time making it perfect. It’s better to just get it out there in a primordial way, see how people use it, and then go big on that use case.
AI’s Impact on Engineering
Alexander Embiricos: Yeah. Okay, to use this analogy a little bit, I feel like there is an aim component, but the aim component is much fuzzier. It’s kind of like, roughly what do we think can happen? Someone I’ve learned a ton from working here is a research lead, and he likes to say that at OpenAI, we can have really good conversations about something that’s a year plus from now, and there’s a lot of ambiguity in what will happen, but that’s a right sort of timeline. And then we can have really good conversations about what’s happening in low months or weeks. But there’s this awkward middle ground, which was as you start approaching a year, but you’re not at a year where it’s very difficult to reason about, right?
And so as far as aiming, I think we want to know, “Okay, what are some of the futures that we’re trying to build towards?” And a lot of the problems we’re dealing with in AI, such as alignment, are problems you need to be thinking about really far out into the future. So we’re kind of aiming fuzzily there, but when it comes down to the more tactical, like, “Oh yeah, what product will we build and therefore how will people use that product?” that’s the place where we’re much more like, “Let’s find out empirically.”
Lenny Rachitsky: That’s a good way of putting it. Something else that when people hear this, people sometimes hear companies like yours saying, “Okay, we’re going to be bottoms up. We’re going to try a bunch of stuff. We’re not going to have exactly a plan of where it’s going in the next few months.” The key is you all hire the best people in the world. And so that feels like a really key ingredient in order to be this successful at bottoms up work.
Chat-Driven Development
Alexander Embiricos: It just super resonates with me. I was just, again, surprised or even shocked when I arrived at the level of individual drive and autonomy that everyone here has. So I think the way that OpenAI runs, you can’t read this or listen to a podcast and be like, “I’m just going to deploy this to my company.” Maybe this is a harsh thing to say, but I think very few companies have the talent caliber to be able to do that. So it might need to be adjusted if you were going to implement this.
A Tinder-Style Future for Agent Approvals
Lenny Rachitsky: Okay. So let’s talk Codex. You lead work on Codex. How’s Codex going? What numbers can you share? Is there anything you can share there? Also, just not everyone knows exactly what Codex is, explain what Codex is.
The Most Successful AI Product: IDE Autocomplete
Alexander Embiricos: Totally, yeah. So I have the very lucky job of living in the future and leading product on Codex. And Codex is OpenAI’s coding agent. So super concretely, that means it’s an IDE extension, a VS Code extension that you can install, or a terminal tool that you can install. And when you do so, you can then basically pair with Codex to answer questions about code, write code, run tests, execute code, and do a bunch of the work in that thick middle section of the software development lifecycle, which is all about writing code that you’re going to get into production.
More broadly, we think of what Codex currently is as just the beginning of a software engineering teammate. So when we use a big word like teammate, some of the things we’re imagining are that it’s not only able to write code, but actually it participates early on in the ideation and planning phases of writing software and then further downstream in terms of validating, deploying, and maintaining code.
To make that a little more fun, one thing I like to imagine is if you think of what Codex is today, it’s a bit like this really smart intern that refuses to read Slack and doesn’t check Datadog or Sentry unless you ask it to. And so no matter how smart it is, how much are you going to trust it to write code without you also working with it? So that’s how people use it mostly today: they pair with it. But we want to get to the point where it can work just like a new intern that you hire: you don’t only ask them to write code, but you ask them to participate across the cycle. So you know that even if they don’t get something right the first try, they’re eventually going to be able to iterate their way there.
Lenny Rachitsky: I thought the point about not reading Slack and Datadog was it’s just not distracted, it’s just constantly focused and is always in flow. But I get what you’re saying there is it doesn’t have all the context on everything that’s going on.
Contextual Assistance in the Browser
Alexander Embiricos: Yeah. And that’s not only true when it’s performing a task, but again, if you think of the best team and teammates, you don’t tell them what to do. Maybe when you first hire them, you have a couple meetings and you’re like, “Hey,” you learn, “Okay, these prompts work for this teammate, these prompts don’t. This is how to communicate with this person.” Then eventually you give them some starter tasks, you delegate a few tasks. But then eventually you just say like, “Hey, great. Okay, you’re working with this set of people in this area of the code base. Feel free to work with other people on other parts of the code base too, even. And yeah, you tell me what you think makes sense to be done.” And so we think of this as proactivity and one of our major goals with Codex is to get to proactivity.
I think this is critically important to achieve the mission of OpenAI, which is to deliver the benefits of AGI to all humanity. I like to joke today that AI products, and it’s a half joke, they’re actually really hard to use because you have to be very thoughtful about when it could help you. And if you’re not prompting a model to help you, it’s probably not helping you at that time. And if you think of how many times the average user is prompting AI today, it’s probably tens of times. But if you think of how many times people could actually get benefit from a really intelligent entity, it’s thousands of times per day. And so a large part of our goal with Codex is to figure out what is the shape of an actual teammate agent that is helpful by default.
Bottlenecks in Code Validation and Review
Lenny Rachitsky: When people think about Cursor and even Claude Code, it’s like an IDE that helps you code and autocompletes code and maybe does some agentic work. What I’m hearing here is the vision is different, which is it’s a teammate. It’s like a remote teammate, building code for you, that you talk to and ask to do things. And that also does IDE autocomplete and things like that. Is that kind of a differentiator in the way you think about Codex?
Alexander Embiricos: It’s basically this idea that if you’re a developer and you’re trying to get something done, we want you to just feel like you have superpowers and you’re able to move much, much faster. But we don’t think that in order for you to reap those benefits, you need to be sitting there constantly thinking about, “How can I invoke AI at this point to do this thing?” We want you to be able to plug it in to the way that you work and have it just start to do stuff without you having to think about it.
How Codex Impacts Product Managers
Lenny Rachitsky: Okay. I have a lot of questions along those lines, but just how’s it going? Is there any stats, any numbers you can share about how Codex is doing?
Disposable Code and Design Team Evolution
Alexander Embiricos: Yeah, Codex has been growing absolutely explosively since the launch of GPT-5 back in August. There’s definitely some interesting product insights to talk about as to how we unlock that growth, if you’re interested. But again, the last stat we shared there was we were well over 10x since August. In fact, it’s been 20x since then. Also, the Codex models are serving many trillions of tokens a week now, and it’s basically our most served coding model. One of the really cool things that we’ve seen is that the way that we decided to set up the Codex team was to build a really tightly integrated product and research team that are iterating on the model and the harness together. And it turns out that lets you just do a lot more and try many more experiments as to how these things will work together.
And so we were just training these models for use in our first party harness that we were very opinionated about. And then what we’ve started to see more recently actually is that other major API coding customers are now starting to adopt these models as well. And so we’ve reached a point where actually the Codex model is the most served coding model in the API as well.
Lenny Rachitsky: You hinted at this, what unlocked this growth, I’m extremely interested in hearing that. It felt like before, I don’t know, maybe this was before you joined the team, it just felt like Claude Code was killing it. Just everyone was sitting on top of Claude Code. It was by far the best way to code. And then all of a sudden Codex comes around. I remember Karpathy tweeted that he just has never seen a model like this. I think the tweet was about the gnarliest bugs that he runs into, that he just spends hours trying to figure out and nothing else has solved: he gives it to Codex, lets it run for an hour, and it solves it. What’d you guys do?
Sora and Atlas: Accelerating Development in Practice
Alexander Embiricos: We have this strong sort of mission here at OpenAI basically to build AGI. And so we think a lot about how can we shape the product so that it can scale. Earlier I was mentioning like, “Hey, if you’re an engineer, you should be getting help from AI thousands of times per day,” and so we thought a lot about the primitives for that when we launched our first version of Codex, which was Codex Cloud. And that was basically a product that had its own computer, lived in the cloud, you could delegate to it. And the coolest part about that is you could run many, many tasks in parallel. But some of the challenges that we saw are that it’s a little bit harder to set that up, both in terms of environment configuration, like giving the model the tools it needs to validate its changes and to learn how to prompt in that way.
My analogy for this is, going back to this teammate analogy, it’s like if you hired a teammate, but you’re never allowed to get on a call with them and you can only go back and forth asynchronously over time. That works for some teammates and eventually that’s actually how you want to spend most of your time. So that’s still the future, but it’s hard to initially adopt. And so we still have that vision of like, that’s what we’re trying to get you to, a teammate that you delegate to and then is proactive, and we’re seeing that growing. But the key unlock is actually first you need to land with users in a way that’s much more intuitive and trivial to get value from.
So the way that most people discover, the vast majority of users discover Codex today is either they download an IDE extension or they run it in their CLI and the agent works there with you on your computer interactively. And it works within a sandbox, which is actually a really cool piece of tech to help that be safe and secure, but it has access to all those dependencies. So if the agent needs to do something, it needs to run a command, it can do so within the sandbox. We don’t have to set up any environment. And if it’s a command that doesn’t work in the sandbox, it can just ask you. And so you can get into this really strong feedback loop using the model. And then over time, our team’s job is to help turn that feedback loop into you as a byproduct of using the product, configuring it so that you can then be delegating to it down the line.
And again, analogy, keep coming back to it, but if you hire a teammate and you ask them to do work, but you just give them a fresh computer from the store, it’s going to be hard for them to do their job. But if as you work with them side by side, you could be like, “Oh, you don’t have a password for this service we use, here’s the password for this service. Yeah, don’t worry, feel free to run this command,” then it’s much easier for them to then go off and do work for hours without you.
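The feedback loop described above (run freely inside the sandbox, ask the user when a command falls outside it, and let those answers accumulate into configuration) can be sketched in a few lines of Python. This is a minimal illustration only, not how Codex is implemented: the `CommandGate` class, the allowlist of program names, and the `approve` callback are all hypothetical, and a real sandbox enforces limits at the operating-system level rather than by matching program names.

```python
import shlex
from typing import Callable


class CommandGate:
    """Hypothetical gate deciding whether an agent's shell command may run."""

    def __init__(self, sandbox_allowed: set[str], ask_user: Callable[[str], bool]):
        self.allowed = set(sandbox_allowed)  # programs that run without asking
        self.ask_user = ask_user             # callback used when escalation is needed

    def decide(self, command: str) -> str:
        argv = shlex.split(command)
        if not argv:
            return "deny"
        program = argv[0]
        if program in self.allowed:
            return "run"
        if self.ask_user(command):
            # Remember the approval, so working side by side gradually
            # configures the agent for later, less supervised work.
            self.allowed.add(program)
            return "run"
        return "deny"


asked: list[str] = []


def approve(command: str) -> bool:
    """Stand-in for a human who approves everything and records what was asked."""
    asked.append(command)
    return True


gate = CommandGate({"ls", "cat", "pytest"}, ask_user=approve)
results = [gate.decide(c) for c in ("pytest -q", "npm install", "npm test")]
print(results, asked)
```

In this run the user is asked once, for `npm install`; the later `npm test` goes through without a prompt because the earlier approval was remembered.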
Can Non-Engineers Build Software?
Lenny Rachitsky: So what I’m hearing is the initial version of Codex was almost too far in the future. It’s like a remote, in-the-cloud agent that’s coding for you asynchronously. And what you did is, “Okay, let’s actually come back a little bit, let’s integrate into the way engineers already work, in IDEs and locally, and help them on-ramp to this new world.”
Alexander Embiricos: Totally. And it was quite interesting because we dogfood product a ton at OpenAI. So dogfood as in we use our own product. And so Codex has been accelerating OpenAI over the course of the entire year, and the cloud product was a massive accelerant to the company as well. It just turns out that this was one of those places where the signal we got from dogfooding is a little bit different from the signal you get from the general market because at OpenAI, we train reasoning models all day and so we’re very used to this kind of prompting and think upfront, run things massively in parallel and it would take some time and then come back to it later asynchronously. And so now when we build, we still get a ton of signal from dogfooding internally, but we’re also very cognizant of the different ways that different audiences use the product.
The Future of Vertical AI Startups
Lenny Rachitsky: That’s really funny. It’s like live in the future, but maybe not too far in the future. And I could see how everyone at OpenAI is living very far in the future, and sometimes that won’t work for everyone.
Codex Metrics and KPIs
Alexander Embiricos: Yeah.
The Original Vision for Atlas Browser
Lenny Rachitsky: What about just intelligence training data? I don’t know, is there something else that helped Codex accelerate its ability to actually code? Is it better, cleaner data? Is it more just models advancing? Is there anything else that really helped accelerate?
Advantages of Browsers and Context Awareness
Alexander Embiricos: Yeah, so there’s a few components here. I guess you were mentioning models and the models have improved a ton. In fact, just last Wednesday, we shipped GPT-5.1-Codex-Max, a very accurately named model, that is awesome. It is awesome both because it is for any given task that you were using GPT-5.1-Codex for, it’s roughly 30% faster at accomplishing that task. But also it unlocks a ton of intelligence. So if you use it at our higher reasoning levels, it’s just even smarter. And that tweet you were saying Karpathy made about, “Hey, give this your gnarliest bugs,” obviously there’s a ton going on in the market right now, but Codex-Max is definitely carrying that mantle of us tackling the hardest bugs. So that is super cool.
But I will say it’s like some of how we’re thinking about this is evolving a little bit from being like, “Yeah, we’re just going to think about the model and let’s just train the best model,” to really thinking about what is an agent actually overall? And I’m not going to try to define agent exactly, but at least the stack that we think of it as having is it’s like you have this model, really smart reasoning model that knows how to do a specific kind of task really well, so we can talk about how we make that possible. But then actually we need to serve that model through an API into a harness, and both of those things also have a really big role here.
So for instance, one of the things that we’re really proud of is you can have GPT-5.1-Codex-Max work for really long periods of time. That’s not normal, but you can set it up to do that or that might happen. But now routinely we’ll hear about people saying, “Yeah, it ran overnight or it ran for 24 hours.” And so for a model to work continuously for that amount of time, it’s going to exceed its context window. And so we have a solution for that, which we call compaction.
But compaction is actually a feature that uses all three layers of that stack. So you need to have a model that has a concept of compaction and knows like, “Okay, as I start to approach this context window, I might be asked to prepare to be run in a new context window.” And then at the API layer, you need an API that understands this concept and has an endpoint that you can hit to do this change. And at the harness layer, you need a harness that can prepare the payload for this to be done. So shipping this compaction feature that now just made this behavior possible to anyone using Codex actually meant working across all three things. And I think that’s increasingly going to be true.
Another maybe underappreciated version of this is if you think about all the different coding products out there, they all have very different tool harnesses with very different opinions on how the model should work. So if you want to train a model to be good at all the different ways it could work, that’s a lot to cover. Maybe you have a strong opinion that it should work using semantic search. Maybe you have a strong opinion that it should call bespoke tools. Or maybe you have, in our case, a strong opinion that it should just use the shell and work in the terminal. You can move much faster if you’re just optimizing for one of those worlds. So the way that we built Codex is that it just uses the shell, but in order to make that safe and secure, we have a sandbox that the model is used to operating in.
So I think one of the biggest accelerants, to go all the way back and answer your question, is just that we’re building all three things in parallel and tuning each one and constantly experimenting with how those things work together, with a tightly integrated product and research team.
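To make the compaction idea from this answer a little more concrete, here is a minimal Python sketch of the harness-side half only. It is an assumption-laden illustration, not the Codex implementation or any OpenAI API: `count_tokens`, `summarize`, the `Session` class, the 80% threshold, and the choice to keep the two most recent messages are all invented for the example. In the real system, per the description above, the model itself is trained to expect compaction and an API endpoint performs the change.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages: list[str]) -> str:
    """Hypothetical stand-in for the model/API step that condenses history."""
    return f"[summary of {len(messages)} earlier messages]"


@dataclass
class Session:
    context_limit: int            # tokens the (hypothetical) context window holds
    compact_at: float = 0.8       # compact once usage reaches this fraction
    messages: list[str] = field(default_factory=list)
    compactions: int = 0

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        self.messages.append(message)
        if self.used() >= self.compact_at * self.context_limit:
            self.compact()

    def compact(self, keep_recent: int = 2) -> None:
        """Replace older messages with one summary and keep the newest ones."""
        if len(self.messages) <= keep_recent:
            return
        older = self.messages[:-keep_recent]
        recent = self.messages[-keep_recent:]
        self.messages = [summarize(older)] + recent
        self.compactions += 1


session = Session(context_limit=50)
for step in range(40):
    session.append(f"step {step}: ran tests and edited one file")
print(session.compactions, session.used(), session.messages[0])
```

The point of the sketch is the shape of the loop: the session can keep accepting work indefinitely because history is periodically folded into a summary before the window fills, at the cost of losing detail from the older messages.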
Best Codebases and Practices for Codex
Lenny Rachitsky: How do you think you win in this space? Do you think it’ll always be this kind of race with other models constantly leapfrogging each other? Do you think there’s a world where someone just runs away with it and no one else can ever catch up? Is there a path to just, “We win”?
Alexander Embiricos: Again, comes back to this idea of building a teammate, and not just a teammate that participates in team planning and prioritization, not just a teammate that really tests its code and helps you maintain and deploy it. But even a teammate… If you think, again, an engineering teammate, they can also schedule a calendar invite or move standup or do whatever, right? And so in my mind, if we just imagine that every day or every week some crazy new capability is just going to be deployed by a research lab, it’s just impossible for us as humans to keep up and use all this technology. So I think we need to get to this world where you kind of just have an AI teammate or super assistant that you just talk to and it just knows how to be helpful on its own. So you don’t have to be reading the latest tips for how to use it, you’ve plugged it in and it just provides help.
So that’s kind of the shape of what I think we’re building. And I think that will be a very sticky winning product if we can do so. So the shape that in my head, at least I have, is that we build… Maybe a fun topic is like, “Is Chat the right interface for AI?” I actually think Chat is a very good interface when you don’t know what you’re supposed to use it for. In the same way that if I think of I’m on MS Teams or in Slack with a teammate, Chat is pretty good. I can ask for whatever I want. It’s kind of the common denominator for everything. So you can chat with a super assistant about whatever topic you want, whether it be coding or not. And then if you are a functional expert in a specific domain such as coding, there’s a GUI that you can pull up to go really deep and look at the code and work with the code.
So I think what we need to build as OpenAI is basically this idea of you have chat, ChatGPT, as a tool that’s ubiquitously available to everyone. You start using it even outside of work to just help you. You become very comfortable with the idea of being accelerated with AI. So then you get to work and you can just naturally say, “Yeah, I’m just going to ask it for this and I don’t need to know about all the connectors or all the different features. I’m just going to ask it for help and it’ll surface to me the best way that it can help at this point in time and maybe even chime in when I didn’t ask it for help.” So in my mind, if we can get to that, I think that’s how we really build the winning product.
Tips for Beginners
Lenny Rachitsky: This is so interesting because with my chat with Nick Turley, the head of ChatGPT, I think he shared that the original name for ChatGPT was Super Assistant or something like that. And it’s interesting that there’s that approach to the super assistant and then there’s this Codex approach. It’s almost like the B2C version and the B2B version. And what I’m hearing is the idea here is, okay, you start with coding and building and then it’s doing all this other stuff for you, scheduling meetings, I don’t know, probably posting in Slack, I don’t know, shipping designs. I don’t know, is the idea that this is the business version of ChatGPT in a sense, or is there something else there?
Alexander Embiricos: Yeah. So we’re getting to the one-year time horizon conversation. A lot of this might happen sooner, but in terms of fuzziness, I think we’re at the one year. So I’ll give you a contention and a plausible way we get there, but as for how it happens, who knows? So basically, if we’re going to build a super assistant, it has to be able to do things. So we’re going to have a model and it’s going to be able to do stuff affecting your world. And one of the learnings I think we’ve seen over the past year or so is that for models to do stuff, they’re much more effective when they can use a computer.
Right, okay, so now we’re like, okay, we need the super assistant that can use a computer, or many computers. And now the question is, okay, well, how should it use the computer? And there’s lots of ways to use a computer. You could try to hack the OS and use accessibility APIs. Maybe a bit easier is you could point and click. That’s a little slow and unpredictable sometimes. And another way, it turns out the best way for models to use computers, is simply to write code. So we’re kind of getting to this idea where, well, if you want to build any agent, maybe you should be building a coding agent. And maybe to the user, a non-technical user, they won’t even know they’re using a coding agent, the same way that no one thinks about whether they’re using the internet or not; they’re more just like, “Is WiFi on?”
So I think that what we’re doing with Codex is we’re building a software engineering teammate, and as part of that, we’re kind of building an agent that can use a computer by writing code. And so we’re already seeing some pull for this. It’s quite early, but we’re starting to see people who are using Codex for coding adjacent product purposes. And so as that develops, I think we’ll just naturally see that, oh, it turns out we should just always have the agent write code if there is a coding way to solve a problem instead of… Even if you’re doing a financial analysis, maybe write some code for that.
So basically like you were like, “Hey, is this the two ends of this product for the super assistant of ChatGPT?” In my mind, just coding is a core competency of any agent including ChatGPT. And so really what we think we’re building is that competency. So here’s the really cool thing about agents writing code is that you can import code. Code is composable, interoperable. Because one very reductive view we could have for an agent is it’s just going to be given a computer and it’s just going to point and click and go around. But that is the future. And then how we get there is difficult to chart a path because a lot of the questions around building agents aren’t like, “Can the agent do it?” But it’s more about, “Well, how can we help the agent understand the context that it’s working in?” And the team that’s using it probably has a way that they like to do things. They have guidelines. They probably want certain deterministic guarantees about what the agent can or cannot do. Or they want to know that the agent understands this detail.
An example would be if we’re looking at a crash reporting tool, hitting a connector for it, every sub-team probably has a different meta prompt for how they want the crashes to be analyzed. And so we start to get to this thing where, yeah, we have this agent sitting in front of a computer, but we need to make that configurable for the team or for the user and let them… Stuff that the agent does often, we probably just want to build in as a competency that this agent has that it can do.
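To make the per-team configuration idea concrete, here is a minimal sketch in Python. It assumes a hypothetical lookup table of team-specific guidance with a shared default; the team names, prompt wording, and the `build_crash_prompt` helper are all illustrative and are not part of Codex or any real connector.

```python
# Shared fallback guidance, used when a team has not configured its own.
DEFAULT_PROMPT = "Group crashes by stack trace and report the top offenders."

# Hypothetical per-team overrides; names and wording are illustrative only.
TEAM_PROMPTS = {
    "payments": "Prioritize crashes in checkout flows; flag anything touching money.",
    "android": "Group crashes by OS version and device model before ranking.",
}


def build_crash_prompt(team: str, crash_summary: str) -> str:
    """Prepend the team's guidance (or the default) to a crash report."""
    guidance = TEAM_PROMPTS.get(team, DEFAULT_PROMPT)
    return f"{guidance}\n\nCrash report:\n{crash_summary}"


payments_prompt = build_crash_prompt("payments", "NullPointerException in CartView")
fallback_prompt = build_crash_prompt("growth", "Timeout in FeedLoader")
```

The point of the sketch is only that the same agent and the same connector can behave differently per team once the guidance is data rather than something each person retypes.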
So I think we end up with this generalizable thing, that you were saying, of an agent that can just write its own scripts for whatever it wants to do. But I think that the really key part here is can we make it so that everything that the agent has to do often or that it does well, we can just remember and store so that the agent doesn’t have to write a script for that again? Or maybe if I just joined a team and you are already on the same team as me, I can just use all those scripts that the agents had written already.
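One way to picture "remember and store" is a small shared library of scripts keyed by task. The sketch below is an assumption-laden illustration, not how Codex works: `SkillLibrary`, `get_or_write`, and the stand-in `fake_agent` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Hypothetical team-wide store of scripts an agent has already written."""

    scripts: dict = field(default_factory=dict)  # normalized task -> script text
    uses: dict = field(default_factory=dict)     # normalized task -> times requested

    def get_or_write(self, task: str, write_script) -> str:
        """Reuse the saved script for a task; call write_script only on a miss."""
        key = " ".join(task.lower().split())  # normalize case and whitespace
        if key not in self.scripts:
            self.scripts[key] = write_script(task)
        self.uses[key] = self.uses.get(key, 0) + 1
        return self.scripts[key]


calls = []


def fake_agent(task: str) -> str:
    """Stand-in for an agent writing a script; records each time it is invoked."""
    calls.append(task)
    return f"# script for: {task}"


library = SkillLibrary()
first = library.get_or_write("Summarize crash reports", fake_agent)
# A teammate asks for the same thing later, with different casing and spacing.
second = library.get_or_write("summarize  crash reports", fake_agent)
```

Here the second request is served from the library, so the stand-in agent is invoked only once. A real system would need a better notion of "same task" than string normalization.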
Skill Recommendations for the AI Era
Lenny Rachitsky: Yeah, it’s like if this is our teammate, they can share things that it’s learned from working with other people at the company. It just makes sense as a metaphor.
Agent Self-Validation and Human-in-the-Loop
Alexander Embiricos: Right. Yeah.
Examples from the Frontier
Lenny Rachitsky: It feels like you’re in the Karpathy camp of, “Agents today are not that great and mostly slop and maybe in the future they’ll be awesome.” Does that resonate?
Alexander Embiricos: So I think coding agents are pretty great. I think we’re seeing a ton of value there.
The AGI Timeline
Lenny Rachitsky: Yeah, that feels right. That feels right, yeah.
Alexander Embiricos: And then I think agents outside of coding, it’s still very early. And this is just my opinion, but I think they’re going to get a whole lot better once they can use coding too in a composable way. It’s kind of the fun part of when you’re building for software engineers, at my startup, we were building for software engineers too for a lot of that journey, and they’re just such a fun audience to build for because they also like building for themselves and are often even more creative than we are in thinking about how to use the technology. So by building for software engineers, you get to just observe a ton of emergent behaviors and things that you should do and build into the product.
Hiring and Lightning Q&A
Lenny Rachitsky: I love how you say that because a lot of people building for engineers get really annoyed because the engineers they’re just always complaining about stuff. They’re like, “Ah, that sucks. Why’d you build it this way?” I love that you enjoy it, but I think it’s probably because you’re building such an amazing tool for engineers that can actually solve problems and just code for them.
Kind of along those lines, there’s always this talk of what will happen with jobs, engineers, coding, do you have to learn coding? All these things. Clearly the way you’re describing it is it’s a teammate, it’s going to work with you, make you more superhuman, it’s not going to replace you. What’s the way you just think about the impact on the field of engineering, having this super intelligent engineering teammate?
Rapid Fire Q&A
Alexander Embiricos: I think there’s two sides to it, but the one we were just talking about is this idea that maybe every agent should actually use code and be a coding agent. And in my mind, that’s just a small part of this broader idea that, hey, as we make code even more ubiquitous… I mean, you could probably claim it’s ubiquitous today, even pre AI, right? But as we make code even more ubiquitous, it’s actually just going to be used for many more purposes. And so there’s just going to be a ton more need for humans with this competency.
So that’s my view. I think this is quite a complex topic. So it’s something we talk about a lot and we have to see how it pans out. But I think what we can do basically as a product team building in the space is just try to always think about how are we building a tool so that it feels like we’re maximally accelerating people rather than building a tool that makes it more unclear what you should do as the human?
I think, to give an example right now, nowadays when you work with a coding agent, it writes a ton of code, but it turns out writing code is actually one of the most fun parts of software engineering for many software engineers. So then you end up reviewing AI code. And that’s often a less fun part of the job for many software engineers. So I actually think we see that this plays out all the time in a ton of micro decisions. So we as a product team, we’re always thinking about, “Okay, how do we make this more fun? How do we make you feel more empowered? Where is this not working?” And I would argue that reviewing agent written code is a place that today is less fun.
So then I think, “Okay, what can we do about that?” Well, we can ship a code review feature that helps you build confidence in the AI written code. Okay, cool. Another thing we could do is we can make it so that the agent’s better able to validate its work. And it gets all the way down into micro decisions. If you’re going to have an agent capability to validate work, and let’s say you have… I’m thinking of Codex Web right now, you have a pane that sort of reflects the work the agent did, what do you see first? Do you see the diff or do you see the image preview of the code it wrote? And I think if you’re thinking about this from the perspective of, “How do I empower the human? How do I make them feel as accelerated as possible?” You obviously see the image first. You shouldn’t be reviewing the code unless first you’ve seen the image, unless maybe it’s been reviewed by an AI and now it’s time for you to take a look.
Family Surnames and Poetic Passions
Lenny Rachitsky: When I had Michael Truell, the CEO of Cursor on the podcast, he had this kind of vision of us moving to something beyond code. And I’ve seen this rise of something called spec-driven development where you just write the spec and then the AI writes code for you. So you start working at this higher abstraction level. Is that something you see where we’re going, just like engineers not having to actually write code or look at code and there’s going to be this higher level of abstraction that we focus on?
Contact Info and Future Outlook
Alexander Embiricos: Yeah. I mean, I think there’s constantly these levels of abstraction and they’ve actually already played out today. Today, coding agents, mostly it’s prompt to patch. We’re starting to see people doing spec-driven development or plan-driven development. That’s actually one of the ways when people ask, “Hey, how do you run Codex on a really long task?” Well, it’s like often collaborate with it first to write a plan.md, like a markdown file that’s your plan. And once you’re happy with that, then you ask it to go off and do work. And if that plan has verifiable steps, it’ll work for much longer. So we’re totally seeing that.
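As a rough illustration of what "a plan with verifiable steps" could look like, the Python sketch below parses a small plan.md. The checkbox-plus-`verify:` line format is an assumption made up for this example, not a format Codex defines, and the step names and commands in `PLAN` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed convention (not a Codex format): one checkbox per step, optionally
# followed by " | verify: <command>" naming how the step can be checked.
STEP_RE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<title>.+?)(?: \| verify: (?P<verify>.+))?$"
)


@dataclass
class Step:
    title: str
    done: bool
    verify: Optional[str]  # command an agent could run to check its own work


def parse_plan(markdown: str) -> List[Step]:
    """Collect checkbox steps from a plan, ignoring every other line."""
    steps = []
    for line in markdown.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            steps.append(Step(match["title"], match["done"] != " ", match["verify"]))
    return steps


def next_step(steps: List[Step]) -> Optional[Step]:
    """Return the first unfinished step, or None when the plan is complete."""
    return next((s for s in steps if not s.done), None)


def unverifiable(steps: List[Step]) -> List[str]:
    """Titles of steps with no verify command, worth tightening before a long run."""
    return [s.title for s in steps if s.verify is None]


PLAN = """# Plan
- [x] Read the iOS feed screen | verify: ls ios/Feed
- [ ] Port the feed screen to Android | verify: ./gradlew test
- [ ] Polish animations
"""

steps = parse_plan(PLAN)
```

With this plan, `next_step(steps)` is the Android port, and `unverifiable(steps)` flags "Polish animations" as the one step with no way for the agent to check itself.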
I think spec-driven development is an interesting idea. It’s not clear to me that it’ll work out that way because a lot of people don’t like writing specs either, but it seems plausible that some people will work that way. A bit of a joke idea though is if you think of the way that many teams work today, they often don’t necessarily have specs, but the team is just really self-driven and so stuff just gets done. And so almost that it’s like, I’m coming up with this on the spot, so it’s not a good name, but chatter-driven development where it’s just like stuff is happening on social media and in your team communications tools. And then as a result, code gets written and deployed.
So yeah, I think I’m a little bit more oriented in that way of I don’t even necessarily want to have to write a spec. Sometimes I want to, only if I like writing specs. Other times I might just want to say like, “Hey, here’s the customer service channel and tell me what’s interesting to know, but if it’s a small bug, just fix it.” I don’t want to have to write a spec for that, right?
I have this sort of hypothetical future that I like to share sometimes with people as a provocation, which is in a world where we have truly amazing agents, what does it look like to be a solopreneur? And one terrible idea for how it could look is that actually there’s a mobile app and every idea that the agent has to do is just vertical video on your phone and then you can swipe left if you think it’s a bad idea and you can swipe right if it’s a good idea. And you can press and hold and speak to your phone if you want to give feedback on the idea before you swipe. And in this world, basically what your job is is just to plug in this app into every single signal system or system of record, and then you just sit back and swipe. I don’t know.
Lenny Rachitsky: I love this. So this is like Tinder meets TikTok meets Codex.
Alexander Embiricos: It’s pretty terrible.
Lenny Rachitsky: No, this is great. So the idea here is this agent is watching and listening to you, paying attention to the market, your users, and it’s like, “Cool, here’s something I should do.” It’s like a proactive engineer just like, “Here, we should build this feature, fix this thing.”
Alexander Embiricos: Exactly. Exactly.
Lenny Rachitsky: I think it’s a really good idea.
Alexander Embiricos: Communicating with you in the lowest effort way for your consumers.
Lenny Rachitsky: Yeah, yeah, the modern way we communicate, swipe left to right and vertical feed. And then the Sora video, okay, so I see how this all connects now. I see.
Alexander Embiricos: Yeah. To be clear, we’re not building that, but it’s a fun idea. I mean, in this example though, one of the things that it’s doing is it’s consuming external signals, right? I think the other really interesting thing is if we think about what is the most successful AI product to date, I would argue, it’s funny actually not to confuse things at all, but the first time we used the brand Codex at OpenAI was actually the model powering GitHub Copilot. This is way back in the day, years ago. And so we decided to reuse that brand recently because it’s just so good, Codex, code execution.
But I think actually autocompletion in IDEs is one of the most successful AI products today. And part of what’s so magical about it is that when it can surface ideas for helping you really rapidly, when it’s right, you’re accelerated. When it’s wrong, it’s not that annoying. It can be annoying, but it’s not that annoying. So you can create this mixed initiative system that’s contextually responding to what you’re attempting to do. So in my mind, this is a really interesting thing for us as OpenAI as we’re building.
So for instance, when I think about launching a browser, which we did with Atlas, in my mind, one of the really interesting things we can then do is we can then contextually surface ways that we can help you as you’re going about your day. And so we break out of this, we’re just looking at code or we’re just in your terminal into this idea that, “Hey, a real teammate is dealing with a lot more than just code. They’re dealing with a lot of things that are web content. So how can we help you with that?”
Lenny Rachitsky: Man, there’s so much there. I love this. Okay, so auto complete on web with the browser. That’s so interesting. Just like, “Here’s all the things that we can help you with as you’re browsing and going about your day.”
I want to talk about Atlas. I’ll come back to that. Codex, code execution, did not know that. That’s really clever. I get it now. Okay, and then this chatter, what is a chatter-driven development? No, this is a really good idea, but it reminds me, I had Dhanji on the podcast, CTO of Block, and they have this product called Goose, which is their own internal agent thing. And he talked about an engineer at Block who just has Goose watch his screen and listen to every meeting, and it proactively does work that he would probably want to do. So it ships a PR, sends an email, drafts a Slack message. So he’s doing exactly what you’re describing in kind of a very early way.
Alexander Embiricos: Yeah, that’s super interesting. And I bet you, so if we went and asked them what the bottleneck to that productivity is, did they share what it is?
Lenny Rachitsky: Probably looking at it and just making sure this is the right thing to do, yeah.
Alexander Embiricos: Yeah. So we see this now. We have a Slack integration for Codex. People love if there’s something that you need to do quickly, people will just @ mention Codex, “Why do you think this bug is happening?” It doesn’t have to be an engineer. Even maybe data scientists often here are using Codex a ton to just answer questions like, “Why do you think this metric moved? What happened?” So questions, you get the answer right back in Slack. It’s amazing, super useful. But as for when it’s writing code, then you have to go back and look at the code.
So the real, I think, bottleneck right now is validating that the code worked and writing code review. So in my mind, if we wanted to get to something like the friend you were talking about’s world, I think we really need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages of the work.
Lenny Rachitsky: It makes sense. Like you said, writing code, I used to be an engineer, I was an engineer for 10 years, really fun to write code, really fun to just get in the flow, build architect, test. Not so fun to look at everyone else’s code and just have to go through and be on the hook if it’s doing something dumb that’s going to take down production. And now that building has become easier, what I’ve always heard from companies that are really at the cutting edge of this is the bottleneck is now figuring out what to build. And then it’s at the end of like, “Okay, we have all this, all 100 PRs to review. Who’s going to go through all that?”
Alexander Embiricos: Right.
Lenny Rachitsky: Yeah.
What has the impact of Codex been on the way you operate as a product person as a PM? It’s clear how engineering is impacted, code is written for you. What has it done to the way you operate and the way PMs operate at OpenAI?
Alexander Embiricos: Yeah, I mean, I think mostly I just feel much more empowered. I’ve always been sort of more technical leaning PM, and especially when I’m working on products for engineers, I feel like it’s necessary to dogfood the product. But even beyond that, I just feel like I can do much, much more as a PM. And Scott Belsky talks about this idea of compressing the talent stack. I’m not sure if I’ve phrased that right. But it’s basically this idea that maybe the boundaries between these roles are a little bit less needed than before because people can just do much more. And every time someone can do more, you can skip one communication boundary and make the team that much more efficient.
So I think we see it in a bunch of functions now, but I guess since you asked about products specifically, now answering questions much, much easier. You can just ask Codex for thoughts on that. A lot of PM type work, understanding what’s changing. Again, just ask Codex for help with that. Prototyping is often faster than writing specs. This is something that a lot of people have talked about.
I think something that, I don’t think it’s super surprising, but something that’s slightly surprising is we see… We’re mostly building Codex to write code that’s going to be deployed to production, but actually we see a lot of throwaway code written with Codex now. It’s kind of going back to this idea of ubiquitous code. So you’ll see someone wants to do an analysis. If I want to understand something, it’s like, okay, just give Codex a bunch of data, but then ask it to build an interactive data viewer for this data. That’s just too annoying to do in the past, but now it’s just totally worth the time of just getting an agent to go do something.
Similarly, I’ve seen some pretty cool prototypes on our design team about if you want to… Well, a designer basically wanted to build an animation, and this is the coin animation in Codex, and it was like normally it’d be too annoying to program this animation. So they just vibe coded an animation editor and then they used the animation editor to build the animation, which they then checked into their repo.
Actually, our designers, there’s a ton of acceleration there. And speaking of compressing the talent stack, I think our designers are very PM-y. So they do a ton of product work and they actually have an entire vibe coded side prototype of the Codex app. And so a lot of how we talk about things is we’ll have a really quick jam because there’s 10,000 things going on, and then the designer will go think about how this should work. But instead of talking about it again, they’ll just vibe code a prototype of that in their standalone prototype. We’ll play with it. If we like it, they’ll vibe engineer that prototype into an actual PR to land. And then depending on their comfort with the code base, like Codex utilizing Rust is a little harder, maybe they’ll land it themselves or they’ll get close and then an engineer can help them land the PR.
We recently shipped the Sora Android app, and that was one of the most mind-blowing examples of acceleration, actually, because usage of Codex internally at OpenAI is obviously really, really high, but it’s been growing over the course of the year, both in terms of now it’s basically all technical staff use it, but even the intensity and know how of how to make the most of coding agents has gone up by a ton. And so the Sora Android app, a fully new app, we built it in 18 days. It went from zero to launch to employees, and then 10 days later, so 28 days total, we went to just GA, to the public, and that was done just with the help of Codex. So pretty insane velocity.
I would say it was a little bit… I don’t want to say easy mode, but there is one thing that Codex is really good at if you’re a company that’s building software on multiple platforms, so you’ve already figured out some of the underlying APIs or systems, asking Codex to port things over is really effective because it has something you can go look at. And so the engineers on that team were basically having Codex go look at the iOS app, produce plans of work that needed to be done, and then go implement those. And it was looking at iOS and Android at the same time. And so basically it was two weeks to launch to employees, four weeks total. Insanely fast.
Lenny Rachitsky: What makes that even more insane is it became the number one app in the app store. This just boggles the mind. Okay, so 28 days?
Alexander Embiricos: Yeah, so imagine number one app in the app store with a handful of engineers. I think it was two or three possibly in a handful of weeks.
Lenny Rachitsky: Yeah, this is absurd. Wow.
Alexander Embiricos: Yeah, so that’s a really fun example of acceleration. And then Atlas was the other one that I think Ben did a podcast, the eng lead on Atlas, sharing a little bit about how we built there. Atlas is actually… I mean, it’s a browser, and building a browser is really hard. So we had to build a lot of difficult systems in order to do that. And basically we got to the point where that team has a ton of power users of Codex right now, and it got to the point where basically… We were talking to them about it, because a lot of those engineers are people I used to work with before at my startup. And so they’d say, “Before this would’ve taken us two to three weeks for two to three engineers, and now it’s like one engineer, one week.” So massive acceleration there as well.
And what’s quite cool is that we shipped Atlas on Mac first, but now we’re working on the Windows version. So the team now is ramping up on Windows and they’re helping us make Codex better on Windows too, which is admittedly earlier. The model we shipped last week is the first model that natively understands PowerShell, PowerShell being the native shell language on Windows. So yeah, it’s been really awesome to see the whole company getting accelerated by Codex from… And most obviously, also research and improving how quickly we train models and how well we do it. And then even design, as we talked about, and marketing. Actually, we’re at this point now where my product marketer is often also making string changes just directly from Slack or updating docs directly from Slack.
Lenny Rachitsky: These are amazing examples. You guys are living at the bleeding edge of what is possible, and this is how other companies are going to work. Just shipping, again, what became the number one app in the app store and just beloved all over the… It just took over, I don’t know, the world for at least a week. Built, you said, in 28 days and I don’t know, 10 days, 18 days just to get the core of it working.
Alexander Embiricos: Yeah, so it was like 18 days we had a thing that employees were playing with, and then 10 days later we were out.
Lenny Rachitsky: And you said just a couple engineers.
Alexander Embiricos: Yeah.
Lenny Rachitsky: Two or three. Okay. And then Atlas you said took a week to build?
Alexander Embiricos: No, no, no. So Atlas, not the whole week, but Atlas was a really meaty project. And so I was talking to one of the engineers on Atlas about just what they use Codex for. And it’s basically like, “We use Codex for absolutely everything.” And I was like, “Okay, well, how would you measure the acceleration?” And so basically the answer I got back was, “Previously would’ve taken two to three weeks for two to three engineers, and now it’s like one engineer, one week.”
Lenny Rachitsky: Do you think this eventually moves to non-engineers doing this sort of thing? Does it have to be an engineer building this thing? Could Sora have been built by, I don’t know, a PM or designer?
Alexander Embiricos: I think we will very much get to the point, well, basically where the boundaries are a little bit blurred. I think you’re going to want someone who understands the details of what they’re building, but what details those are will evolve. Kind of like how now if you’re writing Swift, you don’t have to speak assembly. There’s a handful of people in the world, and it’s really important that they exist and speak assembly, maybe more than a handful, but that’s a specialized function that most companies don’t need to have.
So I think we’re just going to naturally see an increase in layers of abstraction. And then the cool thing is now we’re entering the language layer of abstraction, like natural language, and then natural language itself is really flexible. You could have engineers talking about a plan and then you could have engineers talking about a spec, and then you could have engineers talking about just a product or an idea. So I think we can also start moving up those layers of abstraction as well.
But I do think this is going to be gradual. I don’t think it’s going to go off to all of a sudden nobody ever writes anything, any code and it’s just specs. I think it’s going to be much more like, “Okay, we’ve set up our coding agent to be really good at previewing the build or at running tests,” maybe that’s the first part that most people have set up. And it’s like, “Okay, now we’ve set it up so they can execute the build and it can see the results of its own changes, but we haven’t yet built a good integration harness so that it can,” in the case of Atlas… By the way, I don’t know if they’ve done any of this or not. I think they’ve done a lot of this. But maybe the next stage is enable it to load a few sample pages to see how well those work. So then, okay, now we’re going to set it up to do that.
And I think for some time at least, we’re going to have humans curating which of these connectors or systems or components that the agent needs to be good at talking to. And then in the future, there will be an even greater unlock where Codex tells you how to set it up or maybe sets itself up in a repo.
Lenny Rachitsky: What a wild time to be alive. Wow. I’m curious just the second order effects of this sort of thing, just how quick it is to build stuff. What does that do? Does that mean distribution becomes much, much more important? Does it mean ideas are just worth a lot more? It’s interesting to think about how quickly that changes.
Alexander Embiricos: I’m curious what you think. I still don’t think ideas are worth as much as maybe a lot of people think. I still think execution is really hard. You can build something fast, but you still need to execute well on it, still needs to make sense and be a coherent thing overall, yeah, and distribution is massive.
Lenny Rachitsky: Yeah. Just feels like everything else is now more important. Everything that isn’t the building piece, which is coming up with an idea, getting to market, profit, all that kind of stuff.
Alexander Embiricos: Yeah. I think we might’ve been in this weird temporary phase where, for a while, it was so hard to build product that you mostly just had to be really good at building product and it maybe didn’t matter if you had an intimate understanding of a specific customer. But now I think we’re getting to this point where actually if I could only choose one thing to understand, it would be really meaningful understanding of the problems that a certain customer has. If I could only go in with one core competency.
So I think that’s ultimately still what’s going to matter most. If you’re starting a new company today and you have a really good understanding and network of customers that are currently underserved by AI tools, I think you’re set. Whereas if you’re good at building websites, but you don’t have any specific customer to build for, I think you’re in for a much harder time.
Lenny Rachitsky: Bullish on vertical AI startups is what I’m hearing. Yeah, I completely agree. There’s the general thing that can solve a lot of problems and then there’s like, “We’re going to solve presentations incredibly well and we’re going to understand the presentation problem better than anyone and we’re going to plug into your workflows and all these other things that matter for a very specific problem.” Okay, incredible.
When you think about progress on Codex, I imagine you have a bunch of evals and there’s all these public benchmarks. What’s something you look at to tell you, “Okay, we’re making really good progress,” I imagine it’s not going to be the one thing, but what do you focus on? What’s something you’re trying to push? What’s a KPI or two?
Alexander Embiricos: One of the things that I’m constantly reminding myself of is that a tool like Codex naturally is a tool that you would become a power user of. So we can accidentally spend a lot of our time thinking about features that are very deep in the user adoption journey. And so we can kind of end up oversolving for that. And so I think it’s just critically important to go look at your D7 retention. Just go try the product, sign up from scratch again. I have a few too many ChatGPT Pro accounts that I’ve, in order to maximally correctly dogfood, signed up for on my Gmail and they charge me 200 bucks a month. I need to expense those. But I think just the feeling of being an end user and the early retention stats are still super important for us because as much as this category is taking off, I think we’re still in the very early days of people using them.
Another thing that we do that I think we might be the most user feedback/social media pilled team out there in this space is like a few of us are constantly on Reddit and Twitter, and there’s praise up there and there’s a lot of complaints, but we take the complaints very seriously and look at them. And I think that, again, because you can use a coding agent for so many different things, it often is kind of broken in many sort of ways for specific behaviors. So we actually monitor a lot just what the vibes are on social media pretty often, especially I think for Twitter/X, it’s a little bit more hypey and then Reddit is a little more negative but real actually. So I’ve started increasingly paying attention to how people are talking about using Codex on Reddit actually.
Lenny Rachitsky: This is important for people to know. Which of the subreddits do you check most? Is there like an r/Codex or?
Alexander Embiricos: I mean, the algorithm’s pretty good at surfacing stuff, but r/Codex is there.
Lenny Rachitsky: Okay. I’ll take. Very interesting. And then if people tag you on Twitter, you still see that, but maybe not as powerful as seeing it on Reddit.
Alexander Embiricos: Well, yeah. Well, the thing with Twitter is it’s a little bit more one-to-one, even if it’s in public. Whereas with Reddit, there are really good upvoting mechanics and maybe most people are still not bots, unclear. So you get good signal on what matters and what other people think.
Lenny Rachitsky: So interestingly, Atlas, I want to talk about that briefly. You guys launched Atlas. I tweeted actually that I tried Atlas and then I don’t love the AI only search experience. I was just like, “I just want Google sometimes,” or whatever. Just waiting for AI to give me an answer, I’m like, “I don’t want to…” And there was no way to switch. I just tweeted, “Hey, I’m switching back. It’s not great.” And I feel like I made some PMs at OpenAI sad. And I saw someone tweet, “Okay, we have Atlas now,” which I imagine was always part of the plan. It’s probably an example of just, “We got to ship stuff, see how people use it and then we figure it out.” So I guess one is that, I don’t know, is there anything there? And two, I’m just curious, why are you guys building a web browser?
Alexander Embiricos: So I worked on Atlas for a bit. I don’t work on it now. But a bit of the narrative here for me just to tell my story a bit was I was working on this screen sharing, pair programming startup, and then we joined OpenAI. And so the idea was really to build a contextual desktop assistant. And the reason I believe that’s so important is because I think that it’s really annoying to have to give all your context to an assistant and then to figure out how it can help you. So if it could just understand what you are trying to do, then it could maximally accelerate you. So I still think of Codex actually as a contextual assistant from a little bit of a different angle, starting with coding tasks.
But some of the thinking, at least for me personally, I can’t speak for the whole project, was that a lot of work is done in the web. And if we could build a browser, then we could be contextual for you, but in a much more first class way. We weren’t hacking other desktop software which have very varied support for what content they’re rendering to the accessibility tree. We wouldn’t be relying on screenshots, which are a little bit slower and unreliable. Instead, we could be in the rendering engine and extract whatever we needed to help you. And also I like to think of video games, I don’t know if you’ve played, I don’t know, say Halo, you walk up to an object, I mean, this is true for many games, you press… Man, it’s been a long time, this is embarrassing. Press X and it just does the right thing. And I was one of those guys who always read the instruction manual for every video game that I bought.
And I remember the first time I read about a contextual action and I just thought it was this really cool idea. And the thing about a contextual action is we need to know what you are attempting to do. We have a little bit of context and then we can help. And I think this is critically important because imagine this world that we reach where we have agents that are helping you thousands of times per day.
Imagine if the only way we could tell you that we helped you was if we could push notify you. So you get a thousand push notifications a day of an AI saying like, “Hey, I did this thing. Do you like it?” It’d be super annoying, right? Whereas imagine, going back to software engineering, I was looking at a dashboard and I noticed some key metric had gone down. And at that point in time, an AI could maybe go take a look and then surface the fact that it has an opinion on why this metric went down and maybe a fix right there right when I’m looking at the dashboard. That would keep me in flow much more and enable the agent to take action on many more things.
So in my mind, part of why I’m excited for us to have a browser is that I think we have then much more context around what we should help with. Users have much more control over what they want us to look at. It’s like, “Hey, if you want us to take action on something, you can open it in your AI browser. If you don’t, then you can open it in your other browser.” So really clear control and boundaries. And then we have the ability to build UX that’s mixed initiative so that we can surface contextual actions to you at the time that they’re helpful as opposed to just randomly notifying you.
Lenny Rachitsky: Hearing the vision for Codex being the super assistant, it’s not just there to code for you. It’s trying to do a lot for you as a teammate, as this kind of super teammate, and that makes you awesome at work. So I get this. Speaking of that, are there other non-engineering common use cases for Codex? Just ways that non-engineers… We talked about designers prototyping and building stuff, are there any fun or unexpected ways people are using Codex that aren’t engineers?
Alexander Embiricos: I mean, there’s a load of unexpected ways, but I think most of where we’re seeing real traction is still, for now, very coding adjacent or tech-oriented, I would say: places where there’s a mature ecosystem, or maybe you’re doing data analysis or something like that. I personally am expecting that we’re going to see a lot more of that over time. But for now, we’re keeping the team very focused on just coding because there’s so much more work to do.
Lenny Rachitsky: For people that are thinking about trying out Codex, does it work for all kinds of code bases? What code does it support? If you’re like, I don’t know, SAP, can you add Codex and start building things? What’s the sweet spot? Where does it start to not be amazing yet?
Alexander Embiricos: I’m really glad you asked this question actually because the best way to try Codex is to give it your hardest tasks, which is a little different than some of the other coding agents. Some tools you might think, “Okay, let me start easy or just vibe code something random and decide if I like the tool.” Whereas we’re really building Codex to be the professional tool that you can give your hardest problems to. And that writes high quality code in your enormous code base that is in fact not perfect right now. So yeah, I think if you’re going to try Codex, you want to try it on a real task that you have and not necessarily dumb that task down to something that’s trivial, but actually a good one would be you have a hard bug and you don’t know what’s causing that bug and you ask Codex to help figure that out or to implement the fix.
Lenny Rachitsky: I love that answer. Just give it your hardest problem.
Alexander Embiricos: I will say if you’re like, “Hey, okay, well, the hardest problem I have is that I need to build a new unicorn business,” obviously that’s not going to work. Not yet. So I think it’s like give it the hardest problem, but something that is still one question or one task to start. That’s if you’re testing and then over time you can learn how to use it for bigger things.
Lenny Rachitsky: Yeah. What languages does it support?
Alexander Embiricos: Basically, the way we’ve trained Codex is there’s a distribution of languages that we support and it’s fairly aligned with the frequency of these languages in the world. So unless you’re writing some very esoteric language or some private language, it should do fine in your language.
Lenny Rachitsky: If someone was just getting started, is there a tip you could share to help them be successful? If you could just whisper a little tip into someone just setting up Codex for the first time to help them have a really good time, what’s something you would whisper?
Alexander Embiricos: I might say try a few things in parallel. So you could try giving it a hard task, maybe ask it to understand the code base, formulate a plan with it around an idea that you have and kind of build your way up from there. And the meta idea here is, again, it’s like you’re building trust with a new teammate. And so you wouldn’t go to a new teammate and just give them like, “Hey, do this thing. Here’s zero context.” You would start by first making sure they understand the code base and then you would maybe align on an approach and then you would have them go off and do bit by bit. And I think if you use Codex in that way, you’ll just naturally start to understand the different ways of prompting it because it’s a super powerful agent and model, but it is a little bit different to prompt Codex than other models.
Lenny Rachitsky: Just a couple more questions. One, we touched on this a little bit, as AI does more and more coding, there’s always this question of, “Should I learn to code and why should I spend time doing this sort of thing?” For people that are trying to figure out what to do with their career, especially if they’re into software engineering computer science, do you think there’s specific elements of computer science that are more and more important to lean into, maybe things they don’t need to worry about? What do you think people should be leaning into skill-wise as this becomes more and more of a thing in our workplace?
Alexander Embiricos: I think there’s a couple angles you could go at this from. Well, the easiest one to think of at least is just be a doer of things. I think that with coding agents getting better and better over time, it’s just what you can do as even someone in college or a new grad is just so much more than what that was before. And so I think you just want to be taking advantage of that. And definitely when I’m looking at hiring folks who are earlier career, it’s definitely something that I think about is how productive are they using the latest tools? They should be super productive. And if you think of it in that way, they actually have less of a handicap than before versus a more senior career person because the divide is actually getting smaller because they’ve got these amazing coding agents now. So that’s one thing, which is, I guess the advice is just learn about whatever you want, but just make sure you spend time doing things, not just fulfilling homework assignments, I guess.
I think the other side of it though is that it’s still deeply worth understanding what makes a good overall software system. So I still think that skills, like really strong systems engineering skills, or even really effective communication and collaboration with your team, skills like that I think are important or are going to continue to matter for quite some time. I don’t think it’s going to be all of a sudden the AI coding agents are just able to build perfect systems without your help. I think it’s going to look much more gradual where it’s like, okay, we have these AI coding agents, they’re able to validate their work. It’s still important.
For example, I’m thinking of an engineer who was working on Atlas, since we were talking about it, he set up Codex so that it can verify its own work, which is a little bit non-trivial because of the nature of the Atlas project. So the way that he did that was he actually prompted Codex like, “Hey, why can’t you verify your work? Fix it,” and did that on a loop. And so you still, at various phases, are going to want a human in the loop to help configure the coding agent to be effective. So I think you still want to be able to reason about that. So maybe it’s less important that you can type really fast and you understand exactly how to write… Not that anyone writes a for loop or something, or you don’t need to know how to implement a specific algorithm. But I think you need to be able to reason about the different systems and what makes a software engineering team effective. So I think that’s the other really important thing.
Then maybe the last angle that you could take is, I think if you’re on the frontier of knowledge for a given thing, I still think that’s deeply interesting to go down, partially because that knowledge is still going to be… Agents aren’t going to be as good at that, but also partially because I think that by trying to advance the frontier of a specific thing, you’ll actually end up being forced to take advantage of coding agents and using them to accelerate your own workflow as you go.
Lenny Rachitsky: What’s an example when you talk about being at the frontier of something?
Alexander Embiricos: Codex writes a lot of the code that helps manage its training runs, the key infrastructure. We move pretty fast and so we have a Codex code review that’s catching a lot of mistakes. It’s actually caught some pretty interesting configuration mistakes. And we’re starting to see glimpses of the future where we’re actually starting to have Codex even be on call for its own training, which is pretty interesting. So there’s lots there.
Lenny Rachitsky: Wait, what does that mean to be on call for its own training? So it’s running, it’s training and it’s like, “Oh, something broke, someone needs…” And does it alert people or it’s like, “Here, I’m going to fix the problem and restart”?
Alexander Embiricos: This is an early idea that we’re figuring out. But the basic idea is that during a training run, there’s a bunch of graphs that today humans are looking at and it’s really important to look at those. We call this babysitting.
Lenny Rachitsky: Because it’s very expensive to train, I imagine, and very important to move fast and-
Alexander Embiricos: Exactly. And there’s a lot of systems underlying the training run. And so a system could go down or there could be an error somewhere that gets introduced. And so we might need to fix it or pause things or, I don’t know, there’s lots of actions we might need to take. And so basically having Codex run on a loop to evaluate how those charts are moving over time is this idea that we have for how to enable us to train way more efficiently.
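A minimal sketch of what “running on a loop to evaluate how those charts are moving” could look like in code. This is not OpenAI’s tooling, which the conversation does not describe: the helper name `find_anomalies`, the window size, and the threshold are all illustrative assumptions. It flags points in a metric series (a training loss, say) that jump far from the recent rolling mean, the kind of signal a human babysitter or an agent might then investigate.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices where a value deviates sharply from the preceding window.

    Hypothetical helper for illustration only: a point is flagged when it lies
    more than `threshold` sample standard deviations from the mean of the
    previous `window` points.
    """
    flagged = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        mu = mean(recent)
        sigma = stdev(recent)
        if sigma == 0:
            # A perfectly flat window: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up data: a smoothly decreasing loss curve with one sudden spike.
loss = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 4.0, 1.4, 1.3]
spikes = find_anomalies(loss)
```

In a real setup the flagged indices would only be a trigger; deciding whether to pause, fix, or restart is the judgment call the conversation is describing.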
Lenny Rachitsky: I love that. And this is very much along the lines of this is the future of agents. Codex isn’t just for building code, it’s a lot more than that.
Alexander Embiricos: Yeah.
Lenny Rachitsky: Okay, last question. Being at OpenAI, I can’t not ask about your AGI timeline and how far you think we are from AGI. I know this isn’t what you work on, but there’s a lot of opinions, a lot of, I don’t know, timelines. How far do you think we are from a humanly human version of AI, whatever that means to you?
Alexander Embiricos: For me, I think that it’s a little bit about when do we see the acceleration curves go like this? Or I don’t know which way I’m mirrored here. When do we see the hockey stick? And I think that the current limiting factor, I mean, there’s many, but I think a current underappreciated limiting factor is literally human typing speed or human multitasking speed on writing prompts. And like you were talking about, it’s like you can have an agent watch all the work you’re doing, but if you don’t have the agent also validating its work, then you’re still bottlenecked on can you go review all that code?
So my view is that we need to unblock those productivity loops from humans having to prompt and humans having to manually validate all the work. So if we can rebuild systems to let the agent be default useful, we’ll start unlocking hockey sticks. Unfortunately, I don’t think that’s going to be binary. I think it’s going to be very dependent on what you’re building. So I would imagine that next year, if you’re a startup and you’re building new pieces, like some new app or something, it’ll be possible for you to set it up on a stack where agents are much more self-sufficient than not. But now let’s say, I don’t know, you mentioned SAP, let’s say you work in SAP, they have many complex systems and they’re not going to be able to just get the agent to be self-sufficient overnight in those systems. So they’re going to have to slowly maybe replace systems or update systems to allow the agent to handle more of the work end to end.
So basically my long answer to your question, maybe boring answer is that I think starting next year, we’re going to see early adopters starting to hockey stick their productivity. And then over the years that follow, we’re going to see larger and larger companies like hockey stick that productivity. And then somewhere in that fuzzy middle is when that hockey sticking will be flowing back into the AI labs and that’s when we’ll basically be at the AGI tier.
Lenny Rachitsky: I love this answer. It’s very practical and it’s something that comes up a lot on this podcast. Just like the time to review all the things AI is doing is really annoying and a big bottleneck. I love that you’re working on this because it’s one thing to just make coding much more efficient and do that for people. It’s another to take care of that final step of, “Okay, is this actually great?” And that’s so interesting that your sense is that’s the limiting factor. It comes back to your earlier point of even if AI did not advance anymore, we have so much more potential to unlock as we learn to use it more effectively. So that is a really unique answer. I haven’t heard that perspective on what is the big unlock. Human typing speed to review basically what AI is doing for us. So good.
Okay, Alexander, we covered a lot of ground. Is there anything that we haven’t covered? Is there anything you wanted to share, maybe double down on before we get to our very exciting lightning round?
Alexander Embiricos: I think one thing is that the Codex team is growing. And as I was just saying, we’re still somewhat limited by human thinking speed and human typing speed. We’re working on it. So if you’re an engineer or a salesperson, or I’m hiring a product person, please hit us up. I’m not sure the best way to give contact info, but I guess you can go to our jobs page, or do they have contact for you actually? Do listeners have contact for you?
Lenny Rachitsky: Where they send me like, “Hey, I want to apply to Codex”? I do have a contact form at lennyrachitsky.com. I’m afraid of all the amazing people that are going to ping me. But there we go, we could try that. Let’s see how that goes.
Alexander Embiricos: Okay. Or maybe an easier version, we can edit all that out or up to you. Yeah, or I would just say you can drop us a DM. For example, I’m Embirico on Twitter, and hit me up if you’re interested in joining the team.
Lenny Rachitsky: What a dream job for so many people. What’s a sign they… I don’t know, what’s a way to filter people a little bit so they’re not flooding your inbox?
Alexander Embiricos: So specifically if you want to join the Codex team, then you need to be a technical person who uses these tools. And I think I would just ask yourself the question, “Hey, let’s say I were to join OpenAI and work on Codex over the next six months and crush it, what does the life of a software engineer look like then?” And I think if you have an opinion on that, you should apply. And if you don’t have an opinion on that and have to think about it first, depending on how long you have to think about it, I guess that would be the filter. I think there’s a lot of people thinking about this space. So we’re very interested in folks who have already been thinking about what the future should look like with agents. And we don’t have to agree on where we’re going, but I think we want people who are very passionate about the topic, I guess.
Lenny Rachitsky: It’s very rare to be working on a product that has this much impact and is at such a bleeding edge of what’s possible. What a cool role for the right person. So it’s awesome that you have an opening and this audience is a really good fit potentially for that role. So I hope we find someone, that would be incredible. With that, we’ve reached our very exciting lightning round. I’ve got five questions for you, Alexander. Are you ready?
Alexander Embiricos: I don’t know what these are, but I’m excited. Let’s do it.
Lenny Rachitsky: They’re the same questions I ask everyone except for the last one. So probably not a surprise. I should probably make them more often a surprise. Okay, first question, what are a couple of books that you recommend most to other people, two or three books that come to mind?
Alexander Embiricos: I have been reading a lot of science fiction recently, and I’m sure this has been recommended before, but The Culture, I think it’s Iain Banks is the name of the author. Part of why I love it is because it’s basically relatively recent writing about a future with AI, but it’s an optimistic future with AI. And I think a lot of sci-fi is fairly dystopian. But the joke, at least on The Culture subreddit is that, let me see if I can get this right, it is a space communist utopia, or I think it’s a gay space communist utopia. And I just think it’s really fun to think about, to use The Culture as a way to think about what kind of world can we usher in and what decisions can we make today to help usher in that world.
Lenny Rachitsky: Wow. I don’t think anyone’s recommended that. I know you’re reading, you mentioned before I started recording, Lord of the Rings right now. If you want another AI-ish sci-fi book, have you read A Fire Upon the Deep?
Alexander Embiricos: No, I haven’t.
Lenny Rachitsky: Okay, it’s incredibly good. It’s like a sci-fi space opera sort of epic tale with super intelligence.
Alexander Embiricos: Cool.
Lenny Rachitsky: Yeah. Mostly not optimistic, but somewhat optimistic.
Okay, next question. Is there a favorite recent movie or TV show that you’ve really enjoyed?
Alexander Embiricos: Yeah, there’s an anime called Jujutsu Kaisen, which I really like. Again, it’s got a slightly dark topic of demons. But what I love about it is that the hero is really nice. And I think there’s this new wave of anime and cartoons where the protagonists are really friendly and people who care about the world rather than being sort of, if you look at some older anime that started the genre, there’s Evangelion or Akira and those characters, the protagonists are deeply flawed, quite unhappy. They didn’t start the genre, but it was a trend for a while to poke fun at the idea that in these cartoons the protagonist was very young, but being given a ridiculous amount of responsibility to save the world. So there was kind of a wave of content that was critiquing this by making the character basically go through serious mental issues in the middle of the show. And I’m not saying this is better, but at least it’s quite fun to have these really positive protagonists who are just trying to help everyone around them.
Lenny Rachitsky: I love how much we’re learning about your personality hearing these recommendations. Nice protagonists, optimistic futures. I like the [inaudible 01:18:53].
Alexander Embiricos: I think if you don’t believe it, you can’t will it into existence. So you need a balance.
Lenny Rachitsky: This is your training data.
Is there a product you recently discovered you really love? Could be an app, could be some clothing, could be some kitchen gadget, tech gadget, a hat.
Alexander Embiricos: Yeah, so I have been quite into combustion engines and cars. Actually, the reason I came to America initially was because I wanted to work on US aircraft, but now I work in software. And so for the longest time, I’ve basically only had quite old sports cars, old just because they were more affordable. And then recently we got a Tesla instead. And I have to say that I find the Tesla software quite inspiring. In particular, it has the self-driving feature. And I’ve mentioned a few times today, I think it’s really interesting to think about how to build mixed initiative software that makes you feel maximally empowered as a human, maximally in control, but yet you’re getting a lot of help. And I think they did a really good job with enabling the car to drive itself, but all these different ways that you can adjust what it’s doing without turning off the self-driving. So you can accelerate, it’ll listen to that. You can turn a knob to change its speed. You can steer slightly. I think it’s actually a masterclass in building an agent that still leaves the human in control.
Lenny Rachitsky: This reminds me, Nick Turley’s whole mantra is, “Are we maximally accelerated?”
Alexander Embiricos: Yeah.
Lenny Rachitsky: Feels like it’s completely infiltrated everything at OpenAI, which makes sense, that tracks.
Two more questions. Do you have a life motto that you often think about and come back to in work or in life that’s been helpful?
Alexander Embiricos: I don’t know if I have a life motto, but maybe I can tell you about the number one company value from my startup.
Lenny Rachitsky: Love it.
Alexander Embiricos: Which is still something that sticks with me, which is to be kind and candid.
Lenny Rachitsky: That tracks. Kind and candid. Wow, that’s a great combo.
Alexander Embiricos: Yeah. And we had to put them together because we, as founders, realized that we often would be nice and it wasn’t actually the right thing to do. We would delay the difficult conversations and we were not candid. And so every time we would remind ourselves of this motto and then we would become more candid. And then six months later, we would realize that we were in fact not candid six months ago and we needed to be even more candid. So then the question is like, “Okay, how should we be candid?” It’s like, “Okay, well, let’s think of being candid as an act of kindness,” but also think of that both in terms of doing it and willing ourselves to do it, but also in terms of how we frame it as people.
Lenny Rachitsky: That is a beautiful way of summarizing how to lead well. What’s the book about challenge directly but care deeply? Radical Candor.
Alexander Embiricos: Yeah, yeah.
Lenny Rachitsky: So it’s like another way of thinking about Radical Candor.
Okay, last question. I was looking up your last name just like, “Hey, what’s the story here?” So your last name is Embiricos, and I was talking to ChatGPT and it told me the most famous individuals with the surname are the influential Greek poet and psychoanalyst Andreas Embiricos and his relative, the wealthy shipping magnate and art collector, George Embiricos. So the question is, which of these two do you most identify with, the Greek poet and psychoanalyst or the wealthy shipping magnate and art collector?
Alexander Embiricos: I think it’s going to have to be the poet because he loved the island that our family’s from.
Lenny Rachitsky: Wait, you know those people? Okay, this is not news to you. Okay.
Alexander Embiricos: Well, I mean, it’s an enormous family. But it’s like Greek, so these big families, everyone’s your uncle.
Lenny Rachitsky: Love this. Okay.
Alexander Embiricos: You know what I mean? My mother’s Malaysian and also everyone is my uncle or aunt in Malaysia too, if that makes sense.
Lenny Rachitsky: Yeah.
Alexander Embiricos: But yeah, he loved this island that the family originated from. I believe, I don’t actually know where that shipping magnate lived, I think it was New York or something. But anyway, we all came from this island called Andros, which is a really beautiful place. And it’s like there’s more livestock there than humans. Not too many tourists go there. But I think part of what I think is really cool is he published a lot and a lot of his writing is about the beauty of that island, which I think is super cool.
Lenny Rachitsky: Wow, that was an amazing answer.
Two more questions, where can folks find you if they want to follow you online and maybe reach out? And then how can listeners be useful to you?
Alexander Embiricos: I’m one of those people who has social media only for the purposes of having work. My phone turns black and white at 9:00 PM at night. But yeah, so Twitter or X, @Embirico. And yeah, if you post in r/Codex, I’ll probably see it. So you can go there.
How can listeners be useful? I would say please try Codex, please share feedback, let us know what to improve. We pay a ton of attention to feedback. I think, honestly, the growth has been amazing, but it’s still very early times, so we still pay a lot of attention and hope to do so forever. And also, I would say if you’re interested in working on the future of coding agents and then agents generally, then please apply to our job site and/or message me in those social media places.
Lenny Rachitsky: Alexander, this was awesome. I always love meeting people working on AI because it always feels like this very, I don’t know, sterile, scary, mysterious thing. And then you meet the people building these tools and they’re always just so awesome, and you especially, just so nice. And like the examples you shared, optimism and kindness, this is what we want to be. These are the kinds of people we want to be building these tools that are going to drive the future. So I’m really thankful that you did this. I’m grateful to have met you, and thank you so much for being here.
Alexander Embiricos: Yeah, thanks so much for having me. This was fun.
Lenny Rachitsky: Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
Inside OpenAI: 2026 is the year of agents, AI’s biggest bottleneck, and why compute isn’t the issue
Transcript
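Lenny Rachitsky: You lead work on Codex.
Alexander Embiricos: Codex is OpenAI’s coding agent. We think of Codex as just the beginning of a software engineering teammate. It’s a bit like a really smart intern that refuses to read Slack and doesn’t check Datadog unless you ask it to.
Lenny Rachitsky: I remember Karpathy tweeted that the gnarliest bugs he runs into, the ones he spends hours on and nothing else has solved, he gives to Codex, lets it run for an hour, and it solves them.
Alexander Embiricos: We’re starting to see glimpses of the future, where we’re actually starting to have Codex be on call for its own training. Codex writes a lot of the code that helps manage its training runs, the key infrastructure. We also have Codex code review, which is catching a lot of mistakes. It’s actually caught some pretty interesting configuration mistakes. One of the most mind-blowing examples of acceleration is the Sora Android app: a fully new app, we built it in 18 days, and then 10 days later, so 28 days total, we released it to the public.
Lenny Rachitsky: How do you think you win in this space?
Alexander Embiricos: One of our major goals with Codex is to get to proactivity. If we’re going to build a super system, it has to be able to do things. One of the learnings over the past year is that for models to do stuff, they’re much more effective when they can use a computer. And it turns out the best way for models to use computers is simply to write code. So we’re getting to this idea that if you want to build any agent, maybe you should be building a coding agent.
Lenny Rachitsky: When you think about progress on Codex, I imagine you have a bunch of evals, and there are all these public benchmarks.
Alexander Embiricos: A few of us are constantly on Reddit. There’s praise up there and there are a lot of complaints. What we can do as a product team is always think about how we build a tool that feels like it’s maximally accelerating people, rather than a tool that makes it less clear what you should do as the human.
Lenny Rachitsky: Being at OpenAI, I can’t not ask how far you think we are from AGI.
Alexander Embiricos: A current underappreciated limiting factor is literally human typing speed, or human multitasking speed.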
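Introduction
Lenny Rachitsky: Today my guest is Alexander Embiricos, product lead for Codex, OpenAI’s incredibly popular and powerful coding agent. In the words of Nick Turley, head of ChatGPT and a former podcast guest, “Alex is one of my all-time favorite people I’ve worked with, and bringing him and his company into OpenAI ended up being one of the best decisions we’ve ever made.” Similarly, Kevin Weil, OpenAI’s CPO, said, “Alex is simply the best.”
In our conversation, we chat about what it’s truly like to build product at OpenAI and how Codex helped the Sora team ship the Sora app, which became the number one app in the App Store in under a month. We also cover the 20x growth Codex is currently seeing and what they did to make it so good at coding; why his team is now focused on making code review easier, not just writing code; his AGI timelines; his thoughts on when AI agents will actually be really useful; and a lot more. A huge thank you to Ed Bayes, Nick Turley, and Dennis Yang for suggesting topics for this conversation. If you enjoy this podcast, don’t forget to subscribe and follow it in your favorite podcasting app or on YouTube. And if you become an annual subscriber of my newsletter, you get a year free of 19 incredible products, including a year free of Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin, PostHog, and Stripe Atlas. Head on over to lennysnewsletter.com and click Product Pass.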
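Working at OpenAI
Lenny Rachitsky: Alexander, thank you so much for being here, and welcome to the podcast.
Alexander Embiricos: Thank you so much. I’ve been following the podcast for a long time, and I’m excited to be here.
Lenny Rachitsky: I’m even more excited than you are, and I really appreciate it. I want to start with your time at OpenAI. You joined OpenAI about a year ago. Before that, you ran your own startup for about five years. Before that, you were a product manager at Dropbox. I imagine OpenAI is very different from every other place you’ve worked. So let me just ask: what’s most different about how OpenAI operates? And what’s something you’ve learned there that you think you’ll take with you wherever you go, assuming you ever leave?
Alexander Embiricos: By far, I’d say the speed and ambition of working at OpenAI are dramatically more than I could have imagined. It’s a bit embarrassing to say, because every startup founder thinks, “Oh yeah, my startup moves super fast, the talent bar is super high, and we’re super ambitious.” But I have to say, working at OpenAI made me reimagine what those words even mean.
Lenny Rachitsky: We hear this a lot. It feels like every AI company is saying, “Oh my God, we’re moving so fast.” Is there a specific example that shows, “Wow, this couldn’t have been done this quickly anywhere else”?
Alexander Embiricos: The most obvious example is the explosive growth of Codex itself. It’s been a while since we updated our public numbers, but Codex grew 10x in just a few months, and it has grown well beyond that since. Having lived through that, at least for me personally, I feel that whatever technology product I build in the future, I have to hold myself to that speed and scale as the baseline.
Thinking back to what I was doing at my startup, the pace was much slower. With a startup there’s always a balance: how much do you invest in an idea, and when do you realize it isn’t working and pivot? But one thing I’ve realized at OpenAI is that the impact we can have, and have to have, is so large that I have to be ruthless about how I use my time.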
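OpenAI’s Structure and Operations
Lenny Rachitsky: Before we get to Codex, is there anything about how OpenAI is structured, or how it operates, that lets teams move this fast? Every company wants to move fast, so I’m guessing there’s some structural approach that supports it.
Alexander Embiricos: First, the technology we’re building with has itself changed a lot, both in how we build and in what we can enable for users. We spend most of our time talking about improvements at the foundation model level, but my view is that even if models made no further progress from today, which is definitely not the case, we would still be far behind at the product level, with an enormous amount of product left to build. So the moment is really ripe.
But I think there were also quite a few things that surprised me because they ran counter to my expectations, especially in how the organization operates. One example: at my startup and earlier at Dropbox, especially as a PM, there was a strong emphasis on aligning first and then going full speed, making sure you’re pointed in the right direction before accelerating in that direction. But at OpenAI, we don’t know exactly what new capabilities will show up before long, we don’t know what will work technically, and even if something works technically, we don’t know whether users will take to it. So for us it’s much more important to stay humble, learn a lot empirically, and try things quickly than to align on a direction. The whole organization is set up in this extremely bottoms-up way.
This goes back to what you said: every company wants to move fast. I think a lot of people like to say their organization is bottoms-up, or at least a lot of people claim it, but OpenAI is truly, thoroughly bottoms-up. That’s been a big learning experience for me. If I were to work somewhere else in the future… I don’t think working at a non-AI company even makes much sense in the future; I don’t quite know what that would mean. But if I imagine it, or imagine going back in time, I think I’d do things completely differently.
Lenny Rachitsky: What I’m hearing is a “fire, then aim” approach rather than “aim, then fire.” And as you were working through that idea, which may not be easy to convey well, I do keep hearing this philosophy at AI companies: because you don’t know how people will use it, it doesn’t make sense to spend a lot of time polishing it to perfection. It’s better to put it out in a raw form, see how people use it, and then go all in on that use case.
Alexander Embiricos: Right. To stick with the analogy, I think the “aiming” step exists, it’s just much fuzzier. It’s more like: roughly what do we think the future looks like? I’ve learned a lot from one of our research leads here, who likes to say that at OpenAI we can have really good conversations about things a year out; there’s a lot of uncertainty, but that’s the right timescale. And we can have really good conversations about what’s going to happen in the next few months or weeks. But there’s an awkward middle zone, as you approach a year but aren’t there yet, that’s very hard to reason about.
So in terms of aiming, what we want to know is, “What is the future we’re trying to build, roughly?” A lot of the problems we face in AI, like alignment, are problems you need to think about very, very far in advance. So we’re aiming fuzzily in that direction. But once you get down to the more concrete, tactical level, “What product will we build, and how will people use it?”, that’s where we’re much more inclined to find out empirically.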
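On Talent Density
Lenny Rachitsky: I like that framing. When people hear this, they sometimes hear companies like yours say, “We’re bottoms-up, we try a lot of things, we don’t have a precise plan for where things go in the next few months.” The key is that you’re hiring the best people in the world. That feels like a really important prerequisite for bottoms-up to work.
Alexander Embiricos: I strongly agree with that. When I arrived at OpenAI, I was surprised, even shocked, by the level of individual drive and autonomy that everyone here has. So I think the way OpenAI runs isn’t something you can read about in an article or hear on a podcast and then decide, “I’m going to copy this at my company.” This may be a harsh thing to say, but I think very few companies have the talent density to pull it off. If you were to implement this approach, you’d probably need to make some adjustments.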
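What is Codex?
Lenny Rachitsky: Okay, let’s talk about Codex. You lead work on Codex. How is Codex going? Are there any numbers you can share? Also, not everyone knows exactly what Codex is, so please introduce it.
Alexander Embiricos: Sure. I have a very lucky job: I get to live in the future while leading product for Codex. Codex is OpenAI’s coding agent. Concretely, it’s an IDE extension, a VS Code extension you can install, or a terminal tool you can install. Once you’ve installed it, you can work with Codex to answer questions about code, write code, run tests, execute code, and do the work in that thick middle section of the software development life cycle, which is getting code written to the point where it can ship.
More broadly, we think of Codex in its current form as just the beginning of a software engineering teammate. When we use a loaded word like “teammate,” what we’re imagining is that it can not only write code, but also participate early on in the ideation and planning phases of software, and further downstream in validating, deploying, and maintaining code.
To make that a bit more concrete, here’s the analogy I like: if you look at Codex today, it’s a bit like a really smart intern, but one that doesn’t read Slack and doesn’t go look at Datadog or Sentry unless you ask it to. So no matter how smart it is, can you trust the code it writes without working alongside it? That’s why most people today use it by pairing with it. But we want to get to the point where it can work like an intern you’ve just hired: you wouldn’t only ask an intern to write code, you’d have them participate in the whole process. That way you know that even if they don’t get it right the first time, they’ll eventually be able to iterate their way there.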
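Proactivity and the Product Vision
Lenny Rachitsky: The thought I had just now was that not reading Slack or Datadog actually means it never gets distracted, stays focused, and is always in flow. But I get what you mean: it really doesn't have context on everything that's going on.
Alexander Embiricos: Right. And that isn't only true while it's executing a task. Again, think about the best teams and teammates: you don't tell them what to do in minute detail. Maybe when they're first hired you have a few meetings and learn, "OK, this kind of prompting works for this person, that doesn't, this is how they communicate." Then you give them some starter tasks and delegate a few things. But eventually you just say, "OK, you and these people now own this area of the codebase. Feel free to work on other parts and with other people too. You tell me what you think needs doing." We think of that as proactivity, and one of our major goals with Codex is to get to proactivity.
I think that's critical to achieving OpenAI's mission of delivering the benefits of AGI to all of humanity. I like to joke (only half joking) that AI products today are really hard to use, because you have to think very deliberately about when they can help you. If you don't prompt the model for help, it's most likely not helping you at that moment. Think about how many times a day a typical user prompts AI today: probably in the tens. But think about how many times a day people could actually benefit from help from an intelligent entity: it's in the thousands. So a big part of our goal with Codex is figuring out what shape a truly useful teammate agent should take, one that's helpful by default.
Lenny Rachitsky: When people think of Cursor or even Claude Code, what comes to mind is an IDE that helps you write code, with autocomplete and maybe some agentic work. The vision I'm hearing from you is different: it's a teammate, like a remote teammate, that writes code for you, that you talk to and ask to do things. And it also does the IDE and autocomplete pieces. Is that a differentiated way you think about Codex?
Alexander Embiricos: The core idea is this: if you're a developer trying to get something done, we want you to feel like you have superpowers and can move much faster. But we don't think that to get those benefits you should have to sit there constantly thinking, "how do I invoke AI to do this right now?" We want it to fit into the way you already work and just start doing things without you having to think about it.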
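Growth Numbers
Lenny Rachitsky: OK, I have a lot of questions along these lines, but first, how is it going? Are there any stats or numbers you can share on how Codex is doing?
Alexander Embiricos: Codex has been growing extremely fast since the GPT-5 launch in August. There are some interesting product insights about how we unlocked that growth, which we can get into if you're interested. The last number we shared was more than 10x growth since August; it's actually 20x by now. Also, the Codex models now serve trillions of tokens a week and are basically our most-served coding models. One really interesting thing we've seen: the way we set up the Codex team was as a tightly integrated product and research team, iterating on the model and the harness together. It turns out that lets you do much more and try many more experiments on how those pieces work together.
So at first we trained these models in our own, very opinionated first-party harness. More recently we've started to see other major API coding customers adopt these models as well. We've now reached the point where the Codex model is also the most-served coding model in the API.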
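The Key Unlock for Growth
Lenny Rachitsky: You mentioned what unlocked the growth, and I'd love to hear it. It feels like before (maybe before you joined the team) Claude Code was on a roll, everyone was using Claude Code, and it was clearly the best way to code. Then all of a sudden Codex emerged. I remember Karpathy tweeted that he'd never seen a model like it. The gist was that the gnarliest bugs he runs into, the ones he spends hours on and can't crack, he gives to Codex, lets it run for an hour, and it solves them. What did you do?
Alexander Embiricos: OpenAI has a strong mission, which is basically to build AGI. So we're always thinking about how to shape the product so it can scale. I mentioned earlier that "if you're an engineer, you should be getting help from AI thousands of times a day," so when we launched the first version of Codex, which was Codex Cloud, we thought hard about the primitives needed to get there. That product basically had its own computer, ran in the cloud, and you could delegate tasks to it. The coolest part was that you could run many, many tasks in parallel. But some of the challenges we saw were that it was a bit hard to set up: configuring the environment (giving the model the tools it needs to validate its changes) and learning how to prompt in that style.
Back to the teammate analogy: it's as if you hired a teammate but could never get on a call with them and could only go back and forth asynchronously. That works for some teammates, and eventually it's how you want to spend most of your time working. That's still the future, but it's hard to adopt at first. So we still have that vision of a teammate you delegate to and that is proactive, that is our goal, and we're seeing growth there too. But the key unlock was that you first need to land with users in a way that's much more intuitive and trivial to get value from.
So the way the vast majority of users discover Codex today is by downloading an IDE extension or running it in the CLI, where the agent works with you interactively on your own computer. It works inside a sandbox, which is a really cool piece of technology that keeps things safe, but it has access to all your dependencies. So if the agent needs to do something, like run a command, it can do that in the sandbox with no environment setup. If a command can't run in the sandbox, it can just ask you. That gives you a very strong feedback loop with the model. Then, over time, our team's job is to help you turn that feedback loop into this: as a byproduct of using the product, you gradually finish configuring it, so that later you can delegate tasks to it.
Another analogy (I'll keep coming back to this one): if you hire a teammate and ask them to do work, but all you give them is a brand-new laptop fresh from the store, they'll struggle to do a good job. But if you work side by side with them, you'll say, "oh, you don't have the password for this service we use, here it is. Don't worry, go ahead and run this command," and then it's much easier for them to go off and work independently for hours without you.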
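The feedback loop described above (run in the sandbox when possible, ask the user otherwise, and let approvals accumulate into configuration) can be sketched roughly as follows. This is a toy policy model, not how Codex is implemented: `Sandbox`, `run_with_approval`, and the prefix allowlist are hypothetical names invented for illustration, and no command is actually executed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Sandbox:
    """Hypothetical sandbox policy: commands whose first word is allowed run without asking."""
    allowed_prefixes: set[str] = field(default_factory=lambda: {"ls", "cat", "pytest"})

    def permits(self, command: str) -> bool:
        parts = command.split()
        return bool(parts) and parts[0] in self.allowed_prefixes


def run_with_approval(command: str, sandbox: Sandbox, ask_user: Callable[[str], bool]) -> str:
    """Decide how a command is handled: 'sandboxed', 'approved', or 'denied'."""
    parts = command.split()
    if not parts:
        return "denied"
    if sandbox.permits(command):
        return "sandboxed"
    if ask_user(command):
        # Remember the approval, so configuration builds up as a byproduct of use.
        sandbox.allowed_prefixes.add(parts[0])
        return "approved"
    return "denied"
```

The design point is the last branch: each approval the human gives becomes part of the sandbox policy, so the next similar request no longer needs a prompt.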
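Lenny Rachitsky: So what I'm hearing is that the first version of Codex was almost too far ahead: a remote cloud agent writing code for you asynchronously. And the adjustment you made was, "OK, let's step back a bit, really fit into how engineers already work in IDEs and local development, and help them transition gradually into this new world."
Alexander Embiricos: Exactly. And it's kind of interesting, because we dogfood our own products heavily inside OpenAI, meaning we use what we build. Codex has been accelerating OpenAI's work all year, and the cloud product has been a huge accelerant for the company too. It just turned out that on this point the signal we got from dogfooding was a bit different from the signal from the broader market. At OpenAI we train reasoning models all day, so we're very used to that style of prompting: think it through first, then run things massively in parallel, wait a while, and come back asynchronously to look at the results. So now when we build product we still get a lot of signal from dogfooding, but we're also very aware that different groups of users use the product in different ways.
Lenny Rachitsky: That's really interesting. It's like "live in the future, but maybe not too far." I can imagine everyone at OpenAI lives fairly far in the future, and sometimes that doesn't work for the average user.
Alexander Embiricos: Yeah.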
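Intelligence, Training Data, and Model Progress
Lenny Rachitsky: What about purely from the angle of intelligence and training data? I don't know, were there other factors that helped Codex get better at actually coding? Is it better, cleaner data? Is it mostly the models improving? Was there anything else that really accelerated things?
Alexander Embiricos: Yeah, there are a few layers to this. You mentioned the model, and the models have improved enormously. In fact, just last Wednesday we shipped GPT-5.1-Codex-Max, a very accurately named model, and it's great. It's great because for any task you used to do with GPT-5.1-Codex, it's roughly 30% faster at completing it. It also unlocks a lot of intelligence, so if you use it at a higher reasoning level, it's smarter. You mentioned Karpathy's tweet about giving it your gnarliest bugs. There's a lot going on in the market right now, but Codex-Max really is carrying that flag and going after the hardest problems. That's very cool.
But I'd say our thinking here is evolving, from "OK, we just focus on the model and train the best one" toward thinking about what an agent actually looks like as a whole. I'm not going to define "agent" precisely, but at least as we see it, there's a stack. You have a model, a very smart reasoning model that knows how to do a particular kind of task really well (we can talk later about how we get there). But then you also need to serve that model through an API and plug it into a harness, and those two layers play a very important role here as well.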
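Long-Running Tasks and Compaction
For example, one thing we're very proud of is that you can have GPT-5.1-Codex-Max work for a very long time. That isn't the norm, but you can set it up that way, or sometimes it just runs that way on its own. We now regularly hear users say, "yeah, it ran overnight," or "it ran for 24 hours." For a model to keep working that long, it's bound to exceed its context window. So we have a solution for that, called compaction.
But compaction is really a feature that spans all three of those layers. First, you need a model that understands the concept of compaction and knows, "OK, as I approach the limit of my context window, I may need to prepare to continue in a new context window." Then, at the API layer, you need an API that understands the concept and has an endpoint you can call to perform that switch. And at the harness layer, you need a harness that can prepare the payload for it. So shipping compaction, which anyone using Codex now benefits from, actually meant working across all three layers at once. And I think that will become more and more common.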
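To make the harness-side part of that idea concrete, here is a minimal sketch of compaction as a generic technique: when the conversation nears the context limit, older turns are replaced by a summary and the most recent turns are carried into the fresh window. It is an illustration under stated assumptions, not OpenAI's implementation; `compact_if_needed`, `count_tokens`, `summarize`, and the 80% threshold are all hypothetical.

```python
from typing import Callable


def compact_if_needed(
    messages: list[str],
    count_tokens: Callable[[str], int],
    summarize: Callable[[list[str]], str],
    window_limit: int,
    threshold: float = 0.8,
    keep_recent: int = 2,
) -> list[str]:
    """Return the messages unchanged, or compacted once usage nears the window limit."""
    used = sum(count_tokens(m) for m in messages)
    split = len(messages) - keep_recent
    if used < threshold * window_limit or split <= 0:
        return messages
    older, recent = messages[:split], messages[split:]
    # The summary stands in for the older turns in the new context window.
    return [summarize(older)] + recent
```

In a real system the summarizer would itself be a model call, which is one reason the transcript describes compaction as needing cooperation from the model, the API, and the harness together.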
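Another aspect that may be underrated: if you look at all the different coding products out there, they have very different tool harnesses and very different opinions about how the model should work. Training a model to be great across all of those ways of working is one option (maybe you firmly believe it should use semantic search, maybe you firmly believe it should call custom tools, or, in our case, you firmly believe it should just use the shell and work in the terminal). But if you optimize for only one of those paths, you move much faster. So Codex is built to just use the shell, and to make that safer we have a sandbox that the model is used to running in.
So, back to your question, I think one of the biggest accelerants is that we're building all three layers together, tuning each one, and constantly experimenting with how they work together, driven by a tightly integrated product and research team.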
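The Competitive Landscape and the Path to Winning
Lenny Rachitsky: How do you think you win in this space? Do you think it will always be this race of models leapfrogging each other? Or is there a world where someone pulls away and nobody else can catch up? Is there a path to "we won"?
Alexander Embiricos: This goes back to the idea of building a teammate. Not only a teammate that takes part in team planning and prioritization, and not only a teammate that really tests code and helps you maintain and deploy it, but even one that... think about it, an engineering teammate can also send a calendar invite, move the standup, or do things like that, right? So as I see it, if we imagine some crazy new capability being shipped by some research lab every day or every week, there's no way we humans can keep up and use all of that technology. So I think we need to get to a world where you basically have an AI teammate or super assistant that you talk to, and it knows how to make itself useful. You don't have to read the latest tips on how to use it; you plug it in and it just helps.
So that's the shape of what we think we're building. And I think if we can pull it off, it will be a very sticky, winning product. At least the shape I have in my head is that we build... maybe an interesting topic is, "is chat the right interface for AI?" I actually think chat is a very good interface when you don't know what you should be using it for. Just as when I talk to teammates on MS Teams or Slack, chat works well: I can ask for anything, and it's close to the common denominator of all interaction. So you can talk to a super assistant about anything, coding or otherwise. And then if you're a professional in a particular domain, like coding, there's a GUI you can pull up to look closely at the code and work with it.
So I think what we need to build, as OpenAI, is basically this idea: you have chat, you have ChatGPT, and it isn't just a tool that's generically available to everyone; you start using it even outside of work to help yourself. You get very used to the feeling of being accelerated by AI. Then when you get to work, it feels natural to think, "I'll just ask it for this. I don't need to know about all the connectors or features; I just ask for help, and it helps me in the best way available right now, and maybe even pitches in when I haven't asked." So in my view, if we can get there, that's how we really build the winning product.
Lenny Rachitsky: This is so interesting, because when I talked to Nick Turley, the head of ChatGPT, I think he mentioned that ChatGPT's original name was something like "super assistant." It's interesting that there's that super assistant path on one side and the Codex path on the other. It's almost like a B2C version and a B2B version. What I'm hearing now is that the idea is to start with coding and building, and then it does everything else for you: scheduling meetings, I'm guessing maybe posting in Slack, shipping designs, and so on. I'm not sure; is the idea that this is in some sense the business version of ChatGPT? Or is it something else?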
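Alexander Embiricos: Yeah. So now we're into the one-year-timeline kind of conversation. A lot of this may happen sooner, but in terms of fuzziness I think we're at the one-year scale. So I'll give you a thesis and a plausible way to get there, but as for how it actually plays out, who knows? Basically, if we're going to build a super assistant, it has to be able to do things. So we need a model that can do things that affect your world. And one of the learnings from the past year or so, I think, is that for models to do stuff, they're much more effective when they can use a computer.
OK, so now we're thinking: we need a super assistant that can use a computer, or several computers. The next question is how it should use the computer. There are many ways. You could try hacking into the operating system and using accessibility APIs, or, a bit more simply, have it point and click. But that's somewhat slow and sometimes unpredictable. It turns out the best way for models to use computers is simply to write code. So we're getting to this idea that if you want to build any agent, maybe you should be building a coding agent. And a user, a non-technical user, won't even know they're using a coding agent, in the same way that nobody thinks about whether they're using the internet; they mostly just wonder, "is the Wi-Fi on?"
So I think what we're doing with Codex is building a software engineering teammate, and as part of that we're effectively building an agent that uses a computer by writing code. We're already seeing some pull for this. It's very early, but we're starting to see people use Codex for coding-adjacent purposes. As that develops, I think we'll naturally find that, oh, we should always have the agent write code to solve a problem whenever there's a way to solve it with code. Even if you're doing financial analysis, maybe you can write some code for that.
So, basically, to your question of "are these two ends of the ChatGPT super assistant product?": in my view, coding is a core competency of any agent, including ChatGPT. So what we really think we're building is that competency. What's really cool about agents writing code is that you can import code. Code is composable and interoperable. A very simplistic view of an agent is that you give it a computer and it clicks around and pokes at things. But that's the future. And the path there is hard to plan, because many of the problems in building agents aren't "can the agent do it?" so much as "how do we help the agent understand the context it's in?" The team using it may have a way of working that they like. They have guidelines. They may want deterministic guarantees about what the agent can or can't do. Or they want to know the agent understood a particular detail.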
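Agent Configurability and Script Reuse
For example, take a crash-reporting tool and the connector to it: each sub-team might have a different meta-prompt for how they want crashes analyzed. So we're getting to a place where we have an agent sitting at a computer, but we need it to be configurable by the team or the user so that they can... and the things the agent does often, we probably want to build in as a capability of the agent.
So I think we'll end up with the generalized thing you described, an agent that can write its own scripts to do whatever it wants. But I think the really key part is whether we can make it so that the things the agent does often, or does well, can simply be remembered and stored, so the agent doesn't have to rewrite the script for that task. Or, if I've just joined a team you're already on, I can simply use the scripts the agents have already written.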
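One way to picture the "remember and store" idea is a shared script library that only asks the agent to write a script the first time a task is seen. This is a speculative sketch, not a description of any Codex feature: `ScriptLibrary`, `get_or_write`, and the whitespace-and-case normalization of task names are assumptions made for illustration.

```python
from typing import Callable


class ScriptLibrary:
    """Hypothetical team-wide store of scripts an agent has already written."""

    def __init__(self) -> None:
        self._scripts: dict[str, str] = {}
        self.writes = 0  # how many times a script had to be written from scratch

    def get_or_write(self, task: str, write_script: Callable[[str], str]) -> str:
        # Normalize the task description so trivially different phrasings share a script.
        key = " ".join(task.lower().split())
        if key not in self._scripts:
            self._scripts[key] = write_script(task)
            self.writes += 1
        return self._scripts[key]
```

A new teammate (human or agent) pointed at the same library gets the stored script back instead of triggering a rewrite, which is the reuse described in the paragraph above.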
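Lenny Rachitsky: Right, like if it's our teammate, it can share what it has learned from other people at the company. The analogy makes total sense.
Alexander Embiricos: Yes, exactly.
Lenny Rachitsky: It feels like you're in Karpathy's camp of "agents today aren't that great, mostly slop, and maybe they'll be amazing in the future." Does that resonate?
Alexander Embiricos: I think coding agents are pretty good already. I think we're seeing a ton of value there.
Lenny Rachitsky: Yeah, that feels right. That feels right.
Alexander Embiricos: And then I think agents outside of coding are still very early. This is just my opinion, but I think they'll get a lot better once they can also use coding in a composable way. This is part of what's fun about building for software engineers. At my startup we were also building for software engineers the whole time, and they're a really fun audience, because they like to build things for themselves and are often more creative than we are in thinking about how to use the technology. So by building for software engineers you get to observe a lot of emergent behavior, and a lot of things you should go do and build into the product.
Lenny Rachitsky: I love that you say that, because many people who build for engineers get really annoyed, since engineers are always complaining, "ugh, this sucks, why did you build it this way?" I love that you enjoy it, though I think that may be because you're building an amazing tool for engineers that actually solves their problems and writes code for them.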
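AI's Impact on Engineering
Along those lines, there's a never-ending conversation about jobs, engineers, coding, and whether people still need to learn to code. Clearly the way you describe it, it's a teammate that works with you and makes you more superhuman, not one that replaces you. How do you think about the impact on the field of engineering of having this super-intelligent engineering teammate?
Alexander Embiricos: I think there are two sides to this, but the one we were just talking about is that maybe every agent should use code and be a coding agent. To me that's a small part of a broader idea, which is that as code becomes more ubiquitous... I mean, you could argue it's already ubiquitous today, and was even before AI, right? But as code becomes even more ubiquitous, it will be used for many more purposes. So demand for humans who have this skill will also go up a lot.
So that's my view. I think it's quite a complex topic, one we talk about a lot, and we'll have to see how it plays out. But I think what we can basically do, as a team building in this space, is always think about how the tool we're building can feel like it's maximally accelerating people, rather than building a tool that leaves people less clear about what they should be doing.
For example, right now when you work with a coding agent, it writes a lot of code, but it turns out that for many software engineers, writing code is one of the most fun parts of software engineering. You end up being the person who reviews the AI's code, which for many software engineers is a less fun part of the job. So I do think this tension plays out in countless micro-decisions. As a product team, we're always thinking, "OK, how do we make this more fun? How do we make you feel more in control? What's not working?" And I think reviewing agent-written code is one place that's less fun right now.
So then I think, "OK, what can we do about that?" We can ship a code review feature that helps you build confidence in AI-written code. OK, fine. Another thing we can do is make the agent better at validating its own work. And this goes all the way down to micro-decisions. If you're going to give the agent the ability to validate its work, and say you're in Codex web looking at a view of what the agent did, what do you see first? The diff, or an image preview of the code it wrote? If you think about it from the angle of "how do I empower the human? How do I make them feel maximally accelerated?", you should obviously see the image first. You shouldn't be reviewing the code before you've seen the image, unless perhaps it's already been reviewed by an AI and now it's your turn to look.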
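Lenny Rachitsky: When I had Michael Truell, the CEO of Cursor, on the podcast, his vision was that we're moving toward something beyond code. I've also seen the rise of what's called spec-driven development, where you write the spec and the AI writes the code for you. You start working at a higher level of abstraction. Do you think that's where we're headed, with engineers no longer needing to write or look at code themselves, and some higher level of abstraction that we focus on?
Alexander Embiricos: Yeah, I mean, the level of abstraction keeps rising, and that's actually already playing out today. Today's coding agents are basically prompt to patch. We're starting to see people try spec-driven or plan-driven development. That's actually one of the answers when people ask, "hey, how do you get Codex to carry out a really long task?" The usual approach is to first collaborate with it on a plan.md, a Markdown file containing the plan. Once you're happy with the plan, you let it go off and work. If the plan has verifiable steps, it can work for much longer. So we're absolutely seeing that.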
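The point about verifiable steps can be illustrated with a small plan-execution loop: each step carries its own check, the agent retries until the check passes, and the run stops at the first step it cannot verify. This is a generic sketch, not the way Codex processes a plan.md; `Step`, `execute_plan`, and the retry limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    verify: Callable[[], bool]  # for example, "do the tests for this step pass?"


def execute_plan(steps: list[Step], do_step: Callable[[Step], None], max_attempts: int = 3) -> list[str]:
    """Run steps in order, retrying each until verified; return the verified descriptions."""
    done: list[str] = []
    for step in steps:
        for _ in range(max_attempts):
            do_step(step)
            if step.verify():
                done.append(step.description)
                break
        else:
            # The step never verified: stop here and hand control back to the human.
            break
    return done
```

Because every step has a check, a long run cannot drift silently: it either makes verified progress or stops at a known point, which is why a plan with verifiable steps lets an agent work unattended for longer.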
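Chatter-Driven Development
I think spec-driven development is an interesting idea. I'm not sure it will end up going that way, because a lot of people don't like writing specs either. But it seems plausible that some people will work that way. A somewhat joking idea, though: if you think about how many teams work today, they often don't necessarily have specs, but the team is very self-driven and things just get done. So it's almost (I'm making this up on the spot, so the name isn't great) "chatter-driven development," where stuff happens on social media and in team communication tools, and as a result code gets written and deployed.
So yeah, I lean more in that direction. I don't necessarily even want to write a spec. Sometimes I do, but only when I enjoy writing specs. Other times I might just want to say, "hey, here's the customer service channel; tell me what's worth paying attention to, and if it's a small bug, just fix it." I don't want to write a spec for that, right?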
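A Tinder-Style Future: Swiping to Approve Agent Proposals
I have a hypothetical future that I often share with people, a kind of thought experiment. In a world where we have truly amazing agents, what does it look like to be a solopreneur? One terrible idea is that there's actually a mobile app, and everything the agent proposes to do is pushed to your phone as a vertical short video, and you swipe left if it's a bad idea and right if it's a good one. You can press and hold and speak to your phone to give feedback on the proposal before you swipe. In that world your job is basically to plug this app into every signal system or system of record, and then you just sit back and swipe. I don't know.
Lenny Rachitsky: I love this. It's Tinder meets TikTok meets Codex.
Alexander Embiricos: It's a pretty terrible idea.
Lenny Rachitsky: No, it's great. The idea is that the agent is watching and listening to you, paying attention to the market and your users, and saying, "OK, here's something I should do." Like a proactive engineer: "hey, we should build this feature, fix this thing."
Alexander Embiricos: Exactly, exactly.
Lenny Rachitsky: I think that's a really good idea.
Alexander Embiricos: And it communicates with you in the lowest-effort way for you.
Lenny Rachitsky: Right, the way we communicate in modern times: swiping left and right plus a vertical feed. And then add Sora videos, and OK, now I see how this all comes together. I get it.
Alexander Embiricos: Yeah. To be clear, we're not building that, but it is a fun idea. In that example, though, one of the things it's doing is consuming external signals. Another really interesting thing, if we think about what the most successful AI products to date have been, I think... it's funny to bring up, though it shouldn't cause confusion, but the first time OpenAI used the Codex brand was actually for the model that powered GitHub Copilot. That was a long time ago, years ago. We recently decided to reuse the brand because Codex is such a good name: code execution.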
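The Most Successful AI Product: IDE Autocomplete
But I actually think autocomplete in the IDE is one of the most successful AI products today. Part of what makes it so magical is that it can surface help to you quickly: when it's right, you're accelerated; when it's wrong, it isn't that annoying. It can be annoying, but not that annoying. So you can create this mixed-initiative system that responds contextually to what you're trying to do. To me that's a really interesting direction for us at OpenAI as we build.
Contextual Assistance in the Browser
Alexander Embiricos: For example, when I think about us launching a browser with Atlas, one really interesting thing that comes with it, in my view, is that we can contextually surface ways we can help you as you go about your day. That takes us beyond just looking at code or just being in the terminal, to a new idea: "hey, a real teammate deals with a lot more than code. They deal with a lot of things that are web content. So how can we help you with those too?"
Lenny Rachitsky: Oh man, there's so much here, I love it. OK, autocomplete for the browser, that's so interesting. It's like, "here's everything we can help you with as you browse and go about your work."
I want to talk about Atlas, and I'll come back to it. Codex, code execution: I didn't know that, and it's really clever. I get it now. And then this chatter thing, what is chatter-driven development? No, it really is a great idea, and it reminds me: I had Dhanji on the podcast, the CTO of Block, and they have a product called Goose, their internal agent tool. He talked about an engineer at Block who just has Goose watch his screen and listen to every meeting, and it proactively does the work he'd probably want done, like opening PRs, sending emails, and drafting Slack messages. So he's doing exactly what you're describing, just in a very early way.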
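Alexander Embiricos: Yeah, that's super interesting. I'd guess, if we asked them what the bottleneck on that productivity is... did they share that?
Lenny Rachitsky: Probably just going and checking that it's actually the right thing, yeah.
Alexander Embiricos: Right. So we see this now. We have a Slack integration for Codex. If you need something quick, people love to just @-mention Codex: "what do you think is going on with this bug?" It doesn't have to be an engineer. Data scientists here may well use Codex a lot too, to answer questions like "why do you think this metric moved? What happened?" For questions, you get the answer right back in Slack. It's amazing, super useful. But when it comes to writing code, you still have to go back and look at the code.
The Bottleneck: Validating and Reviewing Code
So I think the real bottleneck right now is validating that the code works and writing code reviews. So in my view, if we want to get to the state your friend described, what we really need to figure out is how to get people to configure their coding agents to be more autonomous in those later stages of the work.
Lenny Rachitsky: That makes sense. Like you said, writing code (I used to be an engineer, for ten years) is really fun: getting into flow, architecting, testing, it's a blast. Looking at other people's code is less fun; you have to go through it line by line, and if it does something dumb that takes down production, it's on you. And now that building is getting so much easier, what I keep hearing from companies at the cutting edge is that the bottleneck has become figuring out what to build. And then at the end it's, "OK, we have these 100 PRs to review. Who's going to go through all of them?"
Alexander Embiricos: Right.
Lenny Rachitsky: Yeah.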
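How Codex Has Changed the Product Manager's Work
Lenny Rachitsky: How has Codex changed the way you operate as a product person, as a PM? The impact on engineering is clear: the code gets written for you. But how has it affected the way you, and PMs in general, operate at OpenAI?
Alexander Embiricos: Yeah, I think mostly I feel much more empowered. I've always been a fairly technical PM, and especially when building for engineers I think dogfooding the product is necessary. But beyond that, I feel I can do a lot more as a PM. Scott Belsky has talked about this idea of compressing the talent stack. I'm not sure I'm phrasing it right, but the gist is that the boundaries between these roles may be less necessary than they used to be, because people can do much more. And every time one person can do more, you can skip a communication boundary and make the team that much more efficient.
So we're seeing this in lots of functions now, but since you asked specifically about product: answering questions is much easier now. You can just ask Codex what it thinks. A lot of PM-type work, like understanding what changed, you can also just hand to Codex. Prototyping is often faster than writing a spec. That's something a lot of people have talked about.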
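Throwaway Code and Changes on the Design Team
One thing I think is maybe not hugely surprising, but a little surprising, is that... we built Codex mainly for writing code that's going to be deployed to production, but we're actually seeing a lot of throwaway code written with Codex now. This goes back to the idea of ubiquitous code. For example, someone wants to do an analysis. If I want to understand something, it's just: OK, give Codex a bunch of data and ask it to build an interactive data viewer for that data. That used to be too much of a hassle, but now it's totally worth having an agent go do it.
Similarly, I've seen some really cool prototypes on our design team. If you want to... well, one designer wanted to build an animation, this is the coin animation in Codex, and normally it would be too much trouble to program that animation. So they just vibe coded an animation editor, used the editor to build the animation, and then checked it into the repo.
In fact, our designers are getting a ton of acceleration there. Speaking of compressing the talent stack, I think our designers are very "PME"-like. So they do a lot of product work, and they actually have an entire vibe-coded side prototype of the Codex app. So the way we discuss a lot of things is that we'll have a very quick discussion, because ten thousand things are going on at once, and then the designer goes off and thinks about how it should work. But instead of coming back to discuss it again, they just vibe code a prototype of it in their standalone prototype. We'll play with it, and if we like it, they vibe engineer that prototype into an actual PR to land. Then, depending on how comfortable they are with the codebase (Codex involves Rust, for example, which is a bit harder), maybe they land it themselves, or they get it most of the way and have an engineer help them finish the PR.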
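Sora and Atlas: Real Examples of Accelerated Development
Alexander Embiricos: We recently shipped the Sora Android app, and it's actually one of the most mind-blowing examples of acceleration. Usage of Codex inside OpenAI is obviously very, very high, and it has been growing all year (basically all technical staff use it now), but more importantly, people's fluency and depth in getting the most out of coding agents has gone up a lot too. The Sora Android app is a fully new app, and we built it in 18 days, from zero to a release for internal employees. Then 10 days later, so 28 days total, we launched it to the public, all with the help of Codex. That pace is really remarkable.
That said, this case is a bit... I don't want to say it's easy mode, but if you're a company that builds software on multiple platforms, one thing Codex is really good at is this: you've already figured out some of the underlying APIs or systems, and you have Codex port things across platforms. That's very effective, because it has something existing to refer to. So the engineers on that team basically had Codex look at the iOS app, produce a plan of the work that needed doing, and then implement it piece by piece. It could look at the iOS and Android code at the same time. So basically two weeks to release to employees and four weeks to launch. Unbelievably fast.
Lenny Rachitsky: What makes it even more unbelievable is that it went straight to the number one app in the App Store. That's just incredible. So that was 28 days?
Alexander Embiricos: Yeah, imagine: the number one app, with just two or three engineers, done in a few weeks.
Lenny Rachitsky: That's absurd. Wow.
Alexander Embiricos: Yeah, it's a really fun example of acceleration. Another one is Atlas. Ben, the engineering lead for Atlas, did a podcast sharing how we built it. Atlas is... well, it's a browser, and building a browser is very hard. So we had to build a lot of complex systems to make it happen. Basically that team is now full of Codex power users, and it's gotten to the point where... we talked to them, because many of those engineers are former colleagues from my startup. They'll say, "this used to take two or three engineers two to three weeks; now one engineer gets it done in a week." So there's been massive acceleration there too.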
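Another cool thing: we shipped Atlas on Mac first, but a Windows version is now in progress. The team is ramping up on Windows, and they're also helping us make Codex better on Windows, which is honestly still fairly early. The model we shipped last week is the first one that natively understands PowerShell, the native shell language on Windows. So it's really great to see the whole company being accelerated by Codex: from the most obvious, research, improving how quickly and how well we train models, to design, as we discussed, and even marketing. Our product marketers now often change copy or update docs directly from Slack.
Lenny Rachitsky: These are great examples. You're at the very frontier of what's possible, and this is how other companies will work in the future. Shipping what became the number one app in the App Store, one that people all over the world loved; it just swept everything, at least for a whole week. And you said it was built in 28 days, with the core done in something like 18 days, then 10.
Alexander Embiricos: Right, in 18 days we had a version employees could try internally, and then 10 days later we launched to the public.
Lenny Rachitsky: And you said just a few engineers?
Alexander Embiricos: Yeah.
Lenny Rachitsky: Two or three. OK. And Atlas, you said it was built in a week?
Alexander Embiricos: No, no, no, Atlas wasn't a week; Atlas is a very substantial project. I talked to an engineer on the Atlas team about what they use Codex for, and it was basically "we use Codex for everything." So I asked, "well, how would you measure the acceleration?" And basically the answer I got was, "work that used to take two or three engineers two to three weeks, one engineer now finishes in a week."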
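As a rough worked version of that estimate (an illustration of the arithmetic, not a measurement): reading "two or three engineers for two to three weeks" as 4 to 9 engineer-weeks and "one engineer in a week" as 1 engineer-week gives the ranges below. The helper name and the pairing of low with low and high with high are assumptions made for this sketch.

```python
def engineer_weeks(engineers: int, weeks: int) -> int:
    """Total effort, in engineer-weeks."""
    return engineers * weeks


# Before: 2-3 engineers for 2-3 weeks. After: 1 engineer for 1 week.
before_low = engineer_weeks(2, 2)
before_high = engineer_weeks(3, 3)
after = engineer_weeks(1, 1)

effort_reduction = (before_low / after, before_high / after)  # ratio of engineer-weeks
calendar_speedup = (2 / 1, 3 / 1)                              # ratio of calendar weeks

print(f"effort: {effort_reduction[0]:.0f}x to {effort_reduction[1]:.0f}x less")
print(f"calendar time: {calendar_speedup[0]:.0f}x to {calendar_speedup[1]:.0f}x faster")
```

Under those assumptions the quoted anecdote corresponds to somewhere between a 4x and 9x reduction in effort, and a 2x to 3x reduction in calendar time.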
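Can Non-Engineers Build Too?
Lenny Rachitsky: Do you think this eventually gets to where non-engineers can do this too? Does this kind of thing have to be built by engineers? Could Sora have been built by a PM or a designer?
Alexander Embiricos: I think we're very close to a point where the boundaries between roles basically blur. I think you'll still need people who understand the details of what they're building, but what those details are will keep evolving. Just as today you don't need to know assembly to write Swift. There are some people in the world, probably more than a few, who know assembly, and it's very important that they exist, but it's a specialized function that most companies don't need.
So I think we'll naturally see more and more layers of abstraction. And the cool thing is that we're now entering the language layer, the abstraction layer of natural language, and natural language is itself very flexible. Engineers can talk about a plan, they can talk about a spec, or they can just talk about a product idea or concept. So I think we can also climb gradually up through those layers of abstraction.
But I do think this will be gradual. I don't think there'll be a moment where suddenly nobody writes any code and it's all specs. I think it's more likely to go like this: "OK, we've set up our coding agent to be really good at previewing the build or running tests." Maybe that's the part most people get working first. Then it's, "OK, now it can run the build and see the results of its own changes, but we haven't yet built a good integration harness so that it can..." Take Atlas, for instance. By the way, I'm not sure exactly how much of this they've done; I think they've done a lot. But maybe the next stage is letting it load a few sample pages and see how they look. And then, "OK, now we've set it up so it can do that."
I think, for a while at least, we'll still need humans to curate and configure the connectors, systems, or components the agent needs to talk to. And at some point in the future there'll be a bigger unlock, where Codex tells you how to set it up, or maybe even sets itself up in the repo.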
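Lenny Rachitsky: What a wild time to be alive. Wow. I'm curious about the second-order effects of this: when building things becomes this fast, what follows? Does it mean distribution becomes much more important? Does it mean ideas become much more valuable? It's really interesting to think about the implications of this change in speed.
Alexander Embiricos: I'd love to hear your take. I still think ideas aren't worth as much as many people think. I think execution is still hard. You can build something quickly, but you still have to execute well, and in the end it has to be a sensible, coherent product. And yes, distribution matters a great deal.
Lenny Rachitsky: It feels like everything other than building has become more important now: coming up with the idea, getting it to market, making it profitable, and so on.
Alexander Embiricos: Right. I think we may have gone through a strange, temporary phase in which, for a while, building product was so hard that you basically just needed to be good at building product, and whether you deeply understood a specific customer mattered less. But now I think we're at a point where, if I could choose to understand only one thing, I would choose to deeply understand the problems of a particular kind of customer. If I could bring only one core competency to the table.
So I think that's what really matters in the end. If you're starting a new company today and you have deep understanding of, and connections with, customers who are currently underserved by AI tools, I think you're set. On the other hand, if you're good at building websites but have no specific customer to build for, I think you have a much harder road ahead.
The Outlook for Vertical AI Startups
Lenny Rachitsky: So that sounds bullish on vertical AI startups. I completely agree. General-purpose products can solve a lot of problems, but there's another path, which is "we're going to nail presentations, we're going to understand the problem of presentations better than anyone, we're going to plug into your workflows and everything else that really matters for solving one specific problem." OK, awesome.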
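Metrics for Codex
When you think about progress on Codex, I imagine you have a bunch of evals, and there are all these public benchmarks. What do you look at to tell you "OK, we're making really good progress"? I'd guess it isn't a single metric, but what do you focus on? What are you pushing? What are the KPIs?
Alexander Embiricos: One thing I constantly remind myself of is that a tool like Codex is naturally one that turns you into a power user. So it's easy for us to accidentally spend a lot of time thinking about features that sit very deep in the user adoption journey, and end up over-optimizing for that part. So I think it's critical to look at D7 retention. And to go try the product yourself, signing up for a new account from scratch. I have way too many ChatGPT Pro accounts; to dogfood as faithfully as possible I've signed up for several with Gmail, each getting charged $200 a month. And I still have to go expense them. But I think that feeling of being an end user, and the early retention numbers, are still very important for us, because even though this category is taking off, I think people are still at a very early stage of using it.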
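For readers unfamiliar with the D7 retention metric mentioned above, one common definition is the share of newly signed-up users who are active again exactly seven days after signing up. The sketch below uses that definition; other teams count activity on any day within a window instead, and nothing here reflects how OpenAI actually computes it. The function name and data shapes are assumptions.

```python
from datetime import date, timedelta


def d7_retention(signups: dict[str, date], activity: set[tuple[str, date]]) -> float:
    """Fraction of signed-up users who were active exactly 7 days after signup."""
    if not signups:
        return 0.0
    retained = sum(
        1
        for user, day0 in signups.items()
        if (user, day0 + timedelta(days=7)) in activity
    )
    return retained / len(signups)
```

Because the denominator is all new signups, this metric weights the first-week experience of new users rather than the behavior of existing power users, which matches the concern raised in the answer.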
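The other thing is that we may be the team in this space most shaped by user feedback and social media. A few of us are constantly on Reddit and Twitter. There's praise up there and there are a lot of complaints, but we take the complaints very seriously and look at them closely. Again, because a coding agent can be used for so many different things, it often has all sorts of problems in specific behaviors. So we do pay a lot of attention to what the vibes are on social media. In particular, what's on Twitter/X tends to be a bit more hype-driven, while Reddit is somewhat more negative but more real. So I'm paying more and more attention to how people talk about using Codex on Reddit.
Lenny Rachitsky: That's important for people to know. Which subreddits do you check most? Is there an r/Codex?
Alexander Embiricos: The algorithm is pretty good at surfacing things, but r/Codex does exist.
Lenny Rachitsky: OK, got it. Very interesting. So if someone tags you on Twitter, you'll still see it, but it may not be as effective as Reddit.
Alexander Embiricos: Yeah. The thing about Twitter is that it's more one-to-one, even when it's in public. Reddit has a good upvoting mechanism, and most of the people there are probably still not bots, though it's hard to say. So you get good signal about what matters and what other people think.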
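The Original Motivation for the Atlas Browser
Lenny Rachitsky: Here's a fun topic: Atlas, which I want to talk about briefly. You launched Atlas, and I actually tweeted at the time that I'd tried Atlas but didn't love the AI-only search experience. I was like, "sometimes I just want Google," that kind of thing. While waiting for the AI to give me an answer I'd think, "I don't want..." And there was no way to switch. So I tweeted, "I'm switching back, it's not a great experience." I think I made some PMs at OpenAI sad. Later I saw someone tweet, "OK, we have Atlas now"; I imagine that was always part of the plan. This is probably an example of "ship it, see how people use it, then iterate." So, one, is there anything you want to say about that? And two, I'm curious why you decided to build a web browser.
Alexander Embiricos: I used to work on Atlas; I don't anymore. For me, one narrative thread behind it goes like this: I was working on that screen-sharing, pair-programming startup, and then we joined OpenAI. The idea at the time was to build a contextual desktop assistant. The reason I thought that mattered is that I find it really annoying to have to give an assistant all your context every time and then work out how it can help you. If it could just understand what you're doing, it could accelerate you as much as possible. So I actually still see Codex as a contextual assistant, just approached from a different angle, starting with coding tasks.
At least for me personally (I can't speak for the whole project), part of the thinking was that a lot of work gets done in the browser. If we could build a browser, we could be contextual for you in a first-class way, instead of hacking around other desktop software, which has uneven support for what gets rendered into the accessibility tree. And we wouldn't have to rely on screenshots, which are somewhat slow and unreliable. Instead we could reach into the rendering engine and extract whatever we need to help you. I also like a video game analogy. I don't know if you've played, say, Halo: you walk up to an object (lots of games do this, actually) and you press... oh man, it's been so long, this is a bit embarrassing. You press X, and it does the right thing. I'm the kind of person who reads the manual for every game I buy.
I remember the first time I learned about this concept of a contextual action, I thought it was such a cool idea. The key to a contextual action is that we need to know what you're trying to do. We get a bit of context, and then we can help you. I think that's critical, because imagine we reach that world where agents are helping you thousands of times a day.
Imagine that the only way an agent could help you was by sending you a push notification. You'd get a thousand push notifications a day from the AI saying, "hey, I did this thing, what do you think?" That would be super annoying, right? Instead, imagine (back to the software engineering example) I'm looking at a dashboard and notice that some key metric has dropped. At that moment, maybe the AI could go take a look and, right there while I'm staring at the dashboard, surface its take on why the metric dropped and a possible fix. That keeps me in flow much better and lets the agent take action on many more things.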
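The Browser and the Advantages of Contextual Awareness
Alexander Embiricos: So in my mind, part of why we're excited about building a browser is that I think it gives us much more context on what you need help with. Users also get more control over what they want us to look at. Like, "hey, if you want us to take action on something, you can open it in the AI browser. If you don't, open it in your other browser." The controls and boundaries are very clear. And then we have the ability to build mixed-initiative UX, so we can surface contextual actions to you at the moment they're useful rather than pinging you with notifications at random.
Lenny Rachitsky: I'm hearing that the vision for Codex is to be a super assistant: it isn't just there to write code for you, it's trying to do a lot for you as a teammate, a super teammate, and make you great at your job. I get that. On that note, what other, non-engineering use cases are common for Codex? How non-engineers use it... we talked about designers prototyping and building things; are there any fun or surprising ways people other than engineers are using Codex?
Alexander Embiricos: There are plenty of surprising uses, but I think where we're seeing real traction right now is still in areas that are closely related to coding or otherwise technical, for example where there's a mature ecosystem, or where you're doing something like data analysis. I personally expect we'll see a lot more non-coding use over time. But for now we're keeping the team very focused on coding, because there's still so much work to do.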
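Which Codebases Codex Suits and the Best Way to Use It
Lenny Rachitsky: For someone considering trying Codex, does it work for all kinds of codebases? What code does it support? For instance, if you're on SAP, can you plug in Codex and start building? What's the sweet spot? And where is it not yet great?
Alexander Embiricos: I'm glad you asked, because the best way to try Codex is to give it your hardest tasks, which is a bit different from some other coding agents. With some tools you might think, "OK, let me start with something easy, or just vibe code something and see if I like the tool." Codex is positioned as a professional tool you can give your hardest problems to, one that writes high-quality code in your large codebase, though of course it isn't perfect yet. So if you want to try Codex, try it on a real task, and don't feel you have to simplify the task down to something trivial. A good example: you've hit a nasty bug, you don't know what's causing it, and you ask Codex to help you track it down or to implement the fix.
Lenny Rachitsky: I love that answer. Just give it your hardest problem.
Alexander Embiricos: I should add that if you say, "OK, my hardest problem is that I need to start a new unicorn company," that obviously won't work. Not yet, anyway. So I think the point is to give it your hardest problem, but one that is still a well-defined problem or task, to begin with. That's the advice for trying it out; afterward you can learn to use it for bigger things.
Lenny Rachitsky: What languages does it support?
Alexander Embiricos: Basically, the distribution of languages we trained Codex on roughly matches how often those languages are used in the world. So unless you're writing some very esoteric or private language, it should work fine in yours.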
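Advice for Getting Started
Lenny Rachitsky: If someone is just getting started, what advice would help them succeed? If you could whisper one tip into the ear of someone who has just set up Codex, to help them have a great experience, what would it be?
Alexander Embiricos: I'd probably say: try a few things in parallel. You could give it a hard task, you could also ask it to understand the codebase, and then work out a plan with it around your idea and build up from there. The core idea, with that analogy again, is that you're building trust with a new teammate. You wouldn't walk up to a new teammate and just say, "hey, do this," with zero context. You'd first make sure they understand the codebase, then maybe align on an approach, then have them go do it piece by piece. I think if you use Codex that way you'll naturally pick up the different ways of prompting it, because it's a very powerful agent and model, but prompting Codex is a bit different from prompting other models.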
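Skills Advice for the AI Era
Lenny Rachitsky: A few final questions. First, and we touched on this a little earlier, as AI does more and more of the coding there's always the question, "should I still learn to code? Why should I spend time on this?" For people thinking about career direction, especially those drawn to software engineering and computer science, which specific areas of computer science do you think are becoming more important and worth going deep on? And which probably aren't worth worrying about? As AI becomes more widespread in the workplace, what skills do you think people should lean into?
Alexander Embiricos: I think there are a few angles here. The most direct one is simply to be a doer of things. As coding agents get better, even college students or new grads can do much more than they used to. So I think you should take full advantage of that. And when I hire junior people, I definitely think about how productive they are with the latest tools; they should be super productive. Seen that way, junior people's handicap relative to senior people is actually shrinking, because these powerful coding agents now exist. So that's one side, and the advice is to learn whatever you want, but make sure you spend time actually building things and not just completing assignments.
On the other side, it's still very much worth deeply understanding what makes a good software system. So I think skills like really solid systems engineering, or even communicating and collaborating effectively with a team, matter and will keep mattering for quite a while. I don't think AI coding agents will suddenly be able to build perfect systems without your help. I think it will be more gradual: we have these AI coding agents, they can validate their own work, but your involvement still matters.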
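Agent Self-Validation and the Role of the Human in the Loop
Alexander Embiricos: For example, I'm thinking of an engineer who was working on Atlas, since we just mentioned that project. He set Codex up so it could validate its own work. That's not entirely easy, because of the nature of the Atlas project. The way he did it was to just prompt Codex, "hey, why can't you validate your own work? Fix it," and run that in a loop. So at each stage you'll still want a person involved, helping configure the coding agent to be more effective. So I think you still need that ability to reason. Maybe being a very fast typist who knows exactly how to write the code matters less now (not that anyone was writing a for loop by hand anyway); you don't need to know how to implement a specific algorithm. But I think you need to be able to reason about different systems and understand what makes a software engineering team run effectively. So I think that's another really important thing.
And then maybe the last angle: I think if you're at the frontier of some field, it's still very valuable to go deep on it. Partly because that knowledge will still... agents won't be as good at the frontier. But also partly because I think that by trying to push the frontier of a specific field, you'll effectively be forced to take advantage of coding agents, using them along the way to accelerate your own workflow.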
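An Example of Being at the Frontier
Lenny Rachitsky: When you say being at the frontier of a field, can you give an example?
Alexander Embiricos: Codex writes a lot of the code that helps manage its own training runs, the key infrastructure. We move very fast, so we have Codex code review catching a lot of mistakes. It has actually caught some pretty interesting configuration mistakes. And we're starting to see glimpses of the future, where we're even starting to have Codex be on call for its own training, which is pretty interesting. There's a lot there.
Lenny Rachitsky: Wait, what does on call for its own training mean? That during training it notices "oh, something's wrong, someone needs to..."? Does it alert a person, or does it say "I'll fix this and restart"?
Alexander Embiricos: It's an early idea we're exploring. But the basic notion is that during a training run there are a lot of graphs that people have to watch, and that's very important. We call it babysitting.
Lenny Rachitsky: I'm guessing because training is very expensive, and moving fast matters too.
Alexander Embiricos: Exactly. And there are a lot of systems behind a training run. A system can go down, or an error can get introduced somewhere. So we may need to fix it, or pause the run, or... there are many possible actions. So the basic idea is to have Codex run in a loop, continuously evaluating how those graphs change over time, as one approach we've come up with to let us train more efficiently.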
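The transcript gives no detail on how that loop evaluates the graphs. As a stand-in, here is one simple, well-known way to flag a suspicious point in a metric series: compare each new value against the mean and standard deviation of a trailing window (a rolling z-score). This is a swapped-in illustrative technique, not the method OpenAI uses; `find_anomalies`, the window size, and the threshold are assumptions.

```python
from statistics import mean, pstdev


def find_anomalies(values: list[float], window: int = 5, z_threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates strongly from the preceding window."""
    flagged: list[int] = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is worth a look.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged
```

A monitoring loop would call something like this on each chart at a fixed interval and then decide whether to alert a person, pause the run, or attempt a fix, which are the kinds of actions the answer above lists.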
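Lenny Rachitsky: I love that. It's very much in line with where agents are going: Codex isn't just for writing code; it does far more than that.
Alexander Embiricos: Yeah.
AGI Timelines
Lenny Rachitsky: OK, last question. Since you work at OpenAI, I can't not ask about AGI timelines: how far do you think we are from AGI? I know this isn't your area, but there are lots of opinions and predictions out there. How far do you think we are from a truly human-like AI, however you want to define it?
Alexander Embiricos: For me, the question is when we see the acceleration curve start to go like this (I don't know if my video is mirrored), when we see the hockey stick. I think the current limiting factor, or rather there are many, but one underappreciated limiting factor is, in the end, human typing speed, or how fast a human can juggle multiple prompts. As you said earlier, you can have an agent watch all the work you do, but if you don't also have the agent validating its own work, you're still limited by whether you can go review all that code.
So my view is that we need to unblock these productivity loops from depending on humans: no longer needing a human to write the prompts, no longer needing a human to validate all the work by hand. If we can rebuild systems so that agents are useful by default, we'll start to unlock hockey-stick growth. That said, I don't think this will be a binary switch. I think it will depend a lot on what you're building. So I can imagine that next year, if you're a startup building something new, some new app, you'll be able to set it up on a stack where agents have a high degree of autonomy. But say, I don't know, you mentioned SAP: say you work at SAP. They have a lot of complex systems, and they can't just make agents autonomous inside those systems overnight. So they'll have to slowly replace or upgrade systems so that agents can handle more of the work end to end.
So basically the long answer to your question, maybe a boring one, is that I think starting next year we'll see early adopters' productivity begin to hockey-stick. Then over the following years we'll see larger and larger companies hockey-stick their productivity too. And somewhere in that fuzzy middle, the effect of that hockey stick flows back into the AI labs, and that's when we're basically at AGI.
Lenny Rachitsky: I love that answer. It's very practical, and it's something that comes up a lot on this podcast: the time it takes to review everything the AI is doing is really annoying, and it's a big bottleneck. I'm glad you're working on this, because making coding more efficient is one thing, but solving that last step of "OK, is this actually any good?" is another. And it's interesting that your call is that this is the limiting factor. It also echoes what you said earlier: even if AI stopped improving, there's still so much potential to unlock as we learn to use it more effectively. That's a really unique answer. I hadn't heard anyone describe the key unlock this way: human typing speed in reviewing what AI does for us. Very cool.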
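Hiring and the Lightning Round
Lenny Rachitsky: OK Alexander, we've covered a lot. Is there anything we haven't covered? Anything you'd like to share, or underline again, before we get to our very exciting lightning round?
Alexander Embiricos: One thing I'd mention: the Codex team is growing. As I just said, we're still somewhat limited by how fast humans can think and type. We're working on it. So if you're an engineer, a salesperson, or (I'm hiring a product manager) a product person, please get in touch. I'm not sure what the best way to reach us is, but you can go to our jobs page, or... do they have your contact info? Do listeners have your contact info?
Lenny Rachitsky: So they'd message me saying "hey, I want to apply to Codex"? I do have a contact form at lennyrachitsky.com. I'm afraid a lot of amazing people would reach out to me. But let's go with it, we can give it a try. We'll see how it goes.
Alexander Embiricos: OK. Or, more simply, we can cut that part, whatever you prefer. Yeah, or I'll just say you can DM us. For example, I'm Embirico on Twitter; if you're interested in joining the team, please reach out.
Lenny Rachitsky: For a lot of people this is a dream job. Is there any filter you'd offer so people don't flood your inbox?
Alexander Embiricos: Specifically, if you want to join the Codex team, you need to be a technical person who uses these tools. I think you should ask yourself: "suppose I joined OpenAI, worked on Codex for the next six months, and crushed it; what would the life of a software engineer look like then?" If you have a point of view on that, you should apply. If you don't, and you'd need to think about it first, well, depending on how long you'd need to think, I suppose that's the filter. Lots of people are thinking about this space, so we're very interested in people who have already been thinking about what the future should look like with agents. We don't have to agree on exactly where we're going, but I think we want people who are very passionate about the topic.
Lenny Rachitsky: It's very rare to get to work on a product with this much impact that's also at the cutting edge of the technology. For the right person, it's a dream job. It's great that you have this role open, and this audience may be a great fit for it. I hope you find the right person; that would be amazing. With that, we've reached our very exciting lightning round. I've got five questions for you, Alexander. Are you ready?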
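Lightning Round
Alexander Embiricos: I don't know what the questions will be, but I'm excited. Let's do it.
Lenny Rachitsky: They're the same questions I ask everyone, except the last one, so maybe it isn't a surprise. Maybe I should make them more surprising. OK, first question: what are two or three books you find yourself recommending most to other people?
Alexander Embiricos: I've been reading a lot of science fiction lately, and I'm sure this has been recommended before, but the Culture series by Iain Banks. Part of why I love it is that it's basically relatively recent writing about a future with AI, and it's an optimistic future with AI. I think a lot of sci-fi is fairly dystopian. But the joke on the Culture subreddit is (let me see if I get this right) that it's a space communist utopia, or I think it's a gay space communist utopia. I just find it really interesting to use the Culture to think about what kind of world we could usher in, and what decisions we can make today to help usher it in.
Lenny Rachitsky: Wow, I don't think anyone has recommended that one before. I know you're reading (you mentioned it before we started recording) The Lord of the Rings right now. If you want another AI-ish sci-fi novel, have you read A Fire Upon the Deep?
Alexander Embiricos: No, I haven't.
Lenny Rachitsky: OK, it's incredible. It's like an epic sci-fi space opera, and it involves superintelligence.
Alexander Embiricos: Cool.
Lenny Rachitsky: Yeah. It's mostly not optimistic, but there's some optimism in there. OK, next question. Is there a movie or TV show you've recently really enjoyed?
Alexander Embiricos: Yes, there's an anime called Jujutsu Kaisen that I really like. Again, the subject matter is a bit dark; it's about demons. But what I like is that the protagonist is really kind. I think there's a new wave of anime and cartoons where the protagonists are really friendly people who care about the world, as opposed to... if you look at some of the earlier anime that shaped the genre, like Evangelion or Akira, the protagonists are deeply flawed and unhappy. They didn't start the genre, but for a while there was a trend of poking at how, in these shows, the protagonist is very young yet is handed the absurd burden of saving the world. So there was a wave of works that critiqued that by having the characters go through serious psychological problems over the course of the show. I'm not saying this way is better, but it's at least pretty fun to have these very positive protagonists who just want to help everyone around them.
Lenny Rachitsky: I love learning about your personality from these recommendations. Kind protagonists, optimistic futures. I love the vibe.
Alexander Embiricos: I think if you don't believe in it, you can't make it happen. So you need a balance.
Lenny Rachitsky: That's your training data. Is there a product you've recently discovered and really love? It could be an app, a piece of clothing, a kitchen gadget, a tech gadget, even a hat.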
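Alexander Embiricos: Yeah. I've always been into combustion engines and cars. In fact the reason I originally came to the US was that I wanted to work on American aircraft, but now I'm in software. So for a long time I basically only had fairly old sports cars, old simply because they were more affordable. Then recently we got a Tesla instead. And I have to say, I find the Tesla software really inspiring. In particular it has the self-driving feature. I've brought this up several times today: I find it really interesting to think about how to build mixed-initiative software, software that leaves you as the human feeling maximally empowered and in control while still getting a lot of help. I think they've done an excellent job of letting the car drive itself while giving you all sorts of ways to adjust what it's doing without turning self-driving off. You can accelerate, and it goes along with you. You can turn a dial to change its speed. You can nudge the steering wheel a little. I think it's actually a masterclass in building an agent that still leaves the human in control.
Lenny Rachitsky: That reminds me of Nick Turley's motto, "are we maximally accelerated?"
Alexander Embiricos: Yes.
Lenny Rachitsky: It feels like that idea runs through everything at OpenAI, which makes total sense. Two more questions. Do you have a life motto that you often come back to, in work or in life?
Alexander Embiricos: I don't know if it counts as a life motto, but I can tell you the number one company value from my startup.
Lenny Rachitsky: Great, go ahead.
Alexander Embiricos: It's a value that has stuck with me: be kind and candid.
Lenny Rachitsky: That is so you. Kind and candid. Wow, that's a great combination.
Alexander Embiricos: Yeah. We had to put them together because, as founders, we found we were often being kind when that wasn't actually the right thing to do. We'd put off difficult conversations; we weren't candid enough. So each time, we'd remind ourselves of the motto and become more candid. Then six months later we'd realize that our selves of six months earlier hadn't actually been candid enough, and we needed to be more candid still. So the question became, "OK, how should we be candid?" And the answer was, "OK, let's treat candor as an act of kindness." And think of it that way not only in practicing it and pushing ourselves to practice it, but also in how we frame it to others.
Lenny Rachitsky: That's a beautiful way of summing up how to lead well. What's that book about challenging directly but caring deeply? Radical Candor.
Alexander Embiricos: Yes, yes.
Lenny Rachitsky: So it's like another way of understanding Radical Candor.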
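Family Name and a Poet's Sensibility
Lenny Rachitsky: OK, last question. I looked up your last name beforehand, thinking, "hey, what's the story here?" Your last name is Embiricos, and I chatted with ChatGPT about it. It told me the most famous people with that name are the highly influential Greek poet and psychoanalyst Andreas Embiricos and his relative, the wealthy shipping magnate and art collector George Embiricos. So the question is, which of the two do you identify with more: the Greek poet and psychoanalyst, or the wealthy shipping magnate and art collector?
Alexander Embiricos: I'd have to say the poet, because he loved the island my family is from.
Lenny Rachitsky: Wait, you know these people? OK, so you already knew about this.
Alexander Embiricos: Well, it's a huge family. And it's a Greek family, and in a big family like that everyone is your uncle.
Lenny Rachitsky: I love that. OK.
Alexander Embiricos: You know what I mean? My mother is Malaysian, and it's the same in Malaysia: everyone is my uncle or auntie, if that makes sense.
Lenny Rachitsky: Yeah.
Alexander Embiricos: But yes, he loved the island my family comes from. I'm not sure where the shipping magnate lived, probably New York or somewhere like that. Anyway, we're all from an island called Andros, which is a really beautiful place. There is more livestock than people on the island, and not that many tourists. What I think is really cool is that he published a lot of work, and much of that writing is about the beauty of the island, which I think is just wonderful.
Lenny Rachitsky: Wow, what a great answer.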
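Contact Information and Looking Ahead
Two final questions: if people want to follow you online or get in touch, where can they find you? And how can listeners be useful to you?
Alexander Embiricos: I'm the kind of person who uses social media only for work. My phone automatically switches to black and white at 9 p.m. So, on Twitter, or X, I'm @Embirico. Also, if you post in r/Codex, there's a good chance I'll see it, so you can go there.
As for how listeners can help, I'd say please try Codex, please share your feedback, and tell us what to improve. We pay a lot of attention to feedback. Honestly, although the growth has been amazing, we're still at a very early stage, so we still pay close attention to every piece of feedback and hope to always do so. Also, if you're interested in building the future of coding agents, and of agents more broadly, you're welcome to apply through our jobs site, or DM me on those social channels.
Closing
Lenny Rachitsky: Alexander, this was awesome. I always love meeting people who work on AI, because AI can feel like something, I don't know, cold, scary, and mysterious. But when you meet the people building these tools, they're always so great, and you especially are just really nice. The things you shared, the optimism and kindness, are exactly what we want to see. We want people like this building the tools that will drive the future. So thank you so much for doing this; it was great to meet you, and thank you so much for being here.
Alexander Embiricos: Thanks for having me. This was fun.
Lenny Rachitsky: Thank you all so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.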
Glossary
| Term | Rendering and note |
|---|---|
| agent | agent |
| Akira | Akira (Japanese anime) |
| Alexander Embiricos | Alexander Embiricos (product lead for Codex at OpenAI) |
| Andreas Embiricos | Andreas Embiricos (Greek poet and psychoanalyst, a member of Alexander's family) |
| Andros | Andros (Greek island, where the Embiricos family comes from) |
| assembly | assembly language |
| Atlas | Atlas (OpenAI's browser product) |
| B2B | B2B (business to business) |
| B2C | B2C (business to consumer) |
| Ben | Ben (engineering lead for Atlas) |
| chatter-driven development | chatter-driven development |
| compaction | compaction (compressing the context) |
| compressing the talent stack | compressing the talent stack |
| Dennis Yang | Dennis Yang |
| Dhanji | Dhanji (CTO of Block) |
| dogfood | dogfood (a company using its own product internally) |
| Ed Bayes | Ed Bayes |
| Evangelion | Evangelion (Japanese anime) |
| Fire Upon the Deep | A Fire Upon the Deep (science fiction novel by Vernor Vinge) |
| George Embiricos | George Embiricos (Greek shipping magnate and art collector, a member of Alexander's family) |
| GitHub Copilot | GitHub Copilot |
| Goose | Goose (Block's internal agent product) |
| GPT-5.1-Codex-Max | GPT-5.1-Codex-Max |
| harness | harness (the framework the model runs in) |
| Iain Banks | Iain Banks (author of the Culture series) |
| Jujutsu Kaisen | Jujutsu Kaisen (Japanese anime) |
| Karpathy | Karpathy |
| Kevin Weil | Kevin Weil |
| Lenny Rachitsky | Lenny Rachitsky (podcast host) |
| Michael Truell | Michael Truell (CEO of Cursor) |
| mixed initiative | mixed initiative (a mode of human and machine collaboration) |
| Nick Turley | Nick Turley |
| plan.md | plan.md (plan file) |
| PME | PME (a blended product, management, and engineering role) |
| PowerShell | PowerShell (the native shell language on Windows) |
| prompt to patch | prompt to patch (code generation that goes from a prompt to a patch) |
| Radical Candor | Radical Candor (management book by Kim Scott) |
| Scott Belsky | Scott Belsky |
| slop | slop (low-quality AI-generated output) |
| solopreneur | solopreneur (solo entrepreneur) |
| spec-driven development | spec-driven development (development driven by a written specification) |
| Swift | Swift (Apple's programming language) |
| The Culture | The Culture (science fiction series by Iain Banks) |
| vibe coded | vibe coded (code written by feel or intuition) |
| vibe engineer | vibe engineer (engineering done by feel or intuition) |