Why experts writing AI evals is creating the fastest-growing companies in history | Brendan Foody
Brendan Foody: The wealthiest companies in the world are willing to spend whatever it takes to improve model capabilities.
Meet Brendan Foody & Mercor
Lenny Rachitsky: We are entering the era of evals.
Brendan Foody: We started working with all of the top AI labs. What the labs need is labor marketplace. They actually need extraordinary professionals that can measure model capabilities.
The Age of AI Evals
Lenny Rachitsky: They found this pocket, maybe the biggest business opportunity in history.
Brendan Foody: We grew from 1 to 400 million in revenue run rate in 16 months, fastest ascent in history.
Market Landscape & Mercor’s Origins
Lenny Rachitsky: Why is this so valuable?
Brendan Foody: The market is bound by the amount of things where humans can do something that models can’t. The labs’ primary bottleneck to improve models is how they can effectively have some way of measuring what success looks like for the model.
Inside the Job Workflow
Lenny Rachitsky: There’s this tweet that you retweeted. “If you really think about it, we were put on Earth to create reinforcement learning training data for labs.”
Brendan Foody: It’s highly likely that the entire economy will become an RL environment machine, building out all of these worlds and contexts. And I think the narrative in AI over the last three years has almost entirely been one of job displacement, but very few companies and people have talked about this new category of jobs that’s being created.
The Broader Labor Market
Lenny Rachitsky: I talked to a lot of people about what should I be studying? Where should I be getting better?
Brendan Foody: How can they leverage this technology to do so much more? We’ll give people interviews where we say, “Use whatever tools are available to build a website and let’s see what product you’re able to build in an hour.”
“We’re Here to Create Training Data”
Lenny Rachitsky: Today, my guest is Brendan Foody, CEO and co-founder of Mercor. Mercor is the fastest-growing company in history to go from 1 to 100 million at $2 billion valuation. Mercor, if you haven’t heard of them, helps AI labs and AI companies hire experts to help them train their models using AI. They’ve never had a customer churn, their net retention is over 1,600%, and they’re on a nine-figure revenue run rate.
In our conversation, we talk about the increasing value and importance of evals, the landscape of AI training companies like Mercor, and why they’ve become so important and valuable, how Brendan discovered this opportunity, his insights on what product market fit looks like, the core tenets he’s instilled within his organization that have allowed him to build the fastest growing company in history, what people writing evals for labs are actually doing day to day, which skills and jobs are going to last the longest with the rise of AI, why he doesn’t think we’ll see AGI or superintelligence anytime soon, and so much more. This episode is incredible. You need to hear this.
If you enjoy this podcast, don’t forget to subscribe and follow it in your favorite podcasting app or YouTube. It helps tremendously. Also, if you become an annual subscriber of my newsletter, you get 15 incredible products for free for one year, including Lovable, Replit, Bolt, N8N, Linear, Superhuman, Descript, Whisperflow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, and Mobbin. Check it out at lennysnewsletter.com and click Product Pass. With that, I bring you Brendan Foody.
WorkOS also recently acquired Warrant, the fine-grained authorization service. Warrant’s product is based on a groundbreaking authorization system called Zanzibar, which was originally designed for Google to power Google Docs and YouTube. This enables fast authorization checks at enormous scale while maintaining a flexible model that can be adapted to even the most complex use cases. If you’re currently looking to build role-based access control or other enterprise features like single sign-on, SCIM, or user management, you should consider WorkOS. It’s a drop-in replacement for Auth0 and supports up to one million monthly active users for free. Check it out at workos.com to learn more. That’s workos.com.
You fell in love with building products for a reason, but sometimes the day-to-day reality is a little different than you imagined. Instead of dreaming up big ideas, talking to customers, and crafting a strategy, you’re drowning in spreadsheets and roadmap updates and you’re spending your days basically putting out fires. A better way is possible.
Introducing Jira Product Discovery, the new prioritization and roadmapping tool built for product teams by Atlassian. With Jira Product Discovery, you can gather all your product ideas and insights in one place and prioritize confidently, finally replacing those endless spreadsheets. Create and share custom product roadmaps with any stakeholder in seconds. And it’s all built on Jira, where your engineering team’s already working so true collaboration is finally possible. Great products are built by great teams, not just engineers. Sales, support, leadership, even Greg from finance, anyone that you want can contribute ideas, feedback, and insights in Jira Product Discovery for free. No catch. And it’s only $10 a month for you. Say goodbye to your spreadsheets and the never-ending alignment efforts. The old way of doing product management is over. Rediscover what’s possible with Jira Product Discovery. Try it for free at atlassian.com/lenny. That’s atlassian.com/lenny.
Brendan, thank you so much for being here. Welcome to the podcast.
Brendan Foody: Thank you so much for having me, Lenny. I’m a huge fan, and so excited to have a conversation.
Skills Still Worth Investing In
Lenny Rachitsky: I’m really excited to have this conversation as well. I’m a huge fan of yours. I’m excited for more people to learn about you and what you’re building.
I want to start with a tweet that you have pinned at the top of your Twitter feed right now, and here’s the tweet. “We are now working with six out of the Magnificent 7, all of the top five AI labs, most of the AI application layer companies. One trend is common across every customer. We are entering the era of evals.”
The reason this caught my attention is that’s one of the most recurring trends on this podcast, people talking about the increasing value of learning how to do evals well and the value of evals for companies. It feels like still people don’t know what the hell this is what we’re talking about, why this is so important. Talk about just what you think people are still missing, what they need to know, what this era of evals means.
Brendan Foody: If the model is the product, then the eval is the product requirement document. And the way that researchers’ day-to-day looks is that they’ll run dozens of experiments where they’ll make small improvements on an eval set. And reinforcement learning is becoming so effective that once they have an eval, they can hill-climb it. Just look at how fast people were able to saturate Olympiad Math once they focused on it, how fast we’re even saturating SWE-bench once we focus on it. And so in many ways, the barrier to applying agents to the entire economy to automate every workflow is how do we measure success? How do we eval it? And write the PRDs for everything that we want agents to do, which Mercor is obviously a huge part of doing.
Understanding Demand Elasticity
Lenny Rachitsky: So people hearing this, they’re like, “Oh, yeah. Okay, shit. I got to really pay attention to this eval stuff.” Any advice about learning how to do this well? What companies that are doing this well are doing differently? Help people get better at this thing.
Brendan Foody: Yeah. I think that for enterprises especially, the core way to think about it is how can they build a test or systematic way to measure how well AI automates their core value chain? So if it’s an architecture firm that’s producing these architecture diagrams of what they provide to their end customer, how can they effectively measure that? And each company has its own value chain or maybe a handful of them if it’s a multi-product company. And just thinking about how they measure that is the prerequisite to really effectively applying AI throughout their entire business.
Roles in Elastic Demand Industries
Lenny Rachitsky: I saw you talking about this on the No Priors podcast with Sarah and Elad, and I don’t know if it was after this or before this, but Sarah tweeted, “Evals equals your new marketing.” What does that mean? What do you think she’s saying there?
Brendan Foody: Yeah. Well, it ties to what I said earlier about how if the model is the product, evals are the PRD, but also subsequently the sales collateral, right? Because evals are what you give to researchers to show them what they should be building and going on, but they’re also the way that you demonstrate the efficacy of capabilities.
And historically, everyone’s been pointing to these academic evals of PhD level reasoning with GPQA, Humanity’s Last Exam, or Olympiad Math, but now it’s moving towards the capabilities that people practically care about of how do we get models to automate the way that we build a software platform or automate the way that we do an investment banking analysis. And I think labs as well as application layer companies will increasingly use evals to demonstrate the capabilities of their models and their products.
AI Tools Are Superpowers
Lenny Rachitsky: Okay. So let’s build on this and zoom out a little bit and talk about the landscape of the market that you’re in. And I was just reflecting on this as I was preparing for this conversation. If you think about the companies growing faster than any company’s ever grown in history, there’s essentially three buckets. There’s the foundational model companies, there’s vibe coding apps, Cursor and Lovable and Bolt and Replit and all these here, and then there’s data labeling and data companies like you. So I’ve had the CEO of Handshake on the podcast. I have the CEO of Scale coming on. There’s also Surge. There’s you guys. Help us just understand the landscape of what this is all about because I think people don’t really know what the hell’s going on and see all these companies growing like crazy.
Brendan Foody: Yeah, I’ll give a little bit of the origin story and incorporate in that how it frames the landscape. I met my co-founders when we were 14 years old. We started the company together when we were 19, in January 2023, initially hiring people internationally, matching them with our friends and automating all the processes of how we did that. So similar to how a human would review a resume, conduct an interview, and decide who to hire, we automated all of those processes with LLMs and bootstrapped the company to a million-dollar revenue run rate before we dropped out of college.
And then a handful of other things happened, but we met OpenAI and we saw that there was this enormous transition in the human data market where it was moving away from this crowdsourcing problem of how do you find low and medium skilled people that can write barely grammatically correct sentences for early versions of LLMs and moving towards this sourcing and vetting problem. How do we source and assess the best, most experienced professionals? Think the software engineers, investment bankers, doctors, and lawyers that can actually help to evaluate and interpret all of the capabilities that people want models to have.
So from there, we started working with all of the top AI labs. We grew from 1 to 400 million in revenue run rate in 16 months, and it’s been an extraordinary journey and super exciting.
The Future of the Labor Market
Lenny Rachitsky: Okay. First of all, that is out of control. I don’t know if people understand. I think this is the first time you’re sharing that number. I know we’re recording this, you’ll have announced it by now, but 1 to 400 million in revenue in 16 months.
Brendan Foody: Exactly. So fastest ascent in history, which is an exciting statistic we’re very proud of.
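To put the 1-to-400-million figure in perspective, here is a rough back-of-the-envelope calculation of the implied monthly growth rate. It assumes growth was a smooth compound curve over the 16 months, which is a simplification that the conversation does not confirm; the variable and function names are illustrative only.

```python
# Implied compound monthly growth, ASSUMING smooth exponential growth
# (an illustrative simplification, not a claim about Mercor's actual curve).

def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Return the constant month-over-month growth rate (as a fraction)
    that takes `start` to `end` in `months` months."""
    multiple = end / start                 # total growth multiple (400x here)
    monthly_factor = multiple ** (1 / months)
    return monthly_factor - 1


start_run_rate = 1_000_000      # $1M revenue run rate (figure from the conversation)
end_run_rate = 400_000_000      # $400M revenue run rate (figure from the conversation)
months = 16                     # period stated in the conversation

rate = implied_monthly_growth(start_run_rate, end_run_rate, months)
print(f"Implied growth: {rate * 100:.1f}% per month")
```

Under that assumption, the run rate would have had to grow by roughly 45% every month for 16 months in a row.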
How Models Learn Expert Knowledge
Lenny Rachitsky: Okay. So something big is happening here. Why is this so valuable? What is going on here? So it’s just to try to summarize what you guys do simply is you help hire people for labs to help them train their models, and you help them find not just generalist labor, but experts, helping them with very specific gaps in the model’s knowledge.
Brendan Foody: Yeah, precisely. And so it really ties to your first question around the era of evals that’s framing all of this, which is that the labs’ primary bottleneck to being able to improve models is how they can effectively have some way of measuring what success looks like for the model, both to use it as the eval for the tests that they’re measuring their progress against, as well as the verifiers in an RL environment to then reward the model, improve capabilities, et cetera. And they need this across every domain for every capability that models don’t know how to use. And the wealthiest companies in the world are willing to spend whatever it takes to improve model capabilities where Mercor is sitting at the forefront and the primary bottleneck.
Post-Training Scale & Competition
Lenny Rachitsky: Okay, what are these people actually doing? So what’s an example of a kind of person that is sought after? And then what are they doing sitting there at the computer?
Brendan Foody: Effectively, the market is bound by the amount of things where humans can do something that models can’t. So I’ll make that very concrete. Say you have a model that you want to write a redline for a contract in the way that a lawyer would, and it makes a handful of mistakes, misses a bunch of key points in doing so. What you could do is have a lawyer create a rubric, similar to how a professor might create a rubric to grade a deliverable, for what are the things we want the model to be able to do?
So it can effectively score that, right? Plus however many points if it identifies this or XYZ key point. And that’s really the foundation to measuring what does progress look like for models? Is this model achieving the capabilities that these professionals want? As well as how do we use this as training data to reward and to reinforce a lot of the capabilities that people want models to achieve.
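The rubric idea described here can be sketched in a few lines of code. This is a minimal illustration, not how Mercor or any lab actually implements it: the `Criterion` class, the `score_response` function, and the example contract criteria are all hypothetical, and a simple keyword match stands in for the judgment that a human expert or a grader model would really supply.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str   # what the expert wants the model to catch
    points: int        # how much the criterion is worth
    keyword: str       # HYPOTHETICAL stand-in for a real grader's judgment


def score_response(response: str, rubric: list[Criterion]) -> float:
    """Return the fraction of rubric points the response earns."""
    total = sum(c.points for c in rubric)
    earned = sum(
        c.points for c in rubric if c.keyword.lower() in response.lower()
    )
    return earned / total if total else 0.0


# Illustrative rubric a lawyer might write for a contract redline.
rubric = [
    Criterion("Flags the uncapped indemnification clause", 3, "indemnification"),
    Criterion("Notes the missing termination notice period", 2, "notice period"),
    Criterion("Identifies the governing-law conflict", 1, "governing law"),
]

redline = "The indemnification clause is uncapped, and the notice period is missing."
score = score_response(redline, rubric)
print(f"Rubric score: {score:.2f}")
```

The same score can serve both purposes mentioned in the conversation: as an eval number to track progress, and as a reward signal during training.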
Expert Types & Creative Capacity
Lenny Rachitsky: Okay, so they’re essentially writing evals just to connect it back to original conversation.
Brendan Foody: Exactly. Well, that’s an interesting thing: everyone talks about RL environments. I feel like the two hot-button things are RL environments and evals, but one thing Andrej Karpathy has tweeted about a bunch is that there’s not actually a nuance in the data type. It’s more just a different semantic way of describing what it’s being used for. But ultimately, it’s just some stasis point for how do you measure what good looks like? And you can use that either as the benchmark, the sales collateral, as Sarah was saying, to say, “Here is why our model is the best model in the world and here are the capabilities that we’ve been working towards,” or you can use it on the post-training side to reward certain model trajectories and achieve those capabilities.
How Experts Work
Lenny Rachitsky: Okay. So say this lawyer, this person is writing, “Here’s what a great red line contract looks like and here’s the rubric of what excellent is.” Then are they also providing data, like actual examples of red line documents as a part of that?
Brendan Foody: They may. The data landscape historically has included two kinds of data. The first is supervised fine-tuning data, which is input/output. When people think about fine-tuning in the historical sense, that’s what it is. The second is RLHF where the model will generate a couple of examples. We’ll choose which is the most popular example.
What everyone is generally moving towards is reinforcement learning from AI feedback instead of human feedback, where instead you have the human define some sort of success criteria, some way to measure that. For example, in code it could be a unit test, where we can scalably measure success, and in other domains it could be a rubric. And then you use that to incentivize model capabilities. And it’s far more scalable and data-efficient, and so that’s why a lot of the broader trend in the market across the board is moving towards RLAIF to both eval models as well as improve capabilities.
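As a rough illustration of a unit test acting as a verifier, the sketch below turns a set of test cases into a reward between 0 and 1 for a piece of model-written code. The function name `unit_test_reward` and the toy `add` task are hypothetical, and this is only a sketch under simple assumptions: a real training setup would run candidate code in a sandbox rather than calling `exec` directly.

```python
def unit_test_reward(candidate_source: str, func_name: str, tests: list) -> float:
    """Return the fraction of test cases the candidate code passes.

    `tests` is a list of (args_tuple, expected_result) pairs written by a human.
    WARNING: exec on untrusted code is unsafe; this is for illustration only.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)   # define the candidate function
        func = namespace[func_name]
    except Exception:
        return 0.0                          # code that fails to load earns nothing

    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                            # a crash counts as a failed test
    return passed / len(tests) if tests else 0.0


# Human-defined success criteria for a toy task: "write add(a, b)".
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

good_candidate = "def add(a, b):\n    return a + b\n"
bad_candidate = "def add(a, b):\n    return a - b\n"

good_reward = unit_test_reward(good_candidate, "add", tests)
bad_reward = unit_test_reward(bad_candidate, "add", tests)
print(good_reward, bad_reward)
```

The human writes the tests once, and the verifier can then score as many model attempts as needed without further human input, which is the scalability point made above.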
Why Anthropic Leads in Coding
Lenny Rachitsky: I had one of the co-founders of Anthropic on. He said exactly the same thing. That’s what they’ve done at Anthropic, is move towards AI-driven reinforcement learning.
So essentially, if I can understand this correctly, I’m the lay person here trying to understand this on behalf of the audience. So essentially a lawyer is like, “Here’s what correct looks like for redlining,” and then it’s AI is just on its own almost, just like, “I’m going to try to get this. I’m going to try to improve on this and I know if I’m heading the right direction based on this eval/rubric I’ve been given.”
Brendan Foody: Exactly. Applying all of the criteria of what good looks like, similar to how the TA might apply the professor’s criteria of does the student’s response meet this criteria or this criteria, plus however many points, et cetera.
Mercor’s Growth Secret
Lenny Rachitsky: Awesome, okay. Let me shift to talking about the broader labor market here. So there’s two parts to this question as we talk about this. One is just how long will we need to do this? You guys grew so incredibly fast. Is there a point of like, “Okay, we don’t need humans. We’re tapped out.” So let’s start there and then I’ll ask a broader question.
Brendan Foody: So the key question is how long there’s going to be things in the economy that humans can do that AI can’t do? And I think there’s certainly a bucket of people that say we’re going to have superintelligence within three years and humans won’t play a role in the economy. And that’s one school of thought.
Our perspective is very different. Our perspective is that these models are extraordinary and automating a lot of things very quickly, but there’s a lot of things that they’re horrible at. Even still, it can’t schedule time on my calendar. It can’t draft emails for me. It can’t use basic tools. And we need evals for everything. For everything that the models can’t do, we need evals for the tool use, evals for the long horizon reasoning.
Imagine in 10 years when we want models to be able to go out and build a startup for 30 days. We need evals for that to effectively reward it. And I think that that road to improving models will last for as long as there is anything in the economy that humans can do which models can’t and be a huge portion of what the future of work looks like. And so our mission is creating the future of work, and I think that this is a really exciting industry and giving us a glimpse into the direction that everything is headed towards.
Spotting the Opportunity
Lenny Rachitsky: There’s this tweet that you retweeted that I want to ask you about. “If you really think about it, we were put on Earth to create reinforcement learning training data for labs.”
Brendan Foody: Yeah.
Three Values Behind Success
Lenny Rachitsky: What does that mean to you? What is this person implying? And it’s basically what you’re saying is we’re just helping train models.
Brendan Foody: It speaks to conversations I’ve had with a lot of researchers and executives at top labs, which is that it’s highly likely that the entire economy will become an RL environment machine, building out all of these worlds and contexts for us to then have rubrics or other kinds of verifiers. And that is really exciting in so many ways.
Because I think let’s draw an analog to other revolutions where when we had the industrial revolution, everyone was freaking out about losing their jobs, but there was this whole new class of jobs of how do we build the machines? How do we have knowledge work? How do we create everything new? And I think that the narrative in AI over the last three years has almost entirely been one of job displacement, right? Sure, ChatGPT is growing fast and it’s very cool that everyone loves using it, but from an economic standpoint, people are talking a lot about job displacement. But very few companies and people have talked about this new category of jobs that’s being created and what that’s going to mean and how people can prepare and upskill for that. And I think that the most exciting thing possible is creating that future of how do humans fit into the economy and how will that evolve over time?
Slow vs Fast Hiring Trade-offs
Lenny Rachitsky: I talk to a lot of people about just what should I be studying? Where should I be getting better? People in school right now are just like, “What is even going to be valuable in the future?” You’re at the center of a lot of just what jobs are most in demand, how hiring is evolving. So let me just ask you a very concrete question. What jobs do you think will remain in the future/what skills are still worth investing in for younger people, especially?
Brendan Foody: In terms of jobs, I would respond with a category of things that have very elastic demand are going to be super exciting. Because when we make people 10 times more productive, we’ll build 10 times, if not 100 times as much software as an example. And so I think the product managers that can now do so much more are going to be extremely well-positioned. And so far as the skills, I think it’s people that can leverage AI to do whatever their day-to-day workflows are.
I have had a couple conversations with teachers where they get my thoughts on how they should be assessing their students because we originally started out curating all of these AI interviews and assessments for people and have thought about this immensely. And what we realized is that you don’t want to fight against them using the models. It’s similar to when the calculator came out, you don’t want to give people all of this arithmetic work of how do you get them to do it and not use the calculator. You want to tell them, “Use the tools and let’s see what you can do.”
And so we’ll give people interviews where we say, “Use ChatGPT and Codex. Use Claude Code. Use Cursor and whatever tools are available to build a website and let’s see what product you’re able to build in an hour.” And so I give that as an example insofar as talent assessment because I think it pertains also to the skills that people should be honing in on of how can they leverage this technology to do so much more in whatever industry or vertical they’re operating in.
A CEO’s Time Allocation
Lenny Rachitsky: When you talk about elastic, being elastic, is it generalists being good at just a bunch of different things, or what do you say? What do you mean when you think elastic?
Brendan Foody: So I more mean how much capacity for demand there is in that industry. So I’ll give a couple of examples. In accounting, I think realistically we only need so much accounting in the world. Maybe there’s areas where we can do more and that’ll be good, but it doesn’t feel like the world needs 100 times more accounting.
On the other hand, in software development, I think we can ship 100 times more features for our products, move 100 times faster, build so much more. It just feels like there’s unlimited demand for the industry. And I think Marc Andreessen tweeted about this recently, that software is the most elastic industry of all where when we increase productivity, there’s so much more that will be built. And it’s definitely characteristic of a lot of other domains as well. And so I would focus on those domains where if we make everyone 10 times more productive, that’ll increase demand, not reduce it.
Startup Past: The Donut Dynasty
Lenny Rachitsky: Okay. So you’re in the bucket of learn to code, still useful as a skill. You take computer science. And so in terms of elastic categories of jobs, sounds like engineering, product management is in that bucket. Great. A lot of people listening to this are PMs. What else, like design users? I don’t know. What else do you feel is in that bucket from what you’ve seen?
Brendan Foody: Yeah, I think that there’s a lot of things where the whole value chain of building companies has a lot of these variable costs, even large portions of operations or consulting. Imagine if we could have 10 times as many McKinsey consultants, what would be possible in so far as the research we could do, the analysis, et cetera. But I think the companies and people that are going to succeed are those that lean into this narrative of abundance of how do we do so much more rather than fighting back against it of how do we try to stop displacement.
The Story Behind Mercor’s Name
Lenny Rachitsky: So along those lines, I think about your second bucket, which is the people that will be most successful. It’s not like a specific skill, but it’s being good with AI, using AI to become better at what you’re already doing. This reminds me of Elon’s whole thing with Neuralink, which I don’t know if this is how he put it, but the way I’ve always heard it is he wanted to build Neuralink because in the future when AGI and superintelligence is around, we need a way to compete and the best way to compete is plug our brains into a superintelligence so we have a chance. And it feels like that’s what AI is. Getting good at AI tools is essentially having this superpower.
Brendan Foody: Figuring out how to leverage them and incorporate it will definitely be of paramount importance.
What’s Next for Model Progress
Lenny Rachitsky: It just comes back to this almost cliche quote now. It’s, “AI won’t replace you. People that are really good with AI will replace you.”
Brendan Foody: I think it’s totally spot on. And I’ve definitely seen this at the enterprise level as well where there’s certain enterprises we talk to that are almost fearful not wanting to engage, not wanting to eval their businesses because that’ll provide the evidence that their value chain is being automated. And there’s others that… Literally some of the most recognized sophisticated Fortune 500 businesses that have this mentality and there’s others that are leaning into it of if we have the ability to do 10 or 100 times more, what will that mean and how do we lean into that future? Because there’s so many things that are going to change over the next 10 years, and I think those are the kinds of businesses that are going to be successful.
Scaling Laws & Model Intelligence
Lenny Rachitsky: Let’s talk about labor markets more broadly. You guys, so it’s interesting though. You started not feeding people to AI labs, not training models. It was just like help people find jobs, help companies hire, and then you’re like, “Oh wow, this whole opportunity.” You have this really interesting view on the future of just labor markets and hiring. Talk about that.
Brendan Foody: Yeah, it’s interesting. I remember when we started the company, as I mentioned, we were 19, and just had this gut intuition that it felt so wildly inefficient that labor markets are so disaggregated. And what I mean by that is when we would hire someone internationally, they would apply to a dozen jobs. When we as a company in the Bay Area were considering candidates, we would consider a fraction of a percent of candidates that were available in the market. And the reason for that is that there was this matching problem that everyone’s solving manually where they’ll manually review resumes, they’ll manually conduct interviews, and manually decide who to hire. But when we’re able to automate that matching problem at the cost of software, it makes way for this global unified labor market that every candidate applies to and every company hires from facilitating a perfect flow of information in the economy.
And I think that that future is undoubtedly what we’re heading towards, but what we’ve realized over time is that the nature of work is also changing dramatically. And part of building that future over a 10-year time horizon is creating that future of work and all of the more tactical things we do and building these incredible data sets across evals and RL environments for our customers.
AI Corner: Personal AI Usage
Lenny Rachitsky: What I’ve seen in how hiring has changed, I’m doing research on this with a partner, Noam, is that it’s so much easier to apply to companies that everyone’s just applying now, to hundreds of companies. AI is just making it easy to adjust their resumes and cover letters and make it feel like, “Oh, I applied to Mercor very specifically,” but it was one of 100 places. And then on the flip side, hiring managers are getting flooded with applications and so now they need AI to filter. So even if we didn’t want to get to this place, we’re almost being pushed into this direction of so much volume on both sides. We need something really smart at filtering and helping us hire and select, and this is exactly what you guys have been building for a long time.
Brendan Foody: Precisely, yeah. And the fascinating thing a lot of people ask is, do we think about ourselves as a labor marketplace or do we think about ourselves as a data company? And I think that the reason it’s an interesting question is our realization about what the labs need, which is that they actually need a labor marketplace. They actually need these exceptionally high caliber people. And of course we’ll layer on some project management and some software platform associated with it. But the really core thing that they want is how do they find these extraordinary professionals across all of these different domains that can measure model capabilities and work to build that future of work together?
Quick Fire Q&A
Lenny Rachitsky: What makes Enterpret unique is its ability to build and update a customer-specific knowledge graph that provides the most granular and accurate categorization of all customer feedback and connects that customer feedback to critical metrics like revenue and CSAT. If modernizing your voice of customer program to a generational upgrade is a 2025 priority, as it is for customer-centric industry leaders like Canva, Notion, Perplexity, and Linear, reach out to the team at enterpret.com/lenny. That’s E-N-T-E-R-P-R-E-T.com/lenny.
Going back to just how this all works and what you guys do for models, I was talking to a friend who had an ankle sprain or his foot was hurting and he got an x-ray and he fed the x-ray into ChatGPT and then asked it, “Give me this specific x-ray.” And it’s like, “Okay, sure.” And then it gave him, “Here’s what you have.” And he was talking to me, he’s like, “What is out there on the internet that trained this model to know this stuff?” And I was like, “No, it’s actually somebody sitting there helping the model understand this. Once they recognize it doesn’t fully understand this, humans are actually helping them learn these things.”
Brendan Foody: Exactly. Well, so the way it works, at least in most people’s understanding, and there’s a lot of complexity in how the models work, is that pre-training gets a lot of the knowledge into the model of what are all the different things that exist in the world. And then post-training and reinforcement learning is for all of the reasoning of what are the pieces of knowledge that are accurate, what are inaccurate, and what to prioritize at any given time to make a decision. And so behind that, there would’ve been radiologists that worked on the post-training data set to create some stasis point for here’s the diagnosis and rewards and penalties associated with it. And it’s really the quality of those people that went into the quality of the decision and recommendation that ChatGPT ultimately made.
Lenny Rachitsky: So let’s actually follow that, right, because that’s really interesting and I don’t know how many people understand it. I understand it. So the work that you do and these experts do is post-training. It’s not feeding data into the model that it’s trained on. It’s, “We have this model GPT-5. Now here’s all the things that’s missing. Let’s add to it.”
Brendan Foody: Exactly, yeah. It’s really unlocking, allowing the model to focus on all the right tokens, from pre-training all the right things in model context, up weighting the effective reasoning chains to enable the models to reason better in a more generalized way.
Lenny Rachitsky: What’s the scale of people just working on this stuff? Is it like thousands, tens of thousands, hundreds of thousands?
Brendan Foody: Tens of thousands at any given time, hundreds of thousands more generally. It’s huge. And the most exciting thing is that it’s growing really quickly. I think that to your question also about the competitive landscape, historically there were all these crowdsourcing companies that would get these super high volumes of low-skilled people. I think Scale and Surge were the primary companies that pioneered that industry. And then in this transition to higher-skilled labor, what people realized is that actually you can go a lot further with just getting higher caliber people even in smaller amounts initially, and now subsequently scaling that back up once they’re able to meet the quality bar.
And I think that there’s a bunch of companies that after our success and very rapid revenue growth that started early last year have chased after that, which makes sense. And seeing that the market was changing very quickly, we were taking off, and trying to pursue a similar thesis on the market.
Lenny Rachitsky: It’s interesting. There have always been these companies, AlphaSights and GLG, that did this before AI, where you paid to connect to an expert and ask them questions about stuff. And essentially, okay, it turns out this is really useful for models. We don’t need the person in the middle.
Brendan Foody: Exactly, yeah. Well, but one core difference is that AlphaSights would generally be a one-off call versus a lot of our work is really hiring people for projects of how do they work on something for a longer period of time. And so that’s, I think, one of the reasons that some of the traditional expert networks have struggled to get into this. And also how do you retain those people and think about all the incentives where it actually looks more similar in some ways to one of the traditional labor marketplaces of an Uber or DoorDash, just with much higher-skilled talent that’s treated exceptionally well?
Lenny Rachitsky: It’s such a good opportunity for me to learn so much about this, so I’m going to ask questions.
Brendan Foody: Yeah.
Lenny Rachitsky: It’s so interesting to me. How much of the experts are focused on specific concrete knowledge versus personality and softer skills? How much of it’s like, “Here’s how you do an exam. Here’s how you do an x-ray”?
Brendan Foody: It depends on the lab. It’s a lot of both. I think that previously it might’ve been more softer skills, but now a lot of the labs are focused on their business models of what are the economically valuable capabilities that drive revenue and leaning a lot into these professional domains. But I think the creative side is also still really important to everyone. And so we’re seeing a meaningful amount of both. We hired all the people from the Harvard Lampoon a couple of months ago, their comedy club, to help with making models funnier. And so do all sorts of stuff like that, hiring Emmy award-winning screenwriters and everything across the board on creative capabilities that you’d look for.
Lenny Rachitsky: That is amazing. What a cool story. I’m excited for this to kick in. How fast do these things turn around? Say you hired this team, how fast are we going to see the potential impact? Is it like months? Is it years?
Brendan Foody: Well, so it depends, because some models or some labs will release iteratively where they’ll just improve the model behind the scenes.
Lenny Rachitsky: Without announcing a new model?
Brendan Foody: Exactly. Every couple of weeks, whereas others do these big releases. And so it depends a lot. We’re behind all of them, but we move really fast. A customer will give us a request, say, “We need these award-winning screenwriters,” and within 24 hours we’ll turn around the experts. And then there’s also this really interesting dynamic where in a set of 100 people that we hire, oftentimes the top 10% of people will drive the majority of the model improvement. It’s like a company. If you have a 100-person company, oftentimes the top 10% of the company will drive the majority of the impact. And what that means is that when we’re able to build proprietary advantages in identifying who those top 10% of people are, both insofar as how we get them on our platform and how we identify and match them effectively, it creates so much value for customers that it’s difficult to compete against.
And so it really does tie back to the founding thesis of the company, which is how do we find these extraordinary people and identify them so that we can reliably deliver these top 10% or top 10X experiences for our customers.
Lenny Rachitsky: So on that, so is the idea, you hire Jane. She’s incredible at coding and she now works for Anthropic and that’s her full-time job doing this? Or is this a part-time thing? Is this a project thing mostly?
Brendan Foody: It would sometimes be part-time. Sometimes it would be full-time. I would say most often it’s part-time, where someone might work at a tech company where they’re underemployed, maybe one of the ones that’s moving slower, where they have an extra 20 hours a week and they’re able to do this on the side, or whatever the equivalent is across a bunch of different industries. But we also do a lot of 40-hour-a-week roles as well.
Lenny Rachitsky: And how much are they making? Is it meaningful enough for an AI engineer to spend time on this?
Brendan Foody: Yeah, very meaningful. So our median pay rate in the marketplace is 500 an hour based on the depth of someone’s expertise. And one thing that highlights this difference relative to a lot of the crowdsourcing companies is if you look at the economics of the crowdsourcing companies, oftentimes they would pay 30 now versus the Goldman bankers, the McKinsey analysts, the Fang software engineers. And ultimately it comes down to what are the capabilities that labs want their models to have? And it much more falls in the latter bucket than the former one.
Lenny Rachitsky: I know there’s only so much you can talk about with this stuff, but so Anthropic, Claude has been so good at coding so much better historically than other models. I also use it for writing, giving feedback on writing. What is it that allowed them to get so good at this and continue to be so good at this?
Brendan Foody: Well, I can’t go too much into detail about customer work, but I think that it’s this trend of reinforcement learning and being very thoughtful about defining the right rewards that we’re releasing across the board. And how we could mitigate reward hacking, set up the right rewards, that’s super impactful.
Lenny Rachitsky: Evals. Again, evals is all you need.
Brendan Foody: Back to evals.
Lenny Rachitsky: Yeah.
Brendan Foody: One of my favorite quotes from customers is that, “Models are only as good as their evals,” which has always held true.
Lenny Rachitsky: I think Greg Brockman tweeted this once. “Evals are all you need.”
Brendan Foody: Yeah, truly.
Lenny Rachitsky: Let’s talk about Mercor a little bit more. One of the maybe, not even maybe, I believe the data tells us it’s the fastest growing company in history.
Brendan Foody: Yeah.
Lenny Rachitsky: I want to understand what you did to make this happen. So let me just ask, what do you think are some of the core tenets of how you built Mercor that most contributed to being this successful?
Brendan Foody: I think the most important thing is looking at the leading indicators in fast-moving markets. I remember when I used to think… Everyone in venture talks about the why now, and I used to think about the why now from a product standpoint, less from a market standpoint: now we can automate the way that we review resumes or the way that we conduct interviews, et cetera. But ultimately there is this legacy market that has all these incumbents and is relatively stagnant. What matters a ton is actually figuring out what the new markets are, the new pockets of demand that are changing very quickly, where the wealthiest customers in the world are willing to pay whatever it takes to improve model capabilities, and how we focus on the leading indicators of those markets to make sure that we have the best solution for the flagship customers in the market and optimize everything around that.
And that’s what I found has been most impactful in building the business. I think maybe that’s one thing is leading indicators in markets. If I had to choose another, it’s customer obsession. We have had for the last… We’re starting to have a couple of product managers help out with go-to-market, but for the last year and a half of the business, we’ve had no one in sales and marketing. And so we’re immature from a sales and marketing standpoint because we focused 100% of company resources on how do we build great products and experiences for our customers. Just getting word of mouth, the people that have worked with us at other businesses want to keep working with us and leaning into creating those great experiences. And so that’s where I spend all my time. And I think that some founders can get caught up in how do they get really good at marketing before they’ve figured out the thing that really drives a lot of customer love and creates the six-star experiences that you’re used to building.
Lenny Rachitsky: I’m going to go back to that first point, which is like, okay, you found this pocket, maybe the biggest business opportunity in history. How did you first find… What was that moment of, “Wait, this could be really big”?
Brendan Foody: So there’s some crazy stories here. I remember we started the company as I mentioned in January 2023. And then in August 2023 when I was still in college, one of our customers introduced us to the co-founders of xAI over a Zoom call saying how we had these really smart Indian software engineers that were great at math and coding. So we met them and we explained how the software engineers we had were really good at math and coding because they weren’t distracted by all the humanities. They didn’t have to study history and English and all these other things, and they loved it. So they had us in two days later to the Tesla office and we met the entire xAI co-founding team except for Elon, while I was still a college student. And xAI was just getting started at that point and they were super excited about our focus on the quality of the experts.
But because they were still doing pre-training, they weren’t ready for human data at the time and we didn’t start working with them at that point. We just knew from that point forward, before we even dropped out, that the market was about to change radically and we needed to be at the frontier of that. And so then fast-forward a few months, one of the crowdsourcing players came to us and actually used our platform to hire over 1,000 people, which was a very interesting experience because we started getting flooded with support tickets about how those people weren’t getting paid. And we obviously felt horrible because we had referred them to this opportunity. It was this reputable company. And we realized that a lot of the incumbents were resting on their laurels with respect to what was needed in the experiences they were creating for talent in their marketplaces to help improve models. And there was this opportunity to work directly with the labs in a way that kept the dignity of the experts in the marketplace, paid them extremely well, and cut out the middlemen.
And so we started doing that in May of last year, and then the rest is history.
Lenny Rachitsky: Wow, okay. Hundreds of millions of dollars in revenue since. So what I’m hearing here is you were very open to looking for pull. You saw some pull, you explored it. And then once you saw that there was something really meaningful there, you just went deep on making that an incredible experience, as amazing as possible.
Brendan Foody: Exactly. I think if I had to distill it into advice for founders, one thing I’ve realized is that I spent a lot of time trying to force product-market fit. And in some ways you should be persistent. You should have these theses that you have conviction about on how the world will change. But sometimes you just need to hear it from the market and know that it’s there, the pull, to know the right places to focus. Because if it’s extremely difficult to sell the marginal customer, you’re not going to be able to grow a huge business. What you actually need to find is the customer that’s surprisingly easy to sell into, where you’re going to be able to grow with them. You know that it’s a large pain point. And so it’s some combination of being stubborn with respect to your thesis around how the world will change, but also very open-minded with respect to exactly what form that takes and how the market’s developing and how your company will fit into it.
Lenny Rachitsky: That’s an amazing insight. In the moments you described, felt like it was a combination of this xAI meeting feeling like, “Oh wow, they really, really want this thing that we have. We’re now doing an amazing job,” and then it’s 1,000 people hiring in the platform. Was that those two moments that are like, “Wow”?
Brendan Foody: Exactly. And those happened, keep in mind, while we were a seed company, right? Well, so the first one was before we even raised any seed funding, we were totally bootstrapped because we bootstrapped the company to a million dollar revenue run rate and have always remained super capital-efficient. We’ve never burned money. We were lifetime profitable. And then we raised our seed round in September from General Catalyst, and it was the other experience after we raised our seed round where we really knew that there was an enormous amount of demand in this market where we saw the volume and we saw that the incumbents were sleeping with respect to how the market was changing and the kinds of people that were needed to make that change happen.
Lenny Rachitsky: It’s one thing to see this opportunity and start to execute on it. It’s another to actually succeed at this scale and consistently win. You guys have very specific values within the business. Talk about those. It feels like that’s a big part of your success too.
Brendan Foody: It totally is. So I’ll give the three and maybe a brief story associated with each of them.
So the first one is having a can-do attitude, which everyone gives me a little bit of a hard time for because it’s a funny saying, but we’ve always set these ridiculously ambitious goals, and then somehow the trajectory of the company forms around those goals. Where I remember when we were talking to Benchmark before they led our Series A, we were at 1.5 million in run rate. And I said we’d be at 50 million in run rate by the end of the year. And they said we were absolutely insane, right, as anyone would. And plus or minus two weeks, we hit it. And then we’ve now well blown past the tracking to 500 million in run rate, which was initially our goal for this year. So setting these incredibly ambitious goals with respect to the revenue scale of the business, the caliber of experiences for talent, all those dimensions is super important to first have a can-do attitude.
The second thing is really high standards, which is who we hire and what we expect of them. We have an incredibly high hiring bar where we hire tons of former founders, people that have incredible experiences. We just hired or partnered with Sundeep Jain who joined us as president. He was previously the chief product officer and chief technology officer at Uber and joined our relatively small in the grand scheme of things company to help scale up all the processes where Uber is of course the largest labor marketplace in the world. So super high standards is of paramount importance.
And then the third one that we really lean on significantly is intensity. And that if you look at the early cultures of the legendary companies, thinking of Meta or Google, they have these incredible, intense early-stage cultures of people just moving heaven and earth and doing whatever it takes to push the frontier of model capabilities. And so still very much output-oriented of what do people achieve rather than input-oriented of the specific hours they work, but recognizing that it takes a lot to build a legendary business, and that’s ultimately what we’re optimizing for.
Lenny Rachitsky: I could see why this works. Can-do attitude plus high standards plus intensity, I could see how that leads to success. There’s a lot of talk these days about this 996 culture, working 9:00 AM to 9:00 PM, six days a week. A lot of people are like, “Why? That’s terrible. Why would you make people do that?” But at the same time, I’m just constantly hearing this from the most successful AI companies. This is just the way it is to be successful. Things are moving so fast. This is an opportunity you’ll never see again. Just talk about your thoughts on that.
Brendan Foody: Yeah. Well, to clarify, we’ve never mandated hours. It’s more been a byproduct of people that care a lot where we care a lot about the trajectory of the business. And so a lot of people come into the office and stay late. But if they need to leave early and get dinner with their kids or travel on the weekend, of course that’s totally fine. And for us, it’s much more about finding people who have a lot of ownership and are really bought in, less so about the specific hours in the office, even though we found that oftentimes it’s the people that are most bought in, not always, but oftentimes it’s the people that are most bought in and that burn the midnight oil with us.
Lenny Rachitsky: When you say high standards, is there something you could share that gives us an example of what you mean there? Because a lot of people think they have high standards and they don’t.
Brendan Foody: If you are very patient, there’s always some trade-off between speed and quality when hiring. And I remember especially for our first 10 people, we were just so patient and disciplined about finding some of the best people in the world. Half of them are… Our second employee in the US, Sid, as an example, was previously the head of growth at Scale and joined us when we were a seed-stage company. Daniel, who joined us, had previously scaled two consumer apps to over 100,000 users, and there were all sorts of just extraordinary backgrounds among our first 10 hires. And I think that initial talent density shaped so much of what the rest of the org looks like as you scale it up.
Lenny Rachitsky: I know you also have this perspective that people talk about waiting to hire, to hire really slowly, but it’s actually not necessarily the right advice. Talk about that.
Brendan Foody: It’s painful because it’s a double-edged sword. On one hand, I’m thrilled that our first 10 people are so phenomenal and I think that has paid dividends for the business. But on the other hand, I think that companies do get to the point where you just need to hire really fast. And there are some things where you need a lot of people to do them and you need to recognize that there’s going to be some variance associated with hiring, but moving quickly is the priority.
And I think that in some ways, we moved too slowly with how we scaled out the team. And so the benefit is that everyone is extraordinary. We have this super high bar and we want to maintain that over time. But I think the downside is that while the company has grown incredibly quickly, we likely could have grown even faster if we had moved a little bit more quickly, especially ramping from, call it, 10 to 100 people.
Lenny Rachitsky: Okay, I was going to ask. So it sounds like the first 10, be very careful, take your time, 10 to 100, maybe speed up a bit.
Brendan Foody: Yes, though I wouldn’t say it’s necessarily 10. It’s determined by the point where you know it’s really working. And I know that’s still not a bright line, but it’s like once you know that there’s so much more demand than you can handle, that’s when you want to step on the gas and optimize for speed in a lot of ways. But I think especially until then, it’s important to be patient, be disciplined. Get the best people is always important, but speed becomes more important once you find the market opportunity, the market vacuum.
Lenny Rachitsky: I know you’ve started a couple companies in the past, at a much smaller scale. In this new role as CEO of this massive hypergrowth company, what surprised you most about where you spend the most time or just what the role involves? Because a lot of people who want to start companies dream about being in your shoes. What are they maybe not understanding about where a lot of your time goes?
Brendan Foody: Yeah, it’s actually not too surprising. The top two buckets are always working on hiring and time with customers of how do I really deeply understand what customers need and how we can support them? And then how do I build the team and a lot of the processes around that? Of course, there’s all of the ad hoc things I didn’t expect of dealing with the people questions of how do we set up our levels and our comp bands and all of that, which you learn as you scale a business. But I think that the core places that I spend my time are in line with what I expected as well as what I love doing, which is very fortunate.
Lenny Rachitsky: So these two companies you’ve started in the past, maybe share what they were because they’re fun, and then how did they help you be successful in this? What’s something that they taught you that helped you in your current role?
Brendan Foody: Yeah, so there’s been like a dozen, but I’ll choose my favorite two. So when I was in eighth grade, I started Donut Dynasty, where I saw that Safeway donuts were selling for $5 a dozen and that I could resell them at my middle school for more. I paid my mom $20 to drive me in her minivan down to Safeway, buy 10 dozen donuts, go to my middle school, and sell them all out.
And then the school tried to shut me down because I was selling food on school campus, which they didn’t like. So they had me in the principal’s office asking me to not do that. And then I moved my donut stand over 50 feet, so it was off school campus, saying that they could no longer police me. I remember we had competitors pop up where the competitors were charging. They bought these Chuck’s Donuts, which if anyone in the Bay Area knows, are higher end donuts than Safeway Donuts, but they have a higher cost basis. They cost a dollar per. And so I dropped my prices to 2 each where they could sell them throughout the school and I could have a lower cost basis on them.
So I had all of these fun experiences in selling donuts, and then I could talk more about my high school business as well, which was a more significant scale. But I think the takeaway from that was just like you can just do things. So many people have ideas, but the barrier to more companies being built, I think, is just initiative and taking the steps to build the product or experience that customers want and investing the time and the ambition to scale that up. And so I think it was really getting reps of that that enabled me to realize that I should do it later on at a much larger scale.
Lenny Rachitsky: Amazing story. I love how wholesome that is versus drugs, selling donuts.
Brendan Foody: Then my mom was very worried. She was like, “Oh, is there any pot in these donuts?” I was like, “No, mom, I assure you these are pure donuts.”
Lenny Rachitsky: I love that you paid your mom $20 to drive.
Brendan Foody: Yeah. She was adamant it couldn’t be a handout that she was taking her time to drive me, so she needed to make a little bit of money off of it. We haggled over her title where eventually she wanted to be head of global operations, which we found very entertaining.
Lenny Rachitsky: I hope that’s on her LinkedIn.
Brendan Foody: Not yet. Maybe she’ll have to add it.
Lenny Rachitsky: So you said that you’ve started a dozen companies?
Brendan Foody: Yeah.
Lenny Rachitsky: Wow. Okay.
Brendan Foody: Well, a dozen projects, but I think it was that, and then my AWS company were the two that I scaled up.
Lenny Rachitsky: What’s the story behind Mercor as the name?
Brendan Foody: Mercor means marketplace in Latin or to buy, sell, trade. And we want to build the largest marketplace in the world, the marketplace for how everyone finds jobs, and that was really the draw to it.
Lenny Rachitsky: Okay, maybe a last question. This is going back to earlier in the discussion because it’s something I’ve been thinking about as we’re talking. There’s been this shift from data as the fuel for models, and now it’s experts. Do you think there’s a next step, or will this just take us to AGI, superintelligence?
Brendan Foody: I don’t think it’s necessarily changing from data to experts. It’s more just the paradigm of realizing that labs need this close collaboration with experts to help understand what are the evals that they’re building and how can they push the frontier. But I think it’s very clear that evals are evergreen, that so long as we want to improve models, we’ll need experts to create evals for them and to create the post-training data for them to learn those capabilities. And of course there might be changes in the exact way that people do training with RL or otherwise, but they will always need an eval to measure what does success look like across every domain that they want to build.
Lenny Rachitsky: Okay. So then building on that, a question that comes up a lot these days is, and I know we’re talking about fun stuff but I’m getting to serious stuff again, scaling laws and just progression of model intelligence. A lot of people are feeling like, “I don’t know, it’s slowing down. We’re not going to really get to superintelligence at this rate.” What is your sense?
Brendan Foody: I totally agree with that. I know there have been some executives at big labs that say we’ll have superintelligence in three years, but I think the truth is that it’s a longer road. And that’s not to diminish how extraordinary the models are. I think we’ll be able to automate a majority of knowledge work tasks in the next 10 years for sure, but that long road is paved with all of the evals that help to make those capabilities possible. And it’s not going to be 10X more pre-training data that gets those capabilities. It’s much more going to be all of the post-training data sets that are far more data-efficient and thoughtful that help us get there.
Lenny Rachitsky: David Sacks tweeted this interesting point that the situation we’re in now is almost the best-case scenario, where AI is not in this fast takeoff to superintelligence. There’s a lot of competitors keeping each other in check. Models are already very valuable and only getting more valuable, but there’s not just this winner superintelligence taking over the world situation.
Brendan Foody: Yeah, I think that’s true. I think a lot of the superintelligence fearmongering is probably overrated, but at the same time a lot of people’s framing around that is even if there is a 5 to 10% chance of this p(doom), then we should be careful, which seems logical. But I think that it’s going to be an extraordinary 10 years for all of Silicon Valley and all of the world as this technology is able to create abundance, giving everyone better medical treatment, the best access to legal recommendations, and the ability to build great products more than we’ve ever seen before.
Lenny Rachitsky: And education feels like is transforming.
Brendan Foody: Absolutely, right. I even have felt bits of this over the last 10 years where I remember ever… My parents would give me a hard time for not going to classes in college and I’d be like, well, there’s way better lectures on YouTube. Why not just listen there? But I can only imagine as the models get extremely good at conveying information, better than the best professor, what that’ll mean and access to all sorts of information to better forward humanity and upskill everyone.
Lenny Rachitsky: So I’ll use that as a segue to a final question. I’m going to take us to AI Corner, which is a recurring segment on the podcast. What’s some way that you personally use AI to do better work to help you in life?
Brendan Foody: Well, let’s see. I use it a lot to write documents, as you would expect. I also talk to it to get advice on problems. I find it helpful to just reason things through with it, almost as a thought partner because, yeah, I don’t know. I find I think better sometimes when I’m talking something through, but I can’t talk through everything with colleagues or people around me.
Lenny Rachitsky: And so this is like ChatGPT Voice Mode mostly or something else.
Brendan Foody: Yeah, I like ChatGPT Voice Mode a lot. There’s stuff-
Lenny Rachitsky: Me too.
Brendan Foody: … or room for improvement, but I am very excited about the future of Voice.
Lenny Rachitsky: Let me show you something I built, actually. I wasn’t planning to talk about this, but there’s this guy, Eric Antonow, who a lot of people have recommended I get on this podcast. He’s this creative product person that’s under the radar now. He was at Facebook for a long time. He built this project called Pirate GPT, where you basically put ChatGPT into a stuffed animal to talk to it. So I built a little wise owl. I don’t have it on right now.
Brendan Foody: Wow.
Lenny Rachitsky: But basically you sew in a little speaker right here and you put a little magnet underneath and you can put it on your shoulder and then you just talk to it.
Brendan Foody: That’s so cute. Wow. I love it. I’ll have to get one of those. Because I have some of the voice assistants in my apartment, but I really want a ChatGPT voice assistant, so I’m excited for-
Lenny Rachitsky: I was just thinking that. Yeah, just come on. Why can’t we have a ChatGPT voice just sitting around listening to us all the time. And you can’t on your phone because it goes to sleep and it’s like, “Hello, what?”
Brendan Foody: Exactly. Yeah.
Lenny Rachitsky: Yeah, so that’s what this is trying to be. Well, there’s a Kickstarter he started that we’ll link to. You could help out.
Brendan Foody: There we go.
Lenny Rachitsky: That’s really easy.
Brendan, is there anything else that you wanted to share or touch on or maybe leave listeners with before we get to a very exciting lightning round?
Brendan Foody: Tying to the point around initiative and that you can just do things, I encourage everyone, especially with AI and it being so much easier to build, to just take the initiative to go out and build products and talk with customers and take that leap of faith, because I think that is in so many ways the largest barrier to more innovation in the economy, and we should support that in any way that we can.
Lenny Rachitsky: Yeah. There’s so many people that just, let’s not bash the podcast, but just listen to podcasts, read posts, just keep reading and listening and don’t do anything with that information. And there’s never been an easier time to actually build stuff and try stuff.
Brendan Foody: Totally.
Lenny Rachitsky: So definitely take that advice. You can just do things. You can move your donut stand 50 feet and get out of their jurisdiction.
Brendan Foody: Yeah.
Lenny Rachitsky: Okay, Brendan, with that, we’ve reached a very exciting lightning round. I’ve got five questions for you. Are you ready?
Brendan Foody: All set.
Lenny Rachitsky: What are two or three books that you find yourself recommending most to other people?
Brendan Foody: Let’s see. I would say in order, High Output Management is a phenomenal book on running companies. Second is Zero to One, which of course is a classic. And then third is Shoe Dog, where I just find it to be a really inspirational story.
Lenny Rachitsky: What is a recent movie or TV show you really enjoyed?
Brendan Foody: I really liked Oppenheimer. My favorite TV show of all time is Suits, so I know not recent, but if I had to choose a recent one, probably Oppenheimer.
Lenny Rachitsky: Very cool. Suits, first time someone’s mentioned that. Favorite product you recently discovered that you really love?
Brendan Foody: I love using Codex, like the new version. I know it’s sort of new in terms of version. Yeah, I think it’s incredible and just a huge, huge improvement. So yeah.
Lenny Rachitsky: Do you have a life motto that you find yourself coming back to, sharing with folks, finding useful in work or in life?
Brendan Foody: I think it’s you can just do stuff, what we were talking about earlier. Take the leap of faith.
Lenny Rachitsky: I thought you were going to say can do, which is in your Twitter profile.
Brendan Foody: Can do as well, yeah.
Lenny Rachitsky: Two great ones. Final question. So we were chatting before this about things that we could talk about and you shared this interesting thing that you haven’t shared anywhere else, which is that you’re dyslexic. Why don’t you share that with folks? And just how do you get around that having built the fastest-growing company in history?
Brendan Foody: I don’t hide it at all. I think a lot of my colleagues know. On one hand, it definitely makes it difficult to go through 1,000 emails a day or read every document that I’m supposed to. On the other hand, I feel like it helps me think a little bit differently, be more creative, and perhaps see ways that markets are changing that not everyone sees. So it’s turned out okay so far. One thing it’s helped me realize from a management standpoint is that we should focus much more on how we can leverage people’s strengths rather than on improving weaknesses, because there are some things that I’m not great at and will never be the best in the world at, and there are others that I can hopefully refine and strive to be the best at.
Lenny Rachitsky: That’s also such a recurring theme on this podcast: just focusing on strengths and not putting all your focus on weaknesses.
Brendan, this was incredible. I learned so much. I have a billion more questions, but you got shit to do. Two final questions. What should people know about what you’re doing and roles you’re hiring for? And then how can listeners be useful to you?
Brendan Foody: Absolutely. We’re hiring a ton across the board on our team. We’re hiring strategic project leads on our operations team, software engineers in our engineering team, as well as researchers. And so please go to mercor.com and we would love to work with you, and that’s the largest way that you can help us. Share it with your friends as well. Over half of people in our marketplace come from referrals because we have a platform of people that love us. And so any jobs that you want to apply to or send your friends to, we would love to have you.
Lenny Rachitsky: Brendan, thank you so much for joining me.
Brendan Foody: Thank you for having me.
Lenny Rachitsky: Bye, everyone.
Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
Glossary
| English | 中文 |
|---|---|
| Andrej Karpathy | Andrej Karpathy(保留原文,人名,AI 研究者) |
| aural environment machine | RL 环境机器(原文 aural 应为转录错误,实际为 “an RL environment machine”,即强化学习环境机器) |
| Benchmark | Benchmark(保留原文,风险投资机构名) |
| bootstrapped | 自力更生/白手起家 |
| Brendan Foody | Brendan Foody(保留原文,首次出现) |
| can-do attitude | “能做到”的态度 |
| Chuck’s Donuts | Chuck’s Donuts(湾区甜甜圈品牌,保留原文) |
| Codex | Codex(保留原文,OpenAI 的代码生成产品) |
| crowdsourcing | 众包 |
| Daniel | Daniel(保留原文,人名) |
| Donut Dynasty | Donut Dynasty(甜甜圈王朝,Brendan Foody 初中时期创办的甜甜圈销售项目) |
| dyslexic | 阅读障碍(dyslexia,阅读困难症) |
| Elad | Elad(保留原文,人名) |
| elastic demand | 需求弹性大 |
| Eric Antonow | Eric Antonow(保留原文,人名) |
| evals | 评估(全称 evaluations,指 AI 能力评估/基准测试) |
| General Catalyst | General Catalyst(保留原文,风险投资机构名) |
| Gnome | Gnome(保留原文,人名/合作伙伴名) |
| GPQA | GPQA(Graduate-Level Google-Proof Q&A,研究生级别问答基准测试) |
| Handshake | Handshake(保留原文,公司名) |
| High Output Management | 《High Output Management》(保留原文,Andy Grove 所著管理类书籍) |
| human data | 人类数据 |
| Humanity’s Last Exam | Humanity’s Last Exam(人类终极考试,AI 能力评估基准) |
| incumbents | 现有企业/在位者(文中保留原文写法) |
| job displacement | 岗位替代 |
| leading indicators | 领先指标 |
| Lenny Rachitsky | Lenny Rachitsky(保留原文,人名) |
| Magnificent 7 | 科技七巨头(指美股七大科技公司) |
| Marc Andreessen | Marc Andreessen(保留原文,人名,硅谷知名投资人/创业者) |
| Mercor | Mercor(保留原文,公司名) |
| net retention | 净留存率 |
| No Priors | No Priors(保留原文,播客名) |
| P-Doom | P-Doom(Probability of Doom,AI 导致人类灭绝的概率) |
| Pirate GPT | Pirate GPT(保留原文,将 ChatGPT 集成到毛绒玩具中的项目名) |
| post-training data | 训练后数据 |
| PRD | PRD(Product Requirement Document,产品需求文档) |
| pre-training | 预训练 |
| red line | 批注(法律文件中的红线批注/修订标记) |
| reinforcement learning from AI feedback | 基于 AI 反馈的强化学习 |
| revenue run rate | 收入运转率 |
| reward hacking | reward hacking(奖励投机/作弊行为,强化学习中的术语) |
| RLHF | RLHF(Reinforcement Learning from Human Feedback,基于人类反馈的强化学习) |
| rubric | 评分标准 |
| Safeway | Safeway(美国连锁超市品牌,保留原文) |
| sales collateral | 销售物料 |
| Sarah | Sarah(保留原文,人名) |
| Scale | Scale(保留原文,公司名) |
| scaling laws | scaling laws(扩展定律,指模型性能随规模增长的规律) |
| seed round | 种子轮 |
| Series A | A 轮(融资轮次) |
| Shoe Dog | 《Shoe Dog》(保留原文,Phil Knight 所著自传) |
| Sid | Sid(保留原文,人名) |
| sourcing | 人才寻源 |
| Suits | 《Suits》(保留原文,美剧名) |
| Sundeep Jain | Sundeep Jain(保留原文,人名) |
| supervised fine-tuning | 监督微调 |
| Surge | Surge(保留原文,公司名) |
| SWE-bench | SWE-bench(软件工程能力基准测试) |
| talent density | 人才密度 |
| unicorn | 独角兽(指估值超过 10 亿美元的初创公司) |
| unit test | 单元测试 |
| upskill | 提升技能 |
| verifier | 验证器 |
| vetting | 审查 |
| Zero to One | 《Zero to One》(保留原文,Peter Thiel 所著创业类书籍) |
为什么专家撰写 AI 评估正在催生历史上增长最快的公司 | Brendan Foody
文字稿
Brendan Foody: 世界上最富有的公司愿意不惜一切代价来提升模型能力。
Lenny Rachitsky: 我们正在进入评估(evals)时代。
Brendan Foody: 我们开始与所有顶级 AI 实验室合作。这些实验室需要的是劳动力市场,他们真正需要的是能够衡量模型能力的卓越专业人士。
Lenny Rachitsky: 他们发现了这个领域,也许这是历史上最大的商业机会。
Brendan Foody: 我们在 16 个月内将收入运转率从 100 万美元增长到 4 亿美元,这是历史上最快的增长。
Lenny Rachitsky: 为什么这如此有价值?
Brendan Foody: 市场的边界取决于人类能做而模型做不到的事情有多少。实验室提升模型的主要瓶颈在于,他们如何能够有效地找到某种方式来衡量模型的成功标准是什么。
Lenny Rachitsky: 有这么一条推文你转发过。“如果仔细想想,我们被放在地球上是为了给实验室创建强化学习训练数据。”
Brendan Foody: 整个经济极有可能变成一台强化学习(RL)环境机器,构建出所有这些世界和情境。我认为过去三年里 AI 领域的叙事几乎完全围绕着岗位替代,但很少有公司和人在谈论正在被创造的这一全新的工作类别。
Lenny Rachitsky: 我和很多人讨论过我应该学什么、应该在哪些方面提升自己。
Brendan Foody: 他们如何利用这项技术做到更多?我们会给人们安排面试,说“使用你能用到的任何工具来建一个网站,让我们看看你一小时能做出什么产品。”
认识 Brendan Foody 与 Mercor
Lenny Rachitsky: 今天的嘉宾是 Brendan Foody,Mercor 的 CEO 兼联合创始人。Mercor 是历史上收入从 100 万美元增长到 5 亿美元最快的公司。他们用 17 个月完成了这件事,不到一年半。Brendan 也是有史以来最年轻的独角兽创始人。他们刚刚以 20 亿美元估值融资了 1 亿美元。如果你还没听说过 Mercor,它帮助 AI 实验室和 AI 公司聘请专家来协助他们训练 AI 模型。他们从未有客户流失,净留存率超过 1600%,收入运转率已达九位数。
在我们的对话中,我们聊到了评估(evals)日益增长的价值和重要性,像 Mercor 这样的 AI 训练公司的格局,以及为什么它们变得如此重要和有价值;Brendan 是如何发现这个机会的;他对产品市场契合度应该长什么样子的洞察;他在组织中植入的核心原则,正是这些原则让他建立起历史上增长最快的公司;为实验室撰写评估的人日常工作实际上在做什么;在 AI 崛起的背景下,哪些技能和工作将持续最久;为什么他认为我们短期内不会看到 AGI 或超级智能;以及更多内容。这期节目非常精彩,你一定要听。
Brendan,非常感谢你来参加节目。欢迎来到播客。
Brendan Foody: 非常感谢邀请我,Lenny。我是你的超级粉丝,非常期待这次对话。
Lenny Rachitsky: 我也非常期待这次对话。我是你的超级粉丝。我希望更多人认识你和你在做的事情。
评估时代的来临
我想从你推特置顶的那条推文开始聊。推文是这样的:“我们现在正在与 Magnificent 7 中的六家合作,与所有前五大 AI 实验室合作,与大多数 AI 应用层公司合作。每个客户都有一个共同趋势——我们正在进入评估(evals)时代。”
这条推文引起我注意的原因是,这是本播客中最反复出现的趋势之一,人们在谈论学会做好评估的越来越大的价值,以及评估对公司的价值。但感觉大多数人仍然不知道我们到底在说什么、为什么这如此重要。谈谈你认为人们还在忽略什么、他们需要了解什么,以及这个评估时代意味着什么。
Brendan Foody: 如果模型就是产品,那么评估就是产品需求文档。研究人员的日常工作状态是这样的:他们会运行几十个实验,在评估集上做小的改进。而强化学习已经变得如此高效,一旦他们有了一个评估,就可以帮助模型攀爬这个评估。你看看,一旦人们开始专注于奥赛数学,多快就达到了饱和;一旦我们专注于 SWE-bench,又多快达到了饱和。所以在很多方面,要将智能体应用到整个经济体、自动化每一个工作流,障碍就在于——我们如何衡量成功?我们如何进行评估?为我们希望智能体做的每一件事编写 PRD,而这显然是 Mercor 正在大量参与的工作。
Lenny Rachitsky: 所以人们听到这些,会觉得“哦,对。好吧,见鬼。我真的得好好关注这个评估的事了。”关于如何学好这个,你有什么建议吗?那些在这方面做得好的公司,做法有什么不同?帮大家在这个事情上变得更好。
Brendan Foody: 是的。我认为,尤其是对企业来说,核心的思考方式是:他们如何建立一套测试或系统化的方法,来衡量 AI 在多大程度上自动化了他们的核心价值链?比如,如果是一家建筑公司,他们生产建筑图纸提供给最终客户,他们如何有效地衡量这一点?每家公司都有自己的价值链,如果是多产品公司,可能有好几条。仅仅思考如何衡量这些,就是在整个业务中真正有效应用 AI 的前提。
Lenny Rachitsky: 我看到你在 No Priors 播客上和 Sarah 及 Elad 聊过这个,我不记得是在这次之前还是之后,但 Sarah 发了一条推文:“评估等于你的新营销。”这是什么意思?你觉得她在说什么?
Brendan Foody: 是的。这和我之前说的相关——如果模型是产品,评估就是 PRD,但同时也是销售物料,对吧?因为评估既是你给研究人员看的东西,告诉他们应该构建什么,同时也是你展示能力有效性的方式。
过去,大家一直指向那些学术评估——GPQA 的博士级推理、Humanity’s Last Exam、奥赛数学——但现在正在转向人们实际关心的能力:我们如何让模型自动化构建软件平台的方式,或者自动化做投行分析的方式。我认为实验室和应用层公司都会越来越多地使用评估来展示其模型和产品的能力。
市场格局与 Mercor 的起源
Lenny Rachitsky: 好。让我们在这个基础上再往大了看一点,聊聊你所处的市场格局。我在准备这次对话时就在想这个问题。如果你看看那些增长速度超过历史上任何公司的企业,基本上有三个类别:基础模型公司;vibe coding 应用,比如 Cursor、Lovable、Bolt、Replit 等等;然后就是数据标注公司,比如你们。我已经请了 Handshake 的 CEO 上过播客,Scale 的 CEO 也会来。还有 Surge。还有你们。帮我们理解一下这个格局到底是怎么回事,因为我觉得人们真的不知道到底发生了什么,只看到所有这些公司在疯狂增长。
Brendan Foody: 好,我讲一点起源故事,把市场格局也融进去讲。我 14 岁就认识了我的联合创始人,我们 19 岁时一起创办了这家公司,那是在 2023 年 1 月。最初我们是在国际上招聘人员,把他们和我们自己的朋友匹配起来,并自动化了整个流程。人类会审阅简历、进行面试、然后决定录用,我们用大语言模型自动化了所有这些环节,在从大学辍学之前就把公司做到了百万美元的收入运转率。
之后又发生了一些事情,但我们遇到了 OpenAI,我们看到了人类数据市场正在发生一场巨大的转型——它正在从众包问题(如何找到低技能和中等技能的人,让他们为早期版本的大语言模型写出勉强语法正确的句子),转向人才寻源和审查问题。我们如何寻源和评估最优秀的专业人士、有经验的人?想想软件工程师、投资银行家、医生和律师——这些人才能真正帮助评估和解读人们希望模型具备的所有能力。
从那时起,我们开始与所有顶级 AI 实验室合作。我们在 16 个月内将收入运转率从 100 万增长到 4 亿,这是一段非凡的旅程,令人无比兴奋。
Lenny Rachitsky: 好。首先,这太疯狂了。我不知道人们是否理解——我觉得这是你第一次公开分享这个数字。我知道我们在录制这期节目,到你发布的时候应该已经公布了,但 16 个月内收入从 100 万到 4 亿。
Brendan Foody: 没错。历史上最快的增长,这是一个我们非常自豪的激动人心的统计数据。
Lenny Rachitsky: 好。所以这里正在发生某种大事。为什么这件事这么有价值?到底是怎么回事?如果简单地总结一下你们做的事情,就是你帮助实验室招聘人员来帮助训练他们的模型,而且你帮他们找的不是通用的劳动力,而是专家,帮助填补模型知识中非常具体的空白。
Brendan Foody: 是的,没错。这确实和你第一个关于评估时代的问题紧密相关,正是这个框架统领了一切——实验室改善模型的主要瓶颈在于,他们如何才能有效地建立某种衡量模型成功标准的方式,既用它作为评估来衡量进展,也用它作为强化学习环境中的验证器来奖励模型、提升能力等等。他们需要在每一个领域、针对模型还不会的每一种能力都有这样的衡量方式。而世界上最富有的公司愿意不惜一切代价来提升模型能力,Mercor 正坐在最前沿,也是最主要的瓶颈环节。
具体工作场景
Lenny Rachitsky: 好,这些人到底在做什么?比如,哪类人是被争相招募的?他们坐在电脑前在干什么?
Brendan Foody: 实际上,这个市场的边界就是人类能做但模型做不到的事情的总量。我举个具体的例子。假设你有一个模型,你想让它像律师一样为合同写批注,但它犯了若干错误,遗漏了一堆关键点。你可以做的是,让一位律师制定一份评分标准,就像教授可能会制定评分标准来创建一个交付物,明确我们希望模型能够做到哪些事情。
这样就可以有效地打分了,对吧?比如识别出这个或那个关键点,各加多少分。这正是衡量模型进展的基础:这个模型是否达到了这些专业人士期望的能力?同时,我们也可以把这些评分标准用作训练数据,来奖励和强化人们希望模型具备的许多能力。
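上面描述的评分标准(rubric)打分方式,可以用下面这个极简示意来理解。其中的 `RubricItem`、`score_response`、具体条目和分值都是为了说明而假设的,并非 Mercor 或任何实验室的实际实现;这里用关键词匹配来近似真实场景中由专家或评分模型完成的判断。

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    description: str  # 专家写下的评分条目
    keyword: str      # 假设:用关键词是否出现来近似"是否识别出该要点"
    points: int       # 命中该条目可得的分数


def score_response(response: str, rubric: list[RubricItem]) -> float:
    """返回 0 到 1 之间的得分:命中条目的分数之和除以总分。"""
    total = sum(item.points for item in rubric)
    if total == 0:
        return 0.0
    text = response.lower()
    earned = sum(item.points for item in rubric if item.keyword.lower() in text)
    return earned / total


# 假设的合同批注评分标准(条目与分值均为虚构示例)
contract_rubric = [
    RubricItem("指出赔偿上限缺失", "indemnification cap", 3),
    RubricItem("指出终止条款只对一方有利", "termination", 2),
    RubricItem("指出未约定管辖法律", "governing law", 1),
]

model_output = "The draft lacks an indemnification cap and omits the governing law."
print(score_response(model_output, contract_rubric))
```

同一个分数既可以作为评估指标来比较不同模型,也可以作为奖励信号用于训练,这正对应对话中“评估”与“验证器”两种用途。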
Lenny Rachitsky: 好,所以本质上他们就是在写评估,把我们拉回到最初的对话。
Brendan Foody: 没错。嗯,有意思的一点是,大家都在谈强化学习环境。我觉得当下最热的两个话题就是强化学习环境和评估,但正如 Andrej Karpathy 在推特上反复提到的,两者之间其实并没有那么大的差别。它们本质上是同一类数据,只是用不同的说法来描述其用途而已。归根结底,它就是某种衡量“好是什么样”的基准点。你可以把它用作 Sarah 说的那种面向销售物料的基准,来说明为什么我们的模型是世界上最好的模型、我们在朝着哪些能力努力;你也可以在后训练阶段用它来奖励模型的某些行为轨迹,从而实现那些能力。
Lenny Rachitsky: 好,假设这位律师写下了“一份优秀的批注合同长什么样,以及优秀的评分标准是什么”,那他们是否也会提供数据,比如实际的批注文档示例?
Brendan Foody: 可能会。从历史上看,数据格局包含两类数据。第一类是监督微调数据,即输入/输出对。当人们谈到传统意义上的微调时,指的就是这个。第二类是 RLHF,即模型会生成若干示例,然后由人类选出其中更偏好的那个。
现在大家普遍在转向的方向,是用 AI 反馈替代人类反馈来进行强化学习——由人类定义某种成功标准、某种衡量方式。在代码领域,这可以是单元测试,我们可以大规模地衡量成功与否;在其他领域,则可以是评分标准。然后你用这些来激励模型的能力发展。这种方式的可扩展性和数据效率都远高于前者,所以整个市场上一个大趋势就是从 RLHF 转向既用评估来衡量模型,也用来提升能力。
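上面提到的“用单元测试大规模衡量成功与否”,可以用下面的极简示意来理解:把通过的测试比例当作奖励信号。`unit_test_reward`、测试用例和两个候选函数都是为了说明而假设的,并非任何实验室的实际训练流程。

```python
from typing import Callable

# 每个测试用例:(传给候选函数的参数, 期望的返回值)
TestCase = tuple[tuple, object]


def unit_test_reward(candidate: Callable, tests: list[TestCase]) -> float:
    """运行全部测试,返回通过比例(0 到 1)作为奖励。"""
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # 抛出异常视为该用例未通过
    return passed / len(tests)


# 假设任务:实现两数相加
tests = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]


def good_add(a, b):
    return a + b


def buggy_add(a, b):
    return abs(a) + abs(b)  # 对负数输入会出错


rewards = {f.__name__: unit_test_reward(f, tests) for f in (good_add, buggy_add)}
print(rewards)
```

与逐条由人类比较偏好相比,这类验证器一旦写好就可以对任意数量的模型输出自动打分,这也是对话中所说“可扩展性和数据效率更高”的直观含义。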
Lenny Rachitsky: 我请过 Anthropic 的一位联创上节目,他说了一模一样的话。Anthropic 做的事情就是转向 AI 驱动的强化学习。
所以如果我能这样理解的话(我是外行,在这里代表听众试图搞明白),本质上就是一位律师说“批注合同中正确答案长这样”,然后 AI 几乎是自主地运作,就像在说“我要努力达到这个标准,我要努力改进,而我是否走在正确的方向上,取决于我拿到的这份评估/评分标准”。
Brendan Foody: 完全正确。就是应用所有关于“好是什么样”的标准,类似于助教按照教授的评分标准来判断学生的回答是否符合这条或那条标准,各加多少分等等。
更广泛的劳动力市场
Lenny Rachitsky: 太好了。好,让我转到更广泛的劳动力市场话题。这个问题分两部分。第一部分就是:我们还需要做这件事多久?你们增长速度惊人,会不会有一个时间点出现”好了,我们不需要人类了,已经到顶了”的情况?先聊这个,然后我再问一个更大的问题。
Brendan Foody: 核心问题是:经济中那些人类能做而 AI 做不了的事情,还会存在多久?确实有一派人认为我们三年内就会实现超级智能,人类将不再在经济中扮演角色。这是一种观点。
我们的看法很不一样。我们认为这些模型确实非常强大,正在迅速自动化很多事情,但它们在很多方面仍然很糟糕。即使到现在,它还不能帮我安排日程,不能替我起草邮件,不能使用基本的工具。我们需要为所有东西建立评估——模型做不了的所有事情,我们都需要工具使用的评估、长程推理的评估。
想象一下十年后,我们希望模型能够出去花 30 天时间从零创建一家创业公司。我们需要为此建立评估,才能有效地奖励它。我认为,只要经济中还存在人类能做而模型做不了的事情,这条改进模型的道路就会持续下去,而这将成为未来工作的很大一部分。所以我们的使命就是创造未来的工作方式,我认为这是一个非常令人兴奋的行业,让我们得以窥见一切的发展方向。
“我们被放在地球上是为了创造训练数据”
Lenny Rachitsky: 有一条你转发的推文我想问你。上面写着:“如果你真的想一想,我们被放在地球上是为了给实验室创造强化学习训练数据。”
Brendan Foody: 对。
Lenny Rachitsky: 这句话对你来说意味着什么?这个人想暗示什么?基本上就是你说的——我们只是在帮忙训练模型。
Brendan Foody: 这呼应了我和许多顶尖实验室的研究人员和高管进行的对话。他们普遍认为,整个经济很有可能会变成一台强化学习(RL)环境机器,构建出所有这些世界和情境,让我们能够拥有评分标准或其他形式的验证器。这在很多方面确实令人兴奋。
因为我们可以类比其他革命——工业革命时期,所有人都在恐慌失去工作,但随之出现了一整套全新的职业类别:我们如何制造机器?如何从事知识工作?如何创造一切新生事物?而过去三年里,AI 领域的叙事几乎完全围绕岗位替代展开。当然,ChatGPT 增长很快,大家都爱用,这很酷,但从经济角度来看,人们谈论的大多是岗位替代。但很少有公司和人在谈论正在被创造的这个全新职业类别,以及这将意味着什么、人们如何为此做准备和提升技能。我认为最令人兴奋的事情,就是创造那种未来——人类如何融入经济、这将如何随时间演进。
未来哪些技能仍然值得投资
Lenny Rachitsky: 我和很多人讨论过到底该学什么、该在哪些方面提升自己。现在还在上学的人就在想“未来到底什么才是有价值的?”你正处于一个核心位置,了解哪些岗位需求最大、招聘趋势如何变化。那我就问一个非常具体的问题:你认为未来哪些工作会继续存在?对年轻人来说,哪些技能仍然值得投资?
Brendan Foody: 关于岗位,我的回答是:那些需求弹性非常大的领域将会非常令人兴奋。因为当我们将人们的生产力提升十倍时,我们会建造十倍甚至一百倍的软件。所以我认为那些现在能做更多事情的产品经理将处于极其有利的位置。至于技能方面,我认为是那些能够利用 AI 来完成日常工作流程的人。
我和几位老师有过交流,他们问我对学生评估方式有什么建议,因为我们最初就是为人们策划所有这些 AI 面试和评估起家的,对这个问题思考得非常深入。我们的认识是,你不应该对抗他们使用模型。就像计算器刚出来的时候,你不应该给学生布置大量算术题,然后想方设法让他们不用计算器。你应该告诉他们“去用这些工具,让我们看看你能做到什么。”
所以我们会给人们安排面试,告诉他们:“用 ChatGPT 和 Codex,用 Claude Code,用 Cursor,用任何可用的工具来建一个网站,让我们看看你一小时内能做出什么产品。”我举这个人才评估的例子,是因为我认为它同样适用于人们应该重点培养的技能,即如何利用这些技术在任何行业或领域中做到更多。
需求弹性的含义
Lenny Rachitsky: 你提到弹性,“有弹性”是什么意思?是指通才——擅长很多不同事情的人,还是你怎么看?你说的弹性到底指什么?
Brendan Foody: 我更多是指该行业的需求容量有多大。举几个例子。在会计领域,现实地说,世界上对会计的需求是有限的。也许在某些领域可以做得更多、做得更好,但感觉世界不需要一百倍以上的会计。
相比之下,在软件开发领域,我认为我们可以为一百倍的功能发布产品,以一百倍的速度推进,建造多得多的东西。这个行业的需求似乎是无限的。Marc Andreessen 最近在推文中也谈到过这一点,说软件是所有行业中需求弹性最大的——当我们提高生产力时,能构建的东西会多得多。这个特征在很多其他领域也同样存在。所以我会建议关注那些领域——当每个人都变得十倍更高效时,需求会增加而不是减少的领域。
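上面关于需求弹性的推理可以用一个简单算式来说明。这里假设需求服从恒定弹性形式 Q = A * P^(-e)(这是为了说明而引入的简化假设,原对话中并未给出公式),于是总支出 P * Q 正比于 P^(1 - e)。`spend_multiplier` 以及下面的弹性取值都是假设的示例,不代表任何行业的实测数据。

```python
def spend_multiplier(cost_drop: float, elasticity: float) -> float:
    """单位成本下降 cost_drop 倍后,总支出变为原来的多少倍。

    假设恒定弹性需求 Q = A * P**(-elasticity),
    则总支出 P * Q 正比于 P**(1 - elasticity),
    价格变为 P / cost_drop 时,支出倍数为 cost_drop**(elasticity - 1)。
    """
    return cost_drop ** (elasticity - 1.0)


# 假设的弹性取值,仅用于对比两种情形
for label, elasticity in [("需求弹性大(如软件)", 2.0), ("需求弹性小(如会计)", 0.5)]:
    print(label, round(spend_multiplier(10.0, elasticity), 2))
```

在这个简化模型里,弹性大于 1 时,生产力提升十倍会让该领域的总支出增加;弹性小于 1 时,总支出反而减少。这对应对话中“效率提升后需求会增加而不是减少的领域”这一判断标准。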
弹性需求行业中的具体岗位
Lenny Rachitsky: 好的。所以你是“学编程仍然有用”这个阵营的,计算机科学仍然值得学。那么在弹性需求的工作类别中,看起来工程、产品管理都属于这一类。很好。很多听众是 PM。还有别的吗?比如设计?我不知道。根据你的观察,还有哪些属于这个类别?
Brendan Foody: 我觉得很多东西都属于这个范畴。创建公司的整个价值链中存在大量可变成本,甚至包括很大一部分运营或咨询工作。想象一下,如果我们能拥有十倍的麦肯锡顾问,在研究、分析等方面能做到什么程度。但我认为,最终成功的公司和个人,是那些拥抱“丰裕”叙事的人:思考如何做得更多,而不是与之对抗、试图阻止岗位替代。
AI 工具就是超能力
Lenny Rachitsky: 顺着这个思路,我想到你说的第二类——最成功的人。这不是某个具体技能,而是善于使用 AI,用 AI 来提升自己已经在做的事情。这让我想到 Elon 关于 Neuralink 的说法,我不确定他是不是这么说的,但我一直听到的版本是,他之所以想造 Neuralink,是因为当未来 AGI 和超级智能出现时,我们需要一种方式来竞争,而最好的竞争方式就是把我们的大脑接入超级智能,这样才有一线机会。感觉 AI 工具就是这样——精通 AI 工具本质上就是拥有一种超级能力。
Brendan Foody: 找到如何利用它们并将其融入工作流程,绝对至关重要。
Lenny Rachitsky: 这又回到了那句现在几乎已成陈词滥调的话:“AI 不会取代你,真正精通 AI 的人才会取代你。”
Brendan Foody: 我认为这话说得完全正确。我在企业层面也确实看到了这一点。我们接触的一些企业几乎心存恐惧,不愿意参与,不愿意对自身业务进行评估,因为那将提供证据证明他们的价值链正在被自动化。而另一些企业则积极拥抱——如果我们有能力做十倍甚至一百倍的事情,那意味着什么?我们如何拥抱这个未来?因为未来十年会有太多事情发生变化,我认为后一种企业才是会成功的。
劳动力市场的未来
Lenny Rachitsky: 让我们聊聊更广泛的劳动力市场。你们的情况很有意思:你们最初并不是为 AI 实验室输送人才,也不是训练模型,而是帮人找工作、帮公司招聘,然后你们发现“哇,这里有一个巨大的机会”。你对劳动力市场和招聘的未来有着非常有趣的视角。谈谈这个。
Brendan Foody: 确实很有意思。我记得我们创办公司的时候,就像我提到的,我们才 19 岁,就有一种直觉——劳动力市场如此分散,效率低得令人难以置信。我说的分散是指,当我们从海外招聘时,候选人会投递十几个职位。而当我们作为湾区公司在考虑候选人时,我们只会考虑市场上可用候选人中极小的一部分。原因在于存在一个匹配问题,每个人都在手动解决——手动审阅简历,手动进行面试,手动决定录用谁。但当我们能够以软件的成本自动化这个匹配问题时,就为全球统一的劳动力市场铺平了道路——每个候选人都在其中申请,每家公司都从中招聘,实现经济体中信息的完美流通。
我认为这个未来毫无疑问是我们正在迈向的方向。但随着时间的推移我们意识到,工作的本质也在发生剧烈变化。在十年时间维度上构建那个未来,一部分工作就是创造那个未来的工作形态,以及我们在评估(evals)和 RL 环境方面为客户构建的这些令人惊叹的数据集等更具体的事情。
Lenny Rachitsky: 我正在和一个合作伙伴 Gnome 研究招聘方式的变化,我发现现在申请公司变得如此容易,每个人都在向几百家公司投递申请。AI 让调整简历和求职信变得很简单,看起来像是“我非常专门地申请了你们公司”,但实际上这只是一百份申请之一。而另一方面,招聘经理被海量申请淹没,所以现在他们也需要 AI 来筛选。所以即使我们不想走到这一步,也几乎被推向了这个方向:两端都是海量规模,我们需要非常智能的工具来筛选、帮助我们招聘和选拔,而这正是你们长期以来一直在做的事情。
Brendan Foody: 正是如此。一个很多人会问的有趣问题是:我们把自己看作一个劳动力市场,还是一个数据公司?我觉得这个问题之所以有趣,是因为我们从 AI 实验室的需求中认识到:他们实际上需要一个劳动力市场。他们确实需要这些极其高素质的人才。当然,我们会在上面叠加一些项目管理和软件平台。但他们真正核心的需求是:如何找到这些跨越各个不同领域的杰出专业人士,来衡量模型能力,并共同构建那个未来?
模型如何学习专业知识
Lenny Rachitsky: 回到这一切是怎么运作的、你们为模型做了什么这个问题。我有个朋友脚踝扭伤了,脚疼,去拍了张 X 光片,然后把 X 光片喂给 ChatGPT,让它“给我看看这张 X 光片”。它说“好的,没问题”,然后告诉他“你这是这个情况”。他跟我聊起这事,说“互联网上到底有什么东西训练了这模型,让它能知道这些?”我说:“不是的,实际上是有人坐在那里帮助模型理解这些东西的。一旦他们发现模型并没有完全理解,人类就会帮助模型学习这些知识。”
Brendan Foody: 没错。它的运作方式是这样的——至少根据大多数人的理解,模型的运作机制其实很复杂——预训练把大量知识注入模型,让它认识世界上各种各样的事物。然后后训练和强化学习则是负责所有的推理:哪些知识片段是准确的,哪些是不准确的,以及在做出决策时应该优先考虑什么。所以在这背后,会有放射科医生参与后训练数据集的构建,创建一个基准点:这里的诊断是什么,以及与之相关的奖励和惩罚。最终 ChatGPT 做出的决策和推荐的质量,实际上取决于参与其中的人员的质量。
Lenny Rachitsky: 让我们继续深入聊聊这个话题吧,因为确实非常有趣,而且我不确定有多少人真正理解这一点。我是理解的。所以你们和这些专家做的工作是后训练。不是往模型里喂训练数据,而是“我们有了这个模型 GPT-5,现在它缺少这些东西,让我们来补充。”
Brendan Foody: 完全正确。它的本质是解锁、让模型能够关注所有正确的 token,从预训练中提取模型上下文中所有正确的内容,上调有效的推理链条,使模型能够以更具泛化性的方式进行更好的推理。
后训练的规模与竞争格局
Lenny Rachitsky: 从事这项工作的人规模有多大?几千人、几万人、还是几十万人?
Brendan Foody: 在任何给定时间点是几万人,总体来说是几十万人。规模很大。而最令人兴奋的是它增长得非常快。回到你之前关于竞争格局的问题,过去有很多众包公司,招募大量低技能人员来完成任务。Scale 和 Surge 是开创这个行业的两家主要公司。而在这波向高技能劳动力转型的过程中,人们发现实际上仅仅通过获取更高质量的人才——即使最初数量较少——也能走得更远,然后一旦他们达到了质量门槛,再迅速扩大规模。
我觉得在我们取得成功并且去年初开始收入快速增长之后,有很多公司追随了这条路线,这也合理。他们看到市场在快速变化,我们正在起飞,于是试图在市场上追求类似的论点。
Lenny Rachitsky: 很有意思。其实以前就有一些公司,像 AlphaSights 和 GLG,在 AI 出现之前就在做这件事——付费联系专家向他们咨询问题。而现在的情况本质上是,这些东西对模型来说非常有用,而且我们不需要中间人了。
Brendan Foody: 没错。不过一个核心区别是,AlphaSights 通常是一次性的通话,而我们的很多工作是真正地聘请人员参与项目,让他们在较长时间内专注于某个任务。所以这也是一些传统专家网络难以进入这个领域的原因之一。另外还有如何留住这些人、如何考虑所有的激励机制——它在某些方面反而更类似于 Uber 或 DoorDash 等传统劳动力市场,只不过面向的是技能水平高得多、并且受到极好对待的人才。
专家的类型与创意能力
Lenny Rachitsky: 这对我来说真是一个了解这些东西的绝佳机会,所以我要多问一些问题。
Brendan Foody: 好的。
Lenny Rachitsky: 我觉得这太有意思了。这些专家中,有多少是专注于具体的硬性知识,又有多少是关于个性和软技能的?有多少是“这是怎么做考试的”、“这是怎么看 X 光片”这类内容?
Brendan Foody: 这取决于不同的实验室。两者兼有。以前可能更多是软技能,但现在很多实验室关注的是他们的商业模式——哪些能力具有经济价值、能带来收入——并大力投入这些专业领域。但创意方面对所有实验室来说也仍然非常重要。所以我们看到这两方面都有相当大的需求。几个月前我们招募了哈佛 Lampoon 的全部成员,那是他们的喜剧社团,来帮助让模型变得更幽默。我们还会做各种类似的事情,比如聘请艾美奖获奖编剧,涵盖你能想到的各种创意能力方向。
Lenny Rachitsky: 太不可思议了。多酷的故事。我很期待这些投入能产生效果。这种事情的反馈周期有多快?假设你聘请了这样一个团队,我们多快能看到潜在影响?几个月?还是几年?
Brendan Foody: 这要看情况,因为有些实验室会采用迭代发布的方式,在幕后不断改进模型。
Lenny Rachitsky: 不发布新模型的这种?
Brendan Foody: 没错,每隔几周就改进一次。而其他实验室则会做大版本发布。所以差异很大。我们服务于所有实验室,但我们的速度非常快。客户会提出需求——“我们需要这些获奖编剧”,我们 24 小时内就能把专家匹配到位。还有一个非常有趣的现象:在我们聘请的 100 人团队中,往往排名前 10% 的人会贡献大部分的模型改进。这就像一家公司——如果你有一家 100 人的公司,通常前 10% 的员工会创造大部分的价值。这意味着,当我们在识别谁是那前 10% 的人才方面建立起专有优势时——既包括如何将他们留在我们的平台上,也包括如何有效地识别和匹配他们——这就为客户创造了如此巨大的价值,使得竞争对手很难与我们抗衡。
所以这确实可以追溯到公司的创立论点,即如何找到并识别这些杰出的人才,从而为客户稳定地提供这些前 10%、甚至 10 倍的体验。
专家的工作模式
Lenny Rachitsky: 那关于这一点,具体是什么情况?你聘请了 Jane,她编程能力很强,然后她现在为 Anthropic 全职做这份工作?还是兼职?还是主要以项目制为主?
Brendan Foody: 有时是兼职,有时是全职。我觉得大多数情况下是兼职——比如某人可能在一家科技公司工作,但工作不饱和,也许是那种节奏比较慢的公司,每周多出 20 个小时,然后他们可以利用空闲时间来做这个,各行各业都有类似的情况。但我们也有大量每周 40 小时的全职岗位。
Lenny Rachitsky: 他们的报酬如何?对于一个 AI 工程师来说,值得花时间在这上面吗?
Brendan Foody: 非常值得。我们平台的时薪中位数是 95 美元,但根据专业深度的不同,可以远高于此,最高可达每小时 500 美元。有一点可以突出我们与众包公司的差异——如果你看众包公司的经济模型,他们通常平均只支付每小时 30 美元。想想你能用 30 美元雇到什么样的人——本科生,对比高盛银行家、麦肯锡分析师、FAANG 软件工程师。归根结底,实验室希望他们的模型具备什么样的能力?显然更偏向后者而非前者。
Anthropic 为何在编程上领先
Lenny Rachitsky: 我知道这方面你能说的有限,但 Anthropic 的 Claude 在编程方面一直表现很好,历史上比其他模型强很多。我也用它来写作、获取写作反馈。是什么让他们在这方面如此出色并持续保持领先?
Brendan Foody: 关于客户的工作我不能透露太多细节,但我认为这得益于强化学习的趋势,以及在各方向上非常审慎地定义正确的奖励。如何减少 reward hacking,如何设置正确的奖励,这影响非常大。
Lenny Rachitsky: 评估。再一次,评估就是全部。
Brendan Foody: 回到评估。
Lenny Rachitsky: 对。
Brendan Foody: 我最喜欢的一位客户说过一句话:“模型的上限取决于评估的质量”,这一点始终成立。
Lenny Rachitsky: 我记得 Greg Brockman 曾经发过一条推文:“评估就是全部。”
Brendan Foody: 确实如此。
Mercor 的增长秘诀
Lenny Rachitsky: 我们再多聊聊 Mercor。也许是——不,不用也许——数据显示它是历史上增长最快的公司。
Brendan Foody: 是的。
Lenny Rachitsky: 我想了解你是怎么做到的。你觉得在创建 Mercor 的过程中,哪些核心原则最促成了这样的成功?
Brendan Foody: 我认为最重要的是关注快速变化市场中的领先指标。我记得以前我常常想……投资圈每个人都在谈论“为什么是现在”,我过去是从产品角度思考“为什么是现在”:现在我们可以自动化简历筛选、自动化面试等等。但归根结底,这是一个传统市场,有各种在位企业(incumbents),相对停滞。真正重要的是找到那些新市场、那些正在快速变化的新需求领域(世界上最有钱的客户愿意不惜代价来提升模型能力),然后专注于这些市场的领先指标,确保我们为市场上的旗舰客户提供最好的解决方案,并围绕这一点优化一切。
我觉得这一点最关键——市场中的领先指标。如果还要选第二个,那就是对客户的极致关注。过去……我们最近开始有几位产品经理协助市场推广,但在公司过去一年半的时间里,我们没有销售和市场营销人员。所以从销售和市场营销的角度来看,我们很不成熟,因为我们把公司 100% 的资源都集中在如何为客户打造出色的产品和体验上。完全靠口碑传播——在其他公司与我们合作过的人想继续合作,我们就持续创造这些出色的体验。这就是我花所有时间的地方。我认为有些创始人容易在还没找到真正能驱动客户热爱、创造六星级体验的核心要素之前,就过早地投入精力去做营销。
发现机遇的时刻
Lenny Rachitsky: 我想回到你说的第一点:你发现了这个机会,也许是历史上最大的商业机会。你是怎么发现的……那个“等等,这可能非常巨大”的时刻是什么?
Brendan Foody: 这里有几个疯狂的故事。我记得我们公司正如我提到的成立于 2023 年 1 月。然后到了 2023 年 8 月,那时我还在上大学,我们的一位客户把我们介绍给了 xAI 的联合创始人,通过一次 Zoom 通话,说我们有非常聪明的印度软件工程师,擅长数学和编程。所以我们和他们见了面,向他们解释我们的软件工程师为什么擅长数学和编程——因为他们不用分心学人文学科,不用学历史、英语这些东西。他们非常喜欢这个说法。两天后就把我们叫到了 Tesla 办公室,我们见到了 xAI 整个联合创始团队,除了 Elon,那时我还是个大学生。xAI 当时才刚刚起步,他们对我们专注专家质量这一点非常兴奋。
所以当时他们还在做预训练,还没准备好使用人类数据,我们那时没有开始合作。但从那个时刻起,甚至在退学之前,我们就知道市场即将发生剧变,我们需要站在前沿。然后快进几个月,一家众包公司找到我们,实际上使用了我们的平台招聘了超过 1000 人。这是一段非常有趣的经历——因为我们开始收到大量支持工单,投诉那些人没有拿到报酬。我们当然感到非常糟糕,因为是我们把这些机会介绍给他们的,而那是一家有信誉的公司。我们意识到,很多现有 incumbents 在改善市场平台上人才的体验以帮助提升模型方面,已经在吃老本了。这里存在一个机会——直接与实验室合作,维护平台上专家的尊严,给他们极其优厚的报酬,去掉中间商。
所以我们去年五月开始这样做,然后一路走到今天。
Lenny Rachitsky: 哇,好的。数亿美元的收入就这么来了。所以我听到的是,你非常愿意去寻找需求的 pull,你看到了一些 pull,去探索了它。一旦你发现其中有真正有价值的东西,你就全力深入,把这个体验做到极致。
Brendan Foody: 没错。我觉得如果要把这提炼成给创业者的建议,有一点是我意识到的——我花了大量时间去强求产品市场契合。在某些方面你确实应该坚持,你应该对世界将如何变化有自己的论点和信念。但有时候你也需要从市场中捕捉信号,知道需求在哪里,知道该把精力投向何处。因为如果销售很困难,如果每多卖一个客户都极其艰难,你是无法建立起一个巨大业务的。你真正需要找到的是那些出人意料地容易成交的客户,你可以和他们一起成长。你知道那是一个巨大的痛点。所以这某种程度上是对自己的论点——关于世界将如何变化——保持固执,但同时对论点具体以什么形式呈现、市场如何发展、你的公司如何融入其中,保持非常开放的心态,两者结合。
Lenny Rachitsky: 这个洞察太棒了。在你描述的那些时刻里,感觉像是两种体验的叠加:xAI 那次会议让你觉得“哇,他们真的非常非常需要我们有的这个东西,我们现在做得太好了”,然后突然有人在平台上招了一千人。就是这两个时刻让你觉得“哇”,对吗?
Brendan Foody: 没错。而且请注意,这些事情发生的时候我们还只是一家种子轮阶段的公司。第一个时刻甚至在我们还没融任何种子轮资金之前,我们完全靠自力更生——我们把公司从零做到一百万美元的收入运转率,一直保持极高的资本效率。我们从没烧过钱,我们在生命周期内一直盈利。然后我们在九月从 General Catalyst 那里融了种子轮,正是在那之后我们才真正意识到这个市场有着海量的需求——我们看到了交易量,也看到那些 incumbents 在市场变化和变革所需的人才类型方面还在沉睡。
成功背后的三大价值观
Lenny Rachitsky: 看到机会并开始执行是一回事,真正在这个规模上取得成功并持续获胜又是另一回事。你们在公司内部有非常明确的价值观。能聊聊这些吗?感觉这也是你们成功的重要原因。
Brendan Foody: 确实是。我来讲三个价值观,每个配一个简短的故事。
第一个是“能做到”的态度,每个人都因为这个说法调侃我,因为听起来很老套,但我们一直在设定那些近乎荒谬的宏大目标,然后公司的轨迹不知怎么就围绕着这些目标成形了。我记得我们在和 Benchmark 谈他们领投 A 轮的时候,我们的运转率是一千五百万美元。我说我们在年底前会达到五千万美元运转率。他们说我们简直是疯了,任何正常人都会这么说。但误差不超过两周,我们真的做到了。然后我们现在已经远远超过了五亿美元运转率的目标,那本来是我们今年的目标。所以在业务收入规模、人才体验质量等所有维度上设定极其宏大的目标,首先要有一种“能做到”的态度,这非常重要。
第二个是极高的标准,体现在我们招什么样的人以及对他们的期望。我们的招聘门槛极高,我们招了大量的前创始人、拥有非凡经历的人。我们最近刚聘请了 Sundeep Jain 作为我们的总裁。他之前是 Uber 的首席产品官和首席技术官,加入了我们这家相对来说规模还较小的公司,帮助我们扩展各项流程——毕竟 Uber 是全球最大的劳动力交易平台。所以极高的标准至关重要。
第三个我们非常倚重的是强度。如果你去看那些传奇公司的早期文化,想想 Meta 或 Google,它们都有这种极其高强度的早期文化——人们竭尽全力、不惜一切代价去推动模型能力的前沿。这仍然是产出导向的——关注人们取得了什么成果,而不是投入导向的——关注具体工作了多少小时。但我们认识到,打造一家传奇公司需要付出很多,而这正是我们最终在追求的目标。
Lenny Rachitsky: 我能理解为什么这会奏效。“能做到”的态度加上高标准再加上强度,我能看出这如何导向成功。现在关于 6-9-9 文化的讨论很多,每周工作六天,早九点到晚九点。很多人会说:“为什么?太糟糕了。为什么要让人这样?”但与此同时,我从最成功的 AI 公司那里不断听到这个说法。这就是成功的方式。节奏太快了,这是一个你再也遇不到的机会。聊聊你对此的看法。
Brendan Foody: 嗯,先澄清一下,我们从未强制规定工作时长。这更像是那些非常在乎的人自然而然的结果,我们在乎公司的发展轨迹。所以很多人会来办公室待到很晚。但如果他们需要早点离开去和孩子吃晚饭,或者周末去旅行,当然完全没问题。对我们来说,更重要的是找到那些有强烈主人翁意识、真正投入的人,而不是具体的在岗时间——尽管我们发现,往往最投入的人——不总是,但往往是——最投入的人就是那些和我们一起加班到深夜的人。
Lenny Rachitsky: 你说高标准的时候,有没有什么具体的例子可以分享,让我们理解你指的是什么?因为很多人觉得自己有高标准,但其实并没有。
Brendan Foody: 如果你有足够的耐心——招聘中速度和质量之间总是存在权衡。我记得尤其是我们最初的十个人,我们非常耐心、非常有纪律地去寻找世界上最优秀的人。我们的第二名员工 Sid 就是一个例子,我们在美国的第二名员工,Sid 之前是 Scale 的增长负责人,在我们还只是种子轮阶段公司的时候就加入了我们。加入我们的 Daniel 之前把消费者应用做到了超过十万用户,我们前十名员工各种各样的非凡背景。我认为最初的人才密度在很大程度上塑造了后来组织扩大后的整体面貌。
招人慢与招人快的权衡
Lenny Rachitsky: 我知道你还有一个观点——人们常说招人要慢、要等,但其实这未必是正确的建议。聊聊这个。
Brendan Foody: 这很痛苦,因为这是一把双刃剑。一方面,我很高兴我们的前十个人如此出色,我认为这对业务产生了巨大的回报。但另一方面,我觉得公司发展到某个阶段你就是需要快速招人。有些事情需要大量的人来做,你需要认识到招聘中会有一些参差,但速度快才是优先事项。
我觉得在某些方面,我们在扩展团队上确实走得太慢了。好处是每个人都非常出色,我们有这个超高的标准,并且希望长期保持。但我觉得代价是,虽然公司增长已经极其迅速,如果我们当初在从十人扩展到一百人这个阶段动作再快一些,很可能增长得更快。
Lenny Rachitsky: 好的,我正想问这个问题。所以听起来前十个人要非常谨慎,慢慢来,十到一百人的阶段可以适当加速?
Brendan Foody: 对,虽然我不一定说是刚好十个人这个数字。它取决于你什么时候确认这件事真正跑通了。我知道这条线仍然不够清晰,但基本上,一旦你发现需求远远超出你能承接的范围,那就是你该踩油门、在很多方面优先追求速度的时候了。不过我觉得在那之前,尤其重要的是保持耐心和纪律性。招到最优秀的人始终很重要,但速度的重要性在你找到市场机会、发现市场真空之后才会显著提升。
CEO 角色中的时间分配
Lenny Rachitsky: 我知道你过去创办过几家公司,规模都比较小。在现在这家高速增长的大公司担任 CEO,这个角色中什么最让你感到意外——不管是时间分配上,还是工作内容上?因为很多人梦想创业、梦想处于你现在的位置。他们可能没有意识到你的大部分时间花在了哪里?
Brendan Foody: 说起来其实不算太意外。前两大块时间始终是招聘,以及和客户在一起——深入理解客户需要什么、我们怎样更好地服务他们。然后就是如何搭建团队,以及围绕这件事建立的各种流程。当然,也有很多意料之外的琐碎事务,比如处理人事方面的问题——怎么设定职级体系、薪酬区间等等,这些都是随着企业规模扩大逐步学到的。但我觉得我花时间的核心领域和我预期的基本一致,而且恰好也是我喜欢做的事情,这非常幸运。
创业经历:甜甜圈王朝
Lenny Rachitsky: 那你之前创办的两家公司,可以分享一下是做什么的吗?它们本身就挺有趣的。另外,它们是怎么帮你走到今天的?它们教会了你什么,对你现在的角色有什么帮助?
Brendan Foody: 其实大概有十来个项目,但我挑我最喜欢的两个说吧。初中的时候我创办了 Donut Dynasty(甜甜圈王朝)。当时我发现 Safeway 的甜甜圈一打只卖 5 美元,我惊呆了,觉得对一个初中生来说这简直是不可思议的好买卖。于是我开始骑车去 Safeway,5 美元一打买进甜甜圈,然后回到中学按每个 2 美元卖出,利润率当然非常可观。很快就卖光了,于是我需要扩大规模。我会付给我妈 20 美元,让她开着她的面包车带我去 Safeway,买上十打甜甜圈,运到学校全部卖光。
然后学校想取缔我,因为我在校园里卖食物,他们不喜欢。于是校长把我叫到办公室,让我别再干了。然后我把甜甜圈摊往校外挪了 50 英尺,说这下你们管不着我了。我记得后来出现了竞争对手,他们进的是 Chuck’s Donuts 的货——湾区的人可能知道,这比 Safeway 的甜甜圈更高端,但成本也更高,每个要 1 美元。于是我把价格降到 1 美元,持续了两周,把他们挤出市场——那时候我还不懂什么是反竞争行为。我还雇佣了我所有的朋友,用甜甜圈来付工资,因为他们觉得甜甜圈值 2 美元一个,可以在学校里卖出去,而对我来说这样人工成本反而更低。
这些卖甜甜圈的有趣经历让我学到很多。我高中的时候还有一个规模更大的生意,不过从甜甜圈生意中最大的收获就是——你真的可以直接去做。很多人有想法,但阻碍更多公司被创建的,我认为就是行动力——去迈出那几步,做出客户想要的产品或体验,投入时间和野心去把它做大。正是通过一次次的实战练习,我才意识到以后可以用更大的规模去做同样的事。
Lenny Rachitsky: 太精彩的故事了。我很喜欢这个故事的健康程度——卖的是甜甜圈,不是毒品。
Brendan Foody: 当时我妈还挺担心的。她说:“这些甜甜圈里是不是掺了什么?”我说:“没有,妈,我向你保证,这些是纯正的甜甜圈。”
Lenny Rachitsky: 你付 20 美元让你妈开车送你这件事太妙了。
Brendan Foody: 对,她坚持说这不能白帮忙,毕竟她花时间开车送我,所以得从中赚点钱。我们还为她的头衔讨价还价了一番,最后她想做“全球运营负责人”,我们觉得非常好笑。
Lenny Rachitsky: 希望这个头衔在她 LinkedIn 上。
Brendan Foody: 还没有。也许她得加上。
关于 Mercor 的命名
Lenny Rachitsky: 你说你创办了十几家公司?
Brendan Foody: 对。
Lenny Rachitsky: 哇,好吧。
Brendan Foody: 准确说是十来个项目,但我觉得真正做大的就是甜甜圈那个和我后来做的 AWS 公司这两个。
Lenny Rachitsky: Mercor 这个名字背后的故事是什么?
Brendan Foody: Mercor 在拉丁语里是“市场”的意思,或者说“买卖、交易”。我们想打造世界上最大的市场,一个让所有人找到工作的市场,这就是这个名字的吸引力所在。
模型进步的下一步是什么
Lenny Rachitsky: 好,也许最后一个问题。这要回到我们之前讨论的内容,因为聊天的过程中我一直在想这件事。之前模型以数据为燃料,现在变成了以专家为燃料。你觉得还有下一步吗?还是说这条路就能带我们走到 AGI、超级智能?
Brendan Foody: 我不认为这是从数据到专家的切换。更准确地说,这个范式的核心在于人们意识到实验室需要与专家进行密切合作,来帮助他们理解正在构建的 evals 是什么,以及如何推动前沿。但我认为 evals 是长青的:只要我们想改进模型,就需要专家为模型创建 evals,创建供模型学习这些能力的训练后数据。当然,人们做训练的具体方式可能会有变化,比如用 RL 或其他方法,但他们始终需要一个 eval 来衡量“在每个想要建设的领域里,成功是什么样子”。
Scaling Laws 与模型智能的进步
Lenny Rachitsky: 好的。那顺着这个思路,最近经常被问到的一个问题是(我知道我们之前聊了些轻松的话题,现在回到严肃的话题)scaling laws 和模型智能的进展。很多人觉得“不知道,感觉在放缓。照这个速度,我们可能到不了超级智能。”你的判断是什么?
Brendan Foody: 我完全同意这个判断。我知道一些大型实验室的高管说我们三年内就能实现超级智能,但事实是这条路更长。这并不是在贬低这些模型有多么出色。我认为在接下来的十年内,我们确实能够自动化大部分知识工作,但那条漫长的道路是由所有帮助实现这些能力的 evals 铺就的。推动这些能力进步的不会是十倍规模的预训练数据,而更多是那些经过深思熟虑、数据效率高得多的训练后数据集,它们才是帮助我们抵达目的地的关键。
Lenny Rachitsky: David Sacks 发了一条很有意思的推文,说我们现在的处境几乎是最好的情况:AI 并没有在快速走向超级智能,有很多竞争者在互相制衡。模型已经非常有价值,而且只会越来越有价值,但并不是那种一个赢家超级智能接管世界的局面。
Brendan Foody: 是的,我觉得这话说得对。我认为很多关于超级智能的恐惧煽动可能被夸大了,但与此同时,很多人的论证框架是,即使发生这种 P-Doom 的概率只有 5% 到 10%,我们也应该保持警惕,这听起来也合乎逻辑。不过我认为,随着这项技术能够创造富足——为每个人提供更好的医疗服务、最优质的法律建议,以及前所未有的构建优秀产品的能力——接下来的十年对整个硅谷乃至全世界来说都将是非凡的。
Lenny Rachitsky: 教育感觉也在发生变革。
Brendan Foody: 绝对的。我甚至在过去的十年里就已经感受到了这一点。我记得……我父母以前总因为我不去上大学课程而数落我,我会说,YouTube 上有更好的讲座,为什么不去听那个呢?而现在我只能想象,当模型变得极其擅长传递信息,比最优秀的教授还要出色的时候,那意味着什么——以及让所有人都能获取各种信息,推动人类进步,帮助每个人提升技能。
AI 角落:个人使用 AI 的方式
Lenny Rachitsky: 那我就借此过渡到最后一个问题。我们进入 AI 角落环节,这是播客的固定板块。你自己个人有什么使用 AI 来提高工作效率或辅助生活的方式吗?
Brendan Foody: 嗯,让我想想。我经常用它来写文档,这你可以想象到。我也会跟它聊,获取关于问题的建议。我发现它作为一个思维伙伴来帮你梳理思路非常有用,因为……怎么说呢,我觉得有时候我在把事情说出来的过程中会思考得更清楚,但我不能把所有事情都拿去跟同事或身边的人讨论。
Lenny Rachitsky: 所以主要是用 ChatGPT 语音模式之类的?
Brendan Foody: 对,我很喜欢 ChatGPT 语音模式。当然还有改进空间——
Lenny Rachitsky: 我也是。
Brendan Foody: ——但我对语音技术的未来非常期待。
Lenny Rachitsky: 让我给你看个我做的玩意儿。我本来没打算聊这个,但有个人叫 Eric Antonow,很多人推荐他上这个播客。他是一个低调的创意产品人,在 Facebook 待了很长时间。他做了个叫 Pirate GPT 的项目,基本上就是把 ChatGPT 塞进一个毛绒玩具里,然后跟它对话。所以我做了一个小猫头鹰。我现在没打开。
Brendan Foody: 哇。
Lenny Rachitsky: 基本上就是在这里缝进去一个小扬声器,下面放一块小磁铁,然后可以把它放在肩膀上,直接跟它对话。
Brendan Foody: 太可爱了。哇,我好喜欢。我得弄一个。因为我公寓里有一些语音助手,但我真的很想要一个 ChatGPT 语音助手,所以我很期待——
Lenny Rachitsky: 我刚才也在想这个。就是,来吧,为什么我们不能有一个 ChatGPT 语音助手,就放在那儿随时听我们说话。手机上不行,因为手机会休眠,然后一副“喂,什么?”的样子。
Brendan Foody: 没错,对。
Lenny Rachitsky: 对,这就是它想实现的东西。他发起了一个 Kickstarter,我们会放出链接,大家可以支持一下。
Brendan Foody: 好的。
Lenny Rachitsky: 很方便的。
Brendan,在我们进入非常精彩的快问快答环节之前,还有什么想分享的,或者想对听众说的吗?
Brendan Foody: 呼应之前说的主动性的话题——你就是可以去做事——我鼓励每个人,尤其是在 AI 的加持下,构建东西变得如此容易,去主动出击,走出去构建产品,跟客户交流,迈出那一步。因为我认为在很多时候,创新的最大障碍就是缺少这一步,我们在任何能支持它的方向上都应该努力。
Lenny Rachitsky: 对。有太多人——不是要吐槽播客啊——就是听播客、看帖子,一直在读一直在听,但不对那些信息采取任何行动。而现在是有史以来最容易实际动手构建和尝试的时候了。
Brendan Foody: 完全同意。
Lenny Rachitsky: 所以一定要采纳这个建议。你就是可以去做事。你可以把你的甜甜圈摊挪五十英尺,跳出他们的管辖范围。
Brendan Foody: 哈哈,对。
快问快答
Lenny Rachitsky: 好的 Brendan,接下来是我们非常精彩的快问快答环节。我准备了五个问题。准备好了吗?
Brendan Foody: 准备好了。
Lenny Rachitsky: 有两三本你会经常推荐给别人的书吗?
Brendan Foody: 让我想想。按顺序的话,第一本是《High Output Management》,这是一本关于运营公司的非常好的书。第二本是《Zero to One》,当然是经典了。第三本是《Shoe Dog》,我觉得它是一个非常鼓舞人心的故事。
Lenny Rachitsky: 最近有什么你很喜欢的电影或电视剧吗?
Brendan Foody: 我很喜欢《奥本海默》。我最喜欢的电视剧是《Suits》,我知道不算最近的,但如果非要选一个近期的话,大概是《奥本海默》。
Lenny Rachitsky: 很酷。《Suits》,第一次有人提到这个。最近发现并很喜欢的产品呢?
Brendan Foody: 我很喜欢用 Codex,就是新版本。我知道它在版本意义上还算新的。是的,我觉得它非常棒,是一个巨大的提升。
Lenny Rachitsky: 你有没有一个人生态度或座右铭,是你会反复回到的,会跟别人分享的,在工作或生活中觉得有用的?
Brendan Foody: 我觉得就是“你就是可以去做事”,我们之前聊过的那个。迈出那一步。
Lenny Rachitsky: 我还以为你会说“can do”,就是你 Twitter 个人简介里写的那个。
Brendan Foody: “can do”也算,对。
Lenny Rachitsky: 两个都很好。最后一个问题。我们在这之前聊天的时候,你提到了一些可以聊的话题,其中你分享了一件很有意思的、你之前没有在任何地方公开说过的事——就是你有阅读障碍。要不你跟听众聊聊这个?你是如何在患有阅读障碍的情况下,创建了历史上增长最快的公司的?
Brendan Foody: 我完全不隐瞒这件事。我的很多同事都知道。一方面,它确实让每天处理上千封邮件、读完每一份该读的文件变得很困难。但另一方面,我觉得它帮助我用不太一样的方式思考,更有创造力,也许能看到其他人看不到的市场变化。所以到目前为止结果还不错。从管理的角度来看,它帮我认识到一件事:我们更应该专注于如何发挥每个人的优势,而不是去弥补弱点。因为有些事情我确实不擅长,也永远不可能是世界上最厉害的;但有些事情我可以不断打磨,努力成为最好的。
Lenny Rachitsky: 这其实也是这个播客反复出现的一个主题——专注于优势,而不是把所有精力都放在弥补弱点上。
Brendan,这次访谈太精彩了,我学到了很多。我还有无数问题想问,但你还有很多事要忙。最后两个问题。大家应该了解你们在做什么、你们在招哪些岗位?另外,听众怎样能帮到你?
Brendan Foody: 当然。我们团队正在全方位大量招聘。运营团队在招战略项目负责人,工程团队在招软件工程师,同时也在招研究人员。请访问 mercor.com,我们非常希望能与你合作,这也是你能帮到我们的最大方式。也请分享给你的朋友们。我们平台上超过一半的人来自推荐,因为我们的用户群体非常认可我们。任何你想申请的职位,或者推荐朋友来申请的,我们都非常欢迎。
Lenny Rachitsky: Brendan,非常感谢你来参加节目。
Brendan Foody: 谢谢你的邀请。
Lenny Rachitsky: 大家再见。
感谢大家的收听。如果你觉得这期节目有价值,可以在 Apple Podcasts、Spotify 或你喜欢的播客应用上订阅本节目。也请考虑给我们评分或留下评论,这真的能帮助更多听众发现这个播客。你可以在 lennyspodcast.com 找到所有往期节目或了解更多关于本节目的信息。下期再见。