Marketplace lessons from Uber, Airbnb, Bumble, and more | Ramesh Johari (Stanford professor)

Ramesh Johari 2023-11-09

Ramesh Johari: Marketplaces are a little bit like a game of whac-a-mole. One example that I came across with one of the companies I worked with that I love is our new supply side was having a pretty bad experience.

So what we decided to do is build some custom bespoke features that were really going to direct them to more experienced folks on the other side of the market. Good. And then, yeah, lo and behold, pretty soon those metrics start to look better. But then we’re looking at it, we’re like, “Wait a second. Now the existing folks on the other side are having a worse experience,” so you kind of whiplash around. You’re like, “Oh, wait a second. We better do something about that.” So we take them, we try to match them up with the more experienced folks, and now suddenly a month after that, you’re like, “Wait a second,” and your metrics just keep moving around. And that’s because the whac-a-mole game here is ultimately, a lot of marketplace management is moving attention and inventory around. Many of the changes that are most consequential create winners and losers. And rolling with those changes is about recognizing whether the winners you’ve created are more important to your business than the losers you’ve created in the process.

Meeting Ramesh Johari

Lenny: Today my guest is Ramesh Johari. Ramesh is a professor at Stanford University, where he does research on and teaches data science methods and practices with a specific focus on the design and operation of online marketplaces. He’s advised and worked with some of the biggest marketplaces in the world, including Airbnb, Uber, Stripe, Bumble, Stitch Fix, Upwork, and many others. And in our conversation, we get super nerdy on how to build a thriving marketplace, including where to focus your resources to fuel the marketplace flywheel growth, why data and data science are so central to building a successful marketplace, how to design a better review system, and why, as a founder, you shouldn’t think of yourself as a marketplace founder, but instead simply as a founder. Also, how AI is going to impact data science and marketplaces, and experimentation, and so much more. If you’re building a marketplace business, or thinking about building a marketplace, or just curious, this episode is for you. With that, I bring you Ramesh Johari after a short word from our sponsors.

Forward-thinking companies like Figma, Amplitude, Loom, Riot Games, Linear, and more use Sanity to build content growth engines that scale, drive innovation, and accelerate customer acquisition. With Sanity, your team can dream bigger and move faster. As the most powerful headless CMS on the market, you can tailor editorial workflows to match your business, reuse content seamlessly across any page or channel, and bring your ideas to market without developer friction. Sanity makes life better for your whole team. It’s fast for developers to build with, intuitive for content managers, and it integrates seamlessly with the rest of your tech stack. Get started with Sanity’s generous free plan. And as a Lenny’s Podcast listener, you can get a boosted plan with double the monthly usage. Head over to sanity.io/lenny to get started for free. That’s sanity.io/lenny.

Ramesh, thank you so much for being here. Welcome to the podcast.

Ramesh Johari: Thanks so much for having me, Lenny. It’s great to be on.

What Is a Marketplace Business

Lenny: It’s great to have you on. A big thank you to Riley Newman for connecting us. Riley was the first data scientist at Airbnb and head of data science at Airbnb, and that role is actually a really good microcosm of what we’re going to be focusing on in our chat today. We’re going to get super nerdy about marketplaces, and experimentation, and data. I know that’s your jam. Are you ready to dive in?

Friction and Transaction Costs

Ramesh Johari: I really am. Yeah. And I actually want to thank Riley too. I got to know Riley when I was at oDesk first as a research scientist, and then I directed their data science team. This is way back in 2012, and I was looking around for people who are experts on how we think about data and marketplaces, and Riley Newman came up and so I invited him to come talk to us at oDesk, and we’ve stayed in touch since then.

Those were early days of where this industry was, and I’ve had a kind of lengthy career now thinking about those kinds of problems. So I’m pretty excited to talk about it with you.

Lenny: Let’s start broad and set a little foundation. You have a really interesting way to describe what a marketplace business even is. So Ramesh, what is a marketplace business, and also why is data so important and such an integral part of building a successful marketplace business?

Role of Data Science in Marketplaces

Ramesh Johari: It’s interesting when people sit down and think about, say Airbnb, what does Airbnb sell? Average person is like, “That’s pretty obvious. Airbnb sells rooms. I go there to book a room I want to stay at,” right? Other people say, “What does Uber sell? Uber sells me rides. I’d use Uber when I need to get a ride from somewhere to somewhere else.”

And in some sense, you’re not wrong. I mean, you go there. That’s a platform to get these things. But that’s not what the platform is selling. That’s a really important distinction. There are people on the platform that are selling that to you. The hosts on Airbnb are selling you listings. The drivers on Uber are selling you rides. But Uber and Airbnb are selling you the taking away of something, which is a weird thing to think about. What they’re taking away is the friction of finding a place to stay. They’re taking away the friction of finding a driver.

In economics, we call those things transaction costs. When you take econ 1, you learn about markets and how supply meets demand, and we get prices out of that. But what you don’t learn until econ 201 is that markets don’t always work. And one of the reasons markets don’t always work is because we have what are called market failures due to the presence of these kinds of friction. So what’s a market failure? It’s that Lenny wants to get from Palo Alto to Burlingame and he can’t do it. Why can’t he do it? He doesn’t have anyone to drive him. Well, why doesn’t he just call someone to drive him? Well, who’s he supposed to call? Who are those people? Are they out there? Are they willing to drive him right now, right at 10:00 AM on a Friday? Are they willing to take him somewhere?

When I want to stay somewhere when I’m traveling, a friction is, who’s willing to give me their room? I mean in principle, there’s people who are willing to let me stay in their living room, but I don’t know who they are.

So those are frictions, and what the marketplaces are selling you is taking the friction away. That’s what you’re paying them for. And it’s an important observation, because what that means is the marketplace’s customers aren’t just the people buying the rides or buying the listings. Actually, the hosts are Airbnb’s customers, and the drivers are also Uber’s customers. So both sides of the marketplace are the customers of the platform. Both sides depend on the platform to help take that friction away. Because just like you want a place to stay or you want a ride, the driver is at Uber because he wants to earn money by taking people places. And the host is on Airbnb because they want to earn money by selling their listing.

I think this concept that we’re making money by taking transaction costs away is such a fundamental idea that’s misunderstood around marketplaces. That when you’re an entrepreneur starting a marketplace or thinking about your business model, I think you can be wildly off if you forget that that’s the thing that’s fundamentally your value proposition. And then you asked about the role of data, and more broadly data science in marketplaces.

So it’s an interesting thing, right? The example I always love to give are the ancient Agoras in Greece or Trajan’s Market in Rome. When you look at pictures of these things, what really stands out to me is the rock. I mean, these things are made of stone. It’s not like you were going to move a booth from one place to another place without moving a lot of rock from one place to another place.

So you flash forward to 2023, and here we are with technology undergirding pretty much every kind of commerce now. And it means we can architect and re-architect the marketplace kind of on the fly, and we really are doing it all the time.

And these frictions that are getting taken away, they’re getting taken away because of data and data science. So I really want to highlight three pieces of this for people, which I want you to think of as a cycle. But to start with, let’s just lay them out one at a time.

One of them is finding people to match with. So that’s the problem of, “I want to stay somewhere. Who is out there, who’s willing to let me stay with them on a given timeframe?” And then if I’m a host, I have a listing. Who is out there, who’s willing to stay at my place when I have it available? So that’s finding matches.

Then there’s making the match. And so here, going back to my time at oDesk, a big problem that we dealt with there was if I’ve got multiple applicants to my job, who should I hire? Who should I interview? It’s a common problem we face in the real world, but now it’s all remote. I don’t meet these people in person. All I’ve got is this application they submitted to me. I need help triaging that. Okay? So that’s helping make a match out of possible partners you can match with.

And then finally, we make matches. Well, what do the matches tell us, right? I mean, if you stay somewhere in Airbnb, you learn something about the host, you learn something about the listing. The host learns about you too. And that’s all information that the marketplace should feed back in. So this is where we get to rating systems and feedback systems, even passive data collection, right? Did you leave your booking before you were supposed to leave? Well, maybe that’s a sign that something didn’t quite work out the way you wanted to work out. So that’s passive data collection. Did you leave five stars? That’s active data collection.

Get all this back in, and what does that do? Well, that lets us do a better job finding potential matches and make potential matches in the future. Every single thing I just said, finding potential matches, making matches, and then learning about those matches, and then cycling back again, that is the data science in marketplaces.

And I feel like every marketplace that you could think of in any vertical has those three problems to deal with and relies on algorithms and data science to help them solve it. And in turn, that is I think really the underpinning of taking those frictions away.

Common Marketplace Failure Modes

Lenny: Many founders try to start a marketplace business, think about marketplace opportunities where they don’t exist. And there’s often these recurring failures of types of marketplaces that just don’t work in an area. I was just writing a couple ideas down while you were chatting like cleaners, getting cleaners as a marketplace doesn’t seem to work ever. Car wash, there’s a classic failure too. Getting tasks done for you on demand as a marketplace seems to not often work.

So this might be too big of a question, but I’m just curious if anything comes up of when someone is starting a marketplace or thinking about starting a marketplace business, what do you find are the most common flaws in, this is probably not going to work as a marketplace?

Every Founder Is a Marketplace Founder

Ramesh Johari: That is such a fantastic question, and I want to preface what I say with a couple of comments. So one of them is that I’ve worked with a lot of different marketplace companies, but if anything I say pertains to something more sensitive, I may not name the company over the course of the podcast.

But the other more important thing I want to say is that I’m a professor at Stanford, and there’s a reason I’m not a successful scaled entrepreneur of marketplaces, and that’s because I probably haven’t unlocked the key to exactly the question you asked. But nevertheless, I have some thoughts on it.

The most important one is this. What I’ve found talking to people who want to start what they think is a marketplace is that they think too much about a marketplace before they’re a marketplace. That in my view is the biggest failure mode.

You mentioned specific things, cleaners. I wonder about that, right? Is it something about the cleaning industry? It possibly is. I don’t claim to be an expert on the microeconomics of the cleaning industry. But often it’s not that, it’s that I thought I was building a marketplace from the beginning, and that’s not the way the world works. So I’ll give you one vignette of this that I really like, and that’s UrbanSitter.

So first, UrbanSitter is a babysitting marketplace. We can talk about their whole life story, but I think what’s most interesting is really the early days. And in the early days, what I found interesting, the way I found out about them actually is that we were stuck looking for some help. And I found out about this new platform where the clever thing was when you used to hire a babysitter, it’s like pre-Venmo days, you needed cash on hand. Because when the babysitter’s done at the end of the day, they’re usually high school students or something. They want to get paid. They’re not going to take your IOU, that you’ll send them some check in the mail the next day.

And unfortunately, you often don’t have cash. They don’t take credit card. They’re high school students. That was an incredible friction to address, which is literally just we accept credit card payments for babysitting. That’s it, right?

Now from there, what happened is they took advantage of Facebook networks between parents and babysitters to build trusted introductions. So let’s say my sitter wasn’t available. I get to know sitters in the Facebook network of that sitter. And once they overcame that first thing to get some liquidity onto their platform, they could move towards asking, how do I solve for these frictions that I talked about earlier? How do I solve for helping people find potential matches? How do I solve for people making those matches, right? You can’t do that when you don’t have liquidity on your platform. It’s silly to tell someone, “Hey, I’m really going to help you find all those drivers out there, even though I only have three drivers on my platform.” That’s not a friction you’re solving for.

So in their example, as they evolved, they actually shifted their monetization model away from billing specifically for this friction of allowing you to pay with credit cards, instead to now billing for how you were interviewing and contacting sitters. They had a two-part plan for that. One with a pay as you go menu, one with a more of a subscription option. But the key thing was either way, what you were paying for now was finding potential babysitters, not paying them with a credit card. That wasn’t the key thing anymore.

So what’s the moral there? The moral is a marketplace business never starts as a marketplace business, because what we think of as a marketplace business is something which at scale is removing the friction of the two sides finding each other. But when you start, you don’t have that scale.

So when you start, you had better be thinking, “What’s my value proposition in a world in which I don’t have that scaled liquidity on both sides?” And that’s bespoke. It means different things. And in the case of oDesk, where I started, that initial thing was that remote work is a weird thing, because basically you’ve somehow got to know that this person who you’re not next to is doing what you’re asking them to do. And so the initial value proposition of oDesk was to provide tools for workers to verify they were working the hours and doing the things that they said they were doing, screenshots and various kinds of tracking.

And then in return for that, to be able to provide guarantees on both sides. So now the workers could say, “Hey, I worked what I said so I should get paid.” And the employers could say, “I actually see that you worked what you said. And so I feel comfortable that I got what I paid for.” That was the initial value proposition, is resolving a trust issue at a remote scale.

At that point, liquidity isn’t the game. It’s asking, what’s a problem that people in this space are facing that I can deal with when I’m not a scaled marketplace? So again, with the cleaning industry, I can comment on that from personal experience, but otherwise, I think that’s the way I would think about it. It’s almost never about building a marketplace when you’re building a marketplace.

Lenny: That’s very similar to the advice I always give marketplace founders, is 90% of your problems are going to be non marketplace specific problems. They’re going to be the same problems any startup is going to have. How do I grow? It’s going to be the same things you need to do.

The oDesk Pricing Dilemma

Ramesh Johari: So one thing you said was, “That’s what you tell marketplace founders.” I mean, something I’ve actually pressed hard on in my own way of thinking about this is that maybe we shouldn’t talk about the concept of a marketplace founder. Really there’s founders. And I think every entrepreneur… I mean one way to think about it, right? It’s very hard to think about a human business endeavor that has not been disrupted by the potential for transactions to take place online.

And if that’s the case, it means literally any founder is a marketplace founder. It’ll be a choice they make after they grow as to whether they want to build a platform. I mean, to take a very hot recent example, no one in their right mind would’ve thought of OpenAI as a marketplace, but OpenAI is a marketplace now. They may not want to call themselves a marketplace, but they have plugins. The plugins are flooding that platform. People have played with it. It’s not an easy thing to find the plugin you need for what you want to do. And that really is a two-sided thing now. There’s the plugin creators and there’s the users. And they may believe it, they may not believe it, but they are a marketplace.

So I think a different way to think about it is every founder is a marketplace founder. It’s going to be a choice they want to make for themselves of whether they want to become that platform. That’s I think one. And two is because that’s the case, I think one of the other challenges I find founders struggle with is you don’t want to overcommit your future. And what I mean by that is that you’re building up trust, and you’re building up a sense of what kind of business you are in your early days. If you believe that this kind of platform future awaits you, or market platform future awaits you, there may be choices you’re making early on that are tying your hands later.

A great example of this is oDesk. When it started, because the tools it provided were for ongoing monitoring of work, it was very natural to say, “We will just take a constant fraction of the dollars that cross the platform.” That works well and good until you become mature. Some of these relationships between worker and employer last a long time, and by then most of the value comes not so much from their being able to track each other, because the trust is now there, but from the fact that they found each other and were able to build that relationship through oDesk.

That meant that the longer that goes on, the less value the platform is adding into that relationship, but you’re still pulling 10% of all the dollars. So what does that lead to? A word that most marketplace CEOs know well is disintermediation, which is where you were intermediating between the two parties, and now disintermediation means that essentially they’re like, “Hey, we don’t need you anymore.”

My favorite example is we had some stuff delivered from IKEA by a Thumbtack worker once, and my wife is like, “Oh, thanks a lot. You’re so reliable.” He’s like, “Hey, great. Here’s my business card. Ever need me again? Just call the number on the back.” And that was it. Thumbtack got their one lead gen, and then we didn’t need the platform anymore.

And I think this issue for oDesk meant that after they merged with Elance and became Upwork, they had to think a little bit about, “Okay, what’s the monetization strategy we want to use? How do we address this issue that longer term relationships may disintermediate? And does that mean you need a pricing plan that actually takes that into account?” So early commitments in this case to a particular pricing scheme, particular monetization, can really tie your hands as you then realize later you actually are a platform.
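As a rough illustration of the pricing tension described above, here is a minimal Python sketch. Only the constant 10% take rate comes from the conversation; the monthly billing figure and the function name are hypothetical, chosen to show how a flat percentage keeps accumulating over a long relationship even though the platform's main contribution (the two sides finding each other) happened once, at the start.

```python
TAKE_RATE_PERCENT = 10      # the constant 10% cut mentioned in the conversation
MONTHLY_BILLING = 2_000     # hypothetical dollars billed per month by one worker


def cumulative_platform_fees(monthly_billing, take_rate_percent, months):
    """Total fees collected from one worker-employer relationship.

    Integer arithmetic is used so the dollar totals are exact.
    """
    return monthly_billing * take_rate_percent * months // 100


# The longer the relationship lasts, the more the pair has paid in total
# for an introduction that happened only once.
for months in (1, 6, 24):
    fees = cumulative_platform_fees(MONTHLY_BILLING, TAKE_RATE_PERCENT, months)
    print(f"{months:>2} months -> ${fees} in platform fees")
```

Under these made-up numbers the fee total grows linearly with the length of the relationship, which is one way to see why long-running pairs have a growing incentive to take the work off the platform.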

Lenny: I really like this message. It makes me think about Substack actually, which started as just a platform for newsletter writers. And then they’re like, “How do we make this more valuable?” Because they take a cut of everyone’s revenue. And they’ve actually invested heavily on helping drive demand to writers, for example, me. And at this point, over 80% of my subscribers come from Substack’s network. And so they’ve built this marketplace element exactly as you’re describing, where they just found, “Here’s a pain point, writers need more subscribers. How do we help them drive subscribers?” So they figured out all these ways to create demand.

Expanding Boundaries vs Breaking Contracts

Ramesh Johari: That’s a really positive story, where they managed to actually expand the frontier of their business by enabling that network. For every one of those, there’s unfortunately a lot of negative stories. I mean, one that I think is very painful is how eBay had a lot of challenges with its seller community as it introduced more and more fine-grained sources of fees.

And I think a lot of that, I mean there’s many, many treatises at this point written on eBay, and their history, and how they got to the point that they’re at. But I think one kind of simple thing I do want people to think about there is that the sellers on eBay who had matured with the platform, who had grown with it, had come to develop certain expectations about what their lives on that platform would look like. And it’s understandable, because a lot of these businesses, they had built their livelihood on that platform. That was their entire business.

So when you now reach in and you say, “I’m going to completely change the rules of the game in which your business model operates,” from the perspective of those sellers, that’s a breaking of a social contract that’s been developed over a very long time. So I love the Substack example, because that’s like, “Hey, let me amplify our social contract.” But I think for every one of those, there’s an eBay warning sign that you can trap yourself a little bit.

Advice for Marketplace Founders

Lenny: Just to close a loop on this really, I think important point, a lot of people listening to this are probably, “I’m a marketplace founder. I’m building a marketplace,” are going to hear this and be like, “Oh shit, maybe I need to rethink how I think about what I’m doing.” What would be your piece of advice to people like that? Is it focus on the friction point and it may be a marketplace solution, it may be a managed marketplace, it may be you own the supply? Is that the advice, or what would your advice be to someone that’s like, “I’m building a marketplace”? How should they reframe their thinking?

Driving One Side With Another

Ramesh Johari: Let’s go back to kind of thinking about this concept of a marketplace of reducing friction. So the litmus test I like to give to someone who claims to me that they’re building a marketplace business or they’re a marketplace founder is do you have what I would call scaled liquidity on both sides of your platform? What does scaled liquidity mean?

What it means in lay terms… And by the way, I am a data scientist, and I love to think about these quantitatively. But fundamentally, if it doesn’t pass the smell test, then you don’t have to keep going with the data science. The smell test is scaled liquidity asks, “Do I have a lot of buyers and a lot of sellers on my platform, or do I only have one of these two, or do I have neither?” If you don’t have both, you could call yourself whatever you want to call yourself, but at this moment in time, you’re not a marketplace. If you have one, congratulations. You’ve won the game on one side of the market. And now, if you want, you have a choice point. You can lean into growth on the side that you’re doing well with. You got a ton of users, ton of buyers? Great. Lean into it, get more buyers. That’s one option. There’s no shame in not being a marketplace. Scaling a business is scaling a business. If that’s the way to do it, do it.

If you decide you want to be a marketplace, then at that moment when you’ve got a lot of buyers, but not a lot of sellers, or a lot of sellers, but not a lot of buyers, the choice you’re facing is, how do I take advantage of having that one side scaled to attract the other side? We can talk more about that, but there’s a lot of ways to kind of hack that, to think about how… So to take Uber as an example, they would walk into a new city, and one thing that Uber was commonly known for doing, back in the days when Uber Black was really the only service, was to just hand out coupons for free rides at events, parties, things like that, to take people home. And that was a way of saying, “Hey, we’re subsidizing the drivers in the city. That’s our scaled side. Now we’re going to use that subsidized driver base to attract riders.”

So that’s like, how do you get that flywheel going? And again, many people have written about how to take liquidity, scaled liquidity on one side, and use it to attract the other side.

If you don’t have either side, don’t worry about it. Don’t worry about being a marketplace. Worry about scaling one side. And in that world, it opens your visibility up completely into the advice of many, many startup advisors. People who have advice not so much about scaling a marketplace, but about scaling a startup.

And I want to say you got to let the ego go at that point. It’s fine to articulate to people that your vision of the future is to be a platform or marketplace. As I said, virtually every business is going to have that option at some point in the modern tech enabled economy anyway. So you’re not saying something people don’t already know when you tell an advisor or an investor that. But I do think you need to be humble enough at the starting point to recognize that there’s no sense in talking about a marketplace if you don’t have scaling on either side yet.
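The scaled-liquidity smell test described above can be written down as a small decision rule. This is only a sketch under stated assumptions: the `PlatformSnapshot` type, the function name, and the threshold of 1,000 participants are all hypothetical, since what counts as "a lot" of buyers or sellers is a judgment call that the conversation deliberately leaves qualitative.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "a lot" of participants; an arbitrary placeholder.
SCALE_THRESHOLD = 1_000


@dataclass
class PlatformSnapshot:
    active_buyers: int
    active_sellers: int


def liquidity_smell_test(snapshot, threshold=SCALE_THRESHOLD):
    """Classify a platform into the three cases from the conversation."""
    buyers_scaled = snapshot.active_buyers >= threshold
    sellers_scaled = snapshot.active_sellers >= threshold

    if buyers_scaled and sellers_scaled:
        return "both sides scaled: operating as a marketplace"
    if buyers_scaled or sellers_scaled:
        side = "buyers" if buyers_scaled else "sellers"
        # Choice point: keep growing this side, or use it to attract the other.
        return f"one side scaled ({side}): choice point"
    return "neither side scaled: focus on scaling one side"


print(liquidity_smell_test(PlatformSnapshot(active_buyers=50_000, active_sellers=40)))
```

The rule deliberately returns a description of the situation rather than a recommendation, because in the one-sided case the decision between leaning into the scaled side and building the other side is a business choice, not something the counts can settle.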

Lenny: And then it becomes a question of a business model, unit economics of, can I build say a DoorDash, not as a marketplace? Can I just hire a bunch of people delivering? Is this even possible in a different route?

Markets vs Firms

Ramesh Johari: Yeah, that’s a great point. One of the things I think that’s useful for people to think about here that you’re raising, at some level, it’s kind of tied up I think with that question of whether I should have employees, or contract or freelance work on one side of the marketplace.

And that’s actually a pretty old question in economics. The way we talk about it often is a distinction between a market or a firm. And one of the interesting puzzles in economics, which Ronald Coase, a famous economist, thought about, is, “Well, if markets are so efficient, why do we need firms? If markets are efficient at matching labor up with things that need to get done, why would you ever need a firm?” And that’s one of the earliest recognitions that transaction costs are a real thing. And that’s one of the things that firms are solving for.

And I love what you’re saying because what it’s recognizing is, “Hey, for your frictions, the best resolution to that might not be to have a marketplace. It might actually be to have very tightly controlled labor.” A good example of this actually, Stitch Fix, I think one of the things that’s cool about Stitch Fix is the experience that people had early on with stylists at Stitch Fix.

Lenny: I’m a happy customer, by the way. I think [inaudible 00:27:16].

Marketplace Labor: Employees to Contractors

Ramesh Johari: Yeah, I think one of the great things about that experience is it felt magical to have someone who kind of got to know you. But that depends on a relationship that doesn’t feel like a freelance relationship every single time you’re going back.

Another example that I would pull out is pretty much any healthcare platform. So for example, for physical therapy, it’d be weird if every time you went to a physical therapy platform, you just got randomly matched to whoever happened to be available then. So I think there’s some curation that needs to happen of that relationship. Does that mean full employee? Maybe not. But it does mean you have to think a little bit about exactly as you brought up, what’s the nature of curation of your labor pool?

The Power of Data in Marketplaces

Lenny: Awesome. Okay, so let’s come back to a point you made early on around the importance of data and the power of data in actually making your marketplace a lot more efficient and work more effectively. So say that you have a data scientist, or a data analyst, or someone that is helping you optimize your marketplace. Where do you often find the biggest leverage and opportunity for a data person to help you make your marketplace more effective?

Machine Learning: Prediction vs Decisions

Ramesh Johari: This is an incredible question, right? Because I think I could answer it a number of different ways. One question I think that’s kind of basic, it’s just what should this person be doing? And I’m going to actually evade that question a little bit. I’m going to give some examples of what they could do, but I feel like that’s one where context matters a lot.

So as an example, at ride-sharing or grocery delivery marketplaces, pricing means actually, what do you pay for that ride? Or what do you pay for that delivery? So that’s actually the price that’s set at the moment you actually place the order. Just to be clear, by the way, if you order from DoorDash, I don’t mean the price of the restaurant. I mean, what do you pay to DoorDash, right? What’s that fee? Is there a surcharge, because it’s surge or whatever? Okay, so that’s a thing, right?

But that’s not really a thing in a marketplace where the platform’s not setting the prices. So in Airbnb, really hosts are the ones who are in charge of setting prices for their listings.

One answer to your question is, if I’m in a place like Uber, Lyft, DoorDash, I want to have good data scientists thinking about pricing. Because that seems like something which should be heavily dependent on the instantaneous state of supply and demand in my marketplace. So that’s one type of answer is, well do I need data scientists working on pricing? Do I need data scientists working on search? Why search? Because maybe in my marketplace, finding the needle in the haystack is really the biggest, highest friction problem. So maybe I need a lot more data scientists thinking about search.

That’s what I’m going to evade. Okay? I’m going to focus more on something completely different, which is just a more philosophical point about what a data scientist does.

So in a lot of companies today, especially, a main thing that you ask data scientists to do is build what’s called a machine learning model. Now, machine learning model even already can mean a lot of things to a lot of different people. I’m going to focus on something very concrete. You’re asking them to predict something.

When I started at oDesk, this is in 2012, one of the funny things about me is I started at oDesk because I’d had an academic career up to that point of about 10 years, just building mathematical models of things. I was not really very much of a data scientist up to that point. What I expected would happen is I’d go to industry and I’d be told, “Hey, look how important data is.” And definitely my eyes were opened.

And one of the first things I was asked to think about is, well, okay, someone comes to oDesk, posts a job, workers apply to that job. Predict which of these workers is most likely to be hired on that job. That was the narrow question. And so why is that a good question? Because we have a whole awesome set of tools now to solve that kind of a problem exactly. How do we do it? Take a lot of past data of past jobs, past applicants, past hires that were made. Then we ask these crazy big black box algorithms, “All right, do the best job you can predicting who’s going to get hired on this job with these applicants.” And we use that data to test how well these algorithms are doing. That’s machine learning in 30 seconds basically. So we’re working on this problem. Great.

And then I kind of poked my head up a little bit. I go, “Why are we working? What is this going to do?” Well, it turns out the reason these kinds of things are important is they get used to make decisions. So what kind of decision do you make with that? Well, one thing you do is you say, “Well, if I could predict who’s most likely to be hired, then I should just rank people based on that, and that would be a good matching algorithm. That’d be a good way to sort and triage applicants for employers when they’re screening, trying to figure out who to interview, who to hire.” Great. Sounds pretty natural.

And then you think about it a little bit, and this to me, it’s such a passion project to get people to understand that this is why the humans in the loop that help us in businesses and making sense of data are so critical, is the following problem.

If you think about it a little bit, you realize what that algorithm is doing, it’s really just picking up on patterns in past data. So yeah, that’s great. This person is likely to be hired. But what we really want is something different. We’re trying to add value by ranking people.

So to give another example that’s similar to this, when you’re a marketing manager, and you’ve got a crack data science team that’s built a long-term value, lifetime value model for you, you’re not going to get in trouble with anyone if you send your highest value promotions to the highest LTV customers, right? Who’s going to blame you for that? Because you’re like, “This person is worth a lot, and I sent them this promotion.” Say that in your monthly report, nobody’s going to give you a hard time.

But the problem with that way of thinking is actually predicting what their lifetime value is isn’t really the question. The question is, how much more are they going to spend on my platform because I sent them that promotion?

That’s a very different thing. It’s a differential rather than an absolute. I’m not interested in their absolute LTV. I’m absolutely interested in the difference in their LTV because I sent them this promotion.

And when you look at it that way, what you realize is that picking up on patterns through good predictions, right, finding the people that have high LTV because you predicted that, is very different than making good decisions, which is about saying the difference in their LTV is going to be higher because I sent them this promotion.
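The gap between ranking customers by predicted LTV and ranking them by the difference a promotion makes can be sketched in a few lines of Python. Everything here is hypothetical: the customer names, the spend figures, and the field names are made up for illustration, and in practice the "with promotion" and "without promotion" values would have to be estimated, for example from an experiment.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    ltv_without_promo: float  # hypothetical expected spend with no promotion
    ltv_with_promo: float     # hypothetical expected spend with the promotion

    @property
    def uplift(self) -> float:
        """Effect of the promotion: the difference in LTV, not its level."""
        return self.ltv_with_promo - self.ltv_without_promo


# Made-up customers, for illustration only.
customers = [
    Customer("loyal_big_spender", 1000.0, 1010.0),
    Customer("occasional_buyer", 200.0, 320.0),
    Customer("new_signup", 50.0, 110.0),
]


def target_by_ltv(pool, budget):
    """Prediction-style targeting: promote to the highest-LTV customers."""
    return sorted(pool, key=lambda c: c.ltv_without_promo, reverse=True)[:budget]


def target_by_uplift(pool, budget):
    """Decision-style targeting: promote where it changes spend the most."""
    return sorted(pool, key=lambda c: c.uplift, reverse=True)[:budget]


def incremental_spend(targeted):
    """Extra spend attributable to the promotions that were sent."""
    return sum(c.uplift for c in targeted)


by_ltv = target_by_ltv(customers, budget=1)
by_uplift = target_by_uplift(customers, budget=1)
print([c.name for c in by_ltv], incremental_spend(by_ltv))
print([c.name for c in by_uplift], incremental_spend(by_uplift))
```

With a budget of one promotion, the LTV ranking picks the customer who would have spent the most anyway, while the uplift ranking picks the customer whose spending changes the most.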

I love this example, because I taught a class here at Stanford. It was like an executive education class. We had all the executives from a company in the room, and one of the people in the room was the chief marketing officer. And I just asked this question like, “Hey, okay, let’s say you got this great LTV model, who would you send the promotions to?” It’s like, “Definitely the highest LTV people,” and there’s a CMO in the room. And so it’s a little bit of a delicate situation, like pushing back a little bit, right?

I do want to be clear, there’s reputational reasons you might do that anyway. I mean, I’m not trying to get away from that. But just to make the narrow point that predicting is about picking up patterns, but making decisions, it’s about thinking about these differences.

Now, why is that important? Because we learn in high school, correlation is not causation. That’s a phrase everybody has heard all over the place. What does that have to do with this? Well, when we teach people to build machine learning models, we’re asking them to make predictions, we’re asking them to find correlations. Prediction is inherently about correlation. But when we ask people to make decisions, we’re asking them to think about causation. “If I make this decision, then will I actually increase the net value of my business? Will I have by sending the promotion, increased the likelihood that this person is going to spend more on my platform?”

And so the first and most important thing that I feel very strongly about in what would I get a data scientist to do is no matter who they are, even if it was that person in the weeds thinking about building this prediction model for hiring, get them to be thinking in the back of their mind always that their goal is to help the business make decisions. And that the distinction between causation and correlation matters a lot. We can talk a lot more about how does that play out in terms of their day-to-day work. But at least at a starting point, you have to recognize that the first step is always recognition that prediction isn’t the same thing as making decisions.

Lenny: So the takeaway here is as a data team and as a data scientist on the team, is help the business make decisions. Are there a couple more examples you could share of just what is an example of a decision that you think they often should be making and using data to help them with?

Difference Between Prediction and Decisions

Ramesh Johari: Maybe the right frame of reference for this, and the word that an academic would use is causal inference. So what we’re changing from is machine learning to causal inference. So let’s think that through in a couple of different use cases that are related to that marketplace data science flywheel I talked about earlier. Finding matches, making matches, and then learning about matches.

So finding matches, like you said, a core part of that is search and recommendation, and each of those relies on rankings. So I want to be able to rank order. Let’s say I go do a search on Airbnb. I want to rank order the different listings in the marketplace, right? At some level, it’s true that what I’m trying to do there is I’m trying to just predict, what are you going to like the most?

But I think there’s an important piece of that also, which is that I want to think a little bit about the distinction between two different ranking algorithms. That’s the real decision that’s being made.

And when I think about the distinction between two different ranking algorithms, I don’t want to be only comparing them in terms of how well they recreate the choices people made in the past. The way I’m really going to evaluate those is in my market, does one of those lead to better matches or more matches than the other one, right?

So Airbnb as a business, what are the most obvious core metrics? It’s bookings and revenue. So you’re going to want to ask a very basic question. If I use the ranking algorithm Lenny just developed last night versus the ranking algorithm Ramesh developed last week, does Lenny’s ranking algorithm lead to more bookings than Ramesh’s ranking algorithm?

And it’s so important to put it that way starkly, because that’s so different a question than, does Lenny’s ranking algorithm do a better job of predicting over the last two years what bookings people made than Ramesh’s ranking algorithm? So that’s I think at that level.
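One way to make that comparison concrete is a two-sample test on booking rates from an experiment that splits visitors between the two ranking algorithms. This is a minimal sketch using only the Python standard library. The function names and the visitor and booking counts are hypothetical, and it assumes visitors behave independently of each other, which is a simplification in a marketplace.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compare_booking_rates(bookings_a, visitors_a, bookings_b, visitors_b):
    """Two-proportion z-test: does ranker B book at a different rate than A?

    Returns (difference in booking rate, z statistic, two-sided p-value).
    """
    rate_a = bookings_a / visitors_a
    rate_b = bookings_b / visitors_b
    # Pooled rate under the null hypothesis that both rankers perform the same.
    pooled = (bookings_a + bookings_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: arm A is Ramesh's ranker, arm B is Lenny's ranker.
diff, z, p_value = compare_booking_rates(
    bookings_a=1000, visitors_a=20000,
    bookings_b=1100, visitors_b=20000,
)
print(f"lift in booking rate: {diff:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

Nothing in this test looks at how well either ranker reproduces past choices. It only asks which one leads to more bookings when it is actually used.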

Then we talked a little bit about ranking at the point of making a match, and I think that’s where this hiring issue popped up. Because in the end, while we might have these predictive algorithms to rank who you’re going to hire, that’s not the important question.

Interestingly, the important question is actually to evaluate the quality of the match that’s made. And we would do that through the next step of that flywheel. We’d ask ourselves, what ratings did they give back to that freelancer? Do they hire that freelancer again? So you’re comparing two different algorithms not through their ability to recreate the past, but their ability to make matches in the future that can be objectively evaluated to say, “Hey, I increased the value of the business. I actually made better matches this way.” And then rating systems, I think we could talk quite a bit about a similar phenomenon there too.

Lenny:

Yeah, I would actually love to talk about rating systems, but there’s an implication in everything you’re describing of running an experiment versus looking at what would’ve happened in the past. You make a change, run an experiment, see if it actually makes an impact on bookings and revenue. And that leads me to a question I wanted to ask, which is with experiments, there’s kind of this classic challenge, and always an elephant in the room, of if you just run a bunch of experiments, you’re kind of going to micro optimize, lead to these local maxima, and you may miss big opportunities and big unlocks if you’re just extremely experiment driven.

You spend a lot of time thinking about experimentation. What have you learned or what advice do you have for people to either be less worried about optimizing and missing something big, or just finding a balance with running experiments, but also creating opportunity to find a huge new opportunity?

Causal Inference and the Data Science Flywheel

Ramesh Johari: Yeah. First of all, I’m really glad you broached the E word. I was dancing around it, and I’m really glad that we talked about experiments. Because yeah, one of the big lessons of this recent conversation we’ve been having is just, how could you possibly know that difference without doing something like experimenting?

So yeah, I am a big believer in experiments. I mean, I’ll just lay those cards on the table. I love working with businesses that think experiments are important to helping make good decisions.

Now all that said, I am also someone who feels pretty strongly about this exact issue that you’re raising. It’s just, you can’t experiment your way out of everything.

And one frame I like to give people is that although you might say you’re an experiment driven business, some businesses will proclaim, “We literally test everything.” What that kind of leaves aside a little bit is there’s a lot of degrees of freedom in what it means to test everything.

Because ultimately, what’s getting built and tested are choices that are made through the organizational structure, the data scientists, the PMs, the engineers, everybody’s on… Before we’re running experiments, we’re actually thinking about even what’s worth experimenting, what designs are we coming up with? So that’s one.

And the other big one is, how long do we run these experiments? Okay, that’s a big choice. And what I generally believe, and I think there’s a paper we can link to later that I’ll point your readers to as well that… Not my paper, from some folks at Microsoft.

What I generally believe is we’re risk averse on both these two dimensions, that what people decide to test in a world that has promoted experimentation for everything tends to be more incremental by design. Okay? And we’ll come back to that actually, answer the “because” in a second. So that’s one. And two is people tend to run experiments for a long time, and probably longer than they should.

Now, what do I mean by these two things? So what’s interesting to me about this dynamic is experiments don’t live in a vacuum. Companies have incentives. And in companies that really go all in on experimentation, one of the things that gets wrapped up in that is the incentives around experiments. Because if you go all in on experiments, a common thing you’ll see is data scientists get judged based on how many wins they had that quarter. How do you get more wins?

Well, it’s easier to get wins when you’re being incremental. And because it’s important to have wins, you have to run them long enough to demonstrate that they’re really wins. You’re less willing to cut something off in exchange for trying something riskier.

So the big lesson of this Microsoft paper, it’s called A/B Testing with what’s called Fat Tails, which in lay terms just means you’re running a business where there’s potentially big opportunities out there if you look at the effects of the experiments that you run. There’s a couple of lessons there about both trying a lot more stuff that’s not all risk averse, and not necessarily running everything for so long. So really getting velocity up.

So you could see that there’s a big incentive problem there, because the culture that says it’s okay to fail big actually requires changing the terminology of wins. This is one of the things I hate most in A/B testing, I have to say. I get where it comes from. Experimentation was never historically in science about winners and losers. It’d be weird if Ronald Fisher, who’s kind of the father of experimentation with his agricultural experiments, talked about winners. I don’t think that’s necessarily how he talked about things. Experimentation is always very hypothesis driven. It’s about, what are you learning?

And that’s really an important distinction because what it means is if I go with something big, risky, and it “fails,” meaning it doesn’t win, nevertheless, if I was being rigorous about what hypotheses that’s testing about my business, I’m potentially learning a lot.

So a great example of this kind of thing is that an important feature of marketplaces is badging. So sometimes, it’s really important to have badges on your top-rated profiles or whatever, when people are searching.

And without going too far into the details, a common finding with badges is that badges you think are going to be great actually turn out to be terrible. And one reason they’re terrible is they focus too much attention on the badged folks, and pull too much attention away from the unbadged folks.

And if we judge that only in terms of winners and losers, you throw the baby out with the bath water, you’re like, “Well that badging idea was terrible. So ditch that, no badges.”

But that’s not what it’s telling you. It’s teaching you something about how inventory is being reallocated, how attention’s being redirected through the badges. And you really want to think not in terms of winning and losing, but learning.

So learning is a win. And I feel that that’s a cultural thing fundamentally. It’s very hard to somehow attach dollars and cents at the top to data scientists running experiments that fail, but learn. And ultimately, I think getting into that space where you experiment more, meaning you don’t run all your experiments for quite as long and you accept the willingness to try experiments that are into the tails where you might fail bigger is a cultural thing. It’s about saying that, “We’re allowing that to be part of our social contract with our data scientists,” or actually our employee contract with our data scientists, that not everything is just about how many launches you had and how many wins there were.

It’s okay to say, “That’s how I want to use experimentation,” but if you’re going to use it that way, then I would say don’t be a, “We experiment everything,” business. Because then I think you need some other way to deal with these big changes that teach the whole company a lot, but maybe can’t fall into the incentives you’ve created for your data scientists.

Lenny: This badging example is, I don’t know if you’re referring to the Airbnb example, but I actually led the launch of Superhost at Airbnb, which is the ultimate badge on Airbnb. And there was a lot of concern from the data team that it would destroy the marketplace, because they’ve built, as you’ve described, this very well-crafted ranking algorithm, with just a prediction of exactly as you described, which listings that guest is most likely to book and be successful booking. And then we’re about to throw a badge on random listings in the results. And so this one data scientist on our team’s like, “No, we can’t do this. This is insane. We’re going to destroy it all.”

And we still went ahead with it. We ran an experiment showing the badge to some people and some not, and actually, there was no impact at all. Superhost itself had no impact at all on the business as far as we could tell initially, which is also bittersweet because it felt like, “Why did we even work on this thing?” There was a slight benefit where hosts felt better, they felt more satisfied with being a host, but I went exactly through what you described, so that’s pretty funny.

Evaluating Match Quality

Ramesh Johari: Without necessarily going into the weeds on the data science of Superhost, I think there’s a lot wrapped up in what you said. I guess another thing I’ll say is that I’m a big believer that you don’t throw your understanding of the business out the window when you process experiment results. And it’s partly, I guess what I mean by this is data science is really about accumulation of evidence. It’s never about one finding in isolation. And so another kind of trap I think is to sometimes say, “Well, I hit stat sig on my A/B test, green light. It’s all go.”

And I think you had Ronny Kohavi on your show, and he made a similar point that there are different levels of evidence, and just having an outlier A/B test that goes against everything you believe about your business doesn’t mean that you somehow have controverted all your knowledge. And I think that’s one side of it.

The other thing is you can’t always measure everything that’s important that’s needed to really develop a full sense. So with Superhost, one of the things that’s hard to measure is the long-term impact of Superhost. Because in the short run, Superhost causes a rebalancing of inventory. There’s going to be winners and losers. Part of Superhost is actually about retaining hosts that get the badge over a longer period of time. Recognizing that hypothesis actually says something about maybe how long the experiment needs to be run or what kinds of data analyses need to be done.

And in the end, if you can’t do that, you can’t run it long enough, or you can’t do that data analysis due to sparsity of data or lack of data to address the question, it matters what you bring to the table. What are your beliefs about that?

So what I like to tell people to do there is I like to push people to be what’s called quantified rather than data-driven, which is, okay fine, some things we can’t measure. But maybe you’ve got a leadership team with different beliefs about what they think the retention value of Superhost is going to be, and they might be all over the place.

You can process your experiment results in the context of these competing beliefs. It’s almost like a prediction market kind of a thing. And start asking, “Well okay, if this is what we believe about our business, this is what the data is telling us out of the experiment, let’s put those two together and ask, is this enough for us to make the bet that we’re still going to go with it?” Even though maybe that short-term test you ran was flat.

Experiment Limits and Local Optima

Lenny: That’s actually exactly how I think of Superhost looking back. It was a great idea. I’m really happy. I can’t even imagine Airbnb without that, even though there’s no evidence, at least initially, that it made any impact. I’m guessing they looked at it again, and maybe there’s something that came out of it. But even if it had no impact, it just feels like it made the marketplace better. And that was a big learning for me. It doesn’t need to always drive a metric that you can measure. There’s just like, this is the way it should work.

Experiment Limits and Incentive Traps

Ramesh Johari: So one of the reasons the thing you said happens is because marketplaces are a little bit like a game of whac-a-mole, okay? And what I mean by that is, so narrowly in the context of Superhost, because you’re redirecting attention to some hosts at the expense of… It’s not even obvious if bookings can really go up. Maybe you get lucky and maybe you get a bunch more bookings. One reason you probably wouldn’t expect that in the first place is there’s only a limited number of Superhosts. How many more bookings are they going to be absorbing because of all this extra attention? And you’re taking attention away from other people. Without doing any data analysis, my prior would’ve been that bookings should probably go down.

And one example that I came across with one of the companies I worked with that I love is we were working together over a period of time, and in a month, we looked at some of the data and it suggested that our new supply side was having a pretty bad experience. We said, “We’ve got to do something about this.”
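The reasoning behind that prior can be illustrated with a deliberately simple toy model. This is an illustrative assumption, not a description of how Airbnb or any real marketplace works: total attention (views) is fixed, every listing converts views to bookings at the same rate, and each listing can absorb only a limited number of bookings. All numbers and names are hypothetical.

```python
def total_bookings(views_per_listing, conversion_rate, capacity):
    """Bookings across listings when each listing is capped at `capacity`."""
    return sum(min(capacity, views * conversion_rate) for views in views_per_listing)


# Hypothetical marketplace: fixed attention spread over a fixed set of listings.
TOTAL_VIEWS = 100
N_LISTINGS = 10
N_BADGED = 2
CONVERSION_RATE = 0.25   # assumed identical for every listing
CAPACITY = 3             # most bookings any single listing can absorb

# Without a badge, attention is spread evenly.
even_views = [TOTAL_VIEWS / N_LISTINGS] * N_LISTINGS

# With a badge, the two badged listings attract 60 of the 100 views.
BADGED_VIEWS_TOTAL = 60
badged = [BADGED_VIEWS_TOTAL / N_BADGED] * N_BADGED
unbadged = [(TOTAL_VIEWS - BADGED_VIEWS_TOTAL) / (N_LISTINGS - N_BADGED)] * (
    N_LISTINGS - N_BADGED
)

baseline = total_bookings(even_views, CONVERSION_RATE, CAPACITY)
with_badge = total_bookings(badged + unbadged, CONVERSION_RATE, CAPACITY)
print(f"bookings without badge: {baseline}, with badge: {with_badge}")
```

In this toy setup the badged listings hit their capacity, so the extra attention they receive is wasted, while the unbadged listings lose views they could have converted. The model leaves out anything that might expand the pie, such as better matches or host retention.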

So what we decided to do is build some custom bespoke features that were really going to direct them to more experienced folks on the other side of the market. Good. And then lo and behold, pretty soon those metrics start to look better. But then we’re looking at it, we’re like, “Wait a second. Now the existing folks on the other side are having a worse experience.”

So you kind of whiplash around. You’re like, “Wait a second, we better do something about that.” So we take them, we try to match them up with the more experienced folks. And now suddenly a month after that you’re like, “Wait a second.” And your metrics just keep moving around.

And that’s because the whac-a-mole game here is ultimately, a lot of marketplace management is moving attention and inventory around. Sometimes you get lucky and you really expand the pie for everybody. But I think Servaes Tholen, who was CFO at Upwork that I got to know there and then went to Thumbtack later, he had this line when he came to visit our class that I love, which is, “You have to recognize when you run marketplaces that many of the changes that are most consequential create winners and losers. And rolling with those changes is about recognizing whether the winners you’ve created are more important to your business than the losers you’ve created in the process.” And it’s a hard reality, because nobody likes to articulate the idea that a feature change is hurting some of the people in your marketplace. But because of this fundamental constraint baked into how marketplaces work, many of the things that we would choose to do and the reallocation they create can’t necessarily create observed, pie-expanding wins in the short run. You’re often making bets that that’s where you’re headed, partly through the reallocation that you’re doing right now.

And so I think what’s interesting about Superhost to me is that it partly points to thinking about, what’s the objective you would’ve defined, the metric you would’ve defined in the short run that captures this idea of a trade-off?

Lenny: That’s a great way to think about it. I wanted to come back to this idea you’re sharing of maybe you should run experiments more quickly, not wait for stat sig, have a culture of learning versus impact. In practice, it’s very difficult, because people are measured by impact. There’s performance reviews, there’s promotions, there’s “how much impact did this team drive?”, and people are going to look at their experiment results. You’ve worked with a lot of marketplace companies, a lot of different companies. Is there anything you’ve seen about something you could do to help the company shift and actually work this way, while also recognizing success, and who’s doing great, who’s not, which team’s driving impact, who’s not?

Culture of Winning vs Learning

Ramesh Johari: Interestingly, it’s actually an active area of research for me now. What I mean by active area of research is I care a lot about the incentives that we create for data science through how we set up reward mechanisms. So there’s a couple things I think that could be helpful, that are maybe a little bit less about… Maybe I’m not going to directly answer the question you ask, because I think that’s a hard one, right? I think I recognize that measurement on impact is critical. Well, let me answer that actually from the most obvious way first. I think there’s a cultural issue here that’s really critical.

One of the things I often find is that my PhD students, our PhD students here often go off and get great data scientist jobs. And in one sense, they’re doing amazing stuff. They apply really technically sophisticated methods. But when I look at the problems they’re working on, they’re often more at the margins of the business than they should be.

And it’s a cultural thing. It’s basically because if you’re measured narrowly on impact and that’s all anyone sees around you, then it’s very hard to engage with the creative aspect of business change and the strategic aspects of business change.

So the cultural aspect there is, I think it’s partly incumbent on the leaders to expect something more of their data scientists. And what I mean by expect more is that you expect them to do more than deliver narrowly defined, statistically rigorous results to you in their reports. You’re actually expecting them to talk also about what they’re learning about the business in the process. So where that’s headed is there’s this concept of being hypothesis driven, which is like the technical phrase. What does that mean? Again, in a more lay sense.

What it means is tests aren’t going to be defined only in terms of winners and losers, that each test should also say something about what will we learn about a business flow, a funnel, preferences of the guests, preferences of the hosts. What will we learn about their demand elasticity if we’re changing prices around? These kinds of things. So it’s possible to articulate in an experiment doc, a launch doc, what are the hypotheses that are being tested? So that’s one thing I would say is just culturally, setting the norms that learning is part of the discourse, and it’s expected actually I think is important.

But the other thing I would say that’s maybe a little bit more about programmatically, what could a data science platform team do? A funny thing about experiments is that we throw past learning away effectively. And this is just an artifact of how we analyze experiments, that the statistical methods used typically, P-values, confidence intervals, these fall into a branch of statistics known as frequentist statistics. And the idea behind frequentist statistics without being overly technical is just I let the data speak for itself. There’s no beliefs brought to the table about where that data came from.

But if you think about this in a company, A/B testing in a company, it’s a weird thing, right? Because I might’ve run 1,000 A/B tests in the past on this exact same button, or call to action, or color, and now I am going to completely ignore that and focus only on this.

So there’s ways to take the past into account, to build what’s called a prior belief before I run an experiment, and now take the data from the experiment, connect it with the prior, to come up with a conclusion of, “Okay, in light of the past plus this experiment, what’s it telling me about the future?” And that falls broadly under the category of what’s called Bayesian A/B testing.
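One simple way to combine a prior built from past experiments with a new result is a normal-normal update, sketched below with the Python standard library. This is only one possible formulation of Bayesian A/B testing, chosen here for brevity. The past effect sizes, the new result, and the function names are all hypothetical, and the sketch ignores the noise in the past estimates themselves.

```python
from statistics import mean, variance


def prior_from_past_experiments(past_effects):
    """Summarize past effect estimates as a normal prior (mean, variance)."""
    return mean(past_effects), variance(past_effects)


def posterior_effect(prior_mean, prior_var, observed_effect, standard_error):
    """Normal-normal conjugate update of the prior with one new experiment."""
    obs_var = standard_error ** 2
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_mean = (prior_mean / prior_var + observed_effect / obs_var) / precision
    return post_mean, 1.0 / precision


# Hypothetical lifts (in percentage points) from past tests on the same flow.
past_effects = [0.0, 0.2, -0.1, 0.1, 0.3]
prior_mean, prior_var = prior_from_past_experiments(past_effects)

# Hypothetical new experiment: a large observed lift, but a noisy estimate.
post_mean, post_var = posterior_effect(
    prior_mean, prior_var, observed_effect=1.0, standard_error=0.5
)
print(f"prior mean {prior_mean:.3f}, posterior mean {post_mean:.3f}")
```

Because past tests on this flow showed small effects, the posterior pulls the surprisingly large new estimate back toward the prior. An experiment that "fails" still adds information by shifting and tightening the prior used for the next one.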

So that’s one of the things I think can help culturally, weirdly. It’s a super technical thing, but I think it can help culturally, because what it’s doing is it’s now rewarding people for contributing information to that prior. And I think it then becomes possible to say, “Your experiment that failed actually moved our prior.” And that’s an important thing, because by doing so, you’re now altering how we’re going to think about this flow or this pricing plan in all future experiments.

So there’s an information positive externality, positive network effect that’s generated for the rest of your business if I can somehow encode what you learned into the analysis of future experiments. So this is one thing. There’s strong connection between the culture and incentives of A/B testing and the ability to actually incorporate past learning into these prior beliefs.

The Story of the Superhost Badge

Lenny: I love that you’re doing research in this area. We should bring you back when you’ve completed it and have the ultimate answer for everyone to change how they operate.

Data Scientist Roles and Expectations

Ramesh Johari: Yeah, one of the great things about professors is we never complete anything and never have ultimate answers.

Lenny: Oh boy.

Leveraging Past Data: Bayesian A/B Testing

Ramesh Johari: Yeah, I’ll do my best though.

Learning Is Not Free

Lenny: This touches on a really interesting concept that you shared with me around how, just learning isn’t free. People think that they could just learn a bunch of stuff and there’s not a cost to it. I’d love for you to just chat a bit about what that means.

Rating System Challenges and Score Inflation

Ramesh Johari: Let me start with an anecdote. I just absolutely love this anecdote. I use it every year in class. So I was talking to a real estate platform, and they had a marketing data science manager who’s basically responsible, as many marketing managers are, for allocation of ad spend across different channels.

And what they discovered had happened at the end of the year is, on one hand, the team had done great, but the manager had held out some subset of arriving visitors, not showing them any of the innovations they were making.

Lenny: Like a holdout group?

Fairness Issues in Average Ratings

Ramesh Johari: Yeah, exactly. What’s called a holdout group in experimentation. And one thing about this holdout is it wasn’t authorized. That’s not the way things are supposed to work. They’ve got their ad spend, allocate out your ad spend, great. So at the end of the year, they looked at the holdout and they’re like, “Wow, that cost us a couple million dollars, something in that range, and it’s not a trivial amount of money. What’s the deal? What were you thinking?” Basically. And of course the answer was, “Well, I get that I cost you that much, but number one, now you know what my team’s worth. And number two, you would never have had that answer unless I’d done that on my own.”

Now, why is that so powerful? I think what I find so interesting about experiments is that when you don’t know something, it seems not even a question that you would allocate some of your samples to all options, right? Treatment and control. I have two different ways of doing something. I don’t know which one’s better, so of course I’ll give some samples to each. After the fact you’re like, “Treatment was better. What the heck were we thinking? Why’d we give all those samples to control? That doesn’t make any sense now.” There’s this great Seinfeld clip where he mentions getting a bill at the end of a large luxurious meal, and people stare at the bill like, “We’re not hungry now. Why’d we order all this food?” So it’s the same thing. I mean, you know treatment’s better now. Why’d you waste all those samples on control?

And I think that is such a powerful observation that you have to put yourself in the frame of reference of when you didn’t have the answer. And at that moment, what you’re essentially saying to yourself is that it’s worth paying to learn the answer. I think it sounds obvious the way we’re saying it now, or this anecdote of the marketing manager and the holdout sounds obvious. What’s culturally not baked in I think is that idea. And the reason I say it’s not culturally baked in, by the way, is because of the language of winners and losers. Because if we use that language, what we’re implicitly saying is that we wasted time when we ran an A/B test on a loser. If I reward you for shipping winners, then what I’m really telling you is all the time that you spent testing out failures was wasted time.
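One way to make the “pay to learn” trade-off in the holdout story concrete is to put numbers on it. The sketch below is purely illustrative: the visitor counts, conversion counts, and value per conversion are made up, and `Arm` and `holdout_summary` are hypothetical names, not anything the platform in the anecdote used. It estimates the lift from the shipped changes, a rough 95% interval for that lift, and what withholding the changes from the holdout cost.

```python
import math
from dataclasses import dataclass


@dataclass
class Arm:
    """One group of visitors and how many of them converted."""
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visitors


def holdout_summary(treated: Arm, holdout: Arm, value_per_conversion: float) -> dict:
    """Estimate what the shipped changes are worth and what the holdout cost."""
    lift = treated.rate - holdout.rate
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(
        treated.rate * (1 - treated.rate) / treated.visitors
        + holdout.rate * (1 - holdout.rate) / holdout.visitors
    )
    return {
        "lift": lift,
        "ci95": (lift - 1.96 * se, lift + 1.96 * se),
        # Price of learning: value forgone on visitors who were held out.
        "holdout_cost": lift * holdout.visitors * value_per_conversion,
        # Value the changes produced on visitors who did receive them.
        "team_value": lift * treated.visitors * value_per_conversion,
    }


# Made-up numbers: 5% conversion with the changes, 4% without.
summary = holdout_summary(
    treated=Arm(visitors=950_000, conversions=47_500),
    holdout=Arm(visitors=50_000, conversions=2_000),
    value_per_conversion=400.0,
)
print(summary)
```

With these invented inputs the holdout forgoes roughly $200,000, while the same lift applied to the treated traffic is worth roughly $3.8 million. The point of the sketch is that neither figure is knowable without paying for the holdout.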

And I think, of course, you don’t want to keep data scientists around who regularly are just generating failures. That’s not my point.

But my point is there’s a disconnect there. On one hand, we can all look at the story of this marketing manager and chuckle at it. And yet, every day we’re instantiating language and processes that are reinforcing that same theme, which is essentially trying to say to you, “If you’re wasting samples on things that don’t ultimately end up being a winner, then the act of doing so is a failure.”

So I really feel that that idea that you have to pay to learn, again, it’s a cultural thing, but it’s also an education issue, because businesses are populated by people of all stripes. Not everybody comes from a data science or experimentation background. And this idea that learning is costly is not natural, actually. It’s not natural as a matter of human nature. It’s certainly not natural as a matter of running a business.

Lenny: I love that example of the real estate platform, where the cost is very visceral and clear. They lost because they didn’t roll out experiments to this group for a long time. Such a good example of this idea in action.

You mentioned star ratings. I know you spent a lot of time on designing rating systems. Sorry, I didn’t mean to imply star ratings. That’s just one implementation. Rating systems in general.

So maybe just to keep it focused, say a marketplace founder is trying to decide and design how they do ratings, and reviews, and things like that. What’s a couple pieces of advice you’d give them for how to do this correctly? And is there a model marketplace you’d point them to like, “These guys really do it really well”? And I know it’s super specific based on the marketplace, but is there one just like, “They really nailed it”?

Designing Rating Systems

Ramesh Johari: Oh man, that’s a tough one. I think I’ll answer the second part first. I don’t feel like anyone’s really nailed this. Yeah, I think there’s a lot of innovation that’s happened, but I think fundamentally, we’re still playing with the same kind of tools that we had when eBay and Amazon first started thinking about how to do rating systems ages ago.

And part of the reason we haven’t nailed it is because there’s a lot of dynamics in play that lead to what’s called rating inflation, where if you look at ratings over time in the marketplace… One of my colleagues, John Horton, who was a professor at MIT and has worked very closely with Upwork, we worked together when I was at oDesk, he was the staff economist there. He’s written a couple of really nice papers with this empirical phenomenon that over time, you see the median rating inflating, let’s say on marketplaces like oDesk, like Uber, like any of these.

And there’s a lot of reasons for this, but one of them is just that there’s a reciprocity issue, which is it’s effectively, from your perspective, it’s kind of costless if someone says to you, “Hey, please leave me a nice rating.” And if you’re seeing them or you’re interacting with them, most people don’t want to be mean. So that happens.

But there’s another aspect of it, which is norming. As the ratings in the marketplace go up, they get normed, so that now you’re in a condition, you’re like, “A four star rating. I’m really screwing this person over.” Whereas maybe when the marketplace started, you didn’t think that.

So definitely one thing that we worked on in our research was to think about renorming the meaning of some of these labels. And renorming could mean something like rather than the star ratings just being poor to excellent, the top rating is actually “exceeded expectations.” You could go one step further and you could say, “How did this compare to this experience you had in the past that you rated really highly?” And Airbnb had something like this in place, where they would actually ask you to compare, or ask you questions about expectations.

I find that that’s really valuable because it’s easier for people to say, “That was good but didn’t exceed my expectations. That was good, but definitely not better than this amazing stay I had two months ago,” than it is to say, “Well, I’m going to ding this person and give them four stars.” So that’s one issue.

And I think another thing I want to point out for any marketplace founder is that something you want to be really careful about is the concept of averaging and what the implications of averaging are. And that’s because a default for many marketplaces is to just average the ratings that people get. It feels very natural, right? Lenny’s got five ratings, let me average them.

And that actually has some pretty important distributional consequences for the marketplace. Distributional in the sense of who wins, who loses. And that’s because if you’re averaging and you’re really established on a platform, think of a restaurant on Yelp with 10,000 reviews, it’s irrelevant what the next review is. It doesn’t matter. Nothing’s moving it at that point.

If you’re new and you break into that market, and your first review is negative, you might be completely screwed. In fact, there’s some early work on eBay that showed that if your first rating’s negative, that could actually immediately cause an 8% hit on your immediate expected revenue, say nothing of long-term consequences. Subsequent work has found that that’s a significant indicator of potential exit from the platform, just because now it’s very hard to find work. And some platforms do things like maybe they won’t show your ratings until you’ve accumulated a few.

But in the end, this kind of distributional fairness aspect of averaging is pretty significant. And one of the recent papers that we’ve written is trying to get platforms to think a little bit about that. There’s ways to address that interestingly, through the same concept of a prior. And the prior basically says hey, if someone comes into the marketplace and instead of averaging them, I average them together with a prior belief, then maybe what that prior belief does, it says, “Yeah, you got one negative rating, but maybe you got a little bit unlucky,” and maybe my prior belief is something which actually pulls your rating up a little bit and allows me to still have you alongside others in the marketplace to give you a chance at getting work, getting rides, etc.
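The contrast between plain averaging and averaging together with a prior can be sketched in a few lines. This is a minimal illustration, not the method from the paper mentioned above: `prior_mean` and `prior_weight` are assumed values chosen for the example, and a real platform would have to estimate them from its own data.

```python
def plain_average(ratings):
    """The default many marketplaces use: the mean of the ratings received."""
    return sum(ratings) / len(ratings)


def smoothed_rating(ratings, prior_mean=4.5, prior_weight=5.0):
    """Average the ratings together with a prior belief.

    The prior behaves like `prior_weight` pseudo-ratings at `prior_mean`
    (both values are assumptions for illustration).
    """
    total = prior_mean * prior_weight + sum(ratings)
    return total / (prior_weight + len(ratings))


newcomer = [1]                             # a single unlucky first review
veteran = [5] * 8_000 + [4] * 2_000 + [1]  # 10,000 good reviews, then a bad one

for name, ratings in [("newcomer", newcomer), ("veteran", veteran)]:
    print(name, round(plain_average(ratings), 3), round(smoothed_rating(ratings), 3))
```

Under these assumptions the established seller’s score barely moves either way, while the newcomer’s single bad rating is pulled from 1.0 up toward the prior, so they still appear alongside others and get a chance at more work. How strong the prior should be is a design choice: too weak and a first bad review still sinks a newcomer, too strong and ratings stop carrying information.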

So I believe pretty strongly in this kind of distributional fairness element of designing rating systems. I think it’s been understudied. And I’ll say in general actually, I think rating systems are understudied, which to me is astonishing. Because the biggest change from those Agoras and Trajan’s Market elements of those kinds of markets, to me the biggest change is that we get to see what happened with our matches.

So as a data scientist working on marketplaces, I feel like it’s incredible that more of us don’t spend our time thinking about what we’re learning from the matches, and what these rating systems are telling us, and what the impact of that is on who wins and who loses in these markets, kind of thinking about the social implications of these things. So that’s something I’m pretty passionate about.

Double-Blind Reviews and the Sound of Silence

Lenny: I also led the review system flows for a while at Airbnb, and one of the things I’m most proud of is launching what we call double-blind reviews where you don’t see the other person’s review until you leave your review. The intention was to create more honesty and more accurate reviews.

It turned out the biggest impact was review rate went up, because people get this email, “Ramesh left you a review. If you want to see it, you should leave a review.” And that really increased review rate, which gave us more data. And it was a really fun experiment to work on.
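The double-blind rule described here can be expressed as a small visibility check. This is a hypothetical sketch, not Airbnb’s implementation: the `ReviewPair` class and the 14-day `REVIEW_WINDOW` are assumptions made for illustration. The fallback of revealing reviews once the window closes is also an assumption, added so that one party staying silent cannot hide the other’s review forever.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(days=14)  # assumed window length, for illustration only


@dataclass
class ReviewPair:
    """The two reviews attached to one completed stay."""
    stay_ended: datetime
    guest_review: Optional[str] = None
    host_review: Optional[str] = None

    def visible(self, now: datetime) -> bool:
        """Reviews are revealed only when both exist or the window has closed."""
        both_submitted = self.guest_review is not None and self.host_review is not None
        window_closed = now >= self.stay_ended + REVIEW_WINDOW
        return both_submitted or window_closed


stay_ended = datetime(2023, 11, 1)
pair = ReviewPair(stay_ended=stay_ended, guest_review="Great host")
print(pair.visible(stay_ended + timedelta(days=2)))  # host has not reviewed yet
pair.host_review = "Great guest"
print(pair.visible(stay_ended + timedelta(days=2)))  # both reviews are in
```

Because neither side can read the other’s review before writing their own, there is less room for retaliation or reciprocity, and the “leave a review to see yours” prompt gives people a reason to respond.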

Ramesh Johari: There’s a great concept in the literature on rating systems called the sound of silence, which is this idea that there’s a lot of information in ratings that are not left. So Steve Tadelis, who’s a professor at Berkeley, he had a really nice paper with some folks at eBay talking about what they called effective percent positive, where rather than normalizing just by the ratings, they normalized by including ratings that weren’t left. And what you found was this was much more predictive of downstream performance of a seller. So there’s a lot of information in that lack of a response. So it’s cool that you’re able to get more of that out.
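The difference between the two normalizations is easy to see with invented numbers. The sketch below is a simplified reading of the idea as described here, not the exact definition from the paper: it counts only positive and negative ratings, and the two sellers are made up.

```python
def percent_positive(positive, negative):
    """Classic score: share of positives among ratings that were actually left."""
    return positive / (positive + negative)


def effective_percent_positive(positive, transactions):
    """Share of positives among all transactions, so silence counts against the seller."""
    if transactions < positive:
        raise ValueError("transactions must be at least the number of positive ratings")
    return positive / transactions


# Two hypothetical sellers with 100 transactions each.
sellers = {
    "A": {"positive": 60, "negative": 1, "transactions": 100},  # 39 buyers stayed silent
    "B": {"positive": 90, "negative": 2, "transactions": 100},  # 8 buyers stayed silent
}

for name, s in sellers.items():
    pp = percent_positive(s["positive"], s["negative"])
    epp = effective_percent_positive(s["positive"], s["transactions"])
    print(f"seller {name}: percent positive {pp:.3f}, effective percent positive {epp:.2f}")
```

Seller A looks slightly better on the classic score because almost nobody left a negative rating, but falls well behind seller B once the 39 silent transactions are counted in the denominator. That is the sense in which the ratings that were not left carry information.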

AI and the Future of Data Science

Lenny: So much easier just to not leave a review than leave a bad review. Right? The downside to you is just much smaller. Oh man, marketplaces are so fascinating. I could see why a founder would want to be a marketplace founder, because it’s just such an interesting space. And hearing your feedback of, no, you’re not a marketplace founder. Let’s think about the problem you’re solving. And it might be a marketplace, might change people’s minds. Also, I feel like there’s a podcast episode in every topic we touched on. I know we just scratched the surface of a lot of things.

I know you got to run. Before we get to our lightning round, is there anything else you wanted to highlight, touch on, leave people with that are maybe working on marketplaces, thinking about a marketplace?

Ramesh Johari: I think one of the high level points I would make, and like you said, there’s an entire podcast in this topic, is that I think people want to imagine LLMs and AI driven data science automating out large parts of what it means to do data science in industry. And I think that’s probably the wrong perspective. In some mundane sense, that’s true. It’s easier for me to code than it used to be before. It’s easier for me to develop visualizations than it used to be. I can make dashboards faster. So programmatically, I think it’s true in some basic sense.

But I teach data science here, and my students are asked to use LLMs and generative AI on a weekly basis on all their assignments, so I’ve got an up-close and personal bead on this. And what I believe very strongly, actually, is that what AI has done for us is massively expand the frontier of things we could think about for our problem, hypotheses we could have, maybe things we could test. It’s just an astronomical explosion of explanations, and ideas, and principles.

And I really think actually what that does is puts more pressure on the human, not less. I think it becomes more important for humans to be in the loop in interacting with these tools to drive the funneling down process of identifying what matters, at all levels. That ranges from carrying out a data scientific analysis, where now because you’ve got these tools, you can hypothesize 10 explanations, maybe 100 explanations. Which of those are you going to focus attention on? What are you going to tell other people to focus their attention on? And it extends to running experiments: you used to have 10 creatives you’re testing for a marketing campaign, and now you’ve got 1,000 creatives you’re testing for that marketing campaign. Maybe that completely changes the game of what it means to run an experiment. What are you actually looking for now? How do you evaluate that you found something that was good enough?

And I think these questions are not getting enough attention. I think people are looking for the automated tool that really cuts the human out. But what I’ve seen so far, and again, who knows? By 2024, I might have a totally different answer for you. I don’t think so. But at the moment, what I see is that humans have actually become far more important to the productive data science loop, not far less.

AI Corner

Lenny: Such an important point. I feel like we need to add AI corner to this podcast where we always think about, how does AI impact what we’re talking about on this podcast?

Ramesh Johari: Yeah, I can see that. I totally see that.

Lightning Round

Lenny: Okay, we might start doing that. Ramesh, with that, we’ve reached a very exciting lightning round. I’ve got six questions for you. Let’s try to knock through them so you can go teach your class. Are you ready?

Ramesh Johari: I am ready.

Book Recommendations

Lenny: All right. What are two or three books you’ve recommended most to other people?

Ramesh Johari: When it comes to books, I have one I love that I start with always, which is How to Lie with Statistics. It’s a tiny book, Darrell Huff from 1954, which is just for anyone that likes data at any level, it’s such a fun read. It’s a great book.

The second thing I recommend to people, and actually this is true even for people who are not experts, is David Freedman. He was a statistician at Berkeley who passed away in the 2000s, early 2000s. His writing was fantastic in getting us to think hard about process. He was especially fond of what he called shoe leather statistics, where you rolled your sleeves up, you got on the ground, boots on the ground, really getting in there, really trying to understand your data.

His writing is fantastic, his explanations are fantastic. He has a few different books at different levels I think people would love reading. Most importantly, what I like about it is he puts such emphasis on driving evidence and understanding of your processes that generate data. And I find often, data scientists don’t even look at examples.

So at oDesk, it meant are you looking at actual jobs, and what’s actually going on in your product before you’re trying to do data science on it? So I think that’s a Freedman insight, Freedman mantra, and so his writing is great.

The last one I was going to mention has nothing to do with data science or anything. It’s called Four Thousand Weeks by Oliver Burkeman. I’m not a huge self-help type person, but I really like this book a lot. I think it’s a little bit stoic in its approach, like stoic philosophy. But the basic point is you’re only on earth somewhere in the neighborhood of 4,000 weeks, and my wife and I have this term we call infinite Q, which is no matter what you think you get done on a given day, more stuff’s going to just keep coming in.

And he basically says that recognizing that is liberating. Because once you recognize it, it doesn’t matter what you do. You’re always going to have too much to do. There’s no point in stressing out about having too much to do. And just that small shift of mindset then puts a lot more attention on the usual thing people worry about, which is, where do I want to prioritize my time? So he has a great way of writing about it, some concrete rules of thumb to help manage that way of thinking. And yeah, I think it’s a great book.

Lenny: What is a favorite recent movie or TV show?

Ramesh Johari: I am a climber, and one movie that I really liked was The Alpinist. I know a lot of people have seen Free Solo, but for anyone that kind of likes that genre, I would recommend they watch The Alpinist.

I think climbing is an interesting sport because it has very much a psychological aspect to it. And I think that movie is pretty good at this meta level where you reflect a little bit on, what does it mean to make a movie about people who are obviously putting themselves into such risky situations? So I really enjoyed that.

On TV, we’ve been watching Only Murders in the Building, but I’m enough episodes behind right now that I probably won’t say anything more, because I am trying to avoid any spoilers and I’m sure there’s people out there trying to do the same. So great show though on Hulu.

Lenny: What’s a favorite interview question that you like to ask candidates that you’re hiring?

Ramesh Johari: I interview people probably that are a little bit different than most of your podcast listeners. But that said, there’s one question I like to ask a lot, and that’s if you imagine… Often in our interviews in academia, whether it’s grad students or faculty, we’ll ask people about their plans.

And what I like to ask people is, “Okay, now imagine everything works out, all the challenges you’re facing work out, all your plans work out, everything hits the top end of your vision for what this could be. What do you imagine is the impact of having done that? Who’s being impacted by that? Why is that a big deal that happened?”

And I find that’s a really valuable question to ask, because first of all, many people haven’t thought about that. We’re so short-term focused, we don’t even think, “Boy, if everything worked out, what would be the big deal because of what I did?” Startup founders tend to be better at this than most people obviously.

But another reason I like it is because you will find in that conversation that their vision expands a little bit to additional spheres that are touched or impacted by what they’re thinking about doing. So on both sides, it’s kind of a revealing question, I think. So I find that important for my line of work, but my hunch is that might be useful for some of your listeners too.

Lenny: Yeah, such a unique perspective on interviewing, versus most of the guests that I interview at tech companies.

Ramesh Johari: Yeah, normally there’s a coding question, right? I should say I would never ask a coding question post November 2022 after we got AI to help us code. I think it’s a superpower.

Lenny: AI corner. What is a favorite product you’ve recently discovered that you really like?

Ramesh Johari: I also really like cycling. And I’m not ashamed to admit that I think that e-bikes are the greatest thing for cycling. Admittedly, I’m late forties, so maybe I’m the right target demographic too. But yeah, I love my e-road bike. It’s great, because it’s not one of those with a throttle, you have to work, but it kicks in just when you’re on your sixth hill and you don’t want to go up the last hill anymore on the way home. So yeah, that’s amazing. I think that’s just transformative for people that like cycling, but have busy lives.

And I think another one that my son who’s 10 roped me into actually, is we were in Santa Cruz browsing at a kitchenware shop of all places, and he saw an outdoor pizza oven, a tiny portable one. And he just did research for two weeks and insisted we get one.

So he got one over the summer, and after we got it, he refused to eat pizza out anymore as a 10-year-old. So maybe that’s the best thing I could say about the quality of pizza you can get from a home outdoor portable pizza oven.

Lenny: Oh my God, I’m hungry. I am going to go have to get some pizza now. What is a favorite life motto that you like to repeat to yourself, share with folks, find useful in your day-to-day?

Ramesh Johari: A lot of my work involves talking to students of all stripes. And I guess these students go on to be data scientists, go on to be founders, and a lot of them go in the tech industry. So maybe in that sense, that advice is relevant.

My main thing I tell people is slow down. I think what I’ve found has been happening, is we’re so convinced that speed is the way you’re going to find the right answer, that I just don’t think we slow down to develop meaningful mental models of the things we’re doing. That’s certainly true in the research projects I work on. It’s consistently true when I talk to people in business, and I ask them about their… By mental model, I just mean if you’re running a marketplace, what is your model of what people care about? What makes people stay versus leave? What makes matches work versus not work? All those things shape a roadmap in your mind. And I think a lot of roadmapping, a lot of execution, paper writing in academia has all just become far more fast-paced, at the expense of deeper thinking about these kinds of structural features of the thing you’re building.

So with my students, but also I think with people I interact with in industry, I think slowing down is actually more of a virtue than it’s given credit for.

Lenny: Very similar to a motto that a recent guest shared, which I think was go slow to go fast, or stay smooth to go fast.

Ramesh Johari: Yeah, I like that. Maybe I’ll pilfer that, when I go talk to my grad students [inaudible 01:19:03].

Lenny: Final question. You’re a professor at Stanford University, which sounds incredibly cool. What’s something about being a professor at Stanford in particular or in general that would surprise people, either good or bad?

Ramesh Johari: Yeah. I mean, we’ve had a rough ride, as everybody probably knows. Stanford’s been in the news for a lot of not so great reasons, I think over the last five years especially.

So I don’t know if this is the right kind of surprise, but I think one thing that I find really energizing at Stanford is people have never asked me for credentialing here. And what I mean by that is that I came from a bunch of other good schools, and obviously I’ve spent time in industry with a lot of great companies. And a kind of cultural dynamic that can often develop is, “Well, before I’m going to talk to you, I want to know something about why you’re worth talking to. Give me your credentials. Where are you a grad student or where are you a professor? Tell me about yourself first.”

One of the things that I found very surprising when I came here is just how that never happened at any level. Grad students tell me this all the time. Go talk to someone across campus and just launch right into a conversation about how your X meets my Y, and we have something we could do together. As a faculty member, it happens all the time. I just had a conversation a couple days ago with someone about effectively a marketplace of experiment designs for nano fabrication here, which is totally out of left field for things I do, and yet seamless. Our conversation was about the substance rather than the credentialing.

I really think part of the reason for that is that Stanford is sort of unique in that it doesn’t have a weakness across the board. We have strong professional schools, law, business, medicine, strong engineering schools, strong humanities and social sciences. And then that and the weather is what I usually tell people honestly, which matters a lot. People are willing to walk anywhere. I think those things combine to create a culture and an environment where you don’t credential everybody.

And I think that means a lot. I think that’s something that I haven’t found elsewhere. And if people wanted to know something about what Stanford’s like on the inside, I think that’s one aspect of it that probably isn’t discussed very much. I think that’s part of what makes it really fun to be here.

Lenny: It’s also an incredibly dreamy campus, that is very joyful to walk around. That helps, I’m sure. Ramesh, I feel like we got people’s brains tingling. I think we’ve created new marketplace founders, and also convinced people maybe they aren’t marketplace founders. So maybe we netted out zero new marketplace founders. Two final questions. Where can folks find you online if they want to reach out? And how can listeners be useful to you?

Ramesh Johari: I think the easiest way, if someone’s interested more on the industrial side, is probably LinkedIn. You can send me a message or connect there. Also, because I’m an academic, I have my own Stanford webpage, and it’s pretty easy to figure out how to find me there as well.

And how can listeners help me? I kind of feel the most important thing that someone listening to this could do is take forward some of the messages that came out in terms of what it means to be data literate. And I think there’s a lot you can do to educate yourself there.

Maybe one final thought I’ll share is that in the same way that AI generates a lot of ideas, AI also generates a lot of prose. And in data science, that can actually be deadly because you’re getting more explanations that sometimes maybe are extraneous.

So taking that as a little vignette, I think that what the world needs is data literacy on the part of people interacting with these tools and with each other. So that’s the thing I care most about. The things I teach, the things I do research on, they’re all connected to that theme. And so that’s where I’m pretty excited. I do work with companies regularly, and so if there’s interesting opportunities that fall in the sphere of stuff we’ve discussed on the podcast, always happy to listen.

Lenny: Awesome. I think we’ve made a dent in helping people become a little more data literate. Ramesh, thank you so much for being here.

Ramesh Johari: All right. Thank you so much, Lenny.

Lenny: Bye everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.

Glossary

English	中文
Agora	集市广场
Airbnb	Airbnb（平台名，保留原文）
badging	徽章
Bayesian A/B testing	贝叶斯 A/B 测试
black box algorithms	黑箱算法
causal inference	因果推断
CMO	首席营销官（CMO）
credentialing	资历证明
Darrell Huff	Darrell Huff（人名，保留原文）
data literate / data literacy	数据素养
David Freedman	David Freedman（人名，保留原文）
disintermediation	去中介化
distributional fairness	分配公平性
DoorDash	DoorDash（平台名，保留原文）
double-blind reviews	双盲评价
effective percent positive	有效好评率
flywheel	飞轮
Four Thousand Weeks	Four Thousand Weeks（书名，保留原文）
Free Solo	Free Solo（电影名，保留原文）
frequentist statistics	频率学派统计
friction	摩擦（指交易成本/障碍）
holdout group	保留组（留出组）
How to Lie with Statistics	How to Lie with Statistics（书名，保留原文）
Hulu	Hulu（平台名，保留原文）
humans in the loop	人在回路中
hypothesis driven	假设驱动
infinite Q	无限队列
John Horton	John Horton（人名，保留原文）
lead gen	线索生成
Lenny	Lenny（人名，保留原文）
lifetime value, LTV	终身价值（LTV）
LinkedIn	LinkedIn（平台名，保留原文）
liquidity	流动性
litmus test	试金石
local maxima	局部最优
Lyft	Lyft（平台名，保留原文）
machine learning model	机器学习模型
market failure	市场失灵
Marketplace	平台市场
matching algorithm	匹配算法
mental model	心智模型
nano fabrication	纳米加工
norming	规范化
oDesk	oDesk（平台名，保留原文）
Oliver Burkeman	Oliver Burkeman（人名，保留原文）
Only Murders in the Building	Only Murders in the Building（剧名，保留原文）
out of left field	意想不到的领域
positive externality	正外部性
prediction market	预测市场
prior	先验判断
quantified	量化思考
Ramesh Johari	Ramesh Johari（人名，保留原文）
rating inflation	评分通胀
reciprocity	互惠性
renorming	重新规范
roadmap	路线图
roadmapping	路线规划
Ronald Coase	罗纳德·科斯
Ronald Fisher	Ronald Fisher（实验设计之父，保留原文）
Ronny Kohavi	Ronny Kohavi（人名，保留原文）
Santa Cruz	Santa Cruz（地名，保留原文）
scaled liquidity	规模化流动性
Seinfeld	《宋飞正传》
Servaes Tholen	Servaes Tholen（人名，保留原文）
shoe leather statistics	皮鞋统计
smell test	直觉检验
sound of silence	沉默之声
Stanford	斯坦福
stat sig	统计显著性
Steve Tadelis	Steve Tadelis（人名，保留原文）
Stitch Fix	Stitch Fix（品牌名，保留原文）
Superhost	Superhost（Airbnb 功能名，保留原文）
The Alpinist	The Alpinist（电影名，保留原文）
Thumbtack	Thumbtack（平台名，保留原文）
Trajan’s Market	图拉真市场
transaction costs	交易成本
Uber	Uber（平台名，保留原文）
unit economics	单位经济
Upwork	Upwork（平台名，保留原文）
UrbanSitter	UrbanSitter（平台名，保留原文）
Venmo	Venmo（支付服务名，保留原文）
vignette	小故事
whac-a-mole	打地鼠游戏


Marketplace lessons from Uber, Airbnb, Bumble, and more | Ramesh Johari (Stanford professor)

Interview Transcript

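Ramesh Johari: Marketplaces are a little bit like a game of whac-a-mole. One example that I came across with one of the companies I worked with that I love is our new supply side was having a pretty bad experience. So what we decided to do is build some custom bespoke features that were really going to direct them to more experienced folks on the other side of the market. Good. And then, yeah, lo and behold, pretty soon those metrics start to look better. But then we’re looking at it, we’re like, “Wait a second. Now the existing folks on the other side are having a worse experience,” so you kind of whiplash around. You’re like, “Oh, wait a second. We better do something about that.” So we take them, we try to match them up with the more experienced folks, and now suddenly a month after that, you’re like, “Wait a second,” and your metrics just keep moving around. And that’s because the whac-a-mole game here is ultimately, a lot of marketplace management is moving attention and inventory around. Many of the changes that are most consequential create winners and losers. And rolling with those changes is about recognizing whether the winners you’ve created are more important to your business than the losers you’ve created in the process.

Lenny: Today my guest is Ramesh Johari. Ramesh is a professor at Stanford University, where he does research on and teaches data science methods and practices with a specific focus on the design and operation of online marketplaces. He’s advised and worked with some of the biggest marketplaces in the world, including Airbnb, Uber, Stripe, Bumble, Stitch Fix, Upwork, and many others. And in our conversation, we get super nerdy on how to build a thriving marketplace, including where to focus your resources to fuel the marketplace flywheel growth, why data and data science is so central to building a successful marketplace, how to design a better review system, why as a founder you shouldn’t think of yourself as a marketplace founder, but instead simply as a founder, and also how AI is going to impact data science, marketplaces, and experimentation, and so much more. If you’re building a marketplace business, or thinking about building a marketplace, or just curious, this episode is for you.

(Sponsor segment skipped)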

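Meeting Ramesh Johari

Lenny: Ramesh, thank you so much for being here, and welcome to the podcast.

Ramesh Johari: Thanks so much for having me, Lenny. It’s great to be here.

Lenny: I’m glad you could make it. A big thank you to Riley Newman for connecting us. Riley was the first data scientist at Airbnb and went on to lead data science there. That role is a good microcosm of what we’re going to focus on today: we’re going to dig into marketplaces, experimentation, and data. I know this is your specialty. Ready to dive in?

Ramesh Johari: Definitely ready. And I want to thank Riley too. I started out at oDesk as a research scientist and later ran their data science team, and that’s when I met Riley. This was back in 2012. I was looking around for people who thought deeply about data and marketplaces, Riley Newman came onto my radar, and I invited him to come give a talk to us at oDesk. We’ve stayed in touch ever since. Those were the early days of this industry, and I’ve now spent a good part of my career thinking about these problems. So I’m looking forward to talking with you about them.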

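What Is a Marketplace Business

Lenny: Let’s start at a high level and lay some foundations. You have an interesting way of describing what a marketplace business actually is. So Ramesh, what is a marketplace business? And why is data so important, such an integral part of building a successful one?

Ramesh Johari: It’s interesting. When people sit down and think about, say, Airbnb, what does Airbnb sell? Most people would say, “Isn’t it obvious? Airbnb sells rooms. I go there to book a room I want to stay in.” Or, “What does Uber sell? Uber sells rides. I use Uber when I need to get from one place to another.” In some sense, that’s right. You do go to the platform to get those things. But that is not what the platform is selling, and that’s a really important distinction. There are people on the platform selling those things to you: hosts on Airbnb are selling you listings, and drivers on Uber are providing you rides. What Uber and Airbnb sell you is taking something away, which sounds a little odd. They take away the friction of finding a place to stay. They take away the friction of finding a driver.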

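Friction and Transaction Costs

Ramesh Johari: In economics, we call these things transaction costs. When you take an introductory economics class, you learn about markets and how supply meets demand to produce prices. It isn’t until intermediate economics that you learn markets are not always efficient. One reason they’re not always efficient is that these frictions lead to what are called market failures. What’s a market failure? Lenny wants to get from Palo Alto to Burlingame and he can’t. Why not? He can’t find anyone to drive him. Why doesn’t he just call someone? Who should he call? Who are those people? Do they exist? Are they willing to drive him right now, at 10:00 AM on a Friday? Are they willing to take him where he’s going?

When I’m traveling and looking for a place to stay, one friction is: who is willing to give me a room? In principle there are people who would let me stay in their living room, but I don’t know who they are.

So those are the frictions, and what a marketplace sells you is taking those frictions away. That’s what you pay for. It’s an important observation, because it means the marketplace’s customers are not only the people buying rides or booking listings. Hosts are Airbnb’s customers too, and drivers are Uber’s customers too. Both sides of the marketplace are customers of the platform, and both sides rely on it to help remove those frictions. Just as you need a place to stay or a ride, the driver is on Uber because they want to earn money by giving rides, and the host is on Airbnb because they want to earn money by renting out their listing.

I think this idea of making money by taking away transaction costs is so fundamental to marketplaces, and yet it’s often misunderstood. When you’re an entrepreneur starting a marketplace, or thinking about your business model, I think you can drift a long way off course if you forget that this is your fundamental value proposition.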

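The Role of Data Science in Marketplaces

Then you asked about the role of data, and more broadly data science, in marketplaces. It’s an interesting topic. My favorite examples are the Agora in ancient Greece or Trajan’s Market in Rome. When you look at pictures of these places, what stands out is the stone. These things are built of stone. You can’t move a stall from one place to another without moving a lot of rock.

Fast-forward to 2023, and technology underpins almost every form of commerce. That means we can architect and re-architect marketplaces on the fly, and we do that all the time.

Those frictions get taken away precisely because of data and data science. So I want to highlight three pieces of that, and I’d like people to picture them as a loop. But let’s take them one at a time.

The first is finding possible matches. This is the question, “I’m looking for a place to stay. Who is willing to have me at their place during a particular window of time?” And if I’m a host with a listing, who is willing to stay at my place when it’s available? That’s finding matches.

Then there’s making the match. Going back to my time at oDesk, one big problem we dealt with was: if I have multiple applicants for my job, who should I hire? Who should I interview? That’s a problem we face all the time in the real world, but now everything is remote. I’m not going to meet these people. All I have is the applications they submitted, and I need help screening. That’s helping make the match among the possible candidates.

Finally, we’ve made a match. So what does the match tell us? If you stay somewhere on Airbnb, you learn something about the host and something about the listing. The host learns something about you too. All of that information should be fed back into the marketplace. That’s what we mean by rating systems and feedback systems, and it even includes passive data collection. Did you leave before the end of your booking? That might mean something didn’t go the way you expected. That’s passive data collection. You left a five-star review? That’s active data collection.

Collect all that information, and then what? It lets us do a better job of finding potential matches and making potential matches in the future. Everything I just described, finding potential matches, making matches, learning from those matches, and then cycling back around, is data science in a marketplace.

I think any marketplace in any vertical you can think of has to deal with these three problems, and relies on algorithms and data science to help solve them. And in turn, I think that’s really the foundation for taking those frictions away.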

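The Most Common Marketplace Failure Modes

Lenny: A lot of founders try to start marketplace businesses, looking for marketplace opportunities in spaces that don’t have one yet. And certain kinds of marketplaces just don’t seem to work in certain spaces; there are failure modes that come up again and again. I jotted down a few thoughts while you were talking. Cleaning, for example: cleaning as a marketplace never seems to have worked. Car washing is another classic failure. Marketplaces for getting all kinds of tasks done on demand often don’t seem to work either.

So this may be too big a question, but I’m curious whether you have any thoughts. When someone wants to start a marketplace, or is thinking about starting one, what do you see as the most common flaw that tells you this marketplace probably won’t work?

Ramesh Johari: That’s a great question, and I want to say a couple of things as a preamble. First, I’ve worked with a lot of different marketplace companies, but for anything that touches on more sensitive material, I probably won’t name specific companies on the podcast.

The other, more important point is that I’m a professor at Stanford. There’s a reason I’m not an entrepreneur who has successfully scaled a marketplace: I probably haven’t unlocked the key answer behind the question you’re asking. That said, I do have some thoughts.

The most important one is this: in talking with people who want to start what they think of as a marketplace, I find they spend too much time thinking of themselves as a marketplace before they actually are one. In my view, that’s the biggest failure mode.

You mentioned a specific example, cleaning. I’ve wondered about that too, right? Is there something special about the cleaning industry itself? Possibly. I wouldn’t claim to be an expert on the microeconomics of cleaning. But usually that’s not the reason. The reason is that I assumed from the start that I was building a marketplace, and the real world doesn’t work that way. Let me tell you a little story I really like, which is UrbanSitter.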

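Lessons from UrbanSitter

First of all, UrbanSitter is a marketplace for babysitting. We could talk about its whole evolution, but I think the most interesting part is its early days. What I find interesting is how I came to know about the platform: we were looking around for help ourselves at the time. Then I found this new platform, and the clever thing about it was this. Back then, before Venmo existed, when you hired a babysitter you had to have cash on hand. The babysitters, usually high school students or the like, wanted to be paid on the spot when the job was done. They weren’t going to take an IOU and a promise to mail a check the next day.

And the problem was that you often didn’t have cash on hand. They didn’t take credit cards either; they’re high school students. That created a huge friction. What UrbanSitter did was this: we accept credit card payments for babysitters. That simple, right?

From there, they used the Facebook social network between parents and babysitters to build trusted referrals. If my babysitter isn’t available, say, I can get to know other sitters in that sitter’s Facebook network. Once they had solved that first problem and gotten some liquidity on the platform, they could start working on the frictions I mentioned earlier. How do you help people find potential matches? How do you help people make those matches? But you can’t do that when your platform doesn’t have liquidity yet. If you tell someone, “Hey, I can really help you find all the drivers,” and you’ve got three drivers on your platform, that’s not the friction you’re solving.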

从支付摩擦到匹配价值

在他们的例子中，随着业务的发展，他们实际上调整了变现模式——不再专门针对”允许你用信用卡付款”这个摩擦来收费，而是转而针对你如何面试和联系保姆来收费。他们为此设计了两档方案：一个是按次付费的菜单式选项，另一个是更偏订阅制的选项。但关键在于，无论哪种方式，你现在付费买到的是找到潜在保姆的能力，而不是用信用卡付款。那已经不是核心价值了。

那么这里的道理是什么？道理是：平台市场业务从来不是从平台市场业务开始的。因为我们所说的”平台市场业务”，是在规模化之后消除双方找到彼此的摩擦。但当你起步的时候，你没有那个规模。

所以当你起步的时候，你最好想清楚一个问题：“在一个双方都没有规模化流动性的世界里，我的价值主张是什么？”这个问题的答案因情况而异，对不同的业务意味着不同的事情。以我曾任职的 oDesk 为例，最初的核心是：远程工作是一件很奇怪的事，因为你得想办法确认那个不在你身边的人确实在按照你的要求做事。所以 oDesk 最初的价值主张是为工作者提供工具，用屏幕截图和各种追踪手段来证明他们确实工作了所声称的时长，确实做了所声称的工作。

然后以此为基础，为双方提供保障。工作者可以说：“嘿，我确实做了我说的工作量，所以我应该拿到报酬。”雇主可以说：“我看到你确实做了你说的工作量，所以我放心我付的钱是值得的。”这就是最初的价值主张：解决远程场景下的信任问题。

到了那个阶段，流动性不是核心问题。核心问题是：在这个领域里的人面临的什么问题，是我在还不是规模化平台市场的时候就能解决的？所以回到清洁行业，我可以说说我个人的经验，但除此之外，我认为这就是我的思路。当你建设平台市场的时候，几乎从来不是在建设平台市场。

Lenny： 这跟我总是给平台市场创始人提的建议非常相似——你 90% 的问题都不是平台市场特有的问题。它们跟任何创业公司面临的问题一样：怎么增长？你需要做的事情是一样的。

Ramesh Johari： 你刚才说的是“这是你告诉平台市场创始人的话”。我自己在思考这个问题时，一直在强调的一点是：也许我们根本不应该谈论“平台市场创始人”这个概念，存在的只是创始人。我认为每一位创业者……换一种方式想，几乎所有商业活动都有被在线交易平台颠覆的潜力。

如果事实如此，那就意味着任何创始人其实都是平台市场创始人。是否要搭建平台，是他们成长起来之后才会做的选择。举个近期很火的例子，任何头脑清醒的人最初都不会把 OpenAI 看作一个平台市场，但 OpenAI 现在就是一个平台市场。他们可能不想这么称呼自己，但他们有了插件。插件正在涌入那个平台，用户在试用各种插件，而要找到你需要的插件并不容易。这现在真的是一个双边的生态了：有插件创建者，有用户。不管他们信不信，他们就是一个平台市场。

每个创始人都是平台市场创始人

所以我认为换一种思路是：每个创始人都是平台市场创始人。是否要成为那个平台，是他们自己的选择。这是第一点。第二点，正因为如此，我发现创始人面临的另一个挑战是：你不想过早地锁定自己的未来。我的意思是，你在早期建立信任，建立自己是什么类型生意的认知。如果你相信未来会走向平台，走向平台市场，那么你在早期做出的一些选择可能会在后来束缚你的手脚。

一个很好的例子是 oDesk 起步时，因为他们提供的工具是用于持续监控工作的，所以很自然地就会说：“我们从平台上流过的每一笔钱中抽一个固定比例。“这在初期一切运转良好，但当你成熟之后，问题就来了。工作者和雇主之间的一些关系持续很长时间，而此时大部分价值已经不再来自于他们能互相追踪——因为信任已经建立了——而是来自于他们找到了彼此，因为他们通过 oDesk 建立了那段关系。

这意味着时间越久，平台在这段关系中增加的价值就越少，但你仍在抽取所有流水的 10%。这会导致什么？大多数平台市场 CEO 都很熟悉的一个词：去中介化（disintermediation）。也就是说，你原本在两方之间充当中介，而去中介化的意思基本上就是他们说：“嘿，我们不再需要你了。”

我最喜欢的一个例子是：有一次我们请了一个 Thumbtack 上的工人来送 IKEA 的家具，我太太说：“太感谢了，你真靠谱。”他说：“嘿，太好了。这是我的名片。以后还需要我，直接打背面的电话就行。”就这样。Thumbtack 获得了一次线索生成，然后我们就不再需要这个平台了。

oDesk 的定价困境

Ramesh Johari： 我认为 oDesk 面临的这个困境意味着，在与 Elance 合并成为 Upwork 之后，他们不得不重新思考：“好，我们想用什么变现策略？怎么应对长期关系可能导致去中介化这个问题？这是否意味着我们需要一个能将此因素考虑在内的定价方案？“所以在早期对某种定价方案、某种变现模式做出承诺，可能会在你后来意识到自己其实是一个平台时，严重束缚你的手脚。

Lenny： 我非常喜欢这个观点。这让我想到了 Substack，它最初只是一个面向 Newsletter 作者的平台。后来他们思考：“怎么让这个平台更有价值？“因为他们从每位作者的收入中抽取分成。于是他们大力投入，帮助为作者导流——比如对我来说就是这样。到目前为止，我超过 80% 的订阅者来自 Substack 的网络。所以他们恰好按照你描述的方式构建了平台市场的元素——他们发现了一个痛点：作者需要更多订阅者，怎么帮助他们获取订阅者？于是他们想出了各种创造需求的方法。

扩大边界还是打破契约

Ramesh Johari： 这是一个非常正面的案例，他们通过赋能网络真正拓展了业务的边界。但每出现这样一个正面案例，不幸的是都会伴随许多反面案例。我觉得一个非常痛切的例子是 eBay，随着平台引入越来越多细粒度的费用来源，它与卖家社区之间产生了大量矛盾。

关于 eBay，目前已经有非常多的论述，讨论它的历史、如何走到今天这一步。但我希望大家思考一件很简单的事：eBay 上那些伴随平台成长、一路发展起来的卖家，对自己在平台上的生活状态已经形成了特定的预期。这完全可以理解，因为很多卖家把生计建立在这个平台上，那就是他们的全部事业。

所以当你突然介入并说”我要彻底改变你的商业模式赖以运行的游戏规则”时，从这些卖家的角度来看，这就是对长期形成的隐性契约的撕毁。所以我很喜欢 Substack 的例子，因为那就像是”嘿，让我们强化彼此之间的契约”。但我认为，每有一个这样的正面案例，就会有一个像 eBay 这样的警醒——你可能会把自己困住。

给平台市场创始人的建议

Lenny： 让我来给这个我认为非常重要的观点做个收束。很多听众可能正在想：“我是一个平台市场的创始人，我正在搭建一个平台市场。“听完这些之后可能会想：“糟了，也许我需要重新思考自己在做什么。“对这样的人，你的建议是什么？是聚焦于摩擦点，然后平台市场可能是解决方案，托管型平台市场也可能是解决方案，又或者由你控制供给端？这就是你的建议吗，还是你会给一个说”我正在搭建平台市场”的人什么样的建议？他们应该如何重新审视自己的思路？

Ramesh Johari： 让我们回到平台市场降低摩擦这个概念。对那些声称自己在做平台市场业务、或自称平台市场创始人的人，我给的试金石是：你的平台双方是否都实现了我所说的“规模化流动性”（scaled liquidity）？什么是规模化流动性？

用通俗的话说——顺便说一句，我是数据科学家，我很喜欢用定量方式思考这些问题。但从根本上讲，如果连直觉检验都通不过，那你就没必要继续做数据科学了。直觉检验是这样的：规模化流动性问的是，“我的平台上有大量买家和大量卖家吗？还是说我只有其中一方？又或者两方都没有？“如果你两方都没有，你想怎么称呼自己都可以，但此时此刻，你不是平台市场。如果你有一方，恭喜你，你在市场的一侧赢了。这时你面临一个选择。你可以顺势在你做得好的那一侧继续增长。你获得了大量用户、大量买家？很好，顺势而为，获取更多买家。这是一个选项。不是平台市场并不丢人。把业务做大就是做大，如果这条路行得通，就这么做。

用一方带动另一方

如果你决定要成为平台市场，那么当你拥有大量买家但卖家不多，或大量卖家但买家不多时，你面临的选择就是：如何利用已经规模化的一方来吸引另一方？我们可以更深入地讨论这一点，但有很多方法可以实现这种撬动。以 Uber 为例，他们进入一个新城市时，在 Uber Black 还是唯一服务的年代，他们常见的做法是在活动、派对等场合发放免费乘车券，送人回家。这就相当于在说：“我们在补贴这座城市的司机——这是我们已经规模化的一侧。现在我们要利用这个被补贴的司机群体来吸引乘客。”

这就是如何让飞轮转起来的方式。同样，很多人写过如何利用一侧的规模化流动性来吸引另一侧。

如果双方都还没有，不要焦虑。先别操心做平台市场的事。先专注于把一方做大。在那个阶段，你的视野会完全打开——大量创业顾问的建议都适用于此。他们的建议不是关于如何做一个平台市场，而是关于如何做一个创业公司。

我想说的是，这时候你得放下自我。向他人阐述你未来想成为平台或平台市场的愿景，这完全没问题。正如我所说，在现代技术驱动的经济中，几乎每家企业在某个阶段都会有这个选项，所以当你告诉顾问或投资者这一点时，你说的并不是什么别人不知道的事情。但我确实认为，在起步阶段你需要足够谦逊，认识到如果双方都还没有规模化，谈论平台市场毫无意义。

Lenny： 那这就变成了一个商业模式和单位经济的问题——我能做一个 DoorDash，但不以平台市场的形式吗？我可以直接雇佣一批配送人员？走另一条路是否可行？

市场与企业

Ramesh Johari： 对，这是一个很好的观点。你提到的这个问题，在某种程度上其实牵涉到市场一侧应该用员工还是合同工、自由职业者的问题。

这实际上是经济学中一个相当古老的问题。我们通常的讨论方式是区分“市场”和“企业”。经济学中有一个很有趣的谜题，Ronald Coase 是思考过这个问题的著名经济学家之一：“如果市场这么高效，为什么我们需要企业？如果市场能高效地将劳动力与需要完成的工作进行匹配，为什么还需要企业？”这正是最早认识到交易成本真实存在的观点之一。而企业解决的正是这个问题。

平台市场的劳动力管理：从员工到合同工

Ramesh Johari： 我很赞同你说的，因为它认识到一点：对于你所面临的那些摩擦，最好的解决方案未必是搭建一个平台市场，反而可能是采用严格管控劳动力的模式。一个很好的例子是 Stitch Fix，我认为 Stitch Fix 早期让人感到惊艳的一点，就是用户与造型师之间的体验。

Lenny： 顺便说一句，我是他们很满意的客户。我觉得……

Ramesh Johari： 对，我觉得那体验之所以出色，是因为你感觉有一个人真的在了解你，而且这是一种关系，不像每次回去都像重新接洽一个自由职业者那样。

另一个我想举的例子是几乎所有医疗健康类平台。比如物理治疗，如果你每次登录一个物理治疗平台，都被随机匹配到当时恰好有空的治疗师，那就很奇怪了。所以我认为这种关系需要一定的筛选与维护。这是否意味着必须采用正式员工？也许不一定。但这确实意味着你必须认真思考——正如你刚才提到的——你的劳动力池的筛选机制本质上是什么样的。

数据在平台市场中的力量

Lenny： 太好了。好，让我们回到你一开始提到的观点——数据的重要性以及数据在提升平台市场效率和效果方面的巨大威力。假设你有一位数据科学家、数据分析师，或者某个人在帮你优化你的平台市场，你通常会发现数据人员能帮你在哪些方面找到最大的杠杆和机会？

Ramesh Johari： 这个问题非常好，对吧？因为我觉得可以从很多不同角度来回答。一个比较基础的问题是：这个人到底应该做什么？我打算稍微回避一下这个问题，我会举一些他们可以做什么的例子，但我觉得具体做什么在很大程度上取决于场景。

比如，在网约车或生鲜配送平台市场中，定价意味着你实际要为那趟车付多少钱，或者为那次配送付多少钱。这就是你下单那一刻实际设定的价格。顺便澄清一下，如果你在 DoorDash 上下单，我不是说餐厅菜品的价格，而是你付给 DoorDash 的那笔费用——有没有附加费，是否因为高峰时段加价之类的，对吧？

但在平台本身不设定价格的平台市场中，这就不是一个问题了。比如在 Airbnb，其实是房东在为自己房源定价。

回答你的问题，一种角度是：如果我处在 Uber、Lyft、DoorDash 这样的公司，我希望有优秀的数据科学家来研究定价问题，因为这看起来应该高度依赖于我平台市场里供给和需求的实时状态。这是一种类型的回答——我是否需要数据科学家来做定价？我是否需要数据科学家来做搜索？为什么是搜索？因为也许在我的平台市场中，大海捞针才是最大的、摩擦最高的问题，所以我可能需要更多数据科学家来研究搜索。

这就是我要回避的部分。我想更专注于一个完全不同的角度，一个关于数据科学家到底在做什么的更偏向哲学层面的观点。

机器学习模型：预测与决策

在如今很多公司中，尤其是大型公司，你让数据科学家做的主要事情之一就是构建所谓的机器学习模型（machine learning model）。而机器学习模型本身对不同人就已经意味着很多不同的东西了。我想聚焦在一个非常具体的点上：你让他们预测某个东西。

我加入 oDesk 时是 2012 年。关于我有一个有趣的事情——我加入 oDesk 之前有大约十年的学术生涯，做的就是建立各种事物的数学模型。在那之前我并不算一个真正的数据科学家。我原本以为到了工业界会有人告诉我：“看看数据有多重要。“而我的眼界确实被打开了。

我最初被要求思考的问题之一是：好，有人来到 oDesk，发布了一个工作，工人申请这个工作——预测哪些工人最有可能被录用。就是这样一个很窄的问题。为什么这是一个好问题？因为我们现在有一整套强大的工具可以精确地解决这类问题。怎么做？取大量历史数据——过去的工作、过去的申请人、过去的录用决定——然后把这些丢给那些庞大的黑箱算法：“来吧，用这些申请人，尽你所能预测谁会被这份工作录用。“然后我们用数据来测试这些算法表现如何。这基本上就是 30 秒讲完的机器学习。所以我们在这个问题上工作，很好。

然后我稍微抬头想了一下：“我们为什么要做这个？这东西要用来做什么？“结果发现这类东西之所以重要，是因为它们被用来做决策。那你会做什么样的决策？一种做法是说：“如果我能预测谁最可能被录用，那我就根据这个来排序，这就成了一个很好的匹配算法——一个在雇主筛选、决定面试谁、录用谁时对申请人进行排序和筛选的好方法。“听起来很自然。

然后你再仔细想想，对我来说，这真的是一个让我非常想让大家理解的事情——这就是为什么在商业中那些帮助我们理解和运用数据的人类环节如此关键。

如果你仔细想想，你会意识到那个算法真正在做的事情，其实就是在历史数据中捕捉模式。所以，是的，这个人很可能被录用。但我们真正想要的其实是别的东西——我们试图通过排名来创造价值。

再举一个类似的例子。当你是一位营销经理，你有一个很厉害的数据科学团队为你构建了一个长期价值、即终身价值（lifetime value, LTV）模型，你把最高价值的促销发给 LTV 最高的客户，没有人会为此找你麻烦，对吧？谁会怪你呢？因为你说：“这个人很有价值，我给他发了这个促销。“在你的月度报告里这么写，没人会为难你。

但这种思维方式的问题在于：预测他们的终身价值其实并不是真正的问题。真正的问题是：因为我发了这个促销，他们会在我的平台上多花多少钱？

这是完全不同的事情。它是一个差值，而不是一个绝对值。我关心的不是他们的绝对 LTV，我真正关心的是因为我发了这个促销而带来的 LTV 的差值。

当你从这个角度来看，你就会意识到可能发生的情况：基于良好的预测来捕捉模式——即通过预测找到那些高 LTV 的人——与做出好的决策是完全不同的。好的决策是说我发送促销后带来的 LTV 差值会更高。
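上面关于“绝对 LTV”与“LTV 差值”的区别，可以用一个极简的 Python 草图来说明。其中的客户、数值以及 `Customer`、`pick_by_ltv`、`pick_by_uplift` 等名称全部是为演示而虚构的假设，并非访谈中提到的任何真实模型；现实中单个客户的差值无法直接观测，通常需要靠实验来估计。

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    ltv_without_promo: float  # 不发促销时的预期 LTV（虚构数值）
    ltv_with_promo: float     # 发促销后的预期 LTV（虚构数值）

    @property
    def uplift(self) -> float:
        # 促销带来的 LTV 差值，这才是决策真正关心的量
        return self.ltv_with_promo - self.ltv_without_promo


def pick_by_ltv(customers, k):
    """“预测”思路：把促销发给绝对 LTV 最高的 k 个人。"""
    return sorted(customers, key=lambda c: c.ltv_without_promo, reverse=True)[:k]


def pick_by_uplift(customers, k):
    """“决策”思路：把促销发给差值最大的 k 个人。"""
    return sorted(customers, key=lambda c: c.uplift, reverse=True)[:k]


def total_uplift(chosen):
    """所选客户因促销而增加的 LTV 总和。"""
    return sum(c.uplift for c in chosen)


customers = [
    Customer("A", 1000.0, 1005.0),  # 高价值，但促销几乎不改变其行为
    Customer("B", 800.0, 810.0),
    Customer("C", 200.0, 260.0),
    Customer("D", 100.0, 180.0),    # 低价值，但促销影响很大
]

by_ltv = pick_by_ltv(customers, 2)
by_uplift = pick_by_uplift(customers, 2)
print([c.name for c in by_ltv], total_uplift(by_ltv))
print([c.name for c in by_uplift], total_uplift(by_uplift))
```

在这组虚构数据里，按 LTV 排序会选中 A 和 B，总增量只有 15；按差值排序则选中 D 和 C，总增量为 140。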

预测与决策的区别

Ramesh Johari： 我非常喜欢这个例子，因为我曾在斯坦福教过一门类似高管培训的课程。教室里坐满了来自同一家公司的高管，其中一位是首席营销官（CMO）。我就问了这样一个问题：“好吧，假设你有一个很好的 LTV 模型，你会把促销发给谁？”大家一致回答：“当然是 LTV 最高的人。”而 CMO 就坐在那里，所以情况有点微妙，不太好当面反驳。

我想先说明一点，出于品牌声誉的考虑，你可能确实会这么做。我不是要否定这一点。但仅就这个狭义的观点而言——预测是捕捉模式，而决策是要思考这些差值。

那么，为什么这很重要？因为我们在高中就学过：相关不等于因果。这句话人人都听过。它跟这件事有什么关系呢？当我们教人构建机器学习模型时，我们是在要求他们做预测，找相关性。预测本质上就是关于相关性的。但当我们要求人们做决策时，我们是在要求他们思考因果。“如果我做出这个决策，我是否真的能增加企业的净价值？我发送促销后，是否提高了这个人在我平台上消费更多的可能性？”

因此，对于数据科学家应该做什么，我有一个非常强烈的看法，那就是——无论他们是谁，哪怕是那个正在埋头构建招聘预测模型的人——都要让他们在脑海中始终牢记，自己的最终目标是帮助企业做决策。因果关系和相关关系之间的区别至关重要。我们可以进一步讨论这在日常工作中如何体现，但至少作为一个起点，你必须首先认识到：预测和决策不是一回事。

Lenny： 所以这里的启示是，作为数据团队和数据科学家，你的职责是帮助企业做预测——不对，是帮助企业做决策。你能否再举几个例子，说明数据团队经常应该做出、并利用数据来辅助的决策是什么样的？

因果推断与平台市场数据科学飞轮

Ramesh Johari： 也许更好的思考框架是——用学术界的术语来说——因果推断（causal inference）。也就是说，我们要从机器学习转向因果推断。让我们结合之前谈到的平台市场数据科学飞轮——寻找匹配、促成匹配、评估匹配——来分别看几个应用场景。

先说寻找匹配。正如你所说，核心环节是搜索和推荐，而这两者都依赖排序。我需要对搜索结果进行排序。假设我在 Airbnb 上做一次搜索，把不同的房源按顺序排列出来。在某种程度上，我确实是在试图预测——用户最喜欢什么。

但我觉得这里有一个重要的区别需要考虑：当我们要比较两种不同的排序算法时，这才是真正在做的决策。

在比较两种排序算法时，我不希望仅仅看它们多好地复现了人们过去的选择。我真正要评估的是：在我的市场中，哪一种能带来更好或更多的匹配？

以 Airbnb 为例，最核心的业务指标是什么？预订量和收入。所以你要问一个非常基本的问题：如果我用 Lenny 昨晚开发的排序算法，对比 Ramesh 上周开发的排序算法，Lenny 的算法是否能带来更多预订？

用这样直白的方式表述非常重要，因为这个问题和另一个问题截然不同——后者是：Lenny 的排序算法是否比 Ramesh 的算法更好地预测了过去两年人们的预订行为？这两件事完全不在一个层面上。
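“哪种排序算法能带来更多预订”这个问题，一种常见的回答方式是做线上 A/B 实验并比较两组的预订率。下面是一个只用标准库的草图，采用双比例 z 检验（正态近似）；函数名 `booking_rate_z_test` 和示例数字都是假设，访谈并未说明任何公司实际使用的检验方法。

```python
import math


def booking_rate_z_test(bookings_a, users_a, bookings_b, users_b):
    """比较两组预订率，返回 (rate_a, rate_b, z, 双侧 p 值)。"""
    rate_a = bookings_a / users_a
    rate_b = bookings_b / users_b
    # 原假设下两组预订率相同，用合并比例估计标准误
    pooled = (bookings_a + bookings_b) / (users_a + users_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / se
    # 标准正态分布的双侧尾部概率
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# 虚构数据：A 组使用 Ramesh 的排序，B 组使用 Lenny 的排序
rate_a, rate_b, z, p = booking_rate_z_test(100, 1000, 130, 1000)
print(f"A={rate_a:.3f} B={rate_b:.3f} z={z:.2f} p={p:.3f}")
```

注意它比较的是实验期间实际发生的预订，而不是算法对历史预订的拟合程度。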

评估匹配质量

然后我们谈到在促成匹配环节的排序问题，我想招聘的例子就是在那里出现的。因为归根结底，虽然我们可以用预测算法来对候选人进行排序，但这并不是最重要的。

有趣的是，真正重要的问题实际上是评估所促成匹配的质量。而我们可以通过飞轮的下一步来做到这一点——我们会问：客户给那个自由职业者的评分如何？他们是否再次雇佣了那个自由职业者？所以，你比较两种算法的标准，不是它们复现过去的能力，而是它们在未来促成匹配的能力——而这种匹配可以客观地评估：“我提升了业务价值，我确实通过这种方式做出了更好的匹配。“至于评分系统，类似的现象也值得深入讨论。

[广告部分已跳过]

实验的局限与局部最优

Lenny： 我确实很想聊聊评分系统，但你刚才说的一切都隐含了一个含义——即做实验与仅仅观察过去世界中发生的事情是不同的。你做了一个改变，运行实验，看它是否真的对预订量和收入产生了影响。这引出了我想问的一个问题：关于实验，一直存在一个经典的挑战，也是一个绕不过去的问题——如果你只是一味地做实验，你很容易陷入微优化，陷入局部最优，可能会因此错过大的机会和突破。

你花了很多时间思考实验这件事。你有什么心得或建议？对于人们担心过度优化而错失大机会的焦虑，或者如何在运行实验与探索重大新机会之间找到平衡，你有什么想法？

实验的局限与激励陷阱

Ramesh Johari： 首先，我非常高兴你提到了”实验”这个词。我之前一直在绕着它转，很高兴我们终于谈到了实验。因为我们最近这场对话的一个重要启示就是——如果不做实验之类的事情，你怎么可能知道那个差别呢？

所以我是一个实验的坚定信徒。我先把牌亮在桌上：我喜欢和那些认为实验对做出好决策很重要的公司合作。

话虽如此，我对你提出的这个问题也深有感触。那就是，你不可能靠实验解决所有问题。

我喜欢给人们的一个框架是：虽然你可以说自己是一家实验驱动的公司，有些公司甚至会宣称“我们真的什么都测”，但这里被忽略的一点是，“什么都测”这个说法本身就有很大的自由度。

因为归根结底，被构建和测试的东西，是通过组织结构做出的选择——数据科学家、产品经理、工程师，所有人都在参与。在运行实验之前，我们其实已经在思考：什么值得做实验？我们要提出什么设计方案？这是第一点。

第二点是，这些实验跑多久？这也是一个重大选择。有一篇论文我们可以稍后附上链接，我也想向你的听众推荐，它不是我的论文，是微软的一些人写的。我总体上认为，人们在这两个维度上都过于保守。其一，在一个把“什么都测”奉为圭臬的世界里，人们选择测试的东西，往往在设计上就偏渐进，我们稍后再回过头来解释为什么。其二，人们倾向于把实验跑很长时间，而且可能跑得比应有的更久。

那么我这两点到底是什么意思呢？让我觉得有意思的是，实验不是存在于真空中的。公司有激励机制。在那些全面拥抱实验的公司里，激励机制往往也和实验绑定在一起。因为如果你全面拥抱实验，一个常见的现象是：数据科学家会根据他们那个季度有多少次”胜利”来被评估。那怎么获得更多胜利呢？

很简单，做渐进式改动更容易获得胜利。而且因为”有胜利”很重要，你必须把实验跑足够长的时间来证明它们确实是胜利。你就不太愿意中途砍掉一个实验，去换一个风险更大的尝试。

所以这篇微软论文的核心教训是（论文题目叫《肥尾下的 A/B 测试》，A/B Testing with Fat Tails），通俗地说就是：在你经营的业务里，实验效果的分布可能是肥尾的，也就是说，可能存在少数回报极大的机会。这篇论文给出了几个启示：一是尝试更多不那么保守的东西，二是不一定所有实验都要跑那么久。核心就是提高速度。

“胜利”与”学习”的文化

Ramesh Johari： 所以你可以看到这里面有一个很大的激励问题。因为一种能接受大失败的文化，实际上需要重新定义什么叫”胜利”。这是我在 A/B 测试中最讨厌的事情之一，我得说。我理解它从何而来。但在历史上，科学中的实验从来不是关于赢者和输者的。如果 Ronald Fisher——实验设计之父——在做农业实验的时候谈论”赢家”，那会很奇怪。我认为他不是这样谈论事情的。实验一直是围绕假设驱动的。它的核心是：你学到了什么？

这个区分非常重要。因为它意味着，如果我尝试了一个大的、有风险的东西，它”失败”了——也就是没有赢——但只要我对它所检验的关于我业务的假设保持严谨的态度，我可能学到了很多东西。

举一个很好的例子。平台市场有一个重要功能是”徽章”（badging）。有时候，在搜索结果中给那些评分最高的个人资料加上徽章是非常重要的。

不展开太多细节，关于徽章的一个常见发现是：你以为会很棒的徽章，实际上效果很糟糕。原因之一是，徽章把太多注意力集中在了获得徽章的人身上，从没有徽章的人身上抽走了太多注意力。

如果我们仅仅用赢和输来评判，就会把孩子和洗澡水一起倒掉——你会说：“那个徽章的想法太差了。扔掉，不要徽章。”

但它传达给你的信息不是这样的。它教会你的是库存如何被重新分配、注意力如何通过徽章被重新引导。你真正应该思考的不是赢和输，而是学习。

所以学习本身就是一种胜利。我觉得这从根本上说是一个文化问题。你很难在顶层用金钱来衡量一个数据科学家做的实验”失败了但学到了东西”。归根结底，我认为进入那样一种状态——实验做得更多，意味着你不把每个实验都跑那么久，并愿意尝试那些可能失败得更惨的尾部实验——是一种文化选择。它的意思是：“我们允许这成为我们和数据科学家之间的社会契约的一部分”——甚至可以说是雇佣契约的一部分——而不是一切都只看你发了多少次、赢了多少次。

你当然可以说”我就想这样使用实验”，但如果你要这样做，那就不要做一家”我们什么都测”的公司。因为那样的话，你需要其他方式来处理那些能让整个公司学到很多东西、但可能无法纳入你为数据科学家设立的激励体系的大变革。

Superhost 徽章的故事

Lenny： 这个徽章的例子——我不知道你是不是在说 Airbnb 的案例——但我实际上在 Airbnb 主导了 Superhost 的上线，那就是 Airbnb 上终极的徽章。当时数据团队非常担心它会毁掉整个平台市场，因为他们已经构建了——正如你所描述的——一个非常精巧的排序算法，能够精确预测某位客人最可能预订哪些房源并获得成功。然后我们要在搜索结果中给一些房源扔上一个徽章。我们团队里的一位数据科学家说：“不行，我们不能这么做。这太疯狂了，会把一切都毁掉的。”

我们还是做了。我们跑了一个实验，给一部分人展示徽章，另一部分人不展示。结果——完全没有影响。Superhost 本身对业务没有任何可观测的影响，至少最初是这样的。这感觉也挺五味杂陈的，因为你会想”我们做这个到底是为什么？“不过确实有一个小小的好处——房东感觉更好了，他们对作为房东的满意度提高了。但我确实完整经历了你所描述的那种情况，所以觉得挺有意思的。

Ramesh Johari： 不必深入探讨 Superhost 的数据科学细节，我觉得你刚才说的其实包含了很多层意思。我想补充的一点是，我非常坚信，在处理实验结果时，你不应该把你对业务的理解抛到脑后。部分原因在于，数据科学本质上是一个证据累积的过程，绝不是孤立地看待某一个发现。所以另一个我认为常见的陷阱是，有时候人们会说：“好，我的 A/B 测试达到统计显著性了，绿灯放行，全面推广。”

我记得你请过 Ronny Kohavi 上过你的节目，他也提出了类似的观点——证据是有不同层级的。仅仅有一个与你对业务的所有认知相悖的异常 A/B 测试结果，并不意味着你就以某种方式推翻了你的全部知识积累。这是问题的一个方面。

另一方面是，你并不总是能测量所有重要的东西，而这些东西对于形成完整的判断又是必需的。以 Superhost 为例，很难测量的是 Superhost 的长期影响。因为在短期内，Superhost 会造成库存的重新分配——必然会有赢家和输家。Superhost 的部分价值实际上在于，获得徽章的房东能在更长时间内被留存下来。认识到这个假设，实际上就暗示了实验可能需要运行多长时间，或者需要做哪些类型的数据分析。

而最终，如果你做不到这一点——你没法把实验跑足够长的时间，或者因为数据稀疏或缺乏数据而无法进行相应的数据分析——那么你自己带入的专业判断就很关键了。你对这件事有什么样的信念？

所以我喜欢告诉人们的做法是，我鼓励大家做“量化思考”而非“数据驱动”。什么意思呢？好吧，确实有些东西我们没法测量。但也许你的领导团队对 Superhost 的留存价值有不同的看法，他们各自的判断可能五花八门。

你可以在这些相互竞争的信念的语境下来处理实验结果。这几乎就像一个预测市场。然后开始问：“好吧，如果我们对业务的认知是这样，而实验数据告诉我们的又是那样，让我们把这两者放在一起来看——这是否足以让我们做出决定，仍然继续推进？“即使你可能做的那个短期测试结果是平的。

Lenny： 回过头来看，这恰好就是我对 Superhost 的看法。那是个好主意，我真的挺高兴的。我甚至无法想象没有 Superhost 的 Airbnb，尽管至少在最初阶段，没有任何证据表明它产生了任何影响。我猜他们后来又重新审视了这件事，也许确实发现了一些效果。但即使它真的没有影响，它就是让人觉得平台市场变得更好了。这对我来说是一个很大的启发——它不一定要总能驱动某个可测量的指标。有时候就是一种直觉：事情就应该是这样的。

Ramesh Johari： 你说的这种情况之所以会出现，原因之一是平台市场有点像打地鼠游戏。我的意思是，就 Superhost 这个具体场景来说，因为你把注意力重新导向了一些房东，代价是……你甚至不太能确定预订量是否真的会上升。也许你运气好，能多出一些预订。但你一开始就不太可能期待这种结果的其中一个原因是，Superhost 的数量是有限的。因为所有这些额外的关注，他们能多吸收多少预订呢？而你同时又把注意力从其他人身上移走了。在完全不做数据分析的情况下，我的先验判断就会是预订量应该是下降的。

有一个我非常喜欢的例子，来自我曾经合作过的一家公司。我们合作了一段时间，某个月我们看了一些数据，显示新入驻的供给侧用户体验相当差。大家说：“我们得对此做点什么。”

于是我们决定开发一些定制功能，把这些新用户导向市场上更有经验的对手方。很好。然后果然，很快这些指标就开始好转了。但接着我们又看了一下，“等一下，现在对方那些老用户的体验变差了。”

于是你就像被甩来甩去一样：“等等，我们得解决这个问题。“所以我们把他们匹配给更有经验的用户。然后一个月后你又发现：“等一下。“你的指标就这么不停地来回晃动。

这就是因为这里的打地鼠游戏本质上在于——平台市场管理的很大一部分工作就是在重新分配注意力和库存。有时候你运气好，真的能把蛋糕做大，让所有人都受益。但 Servaes Tholen——他在 Upwork 做过 CFO，后来去了 Thumbtack——我之前在那边认识了他，他来我们课上做客座讲座时说过一句话，我特别喜欢：“你必须认识到，运营平台市场时，许多最有影响力的变革都会创造赢家和输家。接受这些变革，就是要判断你所创造的赢家对你的业务而言，是否比你在过程中制造的输家更重要。“这是一个残酷的现实，因为没有人喜欢承认一个功能变更正在伤害你平台市场中的一部分人。但由于平台市场运作方式中根深蒂固的这个基本约束，我们选择做的很多事情以及它们所引发的资源再分配，并不一定能在短期内创造出可观测的、大幅扩张的收益。你往往是在下注——赌的就是你正在朝那个方向前进，部分通过你当下所做的再分配来实现。

所以我觉得 Superhost 的案例有趣之处在于，它部分指向了这样一个思考：你在短期内定义的目标是什么？你定义的度量指标是什么？它是否真正捕捉了这种权衡取舍的概念？

Lenny： 这是一个很好的思考方式。我想回到你之前分享的那个观点——也许你应该更快地运行实验，不要等到统计显著性，建立一种重视学习而非影响的文化。但在实践中这非常困难，因为人们是被影响力来衡量的。有绩效考核，有晋升评定，有”这个团队驱动了多少影响”——大家会去看他们的实验结果。你跟很多平台市场公司、很多不同类型的公司合作过。你有没有见过什么做法，能帮助公司转变思维并真正以这种方式运作，同时又能够认可成功——谁做得好、谁做得不好、哪个团队在驱动影响、哪个团队没有？

Ramesh Johari： 有意思的是，这恰好是我目前的一个活跃研究领域。我说的活跃研究领域是指，我非常关注我们通过设置奖励机制为数据科学所创造的激励。所以我认为有几件事可能会有帮助，不过它们可能不太直接——也许我不会直接回答你问的问题，因为我认为那是个很难的问题，对吧？我承认以影响来衡量是关键的。嗯，让我先从最直接的角度来回答。我认为这里有一个至关重要的文化问题。

数据科学家的定位与期望

Ramesh Johari： 我经常发现，我们的博士生毕业后去做了很好的数据科学家工作。从某种意义上说，他们在做的事情很出色，用到了非常精密的技术方法。但当我看他们所处理的问题时，往往是在业务的边缘地带，而不是更核心的地方。

这其实是一个文化问题。因为如果你仅仅以狭隘的影响力来衡量一个人，而周围所有人也都只看这个，那他们就很难去触及业务变革中那些创造性的、战略性的层面。

所以在文化层面，我认为领导者有责任对数据科学家有更高的期望。所谓更高的期望，是指不要只要求他们在报告中交付狭义定义的、统计上严谨的结果，而是要期望他们在过程中也谈谈对业务的认知和理解。这指向一个概念——“假设驱动”（hypothesis driven），这是比较技术化的说法。用更通俗的话来说，这意味着什么？

意味着测试不应该仅仅以赢家和输家来定义。每一个测试还应回答：我们能从业务流程、漏斗、房客偏好、房东偏好中学到什么？如果我们调整定价，能了解到他们的需求弹性吗？这些都是可以在实验文档（experiment doc）、上线文档（launch doc）中清晰表述的——你到底在测试什么假设？所以我认为，在文化上建立这样的规范很重要：学习是对话的一部分，而且是被明确期望的。

利用过去的学习：贝叶斯 A/B 测试

但另一方面，我想谈谈更偏向操作层面的做法——数据科学平台团队可以做什么？实验中一个有趣的现象是，我们实际上在丢弃过去的经验。这是因为我们分析实验的方式造成的：常用的统计方法——P 值、置信区间——都属于频率学派统计（frequentist statistics）的范畴。频率学派统计的核心思想，不过于技术化地说，就是让数据自己说话，不带入任何关于数据来源的先验信念。

但如果你在公司内部思考这件事，在一家做 A/B 测试的公司里，这其实很奇怪，对吧？因为我可能已经对这个完全相同的按钮、行动号召或颜色跑过一千次 A/B 测试了，现在却要完全忽略这些，只关注当前这一次实验。

所以有一些方法可以把过去的经验纳入考虑：在跑实验之前建立一个所谓的先验判断（prior belief），然后把实验数据与先验判断结合，得出一个结论：“综合过去和这次实验，它对未来的启示是什么？”这大致属于所谓贝叶斯 A/B 测试（Bayesian A/B testing）的范畴。

有趣的是，我认为这在文化层面也能起到帮助。虽然这是一个非常技术性的东西，但它能在文化上产生积极效果，因为它是在奖励人们为先验判断贡献信息。这时你可以说：“你那个失败的实验实际上更新了我们的先验判断。”这一点很重要，因为这样一来，你就改变了我们在所有未来实验中看待这个流程或定价方案的方式。

如果我能把你学到的东西编码到未来实验的分析中，就对你的业务其余部分产生了一个信息正外部性（positive externality）、正向的网络效应。A/B 测试的文化和激励，与将过去的学习纳入先验判断的能力之间，有着紧密的联系。
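把过去的实验编码成先验、再与新数据结合，最简单的形式之一是 Beta-二项共轭更新。下面的草图只为说明这个思路：函数名、先验参数和实验数字都是虚构的假设，真实的贝叶斯 A/B 测试平台通常要复杂得多。

```python
def beta_posterior(prior_alpha, prior_beta, conversions, visitors):
    """Beta 先验加上二项数据，得到 Beta 后验（共轭更新）。"""
    return prior_alpha + conversions, prior_beta + (visitors - conversions)


def beta_mean(alpha, beta):
    """Beta(alpha, beta) 分布的均值。"""
    return alpha / (alpha + beta)


# 假设：过去的同类实验大致相当于“1000 次访问、100 次转化”
informed_prior = (100.0, 900.0)
# 对照：几乎不带信息的均匀先验 Beta(1, 1)
flat_prior = (1.0, 1.0)

# 虚构的本次实验数据：50 次访问，10 次转化
post_informed = beta_posterior(*informed_prior, conversions=10, visitors=50)
post_flat = beta_posterior(*flat_prior, conversions=10, visitors=50)

print(post_informed, round(beta_mean(*post_informed), 3))
print(post_flat, round(beta_mean(*post_flat), 3))
```

同样一份小样本数据，在均匀先验下后验均值约为 0.212，而在带有历史信息的先验下只移动到约 0.105。过去的实验（包括“失败”的实验）正是通过先验影响了对新实验的解读。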

Lenny： 我很高兴你在这个领域做研究。等你完成研究、有了终极答案来改变大家的操作方式时，我们应该再邀请你回来。

Ramesh Johari： 教授的伟大之处就在于，我们永远不会完成任何事情，也永远没有终极答案。

Lenny： 天哪。

Ramesh Johari： 不过我会尽力的。

学习不是免费的

Lenny： 这触及了你之前跟我分享的一个非常有趣的概念——学习并不是免费的。人们以为自己可以学到一堆东西而不需要付出代价。我很想听听你谈谈这个观点意味着什么。

Ramesh Johari： 让我从一个故事开始，我非常喜欢这个故事，每年上课都会用。我曾经跟一个房地产平台交流，他们有一位营销数据科学经理，跟很多营销经理一样，负责在不同渠道之间分配广告支出。

到了年底他们发现，一方面团队做得很好，但另一方面这位经理私扣了一部分来访用户，没有向他们展示团队正在做的任何创新。

Lenny： 类似一个对照组（holdout group）？

Ramesh Johari： 没错，就是实验中所谓的对照组（holdout group）。但这个对照组是未经授权的——这不是正常的操作方式。你的广告预算给了你，你去分配就好了。所以到了年底，他们看了这个对照组的数据，说：“哇，这花了我们几百万美元，大致是那个量级。这可不是小数目。怎么回事？你当时在想什么？” 当然，那位经理的回答是：“我知道我花了你们那么多钱，但第一，现在你知道我的团队值多少了。第二，如果我不自己这么做，你永远也不会知道这个答案。”

为什么这个故事这么有力量？我觉得实验最有趣的地方在于：当你不知道答案时，把样本分配给所有选项——处理组和对照组——这似乎根本不是一个需要犹豫的问题。我有两种做事的方式，不知道哪个更好，当然要各给一些样本。但事后你回头看：“处理组更好。我们当时在想什么？为什么给对照组那么多样本？这说不通啊。” 这让我想起《宋飞正传》（Seinfeld）里的一个片段：吃完一顿丰盛的大餐后收到账单，大家盯着账单说：“我们现在又不饿了，怎么点了这么多菜？“道理是一样的。你现在知道处理组更好了——当初为什么在对照组上浪费那么多样本？

我认为这是一个非常有力的观察：你必须把自己放回到那个还没有答案的视角中去。在那一刻，你本质上是在对自己说——为了学到这个答案，值得付出代价。我们现在这样说，或者这个营销经理和对照组的故事，听起来似乎显而易见。但我认为在文化上并没有真正内化这个观念。我之所以说它没有被文化内化，是因为我们使用的”赢家和输家”这种语言。因为如果我们用这种语言，言下之意就是：当我们在 A/B 测试中跑了一个输家时，我们浪费了时间。如果我奖励你上线赢家，那我真正在告诉你的是——你在测试失败上花的时间全是浪费的。

当然，我不是说你想留下一批只会不断制造失败的数据科学家。这不是我的观点。

但我的观点是，这里存在一种断裂。一方面，我们都能看着那位营销经理的故事付之一笑；但另一方面，我们每天都在使用着强化同一主题的语言和流程，本质上是在告诉你：“如果你把样本浪费在那些最终没有成为赢家的东西上，那么这样做本身就是一种失败。”

所以，我真的认为”必须为学习付出代价”这个理念——这不仅是文化层面的问题，也是企业内部的教育问题。企业里汇聚了各种背景的人，并非每个人都来自数据科学或实验领域。而”学习是有成本的”这个观念，实际上并不自然。从人性角度看它不自然，从经营企业的角度看它也不自然。

Lenny： 我非常喜欢那个房地产平台的例子，那种损失非常直观、非常清晰——因为他们长时间没有对那个群体推出实验，所以遭受了损失。这是这个理念在实践中极好的一个案例。

你提到了星级评分。我知道你在评级系统设计上花了大量时间。抱歉，我不是特指星级评分，那只是其中一种实现方式，我想说的是评级系统整体。

那么，为了聚焦话题——假设一位平台市场创始人正在决定和设计他们如何做评分、评论等机制，你会给他们哪几条建议？有没有一个你可以推荐给他们作为标杆的平台市场，你会说”这家做得真的很好”？我知道这非常取决于具体平台市场的类型，但有没有哪个让你觉得”他们真的搞定了”？

评级系统的挑战与评分通胀

Ramesh Johari： 天哪，这问题太难了。我想先回答第二部分。我不觉得有谁真的搞定了这个问题。确实发生了很多创新，但从根本上说，我们仍然在使用与 eBay 和 Amazon 最初思考评级系统时相同的工具。

我们之所以还没搞定，部分原因在于系统中有很多动态机制会导致所谓的”评分通胀”——如果你观察平台市场中评分随时间的变化……我的一位同事 John Horton，他是 MIT 的教授，与 Upwork 有密切合作。我在 oDesk 的时候，他是那里的驻站经济学家，我们一起共事过。他写了几篇很好的论文，描述了这个经验现象：随着时间推移，你会看到中位数评分在不断膨胀，比如在 oDesk、Uber 等平台上。

原因有很多，但其中之一是互惠性问题。从你的角度来看，如果有人对你说”请给我留个好评价”，你的成本几乎为零。而且如果你还要继续和这个人打交道或互动，大多数人不想显得刻薄。所以这种情况就会发生。

但还有另一个层面，就是“规范化”效应。随着平台市场中的评分整体上升，标准也随之被重新校准，于是你就处于这样一种状态：“四星评价？我这是在坑这个人。”而在平台市场刚起步的时候，你可能并不会这么想。

所以，我们在研究中确实花了心思思考如何“重新规范”这些标签的含义。重新规范可以意味着，星级评分不再只是从差到优秀，而是让最高评分代表“超出预期”。你还可以更进一步，问：“这次体验与你之前给过高分的体验相比如何？”Airbnb 曾经有过类似的做法，他们会让你进行比较，或者问你关于预期的问题。

我觉得这非常有价值，因为对人们来说，说”还不错，但没有超出我的预期”或者”还不错，但肯定比不上我两个月前那次超棒的住宿”要容易得多，而说”我要给这个人扣分，只给四星”就要难得多。这是第一个问题。

评分平均的分布公平性问题

我想对任何平台市场创始人指出的另一点是，你需要非常小心”平均”这个概念，以及平均化意味着什么。因为许多平台市场的默认做法就是把人们得到的评分取平均。感觉很自然，对吧？Lenny 有五个评分，我来取个平均。

但这实际上对平台市场有一些相当重要的分配后果。“分配”是指谁赢谁输。因为如果你使用平均分，而且你在平台上已经非常成熟了——想象一下 Yelp 上有一万条评价的餐厅——下一条评价是什么完全无关紧要。无所谓。到了那个阶段，什么都动摇不了它。

但如果你是刚进入这个市场的新人，你的第一条评价是负面的，你可能就彻底完了。事实上，关于 eBay 的一些早期研究表明，如果你的第一条评分是负面的，可能立即导致你短期预期收入下降 8%，更不用说长期后果了。后续研究发现，这是退出平台的一个显著指标——仅仅因为现在很难找到活干。有些平台会采取一些措施，比如在你积累到一定数量的评价之前不显示你的评分。

但归根结底，平均化带来的这种分配公平性问题是非常显著的。我们最近写的一篇论文就是试图让平台开始思考这个问题。有趣的是，可以通过”先验判断”（prior）这个概念来解决这个问题。先验判断的基本思路是：如果有人进入平台市场，我不是简单地对他取平均，而是把他与一个先验信念放在一起取平均。那么这个先验信念的作用就是说：“是的，你得到了一个负面评价，但也许你只是运气不好。“而我的先验信念可能会把你的评分往上拉一点，让你仍然能和平台市场中的其他人并列，给你一个获得工作、获得订单等的机会。
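“与先验一起取平均”的一种简单做法，是把先验当作若干条虚拟评价并入平均值。下面是一个草图：`average_with_prior`、先验均值 4.5 和先验权重 5 都是为演示而设的假设，并不是访谈所提论文中的具体方法或参数。

```python
def plain_average(ratings):
    """直接取平均：许多平台市场的默认做法。"""
    return sum(ratings) / len(ratings)


def average_with_prior(ratings, prior_mean=4.5, prior_weight=5):
    """把先验视为 prior_weight 条评分为 prior_mean 的虚拟评价，再一起取平均。"""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


newcomer = [1]                      # 新人：第一条评价就是一星
veteran = [5] * 9000 + [1] * 1000   # 老用户：一万条评价（虚构）

for name, ratings in [("newcomer", newcomer), ("veteran", veteran)]:
    print(name, round(plain_average(ratings), 2), round(average_with_prior(ratings), 2))
```

在这组虚构数据里，新人的直接平均分是 1.0，加入先验后约为 3.92；老用户的分数几乎不变（约 4.6）。先验主要保护评价很少的新人，而对评价充足的老用户影响很小。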

所以我相当坚定地相信，评级系统设计中这种分配公平性的维度非常重要。我认为它被研究得远远不够。说句更概括的话——我认为评级系统整体都被研究得不够，这让我感到震惊。因为从那些集市广场和图拉真市场那样的传统市场到现在，在我看来最大的变化就是：我们现在能够看到匹配的结果如何。

所以，作为一个在平台市场领域工作的数据科学家，我觉得很不可思议——我们当中居然没有更多人花时间思考我们从匹配中学到了什么，这些评级系统在告诉我们什么，以及这对市场中谁赢谁输产生了什么影响，思考这些东西的社会影响。这是我很热衷的事情。

双盲评价的设计

Lenny： 我在 Airbnb 的时候也负责过一段时间的评价系统流程。我最引以为豪的成果之一是推出了我们所谓的”双盲评价”——在你留下自己的评价之前，你看不到对方的评价。初衷是创造更多诚实、更准确的评价。

结果发现，最大的影响是评价率大幅上升了。因为人们会收到这样的邮件：“Ramesh 给你留了一条评价。如果你想看到它，你也应该留下评价。“这真的提高了评价率，从而给了我们更多数据。这是一个非常有意思的实验。

Ramesh Johari： 评价系统的文献中有一个很棒的概念，叫做”沉默之声”（sound of silence），意思是那些没有被留下的评价中蕴含着大量信息。Berkeley 的教授 Steve Tadelis 曾和 eBay 的一些人合作发表过一篇非常好的论文，讨论他们所谓的”有效好评率”（effective percent positive）——不是仅对已留下的评价做归一化，而是把未留下的评价也纳入归一化分母。结果发现，这个指标对卖家后续表现的预测力要强得多。所以，“没有回应”这件事本身就包含了大量信息。你能从中获取更多这类数据，确实很棒。
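“有效好评率”与普通好评率的差别只在分母：前者把没有留下评价的交易也算进去。下面是一个草图，函数名和数字都是虚构的假设，只用来说明这一差别。

```python
def percent_positive(positive, negative):
    """普通好评率：只在已留下的评价中计算。"""
    return positive / (positive + negative)


def effective_percent_positive(positive, total_transactions):
    """有效好评率：分母是全部交易，包括没有留下评价的交易。"""
    return positive / total_transactions


# 虚构的卖家：100 笔交易，60 条好评，1 条差评，39 笔没有评价
positive, negative, total = 60, 1, 100

print(round(percent_positive(positive, negative), 3))
print(effective_percent_positive(positive, total))
```

这位虚构卖家的普通好评率约为 0.984，有效好评率只有 0.6；两者之间的差距就来自那 39 笔“沉默”的交易。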

Lenny： 不留评价可比留差评容易多了，对吧？什么都不做的代价更低。天哪，平台市场真是太迷人了。我能理解为什么创始人会想做平台市场创始人，因为这个领域实在太有趣了。而听到你说“不，你还不是平台市场，先想想你真正要解决的问题是什么，答案也许是平台市场”，这种思路可能会改变人们的想法。另外，我觉得我们触及的每个话题都能单独做一期播客。我知道很多东西我们都只是浅尝辄止。

我知道你得走了。在我们进入快问快答环节之前，还有什么你想特别强调的、想分享给那些正在做平台市场或考虑做平台市场的人的吗？

AI 与数据科学的未来

Ramesh Johari： 我想强调的一个高层次观点是——正如你所说，这个话题可以聊整整一期播客——我觉得人们总是想象大语言模型和 AI 驱动的数据科学会自动化掉工业界数据科学工作中的很大一部分。我认为这个视角可能是错的。在某种平凡的意义上，这是对的——我写代码比以前更容易了，做可视化比以前更容易了，搭建仪表盘更快了。所以在编程层面，我觉得在某种基本意义上确实如此。

但我相当坚信的一点是——我在这里教授数据科学，我的学生每周都要在所有作业中使用大语言模型和生成式 AI，所以我对这一点有非常切身的观察——AI 实际上为我们做的，是极大地扩展了我们可以思考问题的边界，我们可以提出的假设，我们可以测试的东西。它带来的是解释、想法和原则的天文数字般的爆发。

而我真正认为的是，这实际上给人类施加了更大的压力，而不是更小。我认为人类”人在回路中”与这些工具进行交互变得更加重要，以便驱动那个从海量可能性中筛选出关键因素的漏斗过程，这在各个层面都是如此。比如你在做一份数据科学分析，现在因为有了这些工具，你可以提出 10 种解释，甚至 100 种解释。你要把注意力集中在哪些上面？你要告诉其他人把注意力集中在哪些上面？再比如你做实验，过去测试一个营销活动可能只有 10 个创意素材，现在你有 1000 个创意素材。这可能彻底改变了做实验的含义。你现在到底在寻找什么？你怎么评估你找到的东西足够好了？

我认为这些问题没有得到足够的关注。人们在寻找那个能把人类彻底剔除出去的自动化工具。但据我目前所见——当然，谁知道呢？也许到 2024 年我会给你一个完全不同的答案。我不这么认为。但就目前而言，我看到的是人类实际上在高效的数据科学闭环中变得远比以前重要，而不是更不重要。

Lenny： 这是一个非常重要的观点。我觉得我们需要给这个播客加一个 AI 角落，每次都思考一下 AI 如何影响我们正在讨论的话题。

Ramesh Johari： 是的，我能想象，完全能想象。

Lenny： 好的，我们可能真的会开始这么做。Ramesh，说到这里，我们已经到了令人兴奋的快问快答环节。我有六个问题。让我们尽快过一遍，这样你就能去上课了。准备好了吗？

Ramesh Johari： 准备好了。

快问快答

Lenny： 好的。你有哪两三本最常推荐给别人的书？

Ramesh Johari： 说到书，我有一本最喜欢的，总是第一个推荐——《How to Lie with Statistics》。这是一本很小的书，作者是 Darrell Huff，1954 年出版的。对于任何层面上喜欢数据的人来说，这都是一本非常有趣的读物，很棒的书。

第二本我推荐给人们的，其实即使对非专业人士也同样适用——David Freedman 是 Berkeley 的一位统计学家，在 2000 年代初去世了。他的写作非常出色，能引导我们认真思考过程。他特别推崇他所谓的”皮鞋统计”（shoe leather statistics）——卷起袖子，脚踏实地，真正深入进去，真正努力理解你的数据。

他的文笔和讲解都非常出色。他有几本不同层次的书，我觉得人们都会喜欢读。最重要的是，我喜欢他如此强调对生成数据的过程进行深入取证和理解。而我经常发现，数据科学家甚至连数据样本都不看。

比如在 oDesk，这意味着你有没有去看实际的工作内容，有没有去看你的产品里到底在发生什么，然后再对它做数据科学？我认为这就是 Freedman 的洞见、Freedman 的信条，他的著作真的很棒。

最后一本我想提到的，跟数据科学什么的毫无关系。叫《Four Thousand Weeks》，作者是 Oliver Burkeman。我不是那种热衷于自助类书籍的人，但我真的很喜欢这本书。它有点斯多葛哲学的味道。但核心观点是，你在地球上大约只有 4000 周的时间。我妻子和我有个说法叫”无限队列”（infinite Q）——不管你觉得某一天完成了多少事，总有更多的事情会不断涌入。

他基本上说，认识到这一点反而是种解脱。因为一旦你认识到了，不管你做什么，你总是会有做不完的事。没必要因为事情太多而焦虑。仅仅是这种心态上的小转变，就把更多的注意力放到了人们通常担心的问题上：我应该把时间优先花在哪里？他有一种很好的方式来讲述这个道理，还有一些具体的经验法则来帮助管理这种思维方式。是的，我觉得这是一本很棒的书。

Lenny： 你最喜欢的近期电影或电视剧是什么？

Ramesh Johari： 我是一名攀岩爱好者，有一部电影我非常喜欢——《The Alpinist》。我知道很多人都看过《Free Solo》，但对于喜欢那个类型的人，我会推荐他们看《The Alpinist》。

我觉得攀岩是一项很有意思的运动，因为它有很强的心理层面。而那部电影在元层面也做得很好，让你反思一下：拍摄那些显然把自己置于如此危险境地的人，拍一部关于他们的电影意味着什么？所以我真的很喜欢。

电视剧方面，我们一直在看《Only Murders in the Building》，但我现在落下了好几集，所以就不多说了，因为我在努力避开任何剧透，我相信听众中也有人在做同样的事。不过这确实是 Hulu 上一部很棒的剧。

Lenny： 你面试候选人时最喜欢问的问题是什么？

Ramesh Johari： 我面试的人可能和你播客的大多数听众不太一样。不过话虽如此，有一个问题我很喜欢经常问，那就是……在学术界的面试中，不管是招研究生还是招教员，我们通常会问对方的计划。

我喜欢问的是：“好，现在想象一下，一切都顺利了——你面临的所有挑战都解决了，你的所有计划都实现了，一切都达到了你设想的上限。你想象一下，做到这一点之后会产生什么影响？谁会因此受到影响？这件事为什么意义重大？”

我发现这是一个非常有价值的问题，因为首先，很多人根本没有想过这个问题。我们太专注于短期，根本没想过”天哪，如果一切顺利，我做的事情会有多大的意义？“当然，创业者在这一点上通常比大多数人做得更好。

另一个我喜欢这个问题的原因是，在对话中你会发现，他们的视野会扩展一些——会触及到他们原本没想到的、会受到影响的其他领域。所以从两个角度来看，这都是一个很有揭示力的问题。这对我的行业很重要，但我直觉认为，这对你的一些听众可能也有用。

面试方式与 AI

Lenny： 相比我在科技公司采访的大多数嘉宾，这是一个非常独特的面试视角。

Ramesh Johari： 对，通常面试都是问编程题，对吧？不过我要说，2022 年 11 月之后我再也不问编程题了——自从我们有了 AI 来帮我们写代码。我认为这是一种超能力。

人生座右铭

Lenny： 你有没有什么喜欢对自己重复、分享给他人、在日常生活中觉得有用的人生座右铭？

Ramesh Johari： 我的工作很大程度上涉及和各种各样学生交流。这些学生后来有的成了数据科学家，有的成了创始人，很多进入了科技行业。所以从这个意义上说，我的建议可能是相关的。

我最常告诉人们的就是：慢下来。我发现，我们太相信速度是找到正确答案的方式，以至于我们根本不慢下来去建立对所做之事的有意义的心智模型。在我参与的研究项目中确实如此。当我和商业界的人交谈、问他们的……所谓心智模型，我的意思是：如果你在运营一个平台市场，你对用户关心什么有什么模型？什么让人留下、什么让人离开？什么让匹配成功、什么让匹配失败？所有这些东西会在你脑中形成一张路线图。而我觉得现在很多路线规划、很多执行、学术界很多论文写作，都变得越来越快节奏，代价是对你所构建之物的结构性特征缺乏更深入的思考。

所以不管是和我的学生，还是和工业界的人交流，我认为慢下来实际上是一种被低估的美德。

Lenny： 这和最近一位嘉宾分享的座右铭非常相似，我想是“慢即是快”，或者“保持平稳才能快”。

Ramesh Johari： 说得好，我很喜欢。也许我去跟研究生谈话时会借用一下。

斯坦福教授的真实体验

Lenny： 最后一个问题。你是斯坦福大学的教授，听起来非常酷。关于在斯坦福当教授——不管是特指斯坦福还是泛指——有什么让人意想不到的事情，好的坏的都行？

Ramesh Johari： 嗯，我们经历了一段艰难时期，大家可能都知道。斯坦福上了很多新闻，过去五年尤其如此，原因都不太光彩。

所以我不知道这算不算”惊喜”，但我觉得在斯坦福让我感到很有活力的一点是，这里从来没有人要求我出示资历证明。我的意思是，我之前在其他几所很好的学校待过，显然也在工业界和一些很棒的公司共事过。在很多地方容易形成的一种文化动态是：“在我跟你交谈之前，我得先知道你值不值得聊。把你的资历亮出来。你在哪里读的研究生？在哪里当教授？先介绍一下你自己。”

我来这里之后感到非常惊讶的一件事是，这种情况在任何层级都没有发生过。研究生们经常跟我说——你可以直接去校园另一头找某人，一上来就聊你的 X 怎么和我的 Y 结合，我们能一起做什么。作为教员，这种事更是家常便饭。就在前几天，我还在和一个人聊关于纳米加工实验设计平台市场的事情——这完全不在我的专业领域内，但我们的对话毫无障碍。我们讨论的是实质内容，而不是互相展示资历。

我真的认为部分原因在于斯坦福的独特之处——它在各个领域都没有短板。我们有很强的职业学院——法学院、商学院、医学院——有很强的工程学院，有很强的人文和社会科学。然后我通常还会加上天气——这话我是认真的，天气很重要。人们愿意走到任何地方去。我觉得这些因素共同营造了一种不需要对每个人查验资历的文化和环境。

我认为这意义非凡。这是我在其他地方没有发现的。如果有人想知道斯坦福内部是什么样，我觉得这是一个不太常被讨论的方面。这也是让在这里工作变得非常有趣的原因之一。

Lenny： 而且斯坦福的校园也梦幻极了，走在里面非常愉悦。这肯定也有帮助。Ramesh，我觉得我们让听众的大脑都在嗡嗡作响了。我想我们既催生了新的平台市场创始人，也可能说服了一些人他们其实不适合做平台市场创始人。所以也许我们净增了零个新创始人。最后两个问题：人们想联系你的话在网上哪里能找到你？听众怎样才能帮到你？

Ramesh Johari： 如果是对工业界方向更感兴趣的人，最简单的方式可能是 LinkedIn。你可以在那里给我发消息或加我好友。另外，因为我是学术界的，我也有自己的斯坦福主页，找到我的方式也很简单。

Ramesh Johari： 听众怎样才能帮到我？我觉得正在听这个节目的人能做的最重要的事情，就是把我们在节目中讨论的关于数据素养意味着什么的理念带回去、传播出去。我认为在提升数据素养方面，你能做的事情有很多。

最后再分享一个想法：就像 AI 能生成大量创意一样，AI 也会生成大量文字。而在数据科学领域，这实际上可能是致命的，因为你会得到更多的解释，而其中有些可能是多余的。

所以把这一点当作一个小故事来看，我认为这个世界需要的是人们在与各种工具以及彼此互动时具备数据素养。这是我最关心的事情。我所教授的课程、我所做的研究，都与这个主题相连，这也是让我感到兴奋的地方。我确实会定期与企业合作，所以如果有落在我们播客中讨论过的这些领域内的有趣机会，我总是很乐意倾听。

Lenny： 太棒了。我想我们在帮助人们提升数据素养方面已经有所推进。Ramesh，非常感谢你来到这里。

Ramesh Johari： 好的，非常感谢你，Lenny。

Lenny： 大家再见。

非常感谢你的收听。如果你觉得这期节目有价值，可以在 Apple Podcasts、Spotify 或你最喜欢的播客应用上订阅本节目。另外，请考虑给我们评分或留下评论，因为这真的能帮助其他听众发现这个播客。你可以在 lennyspodcast.com 找到所有往期节目或了解更多关于本节目的信息。下期再见。

术语表

原文	中文
Agora	集市广场
Airbnb	Airbnb（平台名，保留原文）
badging	徽章
Bayesian A/B testing	贝叶斯 A/B 测试
black box algorithms	黑箱算法
causal inference	因果推断
CMO	首席营销官（CMO）
credentialing	资历证明
Darrell Huff	Darrell Huff（人名，保留原文）
data literate / data literacy	数据素养
David Freedman	David Freedman（人名，保留原文）
disintermediation	去中介化
distributional fairness	分配公平性
DoorDash	DoorDash（平台名，保留原文）
double-blind reviews	双盲评价
effective percent positive	有效好评率
flywheel	飞轮
Four Thousand Weeks	Four Thousand Weeks（书名，保留原文）
Free Solo	Free Solo（电影名，保留原文）
frequentist statistics	频率学派统计
friction	摩擦（指交易成本/障碍）
holdout group	对照组
How to Lie with Statistics	How to Lie with Statistics（书名，保留原文）
Hulu	Hulu（平台名，保留原文）
humans in the loop	人在回路中
hypothesis driven	假设驱动
infinite Q	无限队列
John Horton	John Horton（人名，保留原文）
lead gen	线索生成
Lenny	Lenny（人名，保留原文）
lifetime value, LTV	终身价值（LTV）
LinkedIn	LinkedIn（平台名，保留原文）
liquidity	流动性
litmus test	试金石
local maxima	局部最优
Lyft	Lyft（平台名，保留原文）
machine learning model	机器学习模型
market failure	市场失灵
Marketplace	平台市场
matching algorithm	匹配算法
mental model	心智模型
nano fabrication	纳米加工
norming	规范化
oDesk	oDesk（平台名，保留原文）
Oliver Burkeman	Oliver Burkeman（人名，保留原文）
Only Murders in the Building	Only Murders in the Building（剧名，保留原文）
out of left field	意想不到的领域
positive externality	正外部性
prediction market	预测市场
prior	先验判断
quantified	量化思考
Ramesh Johari	Ramesh Johari（人名，保留原文）
rating inflation	评分通胀
reciprocity	互惠性
renorming	重新规范
roadmap	路线图
roadmapping	路线规划
Ronald Coase	Ronald Coase（人名，保留原文）
Ronald Fisher	Ronald Fisher（实验设计之父，保留原文）
Ronny Kohavi	Ronny Kohavi（人名，保留原文）
Santa Cruz	Santa Cruz（地名，保留原文）
scaled liquidity	规模化流动性
Seinfeld	《宋飞正传》
Servaes Tholen	Servaes Tholen（人名，保留原文）
shoe leather statistics	皮鞋统计
smell test	直觉检验
sound of silence	沉默之声
Stanford	斯坦福
stat sig	统计显著性
Steve Tadelis	Steve Tadelis（人名，保留原文）
Stitch Fix	Stitch Fix（品牌名，保留原文）
Superhost	Superhost（Airbnb 功能名，保留原文）
The Alpinist	The Alpinist（电影名，保留原文）
Thumbtack	Thumbtack（平台名，保留原文）
Trajan’s Market	图拉真市场
transaction costs	交易成本
Uber	Uber（平台名，保留原文）
unit economics	单位经济
Upwork	Upwork（平台名，保留原文）
UrbanSitter	UrbanSitter（平台名，保留原文）
Venmo	Venmo（支付服务名，保留原文）
vignette	小故事
whac-a-mole	打地鼠游戏

此文档由 AI 分片翻译（translate_long_document）
