The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li
Transcript
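Lenny Rachitsky: A lot of people call you the godmother of AI. The work you did was actually the spark that brought us out of the AI winter.
Fei-Fei Li: From mid-2015 to mid-2016, some tech companies deliberately avoided the word "AI" because they weren't sure whether it was a dirty word. It was only around 2017 that companies started calling themselves AI companies.
Lenny Rachitsky: There's this line, I think from when you were presenting to Congress: "There's nothing artificial about AI. It's inspired by people, it's created by people, and most importantly, it impacts people."
Fei-Fei Li: It's not that I think AI will have no impact on jobs or people. In fact, I believe that whatever AI does, now or in the future, is up to us. It's up to people. I do believe technology is a net positive for humanity, but I think every technology is a double-edged sword. If we don't do the right thing as a society and as individuals, we can screw this up as well.
Lenny Rachitsky: You had this breakthrough insight: we can train machines to think like humans, but they're missing the data that humans get as children.
Fei-Fei Li: I chose to study artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We need to train machines with as much image information about objects as possible, but objects are very, very hard to learn. A single object can appear in an image in infinitely many ways. To train computers on thousands upon thousands of object concepts, you really need to show them millions of examples.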
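Introducing the Guest
Lenny Rachitsky: Today my guest is Dr. Fei-Fei Li, who is known as the godmother of AI. Fei-Fei has led, and been at the center of, many of the biggest breakthroughs behind the AI revolution we're living through. She spearheaded the creation of ImageNet, which grew out of her insight that AI needs a huge amount of cleanly labeled data to get smarter, and that dataset became the breakthrough that led to the current approach to building and scaling AI models. She was chief AI scientist at Google Cloud, where some of the biggest early technical breakthroughs emerged. She was director of SAIL, the Stanford Artificial Intelligence Lab, which produced many of the top minds in AI. She is also a co-creator of Stanford's Human-Centered AI Institute, which plays a vital role in the direction AI is taking. She has served on the board of Twitter, was named one of Time's 100 most influential people in AI, and sits on a United Nations advisory board. I could go on.
In our conversation, Fei-Fei gives a brief history of how we got to today's world of AI, including the mind-blowing reminder that nine to ten years ago, calling yourself an AI company was basically a death sentence for your brand, because no one believed AI would actually work. Today it's completely different: every company is an AI company. We also talk about how she sees AI affecting humanity in the future, how far current technology can take us, why she is so passionate about building world models, and what exactly a world model is. Most exciting of all, the world's first large world model, Marble, is launching right as this episode comes out. Anyone can try it at marble.worldlabs.ai. It's insane. Definitely check it out. Fei-Fei is incredible, and she is far less well known than she should be given her impact on the world, so I'm really excited to have her on and to share her wisdom with more people.
A huge thank you to Ben Horowitz and Condoleezza Rice for suggesting topics for this conversation. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcast app or on YouTube.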
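AI's Impact on Humanity
Lenny Rachitsky: Fei-Fei, thank you so much for being here, and welcome to the podcast.
Fei-Fei Li: I'm excited to be here, Lenny.
Lenny Rachitsky: I'm even more excited to have you here. It's such an honor to get to chat with you. There's so much I want to talk about. You've been at the center of this AI explosion we're witnessing for a long time. We'll get into a lot of the history, since I think many people don't even know how all of this started. But let me first read a quote from Wired about you, just to give people a sense. In the intro I'll share the other amazing things you've done, but I think this is a good way in: "Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI's recent remarkable advances."
A lot of people call you the godmother of AI. Unlike many AI leaders, you're an AI optimist. You don't think AI is going to replace us, you don't think it's going to take all our jobs, and you don't think it's going to destroy us. So I thought it would be fun to start there: how do you see AI affecting humanity over time?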
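The Double-Edged Sword of Tech
Fei-Fei Li: Okay, Lenny, let me be clear. I'm not a utopian, and it's not that I think AI will have no impact at all on jobs or people. In fact, I'm a humanist. I believe that whatever AI does, now or in the future, is up to us. It's up to people. So I do believe technology is a net positive for humanity. If you look back over the long course of civilization, we are fundamentally an innovative species. From the written records of thousands of years ago until now, humans have kept innovating ourselves and our tools, and through that we make life better, make work better, and build civilization. I believe AI is part of that. That's where the optimism comes from. But I think every technology is a double-edged sword. If we don't do the right thing as a species, as a society, as communities, and as individuals, we can screw this up as well.
Lenny Rachitsky: There's a line I really love, I think from when you were presenting to Congress: "There's nothing artificial about AI. It's inspired by people, it's created by people, and most importantly, it impacts people." I don't have a specific question there, but what a great line.
Fei-Fei Li: Yes, I feel deeply about that. I started working in AI twenty-five years ago, and I've been advising students for the past twenty. Almost every student who graduates, as they leave my lab, I remind them: your field is called artificial intelligence, but there is nothing artificial about it.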
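Our Human Responsibility
Lenny Rachitsky: Coming back to what you just said, that where all of this goes is up to us: what do you think we need to get right? How do we set things on the right path? I know this is a very hard question to answer, but what's your advice? What should we keep in mind?
Fei-Fei Li: Well, how many hours do we have?
Lenny Rachitsky: How do we align AI? Come on, let's solve it.
Fei-Fei Li: I think people should be responsible individuals no matter what they do. That's what we teach our children, and it's what we need to do as grown-ups too. No matter which part of AI development, AI deployment, or AI application you take part in, and most likely many of us, especially technologists, take part in several, we should act like responsible individuals and care about this. Actually, care a lot. I think everyone today should care about AI, because it will affect your individual life, it will affect your community, and it will affect society as a whole and future generations. Caring about it as a responsible person is the first step, and also the most important one.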
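The Long Prehistory of AI
Lenny Rachitsky: Okay, let me take a step back and go to the beginning of AI. Most people only started hearing and caring about what we now call "AI" in the last few years, after ChatGPT came out. About three years ago?
Fei-Fei Li: Three years ago. In another month or so it will be exactly three years.
Lenny Rachitsky: Wow, okay. So that was the release of ChatGPT. Is that the milestone you have in mind?
Fei-Fei Li: Yes.
Lenny Rachitsky: Okay, that's how I see it too. But few people know that there was a long history before that, with many people working on related things. Back then it was called machine learning, along with other terms, and now everything is called AI. There was a long stretch when a lot of people were quietly doing the work. Then came what people call the AI winter, when everyone almost gave up, and most people really did give up, thinking the idea was a dead end. The work you did was really the spark that brought us out of the AI winter and led directly to today's world where AI is everywhere. As you just said, it will affect everything we do. So I think it would be fascinating to hear it from you: what the world was like before ImageNet, what you did to create ImageNet, why it mattered so much, and what happened afterward.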
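Fei-Fei Li: For me, it's hard to realize that AI is still so new to everyone, because I've lived in AI my whole career. Part of me finds it deeply satisfying to see a personal curiosity that started when I was barely out of my teens become a transformative force for our civilization. It really is a civilization-level technology. The journey has taken about thirty years, or twenty-something, and it has been very satisfying. So where did it all begin? I'm not even a first-generation AI researcher. The first generation goes back to the fifties and sixties, and Turing was ahead of his time even in the forties, boldly putting a question to humanity: "Are there thinking machines?" Of course, he had a specific way of testing the notion of a thinking machine, which was a conversational chatbot. By his standard, we now have thinking machines.
But that was more of an anecdotal inspiration. The field really began in the fifties, when computer scientists came together to study how to use computer programs and algorithms to build programs that could do things that, until then, only human cognition could do. That was the starting point. The founders gathered at the Dartmouth workshop in 1956, and Professor John McCarthy, who later came to Stanford, coined the term "artificial intelligence." The fifties, sixties, seventies, and eighties were the early days of AI exploration: we had logic systems, expert systems, and early explorations of neural networks. Then, from the late eighties through the nineties and into the early 2000s, roughly twenty years, machine learning began to rise, as a marriage between computer programming and statistical learning.
That marriage brought a crucial idea to AI: purely rule-based programs cannot cover the vast range of cognitive abilities we expect computers to handle. We have to let machines learn patterns. Once a machine can learn patterns, it has a hope of doing much more. For example, you give it pictures of three cats, and the hope is not just that the machine recognizes those three cats, but that it can recognize the fourth, the fifth, the sixth, and every other cat. That is an ability to learn, and it is fundamental to humans and other animals. As a field we realized, "We need machine learning." That was the state of things up to the early 2000s. I entered AI right in 2000, at the start of my PhD at Caltech.
So I'm among the first generation of machine learning researchers. We were already studying machine learning concepts, especially neural networks. I remember one of my very first courses at Caltech was called "Neural Networks," but it was a pretty painful experience. We were still in the middle of the so-called AI winter: the public paid little attention to the field and there wasn't much funding, but there were still many ideas flowing around. I think two things tied my career so closely to the birth of modern AI. The first is that I chose to study artificial intelligence through the lens of visual intelligence, because humans are deeply visual animals. We can talk more about this later, but so much of our intelligence is built on visual, perceptual, and spatial understanding, not just on language. I think the two are complementary.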
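The "three cats" point above, learning a pattern from a few labeled examples and then recognizing an example never seen before, can be made concrete with a small sketch. This is not anything described in the conversation: the two-number "features" and the labels are invented for illustration, and nearest-centroid classification is just a simple stand-in for a learned model, far simpler than the neural networks discussed here.

```python
# Toy illustration only: the 2-D feature vectors and labels are hypothetical,
# and nearest-centroid is a stand-in for any learned pattern recognizer.

def train_centroids(examples):
    """Average the feature vectors seen for each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Three labeled "cats" and three labeled "teapots" (made-up features).
training = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"), ((1.0, 0.1), "cat"),
    ((0.1, 0.9), "teapot"), ((0.2, 0.8), "teapot"), ((0.3, 1.0), "teapot"),
]
centroids = train_centroids(training)

fourth_cat = (0.85, 0.25)  # never seen during training
print(classify(centroids, fourth_cat))  # prints "cat"
```

Nothing in the code lists rules for what a cat looks like; the decision comes entirely from the examples, which is the contrast with rule-based programs drawn above.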
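Visual Intelligence and ImageNet's Birth
Fei-Fei Li: So I chose to approach it through visual intelligence. During my PhD and my early years as a professor, my students and I were firmly committed to a north-star problem: solving object recognition, because it is a building block for perceiving the world, right? We move through the world, understand it, reason about it, and interact with it essentially at the level of objects. We don't interact with the world at the level of molecules. We also don't... well, sometimes we do, but rarely: if you want to lift a teapot, you don't say, "Okay, this teapot is made of a hundred pieces of porcelain, let me handle each of the hundred pieces." You see it as one object and interact with it. So objects matter a great deal. I was among the first researchers to identify this as a north-star problem. And as a student and researcher of AI, I was working with all kinds of mathematical models, including neural networks, Bayesian networks, and many, many others.
There was one very prominent pain point at the time: these models had no data to train on. The whole field was intensely focused on models, but it gradually dawned on me that human learning, and evolution too, is really a big-data learning process. Humans learn continually from a vast amount of experience. On an evolutionary timescale, animals evolve by continually experiencing the world. So my students and I conjectured that a critically overlooked ingredient for bringing AI to life was big data. We then started the ImageNet project around 2006, 2007. We were very ambitious: we wanted to take all the image data about objects on the entire internet. Of course, the internet was much smaller then than it is today, so I think the ambition was at least not too crazy. Although looking back now, a few graduate students and a professor taking this on was totally delusional.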
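The Birth of Deep Learning
Fei-Fei Li: But we did it. We carefully curated fifteen million images from the internet and created a taxonomy covering twenty-two thousand concepts, borrowing from other researchers' work, such as what linguists had built in WordNet, which is a particular way of organizing a dictionary of words. We combined that into ImageNet and open-sourced it to the research community. We held an annual ImageNet challenge to encourage everyone to participate. We continued with our own research, but 2012 is considered by many to be the beginning of deep learning and the moment modern AI was born, because a research team from Toronto led by Professor Geoff Hinton entered the ImageNet challenge and, using ImageNet's big data and two NVIDIA GPUs, successfully created the first neural network algorithm that could...
It didn't completely solve the problem, but it made enormous progress on object recognition. That combination of three ingredients, big data, neural networks, and GPUs, basically became the golden recipe for modern AI. Fast forward to AI's public moment, the ChatGPT moment: if you look at the technical ingredients that brought ChatGPT into the world, it still uses those three. Now it's internet-scale data, mostly text; it's a far more complex neural network architecture than in 2012, but still a neural network; and it's many more GPUs, but still GPUs. Those three ingredients remain at the core of modern AI.
Lenny Rachitsky: Incredible. I had never heard that whole story before. What I love most is that it started with just two GPUs. I love that. And now it's, what, hundreds of thousands, and each one is orders of magnitude more powerful.
Fei-Fei Li: Yes.
Lenny Rachitsky: And those two GPUs were just bought off the shelf. They were gaming GPUs. They just went to...
Fei-Fei Li: Right.
Lenny Rachitsky: ...to the store where gamers buy them. As you said, this is still largely how models get smarter. Some of the fastest-growing companies in the world right now, and I've had basically all of them on the podcast, Mercor, Surge, Scale, keep doing this for the labs: giving them more and more labeled data in the areas they care most about.
Fei-Fei Li: Yes, I've known Alex Wang of Scale since very early on. I probably still have the emails he sent me in the early days of founding Scale. He was very kind and kept emailing me about how ImageNet inspired Scale. I was very pleased by that.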
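"AI" Used to Be a Dirty Word
Lenny Rachitsky: Another thing that struck me in what you just shared is what an example of high agency it is. Just doing things, which has almost become a meme on Twitter: you can just do things. You basically said, okay, this is probably what it takes to move AI forward. Was "machine learning" the term people used back then? Was that what most people called it?
Fei-Fei Li: I think the terms were used interchangeably back then. They really were. I do remember the tech companies, and I won't name names, in an early conversation, around mid-2015 or mid-2016, some tech companies deliberately avoided the word "AI" because they weren't sure whether AI was a dirty word. I remember I was actually encouraging everyone to use the word "AI," because to me it is one of the most audacious questions humanity has ever asked in its pursuit of science and technology, and I'm very proud of the term. But yes, at the beginning some people weren't sure.
Lenny Rachitsky: Roughly what year was that, when AI was still a dirty word?
Fei-Fei Li: 2016, because that was...
Lenny Rachitsky: 2016, less than ten years ago.
Fei-Fei Li: That was the period of change. Some people started calling it AI, but if you look at the marketing language of Silicon Valley tech companies, I think it was around 2017 that companies began calling themselves AI companies.
Lenny Rachitsky: Incredible. How much the world has changed.
Fei-Fei Li: Yes.
Lenny Rachitsky: Now you can't not call yourself an AI company.
Fei-Fei Li: I know.
Lenny Rachitsky: And it's only been about nine years.
Fei-Fei Li: Yes.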
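AI History as a Collective Achievement
Lenny Rachitsky: Oh man. Okay. About that early history, is there anything else you think people don't know but that matters? Before we get to your view of the future and the work you're doing now.
Fei-Fei Li: I think, as with all history, I'm very aware that I get recognized for having been part of it, but there are so many heroes and so many researchers behind it. We're talking about generations of researchers. In my own world there are so many people who inspired me, and I talk about them in my book. But I do think our culture, especially Silicon Valley culture, tends to attribute achievement to a single person. While I think there's some value in that, people should remember that AI is a field that is now seventy years old, and we've gone through many generations. No one could have gotten here alone.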
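How Far Away Is AGI?
Lenny Rachitsky: Okay, so let me ask you this. It feels like we're always on the precipice of AGI, this vague concept people throw around everywhere: AGI is coming, it's going to take over everything. How far do you think we are from AGI? Do you think we get there on the current trajectory? Do you think we need more breakthroughs? Do you think the current approach can take us there?
Defining AGI and Scientific Pursuit
Fei-Fei Li: It's a very interesting term, Lenny. I don't know if anyone has ever really defined AGI. There are many different definitions, ranging from some kind of machine superpower all the way to machines becoming economically valuable agents in society, in other words, able to earn a salary and support themselves. Is that the definition of AGI? As a scientist, I take science very seriously, and I entered this field because I was inspired by an audacious question: can machines think and act like humans? For me, that has always been the north star of AI. From that perspective, I don't know what the difference between AI and AGI is.
I think we've done well in achieving part of the goal, including conversational AI, but I don't think we have fully conquered all the goals of AI. Our founding father, Alan Turing... I wonder, if Alan Turing were here today and you asked him to compare AI and AGI, he might just shrug and say, "Well, I asked the same question back in the 1940s." So I don't want to go down the rabbit hole of defining AI versus AGI. As a scientist and technologist, I feel AGI is more of a marketing term than a scientific term. AI is my north star, it is my field's north star, and people can call it whatever they want. I'm happy with that.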
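We Need More Breakthroughs
Lenny Rachitsky: Let me ask it a different way. As you described, what took us from ImageNet and AlexNet to where we are today is basically GPUs, data, labeled data, and model algorithms. The Transformer also looks like a major step on that trajectory. Do you think these same components can take us to, I don't know, models that are ten times smarter, something that changes the whole world in a disruptive way? Or do you think we need more breakthroughs? I know we're going to talk about world models next, which I think is one piece of this. Beyond that, do you feel some directions will plateau, or can these components carry us forward with just more data, more compute, more GPUs?
Fei-Fei Li: No, I absolutely think we need more innovation. There is still a lot that can be done by scaling up, with more data, more GPUs, and bigger versions of current model architectures, but I absolutely think we need more innovation. No profound scientific discipline in human history has ever reached a place where it could say, "We're done, no more innovation needed." And AI is one of the youngest disciplines in the science and technology of human civilization, possibly the youngest, and we are still scratching the surface. For example, as I said, we're about to turn to world models. Today, you can take a model, have it watch a video of a few office rooms, and ask it to count the chairs. That's something a toddler, or at least an elementary school kid, can do, and AI can't do it, right?
So there is a great deal today's AI can't do, let alone something like Isaac Newton looking at the movement of celestial bodies and deriving a set of equations that governs the motion of all bodies. That level of creativity, extrapolation, and abstraction: we currently have no way at all to get AI to do that. And then look at emotional intelligence. Imagine a student coming to a teacher's office and having a conversation about motivation, passion, what to study, what problem is really bothering them. As powerful as today's conversational bots are, you don't get that level of emotional and cognitive intelligence from today's AI. So there is a lot we can still do better, and I don't think our innovation is over.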
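Can AI Replicate Scientific Breakthroughs?
Lenny Rachitsky: Demis, from DeepMind and Google, had a really interesting interview recently. Someone asked him, "How far do you think we are from AGI, and what does the path there look like?" His way of answering was fascinating: suppose we give the most advanced model all the information available up to the end of the twentieth century and see whether it can come up with all of Einstein's breakthroughs. So far we're nowhere near that, but they could...
Fei-Fei Li: No, we really are far from it. In fact, it's even worse. Let's give AI all the data on celestial bodies, including data from modern instruments, which Newton never had, hand it over, and just ask AI to come up with the seventeenth-century set of equations for the laws of motion of bodies. Today's AI can't do that.
Lenny Rachitsky: Okay. So my takeaway is that we're still very far away.
Fei-Fei Li: Yes.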
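World Models
Lenny Rachitsky: Okay, let's talk about world models. To me this is another great example of you being ahead of your time. You were way ahead with the idea that we just need a lot of clean data for AI and neural networks to learn from. You've been talking about world models for a long time. You started a company to build, put simply: there are language models, and this is something different, a world model. We'll get to what exactly that is. And now, as I was preparing for this interview, I saw Elon talking about world models, Jensen talking about world models, and I know Google is working on this too. You've been on this path for a long time, and you launched a product right before this episode comes out, which we'll talk about. So tell us: what is a world model, and why is it so important?
Fei-Fei Li: I'm very excited to see more and more people talking about world models, like Elon, like Jensen. I've spent my whole life thinking about how to really push AI forward, and over the past few years the large language models that came out of the research community, and OpenAI and so on, were hugely inspiring even for a researcher like me. I remember when GPT-2 came out, which was around late 2020. I was, and I technically still am, but at that time I was the full-time co-director of Stanford's Human-Centered AI Institute. I remember that back then... the public wasn't yet aware of the power of large language models, but as researchers we saw it, we saw the future. I had fairly deep discussions with my natural language processing colleagues, like Percy Liang and Chris Manning. We talked about how critical this technology was going to be. The Stanford AI institute, the Human-Centered AI Institute, HAI, was the first to establish a full research center for foundation models.
Percy Liang and many other researchers led the first academic paper on foundation models. So all of this was very inspiring to me. Of course, I come from the world of visual intelligence, and I kept thinking that there is so much beyond language we can push forward. Humans use spatial perception and an understanding of the world to do a great many things, and these go far beyond language. Imagine a very chaotic first-responder scene, whether it's a fire, a traffic accident, or a natural disaster. If you put yourself in that scene and think about how people organize themselves to rescue others, stop the situation from getting worse, and put out the fire, a great deal of it is movement, a spontaneous understanding of objects and the world, and situational awareness. Language is part of it, but in many of these situations language alone won't put out the fire.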
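Spatial Intelligence and the Founding of World Labs
Fei-Fei Li: So what is that? I kept thinking about it. Meanwhile, I was doing a lot of robotics research, and I realized that the linchpin connecting intelligence beyond language, connecting embodied AI, meaning robots, and connecting visual intelligence, is a spatially intelligent perception of the world. I remember it was 2024 when I gave a TED talk on spatial intelligence and world models. The idea first started taking shape back in 2022, based on my robotics and computer vision research. One thing that was very clear to me then was that I wanted to work with the very best technologists to make this technology real as fast as possible. So we founded this company called World Labs. You can see the word "world" right there in our company name, because we believe so deeply in world modeling and spatial intelligence.
Lenny Rachitsky: People are used to chatbots, and those are large language models. One simple way to think about a world model is that you describe a scene and it generates a world you can explore without limit. We'll link to the product you launched and talk about it later, but is that the right way to understand it?
Fei-Fei Li: That's part of it, Lenny. I think a simple way to understand a world model is that it lets anyone create any world they have in their mind's eye through a prompt, whether that's an image or a sentence. And they can also interact in that world, whether by browsing, walking, picking up objects, or changing things, and reason within that world. For example, if whoever uses the output of the world model is a robot acting as an agent, it should be able to plan its path, say, to help tidy up the kitchen. So a world model is a foundation you can use to reason, interact, and create worlds.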
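Embodied AI and Robots
Lenny Rachitsky: Great. Robots feel like they might be the next big focus for AI researchers, with a huge impact on the world. What you're saying is that this is the key missing piece for making robots actually work in the real world: understanding how the world works.
Fei-Fei Li: Yes. Although first, I think there is a lot more than robots that is exciting. But I agree with everything you just said. I think world modeling and spatial intelligence are the key missing piece for embodied AI. I'd also say we shouldn't underestimate that humans are embodied agents themselves, and humans can be augmented by AI's intelligence. Just as today humans are language animals, but we are already greatly augmented by AI that helps us with language tasks, including software engineering. I think we shouldn't underestimate, or perhaps we just tend not to talk about, how much humans as embodied agents can benefit from world models and spatial intelligence models, just as robots can.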
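Prospects for World Model Applications
Lenny Rachitsky: So the big unlocks here: robots, which would be a huge deal if it happens, imagine each of us having robots doing all kinds of things for us, helping with disasters and so on. Games are obviously a very cool example, where you could create infinitely playable games out of thin air. And then creativity, just having fun, being creative, imagining magical, wild new worlds and environments.
Fei-Fei Li: And design: humans design everything from machines to buildings to homes. And scientific discovery. There are so many possibilities. I like to use the discovery of the structure of DNA as an example. If you look back at one of the most important pieces in the history of DNA's discovery, it was the X-ray diffraction photo taken by Rosalind Franklin. It was a flat, two-dimensional photo of a diffraction pattern that looks like a cross. You can look those photos up. But from that flat 2D photo, humans, especially two important figures, James Watson and Francis Crick, combined with other information, were able to reason in three-dimensional space and deduce DNA's highly three-dimensional double-helix structure. That structure cannot be 2D. You cannot infer it by thinking in 2D. You have to think in 3D space and use human spatial intelligence. So I think even in scientific discovery, spatial intelligence, or AI-assisted spatial intelligence, is critical.
Lenny Rachitsky: This reminds me of the Chris Dixon line that the next big thing will start out looking like a toy. When ChatGPT first came out, I remember Sam Altman just tweeted something like, "Here's a cool thing we've been playing with, check it out." And now it's the fastest-growing product of all time and it has changed the world. It's often the things that look like, okay, that's cool, that's fun, that end up changing the world the most.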
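Ben Horowitz and "The Bitter Lesson"
Lenny Rachitsky: I reached out to Ben Horowitz, who really admires what you're doing and is a big fan of yours. They're investors, I believe...
Fei-Fei Li: Yes, we've known each other for many years, but yes, they are currently investors in World Labs.
Lenny Rachitsky: Great. I asked him what I should ask you, and he suggested I ask: why is the "bitter lesson" alone unlikely to work for robotics? First, please explain what this "bitter lesson" is in the history of AI, and then why it won't get us to where we want to be in robotics.
Fei-Fei Li: First of all, there are actually many bitter lessons, but the "bitter lesson" people refer to is a paper written by Richard Sutton, who recently won the Turing Award and who works mainly on reinforcement learning. Richard said that if you look back at history, especially the history of algorithmic development in AI, it turns out that simple models trained on massive amounts of data always win in the end over more complex models with less data. That paper actually came out years after ImageNet. For me it wasn't bitter, it was a sweet lesson. It was precisely because I believed big data could play that role that I built ImageNet. So why can't the "bitter lesson" work on its own in robotics? First, I think we need to give credit to where things stand. Robotics is still at a very early, experimental stage.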
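The Data Dilemma in Robotics
Fei-Fei Li: The field is nowhere near as mature as large language models. Many people are still experimenting with different algorithms, and some of those algorithms are driven by big data. I do think big data will continue to play a role in robotics, but robotics is hard in several ways. One is that data is harder to get, much harder. You could say there is plenty of data on the web. The latest robotics research is indeed using web video, and I think web video does help. But if you think about what makes language models so... as someone who works on computer vision, spatial intelligence, and robotics, I'm very jealous of my colleagues in language, because they have a perfect setup: the training data is words, ultimately tokens, and what the model outputs is also words. So there is a perfect alignment between your objective function and the form of your training data.
But robotics is different, and so is spatial intelligence. What you want out of a robot is actions, but your training data lacks actions in the 3D world, and that is exactly what robots must do: act in a 3D world. So we have to find other ways, like the proverbial square peg in a round hole: what we have is a vast amount of web video. So we have to bring in supplementary data, such as teleoperation data or synthetic data, so that robots can be trained under the big-data hypothesis, that is, the "bitter lesson." I think there is still hope, because our work on world models will really unlock a lot of information for robots.
But I think we need to be careful, because we're still at an early stage, and the "bitter lesson" remains to be tested, since we haven't fully solved the data problem. There's another aspect of the "bitter lesson" for robotics that I think we should be very realistic about: again, compared with large language models or even spatial models, robots are physical systems. Robots are closer to self-driving cars than to large language models. That is very important to recognize. It means that to make robots work, we need not only brains but also physical bodies, and we also need application scenarios. Looking back at the history of self-driving cars, my colleague Sebastian Thrun led Stanford's team to win the first DARPA Challenge in 2005 or 2006. From then, a self-driving prototype that could drive 130 miles through the Nevada desert, to today's Waymo on the streets of San Francisco, twenty years have passed.
And we're not even done. There is still a long way to go. That has been a twenty-year journey. And self-driving cars are much simpler robots: they're just metal boxes moving on a 2D surface, and the goal is not to touch anything. A robot is a 3D object in a 3D world, and the goal is precisely to touch things. So the journey will involve many different aspects. Of course, you could say the early self-driving algorithms predate the deep learning era, so deep learning is speeding up the development of the brain. I think that's right, and it's why I've thrown myself into robotics and spatial intelligence and why I'm excited about it. But at the same time, the car industry is very mature, and productization also involves mature use cases, supply chains, and hardware. I think now is a very interesting moment to work on these problems. But Ben is right: we may still go through quite a few bitter lessons.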
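The alignment point made above for language models, that the training data and the desired output are both tokens, can be sketched with a deliberately tiny example. This is only an illustration under stated assumptions: a bigram counter stands in for a language model, and the corpus sentence and function names are invented here, not taken from the conversation.

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram counter stands in for a language model,
# and the corpus below is a made-up sentence.

def train_bigram(tokens):
    """Count, for every token, which token follows it in the training text."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never followed."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = (
    "the robot lifts the cup then the robot lowers the cup then the robot stops"
).split()
model = train_bigram(corpus)

# The supervision signal (the next token) comes for free from the text itself.
print(predict_next(model, "the"))  # prints "robot"
```

Every training target here is simply the next word of the raw text, so no extra labeling is needed. The contrast drawn above is that web video contains frames but not the 3D actions a robot must output, so the equivalent of `following` is missing and has to come from teleoperation or synthetic data.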
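In Awe of the Human Brain
Lenny Rachitsky: Doing this work, do you ever feel in awe of how the brain works, that it can do so much for us? Just getting a machine to walk, not bump into things, and not fall over is so complex. Does that make you respect what we already have even more?
Fei-Fei Li: Totally. Our brains run on only about 20 watts, which is dimmer than any light bulb in the room I'm in right now. And yet we can do so much. So actually, the longer I work in AI, the more I'm in awe of humans.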
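Marble: From Prompt to 3D World
Lenny Rachitsky: Let's talk about the product you just launched. It's called Marble, which is a cute name. Tell us what it is and why it matters. I've been playing with it and it's incredible. We'll include a link so people can try it. What is Marble?
Fei-Fei Li: Yes, I'm very excited. First of all, Marble is one of the first products from World Labs. World Labs is a foundational frontier model company. We have four co-founders, all with deep technical backgrounds. My co-founders are Justin Johnson, Christoph Lassner, and Ben Mildenhall. We all come from research in AI, computer graphics, and computer vision, and we believe spatial intelligence and world models are as important as large language models, if not more so, and complementary to language models. So we wanted to seize this opportunity to create a deep-tech research lab that connects frontier models with products.
Marble is an application built on top of our frontier model. We spent more than a year building the world's first generative model that can output truly 3D worlds. It's an extremely hard problem. It was a very hard process, and we have an incredible technical team, with founding members who come from top technical teams. About a month or two ago, we saw for the first time that with just a one-sentence prompt, an image, or several images, we could create worlds that we can navigate freely. If you put on a VR headset, and we have an option that lets you do that, you can even walk around inside. Even though we had been working on this for quite a long time, that moment was still breathtaking. We want to put it in the hands of people who need it. We know that many creators, designers, people thinking about robotic simulation, people thinking about all kinds of use cases for navigable, interactive, immersive worlds, and game developers will find it useful. So we developed Marble as a first step. It's still very early, but it is the world's first model that does this, and the world's first product that lets people create 3D worlds just by prompting. We call it "prompt to worlds."
Lenny Rachitsky: I've been playing with it and it's insane. You can have a little Shire world and walk around in it without limit. You're basically wandering through Middle-earth. There are no people in it yet, but it's really insane. You can go anywhere. There are dystopian worlds too. I've been looking at all the examples, and my favorite part, and I don't know whether it's a feature or a bug, is that you can see the dots before the world has fully rendered with all its textures. I just love that you get a glimpse of what's going on inside the model.
Fei-Fei Li: That's so great to hear. As a researcher, this is where I learn. The dots that guide you into the world are an intentional visualization feature; they are not part of the model. The model actually just generates the world directly. But we wanted a way to guide people into the world, a few engineers tried different versions, and we settled on the dots. Many people, like you, have told us how delightful that experience is. It's really satisfying to hear that this intentional visualization feature, which isn't the core big model itself, truly delighted our users.
Lenny Rachitsky: Wow, so you added that to help people understand what's going on...
Fei-Fei Li: For fun, yes.
Lenny Rachitsky: To make the experience more delightful. Wow, that's so interesting. It makes me think of large language models. It's not the same thing, but they talk about what they're thinking and what they're doing.
Fei-Fei Li: Yes, it is.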
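Echoes of The Matrix
Lenny Rachitsky: It also made me think of The Matrix. It's exactly The Matrix experience. I don't know if that was your inspiration.
Fei-Fei Li: Well, as I said, a lot of engineers worked on that part. It may have been their inspiration.
Lenny Rachitsky: It's in their subconscious.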
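Use Cases for Marble
Lenny Rachitsky: Okay, for listeners who want to try it, what applications can people start using it for today? What's your goal with this launch?
Fei-Fei Li: Yes, we think world models are a very horizontal capability, but we're already seeing some really exciting use cases. One is virtual production for film, because what they need are 3D worlds that can be aligned with the camera. That way, when the actors are performing in the scene, they can position the camera and shoot each segment well. We've already seen incredible uses. In fact, I don't know if you've seen our launch video showing Marble. It was produced by a virtual production company. We collaborated with Sony, and they used Marble scenes to shoot those videos. We worked with those technical artists and directors, and they said it cut production time by a factor of 40. In fact, it had to...
Lenny Rachitsky: A factor of 40?
Fei-Fei Li: Yes, in fact it had to, because we only had one month to finish the project and there was so much to shoot. So using Marble really did hugely accelerate virtual production for VFX and film. That's one use case. We're already seeing users export meshes from Marble scenes and put them in games, whether games on VR or fun games they've built themselves. We also showed a robotics simulation example, because when I... I mean, I'm still a researcher working on robot training, and one of the biggest pain points is creating synthetic data for training robots. That synthetic data needs to be very diverse, to come from different environments, and to contain different objects to manipulate. One path to that is to have computers simulate it.
Otherwise, humans have to build every single asset for the robot one by one, which takes much longer. So researchers have already reached out to us wanting to use Marble to create those synthetic environments. We've also had some unexpected user outreach, such as a team of psychologists who contacted us about using Marble for psychology research. It turns out that with some of the psychiatric patients they study, they need to understand how the patients' brains respond to immersive content with different characteristics, for example cluttered scenes, tidy scenes, and so on. It's very hard for researchers to get hold of such immersive scenes, and making them themselves takes too long and costs too much. Marble can give them a large number of these experimental environments almost instantly. So right now we're seeing multiple use cases. But VFX, game developers, simulation developers, and designers are all very excited about it.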
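User Feedback and Discovering Use Cases
Lenny Rachitsky: This is exactly how things work in AI. Other AI leaders I've had on the podcast all say to get the product out early in order to discover where the big use cases are. The head of ChatGPT told me that when they first launched ChatGPT, he was scrolling through TikTok to see how people were using it and what they were talking about, and that is what convinced them where to lean in and helped them see how people actually wanted to use it. I love that psychotherapy use case. I immediately thought of fear of heights, helping people deal with heights, or snakes, spiders, that kind of...
Fei-Fei Li: What a coincidence. A friend called me last night to talk about his fear of heights and asked whether he could use Marble. It's amazing that you went straight there.
Lenny Rachitsky: Because think of all the exposure therapy applications. This is just so well suited to it. So cool.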
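How It Differs from Video Generation Models
Lenny Rachitsky: Okay, there's a question I should have asked earlier, but I think a lot of people will be wondering: how is this different from Veo 3 and other video generation models? The difference is clear to me, but I think it's worth explaining how this differs from the video AI tools people have seen.
Fei-Fei Li: World Labs' core thesis is that spatial intelligence is fundamentally important, and spatial intelligence is not just about video. In fact, the world is not just about passively watching video go by. I love Plato's allegory of the cave as a description of vision. He said, imagine a prisoner tied to a chair, which is not very humane, watching a full-scale theater in front of him in a cave, while the actual theater where the actors perform is behind him. The light is simply arranged so that the performance is projected onto the wall of the cave. The prisoner's task is then to figure out what is really going on. It's a fairly extreme example, but it does show what vision is about: making sense of a 3D or 4D world from 2D. So to me, spatial intelligence goes deeper than just creating a flat 2D world.
Spatial intelligence, to me, is the ability to create, reason about, interact with, and make sense of a deeply spatial world, whether it's 2D, 3D, or 4D, including dynamics and so on. So that's what World Labs is focused on, and of course the ability to create video itself can be part of it. In fact, just a few weeks ago we released the world's first demo of video generation that runs in real time on a single H100 GPU. So our technology includes that too, but I think Marble is very different, because we really want creators, designers, and developers to have in their hands a model that gives them worlds with 3D structure, so they can use it in their actual work. That's what makes Marble different.
Lenny Rachitsky: The way I see it, it's a platform that can carry a huge number of opportunities. As you described, a video is just, here's a one-off video, it's fun, it's cool, you can... and that's it. That's all. Then you move on.
Fei-Fei Li: By the way, in Marble we also let people export as video. So you can actually, as you said, go into a world, say a hobbit hole. And especially as a creator, you have a very specific camera trajectory and the camera moves the director has in mind, and then you can export that from Marble as a video.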
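Team and Resources
Lenny Rachitsky: What does it take to create a product like this? How big is the team? How many GPUs are you using? Is there anything you can share? I don't know how much of this is confidential, but what does it actually take to build something like what you've launched?
Fei-Fei Li: It takes a lot of brainpower. We were just talking about 20 watts per brain. From that angle it's a small number, but it's actually remarkable: it took half a billion years of evolution to give us these abilities. Our team is about thirty people now, mainly researchers and research engineers, but we also have designers and product people. We truly believe in building a company rooted in the deep technology of spatial intelligence, but at the same time we are serious about building products. So we have an integration of R&D and productization. And of course, we use a lot of GPUs.
Lenny Rachitsky: Now that's the technical detail.
Fei-Fei Li: Glad to hear you say that.
Lenny Rachitsky: Okay, congratulations on the launch. I know this is a huge milestone, and I know how much work went into it.
Fei-Fei Li: Thank you.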
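The Founder Journey
Lenny Rachitsky: I just want to congratulate you and your team again. Let's talk about your journey as a founder. You're the founder of this company. When was it founded? A few years ago? Two or three years ago?
Fei-Fei Li: A year ago.
Lenny Rachitsky: A year ago?
Fei-Fei Li: A little over a year.
Lenny Rachitsky: A year? Okay. Wow.
Fei-Fei Li: About 18 months.
Lenny Rachitsky: Okay. What do you wish you had known before starting the company? Is there anything you'd want to whisper to the Fei-Fei of 18 months ago?
Fei-Fei Li: Well, I always wish I could know the future of the technology. Actually, I think that's one of our founding advantages: we usually see the future earlier than most people. But even so, my goodness, it is all so exciting and so incredible, there is so much that is unknown, and what comes next is full of surprises. But I know you're not really asking about the future of technology. Also, I didn't start a company of this scale when I was 20. I did run a dry cleaning shop when I was 19, but that was on a much smaller scale.
Lenny Rachitsky: We need to talk about that.
Fei-Fei Li: Later I founded Google Cloud AI, and I founded an institute at Stanford, but those were different kinds of things. Compared with founders in their twenties, I do feel I was a bit more mentally prepared, as a founder, for how grueling this journey is. But I'm still surprised, and sometimes it even makes me anxious, by how intensely competitive the AI landscape is, from the models and the technology itself to the competition for talent. When I started the company, we hadn't yet heard those incredible stories about how much certain talent would cost. Those are the things that keep surprising me, and I have to stay very alert.
Lenny Rachitsky: So the competition you're describing is the competition for talent, and how fast things are moving.
Fei-Fei Li: Right.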
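At the Center of Breakthroughs
Lenny Rachitsky: You mentioned something I want to come back to. Looking over your whole career, you've been on almost every core team that drove today's major breakthroughs. Obviously we talked about ImageNet, and there's SAIL at Stanford, where a great deal of the work was done, and Google Cloud, where a lot of breakthroughs also happened. What brought you to those places? For people wondering how to get to the center of the future in their own careers, is there a thread running through it, something that pulled you from one place to the next and into those core groups? It might be helpful for people.
Fei-Fei Li: That's a really good question, Lenny, because I have thought about it. We talked earlier about how curiosity and passion brought me to AI, and that's more of a scientific north star, right? I didn't care whether AI was popular or not. That's one part. But as for how I ended up choosing the particular places I worked, including founding World Labs, I think I'm very grateful to myself, or maybe to my parents' genes: I'm an intellectually very fearless person. And I have to say, when I hire young people, I look for that quality, because I think it's a very important one if you want to make a difference. When you want to make a difference, you have to accept that you're creating something new or diving into something new, something people haven't done before. If you have that self-awareness, you almost have to allow yourself to be fearless, to be brave.
For example, when I came to Stanford, in academic terms I was already very close to so-called tenure at Princeton, which means having the job forever. But I chose to come to Stanford because... I love Princeton, it's my alma mater. It's just that at that moment there were some really amazing people at Stanford and the Silicon Valley ecosystem was amazing, so I was willing to take the risk of restarting my tenure clock. When I became the first female director of SAIL, I was still a relatively very young faculty member for that time, and I wanted to do it because I cared about that community. I didn't spend much time thinking about all the ways it could fail. Obviously I was lucky that the senior faculty supported me, but I just wanted to make a difference.
Going to Google was similar. I wanted to work with Jeff Dean, Geoff Hinton, and all those amazing, incredible people. It was the same with founding World Labs. I have this passion, and I believe that people who share a sense of mission can do incredible things. That's the thread that has guided me throughout. I don't overthink all the things that could go wrong, because there are too many.
Lenny Rachitsky: It sounds like an important element is not focusing on the downside, and focusing more on the people, the mission, what excites you, and your curiosity.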
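Advice for Young Talent
Fei-Fei Li: Yes. I do want to say one thing to all the young talent in AI, the engineers and researchers, because some of you have applied to World Labs, and I feel very honored that you considered us. I do find that many young people today weigh every variable in the equation when making a career decision. Maybe at some point that's how they want to do it, but sometimes I really want to encourage young people to focus on what truly matters. I often find myself slipping into mentor mode when talking to job candidates. It's not necessarily about whether to hire or not; it's when I see a very talented young person who is overly focused on every tiny dimension and aspect of considering a job, when maybe the most important things are: Where is your passion? Are you aligned with the mission? Do you believe in and trust this team? Just focus on the impact you can have, the work you can do, and the team you can work with.
Lenny Rachitsky: Yeah, it's hard. It's especially hard for people in AI. There's so much coming at them right now, so much that is new, so much happening, so much FOMO.
Fei-Fei Li: That's true.
Lenny Rachitsky: I can feel that anxiety. So I think this advice is really important: what will actually make you feel fulfilled in what you do, rather than just looking at where the fastest-growing company is, who's going to win, and so on. I want to make sure I ask about the work you're doing at Stanford right now, at HAI... I think it's...
Fei-Fei Li: HAI.
Lenny Rachitsky: HAI, the Human-Centered AI Institute. What do you do there? I know it's something you've been doing alongside everything else.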
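The Human-Centered AI Institute
Fei-Fei Li: Yes, HAI, the Human-Centered AI Institute, was co-founded by me and a group of faculty, including Professor John Etchemendy, Professor James Landay, and Professor Chris Manning, back in 2018. I was just finishing my last sabbatical at Google, and it was a very, very important decision for me, because I could have stayed in industry. But my time at Google taught me one thing: AI is going to be a civilization-level technology. It suddenly dawned on me how important this is for humanity, to the point that in that year, 2018, I wrote a piece in the New York Times about the need for a guiding framework for developing and applying AI. And that framework has to be anchored in human benevolence, in human-centeredness. I felt that Stanford, as one of the world's top universities, sitting in the heart of the Silicon Valley that gave birth to important companies from NVIDIA to Google, should be a thought leader in creating this human-centered AI framework and truly embodying it in our research, education, policy, and ecosystem work.
So I founded HAI. Fast forward to today: after six or seven years, it has become the world's largest institute for human-centered AI, doing human-centered research, education, ecosystem building, public outreach, and policy work. It brings together hundreds of faculty from all eight schools at Stanford, from medicine to education, sustainability, business, engineering, the humanities, and law. We support researchers, especially in interdisciplinary areas, from the digital economy, legal studies, and political science to the discovery of new drugs and new algorithms beyond transformers. We also put real emphasis on policy work, because when we started HAI, I realized that Silicon Valley was not talking to Washington, D.C., Brussels, or other parts of the world.
Given how important this technology is, we need to bring everyone on board. So we created multiple programs, including a congressional boot camp, the AI Index report, and policy briefings, and we took part in policymaking in particular, including advocating for a national AI research cloud bill, which was passed during the first Trump administration, and participating in state-level discussions of AI regulation. We've done a lot, and I'm still one of the leaders, although I'm much less involved operationally now, because what I care about is not only that we create this technology, but that we use it in the right way.
Lenny Rachitsky: Wow, I didn't know about all of that work. As you were talking, I was reminded of a Charlie Munger quote: "Take a simple idea and take it seriously." I feel like you've done that in so many different areas and stuck with it, and the impact you've had in so many ways over so many years is just incredible. I'm going to skip the lightning round and just ask you one last question. Is there anything else you want to share? Anything you want to leave listeners with?
Fei-Fei Li: Lenny, I'm very excited about AI. I want to answer one question. When I travel around the world, everyone asks me: if I'm a musician, if I'm a middle school teacher, if I'm a nurse, if I'm an accountant, if I'm a farmer, do I have a role in AI? Or is AI just going to take over my life and my work? I think this is the most important question in AI. I find that in Silicon Valley we tend not to speak heart to heart with people, whether they are people like us or not like us. All of us tend to throw around phrases like "infinite productivity," "infinite leisure time," and "infinite power." But at the end of the day, AI is about people. When people ask me that question, my answer is a resounding yes: everyone has a role in AI.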
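Everyone Has a Role
Fei-Fei Li: It depends on what you do and what you want. But no technology should take away human dignity, and human dignity and agency should be at the heart of the development, the deployment, and the governance of every technology. If you're a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble. I hope it becomes your tool, because the way you tell your story is unique and the world still needs it. How you tell your story, and how you use the most incredible tools to tell it in the most unique way, is what matters. That voice needs to be heard. If you're a farmer close to retirement, AI still matters to you, because you are a citizen. You can participate in your community, and you should have a say in how AI is used and applied. You can encourage the people around you to use AI together to make life easier. If you're a nurse, I hope you know that, at least in my career, I have put a great deal of effort into healthcare research, because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether through smart cameras that provide more information or through robotic assistance. Our nurses are overworked and exhausted, and as our society ages, we need more help caring for those who need care. AI can play that role. So I just want to say that this is very important: even a technologist like me sincerely believes that everyone has a role in AI.
Lenny Rachitsky: What a beautiful way to end. It connects perfectly with where we began: it's all up to us, and each of us has to take personal responsibility for the role AI will play in our lives. One last question: where can people find Marble? Where can they go, and what should they do if they want to join World Labs? What's the website? Where should people go?
Fei-Fei Li: World Labs' website is www.worldlabs.ai, where you can find our research progress. We have a technical blog. You can find the Marble product. You can sign up and log in there. You can also find a link to our job postings there. We're in San Francisco. We'd love to work with the best talent in the world.
Lenny Rachitsky: Amazing. Fei-Fei, thank you so much for being here.
Fei-Fei Li: Thank you, Lenny.
Lenny Rachitsky: Bye, everyone.
Thank you so much for listening. If you found this episode valuable, you can subscribe on Apple Podcasts, Spotify, or your favorite podcast app. Please also consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.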
Glossary
| Original | Chinese rendering / note |
|---|---|
| agency | 能动性 |
| AGI | AGI (artificial general intelligence) |
| AI winter | AI寒冬 |
| Alan Turing | Alan Turing (a founder of computer science and artificial intelligence) |
| Alex Wang | Alex Wang |
| AlexNet | AlexNet (a classic deep convolutional neural network) |
| allegory of the cave | 洞穴寓言 |
| Bayesian network | 贝叶斯网络 |
| Ben Horowitz | Ben Horowitz |
| Ben Mildenhall | Ben Mildenhall (World Labs co-founder) |
| Charlie Munger | Charlie Munger (American investor, longtime vice chairman of Berkshire Hathaway) |
| Chris Dixon | Chris Dixon (venture capitalist) |
| Chris Manning | Chris Manning (NLP researcher at Stanford University) |
| Christoph Lassner | Christoph Lassner (World Labs co-founder) |
| Condoleezza Rice | Condoleezza Rice |
| DARPA | DARPA (U.S. Defense Advanced Research Projects Agency) |
| Demis | Demis (DeepMind co-founder) |
| double helix | 双螺旋 |
| Einstein | Einstein (爱因斯坦) |
| Elon | Elon (Elon Musk) |
| embodied AI | 具身 AI |
| expert systems | 专家系统 |
| exposure therapy | 暴露疗法 |
| FOMO | FOMO (fear of missing out; 错失恐惧) |
| foundation model | 基础模型 |
| Francis Crick | Francis Crick (co-discoverer of the DNA double helix) |
| Geoff Hinton | Geoff Hinton |
| Human-Centered AI Institute | 以人为本 AI 研究院 |
| ImageNet | ImageNet (kept in the original; proper name of a dataset) |
| Isaac Newton | Isaac Newton (牛顿) |
| James Landay | James Landay (Stanford University professor) |
| James Watson | James Watson (co-discoverer of the DNA double helix) |
| Jeff Dean | Jeff Dean (Google AI leader) |
| Jensen | Jensen (Jensen Huang, 黄仁勋, CEO of NVIDIA) |
| John Etchemendy | John Etchemendy (Stanford University professor) |
| Justin Johnson | Justin Johnson (World Labs co-founder) |
| large language model | 大语言模型 |
| Lenny Rachitsky | Lenny Rachitsky (podcast host) |
| logic systems | 逻辑系统 |
| machine learning | 机器学习 |
| Marble | Marble (the 3D world generation product released by World Labs) |
| mesh | mesh (网格; the basic data structure of a 3D model) |
| neural network | 神经网络 |
| Percy Liang | Percy Liang (NLP researcher at Stanford University) |
| Plato | 柏拉图 |
| prompt to worlds | 提示即世界 |
| reinforcement learning | 强化学习 |
| Richard Sutton | Richard Sutton (reinforcement learning pioneer, Turing Award winner) |
| Rosalind Franklin | Rosalind Franklin (key contributor to the discovery of DNA's structure) |
| SAIL | SAIL (Stanford Artificial Intelligence Lab) |
| Sam Altman | Sam Altman (CEO of OpenAI) |
| Sebastian Thrun | Sebastian Thrun (Stanford University professor, self-driving pioneer) |
| situational awareness | 情境感知 |
| spatial intelligence | 空间智能 |
| statistical learning | 统计学习 |
| synthetic data | 合成数据 |
| tenure | 终身教职 |
| the bitter lesson | 苦涩的教训 (the bitter lesson) |
| The Matrix | 《黑客帝国》 |
| token | token (令牌; the basic unit a language model processes) |
| transformer | transformer |
| VFX | VFX (visual effects) |
| vibe coding | vibe coding (kept in the original) |
| virtual production | 虚拟制作 |
| visual intelligence | 视觉智能 |
| Waymo | Waymo (Alphabet's self-driving company) |
| WordNet | WordNet (lexical semantic network developed at Princeton University) |
| World Labs | World Labs (the company founded by Fei-Fei Li) |
| World models | 世界模型 |
| x-ray diffraction | X 射线衍射 |
This document was machine-translated in segments by AI (translate_long_document).
The Godmother of AI on jobs, robots & why world models are next | Dr. Fei-Fei Li
Transcript
Lenny Rachitsky: A lot of people call you the godmother of AI. The work you did actually was the spark that brought us out of AI winter.
Dr. Fei Fei Li: In the middle of 2015, middle of 2016, some tech companies avoided using the word AI because they were not sure if AI was a dirty word. 2017-ish was the beginning of companies calling themselves AI companies.
Lenny Rachitsky: There’s this line, I think, this was when you were presenting to Congress. There’s nothing artificial about AI. It’s inspired by people. It’s created by people, and most importantly, it impacts people.
Dr. Fei Fei Li: It’s not like I think AI will have no impact on jobs or people. In fact, I believe that whatever AI does, currently or in the future, is up to us. It’s up to the people. I do believe technology is a net positive for humanity, but I think every technology is a double-edged sword. If we’re not doing the right thing as a society, as individuals, we can screw this up as well.
Lenny Rachitsky: You had this breakthrough insight of just, okay, we can train machines to think like humans, but it’s just missing the data that humans have to learn as a child.
Dr. Fei Fei Li: I chose to look at artificial intelligence through the lens of visual intelligence because humans are deeply visual animals. We need to train machines with as much information as possible on images of objects, but objects are very, very difficult to learn. A single object can have infinite possibilities that is shown on an image. In order to train computers with tens and thousands of object concepts, you really need to show it millions of examples.
Introducing the Guest
Lenny Rachitsky: Today, my guest is Dr. Fei-Fei Li, who's known as the godmother of AI. Fei-Fei has been responsible for and at the center of many of the biggest breakthroughs that sparked the AI revolution that we're currently living through. She spearheaded the creation of ImageNet, which was basically her realizing that AI needed a ton of clean, labeled data to get smarter, and that data set became the breakthrough that led to the current approach to building and scaling AI models. She was chief AI scientist at Google Cloud, which is where some of the biggest early technology breakthroughs emerged from. She was director at SAIL, Stanford's Artificial Intelligence Lab, where many of the biggest AI minds came out of. She's also co-creator of Stanford's Human-Centered AI Institute, which is playing a vital role in the direction that AI is taking. She's also been on the board of Twitter. She was named one of Time's 100 Most Influential People in AI. She's also on a United Nations advisory board. I could go on.
In our conversation, Fei-Fei shares a brief history of how we got to today in the world of AI, including this mind-blowing reminder that 9 to 10 years ago, calling yourself an AI company was basically a death knell for your brand because no one believed that AI was actually going to work. Today, it’s completely different. Every company is an AI company. We also chat about her take on how she sees AI impacting humanity in the future, how far current technologies will take us, why she’s so passionate about building a world model and what exactly world models are, and most exciting of all, the launch of the world’s first large world model, Marble, which just came out as this podcast comes out. Anyone can go play with this at marble.worldlabs.ai. It’s insane. Definitely check it out. Fei-Fei is incredible and way too under the radar for the impact that she’s had on the world, so I am really excited to have her on and to spread her wisdom with more people.
A huge thank you to Ben Horowitz and Condoleezza Rice for suggesting topics for this conversation. If you enjoy this podcast, don’t forget to subscribe and follow it in your favorite podcasting app or YouTube. With that, I bring you Dr. Fei-Fei Li after a short word from our sponsors.
AI's Impact on Humanity
Lenny Rachitsky: Fei-Fei, thank you so much for being here and welcome to the podcast.
Dr. Fei Fei Li: I’m excited to be here, Lenny.
Lenny Rachitsky: I’m even more excited to have you here. It is such a treat to get to chat with you. There’s so much that I want to talk about. You’ve been at the center of this AI explosion that we’re seeing right now for so long. We’re going to talk about a bunch of the history that I think a lot of people don’t even know about how this whole thing started, but let me first read a quote from Wired about you just so people get a sense, and in the intro I’ll share all of the other epic things you’ve done. But I think this is a good way to just set context. “Fei-Fei is one of a tiny group of scientists, a group perhaps small enough to fit around a kitchen table, who are responsible for AI’s recent remarkable advances.”
A lot of people call you the godmother of AI, and unlike a lot of AI leaders, you’re an AI optimist. You don’t think AI is going to replace us. You don’t think it’s going to take all our jobs. You don’t think it’s going to kill us. So I thought it’d be fun to start there, just what’s your perspective on how AI is going to impact humanity over time?
The Double-Edged Sword of Tech
Dr. Fei Fei Li: Yeah, okay, so Lenny, let me be very clear. I’m not a utopian, so it’s not like I think AI will have no impact on jobs or people. In fact, I’m a humanist. I believe that whatever AI does, currently or in the future, is up to us. It’s up to the people. So I do believe technology is a net positive for humanity. If you look at the long course of civilization, I think we are, and fundamentally, we’re an innovative species that we… If you look at from written record thousands of years ago to now, humans just kept innovating ourselves and innovating our tools, and with that, we make lives better, we make work better, we build civilization, and I do believe AI is part of that. So that’s where the optimism comes from. But I think every technology is a double-edged sword, and if we’re not doing the right thing as a species, as a society, as communities, as individuals, we can screw this up as well.
Lenny Rachitsky: There’s this line, I think, this was when you were presenting to Congress, “There’s nothing artificial about AI. It’s inspired by people. It’s created by people, and most importantly, it impacts people.” I don’t have a question there, but what a great line.
Dr. Fei Fei Li: Yeah, I feel pretty deeply about it. I started working in AI two and a half decades ago, and I’ve been having students for the past two decades, and almost every student who graduates, I remind them when they graduate from my lab that your field is called artificial intelligence, but there’s nothing artificial about it.
AI History as a Collective Achievement
Lenny Rachitsky: Coming back to the point you just made about how it’s kind of up to us about where this all goes, what is it you think we need to get right? How do we set things on a path? I know this is a very difficult question to answer, but just what’s your advice? What do you think we should be keeping in mind?
Dr. Fei Fei Li: Yeah, how many hours do we have?
How Far Away Is AGI?
Lenny Rachitsky: How do we align AI? There we go. Let’s solve it.
Defining AGI and Scientific Pursuit
Dr. Fei Fei Li: So I think people should be responsible individuals no matter what we do. This is what we teach our children, and this is what we need to do as grownups as well. No matter which part of the AI development or AI deployment or AI application you are participating in, and most likely many of us, especially as technologists, we’re in multiple points. We should act like responsible individuals and care about this. Actually, care a lot about this. I think everybody today should care about AI because it is going to impact your individual life. It is going to impact your community, it’s going to impact the society and the future generation. And caring about it as a responsible person is the first, but also the most important step.
We Need More Breakthroughs
Lenny Rachitsky: Okay, so let me actually take a step back and kind of go to the beginning of AI. Most people started hearing and caring about AI, as what it’s called today, just like, I don’t know, a few years ago when ChatGPT came out. Maybe it was like three years ago.
Dr. Fei Fei Li: Three years ago, almost one more month, three years ago.
Can AI Replicate Scientific Breakthroughs?
Lenny Rachitsky: Wow, okay. And that was ChatGPT coming out. Is that the milestone you have in mind?
Dr. Fei Fei Li: Yes.
The Concept of World Models
Lenny Rachitsky: Okay, cool. That’s exactly how I saw it. But very few people know there was a long, long history of people working on this. It was called machine learning back then, and there were other terms, and now everything is just AI. There was a long period of a lot of people working on it. And then there’s what people refer to as the AI winter, where most people almost gave up and said, okay, this idea isn’t going anywhere. And then the work you did was essentially the spark that brought us out of the AI winter and is directly responsible for the world we’re in now, where AI is all we talk about. As you just said, it’s going to impact everything we do. So I thought it’d be really interesting to hear from you just the brief history of what the world was like before ImageNet and just the work you did to create ImageNet, why that was so important, and then just what happened after.
Spatial Intelligence and World Labs
Dr. Fei Fei Li: It is, for me, hard to keep in mind that AI is so new for everybody when I have lived my entire professional life in AI. There’s a part of me that finds it so satisfying to see a personal curiosity that I started barely out of teenagehood now become a transformative force of our civilization. It genuinely is a civilizational-level technology. So that journey is about 30 years or 20 something, 20 plus years, and it’s just very satisfying. So where did it all start? Well, I’m not even a first-generation AI researcher. The first generation really dates back to the ’50s and ’60s, and Alan Turing was ahead of his time in the ’40s by daring humanity with the question, “Are there thinking machines?” And of course he had a specific way of testing this concept of a thinking machine, which is a conversational chatbot, and by his standard we now have a thinking machine.
But that was just a more anecdotal inspiration. The field really began in the ’50s when computer scientists came together and looked at how we can use computer programs and algorithms to build programs that can do things that had only been possible through human cognition. And that was the beginning. And among the founding fathers at the Dartmouth workshop in 1956, we have Professor John McCarthy, who later came to Stanford, who coined the term artificial intelligence. And between the ’50s, ’60s, ’70s, and ’80s, it was the early days of AI exploration and we had logic systems, we had expert systems, we also had early exploration of neural networks. And then it came to around the late ’80s, the ’90s, and the very beginning of the 21st century. That stretch of about 20 years is actually the beginning of machine learning, which is the marriage between computer programming and statistical learning.
And that marriage brought a very, very critical concept into AI, which is that purely rule-based programs are not going to account for the vast amount of cognitive capabilities that we imagine computers can have. So we have to use machines to learn the patterns. Once a machine can learn the patterns, it has a hope of doing more things. For example, if you give it three cats, the hope is not just for the machine to recognize these three cats. The hope is the machine can recognize the fourth cat, the fifth cat, the sixth cat, and all the other cats. And that’s a learning ability that is fundamental to humans and many animals. And we, as a field, realized, “We need machine learning.” So that was up till the beginning of the 21st century. I entered the field of AI literally in the year of 2000. That’s when my PhD began at Caltech.
And so I was one of the first generation machine learning researchers and we were already studying this concept of machine learning, especially neural network. I remember that was one of my first courses at Caltech is called neural network, but it was very painful. It was still smack in the middle of the so-called AI winter, meaning the public didn’t look at this too much. There wasn’t that much funding, but there was also a lot of ideas flowing around. And I think two things happened to myself that brought my own career so close to the birth of modern AI is that I chose to look at artificial intelligence through the lens of visual intelligence because humans are deeply visual animals. We can talk a little more later, but so much of our intelligence is built upon visual, perceptual, spatial understanding, not just language per se. I think they’re complementary.
So I chose to look at visual intelligence, and in my PhD and my early professor years, my students and I were very committed to a north star problem, which is solving the problem of object recognition, because it’s a building block for the perceptual world, right? We go around the world interpreting, reasoning, and interacting with it more or less at the object level. We don’t interact with the world at the molecular level. We sometimes do, but rarely. For example, if you want to lift a teapot, you don’t say, “Okay, the teapot is made of a hundred pieces of porcelain and let me work on these hundred pieces.” You look at this as one object and interact with it. So objects are really important. So I was among the first researchers to identify this as a north star problem, but I think what happened is that as a student of AI and a researcher of AI, I was working on all kinds of mathematical models including neural networks, including Bayesian networks, including many, many models.
And there was one singular pain point, which is that these models didn’t have data to be trained on. And as a field, we were so focused on these models, but it dawned on me that human learning as well as evolution is actually a big data learning process. Humans learn with so much experience constantly. In evolution, if you look across time, animals evolve by just experiencing the world. So I think my students and I conjectured that a very critically overlooked ingredient of bringing AI to life is big data. And then we began this ImageNet project in 2006, 2007. We were very ambitious. We wanted to get the entire internet’s image data on objects. Now granted, the internet was a lot smaller than today, so I felt like that ambition was at least not too crazy. Now, it’s totally delusional to think a couple of graduate students and a professor can do this.
And that’s what we did. We very carefully curated 15 million images from the internet and created a taxonomy of 22,000 concepts, borrowing other researchers’ work like linguists’ work on WordNet, which is a particular way of organizing words into a dictionary. And we combined that into ImageNet and we open-sourced that to the research community. We held an annual ImageNet challenge to encourage everybody to participate in this. We continued to do our own research, but 2012 was the moment that many people think was the beginning of deep learning or the birth of modern AI, because a group of Toronto researchers led by Professor Geoff Hinton participated in the ImageNet Challenge, used ImageNet big data and two GPUs from NVIDIA, and successfully created the first neural network algorithm that can…
It didn’t totally solve, but made huge progress towards solving, the problem of object recognition. And that combination of the trio of technologies, big data, neural networks, and GPUs, was kind of the golden recipe for modern AI. And then fast-forward to the public moment of AI, which is the ChatGPT moment: if you look at the ingredients of what brought ChatGPT to the world, technically it still uses these three ingredients. Now, it’s internet-scale data, mostly text; it’s a much more complex neural network architecture than in 2012, but it’s still a neural network; and a lot more GPUs, but it’s still GPUs. So these three ingredients are still at the core of modern AI.
Embodied AI and Robotics
Lenny Rachitsky: Incredible. I have never heard that full story before. I love that it was two GPUs was the first. I love that. And now it’s, I don’t know, hundreds of thousands, right, that are orders of magnitude more powerful.
Dr. Fei Fei Li: Yep.
Future Applications of World Models
Lenny Rachitsky: And those two GPUs, they just bought them, they were like gaming GPUs, they just went to the-
Dr. Fei Fei Li: Yes.
Ben Horowitz and the Bitter Lesson
Lenny Rachitsky: … GameStop, that people use for playing games. As you said, this continues to be, in a large way, the way models get smarter. Some of the fastest growing companies in the world right now, I’ve had mostly all of them on the podcast, Mercor and Surge and Scale. They continue to do this for labs, just give them more and more labeled data of the things they’re most excited and interested in.
The Robot Data Dilemma
Dr. Fei Fei Li: Yeah, I remember Alex Wang from Scale’s very early days. I probably still have his emails from when he was starting Scale. He was very kind. He kept sending me emails about how ImageNet inspired Scale. I was very pleased to see that.
Awe for the Human Brain
Lenny Rachitsky: One of my other favorite takeaways from what you just shared is that it’s such an example of high agency and just doing things, which is kind of a meme on Twitter: you can just do things. You were just like, okay, this is probably necessary to move AI forward. And it was called machine learning back then, right? Was that the term most people used?
Dr. Fei Fei Li: I think the terms were interchangeable. It’s true. I do remember the tech companies, I am not going to name names, but I was in a conversation in one of the early days, I think it was in the middle of 2015, middle of 2016, and some tech companies avoided using the word AI because they were not sure if AI was a dirty word. And I remember I was actually encouraging everybody to use the word AI because to me that is one of the most audacious questions humanity has ever asked in our quest for science and technology, and I feel very proud of this term. But yes, at the beginning some people were not sure.
Marble: From Prompts to 3D Worlds
Lenny Rachitsky: What year was that roughly when AI was a dirty word?
Dr. Fei Fei Li: 2016, I think because that was-
The Matrix Connection
Lenny Rachitsky: 2016, less than 10 years ago.
Use Cases for Marble
Dr. Fei Fei Li: That was the changing. Some people start calling it AI, but I think if you look at the Silicon Valley tech companies, if you trace their marketing term, I think 2017-ish was the beginning of companies calling themselves AI companies.
User Feedback and Discovering Use Cases
Lenny Rachitsky: That’s incredible. Just how the world has changed.
Differences from Video Generation Models
Dr. Fei Fei Li: Yes.
Building the Team and Resources
Lenny Rachitsky: Now, you can’t not call yourself an AI company.
Dr. Fei Fei Li: I know.
The Startup Journey
Lenny Rachitsky: Just nine-ish years later.
Dr. Fei Fei Li: Yeah.
Standing at the Center of Breakthroughs
Lenny Rachitsky: Oh, man. Okay. Is there anything else around the history, that early history that you think people don’t know that you think is important before we chat about where you think things are going and the work that you’re doing?
Advice for Young Talent
Dr. Fei Fei Li: I think, as with all histories, I’m keenly aware that I am recognized for being part of the history, but there are so many heroes and so many researchers. We’re talking about generations of researchers. In my own world, there are so many people who have inspired me, which I talked about in my book, but I do feel our culture, especially Silicon Valley, tends to assign achievements to a single person. While I think that has value, it’s worth remembering that AI is a field that is, at this point, 70 years old, and we have gone through many generations. Nobody, no one could have gotten here by themselves.
The Human-Centered AI Institute
Lenny Rachitsky: Okay, so let me ask you this question. It feels like we’re always on this precipice of AGI, this kind of vague term people throw around, AGI is coming, it’s going to take over everything. What’s your take on how far you think we might be from AGI? Do you think we’re going to get there on the current trajectory we’re on? Do you think we need more breakthroughs? Do you think the current approach will get us there?
Dr. Fei Fei Li: Yeah, this is a very interesting term, Lenny. I don’t know if anyone has ever defined AGI. There are many different definitions, ranging from some kind of superpower for machines all the way to machines becoming economically viable agents in society. In other words, making salaries to live. Is that the definition of AGI? As a scientist, I take science very seriously, and I entered the field because I was inspired by this audacious question of, can machines think and do things in the way that humans can? For me, that’s always the north star of AI. And from that point of view, I don’t know what the difference between AI and AGI is.
I think we’ve done very well in achieving parts of the goal, including conversational AI, but I don’t think we have completely conquered all the goals of AI. And I think of our founding fathers, Alan Turing. I wonder, if Alan Turing were around today and you asked him to contrast AI versus AGI, he might just shrug and say, “Well, I asked the same question back in the 1940s.” So I don’t want to go down a rabbit hole of defining AI versus AGI. I feel AGI is more a marketing term than a scientific term. As a scientist and technologist, AI is my north star, is my field’s north star, and I’m happy for people to call it whatever name they want to call it.
Everyone Has a Role to Play
Lenny Rachitsky: So let me ask you maybe this way. Like you described, there are these components that, from ImageNet and AlexNet, took us to where we are today: GPUs essentially, data, labeled data, and the algorithm of the model. The transformer also feels like an important step in that trajectory. Do you feel like those are the same components that’ll get us to, I don’t know, a 10 times smarter model, something that’s life-changing for the entire world? Or do you think we need more breakthroughs? I know we’re going to talk about world models, which I think is a component of this, but is there anything else where you think, oh, this will plateau, or okay, this will take us there, we just need more data, more compute, more GPUs?
Dr. Fei Fei Li: Oh no, I definitely think we need more innovations. I think with the scaling laws of more data, more GPUs, and bigger current model architectures, there’s still a lot to be done there, but I absolutely think we need to innovate more. There’s not a single deeply scientific discipline in human history that has arrived at a place that says we’re done, we’re done innovating, and AI is one of the youngest, if not the youngest, disciplines in human civilization in terms of science and technology. We’re still scratching the surface. For example, like I said, we’re going to segue into world models. Today, you take a model and run it through a video of a couple of office rooms and ask the model to count the number of chairs. And this is something a toddler could do or maybe an elementary school kid could do, and AI could not do that, right?
So there’s just so much AI today could not do, then let alone thinking about how did someone like Isaac Newton look at the movements of the celestial bodies and derive an equation or a set of equations that governs the movement of all bodies, that level of creativity, extrapolation, abstraction. We have no way of enabling AI to do that today. And then let’s look at emotional intelligence. If you look at a student coming to a teacher’s office and have a conversation about motivation, passion, what to learn, what’s the problem that’s really bothering you. That conversation, as powerful as today’s conversational bots are, you don’t get that level of emotional cognitive intelligence from today’s AI. So there’s a lot we can do better, and I do not believe we’re done innovating.
Lenny Rachitsky: Demis had this really interesting interview recently from DeepMind slash Google where someone asked him just like, “What do you think, how far are we from AGI? What does it look like going through there?” He had a really interesting way of approaching it is if we were to give the most cutting-edge model all the information until the end of the 20th century, see if it could come up with all the breakthroughs Einstein had and so far we’re nowhere near that, but they could just-
Dr. Fei Fei Li: No, we’re not. In fact, it’s even worse. Let’s give AI all the data including modern instruments data of celestial bodies, which Newton did not have, and give it to that and just ask AI to create the 17th century set of equations on the laws of bodily movements. Today’s AI cannot do that.
Lenny Rachitsky: All right. We’re ways away is what I’m hearing.
Dr. Fei Fei Li: Yeah.
Lenny Rachitsky: Okay, so let’s talk about world models. To me, this is just another really amazing example of you being ahead of where people end up. So you were way ahead on, okay, we just need a lot of clean data for AI and neural networks to learn. You’ve been talking about this idea of world models for a long time. You started a company to build one. Essentially, there are language models, and this is a different thing. This is a world model. We’ll talk about what that is. And now, as I was preparing for this, Elon’s talking about world models, Jensen’s talking about world models, I know Google’s working on this stuff. You’ve been at this for a long time and you actually just launched something, which we’re going to talk about, right before this podcast airs. So what is a world model? Why is it so important?
Dr. Fei Fei Li: I’m very excited to see that more and more people are talking about world models, like Elon, like Jensen. I have been thinking about how to really push AI forward all my life, and the large language models that came out of the research world and then OpenAI and all this, for the past few years, were extremely inspiring even for a researcher like me. I remember when GPT-2 came out, and that was in, I think, late 2020. I was co-director, I still am, but I was at that time full-time co-director of Stanford’s Human-Centered AI Institute, and I remember the public was not aware of the power of large language models yet, but as researchers, we were seeing it, we were seeing the future, and I had pretty long conversations with my natural language processing colleagues like Percy Liang and Chris Manning. We were talking about how critical this technology was going to be, and the Stanford Human-Centered AI Institute, HAI, was the first one to establish a full research center on foundation models.
Percy Liang and many researchers led the first academic paper on foundation models. So it was just very inspiring for me. Of course, I come from the world of visual intelligence and I was just thinking there’s so much we can push forward beyond language, because humans use our sense of spatial intelligence, of world understanding, to do so many things, and they are beyond language. Think about a very chaotic first responder scene, whether it’s a fire or some traffic accident or some natural disaster. And if you immerse yourself in those scenes and think about how people organize themselves to rescue people, to stop further disasters, to put down fires, a lot of that is movement, is spontaneous understanding of objects, worlds, human situational awareness. Language is part of that, but in a lot of those situations, language cannot get you to put down the fire.
So that is, what is that? I was thinking about it a lot. And in the meantime, I was doing a lot of robotics research and it dawned on me that the linchpin connecting the additional kinds of intelligence beyond language, embodied AI, which is robotics, and visual intelligence, is the sense of spatial intelligence, of understanding the world. I think it was 2024 when I gave a TED talk about spatial intelligence and world models. And I started formulating this idea back in 2022 based on my robotics and computer vision research. And then one thing that was really clear to me is that I really wanted to work with the brightest technologists and move as fast as possible to bring this technology to life. And that’s when we founded this company called World Labs. And you can see the word world is in the title of our company because we believe so much in world modeling and spatial intelligence.
Lenny Rachitsky: People are so used to just chatbots and that’s a large language model. A simple way to understand a world model is you basically describe a scene and it generates an infinitely explorable world. We’ll link to the thing you launched, which we’ll talk about, but just is that a simple way to understand it?
Dr. Fei Fei Li: That’s part of it, Lenny. I think a simple way to understand a world model is that this model can allow anyone to create any worlds in their mind’s eye by prompting whether it’s an image or a sentence. And also be able to interact in this world whether you are browsing and walking or picking objects up or changing things as well as to reason within this world, for example, if the person consuming, if the agent consuming this output of the world model is a robot, it should be able to plan its path and help to tidy the kitchen, for example. So world model is a foundation that you can use to reason, to interact, and to create worlds.
Lenny Rachitsky: Great. Yeah. So robots feels like that’s potentially the next big focus for AI researchers and just the impact on the world. And what you’re saying here is this is a key missing piece of making robots actually work in the real world, understanding how the world works.
Dr. Fei Fei Li: Yeah. Well, first of all, I do think there’s more than robots that’s exciting. But I agree with everything you just said. I think world modeling and spatial intelligence is a key missing piece of embodied AI. I also think, let’s not underestimate that humans are embodied agents and humans can be augmented by AI’s intelligence. Just like today, humans are language animals, but we’re very much augmented by AI helping us to do language tasks, including software engineering. I think that we shouldn’t underestimate, or maybe we tend not to talk about, how humans, as embodied agents, can actually benefit so much from world models and spatial intelligence models, just as robots can.
Lenny Rachitsky: So the big unlocks here: robots, which is a huge deal if this works out. Imagine each of us has robots doing a bunch of stuff for us, they help us with disasters, things like that. Games obviously is a really cool example, just like infinitely playable games that you just invent out of your head. And then creativity, just having fun, being creative, thinking of magical, wild new worlds and environments.
Dr. Fei Fei Li: And also design. Humans design everything from machines to buildings to homes. And also scientific discovery. There is so much. I like to use the example of the discovery of the structure of DNA. One of the most important pieces in the history of DNA’s discovery is the X-ray diffraction photo that was captured by Rosalind Franklin, and it was a flat 2D photo of a structure that looks like a cross with diffractions. You can google those photos. But with that flat 2D photo, humans, especially two important humans, James Watson and Francis Crick, in addition to their other information, were able to reason in 3D space and deduce the highly three-dimensional double helix structure of DNA. And that structure cannot possibly be 2D. You cannot think in 2D and deduce that structure. You have to think in 3D space, use human spatial intelligence. So I think even in scientific discovery, spatial intelligence or AI-assisted spatial intelligence is critical.
Lenny Rachitsky: This is such an example of, I think it was Chris Dixon that had this line, that the next big thing is going to start off feeling like a toy. When ChatGPT just came out, I remember Sam Altman just tweeted it as like, “Here’s a cool thing we’re playing with, check it out.” Now, it’s the fastest growing product in all of history, and it changed the world. And it’s oftentimes the things that just look like, okay, this is cool, this is fun to play with, that end up changing the world most.
I reached out to Ben Horowitz, who loves what you’re doing, a big fan of yours. They’re investors I believe in…
Dr. Fei Fei Li: Yeah, we’ve known each other for many years, but yes, right now they’re investors of World Labs.
Lenny Rachitsky: Amazing. Okay, so I asked him what I should ask you about and he suggested ask you why is the bitter lesson alone not likely to work for robots? So first of all, just explain what the bitter lesson was in the history of AI and then just why that won’t get us to where we want to be with robots.
Dr. Fei Fei Li: Well, first of all, there are many bitter lessons, but the bitter lesson everybody refers to is a paper written by Richard Sutton, who won the Turing Award recently, and he does a lot of reinforcement learning. And Richard has said, if you look at the history, especially the algorithmic development of AI, it turns out a simpler model with a ton of data always wins at the end of the day over the more complex model with less data. I mean, that was actually… This paper came years after ImageNet. That to me was not bitter; it was a sweet lesson. That’s why I built ImageNet, because I believe that big data plays that role. So why can’t the bitter lesson work in robotics alone? Well, first of all, I think we need to give credit to where we are today. Robotics is very much in the early days of experimentation.
The research is not nearly as mature as, say, language models. So many people are still experimenting with different algorithms, and some of those algorithms are driven by big data. So I do think big data will continue to play a role in robotics, but there are a couple of things that are hard for robotics. One is that it’s harder to get data. It’s a lot harder to get data. You can say, well, there’s web data. This is where the latest robotics research is using web videos. And I think web videos do play a role. But if you think about what made language models work so very… As someone who does computer vision and spatial intelligence and robotics, I’m very jealous of my colleagues in language because they had this perfect setup where their training data are in words, eventually tokens, and then they produce a model that outputs words.
So you have this perfect alignment between what you hope to get, which we call the objective function, and what your training data looks like. But robotics is different. Even spatial intelligence is different. You hope to get actions out of robots, but your training data lacks actions in 3D worlds, and that’s what robots have to do, right? Actions in 3D worlds. So you have to find different ways to fit, what do they call it, a square peg in a round hole, because what we have is tons of web videos. So then we have to start talking about adding supplemental data such as teleoperation data or synthetic data, so that the robots are trained under this hypothesis of the bitter lesson, which is a large amount of data. I think there’s still hope, because even what we are doing in world modeling will really unlock a lot of this information for robots.
But I think we have to be careful because we’re in the early days of this, and the bitter lesson is still to be tested because we haven’t fully figured out the data for it. Another part of the bitter lesson for robotics that I think we should be realistic about is, again, that compared to language models or even spatial models, robots are physical systems. So robots are closer to self-driving cars than to a large language model. And that’s very important to recognize. That means that in order for robots to work, we not only need brains, we also need the physical body. We also need application scenarios. If you look at the history of the self-driving car, my colleague Sebastian Thrun took Stanford’s car to win the first DARPA challenge in 2006 or 2005. It’s been 20 years from that prototype of a self-driving car being able to drive 130 miles in the Nevada desert to today’s Waymo on the streets of San Francisco.
And we’re not even done yet. There’s still a lot to do. So that’s a 20-year journey. And self-driving cars are much simpler robots: they’re just metal boxes running on 2D surfaces, and the goal is not to touch anything. A robot is a 3D thing running in a 3D world, and the goal is to touch things. So the journey is going to have many aspects, many elements. And of course one could say, well, the early self-driving car algorithms were from the pre-deep-learning era. So deep learning is accelerating the brains. And I think that’s true. That’s why I’m in robotics, that’s why I’m in spatial intelligence, and I’m excited by it. But in the meantime, the car industry is very mature, and productizing also involves mature use cases, supply chains, the hardware. So I think it’s a very interesting time to work on these problems. But it’s true, Ben is right. We might still be subject to a number of bitter lessons.
Lenny Rachitsky: Doing this work, do you ever just feel awe for the way the brain works and is able to do all of this for us? Just the complexity of getting a machine to just walk around and not hit things and fall, does it just give you more respect for what we’ve already got?
Dr. Fei Fei Li: Totally. We operate on about 20 watts. That’s dimmer than any light bulb in the room I’m in right now. And yet we can do so much. So I think actually the more I work in AI, the more I respect humans.
Lenny Rachitsky: Let’s talk about this product you just launched. It’s called Marble, a very cute name. Talk about what this is, why this is important. I’ve been playing with it, it’s incredible. We’ll link to it for folks to check it out. What is Marble?
Dr. Fei Fei Li: Yeah, I’m very excited. So first of all, Marble is one of the first products that World Labs has rolled out. World Labs is a foundation frontier model company. We are founded by four co-founders who have deep technical history. My co-founders, Justin Johnson, Christoph Lassner, and Ben Mildenhall. We all come from the research field of AI, computer graphics, computer vision, and we believe that spatial intelligence and world modeling is as important as, if not more important than, language models and complementary to language models. So we wanted to seize this opportunity to create a deep tech research lab that can connect the dots between frontier models and products. So Marble is an app that’s built upon our frontier models. We’ve spent a year plus building the world’s first generative model that can output genuinely 3D worlds. That’s a very, very hard problem.
And it was a very hard process and we have a team of incredible, founding team of incredible technologists from incredible teams. And then around just a month or two ago, we saw the first time that we can just prompt with a sentence and the image and multiple images and create worlds that we can just navigate in. If you put it on goggles, which we have an option to let you do, you can even walk around. Even though we’ve been building this for quite a while, it was still just awe-inspiring and we wanted to get it into the hands of people who need it. And we know that so many creators, designers, people who are thinking about robotic simulation, people who are thinking about different use cases of navigable, interactable, immersive worlds, and game developers will find this useful. So we developed Marble as a first step. It’s again, still very early, but it’s the world’s first model doing this, and it’s the world’s first product that allows people to just prompt, we call it prompt to worlds.
Lenny Rachitsky: Well, I’ve been playing around with it. It is insane. You could just have a little Shire world where you just infinitely walk around middle earth basically, and there’s no one there yet, but it’s insane. You just go anywhere. There’s dystopian world. I’m just looking at all these examples and my favorite part, actually, I don’t know if there’s a feature or bug, you can see the dots of the world before it actually renders with all the textures. And I just love like, you get a glimpse into what is going on with this model, basically-
Dr. Fei Fei Li: That is so cool to hear because this is where, as a researcher, I am learning because the dots that lead you into the world was an intentional feature visualization, is not part of the model. The model actually just generates the world. But we were trying to find a way to guide people into the world, and a number of engineers worked on different versions, but we converged on the dot, and so many people, you’re not the only one, told us how delightful that experience is, and it was really satisfying for us to hear that this intentional visualization feature that’s not just the big hardcore model actually has delighted our users.
Lenny Rachitsky: Wow. So you add that to make it more, like to have humans understand what’s going on-
Dr. Fei Fei Li: To have fun, yes.
Lenny Rachitsky: … get more delightful. Wow, that is hilarious. It makes me think about LLMs and the way they, it’s not the same thing, but they talk about what they’re thinking and what they’re doing.
Dr. Fei Fei Li: Yes, it is. It is.
Lenny Rachitsky: It also makes me think about just the Matrix. It’s exactly the Matrix experience. I don’t know if that was your inspiration.
Dr. Fei Fei Li: Well, like I said, a number of engineers worked on that. It could be their inspiration.
Lenny Rachitsky: It’s in their subconscious. Okay, so just for folks that may want to play around with this, maybe like, what are some applications today that folks can start using today? What’s your goal with this launch?
Dr. Fei Fei Li: Yeah, so we do believe that world modeling is very horizontal, but we’re already seeing some really exciting use cases, virtual production for movies, because what they need are 3D worlds that they can align with the camera. So when the actors are acting on it, they can position the camera and shoot the segments really well. And we’re already seeing incredible use. In fact, I don’t know if you have seen our launch video showing Marble. It was produced by a virtual production company. We collaborated with Sony and they use Marble scenes to shoot those videos. So we were collaborating with those technical artists and directors, and they were saying, this has cut our production time by 40X. In fact, it has to-
Lenny Rachitsky: 40X?
Dr. Fei Fei Li: Yes, in fact it has to, because we only had one month to work on this project and there were so many things they were trying to shoot. So using Marble really, really significantly accelerated the virtual production for VFX and movies. That’s one use case. We are already seeing our users taking our Marble scenes, taking the mesh export, and putting it into games, whether it’s games on VR or just fun games that they have developed. We are showing an example of robotic simulation because when I was, I mean I still am a researcher doing robotic training. One of the biggest pain points is to create synthetic data for training robots. And this synthetic data needs to be very diverse. It needs to come from different environments with different objects to manipulate. And one path to it is to ask computers to simulate.
Otherwise, humans have to build every single asset for robots. That’s just going to take a lot longer. So we already have researchers reaching out and wanting to use Marble to create those synthetic environments. We also have unexpected user outreach in terms of how they want to use Marble. For example, a psychologist team called us to use Marble to do psychology research. It turned out some of the psychiatric patients they study, they need to understand how their brain respond to different immersive things of different features. For example, messy scenes or clean scenes or whatever you name it. And it’s very hard for researchers to get their hands on these kind of immersive scenes and it will take them too long and too much budget to create. And Marble is a really almost instantaneous way of getting so many of these experimental environments into their hands. So we’re seeing multiple use cases at this point. But the VFX, the game developers, the simulation developers as well as designers are very excited.
Lenny Rachitsky: This is very much the way things work in AI. I’ve had other AI leaders on the podcast and it’s always put things out there early as soon as you can to discover where the big use cases are. The head of ChatGPT told me how, when they first put out ChatGPT, he was just scanning TikTok to see how people were using it and all the things they were talking about, and that’s what convinced them where to lean in and help them see how people actually want to use it. I love this last use case for therapy. I’m just imagining heights, people dealing with heights or snakes or spiders, which-
Dr. Fei Fei Li: It’s amazing. A friend of mine last night literally called me and talked about his height scare and asked me if Marble should be used. It’s amazing you went straight there.
Lenny Rachitsky: Because imagining all the exposure therapy stuff, this could be so good for that. That is so cool. Okay, so I should have asked you this before, but I think there’s going to be a question of just, how does this differ from things like Veo 3 and other video generation models? It’s pretty clear to me, but I think it might be helpful just to explain how this is different from all the video AI tools people have seen.
Dr. Fei Fei Li: World Labs’ thesis is that spatial intelligence is fundamentally very important, and spatial intelligence is not just about videos. In fact, the world is not passively watching videos passing by. I love, Plato has the allegory of the cave analogy to describe vision. He said that imagine a prisoner tied on his chair, not very humane, but in a cave watching a full life theater in front of him, but the actual life theater that actors are acting is behind his back. It was just lit so that the projection of the action is on a wall of the cave. And then the goal, the task of this prisoner is to figure out what’s going on. It’s a pretty extreme example, but it really shows, it describes what vision is about, is that to make sense of the 3D world or 4D world out of 2D. So spatial intelligence to me is deeper than only creating that flat 2D world.
Spatial intelligence to me is the ability to create, reason, interact, make sense of deeply spatial world, whether it’s 2D or 3D or 4D, including dynamics and all that. So World Labs is focusing on that, and of course the ability to create videos per se could be part of this. And in fact, just a couple of weeks ago, we rolled out the world’s first demoable real-time video generation on a single H100 GPU. So part of our technology includes that, but I think Marble is very different because we really want creators, designers, developers to have in their hands a model that can give them worlds with 3D structures so they can use it for their work. And that’s why Marble is so different.
Lenny Rachitsky: The way I see it is it’s a platform for a ton of opportunity to do stuff. As you described, videos are just like, here’s a one-off video that’s very fun and cool and you could… And that’s it. That’s it. And you move on.
Dr. Fei Fei Li: By the way, we could in Marble, we could allow people to export in video forms. So you could actually, like you said, you go into a world, so let’s say it’s a hobbit cave. You can actually, especially as a creator, you have such a specific way of moving the camera in a trajectory in the director’s mind, and then you can export that from Marble into a video.
Lenny Rachitsky: What does it take to create something like this? Just how big is the team, how many GPUs you work in? Anything you can share there. I don’t know how much of this is private information, but just what does it take to create something like this that you’ve launched here?
Dr. Fei Fei Li: It takes a lot of brain power. So we just talk about 20 watts per brain. So from that point of view, it’s a small number, but it’s actually incredible. It’s half billion years of evolution to give us those power. We have a team of 30-ish people now, and we are predominantly researchers and research engineers, but we also have designers and product. We actually really believe that we want to create a company that’s anchored in the deep tech of spatial intelligence, but we are actually building serious products. So we have this integration of R&D and productization, and of course, we use a ton of GPUs.
Lenny Rachitsky: That’s the technical thing.
Dr. Fei Fei Li: Happy to hear.
Lenny Rachitsky: Well, congrats on the launch. I know this is a huge milestone. I know this took a ton of work.
Dr. Fei Fei Li: Thank you.
Lenny Rachitsky: So I just want to say congrats to you and your team. Let me talk about your founder journey for a moment. So you’re a founder of this company. You started how many years ago? A couple of years ago, two, three years ago?
Dr. Fei Fei Li: A year ago.
Lenny Rachitsky: A year ago?
Dr. Fei Fei Li: A year plus.
Lenny Rachitsky: A year? Okay. Wow.
Dr. Fei Fei Li: Probably 18 months, yeah.
Lenny Rachitsky: Okay. What’s something you wish you knew before you started this that you wish you could whisper into the ear of Fei-Fei of 18 months ago?
Dr. Fei Fei Li: Well, I continue to wish I know the future of technology. I think actually that’s one of our founding advantage is that we see the future earlier in general than most people. But still, man, this is so exciting and so amazing that what’s unknown and what’s coming, but I know the reason you’re asking me this question is not about the future of technology. Furthermore, look, I did not start a company of this scale at 20-year-old. So I started a dry cleaner when I was 19, but that’s a little smaller scale.
Lenny Rachitsky: We got to talk about that.
Dr. Fei Fei Li: And then I founded Google Cloud AI and then I founded an institute at Stanford but those are different beasts. I did feel I was a little more prepared as a founder of the grinding journey compared to maybe the 20-year-old founders. But I still, I’m surprised, and it puts me into paranoia sometimes that how intensely competitive AI landscape is from the model, the technology itself, as well as talents. And when I founded the company, we did not have these incredible stories of how much certain talents would cost. So these are things that continue to surprise me and I have to be very alert about.
Lenny Rachitsky: So the competition you’re talking about is the competition for talent, the speed at which just how things are moving.
Dr. Fei Fei Li: Yeah.
Lenny Rachitsky: Yeah. You mentioned this point that I want to come back to that if you just look over the course of your career, you were at all of the major collections of humans that led to so many of the breakthroughs that are happening today. Obviously, we talk about ImageNet also just SAIL at Stanford is where a lot of the work happened, Google Cloud, which a lot of the breakthroughs happened. What brought you to those places? Like for people looking for how to advance in their career, be at the center of the future, just is there a through line there of just what pulled you from place to place and pulled you into those groups that might be helpful for people to hear?
Dr. Fei Fei Li: Yeah, this is actually a great question, Lenny, because I do think about it, and obviously we talked about it’s curiosity and passion that brought me to AI, that is more a scientific north star, right? I did not care if AI was a thing or not, so that was one part. But how did I end up choosing in the particular places I work in, including starting World Labs, is I think I’m very grateful to myself or maybe to my parents’ genes. I’m an intellectually very fearless person, and I have to say when I hire young people, I look for that because I think that’s a very important quality if one wants to make a difference, is that when you want to make a difference, you have to accept that you’re creating something new or you’re diving into something new. People haven’t done that. And if you have that self-awareness, you almost have to allow yourself to be fearless and to be courageous.
So when I, for example, came to Stanford, in the world of academia, I was very close to this thing called tenure, which is having the job forever, at Princeton. But I chose to come to Stanford because… I love Princeton. It’s my alma mater. It’s just at that moment there are people who are so amazing at Stanford and the Silicon Valley ecosystem was so amazing that I was okay to take a risk of restarting my tenure clock. Becoming the first female director of SAIL, I was actually relatively speaking a very young faculty at that time, and I wanted to do that because I care about that community. I didn’t spend too much time thinking about all the failure cases.
Obviously, I was very lucky that the more senior faculty supported me, but I just wanted to make a difference. And then going to Google was similar. I wanted to work with people like Jeff Dean, Geoff Hinton, Demis, and all these incredible people. The same with World Labs. I have this passion. And I also believe that people with the same mission can do incredible things. So that’s how it guided my through line. I don’t overthink all the possible things that can go wrong because that’s too many.
Lenny Rachitsky: I feel like an important element of this is not focusing on the downside, focusing more on the people, the mission. What gets you excited, what do you think, the curiosity.
Dr. Fei Fei Li: Yeah. I do want to say one thing to all the young talents in AI, the engineers, the researchers out there, because some of you apply to World Labs, I feel very privileged you considered World Labs. I do find many of the young people today think about every single aspect of an equation when they decide on jobs. At some point, maybe that’s the way they want to do it, but sometimes I do want to encourage young people to focus on what’s important because I find myself constantly in mentoring mode when I talk to job candidates. Not necessarily recruiting or not recruiting, but just in mentoring mode when I see an incredible young talent who is over-focusing on every minute dimension and aspect of considering a job, when maybe the most important thing is where’s your passion? Do you align with the mission? Do you believe and have faith in this team? And just focus on the impact you can make and the kind of work and team you can work with.
Lenny Rachitsky: Yeah, it’s tough. It’s tough for people in the AI space. Now there’s so much, so much at them, so much new, so much happening, so much FOMO.
Dr. Fei Fei Li: That’s true.
Lenny Rachitsky: I could see the stress. And so I think that advice is really important. Just like what will actually make you feel fulfilled in what you’re doing, not just where’s the fastest growing company, where’s the… Who’s going to win? I don’t know. I want to make sure I ask you about the work you’re doing today at Stanford, at the HCI. I think it’s the-
Dr. Fei Fei Li: HAI.
Lenny Rachitsky: HAI, Human-Centered AI Institute. What are you doing there? I know this is a thing you do on the side still.
Dr. Fei Fei Li: So yes, HAI, Human-Centered AI Institute was co-founded by me and a group of faculty like Professor John Etchemendy, Professor James Landay, Professor Chris Manning back in 2018. I was actually finishing my last sabbatical at Google and it was a very, very important decision for me because I could have stayed in industry, but my time at Google taught me one thing, that AI is going to be a civilizational technology. And it dawned on me how important this is to humanity to the point that I actually wrote a piece in The New York Times that year, 2018, to talk about the need for a guiding framework to develop and to apply AI. And that framework has to be anchored in human benevolence, in human centeredness. And I felt that Stanford, one of the world’s top universities in the heart of Silicon Valley that gave birth to important companies from NVIDIA to Google, should be a thought leader to create this human-centered AI framework and to actually embody that in our research, education, policy, and ecosystem work.
So I founded HAI. Fast-forward, after six, seven years, it has become the world’s largest AI institute that does human-centered research, education, ecosystem, outreach, and policy impact. It involves hundreds of faculty across all eight schools at Stanford, from medicine to education, to sustainability to business, to engineering, to humanities to law. And we support researchers, especially at the interdisciplinary area from digital economy, to legal studies, to political science, to discovery of new drugs, to new algorithms that are beyond transformers. We also actually put a very strong focus on policy because when we started HAI, I realized that Silicon Valley did not talk to Washington DC or Brussels or other parts of the world.
And given how important this technology is, we need to bring everybody on board. So we created multiple programs from congressional bootcamp to AI index report to policy briefing, and we especially participated in policymaking including advocating for a national AI research cloud bill that was passed in the first Trump administration and participating in state level regulatory AI discussions. So there’s a lot we did, and I continue to be one of the leaders even though I’m much less involved operationally because I care not only we create this technology, but we use it in the right way.
Lenny Rachitsky: Wow. I was not aware of all that other work you were doing. As you’re talking, I was reminded Charlie Munger had this quote, “Take a simple idea and take it very seriously.” I feel like you’ve done that in so many different ways and stayed with it and it’s unbelievable the impact that you’ve had in so many ways over the years. I’m going to skip the lightning round and I’m just looking to ask you one last question. Is there anything else that you wanted to share? Anything else you want to leave listeners with?
Dr. Fei Fei Li: I am very excited by AI, Lenny. I want to answer one question that when I travel around the world, everybody asks me is that, if I’m a musician, if I’m a teacher, middle school teacher, if I’m a nurse, if I’m an accountant, if I’m a farmer, do I have a role in AI or is AI just going to take over my life or my work? And I think this is the most important question of AI and I find that in Silicon Valley, we tend not to speak heart-to-heart with people, with people like us and not like us in Silicon Valley, but all of us, we tend to just toss around words like infinite productivity or infinite leisure time or infinite power or whatever. But at the end of the day, AI is about people. And when people ask me that question, it’s a resounding yes, everybody has a role in AI.
It depends on what you do and what you want. But no technology should take away human dignity and the human dignity and agency should be at the heart of the development, the deployment, as well as the governance of every technology. So if you are a young artist and your passion is storytelling, embrace AI as a tool. In fact, embrace Marble. I hope it becomes a tool for you because the way you tell your story is unique and the world still needs it. But how you tell your story, how do you use the most incredible tool to tell your story in the most unique way is important. And that voice needs to be heard. If you are a farmer near retirement, AI still matters because you are a citizen. You can participate in your community, you should have a voice in how AI is used, how AI is applied.
You work with people that you can encourage all of you to use AI to make life easier for you. If you are a nurse, I hope you know that at least in my career, I have worked so much in healthcare research because I feel our healthcare workers should be greatly augmented and helped by AI technology, whether it’s smart cameras to feed more information or robotic assistance because our nurses are overworked, overfatigued, and as our society ages, we need more help for people to be taken care of. So AI can play that role. So I just want to say that it’s so important that even technologists like me are sincere that everybody has a role in AI.
Lenny Rachitsky: What a beautiful way to end it. Such a tie back to where we started about how it’s up to us and take individual responsibility for what AI will do in our lives. Final question, where can folks find Marble? Where can they go, maybe try to join World Labs if they want to? What’s the website? Where do people go?
Dr. Fei Fei Li: Well, World Labs website is www.worldlabs.ai and you can find our research progress there. We have technical blogs. You can find Marble, the product there. You can sign in there. You can find our job posts link there. We’re in San Francisco. We love to work with the world’s best talents.
Lenny Rachitsky: Amazing. Fei-Fei, thank you so much for being here.
Dr. Fei Fei Li: Thank you, Lenny.
Lenny Rachitsky: Bye everyone.
Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
Glossary
| English | 中文 |
|---|---|
| agency | 能动性 |
| AGI | AGI(通用人工智能,Artificial General Intelligence) |
| AI winter | AI寒冬 |
| Alan Turing | Alan Turing(计算机科学与人工智能之父) |
| Alex Wang | Alex Wang |
| AlexNet | AlexNet(经典深度卷积神经网络) |
| allegory of the cave | 洞穴寓言 |
| Bayesian network | 贝叶斯网络 |
| Ben Horowitz | Ben Horowitz |
| Ben Mildenhall | Ben Mildenhall(World Labs 联合创始人) |
| Charlie Munger | Charlie Munger(美国投资家、伯克希尔·哈撒韦公司副董事长) |
| Chris Dixon | Chris Dixon(风险投资人) |
| Chris Manning | Chris Manning(斯坦福大学 NLP 研究者) |
| Christoph Lassner | Christoph Lassner(World Labs 联合创始人) |
| Condoleezza Rice | Condoleezza Rice |
| DARPA | DARPA(美国国防高级研究计划局) |
| Demis | Demis(DeepMind 联合创始人) |
| double helix | 双螺旋 |
| Einstein | Einstein(爱因斯坦) |
| Elon | Elon(Elon Musk) |
| embodied AI | 具身 AI |
| expert systems | 专家系统 |
| exposure therapy | 暴露疗法 |
| FOMO | FOMO(Fear of Missing Out,错失恐惧) |
| foundation model | 基础模型 |
| Francis Crick | Francis Crick(DNA 双螺旋结构发现者之一) |
| Geoff Hinton | Geoff Hinton |
| Human-Centered AI Institute | 以人为本 AI 研究院 |
| ImageNet | ImageNet(保持原文,专有数据集名称) |
| Isaac Newton | Isaac Newton(牛顿) |
| James Landay | James Landay(斯坦福大学教授) |
| James Watson | James Watson(DNA 双螺旋结构发现者之一) |
| Jeff Dean | Jeff Dean(Google AI 负责人) |
| Jensen | Jensen(NVIDIA CEO 黄仁勋,Jensen Huang) |
| John Etchemendy | John Etchemendy(斯坦福大学教授) |
| Justin Johnson | Justin Johnson(World Labs 联合创始人) |
| large language model | 大语言模型 |
| Lenny Rachitsky | Lenny Rachitsky(播客主持人) |
| logic systems | 逻辑系统 |
| machine learning | 机器学习 |
| Marble | Marble(World Labs 发布的三维世界生成产品) |
| mesh | mesh(网格,三维模型的基本数据结构) |
| neural network | 神经网络 |
| Percy Liang | Percy Liang(斯坦福大学 NLP 研究者) |
| Plato | 柏拉图 |
| prompt to worlds | 提示即世界 |
| reinforcement learning | 强化学习 |
| Richard Sutton | Richard Sutton(强化学习先驱,图灵奖得主) |
| Rosalind Franklin | Rosalind Franklin(DNA 结构发现的关键贡献者) |
| SAIL | SAIL(Stanford Artificial Intelligence Lab,斯坦福人工智能实验室) |
| Sam Altman | Sam Altman(OpenAI CEO) |
| Sebastian Thrun | Sebastian Thrun(斯坦福大学教授,自动驾驶先驱) |
| situational awareness | 情境感知 |
| spatial intelligence | 空间智能 |
| statistical learning | 统计学习 |
| synthetic data | 合成数据 |
| tenure | 终身教职 |
| the bitter lesson | 苦涩的教训(the bitter lesson) |
| The Matrix | 《黑客帝国》 |
| token | token(令牌,语言模型的基本处理单元) |
| transformer | transformer |
| VFX | VFX(视觉特效,Visual Effects) |
| vibe coding | vibe coding(保持原文) |
| virtual production | 虚拟制作 |
| visual intelligence | 视觉智能 |
| Waymo | Waymo(Alphabet 旗下的自动驾驶公司) |
| WordNet | WordNet(普林斯顿大学开发的词汇语义网络) |
| World Labs | World Labs(李飞飞创立的公司) |
| World models | 世界模型 |
| x-ray diffraction | X 射线衍射 |