The Hundred-Year Language
April 2003
(This essay is derived from a keynote talk at PyCon 2003.)
It's hard to predict what life will be like in a hundred years. There are only a few things we can say with certainty. We know that everyone will drive flying cars, that zoning laws will be relaxed to allow buildings hundreds of stories tall, that it will be dark most of the time, and that women will all be trained in the martial arts. Here I want to zoom in on one detail of this picture. What kind of programming language will they use to write the software controlling those flying cars?
This is worth thinking about not so much because we'll actually get to use these languages as because, if we're lucky, we'll use languages on the path from this point to that.
I think that, like species, languages will form evolutionary trees, with dead ends branching off all over. We can see this happening already. Cobol, for all its sometime popularity, does not seem to have any intellectual descendants. It is an evolutionary dead end: a Neanderthal language.
I predict a similar fate for Java. People sometimes send me mail saying, "How can you say that Java won't turn out to be a successful language? It's already a successful language." And I admit that it is, if you measure success by shelf space taken up by books on it (particularly individual books on it), or by the number of undergrads who believe they have to learn it to get a job. When I say Java won't turn out to be a successful language, I mean something more specific: that Java will turn out to be an evolutionary dead end, like Cobol.
This is just a guess. I may be wrong. My point here is not to dis Java, but to raise the issue of evolutionary trees and get people asking, where on the tree is language X? The reason to ask this question isn't just so that our ghosts can say, in a hundred years, I told you so. It's because staying close to the main branches is a useful heuristic for finding languages that will be good to program in now.
At any given time, you're probably happiest on the main branches of an evolutionary tree. Even when there were still plenty of Neanderthals, it must have sucked to be one. The Cro-Magnons would have been constantly coming over and beating you up and stealing your food.
The reason I want to know what languages will be like in a hundred years is so that I know what branch of the tree to bet on now.
The evolution of languages differs from the evolution of species because branches can converge. The Fortran branch, for example, seems to be merging with the descendants of Algol. In theory this is possible for species too, but it's not likely to have happened to anything bigger than a cell.
Convergence is more likely for languages partly because the space of possibilities is smaller, and partly because mutations are not random. Language designers deliberately incorporate ideas from other languages.
It's especially useful for language designers to think about where the evolution of programming languages is likely to lead, because they can steer accordingly. In that case, "stay on a main branch" becomes more than a way to choose a good language. It becomes a heuristic for making the right decisions about language design.
Any programming language can be divided into two parts: some set of fundamental operators that play the role of axioms, and the rest of the language, which could in principle be written in terms of these fundamental operators.
I think the fundamental operators are the most important factor in a language's long-term survival. The rest you can change. It's like the rule that in buying a house you should consider location first of all. Everything else you can fix later, but you can't fix the location.
I think it's important not just that the axioms be well chosen, but that there be few of them. Mathematicians have always felt this way about axioms (the fewer, the better) and I think they're onto something.
At the very least, it has to be a useful exercise to look closely at the core of a language to see if there are any axioms that could be weeded out. I've found in my long career as a slob that cruft breeds cruft, and I've seen this happen in software as well as under beds and in the corners of rooms.
I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better.
Of course, I'm making a big assumption in even asking what programming languages will be like in a hundred years. Will we even be writing programs in a hundred years? Won't we just tell computers what we want them to do?
There hasn't been a lot of progress in that department so far. My guess is that a hundred years from now people will still tell computers what to do using programs we would recognize as such. There may be tasks that we solve now by writing programs and which in a hundred years you won't have to write programs to solve, but I think there will still be a good deal of programming of the type that we do today.
It may seem presumptuous to think anyone can predict what any technology will look like in a hundred years. But remember that we already have almost fifty years of history behind us. Looking forward a hundred years is a graspable idea when we consider how slowly languages have evolved in the past fifty.
Languages evolve slowly because they're not really technologies. Languages are notation. A program is a formal description of the problem you want a computer to solve for you. So the rate of evolution in programming languages is more like the rate of evolution in mathematical notation than in, say, transportation or communications. Mathematical notation does evolve, but not with the giant leaps you see in technology.
Whatever computers are made of in a hundred years, it seems safe to predict they will be much faster than they are now. If Moore's Law continues to put out, they will be 74 quintillion (73,786,976,294,838,206,464) times faster. That's kind of hard to imagine. And indeed, the most likely prediction in the speed department may be that Moore's Law will stop working. Anything that is supposed to double every eighteen months seems likely to run up against some kind of fundamental limit eventually. But I have no trouble believing that computers will be very much faster. Even if they only end up being a paltry million times faster, that should change the ground rules for programming languages substantially. Among other things, there will be more room for what would now be considered slow languages, meaning languages that don't yield very efficient code.
And yet some applications will still demand speed. Some of the problems we want to solve with computers are created by computers; for example, the rate at which you have to process video images depends on the rate at which another computer can generate them. And there is another class of problems which inherently have an unlimited capacity to soak up cycles: image rendering, cryptography, simulations.
If some applications can be increasingly inefficient while others continue to demand all the speed the hardware can deliver, faster computers will mean that languages have to cover an ever wider range of efficiencies. We've seen this happening already. Current implementations of some popular new languages are shockingly wasteful by the standards of previous decades.
This isn't just something that happens with programming languages. It's a general historical trend. As technologies improve, each generation can do things that the previous generation would have considered wasteful. People thirty years ago would be astonished at how casually we make long-distance phone calls. People a hundred years ago would be even more astonished that a package would one day travel from Boston to New York via Memphis.
I can already tell you what's going to happen to all those extra cycles that faster hardware is going to give us in the next hundred years. They're nearly all going to be wasted.
I learned to program when computer power was scarce. I can remember taking all the spaces out of my Basic programs so they would fit into the memory of a 4K TRS-80. The thought of all this stupendously inefficient software burning up cycles doing the same thing over and over seems kind of gross to me. But I think my intuitions here are wrong. I'm like someone who grew up poor, and can't bear to spend money even for something important, like going to the doctor.
Some kinds of waste really are disgusting. SUVs, for example, would arguably be gross even if they ran on a fuel which would never run out and generated no pollution. SUVs are gross because they're the solution to a gross problem. (How to make minivans look more masculine.) But not all waste is bad. Now that we have the infrastructure to support it, counting the minutes of your long-distance calls starts to seem niggling. If you have the resources, it's more elegant to think of all phone calls as one kind of thing, no matter where the other person is.
There's good waste, and bad waste. I'm interested in good waste: the kind where, by spending more, we can get simpler designs. How will we take advantage of the opportunities to waste cycles that we'll get from new, faster hardware?
The desire for speed is so deeply ingrained in us, with our puny computers, that it will take a conscious effort to overcome it. In language design, we should be consciously seeking out situations where we can trade efficiency for even the smallest increase in convenience.
Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically, strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don't, really. Strings only exist for efficiency. But it's lame to clutter up the semantics of the language with hacks to make programs run faster. Having strings in a language seems to be a case of premature optimization.
If we think of the core of a language as a set of axioms, surely it's gross to have additional axioms that add no expressive power, simply for the sake of efficiency. Efficiency is important, but I don't think that's the right way to get it.
The right way to solve that problem, I think, is to separate the meaning of a program from the implementation details. Instead of having both lists and strings, have just lists, with some way to give the compiler optimization advice that will allow it to lay out strings as contiguous bytes if necessary.
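The separation proposed here can be sketched in a few lines of Python (a hypothetical illustration, not Arc's actual design): if a string is just a list of characters, every general list operator works on strings for free, and packing the characters into contiguous bytes becomes an implementation choice rather than part of the language's meaning.

```python
# Sketch: treat a string as a list of characters, so generic list
# operations subsume string operations. Packing the list back into
# contiguous bytes is an optimization, not part of the semantics.

def rev(xs):
    """Reverse any list; character lists included."""
    return xs[::-1]

s = list("hello")            # the "string" is just a list of characters
print(rev(s))                # ['o', 'l', 'l', 'e', 'h']
print(rev([1, 2, 3]))        # [3, 2, 1]: the same operator, no string type
print("".join(rev(s)))       # 'olleh': packing back down is a detail
```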
Since speed doesn't matter in most of a program, you won't ordinarily need to bother with this sort of micromanagement. This will be more and more true as computers get faster.
Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only inevitable, but desirable.
The word "essay" comes from the French verb "essayer", which means "to try". An essay, in the original sense, is something you write to try to figure something out. This happens in software too. I think some of the best programs were essays, in the sense that the authors didn't know when they started exactly what they were trying to write.
Lisp hackers already know about the value of being flexible with data structures. We tend to write the first version of a program so that it does everything with lists. These initial versions can be so shockingly inefficient that it takes a conscious effort not to think about what they're doing, just as, for me at least, eating a steak requires a conscious effort not to think about where it came from.
What programmers in a hundred years will be looking for, most of all, is a language where you can throw together an unbelievably inefficient version 1 of a program with the least possible effort. At least, that's how we'd describe it in present-day terms. What they'll say is that they want a language that's easy to program in.
Inefficient software isn't gross. What's gross is a language that makes programmers do needless work. Wasting programmer time is the true inefficiency, not wasting machine time. This will become ever clearer as computers get faster.
I think getting rid of strings is already something we could bear to think about. We did it in Arc, and it seems to be a win; some operations that would be awkward to describe as regular expressions can be described easily as recursive functions.
How far will this flattening of data structures go? I can think of possibilities that shock even me, with my conscientiously broadened mind. Will we get rid of arrays, for example? After all, they're just a subset of hash tables where the keys are vectors of integers. Will we replace hash tables themselves with lists?
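The claim about arrays can be made concrete with a small sketch (illustrative Python, not a proposal for any particular language): a hash table whose keys are tuples of integers behaves exactly like a sparse multi-dimensional array.

```python
# Sketch: an "array" as the special case of a hash table whose keys
# are vectors (here, tuples) of integers.
grid = {}                      # stands in for a 2-D array
grid[(0, 0)] = 1.0
grid[(2, 3)] = 4.5             # no bounds to declare, nothing to preallocate
print(grid.get((2, 3), 0.0))   # 4.5
print(grid.get((9, 9), 0.0))   # 0.0: absent cells read as a default,
                               # like the empty cells of a sparse array
```

A compiler given the right optimization advice could lay out the dense case as a contiguous block; the hash-table semantics would stay the same.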
There are more shocking prospects even than that. The Lisp that McCarthy described in 1960, for example, didn't have numbers. Logically, you don't need to have a separate notion of numbers, because you can represent them as lists: the integer n could be represented as a list of n elements. You can do math this way. It's just unbearably inefficient.
No one actually proposed implementing numbers as lists in practice. In fact, McCarthy's 1960 paper was not, at the time, intended to be implemented at all. It was a theoretical exercise, an attempt to create a more elegant alternative to the Turing machine. When someone did, unexpectedly, take this paper and translate it into a working Lisp interpreter, numbers certainly weren't represented as lists; they were represented in binary, as in every other language.
Could a programming language go so far as to get rid of numbers as a fundamental data type? I ask this not so much as a serious question as a way to play chicken with the future. It's like the hypothetical case of an irresistible force meeting an immovable object: here, an unimaginably inefficient implementation meeting unimaginably great resources. I don't see why not. The future is pretty long. If there's something we can do to decrease the number of axioms in the core language, that would seem to be the side to bet on as t approaches infinity. If the idea still seems unbearable in a hundred years, maybe it won't in a thousand.
Just to be clear about this, I'm not proposing that all numerical calculations would actually be carried out using lists. I'm proposing that the core language, prior to any additional notations about implementation, be defined this way. In practice any program that wanted to do any amount of math would probably represent numbers in binary, but this would be an optimization, not part of the core language semantics.
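The unary representation is easy to spell out. This Python sketch (my own illustration, not code from the 1960 paper) shows that arithmetic survives the translation, at exactly the cost described above:

```python
# Unary numbers in the McCarthy spirit: the integer n is a list of
# n elements. Addition is concatenation; multiplication is repeated
# addition. Semantically complete, unbearably inefficient.

def num(n):
    return [()] * n            # the list-of-n-elements representation

def add(a, b):
    return a + b               # n + m is just concatenation

def mul(a, b):
    out = []
    for _ in b:                # m copies of a
        out = add(out, a)
    return out

def val(a):
    return len(a)              # read the number back out as an int

print(val(add(num(2), num(3))))   # 5
print(val(mul(num(2), num(3))))   # 6
```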
Another way to burn up cycles is to have many layers of software between the application and the hardware. This too is a trend we see happening already: many recent languages are compiled into byte code. Bill Woods once told me that, as a rule of thumb, each layer of interpretation costs a factor of 10 in speed. This extra cost buys you flexibility.
The very first version of Arc was an extreme case of this sort of multi-level slowness, with corresponding benefits. It was a classic "metacircular" interpreter written on top of Common Lisp, with a definite family resemblance to the eval function defined in McCarthy's original Lisp paper. The whole thing was only a couple hundred lines of code, so it was very easy to understand and change. The Common Lisp we used, CLisp, itself runs on top of a byte code interpreter. So here we had two levels of interpretation, one of them (the top one) shockingly inefficient, and the language was usable. Barely usable, I admit, but usable.
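Arc's source isn't reproduced here, but the shape of such an interpreter can be suggested in a couple dozen lines. This Python sketch (the names and the exact feature set are mine, not Arc's) evaluates nested lists the way McCarthy's eval evaluates S-expressions:

```python
# A minimal metacircular-style evaluator in the spirit of McCarthy's
# eval. Expressions are nested Python lists; symbols are strings.

def ev(x, env):
    if isinstance(x, str):                 # symbol: look it up
        return env[x]
    if not isinstance(x, list):            # number or other literal
        return x
    op = x[0]
    if op == 'quote':                      # (quote e) -> e, unevaluated
        return x[1]
    if op == 'if':                         # (if test then else)
        return ev(x[2] if ev(x[1], env) else x[3], env)
    if op == 'lambda':                     # (lambda (params) body)
        params, body = x[1], x[2]
        return lambda *args: ev(body, {**env, **dict(zip(params, args))})
    f = ev(op, env)                        # application: eval fn, then args
    return f(*[ev(arg, env) for arg in x[1:]])

env = {'+': lambda a, b: a + b, '<': lambda a, b: a < b}
prog = ['if', ['<', 1, 2], ['+', 40, 2], 0]
print(ev(prog, env))                                    # 42
print(ev([['lambda', ['n'], ['+', 'n', 1]], 5], env))   # 6
```

Everything here is itself interpreted by Python's own byte-code machine, so running it stacks up exactly the layered factor-of-10 cost the rule of thumb predicts.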
Writing software as multiple layers is a powerful technique even within applications. Bottom-up programming means writing a program as a series of layers, each of which serves as a language for the one above. This approach tends to yield smaller, more flexible programs. It's also the best route to that holy grail, reusability. A language is by definition reusable. The more of your application you can push down into a language for writing that type of application, the more of your software will be reusable.
Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no amount of evidence to the contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its bottom-upness, not its object-orientedness. Consider libraries: they're reusable because they're language, whether they're written in an object-oriented style or not.
I don't predict the demise of object-oriented programming, by the way. Though I don't think it has much to offer good programmers, except in certain specialized domains, it is irresistible to large organizations. Object-oriented programming offers a sustainable way to write spaghetti code. It lets you accrete programs as a series of patches. Large organizations always tend to develop software this way, and I expect this to be as true in a hundred years as it is today.
As long as we're talking about the future, we had better talk about parallel computation, because that's where this idea seems to live. That is, no matter when you're talking, parallel computation seems to be something that is going to happen in the future.
Will the future ever catch up with it? People have been talking about parallel computation as something imminent for at least 20 years, and it hasn't affected programming practice much so far. Or hasn't it? Already chip designers have to think about it, and so must people trying to write systems software on multi-CPU computers.
The real question is, how far up the ladder of abstraction will parallelism go? In a hundred years will it affect even application programmers? Or will it be something that compiler writers think about, but which is usually invisible in the source code of applications?
One thing that does seem likely is that most opportunities for parallelism will be wasted. This is a special case of my more general prediction that most of the extra computer power we're given will go to waste. I expect that, as with the stupendous speed of the underlying hardware, parallelism will be something that is available if you ask for it explicitly, but ordinarily not used. This implies that the kind of parallelism we have in a hundred years will not, except in special applications, be massive parallelism. I expect for ordinary programmers it will be more like being able to fork off processes that all end up running in parallel.
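What "forking off processes" might look like to an ordinary programmer can be sketched with Python's multiprocessing module (the worker function here is an invented example):

```python
# Sketch: parallelism as something you ask for explicitly, by forking
# worker processes, rather than something pervading the whole program.

from multiprocessing import Pool

def slow_square(n):            # stand-in for some expensive computation
    return n * n

if __name__ == '__main__':
    with Pool(4) as pool:      # fork four worker processes
        results = pool.map(slow_square, range(8))
    print(results)             # [0, 1, 4, 9, 16, 25, 36, 49]
```

A version 1 would just call `map(slow_square, range(8))` serially; the explicit parallel form is the late-stage optimization predicted here.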
And this will, like asking for specific implementations of data structures, be something that you do fairly late in the life of a program, when you try to optimize it. Version 1s will ordinarily ignore any advantages to be got from parallel computation, just as they will ignore advantages to be got from specific representations of data.
Except in special kinds of applications, parallelism won't pervade the programs that are written in a hundred years. It would be premature optimization if it did.
How many programming languages will there be in a hundred years? There seem to be a huge number of new programming languages lately. Part of the reason is that faster hardware has allowed programmers to make different tradeoffs between speed and convenience, depending on the application. If this is a real trend, the hardware we'll have in a hundred years should only increase it.
And yet there may be only a few widely used languages in a hundred years. Part of the reason I say this is optimism: it seems that, if you did a really good job, you could make a language that was ideal for writing a slow version 1, and yet with the right optimization advice to the compiler, would also yield very fast code when necessary. So, since I'm optimistic, I'm going to predict that despite the huge gap they'll have between acceptable and maximal efficiency, programmers in a hundred years will have languages that can span most of it.
As this gap widens, profilers will become increasingly important. Little attention is paid to profiling now. Many people still seem to believe that the way to get fast applications is to write compilers that generate fast code. As the gap between acceptable and maximal performance widens, it will become increasingly clear that the way to get fast applications is to have a good guide from one to the other.
When I say there may only be a few languages, I'm not including domain-specific "little languages". I think such embedded languages are a great idea, and I expect them to proliferate. But I expect them to be written as thin enough skins that users can see the general-purpose language underneath.
Who will design the languages of the future? One of the most exciting trends in the last ten years has been the rise of open-source languages like Perl, Python, and Ruby. Language design is being taken over by hackers. The results so far are messy, but encouraging. There are some stunningly novel ideas in Perl, for example. Many are stunningly bad, but that's always true of ambitious efforts. At its current rate of mutation, God knows what Perl might evolve into in a hundred years.
It's not true that those who can't do, teach (some of the best hackers I know are professors), but it is true that there are a lot of things that those who teach can't do. Research imposes constraining caste restrictions. In any academic field there are topics that are ok to work on and others that aren't. Unfortunately the distinction between acceptable and forbidden topics is usually based on how intellectual the work sounds when described in research papers, rather than on how important it is for getting good results. The extreme case is probably literature; people studying literature rarely say anything that would be of the slightest use to those producing it.
Though the situation is better in the sciences, the overlap between the kind of work you're allowed to do and the kind of work that yields good languages is distressingly small. (Olin Shivers has grumbled eloquently about this.) For example, types seem to be an inexhaustible source of research papers, despite the fact that static typing seems to preclude true macros, without which, in my opinion, no language is worth using.
The trend is not merely toward languages being developed as open-source projects rather than "research", but toward languages being designed by the application programmers who need to use them, rather than by compiler writers. This seems a good trend and I expect it to continue. Unlike physics in a hundred years, which is almost necessarily impossible to predict, I think it may be possible in principle to design a language now that would appeal to users in a hundred years.
One way to design a language is to just write down the program you'd like to be able to write, regardless of whether there is a compiler that can translate it or hardware that can run it. When you do this you can assume unlimited resources. It seems like we ought to be able to imagine unlimited resources as well today as in a hundred years.
What program would one like to write? Whatever is least work. Except not quite: whatever would be least work if your ideas about programming weren't already influenced by the languages you're currently used to. Such influence can be so pervasive that it takes a great effort to overcome it. You'd think it would be obvious to creatures as lazy as us how to express a program with the least effort. In fact, our ideas about what's possible tend to be so limited by whatever language we think in that easier formulations of programs seem very surprising. They're something you have to discover, not something you naturally sink into.
One helpful trick here is to use the length of the program as an approximation for how much work it is to write. Not the length in characters, of course, but the length in distinct syntactic elements: basically, the size of the parse tree. It may not be quite true that the shortest program is the least work to write, but it's close enough that you're better off aiming for the solid target of brevity than the fuzzy, nearby one of least work. Then the algorithm for language design becomes: look at a program and ask, is there any way to write this that's shorter?
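For a language with an accessible parser, this metric is easy to compute. As a rough sketch, Python's own ast module can stand in for "count the distinct syntactic elements":

```python
# Sketch: program length measured as parse-tree size, not characters.

import ast

def tree_size(src):
    """Count the nodes in the parse tree of a piece of source code."""
    return sum(1 for _ in ast.walk(ast.parse(src)))

verbose = "result = []\nfor x in items:\n    result.append(x * x)\n"
concise = "result = [x * x for x in items]\n"
# The comprehension wins on this metric even when the character counts
# are close: fewer syntactic elements, less work to write.
print(tree_size(verbose) > tree_size(concise))   # True
```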
In practice, writing programs in an imaginary hundred-year language will work to varying degrees depending on how close you are to the core. Sort routines you can write now. But it would be hard to predict now what kinds of libraries might be needed in a hundred years. Presumably many libraries will be for domains that don't even exist yet. If SETI@home works, for example, we'll need libraries for communicating with aliens. Unless of course they are sufficiently advanced that they already communicate in XML.
At the other extreme, I think you might be able to design the core language today. In fact, some might argue that it was already mostly designed in 1958.
If the hundred-year language were available today, would we want to program in it? One way to answer this question is to look back. If present-day programming languages had been available in 1960, would anyone have wanted to use them?
In some ways, the answer is no. Languages today assume infrastructure that didn't exist in 1960. For example, a language in which indentation is significant, like Python, would not work very well on printer terminals. But putting such problems aside (assuming, for example, that programs were all just written on paper), would programmers of the 1960s have liked writing programs in the languages we use now?
I think so. Some of the less imaginative ones, who had artifacts of early languages built into their ideas of what a program was, might have had trouble. (How can you manipulate data without doing pointer arithmetic? How can you implement flow charts without gotos?) But I think the smartest programmers would have had no trouble making the most of present-day languages, if they'd had them.
If we had the hundred-year language now, it would at least make a great pseudocode. What about using it to write software? Since the hundred-year language will need to generate fast code for some applications, presumably it could generate code efficient enough to run acceptably well on our hardware. We might have to give more optimization advice than users in a hundred years would, but it still might be a net win.
Now we have two ideas that, if you combine them, suggest interesting possibilities: (1) the hundred-year language could, in principle, be designed today, and (2) such a language, if it existed, might be good to program in today. When you see these ideas laid out like that, it's hard not to wonder, why not try writing the hundred-year language now?
When you're working on language design, I think it is good to have such a target and to keep it consciously in mind. When you learn to drive, one of the principles they teach you is to align the car not by lining up the hood with the stripes painted on the road, but by aiming at some point in the distance. Even if all you care about is what happens in the next ten feet, this is the right answer. I think we can and should do the same thing with programming languages.
Notes
I believe Lisp Machine Lisp was the first language to embody the principle that declarations (except those of dynamic variables) were merely optimization advice, and would not change the meaning of a correct program. Common Lisp seems to have been the first to state this explicitly.
Thanks to Trevor Blackwell, Robert Morris, and Dan Giffin for reading drafts of this, and to Guido van Rossum, Jeremy Hylton, and the rest of the Python crew for inviting me to speak at PyCon.
You'll find this essay and 14 others in Hackers & Painters.
The Hundred-Year Language
April 2003
(This essay is derived from a keynote talk at PyCon 2003.)
It’s hard to predict what life will be like in a hundred years. There are only a few things we can say with certainty. We know that everyone will drive flying cars, that zoning laws will be relaxed to allow buildings hundreds of stories tall, that it will be dark most of the time, and that women will all be trained in the martial arts. Here I want to zoom in on one detail of this picture. What kind of programming language will they use to write the software controlling those flying cars?
This is worth thinking about not so much because we’ll actually get to use these languages as because, if we’re lucky, we’ll use languages on the path from this point to that.
I think that, like species, languages will form evolutionary trees, with dead-ends branching off all over. We can see this happening already. Cobol, for all its sometime popularity, does not seem to have any intellectual descendants. It is an evolutionary dead-end— a Neanderthal language.
I predict a similar fate for Java. People sometimes send me mail saying, “How can you say that Java won’t turn out to be a successful language? It’s already a successful language.” And I admit that it is, if you measure success by shelf space taken up by books on it (particularly individual books on it), or by the number of undergrads who believe they have to learn it to get a job. When I say Java won’t turn out to be a successful language, I mean something more specific: that Java will turn out to be an evolutionary dead-end, like Cobol.
This is just a guess. I may be wrong. My point here is not to dis Java, but to raise the issue of evolutionary trees and get people asking, where on the tree is language X? The reason to ask this question isn’t just so that our ghosts can say, in a hundred years, I told you so. It’s because staying close to the main branches is a useful heuristic for finding languages that will be good to program in now.
At any given time, you’re probably happiest on the main branches of an evolutionary tree. Even when there were still plenty of Neanderthals, it must have sucked to be one. The Cro-Magnons would have been constantly coming over and beating you up and stealing your food.
The reason I want to know what languages will be like in a hundred years is so that I know what branch of the tree to bet on now.
The evolution of languages differs from the evolution of species because branches can converge. The Fortran branch, for example, seems to be merging with the descendants of Algol. In theory this is possible for species too, but it’s not likely to have happened to any bigger than a cell.
Convergence is more likely for languages partly because the space of possibilities is smaller, and partly because mutations are not random. Language designers deliberately incorporate ideas from other languages.
It’s especially useful for language designers to think about where the evolution of programming languages is likely to lead, because they can steer accordingly. In that case, “stay on a main branch” becomes more than a way to choose a good language. It becomes a heuristic for making the right decisions about language design.
Any programming language can be divided into two parts: some set of fundamental operators that play the role of axioms, and the rest of the language, which could in principle be written in terms of these fundamental operators.
I think the fundamental operators are the most important factor in a language’s long term survival. The rest you can change. It’s like the rule that in buying a house you should consider location first of all. Everything else you can fix later, but you can’t fix the location.
I think it’s important not just that the axioms be well chosen, but that there be few of them. Mathematicians have always felt this way about axioms— the fewer, the better— and I think they’re onto something.
At the very least, it has to be a useful exercise to look closely at the core of a language to see if there are any axioms that could be weeded out. I’ve found in my long career as a slob that cruft breeds cruft, and I’ve seen this happen in software as well as under beds and in the corners of rooms.
I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest cores. The more of a language you can write in itself, the better.
Of course, I’m making a big assumption in even asking what programming languages will be like in a hundred years. Will we even be writing programs in a hundred years? Won’t we just tell computers what we want them to do?
There hasn’t been a lot of progress in that department so far. My guess is that a hundred years from now people will still tell computers what to do using programs we would recognize as such. There may be tasks that we solve now by writing programs and which in a hundred years you won’t have to write programs to solve, but I think there will still be a good deal of programming of the type that we do today.
It may seem presumptuous to think anyone can predict what any technology will look like in a hundred years. But remember that we already have almost fifty years of history behind us. Looking forward a hundred years is a graspable idea when we consider how slowly languages have evolved in the past fifty.
Languages evolve slowly because they’re not really technologies. Languages are notation. A program is a formal description of the problem you want a computer to solve for you. So the rate of evolution in programming languages is more like the rate of evolution in mathematical notation than, say, transportation or communications. Mathematical notation does evolve, but not with the giant leaps you see in technology.
Whatever computers are made of in a hundred years, it seems safe to predict they will be much faster than they are now. If Moore’s Law continues to put out, they will be 74 quintillion (73,786,976,294,838,206,464) times faster. That’s kind of hard to imagine. And indeed, the most likely prediction in the speed department may be that Moore’s Law will stop working. Anything that is supposed to double every eighteen months seems likely to run up against some kind of fundamental limit eventually. But I have no trouble believing that computers will be very much faster. Even if they only end up being a paltry million times faster, that should change the ground rules for programming languages substantially. Among other things, there will be more room for what would now be considered slow languages, meaning languages that don’t yield very efficient code.
And yet some applications will still demand speed. Some of the problems we want to solve with computers are created by computers; for example, the rate at which you have to process video images depends on the rate at which another computer can generate them. And there is another class of problems which inherently have an unlimited capacity to soak up cycles: image rendering, cryptography, simulations.
If some applications can be increasingly inefficient while others continue to demand all the speed the hardware can deliver, faster computers will mean that languages have to cover an ever wider range of efficiencies. We’ve seen this happening already. Current implementations of some popular new languages are shockingly wasteful by the standards of previous decades.
This isn’t just something that happens with programming languages. It’s a general historical trend. As technologies improve, each generation can do things that the previous generation would have considered wasteful. People thirty years ago would be astonished at how casually we make long distance phone calls. People a hundred years ago would be even more astonished that a package would one day travel from Boston to New York via Memphis.
I can already tell you what’s going to happen to all those extra cycles that faster hardware is going to give us in the next hundred years. They’re nearly all going to be wasted.
I learned to program when computer power was scarce. I can remember taking all the spaces out of my Basic programs so they would fit into the memory of a 4K TRS-80. The thought of all this stupendously inefficient software burning up cycles doing the same thing over and over seems kind of gross to me. But I think my intuitions here are wrong. I’m like someone who grew up poor, and can’t bear to spend money even for something important, like going to the doctor.
Some kinds of waste really are disgusting. SUVs, for example, would arguably be gross even if they ran on a fuel which would never run out and generated no pollution. SUVs are gross because they’re the solution to a gross problem. (How to make minivans look more masculine.) But not all waste is bad. Now that we have the infrastructure to support it, counting the minutes of your long-distance calls starts to seem niggling. If you have the resources, it’s more elegant to think of all phone calls as one kind of thing, no matter where the other person is.
There’s good waste, and bad waste. I’m interested in good waste— the kind where, by spending more, we can get simpler designs. How will we take advantage of the opportunities to waste cycles that we’ll get from new, faster hardware?
The desire for speed is so deeply engrained in us, with our puny computers, that it will take a conscious effort to overcome it. In language design, we should be consciously seeking out situations where we can trade efficiency for even the smallest increase in convenience.
Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically, strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don’t, really. Strings only exist for efficiency. But it’s lame to clutter up the semantics of the language with hacks to make programs run faster. Having strings in a language seems to be a case of premature optimization.
If we think of the core of a language as a set of axioms, surely it’s gross to have additional axioms that add no expressive power, simply for the sake of efficiency. Efficiency is important, but I don’t think that’s the right way to get it.
The right way to solve that problem, I think, is to separate the meaning of a program from the implementation details. Instead of having both lists and strings, have just lists, with some way to give the compiler optimization advice that will allow it to lay out strings as contiguous bytes if necessary.
Since speed doesn’t matter in most of a program, you won’t ordinarily need to bother with this sort of micromanagement. This will be more and more true as computers get faster.
Saying less about implementation should also make programs more flexible. Specifications change while a program is being written, and this is not only inevitable, but desirable.
The word “essay” comes from the French verb “essayer”, which means “to try”. An essay, in the original sense, is something you write to try to figure something out. This happens in software too. I think some of the best programs were essays, in the sense that the authors didn’t know when they started exactly what they were trying to write.
Lisp hackers already know about the value of being flexible with data structures. We tend to write the first version of a program so that it does everything with lists. These initial versions can be so shockingly inefficient that it takes a conscious effort not to think about what they’re doing, just as, for me at least, eating a steak requires a conscious effort not to think where it came from.
What programmers in a hundred years will be looking for, most of all, is a language where you can throw together an unbelievably inefficient version 1 of a program with the least possible effort. At least, that’s how we’d describe it in present-day terms. What they’ll say is that they want a language that’s easy to program in.
Inefficient software isn’t gross. What’s gross is a language that makes programmers do needless work. Wasting programmer time is the true inefficiency, not wasting machine time. This will become ever more clear as computers get faster.
I think getting rid of strings is already something we could bear to think about. We did it in Arc, and it seems to be a win; some operations that would be awkward to describe as regular expressions can be described easily as recursive functions.
How far will this flattening of data structures go? I can think of possibilities that shock even me, with my conscientiously broadened mind. Will we get rid of arrays, for example? After all, they’re just a subset of hash tables where the keys are vectors of integers. Will we replace hash tables themselves with lists?
There are more shocking prospects even than that. The Lisp that McCarthy described in 1960, for example, didn’t have numbers. Logically, you don’t need to have a separate notion of numbers, because you can represent them as lists: the integer n could be represented as a list of n elements. You can do math this way. It’s just unbearably inefficient.
No one actually proposed implementing numbers as lists in practice. In fact, McCarthy’s 1960 paper was not, at the time, intended to be implemented at all. It was a theoretical exercise, an attempt to create a more elegant alternative to the Turing Machine. When someone did, unexpectedly, take this paper and translate it into a working Lisp interpreter, numbers certainly weren’t represented as lists; they were represented in binary, as in every other language.
Could a programming language go so far as to get rid of numbers as a fundamental data type? I ask this not so much as a serious question as as a way to play chicken with the future. It’s like the hypothetical case of an irresistible force meeting an immovable object— here, an unimaginably inefficient implementation meeting unimaginably great resources. I don’t see why not. The future is pretty long. If there’s something we can do to decrease the number of axioms in the core language, that would seem to be the side to bet on as t approaches infinity. If the idea still seems unbearable in a hundred years, maybe it won’t in a thousand.
Just to be clear about this, I’m not proposing that all numerical calculations would actually be carried out using lists. I’m proposing that the core language, prior to any additional notations about implementation, be defined this way. In practice any program that wanted to do any amount of math would probably represent numbers in binary, but this would be an optimization, not part of the core language semantics.
Another way to burn up cycles is to have many layers of software between the application and the hardware. This too is a trend we see happening already: many recent languages are compiled into byte code. Bill Woods once told me that, as a rule of thumb, each layer of interpretation costs a factor of 10 in speed. This extra cost buys you flexibility.
The very first version of Arc was an extreme case of this sort of multi-level slowness, with corresponding benefits. It was a classic “metacircular” interpreter written on top of Common Lisp, with a definite family resemblance to the eval function defined in McCarthy’s original Lisp paper. The whole thing was only a couple hundred lines of code, so it was very easy to understand and change. The Common Lisp we used, CLisp, itself runs on top of a byte code interpreter. So here we had two levels of interpretation, one of them (the top one) shockingly inefficient, and the language was usable. Barely usable, I admit, but usable.
Writing software as multiple layers is a powerful technique even within applications. Bottom-up programming means writing a program as a series of layers, each of which serves as a language for the one above. This approach tends to yield smaller, more flexible programs. It’s also the best route to that holy grail, reusability. A language is by definition reusable. The more of your application you can push down into a language for writing that type of application, the more of your software will be reusable.
Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no amount of evidence to the contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its bottom-upness, not its object-orientedness. Consider libraries: they’re reusable because they’re language, whether they’re written in an object-oriented style or not.
I don’t predict the demise of object-oriented programming, by the way. Though I don’t think it has much to offer good programmers, except in certain specialized domains, it is irresistible to large organizations. Object-oriented programming offers a sustainable way to write spaghetti code. It lets you accrete programs as a series of patches. Large organizations always tend to develop software this way, and I expect this to be as true in a hundred years as it is today. As long as we’re talking about the future, we had better talk about parallel computation, because that’s where this idea seems to live. That is, no matter when you’re talking, parallel computation seems to be something that is going to happen in the future.
Will the future ever catch up with it? People have been talking about parallel computation as something imminent for at least 20 years, and it hasn’t affected programming practice much so far. Or hasn’t it? Already chip designers have to think about it, and so must people trying to write systems software on multi-cpu computers.
The real question is, how far up the ladder of abstraction will parallelism go? In a hundred years will it affect even application programmers? Or will it be something that compiler writers think about, but which is usually invisible in the source code of applications?
One thing that does seem likely is that most opportunities for parallelism will be wasted. This is a special case of my more general prediction that most of the extra computer power we’re given will go to waste. I expect that, as with the stupendous speed of the underlying hardware, parallelism will be something that is available if you ask for it explicitly, but ordinarily not used. This implies that the kind of parallelism we have in a hundred years will not, except in special applications, be massive parallelism. I expect for ordinary programmers it will be more like being able to fork off processes that all end up running in parallel.
And this will, like asking for specific implementations of data structures, be something that you do fairly late in the life of a program, when you try to optimize it. Version 1s will ordinarily ignore any advantages to be got from parallel computation, just as they will ignore advantages to be got from specific representations of data.
Except in special kinds of applications, parallelism won’t pervade the programs that are written in a hundred years. It would be premature optimization if it did.
How many programming languages will there be in a hundred years? There seem to be a huge number of new programming languages lately. Part of the reason is that faster hardware has allowed programmers to make different tradeoffs between speed and convenience, depending on the application. If this is a real trend, the hardware we’ll have in a hundred years should only increase it.
And yet there may be only a few widely-used languages in a hundred years. Part of the reason I say this is optimism: it seems that, if you did a really good job, you could make a language that was ideal for writing a slow version 1, and yet with the right optimization advice to the compiler, would also yield very fast code when necessary. So, since I’m optimistic, I’m going to predict that despite the huge gap they’ll have between acceptable and maximal efficiency, programmers in a hundred years will have languages that can span most of it.
As this gap widens, profilers will become increasingly important. Little attention is paid to profiling now. Many people still seem to believe that the way to get fast applications is to write compilers that generate fast code. As the gap between acceptable and maximal performance widens, it will become increasingly clear that the way to get fast applications is to have a good guide from one to the other.
When I say there may only be a few languages, I’m not including domain-specific “little languages”. I think such embedded languages are a great idea, and I expect them to proliferate. But I expect them to be written as thin enough skins that users can see the general-purpose language underneath.
Who will design the languages of the future? One of the most exciting trends in the last ten years has been the rise of open-source languages like Perl, Python, and Ruby. Language design is being taken over by hackers. The results so far are messy, but encouraging. There are some stunningly novel ideas in Perl, for example. Many are stunningly bad, but that’s always true of ambitious efforts. At its current rate of mutation, God knows what Perl might evolve into in a hundred years.
It’s not true that those who can’t do, teach (some of the best hackers I know are professors), but it is true that there are a lot of things that those who teach can’t do. Research imposes constraining caste restrictions. In any academic field there are topics that are ok to work on and others that aren’t. Unfortunately the distinction between acceptable and forbidden topics is usually based on how intellectual the work sounds when described in research papers, rather than how important it is for getting good results. The extreme case is probably literature; people studying literature rarely say anything that would be of the slightest use to those producing it.
Though the situation is better in the sciences, the overlap between the kind of work you’re allowed to do and the kind of work that yields good languages is distressingly small. (Olin Shivers has grumbled eloquently about this.) For example, types seem to be an inexhaustible source of research papers, despite the fact that static typing seems to preclude true macros— without which, in my opinion, no language is worth using.
The trend is not merely toward languages being developed as open-source projects rather than “research”, but toward languages being designed by the application programmers who need to use them, rather than by compiler writers. This seems a good trend and I expect it to continue. Unlike physics in a hundred years, which is almost necessarily impossible to predict, I think it may be possible in principle to design a language now that would appeal to users in a hundred years.
One way to design a language is to just write down the program you’d like to be able to write, regardless of whether there is a compiler that can translate it or hardware that can run it. When you do this you can assume unlimited resources. It seems like we ought to be able to imagine unlimited resources as well today as in a hundred years.
What program would one like to write? Whatever is least work. Except not quite: whatever would be least work if your ideas about programming weren’t already influenced by the languages you’re currently used to. Such influence can be so pervasive that it takes a great effort to overcome it. You’d think it would be obvious to creatures as lazy as us how to express a program with the least effort. In fact, our ideas about what’s possible tend to be so limited by whatever language we think in that easier formulations of programs seem very surprising. They’re something you have to discover, not something you naturally sink into.
One helpful trick here is to use the length of the program as an approximation for how much work it is to write. Not the length in characters, of course, but the length in distinct syntactic elements: basically, the size of the parse tree. It may not be quite true that the shortest program is the least work to write, but it’s close enough that you’re better off aiming for the solid target of brevity than the fuzzy, nearby one of least work. Then the algorithm for language design becomes: look at a program and ask, is there any way to write this that’s shorter?
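This parse-tree metric is easy to make concrete. As a minimal sketch (in Python, simply because it has a parse-tree inspector in its standard library; the essay itself names no particular language), one can count AST nodes and compare two formulations of the same program:

```python
import ast

def parse_tree_size(source: str) -> int:
    """Count the nodes in a program's parse tree,
    as a rough measure of how much work it is to write."""
    return sum(1 for _ in ast.walk(ast.parse(source)))

# Two formulations of the same program: sum the squares of a list.
verbose = """
total = 0
for x in xs:
    sq = x * x
    total = total + sq
"""
concise = "total = sum(x * x for x in xs)"

# The language-design question "is there any way to write this
# that's shorter?" then has a measurable answer:
assert parse_tree_size(concise) < parse_tree_size(verbose)
```

By this measure the comprehension wins not because it has fewer characters but because it has fewer distinct syntactic elements, which is the distinction the paragraph above draws.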
In practice, writing programs in an imaginary hundred-year language will work to varying degrees depending on how close you are to the core. Sort routines you can write now. But it would be hard to predict now what kinds of libraries might be needed in a hundred years. Presumably many libraries will be for domains that don’t even exist yet. If SETI@home works, for example, we’ll need libraries for communicating with aliens. Unless of course they are sufficiently advanced that they already communicate in XML.
At the other extreme, I think you might be able to design the core language today. In fact, some might argue that it was already mostly designed in 1958.
If the hundred year language were available today, would we want to program in it? One way to answer this question is to look back. If present-day programming languages had been available in 1960, would anyone have wanted to use them?
In some ways, the answer is no. Languages today assume infrastructure that didn’t exist in 1960. For example, a language in which indentation is significant, like Python, would not work very well on printer terminals. But putting such problems aside (assuming, for example, that programs were all just written on paper) would programmers of the 1960s have liked writing programs in the languages we use now?
I think so. Some of the less imaginative ones, who had artifacts of early languages built into their ideas of what a program was, might have had trouble. (How can you manipulate data without doing pointer arithmetic? How can you implement flow charts without gotos?) But I think the smartest programmers would have had no trouble making the most of present-day languages, if they’d had them.
If we had the hundred-year language now, it would at least make a great pseudocode. What about using it to write software? Since the hundred-year language will need to generate fast code for some applications, presumably it could generate code efficient enough to run acceptably well on our hardware. We might have to give more optimization advice than users in a hundred years, but it still might be a net win.
Now we have two ideas that, if you combine them, suggest interesting possibilities: (1) the hundred-year language could, in principle, be designed today, and (2) such a language, if it existed, might be good to program in today. When you see these ideas laid out like that, it’s hard not to think, why not try writing the hundred-year language now?
When you’re working on language design, I think it is good to have such a target and to keep it consciously in mind. When you learn to drive, one of the principles they teach you is to align the car not by lining up the hood with the stripes painted on the road, but by aiming at some point in the distance. Even if all you care about is what happens in the next ten feet, this is the right answer. I think we can and should do the same thing with programming languages.
Notes
I believe Lisp Machine Lisp was the first language to embody the principle that declarations (except those of dynamic variables) were merely optimization advice, and would not change the meaning of a correct program. Common Lisp seems to have been the first to state this explicitly.
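A modern analogue of this principle (an illustration of mine, not something the note claims) is Python's type annotations, which are likewise advisory: a checker or compiler may exploit them, but they do not change the meaning of a correct program at runtime.

```python
# Annotations express intent and may guide tools or optimizers,
# but Python does not enforce them when the program runs.
def square(x: int) -> int:
    return x * x

# A correct program means the same thing with or without them:
assert square(2.5) == 6.25  # the int annotation does not reject a float
```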
Thanks to Trevor Blackwell, Robert Morris, and Dan Giffin for reading drafts of this, and to Guido van Rossum, Jeremy Hylton, and the rest of the Python crew for inviting me to speak at PyCon.
You’ll find this essay and 14 others in Hackers & Painters.