Revenge of the Nerds
Want to start a startup? Get funded by Y Combinator.
May 2002
"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp."
- Guy Steele, co-author of the Java spec
In the software business there is an ongoing struggle between the pointy-headed academics, and another equally formidable force, the pointy-haired bosses. Everyone knows who the pointy-haired boss is, right? I think most people in the technology world not only recognize this cartoon character, but know the actual person in their company that he is modelled upon.
The pointy-haired boss miraculously combines two qualities that are common by themselves, but rarely seen together: (a) he knows nothing whatsoever about technology, and (b) he has very strong opinions about it.
Suppose, for example, you need to write a piece of software. The pointy-haired boss has no idea how this software has to work, and can't tell one programming language from another, and yet he knows what language you should write it in. Exactly. He thinks you should write it in Java.
Why does he think this? Let's take a look inside the brain of the pointy-haired boss. What he's thinking is something like this. Java is a standard. I know it must be, because I read about it in the press all the time. Since it is a standard, I won't get in trouble for using it. And that also means there will always be lots of Java programmers, so if the programmers working for me now quit, as programmers working for me mysteriously always do, I can easily replace them.
Well, this doesn't sound that unreasonable. But it's all based on one unspoken assumption, and that assumption turns out to be false. The pointy-haired boss believes that all programming languages are pretty much equivalent. If that were true, he would be right on target. If languages are all equivalent, sure, use whatever language everyone else is using.
But all languages are not equivalent, and I think I can prove this to you without even getting into the differences between them. If you asked the pointy-haired boss in 1992 what language software should be written in, he would have answered with as little hesitation as he does today. Software should be written in C++. But if languages are all equivalent, why should the pointy-haired boss's opinion ever change? In fact, why should the developers of Java have even bothered to create a new language?
Presumably, if you create a new language, it's because you think it's better in some way than what people already had. And in fact, Gosling makes it clear in the first Java white paper that Java was designed to fix some problems with C++. So there you have it: languages are not all equivalent. If you follow the trail through the pointy-haired boss's brain to Java and then back through Java's history to its origins, you end up holding an idea that contradicts the assumption you started with.
So, who's right? James Gosling, or the pointy-haired boss? Not surprisingly, Gosling is right. Some languages are better, for certain problems, than others. And you know, that raises some interesting questions. Java was designed to be better, for certain problems, than C++. What problems? When is Java better and when is C++? Are there situations where other languages are better than either of them?
Once you start considering this question, you have opened a real can of worms. If the pointy-haired boss had to think about the problem in its full complexity, it would make his brain explode. As long as he considers all languages equivalent, all he has to do is choose the one that seems to have the most momentum, and since that is more a question of fashion than technology, even he can probably get the right answer. But if languages vary, he suddenly has to solve two simultaneous equations, trying to find an optimal balance between two things he knows nothing about: the relative suitability of the twenty or so leading languages for the problem he needs to solve, and the odds of finding programmers, libraries, etc. for each. If that's what's on the other side of the door, it is no surprise that the pointy-haired boss doesn't want to open it.
The disadvantage of believing that all programming languages are equivalent is that it's not true. But the advantage is that it makes your life a lot simpler. And I think that's the main reason the idea is so widespread. It is a comfortable idea.
We know that Java must be pretty good, because it is the cool, new programming language. Or is it? If you look at the world of programming languages from a distance, it looks like Java is the latest thing. (From far enough away, all you can see is the large, flashing billboard paid for by Sun.) But if you look at this world up close, you find that there are degrees of coolness. Within the hacker subculture, there is another language called Perl that is considered a lot cooler than Java. Slashdot, for example, is generated by Perl. I don't think you would find those guys using Java Server Pages. But there is another, newer language, called Python, whose users tend to look down on Perl, and more waiting in the wings.
If you look at these languages in order, Java, Perl, Python, you notice an interesting pattern. At least, you notice this pattern if you are a Lisp hacker. Each one is progressively more like Lisp. Python copies even features that many Lisp hackers consider to be mistakes. You could translate simple Lisp programs into Python line for line. It's 2002, and programming languages have almost caught up with 1958.
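What a line-for-line translation might look like is easy to sketch. The Lisp function below (in a comment) and its Python rendering are a hypothetical illustration, not an example from the essay:

```python
# Lisp:  (defun double-all (xs)
#          (mapcar (lambda (x) (* 2 x)) xs))
#
# A nearly line-for-line Python rendering, using the same building
# blocks: an anonymous function passed to a mapping function.
def double_all(xs):
    return list(map(lambda x: 2 * x, xs))

print(double_all([1, 2, 3]))  # [2, 4, 6]
```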
Catching Up with Math
What I mean is that Lisp was first discovered by John McCarthy in 1958, and popular programming languages are only now catching up with the ideas he developed then.
Now, how could that be true? Isn't computer technology something that changes very rapidly? I mean, in 1958, computers were refrigerator-sized behemoths with the processing power of a wristwatch. How could any technology that old even be relevant, let alone superior to the latest developments?
I'll tell you how. It's because Lisp was not really designed to be a programming language, at least not in the sense we mean today. What we mean by a programming language is something we use to tell a computer what to do. McCarthy did eventually intend to develop a programming language in this sense, but the Lisp that we actually ended up with was based on something separate that he did as a theoretical exercise— an effort to define a more convenient alternative to the Turing Machine. As McCarthy said later,
Another way to show that Lisp was neater than Turing machines was to write a universal Lisp function and show that it is briefer and more comprehensible than the description of a universal Turing machine. This was the Lisp function eval..., which computes the value of a Lisp expression.... Writing eval required inventing a notation representing Lisp functions as Lisp data, and such a notation was devised for the purposes of the paper with no thought that it would be used to express Lisp programs in practice.
What happened next was that, some time in late 1958, Steve Russell, one of McCarthy's grad students, looked at this definition of eval and realized that if he translated it into machine language, the result would be a Lisp interpreter.
This was a big surprise at the time. Here is what McCarthy said about it later in an interview:
Steve Russell said, look, why don't I program this eval..., and I said to him, ho, ho, you're confusing theory with practice, this eval is intended for reading, not for computing. But he went ahead and did it. That is, he compiled the eval in my paper into [IBM] 704 machine code, fixing bugs, and then advertised this as a Lisp interpreter, which it certainly was. So at that point Lisp had essentially the form that it has today.
So, in a matter of weeks I think, McCarthy found his theoretical exercise transformed into an actual programming language— and a more powerful one than he had intended.
So the short explanation of why this 1950s language is not obsolete is that it was not technology but math, and math doesn't get stale. The right thing to compare Lisp to is not 1950s hardware, but, say, the Quicksort algorithm, which was discovered in 1960 and is still the fastest general-purpose sort.
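The algorithm being held up for comparison can itself be stated in a few lines. A minimal sketch in Python, favoring clarity over the efficient in-place formulation:

```python
def quicksort(xs):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(xs) <= 1:
        return xs
    pivot = xs[0]
    # Partition the rest of the list around the pivot,
    # then sort each side recursively.
    less = [x for x in xs[1:] if x < pivot]
    rest = [x for x in xs[1:] if x >= pivot]
    return quicksort(less) + [pivot] + quicksort(rest)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```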
There is one other language still surviving from the 1950s, Fortran, and it represents the opposite approach to language design. Lisp was a piece of theory that unexpectedly got turned into a programming language. Fortran was developed intentionally as a programming language, but what we would now consider a very low-level one.
Fortran I, the language that was developed in 1956, was a very different animal from present-day Fortran. Fortran I was pretty much assembly language with math. In some ways it was less powerful than more recent assembly languages; there were no subroutines, for example, only branches. Present-day Fortran is now arguably closer to Lisp than to Fortran I.
Lisp and Fortran were the trunks of two separate evolutionary trees, one rooted in math and one rooted in machine architecture. These two trees have been converging ever since. Lisp started out powerful, and over the next twenty years got fast. So-called mainstream languages started out fast, and over the next forty years gradually got more powerful, until now the most advanced of them are fairly close to Lisp. Close, but they are still missing a few things.
What Made Lisp Different
When it was first developed, Lisp embodied nine new ideas. Some of these we now take for granted, others are only seen in more advanced languages, and two are still unique to Lisp. The nine ideas are, in order of their adoption by the mainstream,
1. Conditionals. A conditional is an if-then-else construct. We take these for granted now, but Fortran I didn't have them. It had only a conditional goto closely based on the underlying machine instruction.
2. A function type. In Lisp, functions are a data type just like integers or strings. They have a literal representation, can be stored in variables, can be passed as arguments, and so on.
3. Recursion. Lisp was the first programming language to support it.
4. Dynamic typing. In Lisp, all variables are effectively pointers. Values are what have types, not variables, and assigning or binding variables means copying pointers, not what they point to.
5. Garbage-collection.
6. Programs composed of expressions. Lisp programs are trees of expressions, each of which returns a value. This is in contrast to Fortran and most succeeding languages, which distinguish between expressions and statements.
It was natural to have this distinction in Fortran I because you could not nest statements. And so while you needed expressions for math to work, there was no point in making anything else return a value, because there could not be anything waiting for it.
This limitation went away with the arrival of block-structured languages, but by then it was too late. The distinction between expressions and statements was entrenched. It spread from Fortran into Algol and then to both their descendants.
7. A symbol type. Symbols are effectively pointers to strings stored in a hash table. So you can test equality by comparing a pointer, instead of comparing each character.
8. A notation for code using trees of symbols and constants.
9. The whole language there all the time. There is no real distinction between read-time, compile-time, and runtime. You can compile or run code while reading, read or run code while compiling, and read or compile code at runtime.
Running code at read-time lets users reprogram Lisp's syntax; running code at compile-time is the basis of macros; compiling at runtime is the basis of Lisp's use as an extension language in programs like Emacs; and reading at runtime enables programs to communicate using s-expressions, an idea recently reinvented as XML.
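The "reading at runtime" point is concrete enough to sketch. Below is a toy s-expression reader in Python: a program that reads nested structured data at runtime, the way two Lisp programs could communicate. It is a minimal illustration, nothing like a full Lisp reader:

```python
def tokenize(s):
    # Pad parens with spaces so split() separates them from atoms.
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    # Consume one expression from the token stream.
    tok = tokens.pop(0)
    if tok == '(':
        lst = []
        while tokens[0] != ')':
            lst.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return lst
    # Atoms: numbers become ints, everything else stays a string.
    return int(tok) if tok.lstrip('-').isdigit() else tok

print(parse(tokenize("(order (id 42) (items (a b)))")))
# ['order', ['id', 42], ['items', ['a', 'b']]]
```

The parsed result is ordinary nested lists, the same shape as the data the sender built, which is the whole point of s-expressions as a wire format.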
When Lisp first appeared, these ideas were far removed from ordinary programming practice, which was dictated largely by the hardware available in the late 1950s. Over time, the default language, embodied in a succession of popular languages, has gradually evolved toward Lisp. Ideas 1-5 are now widespread. Number 6 is starting to appear in the mainstream. Python has a form of 7, though there doesn't seem to be any syntax for it.
As for number 8, this may be the most interesting of the lot. Ideas 8 and 9 only became part of Lisp by accident, because Steve Russell implemented something McCarthy had never intended to be implemented. And yet these ideas turn out to be responsible for both Lisp's strange appearance and its most distinctive features. Lisp looks strange not so much because it has a strange syntax as because it has no syntax; you express programs directly in the parse trees that get built behind the scenes when other languages are parsed, and these trees are made of lists, which are Lisp data structures.
Expressing the language in its own data structures turns out to be a very powerful feature. Ideas 8 and 9 together mean that you can write programs that write programs. That may sound like a bizarre idea, but it's an everyday thing in Lisp. The most common way to do it is with something called a macro.
The term "macro" does not mean in Lisp what it means in other languages. A Lisp macro can be anything from an abbreviation to a compiler for a new language. If you want to really understand Lisp, or just expand your programming horizons, I would learn more about macros.
Macros (in the Lisp sense) are still, as far as I know, unique to Lisp. This is partly because in order to have macros you probably have to make your language look as strange as Lisp. It may also be because if you do add that final increment of power, you can no longer claim to have invented a new language, but only a new dialect of Lisp.
I mention this mostly as a joke, but it is quite true. If you define a language that has car, cdr, cons, quote, cond, atom, eq, and a notation for functions expressed as lists, then you can build all the rest of Lisp out of it. That is in fact the defining quality of Lisp: it was in order to make this so that McCarthy gave Lisp the shape it has.
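Ideas 8 and 9 taken together can be sketched in a few lines: code represented as nested lists, interpreted by an ordinary function. The toy evaluator below is not McCarthy's eval, just an illustration of the principle; the operator set is made up for the example:

```python
def ev(expr, env):
    # Code is data: an expression is a string (variable), a number
    # (literal), or a list whose head names an operator.
    if isinstance(expr, str):
        return env[expr]
    if not isinstance(expr, list):
        return expr
    op, *args = expr
    if op == 'quote':
        return args[0]          # return the code itself, unevaluated
    if op == 'if':
        cond, then, alt = args
        return ev(then, env) if ev(cond, env) else ev(alt, env)
    vals = [ev(a, env) for a in args]
    if op == '+':
        return sum(vals)
    if op == '*':
        out = 1
        for v in vals:
            out *= v
        return out
    raise ValueError("unknown operator: %r" % op)

# (* x (+ 1 2)) with x = 10
print(ev(['*', 'x', ['+', 1, 2]], {'x': 10}))  # 30
```

Because programs are plain lists, another program can build, inspect, or rewrite them before handing them to `ev`, which is the door macros walk through.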
Where Languages Matter
So suppose Lisp does represent a kind of limit that mainstream languages are approaching asymptotically— does that mean you should actually use it to write software? How much do you lose by using a less powerful language? Isn't it wiser, sometimes, not to be at the very edge of innovation? And isn't popularity to some extent its own justification? Isn't the pointy-haired boss right, for example, to want to use a language for which he can easily hire programmers?
There are, of course, projects where the choice of programming language doesn't matter much. As a rule, the more demanding the application, the more leverage you get from using a powerful language. But plenty of projects are not demanding at all. Most programming probably consists of writing little glue programs, and for little glue programs you can use any language that you're already familiar with and that has good libraries for whatever you need to do. If you just need to feed data from one Windows app to another, sure, use Visual Basic.
You can write little glue programs in Lisp too (I use it as a desktop calculator), but the biggest win for languages like Lisp is at the other end of the spectrum, where you need to write sophisticated programs to solve hard problems in the face of fierce competition. A good example is the airline fare search program that ITA Software licenses to Orbitz. These guys entered a market already dominated by two big, entrenched competitors, Travelocity and Expedia, and seem to have just humiliated them technologically.
The core of ITA's application is a 200,000 line Common Lisp program that searches many orders of magnitude more possibilities than their competitors, who apparently are still using mainframe-era programming techniques. (Though ITA is also in a sense using a mainframe-era programming language.) I have never seen any of ITA's code, but according to one of their top hackers they use a lot of macros, and I am not surprised to hear it.
Centripetal Forces
I'm not saying there is no cost to using uncommon technologies. The pointy-haired boss is not completely mistaken to worry about this. But because he doesn't understand the risks, he tends to magnify them.
I can think of three problems that could arise from using less common languages. Your programs might not work well with programs written in other languages. You might have fewer libraries at your disposal. And you might have trouble hiring programmers.
How much of a problem is each of these? The importance of the first varies depending on whether you have control over the whole system. If you're writing software that has to run on a remote user's machine on top of a buggy, closed operating system (I mention no names), there may be advantages to writing your application in the same language as the OS. But if you control the whole system and have the source code of all the parts, as ITA presumably does, you can use whatever languages you want. If any incompatibility arises, you can fix it yourself.
In server-based applications you can get away with using the most advanced technologies, and I think this is the main cause of what Jonathan Erickson calls the "programming language renaissance." This is why we even hear about new languages like Perl and Python. We're not hearing about these languages because people are using them to write Windows apps, but because people are using them on servers. And as software shifts off the desktop and onto servers (a future even Microsoft seems resigned to), there will be less and less pressure to use middle-of-the-road technologies.
As for libraries, their importance also depends on the application. For less demanding problems, the availability of libraries can outweigh the intrinsic power of the language. Where is the breakeven point? Hard to say exactly, but wherever it is, it is short of anything you'd be likely to call an application. If a company considers itself to be in the software business, and they're writing an application that will be one of their products, then it will probably involve several hackers and take at least six months to write. In a project of that size, powerful languages probably start to outweigh the convenience of pre-existing libraries.
The third worry of the pointy-haired boss, the difficulty of hiring programmers, I think is a red herring. How many hackers do you need to hire, after all? Surely by now we all know that software is best developed by teams of less than ten people. And you shouldn't have trouble hiring hackers on that scale for any language anyone has ever heard of. If you can't find ten Lisp hackers, then your company is probably based in the wrong city for developing software.
In fact, choosing a more powerful language probably decreases the size of the team you need, because (a) if you use a more powerful language you probably won't need as many hackers, and (b) hackers who work in more advanced languages are likely to be smarter.
I'm not saying that you won't get a lot of pressure to use what are perceived as "standard" technologies. At Viaweb (now Yahoo Store), we raised some eyebrows among VCs and potential acquirers by using Lisp. But we also raised eyebrows by using generic Intel boxes as servers instead of "industrial strength" servers like Suns, for using a then-obscure open-source Unix variant called FreeBSD instead of a real commercial OS like Windows NT, for ignoring a supposed e-commerce standard called SET that no one now even remembers, and so on.
You can't let the suits make technical decisions for you. Did it alarm some potential acquirers that we used Lisp? Some, slightly, but if we hadn't used Lisp, we wouldn't have been able to write the software that made them want to buy us. What seemed like an anomaly to them was in fact cause and effect.
If you start a startup, don't design your product to please VCs or potential acquirers. Design your product to please the users. If you win the users, everything else will follow. And if you don't, no one will care how comfortingly orthodox your technology choices were.
The Cost of Being Average
How much do you lose by using a less powerful language? There is actually some data out there about that.
The most convenient measure of power is probably code size. The point of high-level languages is to give you bigger abstractions— bigger bricks, as it were, so you don't need as many to build a wall of a given size. So the more powerful the language, the shorter the program (not simply in characters, of course, but in distinct elements).
How does a more powerful language enable you to write shorter programs? One technique you can use, if the language will let you, is something called bottom-up programming. Instead of simply writing your application in the base language, you build on top of the base language a language for writing programs like yours, then write your program in it. The combined code can be much shorter than if you had written your whole program in the base language— indeed, this is how most compression algorithms work. A bottom-up program should be easier to modify as well, because in many cases the language layer won't have to change at all.
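A small sketch, with made-up helper names, of what such a language layer looks like in practice (Python for illustration): first a vocabulary for the problem domain is defined, then the program is written in that vocabulary rather than in raw base-language operations.

```python
# Layer 1: a tiny domain vocabulary for building indented report text.
def indent(line, depth):
    return "  " * depth + line

def section(title, lines):
    return [title] + [indent(l, 1) for l in lines]

# Layer 2: the "application" is now short, because it speaks the
# vocabulary above instead of manipulating strings directly.
def report(data):
    out = []
    for title, lines in data:
        out.extend(section(title, lines))
    return "\n".join(out)

print(report([("Totals", ["a: 1", "b: 2"])]))
# Totals
#   a: 1
#   b: 2
```

If the report format changes, only layer 2 changes; if a new kind of document is needed, layer 1 is reused unchanged, which is the modifiability the paragraph above describes.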
Code size is important, because the time it takes to write a program depends mostly on its length. If your program would be three times as long in another language, it will take three times as long to write— and you can't get around this by hiring more people, because beyond a certain size new hires are actually a net lose. Fred Brooks described this phenomenon in his famous book The Mythical Man-Month, and everything I've seen has tended to confirm what he said.
So how much shorter are your programs if you write them in Lisp? Most of the numbers I've heard for Lisp versus C, for example, have been around 7-10x. But a recent article about ITA in New Architect magazine said that "one line of Lisp can replace 20 lines of C," and since this article was full of quotes from ITA's president, I assume they got this number from ITA. If so then we can put some faith in it; ITA's software includes a lot of C and C++ as well as Lisp, so they are speaking from experience.
My guess is that these multiples aren't even constant. I think they increase when you face harder problems and also when you have smarter programmers. A really good hacker can squeeze more out of better tools.
As one data point on the curve, at any rate, if you were to compete with ITA and chose to write your software in C, they would be able to develop software twenty times faster than you. If you spent a year on a new feature, they'd be able to duplicate it in less than three weeks. Whereas if they spent just three months developing something new, it would be five years before you had it too.
And you know what? That's the best-case scenario. When you talk about code-size ratios, you're implicitly assuming that you can actually write the program in the weaker language. But in fact there are limits on what programmers can do. If you're trying to solve a hard problem with a language that's too low-level, you reach a point where there is just too much to keep in your head at once.
So when I say it would take ITA's imaginary competitor five years to duplicate something ITA could write in Lisp in three months, I mean five years if nothing goes wrong. In fact, the way things work in most companies, any development project that would take five years is likely never to get finished at all.
I admit this is an extreme case. ITA's hackers seem to be unusually smart, and C is a pretty low-level language. But in a competitive market, even a differential of two or three to one would be enough to guarantee that you'd always be behind.
A Recipe
This is the kind of possibility that the pointy-haired boss doesn't even want to think about. And so most of them don't. Because, you know, when it comes down to it, the pointy-haired boss doesn't mind if his company gets their ass kicked, so long as no one can prove it's his fault. The safest plan for him personally is to stick close to the center of the herd.
Within large organizations, the phrase used to describe this approach is "industry best practice." Its purpose is to shield the pointy-haired boss from responsibility: if he chooses something that is "industry best practice," and the company loses, he can't be blamed. He didn't choose, the industry did.
I believe this term was originally used to describe accounting methods and so on. What it means, roughly, is don't do anything weird. And in accounting that's probably a good idea. The terms "cutting-edge" and "accounting" do not sound good together. But when you import this criterion into decisions about technology, you start to get the wrong answers.
Technology often should be cutting-edge. In programming languages, as Erann Gat has pointed out, what "industry best practice" actually gets you is not the best, but merely the average. When a decision causes you to develop software at a fraction of the rate of more aggressive competitors, "best practice" is a misnomer.
So here we have two pieces of information that I think are very valuable. In fact, I know it from my own experience. Number 1, languages vary in power. Number 2, most managers deliberately ignore this. Between them, these two facts are literally a recipe for making money. ITA is an example of this recipe in action. If you want to win in a software business, just take on the hardest problem you can find, use the most powerful language you can get, and wait for your competitors' pointy-haired bosses to revert to the mean.
Appendix: Power
As an illustration of what I mean about the relative power of programming languages, consider the following problem. We want to write a function that generates accumulators— a function that takes a number n, and returns a function that takes another number i and returns n incremented by i.
(That's incremented by, not plus. An accumulator has to accumulate.)
In Common Lisp this would be
(defun foo (n)
  (lambda (i) (incf n i)))
and in Perl 5,
sub foo {
  my ($n) = @_;
  sub {$n += shift}
}
which has more elements than the Lisp version because you have to extract parameters manually in Perl.
In Smalltalk the code is slightly longer than in Lisp
foo: n
  |s|
  s := n.
  ^[:i| s := s+i. ]
because although in general lexical variables work, you can't do an assignment to a parameter, so you have to create a new variable s.
In Javascript the example is, again, slightly longer, because Javascript retains the distinction between statements and expressions, so you need explicit return statements to return values:
function foo(n) {
  return function (i) {
    return n += i
  }
}
(To be fair, Perl also retains this distinction, but deals with it in typical Perl fashion by letting you omit returns.)
If you try to translate the Lisp/Perl/Smalltalk/Javascript code into Python you run into some limitations. Because Python doesn't fully support lexical variables, you have to create a data structure to hold the value of n. And although Python does have a function data type, there is no literal representation for one (unless the body is only a single expression) so you need to create a named function to return. This is what you end up with:
def foo(n):
  s = [n]
  def bar(i):
    s[0] += i
    return s[0]
  return bar
Python users might legitimately ask why they can't just write
def foo(n):
  return lambda i: return n += i
or even
def foo(n):
  lambda i: n += i
and my guess is that they probably will, one day. (But if they don't want to wait for Python to evolve the rest of the way into Lisp, they could always just...)
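As it happens, Python did later evolve part of the way: Python 3 added the nonlocal declaration, which permits assignment to a variable in an enclosing scope and so makes a nearly direct translation possible. A sketch, assuming a Python 3 interpreter (this postdates the essay):

```python
# Accumulator generator using the later-added 'nonlocal'
# declaration to assign to the enclosing n.
def foo(n):
    def bar(i):
        nonlocal n
        n += i
        return n
    return bar

acc = foo(10)
print(acc(5))   # 15
print(acc(5))   # 20
```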
In OO languages, you can, to a limited extent, simulate a closure (a function that refers to variables defined in enclosing scopes) by defining a class with one method and a field to replace each variable from an enclosing scope. This makes the programmer do the kind of code analysis that would be done by the compiler in a language with full support for lexical scope, and it won't work if more than one function refers to the same variable, but it is enough in simple cases like this.
Python experts seem to agree that this is the preferred way to solve the problem in Python, writing either
def foo(n):
  class acc:
    def __init__(self, s):
      self.s = s
    def inc(self, i):
      self.s += i
      return self.s
  return acc(n).inc
or
class foo:
  def __init__(self, n):
    self.n = n
  def __call__(self, i):
    self.n += i
    return self.n
I include these because I wouldn't want Python advocates to say I was misrepresenting the language, but both seem to me more complex than the first version. You're doing the same thing, setting up a separate place to hold the accumulator; it's just a field in an object instead of the head of a list. And the use of these special, reserved field names, especially __call__, seems a bit of a hack.
In the rivalry between Perl and Python, the claim of the Python hackers seems to be that Python is a more elegant alternative to Perl, but what this case shows is that power is the ultimate elegance: the Perl program is simpler (has fewer elements), even if the syntax is a bit ugly.
How about other languages? The other languages mentioned in this talk— Fortran, C, C++, Java, and Visual Basic— it is not clear whether you can actually solve this problem in them. Ken Anderson says that the following code is about as close as you can get in Java:
public interface Inttoint {
  public int call(int i);
}

public static Inttoint foo(final int n) {
  return new Inttoint() {
    int s = n;
    public int call(int i) {
      s = s + i;
      return s;
    }
  };
}
This falls short of the spec because it only works for integers. After many email exchanges with Java hackers, I would say that writing a properly polymorphic version that behaves like the preceding examples is somewhere between damned awkward and impossible. If anyone wants to write one I'd be very curious to see it, but I personally have timed out.
It's not literally true that you can't solve this problem in other languages, of course. The fact that all these languages are Turing-equivalent means that, strictly speaking, you can write any program in any of them. So how would you do it? In the limit case, by writing a Lisp interpreter in the less powerful language.
That sounds like a joke, but it happens so often to varying degrees in large programming projects that there is a name for the phenomenon, Greenspun's Tenth Rule: any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
If you try to solve a hard problem, the question is not whether you will use a powerful enough language, but whether you will (a) use a powerful language, (b) write a de facto interpreter for one, or (c) yourself become a human compiler for one. We see this already beginning to happen in the Python example, where we are in effect simulating the code that a compiler would generate to implement a lexical variable.
This practice is not only common, but institutionalized. For example, in the OO world you hear a good deal about "patterns". I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I'm using abstractions that aren't powerful enough— often that I'm generating by hand the expansions of some macro that I need to write.
Notes
[1] The IBM 704 CPU was about the size of a refrigerator, but a lot heavier. The CPU weighed 3150 pounds, and the 4K of RAM was in a separate box weighing another 4000 pounds. The Sub-Zero 690, one of the largest household refrigerators, weighs 656 pounds.
[2] Steve Russell also wrote the first (digital) computer game, Spacewar, in 1962.
[3] If you want to trick a pointy-haired boss into letting you write software in Lisp, you could try telling him it's XML.
[4] Here is the accumulator generator in other Lisp dialects:
Scheme: (define (foo n) (lambda (i) (set! n (+ n i)) n))
Goo: (df foo (n) (op incf n _))
Arc: (def foo (n) [++ n _])
[5] Erann Gat's sad tale about "industry best practice" at JPL inspired me to address this generally misapplied phrase.
[6] Peter Norvig found that 16 of the 23 patterns in Design Patterns were "invisible or simpler" in Lisp.
Thanks to the many people who answered my questions about various languages and/or read drafts of this, including Ken Anderson, Trevor Blackwell, Erann Gat, Dan Giffin, Sarah Harlin, Jeremy Hylton, Robert Morris, Peter Norvig, Guy Steele, and Anton van Straaten. They bear no blame for any opinions expressed.
Related: Many people have responded to this talk, so I have set up an additional page to deal with the issues they have raised: Re: Revenge of the Nerds.
It also set off an extensive and often useful discussion on the LL1 mailing list. See particularly the mail by Anton van Straaten on semantic compression.
Some of the mails on LL1 led me to try to go deeper into the subject of language power in Succinctness is Power.
A set of canonical implementations of the accumulator generator benchmark are collected together on their own page.
Japanese Translation, Spanish Translation, Chinese Translation
You'll find this essay and 14 others in Hackers & Painters.
Revenge of the Nerds
Want to start a startup? Get funded by Y Combinator.
May 2002
“We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.”
- Guy Steele, co-author of the Java spec
In the software business there is an ongoing struggle between the pointy-headed academics, and another equally formidable force, the pointy-haired bosses. Everyone knows who the pointy-haired boss is, right? I think most people in the technology world not only recognize this cartoon character, but know the actual person in their company that he is modelled upon.
The pointy-haired boss miraculously combines two qualities that are common by themselves, but rarely seen together: (a) he knows nothing whatsoever about technology, and (b) he has very strong opinions about it.
Suppose, for example, you need to write a piece of software. The pointy-haired boss has no idea how this software has to work, and can’t tell one programming language from another, and yet he knows what language you should write it in. Exactly. He thinks you should write it in Java.
Why does he think this? Let’s take a look inside the brain of the pointy-haired boss. What he’s thinking is something like this. Java is a standard. I know it must be, because I read about it in the press all the time. Since it is a standard, I won’t get in trouble for using it. And that also means there will always be lots of Java programmers, so if the programmers working for me now quit, as programmers working for me mysteriously always do, I can easily replace them.
Well, this doesn’t sound that unreasonable. But it’s all based on one unspoken assumption, and that assumption turns out to be false. The pointy-haired boss believes that all programming languages are pretty much equivalent. If that were true, he would be right on target. If languages are all equivalent, sure, use whatever language everyone else is using.
But all languages are not equivalent, and I think I can prove this to you without even getting into the differences between them. If you asked the pointy-haired boss in 1992 what language software should be written in, he would have answered with as little hesitation as he does today. Software should be written in C++. But if languages are all equivalent, why should the pointy-haired boss’s opinion ever change? In fact, why should the developers of Java have even bothered to create a new language?
Presumably, if you create a new language, it’s because you think it’s better in some way than what people already had. And in fact, Gosling makes it clear in the first Java white paper that Java was designed to fix some problems with C++. So there you have it: languages are not all equivalent. If you follow the trail through the pointy-haired boss’s brain to Java and then back through Java’s history to its origins, you end up holding an idea that contradicts the assumption you started with.
So, who’s right? James Gosling, or the pointy-haired boss? Not surprisingly, Gosling is right. Some languages are better, for certain problems, than others. And you know, that raises some interesting questions. Java was designed to be better, for certain problems, than C++. What problems? When is Java better and when is C++? Are there situations where other languages are better than either of them?
Once you start considering this question, you have opened a real can of worms. If the pointy-haired boss had to think about the problem in its full complexity, it would make his brain explode. As long as he considers all languages equivalent, all he has to do is choose the one that seems to have the most momentum, and since that is more a question of fashion than technology, even he can probably get the right answer. But if languages vary, he suddenly has to solve two simultaneous equations, trying to find an optimal balance between two things he knows nothing about: the relative suitability of the twenty or so leading languages for the problem he needs to solve, and the odds of finding programmers, libraries, etc. for each. If that’s what’s on the other side of the door, it is no surprise that the pointy-haired boss doesn’t want to open it.
The disadvantage of believing that all programming languages are equivalent is that it’s not true. But the advantage is that it makes your life a lot simpler. And I think that’s the main reason the idea is so widespread. It is a comfortable idea.
We know that Java must be pretty good, because it is the cool, new programming language. Or is it? If you look at the world of programming languages from a distance, it looks like Java is the latest thing. (From far enough away, all you can see is the large, flashing billboard paid for by Sun.) But if you look at this world up close, you find that there are degrees of coolness. Within the hacker subculture, there is another language called Perl that is considered a lot cooler than Java. Slashdot, for example, is generated by Perl. I don’t think you would find those guys using Java Server Pages. But there is another, newer language, called Python, whose users tend to look down on Perl, and more waiting in the wings.
If you look at these languages in order, Java, Perl, Python, you notice an interesting pattern. At least, you notice this pattern if you are a Lisp hacker. Each one is progressively more like Lisp. Python copies even features that many Lisp hackers consider to be mistakes. You could translate simple Lisp programs into Python line for line. It’s 2002, and programming languages have almost caught up with 1958.
Catching Up with Math
What I mean is that Lisp was first discovered by John McCarthy in 1958, and popular programming languages are only now catching up with the ideas he developed then.
Now, how could that be true? Isn’t computer technology something that changes very rapidly? I mean, in 1958, computers were refrigerator-sized behemoths with the processing power of a wristwatch. How could any technology that old even be relevant, let alone superior to the latest developments?
I’ll tell you how. It’s because Lisp was not really designed to be a programming language, at least not in the sense we mean today. What we mean by a programming language is something we use to tell a computer what to do. McCarthy did eventually intend to develop a programming language in this sense, but the Lisp that we actually ended up with was based on something separate that he did as a theoretical exercise— an effort to define a more convenient alternative to the Turing Machine. As McCarthy said later,
Another way to show that Lisp was neater than Turing machines was to write a universal Lisp function and show that it is briefer and more comprehensible than the description of a universal Turing machine. This was the Lisp function eval…, which computes the value of a Lisp expression… Writing eval required inventing a notation representing Lisp functions as Lisp data, and such a notation was devised for the purposes of the paper with no thought that it would be used to express Lisp programs in practice.
What happened next was that, some time in late 1958, Steve Russell, one of McCarthy’s grad students, looked at this definition of eval and realized that if he translated it into machine language, the result would be a Lisp interpreter.
This was a big surprise at the time. Here is what McCarthy said about it later in an interview:
Steve Russell said, look, why don’t I program this eval…, and I said to him, ho, ho, you’re confusing theory with practice, this eval is intended for reading, not for computing. But he went ahead and did it. That is, he compiled the eval in my paper into [IBM] 704 machine code, fixing bugs, and then advertised this as a Lisp interpreter, which it certainly was. So at that point Lisp had essentially the form that it has today.
So, in a matter of weeks I think, McCarthy found his theoretical exercise transformed into an actual programming language— and a more powerful one than he had intended.
So the short explanation of why this 1950s language is not obsolete is that it was not technology but math, and math doesn’t get stale. The right thing to compare Lisp to is not 1950s hardware, but, say, the Quicksort algorithm, which was discovered in 1960 and is still the fastest general-purpose sort.
There is one other language still surviving from the 1950s, Fortran, and it represents the opposite approach to language design. Lisp was a piece of theory that unexpectedly got turned into a programming language. Fortran was developed intentionally as a programming language, but what we would now consider a very low-level one.
Fortran I, the language that was developed in 1956, was a very different animal from present-day Fortran. Fortran I was pretty much assembly language with math. In some ways it was less powerful than more recent assembly languages; there were no subroutines, for example, only branches. Present-day Fortran is now arguably closer to Lisp than to Fortran I.
Lisp and Fortran were the trunks of two separate evolutionary trees, one rooted in math and one rooted in machine architecture. These two trees have been converging ever since. Lisp started out powerful, and over the next twenty years got fast. So-called mainstream languages started out fast, and over the next forty years gradually got more powerful, until now the most advanced of them are fairly close to Lisp. Close, but they are still missing a few things.
What Made Lisp Different
When it was first developed, Lisp embodied nine new ideas. Some of these we now take for granted, others are only seen in more advanced languages, and two are still unique to Lisp. The nine ideas are, in order of their adoption by the mainstream,
-
Conditionals. A conditional is an if-then-else construct. We take these for granted now, but Fortran I didn’t have them. It had only a conditional goto closely based on the underlying machine instruction.
-
A function type. In Lisp, functions are a data type just like integers or strings. They have a literal representation, can be stored in variables, can be passed as arguments, and so on.
-
Recursion. Lisp was the first programming language to support it.
-
Dynamic typing. In Lisp, all variables are effectively pointers. Values are what have types, not variables, and assigning or binding variables means copying pointers, not what they point to.
-
Garbage-collection.
-
Programs composed of expressions. Lisp programs are trees of expressions, each of which returns a value. This is in contrast to Fortran and most succeeding languages, which distinguish between expressions and statements.
It was natural to have this distinction in Fortran I because you could not nest statements. And so while you needed expressions for math to work, there was no point in making anything else return a value, because there could not be anything waiting for it.
This limitation went away with the arrival of block-structured languages, but by then it was too late. The distinction between expressions and statements was entrenched. It spread from Fortran into Algol and then to both their descendants.
-
A symbol type. Symbols are effectively pointers to strings stored in a hash table. So you can test equality by comparing a pointer, instead of comparing each character.
-
A notation for code using trees of symbols and constants.
-
The whole language there all the time. There is no real distinction between read-time, compile-time, and runtime. You can compile or run code while reading, read or run code while compiling, and read or compile code at runtime.
Running code at read-time lets users reprogram Lisp’s syntax; running code at compile-time is the basis of macros; compiling at runtime is the basis of Lisp’s use as an extension language in programs like Emacs; and reading at runtime enables programs to communicate using s-expressions, an idea recently reinvented as XML.
When Lisp first appeared, these ideas were far removed from ordinary programming practice, which was dictated largely by the hardware available in the late 1950s. Over time, the default language, embodied in a succession of popular languages, has gradually evolved toward Lisp. Ideas 1-5 are now widespread. Number 6 is starting to appear in the mainstream. Python has a form of 7, though there doesn’t seem to be any syntax for it.
As for number 8, this may be the most interesting of the lot. Ideas 8 and 9 only became part of Lisp by accident, because Steve Russell implemented something McCarthy had never intended to be implemented. And yet these ideas turn out to be responsible for both Lisp’s strange appearance and its most distinctive features. Lisp looks strange not so much because it has a strange syntax as because it has no syntax; you express programs directly in the parse trees that get built behind the scenes when other languages are parsed, and these trees are made of lists, which are Lisp data structures.
Expressing the language in its own data structures turns out to be a very powerful feature. Ideas 8 and 9 together mean that you can write programs that write programs. That may sound like a bizarre idea, but it’s an everyday thing in Lisp. The most common way to do it is with something called a macro.
The term “macro” does not mean in Lisp what it means in other languages. A Lisp macro can be anything from an abbreviation to a compiler for a new language. If you want to really understand Lisp, or just expand your programming horizons, I would learn more about macros.
Macros (in the Lisp sense) are still, as far as I know, unique to Lisp. This is partly because in order to have macros you probably have to make your language look as strange as Lisp. It may also be because if you do add that final increment of power, you can no longer claim to have invented a new language, but only a new dialect of Lisp.
I mention this mostly as a joke, but it is quite true. If you define a language that has car, cdr, cons, quote, cond, atom, eq, and a notation for functions expressed as lists, then you can build all the rest of Lisp out of it. That is in fact the defining quality of Lisp: it was in order to make this so that McCarthy gave Lisp the shape it has.
Where Languages Matter
So suppose Lisp does represent a kind of limit that mainstream languages are approaching asymptotically— does that mean you should actually use it to write software? How much do you lose by using a less powerful language? Isn’t it wiser, sometimes, not to be at the very edge of innovation? And isn’t popularity to some extent its own justification? Isn’t the pointy-haired boss right, for example, to want to use a language for which he can easily hire programmers?
There are, of course, projects where the choice of programming language doesn’t matter much. As a rule, the more demanding the application, the more leverage you get from using a powerful language. But plenty of projects are not demanding at all. Most programming probably consists of writing little glue programs, and for little glue programs you can use any language that you’re already familiar with and that has good libraries for whatever you need to do. If you just need to feed data from one Windows app to another, sure, use Visual Basic.
You can write little glue programs in Lisp too (I use it as a desktop calculator), but the biggest win for languages like Lisp is at the other end of the spectrum, where you need to write sophisticated programs to solve hard problems in the face of fierce competition. A good example is the airline fare search program that ITA Software licenses to Orbitz. These guys entered a market already dominated by two big, entrenched competitors, Travelocity and Expedia, and seem to have just humiliated them technologically.
The core of ITA’s application is a 200,000 line Common Lisp program that searches many orders of magnitude more possibilities than their competitors, who apparently are still using mainframe-era programming techniques. (Though ITA is also in a sense using a mainframe-era programming language.) I have never seen any of ITA’s code, but according to one of their top hackers they use a lot of macros, and I am not surprised to hear it.
Centripetal Forces
I’m not saying there is no cost to using uncommon technologies. The pointy-haired boss is not completely mistaken to worry about this. But because he doesn’t understand the risks, he tends to magnify them.
I can think of three problems that could arise from using less common languages. Your programs might not work well with programs written in other languages. You might have fewer libraries at your disposal. And you might have trouble hiring programmers.
How much of a problem is each of these? The importance of the first varies depending on whether you have control over the whole system. If you’re writing software that has to run on a remote user’s machine on top of a buggy, closed operating system (I mention no names), there may be advantages to writing your application in the same language as the OS. But if you control the whole system and have the source code of all the parts, as ITA presumably does, you can use whatever languages you want. If any incompatibility arises, you can fix it yourself.
In server-based applications you can get away with using the most advanced technologies, and I think this is the main cause of what Jonathan Erickson calls the “programming language renaissance.” This is why we even hear about new languages like Perl and Python. We’re not hearing about these languages because people are using them to write Windows apps, but because people are using them on servers. And as software shifts off the desktop and onto servers (a future even Microsoft seems resigned to), there will be less and less pressure to use middle-of-the-road technologies.
As for libraries, their importance also depends on the application. For less demanding problems, the availability of libraries can outweigh the intrinsic power of the language. Where is the breakeven point? Hard to say exactly, but wherever it is, it is short of anything you’d be likely to call an application. If a company considers itself to be in the software business, and they’re writing an application that will be one of their products, then it will probably involve several hackers and take at least six months to write. In a project of that size, powerful languages probably start to outweigh the convenience of pre-existing libraries.
The third worry of the pointy-haired boss, the difficulty of hiring programmers, I think is a red herring. How many hackers do you need to hire, after all? Surely by now we all know that software is best developed by teams of less than ten people. And you shouldn’t have trouble hiring hackers on that scale for any language anyone has ever heard of. If you can’t find ten Lisp hackers, then your company is probably based in the wrong city for developing software.
In fact, choosing a more powerful language probably decreases the size of the team you need, because (a) if you use a more powerful language you probably won’t need as many hackers, and (b) hackers who work in more advanced languages are likely to be smarter.
I’m not saying that you won’t get a lot of pressure to use what are perceived as “standard” technologies. At Viaweb (now Yahoo Store), we raised some eyebrows among VCs and potential acquirers by using Lisp. But we also raised eyebrows by using generic Intel boxes as servers instead of “industrial strength” servers like Suns, for using a then-obscure open-source Unix variant called FreeBSD instead of a real commercial OS like Windows NT, for ignoring a supposed e-commerce standard called SET that no one now even remembers, and so on.
You can’t let the suits make technical decisions for you. Did it alarm some potential acquirers that we used Lisp? Some, slightly, but if we hadn’t used Lisp, we wouldn’t have been able to write the software that made them want to buy us. What seemed like an anomaly to them was in fact cause and effect.
If you start a startup, don’t design your product to please VCs or potential acquirers. Design your product to please the users. If you win the users, everything else will follow. And if you don’t, no one will care how comfortingly orthodox your technology choices were.
The Cost of Being Average
How much do you lose by using a less powerful language? There is actually some data out there about that.
The most convenient measure of power is probably code size. The point of high-level languages is to give you bigger abstractions— bigger bricks, as it were, so you don’t need as many to build a wall of a given size. So the more powerful the language, the shorter the program (not simply in characters, of course, but in distinct elements).
How does a more powerful language enable you to write shorter programs? One technique you can use, if the language will let you, is something called bottom-up programming. Instead of simply writing your application in the base language, you build on top of the base language a language for writing programs like yours, then write your program in it. The combined code can be much shorter than if you had written your whole program in the base language— indeed, this is how most compression algorithms work. A bottom-up program should be easier to modify as well, because in many cases the language layer won’t have to change at all.
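The bottom-up approach can be sketched in a few lines of Python. This is only an illustration with names of my own choosing (compose, keep, mapped, clean); the point is that once the small "language layer" of combinators exists, the program itself shrinks to a short description:

```python
# --- the language layer: generic combinators for text pipelines ---
def compose(*fs):
    # Chain functions left to right into a single pipeline.
    def pipeline(x):
        for f in fs:
            x = f(x)
        return x
    return pipeline

def keep(pred):
    # A filter stage: keep only lines satisfying pred.
    return lambda lines: [l for l in lines if pred(l)]

def mapped(f):
    # A map stage: apply f to every line.
    return lambda lines: [f(l) for l in lines]

# --- the program itself, written in that layer ---
clean = compose(
    mapped(str.strip),
    keep(lambda l: l and not l.startswith('#')),
    mapped(str.lower),
)

print(clean(["  Foo ", "# comment", "", "BAR"]))  # ['foo', 'bar']
```

The layer is reusable across programs of the same shape, which is why, as the paragraph above notes, modifications often touch only the top layer.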
Code size is important, because the time it takes to write a program depends mostly on its length. If your program would be three times as long in another language, it will take three times as long to write— and you can’t get around this by hiring more people, because beyond a certain size new hires are actually a net lose. Fred Brooks described this phenomenon in his famous book The Mythical Man-Month, and everything I’ve seen has tended to confirm what he said.
So how much shorter are your programs if you write them in Lisp? Most of the numbers I’ve heard for Lisp versus C, for example, have been around 7-10x. But a recent article about ITA in New Architect magazine said that “one line of Lisp can replace 20 lines of C,” and since this article was full of quotes from ITA’s president, I assume they got this number from ITA. If so then we can put some faith in it; ITA’s software includes a lot of C and C++ as well as Lisp, so they are speaking from experience.
My guess is that these multiples aren’t even constant. I think they increase when you face harder problems and also when you have smarter programmers. A really good hacker can squeeze more out of better tools.
As one data point on the curve, at any rate, if you were to compete with ITA and chose to write your software in C, they would be able to develop software twenty times faster than you. If you spent a year on a new feature, they’d be able to duplicate it in less than three weeks. Whereas if they spent just three months developing something new, it would be five years before you had it too.
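Spelled out, the arithmetic behind those two claims is just the quoted 20x ratio applied in both directions (the 20x figure is the one from the ITA article; the variable names are mine):

```python
ratio = 20  # "one line of Lisp can replace 20 lines of C"

your_feature_weeks = 52            # you spend a year on a feature
their_weeks = your_feature_weeks / ratio
print(their_weeks)                 # 2.6 -- "less than three weeks"

their_project_months = 3           # they spend three months on something new
your_years = their_project_months * ratio / 12
print(your_years)                  # 5.0 -- years before you have it too
```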
And you know what? That’s the best-case scenario. When you talk about code-size ratios, you’re implicitly assuming that you can actually write the program in the weaker language. But in fact there are limits on what programmers can do. If you’re trying to solve a hard problem with a language that’s too low-level, you reach a point where there is just too much to keep in your head at once.
So when I say it would take ITA’s imaginary competitor five years to duplicate something ITA could write in Lisp in three months, I mean five years if nothing goes wrong. In fact, the way things work in most companies, any development project that would take five years is likely never to get finished at all.
I admit this is an extreme case. ITA’s hackers seem to be unusually smart, and C is a pretty low-level language. But in a competitive market, even a differential of two or three to one would be enough to guarantee that you’d always be behind.
A Recipe
This is the kind of possibility that the pointy-haired boss doesn’t even want to think about. And so most of them don’t. Because, you know, when it comes down to it, the pointy-haired boss doesn’t mind if his company gets their ass kicked, so long as no one can prove it’s his fault. The safest plan for him personally is to stick close to the center of the herd.
Within large organizations, the phrase used to describe this approach is “industry best practice.” Its purpose is to shield the pointy-haired boss from responsibility: if he chooses something that is “industry best practice,” and the company loses, he can’t be blamed. He didn’t choose, the industry did.
I believe this term was originally used to describe accounting methods and so on. What it means, roughly, is don’t do anything weird. And in accounting that’s probably a good idea. The terms “cutting-edge” and “accounting” do not sound good together. But when you import this criterion into decisions about technology, you start to get the wrong answers.
Technology often should be cutting-edge. In programming languages, as Erann Gat has pointed out, what “industry best practice” actually gets you is not the best, but merely the average. When a decision causes you to develop software at a fraction of the rate of more aggressive competitors, “best practice” is a misnomer.
So here we have two pieces of information that I think are very valuable. In fact, I know it from my own experience. Number 1, languages vary in power. Number 2, most managers deliberately ignore this. Between them, these two facts are literally a recipe for making money. ITA is an example of this recipe in action. If you want to win in a software business, just take on the hardest problem you can find, use the most powerful language you can get, and wait for your competitors’ pointy-haired bosses to revert to the mean.
Appendix: Power
As an illustration of what I mean about the relative power of programming languages, consider the following problem. We want to write a function that generates accumulators— a function that takes a number n, and returns a function that takes another number i and returns n incremented by i.
(That’s incremented by, not plus. An accumulator has to accumulate.)
In Common Lisp this would be
(defun foo (n)
  (lambda (i) (incf n i)))
and in Perl 5,
sub foo {
  my ($n) = @_;
  sub {$n += shift}
}
which has more elements than the Lisp version because you have to extract parameters manually in Perl.
In Smalltalk the code is slightly longer than in Lisp
foo: n
  |s|
  s := n.
  ^[:i| s := s+i. ]
because although in general lexical variables work, you can’t do an assignment to a parameter, so you have to create a new variable s.
In Javascript the example is, again, slightly longer, because Javascript retains the distinction between statements and expressions, so you need explicit return statements to return values:
function foo(n) {
  return function (i) {
    return n += i
  }
}
(To be fair, Perl also retains this distinction, but deals with it in typical Perl fashion by letting you omit returns.)
If you try to translate the Lisp/Perl/Smalltalk/Javascript code into Python you run into some limitations. Because Python doesn’t fully support lexical variables, you have to create a data structure to hold the value of n. And although Python does have a function data type, there is no literal representation for one (unless the body is only a single expression) so you need to create a named function to return. This is what you end up with:
def foo(n):
  s = [n]
  def bar(i):
    s[0] += i
    return s[0]
  return bar
Python users might legitimately ask why they can’t just write
def foo(n):
  return lambda i: return n += i
or even
def foo(n):
  lambda i: n += i
and my guess is that they probably will, one day. (But if they don’t want to wait for Python to evolve the rest of the way into Lisp, they could always just…)
In OO languages, you can, to a limited extent, simulate a closure (a function that refers to variables defined in enclosing scopes) by defining a class with one method and a field to replace each variable from an enclosing scope. This makes the programmer do the kind of code analysis that would be done by the compiler in a language with full support for lexical scope, and it won’t work if more than one function refers to the same variable, but it is enough in simple cases like this.
Python experts seem to agree that this is the preferred way to solve the problem in Python, writing either
def foo(n):
  class acc:
    def __init__(self, s):
      self.s = s
    def inc(self, i):
      self.s += i
      return self.s
  return acc(n).inc
or
class foo:
  def __init__(self, n):
    self.n = n
  def __call__(self, i):
    self.n += i
    return self.n
I include these because I wouldn’t want Python advocates to say I was misrepresenting the language, but both seem to me more complex than the first version. You’re doing the same thing, setting up a separate place to hold the accumulator; it’s just a field in an object instead of the head of a list. And the use of these special, reserved field names, especially __call__, seems a bit of a hack.
In the rivalry between Perl and Python, the claim of the Python hackers seems to be that Python is a more elegant alternative to Perl, but what this case shows is that power is the ultimate elegance: the Perl program is simpler (has fewer elements), even if the syntax is a bit uglier.
How about other languages? In the other languages mentioned in this talk— Fortran, C, C++, Java, and Visual Basic— it is not clear whether you can actually solve this problem. Ken Anderson says that the following code is about as close as you can get in Java:
public interface Inttoint {
  public int call(int i);
}

public static Inttoint foo(final int n) {
  return new Inttoint() {
    int s = n;
    public int call(int i) {
      s = s + i;
      return s;
    }
  };
}
This falls short of the spec because it only works for integers. After many email exchanges with Java hackers, I would say that writing a properly polymorphic version that behaves like the preceding examples is somewhere between damned awkward and impossible. If anyone wants to write one I’d be very curious to see it, but I personally have timed out.
It’s not literally true that you can’t solve this problem in other languages, of course. The fact that all these languages are Turing-equivalent means that, strictly speaking, you can write any program in any of them. So how would you do it? In the limit case, by writing a Lisp interpreter in the less powerful language.
That sounds like a joke, but it happens so often to varying degrees in large programming projects that there is a name for the phenomenon, Greenspun’s Tenth Rule: Any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp.
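Greenspun's observation can be made concrete with a sketch: the heart of a tiny Lisp-like evaluator embedded in Python. Everything here (the names tokenize, parse, lisp_eval, the two-operator environment) is my own minimal illustration, not anything from the essay or from real Lisp-in-C implementations, but it shows how little code it takes before a "sufficiently complicated" program starts growing one of these:

```python
def tokenize(src):
    # Split "(+ 1 2)" into ['(', '+', '1', '2', ')'].
    return src.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    # Turn a token stream into nested lists (an S-expression).
    tok = tokens.pop(0)
    if tok == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return expr
    try:
        return int(tok)
    except ValueError:
        return tok  # a symbol

def lisp_eval(x, env):
    if isinstance(x, str):   # symbol: look it up
        return env[x]
    if isinstance(x, int):   # literal number
        return x
    op, *args = x
    if op == 'lambda':       # (lambda (params) body) -> a Python closure
        params, body = args
        return lambda *vals: lisp_eval(body, {**env, **dict(zip(params, vals))})
    f = lisp_eval(op, env)
    return f(*[lisp_eval(a, env) for a in args])

env = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
program = "((lambda (n) (* n n)) (+ 3 4))"
print(lisp_eval(parse(tokenize(program)), env))  # 49
```

Of course a real embedded interpreter would need error handling, more data types, and decent performance, which is exactly where the "ad hoc, informally-specified, bug-ridden, slow" part of the rule comes from.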
If you try to solve a hard problem, the question is not whether you will use a powerful enough language, but whether you will (a) use a powerful language, (b) write a de facto interpreter for one, or (c) yourself become a human compiler for one. We see this already beginning to happen in the Python example, where we are in effect simulating the code that a compiler would generate to implement a lexical variable.
This practice is not only common, but institutionalized. For example, in the OO world you hear a good deal about “patterns”. I wonder if these patterns are not sometimes evidence of case (c), the human compiler, at work. When I see patterns in my programs, I consider it a sign of trouble. The shape of a program should reflect only the problem it needs to solve. Any other regularity in the code is a sign, to me at least, that I’m using abstractions that aren’t powerful enough— often that I’m generating by hand the expansions of some macro that I need to write.
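The human-compiler point can be sketched in Python (the example and its names are mine, chosen for illustration): the same traversal pattern hand-expanded twice, then captured once as an abstraction, after which the repetition disappears.

```python
# The pattern, expanded by hand, twice:
def total_price(items):
    result = 0
    for it in items:
        result += it['price']
    return result

def total_weight(items):
    result = 0
    for it in items:
        result += it['weight']
    return result

# The abstraction the pattern was asking for:
def total(items, key):
    return sum(it[key] for it in items)

items = [{'price': 3, 'weight': 10}, {'price': 4, 'weight': 20}]
print(total(items, 'price'))   # 7, same as total_price(items)
print(total(items, 'weight'))  # 30, same as total_weight(items)
```

In a language with macros the abstraction could capture even more of the surrounding shape; here a higher-order function is enough to make the point.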
Notes
[1] The IBM 704 CPU was about the size of a refrigerator, but a lot heavier. The CPU weighed 3150 pounds, and the 4K of RAM was in a separate box weighing another 4000 pounds. The Sub-Zero 690, one of the largest household refrigerators, weighs 656 pounds.
[2] Steve Russell also wrote the first (digital) computer game, Spacewar, in 1962.
[3] If you want to trick a pointy-haired boss into letting you write software in Lisp, you could try telling him it’s XML.
[4] Here is the accumulator generator in other Lisp dialects:
Scheme: (define (foo n) (lambda (i) (set! n (+ n i)) n))
Goo: (df foo (n) (op incf n _))
Arc: (def foo (n) [++ n _])
[5] Erann Gat’s sad tale about “industry best practice” at JPL inspired me to address this generally misapplied phrase.
[6] Peter Norvig found that 16 of the 23 patterns in Design Patterns were “invisible or simpler” in Lisp.
Thanks to the many people who answered my questions about various languages and/or read drafts of this, including Ken Anderson, Trevor Blackwell, Erann Gat, Dan Giffin, Sarah Harlin, Jeremy Hylton, Robert Morris, Peter Norvig, Guy Steele, and Anton van Straaten. They bear no blame for any opinions expressed.
Related: Many people have responded to this talk, so I have set up an additional page to deal with the issues they have raised: Re: Revenge of the Nerds.
It also set off an extensive and often useful discussion on the LL1 mailing list. See particularly the mail by Anton van Straaten on semantic compression.
Some of the mail on LL1 led me to try to go deeper into the subject of language power in Succinctness is Power.
A larger set of canonical implementations of the accumulator generator benchmark are collected together on their own page.
You’ll find this essay and 14 others in Hackers & Painters.