Don’t overoptimise
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Donald Knuth, “Literate programming”, paraphrasing Tony Hoare
After all my other advice on how to write papers, I should add one counterbalancing note: there is a danger in being too perfectionist, and in trying to make every part of a paper as “optimal” as possible. After all the “easy” improvements have been made to a paper, one encounters a law of diminishing returns, in which any further improvements either require large amounts of time and effort, or else require some tradeoffs in other qualities of the paper.
For instance, suppose one has a serviceable lemma that suffices for the task of proving the main theorems of the paper at hand. One can then try to “optimise” this lemma by making the hypotheses weaker and the conclusion stronger, but this can come at the cost of lengthening the proof of the lemma, and obscuring exactly how the lemma fits in with the rest of the paper. In the reverse direction, one could also “optimise” the same lemma by replacing it with a weaker (but easier to prove) statement which still barely suffices to prove the main theorem, but is now unsuitable for use in any later application. Thus one encounters a tradeoff when one tries to improve the lemma in one direction or another. (In this case, one resolution to this tradeoff is to have one formulation of the lemma stated and proved, and then add a remark about the other formulation, i.e. state the strong version and remark that we only use a special case, or state the weak version and remark that stronger versions are possible.)
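As a rough illustration of the resolution just described, here is a minimal LaTeX sketch that states the strong version and then remarks that only a special case is used. The lemma is a standard textbook fact chosen purely as a placeholder, and the environment names `lemma` and `remark`, the label `lem:strong`, and the "special case" in the remark are all assumptions made for this example.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}

% Hypothetical theorem environments for this sketch.
\newtheorem{lemma}{Lemma}
\theoremstyle{remark}
\newtheorem{remark}{Remark}

\begin{document}

% Placeholder lemma: state and prove the stronger formulation once.
\begin{lemma}\label{lem:strong}
Let $f \colon [0,1] \to \mathbb{R}$ be continuous. Then $f$ is bounded
and attains its supremum on $[0,1]$.
\end{lemma}

% Tell the reader which special case the paper actually needs.
\begin{remark}
In this paper we only apply Lemma~\ref{lem:strong} in the special case
where $f$ is a polynomial, and we only use the boundedness of $f$.
\end{remark}

\end{document}
```

The reverse arrangement would look much the same: state only the weak version that is actually used, and let the remark note that a stronger version is available.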
Carefully optimising results and notations in the hope that this will help future researchers in the field is a little risky; later authors may introduce new insights or new tools which render these painstakingly optimised results obsolete. The only time when this is really profitable is when you already know of a subsequent paper (perhaps a sequel to the one you are already writing) which will indeed rely heavily on these results and notations, or when the current paper is clearly going to be the definitive paper in the subject for a long while.
If you haven’t already written a rapid prototype for your paper, then optimising a lemma may in fact be a complete waste of time, because you may find later on in the writing process that the lemma will need to be modified anyway to deal with an unforeseen glitch in the original argument, or to improve the overall organisation of the paper.
I have sometimes seen authors try to optimise the length of the paper at the expense of all other attributes, in the mistaken belief that brevity is equivalent to simplicity. While it can be that shorter papers are simpler than longer ones, this is generally only true if the shortness of the paper was achieved naturally rather than artificially. If brevity was attained by removing all examples, remarks, whitespace, motivation, and discussion, or by striking out “redundant” English phrases and relying purely on mathematical abbreviations (e.g. ∀ instead of “For all”, etc.) and various ungrammatical contractions, then this is generally a poor tradeoff; somewhat ironically, a paper which has been overcompressed may be viewed by readers as being more difficult to read than a longer, gentler, and more leisurely treatment of the same material. (See also “Give appropriate amounts of detail”.)
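To make the contrast concrete, here is a small LaTeX sketch of the same statement written twice, once overcompressed and once in full sentences. The statement (the definition of uniform continuity) is just an illustrative choice, not one taken from any particular paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}

\begin{document}

% Overcompressed: symbols and contractions replace the English.
$f$ unif.\ cts.\ iff $\forall \varepsilon > 0\ \exists \delta > 0$ s.t.\
$\forall x, y$: $|x - y| < \delta \Rightarrow |f(x) - f(y)| < \varepsilon$.

% More leisurely: the same content, written as a grammatical sentence.
A function $f$ is \emph{uniformly continuous} if, for every
$\varepsilon > 0$, there exists $\delta > 0$ such that
$|f(x) - f(y)| < \varepsilon$ whenever $|x - y| < \delta$.

\end{document}
```

The second version is longer by only a few words, but most readers will probably find it easier to parse.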
On the other hand, optimising the readability of the paper is always a good thing (except when it is at the expense of rigour or accuracy), and the effort put into doing so is appreciated by readers.