创建引理

Terence Tao 2007-05-14

Create lemmas

Nothing is particularly hard if you divide it into small jobs. (Henry Ford)

A typical argument in modern mathematics is often quite intricate, requiring many different steps, ingredients, and notation. The authors of the argument, having been intimately involved in all aspects of its construction, often do not realise just how complicated such an argument appears to a reader who is encountering it for the first time (I myself have been guilty of this oversight).

Part of the reason for this is that there is plenty of implicit structure in a paper which is crucial to understanding it properly, and which is known to the authors, but is not readily apparent to the readers. Suppose for instance that part of a paper goes like this:

…
In Section 2, facts A, B, and C are derived, and then used to deduce D.
In Section 3, D is used to derive E.
In Section 4, D, and another fact F, are used to derive G.
In Section 5, E, G, and another fact H, are used to derive I.
…

A reader who is going through this paper one section at a time will try to keep A, B, C, and D all in mind after finishing Section 2, and moving on to later sections. However, facts A, B, and C are never used again; they were instrumental to the argument because they allowed one to establish D, but once D is established, A, B, and C can be safely forgotten. Note, though that the reader does not know this. As a consequence, while reading Sections 3, 4, and 5, the reader has to set aside some of his or her mental resources to retain some facts which are of no further use, thus obscuring the structure of the argument and making it more difficult to follow.

Now one could address this by placing some remarks at the end of Section 2 along the lines of “Facts A, B, and C will not be used again in the rest of the paper”, or by devoting more thought to organising and motivating the paper. These are all worthwhile things to do, but a much more elegant solution is simply to encapsulate D as a lemma, and to place A, B, C inside the proof of that lemma. This conveys several useful structural cues to the reader: firstly, that D is likely to be an important fact to use in later parts of the argument, and secondly, that A, B, and C are not needed elsewhere in the paper and can be safely forgotten. This additional structure will be useful to all readers, but will be especially appreciated by those readers who are already expert in how to prove facts such as D, since they can then glance at the statement of the lemma, readily convince themselves that the lemma is plausible (possibly by using other tools than A, B, and C), and then quickly move on to the next part of the argument. One can also add some remarks “for the experts” after the proof of a lemma, discussing possible alternate proofs, refinements, special cases, connections to other lemmas in the literature, etc.

We have seen how “folding” the argument into lemmas can reduce the complexity of that argument by “localising” certain facts in the argument. The same method can also be used to localise notation, e.g. some special-purpose notation used to prove D but is not used elsewhere in the paper. The design philosophy here is similar to that of information hiding in software engineering. (Other relevant software engineering philosophies for mathematical writing include structured programming and modularity.)

Lemmas also provide a good opportunity to explicitly “recap” all the running hypotheses, assumptions, and notational conventions that have already been introduced in the argument, which can be invaluable to a reader who has misunderstood or forgotten about part of this implied context. (Sometimes, such a recap would be tediously long, in which case a sentence such as “Let the notation and assumptions be as above” may suffice. When these situations occur, one might wish to formalise all the running assumptions by judiciously introducing some good notation.)

In some cases, the conclusion of the lemma may only need a portion of these hypotheses; this might be worth stating explicitly within the statement of the lemma, as it can clarify the nature of that lemma, and may also make it more useful for future applications.

One should write the statement of a lemma in a way that makes it easy to use, rather than easy to prove. Thus, one should try to make the hypotheses of the lemma natural and easy to verify, and the conclusions of the lemma manifestly useful. Basically, the idea is to push as much of the details of the argument into the lemma as one can, to make the rest of the argument as simple as possible. Also, it may end up that you (or someone else) will eventually find a simpler proof of that lemma, thus reducing the net complexity of the paper markedly (cf. the object-oriented approach to software engineering).

Folding the argument into lemmas also makes it easier to write a rapid prototype, as once one finalises the statement of the lemma, one can defer the proof of the lemma until later.

In summary, it’s almost always a good idea to have plenty of lemmas (and propositions and corollaries too, of course) in an argument; it makes the overall structure of the argument more apparent, it makes the argument easier to follow, and can also provide some useful tools for future work in the area.

There is however one exception to the above rule. If you have a pair of technical lemmas, neither of which is of much interest in its own right, but only become useful when used together, then they should be unified into single lemma which will be easier for the reader to keep in mind. For instance, suppose your paper contains the following two lemmas:

Lemma 15. If Natural Hypothesis A is true, then Technical Statement B is true.

Lemma 16. If Technical Statement B is true, then Natural Conclusion C is true.

If Lemma 16 is the only place where B is used, and Lemma 15 is the only place where B is verified, then it can be better to unify these two lemmas as

Lemma 15′. If Natural Hypothesis A is true, then Natural Conclusion C is true.

Now, the technical statement B has been encapsulated into the proof of Lemma 15′ (which, of course, will be a concatenation of the proofs of Lemma 15 and Lemma 16), and no longer needs to make an appearance outside of this lemma. In some cases it may also be appropriate to insert something like

Sublemma 16′. Under the hypotheses of Lemma 15′, technical condition B is true.

within the proof of Lemma 15′.

This improves the paper for a number of related reasons:

The reader can safely forget about B after leaving the proof of Lemma 15′, thus freeing up mental space.
An expert reader might be able to find an alternate proof of Lemma 15′ that avoids B, or at least a good heuristic argument that convinces that reader that Lemma 15′ is true. Then that reader can skip the proof of Lemma 15′ completely, and not have to deal with B at all.
It may be that a followup paper may find a simpler proof of Lemma 15′ that avoids B, in which case that paper can simplify the original paper without having to make any modifications other than the reproof of Lemma 15′. (In some cases, the very act of reorganising the paper may prompt the reorganiser to find such a simplification by herself or himself.)

创建引理

如果你把工作分成小块，就没有什么特别困难的事情。 ——亨利·福特

现代数学中的典型论证通常相当复杂，需要许多不同的步骤、要素和符号。论证的作者由于密切参与了其构建的各个方面，往往没有意识到这样的论证对于初次接触的读者来说显得多么复杂（我自己也曾犯过这种疏忽）。

部分原因是论文中有很多隐式结构，这些结构对于正确理解论文至关重要，作者知道这些结构，但读者不容易察觉。例如，假设论文的某部分如下所示：

...
在第2节中，推导了事实A、B和C，然后用它们推导出D。
在第3节中，使用D推导出E。
在第4节中，使用D和另一个事实F推导出G。
在第5节中，使用E、G和另一个事实H推导出I。
...

逐节阅读这篇论文的读者在完成第2节后，会试图记住A、B、C和D，然后继续阅读后面的章节。然而，事实A、B和C再也没有被使用过；它们对论证很重要，因为它们使人们能够建立D，但一旦D被建立，A、B和C就可以安全地被遗忘。但请注意，读者并不知道这一点。因此，在阅读第3、4和5节时，读者必须分配一些脑力资源来保留一些不再有用的事实，从而模糊了论证的结构，使其更难理解。

现在可以通过在第2节末尾添加一些类似于”事实A、B和C在论文的其余部分将不再使用”的说明，或者通过更多地思考如何组织和激发论文来解决这个问题。这些都是值得做的事情，但一个更优雅的解决方案是简单地将D封装为一个引理，并将A、B、C放在该引理的证明中。这向读者传达了几个有用的结构提示：首先，D很可能是在论证的后续部分中使用的重要事实；其次，A、B和C在论文的其他地方不需要，可以安全地遗忘。这种额外的结构对所有读者都有用，但对于那些已经精通如何证明像D这样的事实的读者来说尤其受欢迎，因为他们可以瞥一眼引理的陈述，迅速说服自己该引理是合理的（可能使用A、B和C以外的其他工具），然后快速进入论证的下一部分。也可以在引理的证明之后添加一些”面向专家”的说明，讨论可能的替代证明、改进、特殊情况、与文献中其他引理的联系等。

我们已经看到如何通过将论证”折叠”成引理来”局部化”论证中的某些事实，从而降低论证的复杂性。同样的方法也可以用于局部化符号，例如，用于证明D但在论文其他地方不使用的某些特殊用途符号。这里的设计理念类似于软件工程中的 信息隐藏。（数学写作中其他相关的软件工程理念包括 结构化编程 和 模块化。）

引理也提供了一个很好的机会来明确”回顾”论证中已经引入的所有运行假设、前提和符号约定，这对于误解或忘记了部分隐含上下文的读者来说可能是无价的。（有时，这样的回顾会冗长乏味，在这种情况下，像”让符号和假设如上所述”这样的句子可能就足够了。当出现这些情况时，可以通过明智地引入一些 好的符号 来形式化所有运行假设。）

在某些情况下，引理的结论可能只需要这些假设的一部分；这可能值得在引理的陈述中明确说明，因为它可以澄清该引理的性质，并可能使其在未来的应用中更有用。

应该以使引理易于使用而不是易于证明的方式来写引理的陈述。因此，应该尽量使引理的假设自然且易于验证，使引理的结论明显有用。基本上，想法是将尽可能多的论证细节推入引理中，使论证的其余部分尽可能简单。此外，最终你（或其他人）可能会找到该引理的更简单证明，从而显著降低论文的总体复杂性（参见软件工程中的 面向对象方法 ）。

将论证折叠成引理也使得 编写快速原型 更容易，因为一旦确定了引理的陈述，就可以将引理的证明推迟到以后。

总之，在论证中拥有大量引理（当然还有命题和推论）几乎总是一个好主意；它使论证的整体结构更加明显，使论证更容易理解，并且还可以为该领域的未来工作提供一些有用的工具。

例外：何时合并引理

然而，上述规则有一个例外。如果你有一对技术引理，它们各自都没有太大意义，但只有一起使用时才变得有用，那么应该将它们合并成一个引理，这样读者更容易记住。例如，假设你的论文包含以下两个引理：

引理15. 如果自然假设A成立，则技术陈述B成立。

引理16. 如果技术陈述B成立，则自然结论C成立。

如果引理16是唯一使用B的地方，而引理15是唯一验证B的地方，那么最好将这两个引理合并为：

引理15′. 如果自然假设A成立，则自然结论C成立。

现在，技术陈述B已经被封装到引理15′的证明中（当然，这将是引理15和引理16证明的串联），并且不再需要在这个引理之外出现。在某些情况下，可能也适合在引理15′的证明中插入类似以下内容：

子引理16′. 在引理15′的假设下，技术条件B成立。

这出于几个相关的原因改进了论文：

读者在离开引理15′的证明后可以安全地忘记B，从而释放脑力空间。
专家读者可能能够找到避免B的引理15′的替代证明，或者至少是一个好的启发式论证来说服该读者引理15′是成立的。然后该读者可以完全跳过引理15′的证明，根本不需要处理B。
可能后续论文会找到避免B的引理15′的更简单证明，在这种情况下，该论文可以通过仅重新证明引理15′而不进行任何其他修改来简化原始论文。（在某些情况下，重新组织论文的行为本身可能会促使重新组织者自己找到这样的简化。）