On “compilation errors” in mathematical reading, and how to resolve them
Computers are notorious for interpreting language in an overly literal fashion; a single misplaced parenthesis in an otherwise flawless piece of software code can cause a computer to halt in utter incomprehension halfway through the compilation of that code.
Humans, when reading natural language, tend to be far more robust at this; once one is fluent in, say, English, one can usually deal with a reasonable number of spelling or grammatical errors in a text, particularly when the writing style is clear and organised, and the themes of the text are familiar to the reader.
However, when, as a graduate student, one encounters the task of reading a technical mathematical paper for the first time, it is often the case that one loses much of one’s higher reading skills, reverting instead to a more formal and tedious line-by-line interpretation of the text. As a consequence, a single typo or undefined term in the paper can cause one’s comprehension of the paper to grind to a complete halt, in much the same way that it would to a computer. In many cases, such “compilation errors” can be resolved simply by reading ahead in the paper.
In some cases, just reading the next one or two lines can shed a lot of light on the mysterious term that was just introduced, or the unexplained step in the logic. In other cases, one has to read a fair bit further ahead; if, for instance, the conclusion of Lemma 15 was difficult to understand, one can read ahead to the end of the proof of that lemma (in which, presumably, the conclusion is obtained), or search ahead to, say, Proposition 23, in which Lemma 15 is invoked, to get more clues as to what Lemma 15 is trying to say. (The use of search functions in, say, a PDF reader, is particularly useful in this regard.)
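As a rough illustration of this kind of search-ahead, here is a minimal Python sketch that lists every line of a paper's text mentioning a given label. The function name `find_mentions` and the sample text `paper` are hypothetical, invented purely for this example; a real paper would first need to be extracted to plain text.

```python
import re


def find_mentions(text: str, label: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for every line that mentions `label`."""
    # The negative lookahead stops "Lemma 15" from also matching "Lemma 150".
    pattern = re.compile(re.escape(label) + r"(?!\d)")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical stand-in for the text of a paper.
paper = """\
Lemma 15. Every widget is bounded.
Proof. ... This completes the proof of Lemma 15.
Lemma 150. An unrelated statement.
Proposition 23. By Lemma 15, the sum converges.
"""

for number, line in find_mentions(paper, "Lemma 15"):
    print(number, line)
```

The matching lines show where the lemma is stated, where its proof ends, and where it is later invoked, which are exactly the places worth reading ahead to.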
It is also good to keep in mind that no author is infallible, and that in some cases, the simplest explanation for incomprehension is that there is a typo in the text. For instance, suppose a paper states that “Since A is true, B is true”, but when one works things out, one cannot quite deduce B from A, but instead can only achieve a slightly different conclusion B′. A bit later on in the text, the paper states that “Since B is true, then C is true”, but again one has difficulty deducing C from B. Here, the most likely diagnosis is that the author actually meant to write B′ instead of B in both places.
In a similar spirit, if the paper contains a cryptic comment which you didn’t quite understand, but chose to ignore in order to move on, and then two lines later you find a deduction of a conclusion which you don’t see to be a consequence of the previous statements, then you should go back to the cryptic comment and parse it very carefully, as it is likely to be a description of the missing hypothesis or technique needed to reach the stated conclusion.
Sometimes one has to look for the absence of key words, rather than their presence. Suppose for instance the statement A is asserted in a paper, followed shortly by a statement B. You understand how A is deduced, but you see no way to use A to derive B. But were there key words such as “thus”, “therefore”, or “consequently” that actually indicated that A was to be used to derive B? If not, then what is likely happening here is that B is being derived from some other source than A, and a rereading of the text near or immediately preceding A and B with this in mind may then reveal how B is to be established.
Another useful trick is to “project” the paper down to a simpler and shorter paper by restricting attention to a simpler special case, or by adopting some heuristic that allows one to trivialise some technical portions of the paper (or at least make some steps of the paper plausible enough to the reader that one is willing to skip over the details of proof for those steps). For instance:
- If the paper is dealing with a result in general dimension, one might first specialise the paper to one dimension (even if this means that the main results are no longer new, but consequences of previous literature).
- Or: if the paper has to analyse both the main term in an expression as well as error terms, one can adopt the heuristic that all error terms are negligible and only focus on the main term (or dually, one can accept that the main term is always going to compute out to the correct answer, and only focus on controlling error terms).
- If one is aware of a near-counterexample to the main result, specialising the paper to that near-counterexample (or to a hypothetical perturbation of that near-counterexample that is trying to be a genuine counterexample) is often quite instructive.
Ideally, one should project away roughly half of the difficulties of the paper, leaving behind a paper which is twice as simple, and thus presumably much easier to understand; once this is done, one can undo the projection, and return to the original paper, which is now already half understood, and again much easier to understand than before one understood the projected paper. (The difficulty of reading a paper usually increases in a super-linear fashion with the complexity of the paper, so factoring the paper into two sub-papers, each with half the complexity, is often an efficient way to proceed.)
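To make the parenthetical remark concrete, suppose (purely as an illustrative assumption, not a claim made in the text) that the difficulty of reading a paper of complexity $c$ scales like $D(c) = c^{\alpha}$ for some exponent $\alpha > 1$. Then reading two sub-papers, each of half the complexity, costs

```latex
\[
  2\,D\!\left(\tfrac{c}{2}\right)
  = 2\left(\tfrac{c}{2}\right)^{\alpha}
  = 2^{1-\alpha}\,c^{\alpha}
  < c^{\alpha}
  = D(c)
  \qquad (\alpha > 1).
\]
```

For instance, with $\alpha = 2$ the two halves together cost half as much as the original. This toy model ignores the effort of undoing the projection, so it should be read as a heuristic rather than a precise estimate.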
Finally, and perhaps most importantly, reading becomes much easier when one can somehow “get into the author’s head”, and get a sense of what the author is trying to do with each statement or lemma in the paper, rather than focusing purely on the literal statements in the text. A good author will interleave the mathematical text with commentary that is designed to do exactly this, but even without such explicit clues, one can often get a sense of the purpose of each component of the paper by comparing it with similar components in other papers, or by seeing how such a component is used in the rest of the paper. In extreme cases, one may have to go to a large blackboard and diagram all the logical dependencies of a paper (e.g. if Lemma 6 and Lemma 8 are used to prove Theorem 10, one can draw arrows between boxes bearing these names accordingly) to get some sense of what the key steps in the paper are.
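When no blackboard is at hand, the same dependency diagram can be sketched in a few lines of Python using the standard library's `graphlib` module (Python 3.9 or later). The dictionary below is a hypothetical example: only the relation between Lemma 6, Lemma 8 and Theorem 10 comes from the text above, and "Lemma 3" is invented for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each result maps to the results its proof uses.
dependencies = {
    "Theorem 10": {"Lemma 6", "Lemma 8"},
    "Lemma 8": {"Lemma 3"},  # "Lemma 3" is an invented example
    "Lemma 6": set(),
    "Lemma 3": set(),
}

# static_order() lists every result after all of the results it depends on.
reading_order = list(TopologicalSorter(dependencies).static_order())


def used_by(name: str) -> list[str]:
    """Return the results whose proofs directly invoke `name`."""
    return sorted(r for r, deps in dependencies.items() if name in deps)


for result in reading_order:
    print(result, "-> used by:", used_by(result) or "nothing (final result)")
```

Results that are used by many others, or that sit directly beneath the main theorem, are natural candidates for the key steps of the paper.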
For some further principles on how to justify a particularly fearsome looking step in a paper, see this MathOverflow answer of mine.
For an analogous technique in the dual problem of writing a paper, see this page.