The ideal translator is a person “on whom nothing is lost,” said Henry James. Or maybe it’s a machine. But a machine won’t stop you from swearing at nuns...

Years ago, on a flight from Amsterdam to Boston, two American nuns seated to my right listened to a voluble[1] young Dutchman who was out to discover the United States. He asked the nuns where they were from. Alas, Framingham, Massachusetts, was not on his itinerary, but, he noted, he had “shitloads of time and would be visiting shitloads of other places”.[2]

The jovial young Dutchman had apparently gathered that “shitloads” was a colourful synonym for the bland “lots”.[3] He had mastered the syntax of English and a rather extensive vocabulary, but he lacked experience of the appropriateness of words to social contexts.[4]

This memory came to mind with the recent news that the Google Translate engine would move from a phrase-based system to a neural network. Both methods rely on training the machine with a “corpus”[5] consisting of sentence pairs: an original and a translation. From these pairs, the computer generates rules for inferring the most likely sequence of words in the target language, based on the sequence[6] of words in the original text.
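To make “training on sentence pairs” concrete, here is a minimal sketch built on IBM Model 1. This is a classic word-level statistical alignment model that predates both phrase-based and neural systems. It is a deliberately simple stand-in and in no sense Google’s method. The toy corpus, the function names `train_ibm_model1` and `most_likely_translation`, and the iteration count are all hypothetical. The usual NULL word is also omitted for brevity. The code starts with no knowledge of which word translates which. Using expectation-maximisation, it estimates the probability that each English word corresponds to each French word, purely from the patterns in which words co-occur.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (English sentence, French translation) pairs.
TOY_CORPUS = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the flower", "la fleur"),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(f, e)], the probability that English word e translates as French word f.

    Simplified IBM Model 1 (no NULL word), trained with expectation-maximisation.
    """
    sentences = [(en.split(), fr.split()) for en, fr in pairs]
    french_vocab = {f for _, fr in sentences for f in fr}
    # Start from a uniform guess: every French word is equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(french_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times e is aligned to f
        total = defaultdict(float)  # expected number of alignments involving e
        for en, fr in sentences:
            for f in fr:
                # E-step: share f among the English words of this pair in proportion to t.
                norm = sum(t[(f, e)] for e in en)
                for e in en:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: re-normalise the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def most_likely_translation(t, english_word):
    """Return the French word with the highest learned probability for english_word."""
    candidates = {f: p for (f, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    for word in ["the", "house", "blue", "flower"]:
        print(word, "->", most_likely_translation(table, word))
```

On this tiny corpus, the model learns to pair “the” with “la” and “house” with “maison”. It does so only because “la” turns up in every pair, which leaves “maison” to be explained by “house”. This is pattern matching in its purest form. Nothing in the data says whether a word is polite or vulgar.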

The procedure is an exercise in pattern matching. Similar pattern-matching algorithms are used to interpret the syllables you utter when you ask your smartphone to “navigate to Brookline” or when a photo app tags your friend’s face.[7] The machine doesn’t “understand” faces or destinations; it reduces them to vectors[8] of numbers and processes them.
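The idea of reducing inputs to “vectors of numbers” and then processing them can be shown with a minimal sketch. Each phrase becomes a vector of word counts, and a spoken command is matched to the stored command whose vector points in the most similar direction (cosine similarity). Real speech and face recognition systems learn far richer representations, so bag-of-words counts are a much simpler, swapped-in technique. The function names and the list of commands are hypothetical and used only for illustration.

```python
import math
from collections import Counter


def to_vector(text, vocabulary):
    """Reduce a text to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; for count vectors, 0.0 means no words in common."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_match(query, candidates):
    """Pick the candidate whose count vector points most nearly in the query's direction."""
    vocabulary = sorted({w for text in candidates + [query] for w in text.lower().split()})
    query_vec = to_vector(query, vocabulary)
    return max(candidates, key=lambda c: cosine_similarity(query_vec, to_vector(c, vocabulary)))


if __name__ == "__main__":
    # Hypothetical stored commands; the "understanding" is nothing but arithmetic on counts.
    commands = ["navigate to Brookline", "call home", "play some music"]
    print(closest_match("please navigate to Brookline", commands))
```

To this code, “Brookline” is only one coordinate in a vector. The match succeeds because the counts line up, not because anything knows that Brookline is a place.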

I am a professional translator, having translated some 125 books from the French. One might therefore expect me to bristle[9] at Google’s claim that its new translation engine is almost as good as a human translator, scoring 5.0 on a scale of 0 to 6, whereas humans average 5.1. But I also hold a PhD in mathematics, and I have developed software that “reads” European newspapers in four languages and categorises the results by topic. So rather than being defensive about the possibility of being replaced by a machine translator, I am aware of the remarkable feats of which machines are capable, and I am full of admiration for the technical complexity and virtuosity of Google’s work.[10]

My admiration does not blind me to the shortcomings of machine translation, however. Think of the young Dutch traveller who knew “shitloads” of English. The young man’s fluency showed that his “wetware” (a living neural network, if you will) had been trained well enough to intuit the subtle rules and exceptions that make language natural.[11] Computer languages, by contrast, have context-free grammars, whose rules apply regardless of the surrounding context. Yet the young Dutchman lacked the social experience with English to grasp the subtler rules that shape the native speaker’s diction, tone and structure. The native speaker might also choose to break those rules to achieve certain effects. If I were to say “shitloads of places” rather than “lots of places” to a pair of nuns, I would mean something by it. The Dutchman simply blundered into inadvertent comedy.[12]

Google’s translation engine is “trained” on corpora ranging from news sources to Wikipedia. The bare description of each corpus is the only indication of the context from which it arises. From such scanty[13] information, it would be difficult to infer whether a word such as “shitloads” is appropriate or inappropriate. Asked to translate the sentence into French, the machine might predict a good match to beaucoup or plusieurs. This would render the meaning of the utterance but not the comedy,[14] which depends on the contrast between the socially marked “shitloads” and the neutral plusieurs. No matter how sophisticated the algorithm, it must rely on the information provided, and clues about context, particularly social context, are devilishly[15] hard to convey in code.

The problem, as with every attempt to create artificial intelligence (AI)[16] since my student days at MIT, is that intelligence is incredibly complex. To be intelligent is not merely to be capable of inferring logically from rules or statistically from regularities. Before that, one has to know which rules apply, an art that requires awareness of and sensitivity to the situation. Programmers are very clever, but they are not yet clever enough to anticipate the vast variety of contexts from which meaning emerges. Hence even the best algorithms will miss things, whereas, as Henry James put it, the ideal translator must be a person “on whom nothing is lost”.

This is not to say that machine translation is useless. Much translation work is routine, and at times machines can do an adequate job. Don’t expect miracles, however, or felicitous literary translations, or aptly rendered political zingers.[17] Overconfident claims have dogged[18] AI research from its earliest days. I don’t say this out of fear for my job: I’ve retired from translating, and these days I devote part of my time to… writing code.







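1. voluble: talkative.

2. itinerary: travel plan, planned route; shitload: a lot, a great quantity.

3. jovial: warm and friendly, cheerful by nature; synonym: a word with the same or a similar meaning; bland: mild, inoffensive.

4. syntax: grammar, sentence structure; appropriateness: suitability, propriety.

5. corpus: a body of language data (texts) used for analysis.

6. sequence: order, succession.

7. algorithm: a step-by-step computational procedure; syllable: a unit of pronunciation; navigate: to give route directions.

8. vector: (in mathematics) an ordered list of numbers.

9. bristle: to show anger.

10. feat: achievement, accomplishment; virtuosity: consummate skill.

11. wetware: a computing term for the “ware” other than software and hardware, namely the human brain and nervous system; intuit: to know by intuition.

12. blunder: to stumble; to make a careless mistake; inadvertent: unintentional.

13. scanty: insufficient, barely adequate.

14. render: to express (in another language), to translate; utterance: something said, a statement.

15. devilishly: very, extremely.

16. artificial intelligence (AI): the capacity of machines to perform tasks that normally require human intelligence.

17. felicitous: apt, well chosen; aptly: appropriately; zinger: a witty or humorous remark.

18. dog: used here as a verb, meaning to follow closely.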

