The subtle science of attributing anonymous prose (The Economist, September 22nd 2018): identifying the authors of anonymous writing
Every writer overuses a word or two. Johnson’s weakness is “fascinating”. Kate Fox’s pop-anthropology book “Watching the English” uses the word “liminal” 24 times in about 500 pages. (She deploys it to describe borderline spaces such as the pub, which exists between work and home.) “Liminal” accounts for just 0.00009% of all the words in English books published, as “Watching the English” was, in 2004. In a random work of 150,000 words, “liminal” should appear 0.14 times, so Ms Fox uses it at around 180 times the average rate. If you read an anonymous piece of writing that features “liminal”, you might think there is a good chance she wrote it.
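The arithmetic behind that comparison is easy to check. The sketch below uses only the figures quoted above; it assumes, as the comparison implies but the text does not state outright, that Ms Fox's book runs to roughly 150,000 words.

```python
# Figures quoted in the paragraph above.
liminal_share = 0.00009 / 100   # "liminal" as a fraction of all words (0.00009%)
book_length = 150_000           # words in the "random work" (assumed to match the book's length)
observed_count = 24             # uses of "liminal" in "Watching the English"

# Expected uses in a work of that length at the average rate.
expected_count = liminal_share * book_length

# How many times the average rate the observed count represents.
rate_ratio = observed_count / expected_count

print(f"expected occurrences: {expected_count:.3f}")
print(f"multiple of the average rate: {rate_ratio:.0f}")
```

The expected count comes to 0.135, which rounds to the 0.14 given above, and the ratio to about 178, or "around 180".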
Such analysis has become a popular game since the recent publication of an anonymous op-ed by “a senior official in the Trump administration” in the New York Times. The article excoriated Mr Trump and portrayed an in-house “resistance” that thwarts his impulses. Pundits have striven to unmask the author, and one word has preoccupied them especially. The anonymous writer praised the late Senator John McCain as “a lodestar for restoring honour to public life”. Armchair detectives pounced. “Lodestar” is only an eighth as common as “liminal”. Quick searches found that Mike Pence, the vice-president, has used this word in a number of speeches. Does that make him the mystery insider?
He denied it, of course. But there are better reasons to think this lodestar (originally, a guiding star) is leading in the wrong direction. Experts in forensic linguistics don't rely on words like “lodestar” to determine authorship: rare events are bad at generating predictions.
Much more helpful are small words that appear more frequently, even if they are not particularly striking. You may notice that the perpetrator of a crime had red hair or was particularly tall. But lots of people fit both descriptions. The many ridges on your fingertips, ordinary but in an arrangement unique to you, provide a far surer method of attribution.
Writings are not exactly like fingerprints; people produce many more than ten in a lifetime, and vary their style for any number of reasons, including attempting to disguise their authorship. (It is rumoured that when Trump staffers speak off the record to the press, they insert colleagues’ signature phrases to throw sleuths off the trail.) But the two are similar in that trivial features, in aggregate, provide a clue to their ownership.
Take the Federalist Papers, written pseudonymously by John Jay, Alexander Hamilton and James Madison to support the ratification of the American constitution in the 1780s. Of 85 essays, 12 were later claimed by both Hamilton and Madison. Historians looked for reflections of their politics in the documents, but reached no consensus. Then in the 1960s two statisticians noticed that Hamilton, in his known writings, used “while”, never “whilst”. Madison was a “whilst” man. Madison wrote “on”, rarely “upon”; Hamilton used both. With these and a few other common words they created a statistical model, and tested it against a separate set of papers known to be written by one or the other. It worked perfectly. They then tested the disputed papers against their model, and all turned out to be Madison’s. What had eluded the historians was proved by the mathematicians.
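To give a flavour of the approach, here is a minimal sketch of attribution by common-word rates. It is not the statisticians' actual model, which the text does not describe in detail; it simply counts the four marker words named above per 1,000 words and picks the known author with the nearest profile. The function names, the distance measure and the sample sentences are all invented for illustration (the sentences are not quotations from anyone).

```python
import re
from collections import Counter

# Marker words named in the text; a real study would use many more.
MARKERS = ("while", "whilst", "on", "upon")

def rates_per_thousand(text, markers=MARKERS):
    """Return occurrences of each marker word per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {m: 1000 * counts[m] / total for m in markers}

def distance(profile_a, profile_b):
    """Sum of absolute differences between two rate profiles (an assumed, simple measure)."""
    return sum(abs(profile_a[m] - profile_b[m]) for m in profile_a)

def closest_author(disputed_text, known_texts):
    """Pick the author whose known writing has the nearest marker-word profile."""
    target = rates_per_thousand(disputed_text)
    return min(
        known_texts,
        key=lambda author: distance(target, rates_per_thousand(known_texts[author])),
    )

# Invented toy samples: author_a favours "while" and "upon",
# author_b favours "whilst" and "on".
known = {
    "author_a": "Upon reflection the plan rests upon trust, while the people wait on events.",
    "author_b": "Whilst the states deliberate on the matter, the people rely on "
                "their judgment whilst they wait.",
}
disputed = "Whilst the convention sat, the delegates agreed on little and depended on compromise."

print(closest_author(disputed, known))
```

On these toy samples the disputed sentence lands nearer to author_b. With texts this short the result means little; the method only becomes persuasive with long samples and many marker words, which is why the volume of known writing matters so much.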
Today, it is known that a piece of writing can supply hints of the probable sex of the author, along with their level of education, regional background and other qualities. Men go in for certain words more than women, and not just stereotypical ones (“football”) but common ones like “a”, “this” and “these”. Ben Blatt’s recent book “Nabokov’s Favourite Word is Mauve” is a delightful introduction to this science of style.
Nailing the “lodestar” author this way is possible, but unlikely. There are not two candidates, but many, and probably not enough written evidence to develop linguistic fingerprints for some of them. Hamilton and Madison wrote a lot; today most politicians’ “writing” is actually done by aides. The guessing-game will remain just that.
Finding out that your lexical fingerprint is found in pronouns and prepositions may feel a bit like discovering that your genetic blueprint is just a series of four chemical bases. But the way in which humankind’s soulful nature arises from soulless components is itself a quiet miracle. If you hadn't used your quota already, you might even call it fascinating.