In his novel Timequake, Kurt Vonnegut tells of an architect named Frank who encounters a software program named Palladio. The program promises to let anyone, regardless of training, design any kind of architectural structure in any style, simply by specifying a few basic project parameters. Frank doubts that the program could really replicate the skills and knowledge he has gained and honed over many years, so he decides to put it to the test. He tells Palladio to design a three-story parking garage in the style of Thomas Jefferson’s Monticello. To his amazement, the program doesn’t refuse or crash. Instead, it takes him through menu after menu of project parameters, explaining how local codes would alter this or that aspect of the structure. At the end, the program produces detailed building plans and cost estimates, and it even offers to generate alternative plans in the style of Michael Graves or I. M. Pei. In typical Vonnegut style, Frank is so shocked and filled with despair that he immediately goes home and shoots himself.
I was reminded of this scene in Vonnegut’s novel after reading an article about the company Narrative Science. The company has produced a software program that can automatically write news stories, in human-like prose, about sporting events and routine financial reports. It is now branching out into other genres, such as in-house managerial reports, restaurant guides, and summaries of gaming tournaments. Last year it generated 400,000 such stories, all without a single human journalist.
Well, not quite. The program has to be trained, not only in the rules of a particular domain but also in how to write appropriate-sounding prose for the target audience. The former is done by statisticians and programmers, but the latter requires seasoned journalists, who provide templates and style guides. In theory, however, once those journalists have trained the program to sound like them, it could generate millions of stories on its own.
So far, this program has been used to generate stories about minor sporting events and routine financial reports that normally would not garner the attention of a real reporter. For example, parents can capture play-by-play data about their son’s Little League baseball game and submit it to Narrative Science. Within a few minutes, the program can analyze the data and generate a story that highlights pivotal moments in the game as well as the final outcome, all written in the flamboyant style of a veteran sports reporter. By looking at earlier games in the same or previous seasons, the program can also comment on how the team or individual players performed relative to other games and similar match-ups.
Similarly, most corporate earnings reports go unnoticed by journalists, but this program can quickly analyze the various numbers, compare them with those of other firms in the same industry, and generate a story for stockholders and other interested parties that highlights important changes in the company’s performance.
Narrative Science is proud of the fact that its program has not yet put any journalists out of work, and the company believes the program will be used primarily to generate stories that would never have been written in the first place. But when asked how long it would take before one of their computer-generated stories won a Pulitzer Prize, the company’s CTO guessed that it would happen within five years.
I’m a bit dubious about that last prediction, but I do find their system very interesting. Narrative Science has essentially picked the low-hanging fruit of professional writing: the routine, boring, and generally formulaic stories that might as well be written by a computer. In one sense, their program is similar to a simple machine tool that can make one particular kind of part over and over again, but in another sense, they have gone far beyond that. By combining data mining techniques with prose generation, they have created a system that can not only find new insights in large datasets but also communicate them to a wide audience in a style that the audience will recognize and trust.
But before we start worrying about whether their program will soon put all journalists out of work, we need to realize that this kind of program only works in data-rich domains, and the insights it can generate are limited by the quantity and quality of the data it receives. It can find patterns in complex datasets that a human might not notice, but it can’t really understand the irrational and murky depths of human emotions, motivations, and desires. I have a hard time, for example, seeing how it could cover a complex public policy debate, or ask tough questions about how a certain dataset was collected and whether it might be skewed or biased in some way.
Kurt Vonnegut’s first novel, Player Piano, was published in 1952, after he saw an early machine tool quickly make a turbine part that used to take a skilled machinist much longer to produce. In the novel, he imagined a dystopian future where blue-collar workers had nothing left to do and the entire society was run by managerial technocrats. We now know that things didn’t quite turn out this way (see David Noble’s classic book Forces of Production). Similarly, I don’t think that newsroom management will ever be able to replace human reporters entirely. No doubt some of the more routine and formulaic reporting will become automated, but the more idiosyncratic stories will still require a reporter who understands the human condition.