Friday, June 19, 2009

Victoria Strauss -- The Most Published Author in the History of the Planet

You may never have heard of him, but with more than 200,000 nonfiction titles in his name (more than 100,000 of which are listed at Amazon), Philip M. Parker may be, in his own words, the most published author in the history of the planet.

How does he do it? According to a 2008 article in the New York Times, Parker, a professor of marketing at INSEAD business school in France and founder of ICON Group International, "has developed computer algorithms that collect publicly available information on a subject—broad or obscure—and, aided by his 60 to 70 computers and six or seven programmers, he turns the results into books in a range of genres, many of them in the range of 150 pages and printed only when a customer buys one." (This patented process is explained step-by-step in a YouTube video.)

Parker's books sell anywhere from a few dozen to a few hundred copies each. In addition to compiling books himself, he offers compilation applications to other businesses via EdgeMaven Media (whose website includes a fascinating FAQ), and has branched out into animation and video games. It all sounds quite lucrative--though in the Times article as well as an interesting Q&A at O'Reilly TOC, Parker dodges the question of income, claiming that his company makes no profit because it plows all revenue back into R&D.

Is Parker a Long Tail visionary or a one-man author mill? Are his thousands of computer-generated books an amusing and possibly useful curiosity, or the first, distant echo of the death knell for live individual authors? These are fascinating questions. Parker himself would seem to view his system as author replacement, at least in some areas of publishing. Per his patent application (quoted in The Guardian):

Parker quotes a 1999 complaint by the Economist that publishing "has continued essentially unchanged since Gutenberg. Letters are still written, books bound, newspapers printed and distributed much as they ever were."

"Therefore," says Parker, "there is a need for a method and apparatus for authoring, marketing, and/or distributing title materials automatically by a computer." He explains that "further, there is a need for an automated system that eliminates or substantially reduces the costs associated with human labour, such as authors, editors, graphic artists, data analysts, translators, distributors, and marketing personnel."
Parker hasn't eliminated the human element entirely. In his O'Reilly interview, he says that while 90% of his content is computer generated, writers, editors, and designers are all "relied on heavily at many stages." And there would still appear to be some room in Parker's world for individual creative effort. From the EdgeMaven Media FAQ:

“Human creativity” in this sense is the absence of formulaic authorship techniques that can be reverse engineered. Some Ph.D. theses, and forms of poetry for that matter, are not that “creative”. Creative authors, therefore, need not fear being replaced by this process. The same is true for creative doctoral students, moviemakers, television producers or PC game makers.

Ah, but what's creative? Not romance novels, apparently. Per the New York Times article cited above, Parker "is laying the groundwork for romance novels generated by new algorithms. 'I’ve already set it up,' he said. 'There are only so many body parts.'" (A reductive statement that, no doubt, will infuriate romance writers everywhere.) What's next? Computer-generated SF novels with stock aliens? Algorithm-created crime dramas with hard-boiled dialog swiped from the movies? Robo-poetry to populate a hundred Poetry.coms?

Apart from imponderable questions of creativity, Parker's system of content aggregation poses another dilemma: copyright. In his O'Reilly interview, Parker says he uses "the sources that are used by regular authors," i.e., information that is publicly available. However, "publicly available" does not necessarily mean "public domain." How does Parker ensure that the materials his algorithms stitch together are copyright-free? If they aren't, how does he ensure that his sources are properly cited?

Good question. As reported in the Sydney Morning Herald, several linguists recently challenged a number of Aboriginal-language thesauruses, dictionaries, and crossword puzzle books created by Parker's computers, alleging that the books violated copyright. The dispute is discussed at Language Log, a linguistics-focused blog (among other things, it's pointed out that the domain Parker uses for his dictionaries, websters-online-dictionary.org, is not connected with the Merriam-Webster dictionaries), and in a long post by one of the challengers, Peter Austin, who presents an argument for why, although "[i]t is not possible to copyright common knowledge such as words and meanings," Parker's use of material from Austin's 1993 dictionary of the Gamilaraay language constitutes copyright violation.

Parker has since removed the offending books from sale, saying "There was no malice and certainly no financial motive. That was the furthest thing from my mind."

26 comments:

freddie said...

Even if he cites sources and claims no malice or financial motive was intended, it's still copyright infringement, isn't it? It's my understanding he would have needed permission.

Jane Smith said...

Victoria, I might be missing something, but this whole post reads to me like an account of repeated and persistent copyright infringement and/or plagiarism.

Call me naive, but I'd rather write one original book than 200,000 "compiled" ones. I'm funny like that.

Victoria Strauss said...

I agree with you about the one original book, Jane, and I admit I find the notion of automated book creation rather horrifying. Still, I don't feel I'm qualified to judge the infringement issues--especially since I've never actually seen one of Parker's books, and have no idea how he cites sources (which, at least according to the O'Reilly interview, he does seem to do in some circumstances). I think there is probably a very big gray area here. I'm sure that Parker has thoroughly studied the issue, and has sought expert legal advice. He'd be crazy not to.

What his run-in with the linguists suggests (to me, anyway) is the trouble that can arise from relying on broad assumptions about copyright--such as that there are no circumstances in which the contents of a dictionary can be intellectual property.

Lydia Nelson said...

So he believes that the problem with publishing is that it employs too many people? Who's going to buy products when everything becomes so automated hardly anyone has employment?

Court said...

Books generated by computer algorithms. Plagiarism and copyright issues aside, my initial honest reaction to this is:

How boring!

Anonymous said...

Perhaps Mr. Parker's next project should be to create computer AIs to read his books--I'm doubtful any human will care to.

Mad Scientist Matt said...

I've got to wonder about whether such an auto-generated book could be anything like a great read. I can see it working for basic reports, but romance novels? Guy seems a bit delusional.

Although I would love to see somebody manage to fool a vanity poetry anthology into printing an entire volume of robo-poetry.

Kathryn Magendie said...

Good lord! Well, yeah, like Jane wrote, I'll take my one book that's published and if I never publish another one, I'd rather that than have oodles of computer generated words - I just love words and manipulation of language too much -

Kathryn Magendie said...

I read the post below about people saying Rowling copied their idea.

It's a collective consciousness thing, I think...
for example:

I wrote a story about a 12-year-old girl who was dead and telling her story of how she died, etc....I brought it to the writers' group I was in at the time (this was years ago)....one of the group said, "Have you read Lovely Bones...?" I said "no ..." he said, "go get it..."

I went to the shelf of the bookstore and pulled it off the shelf, read the first paragraph - OMG - our first paragraphs were almost identical! Not word for word, but so close! I'd never read the book, never seen it, didn't know about it -- and certainly the author didn't know a hill a beans about my little story....

I had to trash the story - it's still in my files...

Anonymous said...

Am I the only one who pictures his books reading like chapter 34 of Atlanta Nights?

Aimee K. Maher said...

This blog made my stomach churn. Ouch.

J. M. Strother said...

Yikes!

That is all.
~jon

A. Shelton said...

I wonder (copyright infringement aside), since he says he has programmers working for him, how he can claim that he's written the books all by himself.

Any *real* author would acknowledge the programmers' efforts, especially if they aren't billed as ghostwriters.

Then again, what else should we expect of a man who can't even sit down to type out a decent story himself, from his own imagination, under his own power, putting his own sweat and tears into a story, as any other real writer does? Until he does that, I won't consider him to be a writer at all, no matter how many compilations he has out.

So, as far as I'm concerned, he is not yet even *published*, much less the most-published author ever known.

Eirin said...

This is silly. From a consumer's point of view, books aren't for writing, they're for reading and, presumably, enjoying. Publishing is all about readers' selections.
This kind of from-stock-generated book-production has been tried before; whether it's done by ghostly human writers or from computer algorithms matters not at all.
Us BoBs (Buyers-of-Books *g*), we want good books for our money, and I seriously doubt interesting, thought-provoking material can be generated in this way. The very word 'generated' is an indicator here.

BuffySquirrel said...

It's Nineteen Eighty-Four!

Eh, as soon as someone starts talking about writing-by-algorithms, you just *know* the romance genre is going to be mentioned.

ALC said...

I think we're all overlooking the "non-fiction" titles aspect of this topic. If the guy is just compiling data into book form there's no real creativity involved in that to be sure. However, from a business perspective, having over 200,000 different compilations of various subjects readily available for anyone who needs sed info for research purposes could prove quite lucrative - especially when you consider the fact that his products are POD. This eliminates a great deal of overhead in his business model. If he only sold a handful of half of his products he'd still be making money.

As far as using his computer algorithm system to churn out romance novels - or any form of fiction - not feeling particularly threatened over here. Most writers who are actually "trying" have difficulty writing fiction that is readable, let alone entertaining. I'm doubtful that a thousand chimps clacking a keyboard will ever produce Shakespeare.

A.R.Yngve said...

Roald Dahl once wrote a story about a computer which replaced human writers with algorithms, capable of endlessly churning out computer-generated stories.

(What was the title again??)

Stephen Bloom said...

Something akin to this auto-generation process is gradually clogging the blogosphere with faux blogs. These quasi-blogs are merely computer generated aggregations of related information. And I think they're getting more sophisticated. I assume their purpose is simply to draw traffic for advertising revenue. There seem to be more and more of them, and some almost look legit at first glance.

Victoria Strauss said...

Ugh, I hate those faux blogs. I have my Google alerts set to bring me mentions of this blog, and it's often one of those aggregating sites.

ALC said,

I think we're all overlooking the "non-fiction" titles aspect of this topic. If the guy is just compiling data into book form there's no real creativity involved in that to be sure. However, from a business perspective, having over 200,000 different compilations of various subjects readily available for anyone who needs sed info for research purposes could prove quite lucrative

I'm glad you pointed this out; it's an important point that some commenters seem to have missed.

I have no doubt that computer algorithms could produce a basically readable novel based on genre conventions...but why? Possibly there's a reason to aggregate all the information in the world on bathmats, if no one else is doing it--but there are so many people already writing novels, there seems little need to computerize the process.

Mad Scientist Matt said...

A.R. Yngve, I think you're thinking of "The First Sally (A), or the Electronic Bard."

S.D. said...

Bizarre.

Victoria Strauss said...

Is it not ironic? I have been aggregated in one of Parker's aggregations.

Eirin said...

Oh, that's rich. It's also exactly the same text as in the Wikipedia article on "athor mill".

Eirin said...

Oh bother.
Make that author mill.

~brb said...

What's next? Computer-generated SF novels with stock aliens? Algorithm-created crime dramas with hard-boiled dialog swiped from the movies? Robo-poetry to populate a hundred Poetry.coms?

Nah. Computer-generated sitcom scripts. That's where the real money is at. And the audience would never know the difference.

Ray Girvan said...

I ran into some of these titles by accident, and the content appears completely unreliable, particularly the whole Webster's Quotations, Facts and Phrases series. Quotations without source citation aren't worth a damn, and much of the content is scraped from Wikipedia.

For instance, I was searching for "Otto von Riesenthal", and found multiple Icon Group International titles saying:

Otto von Riesenthal was born into poverty. Abandoned by his parents at an early age young Otto became a ward of the state of Prussia. Floating through several orphanages he was eventually adopted by Otto Von Bismark. A very famous Prussian renown for his wearing of weird pointy hats. After Riesenthal went on an absinthe binge Bismarck disowned him and forced him to live a shack composed mostly of deer hides, mud, grass, bones, and discarded mathematics textbooks. (Which Otto had taken to calling Devil's Books).