May 4, 2010

Lies, Damned Lies, and Statistics

Posted by Victoria Strauss for Writer Beware

You may have read a recent article in PW called "Self-Published Titles Topped 764,000 in 2009 as Traditional Output Dipped," about the amazing growth in "non-traditional" (a.k.a. print-on-demand-produced) books.

Or you may just have read about it, given how many tweets and blog mentions it received. If the latter, you may have wondered what such a gigantic surge in self-publishing portends for the commercial publishing industry (where output did indeed dip, though only by about half a percent)--or, if you're one of those folks who believes that "traditional" publishing is or should be dead, you may have felt a surge of righteous vindication.

However, while that staggering 764,000 figure (actually, 764,448, more than twice the output of traditional books--the info comes from the latest US publishing statistics released by Bowker) is real, PW's title is misleading. In fact, those numbers include not just self-publishers, but "self-publishers and micro-niche publishers," a much larger category that encompasses the print-on-demand sector as a whole. Self-publishers aren't even the biggest portion of this category; in fact, they're very much in the minority. Says PW,
The category consists largely of reprints...According to Bowker, the largest producer of nontraditional books last year was BiblioBazaar which produced 272,930 titles, followed by Books LLC and Kessinger Publishing LLC which produced 224,460 and 190,175 titles, respectively.
BiblioBazaar (whose motto is "Old Books, New Life") "support[s] projects for the digital preservation of classic material and make[s] these works available for sale in printed form as a new book." Ditto for Kessinger Publishing (whose reprinting program has actually been the target of allegations of copyright infringement).

So better than 687,500 of 2009's 764,448 non-traditional titles were reprints of previously-published works, most in the public domain. That leaves around 77,000 titles for the self-publishing and micropress sector. According to PW,
The Amazon subsidiary CreateSpace produced 21,819 books in 2009, while released 10,386. Xlibris and AuthorHouse, two imprints of AuthorSolutions, produced 10,161 and 9,445 titles, respectively.
Bowker's press release rounds out its top ten POD book producers with a few more numbers: General Books LLC, 11,887; International Business Publications USA, 8,271; PublishAmerica, 5,698 (I hate to admit that PA is tops in anything, but there it is).

There's something a bit curious about these numbers. Reported title output for Bowker's top ten actually adds up to more than 764,448. And what about the many other publishing service companies (including three more Author Solutions brands, Trafford, iUniverse, and WordClay), and all the POD-produced small press and micropress titles? Where are they in these figures?
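
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that adds up the figures quoted above. The numbers come straight from the PW article and Bowker's press release as cited in this post; the variable names are mine, and the producer that PW's sentence leaves unnamed is labeled as such rather than guessed at.

```python
# Title counts for 2009 as quoted in this post (PW article and Bowker release).
TOTAL_NONTRADITIONAL = 764_448

top_ten = {
    "BiblioBazaar": 272_930,
    "Books LLC": 224_460,
    "Kessinger Publishing LLC": 190_175,
    "CreateSpace": 21_819,
    "General Books LLC": 11_887,
    "(producer not named in the quoted sentence)": 10_386,
    "Xlibris": 10_161,
    "AuthorHouse": 9_445,
    "International Business Publications USA": 8_271,
    "PublishAmerica": 5_698,
}

# The three producers PW identifies as the largest, treated here as reprinters.
reprint_houses = ("BiblioBazaar", "Books LLC", "Kessinger Publishing LLC")

reprint_total = sum(top_ten[name] for name in reprint_houses)
remainder = TOTAL_NONTRADITIONAL - reprint_total
top_ten_total = sum(top_ten.values())
overshoot = top_ten_total - TOTAL_NONTRADITIONAL

print(f"Big three reprinters:   {reprint_total:,}")
print(f"Left for everyone else: {remainder:,}")
print(f"Top ten combined:       {top_ten_total:,}")
print(f"Over the stated total:  {overshoot:,}")
```

The big three account for 687,565 titles, leaving 76,883 for everyone else, yet the top ten together come to 765,232, which is 784 more than the total they are supposed to be part of.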

2009's improbable growth in POD titles makes it clear that digital publishing is continuing to change the landscape for readers and retailers (although see this article for a discussion of how reprint companies like BiblioBazaar and Kessinger, which benefit from the public domain, may not be doing it any favors), and to a lesser extent (since publishing services have been around for more than a decade now) for writers. It also suggests that there's still some life in the mostly-discredited long tail theory, at least for retailers that don't have to worry about physical inventory.

But I have to wonder--who is buying all these books? Or, put another way--how many of these books are being bought at all? In the long-tail digital universe, where books are nothing more than bits and bytes, it really doesn't matter if you offer thousands of books that never sell a single copy, as long as you offer tens of thousands that sell just a few. Which is why I think it would be very interesting to compare sales figures for the POD sector (info that does not seem to be available) to title growth over the past couple of years. It might place that huge increase in titles in a somewhat different perspective.


Adventures in Children's Publishing said...

This is such a great point. Comparing titles for self-publishing against traditional pub models is meaningless. Any kind of rational analysis has to involve a revenue stream.

Andra M. said...

I wondered the same as you about sales. Publishing a book is one thing. Selling it is another.

Anonymous said...

A more honest comparison would be self-published titles that sold a minimum of X number of books (i.e., 100+ copies) in the first year of release.

That IMHO is the problem with articles based on statistics. The author gets to pick and choose what statistics he uses in order to support his personal bias. In this case the number of titles released vs. the number of titles that actually sold books.

Janet Morgenstern said...

Did you know that 90% of statistics have been massaged to mean whatever one wants them to mean?

(Sarcasm intended.)

Frances Grimble said...

I think the numbers inflation is mostly due to reprints of public-domain material copied from the files posted by Google, the Internet Archive, Project Gutenberg, and other free sources. I've bought a few of those books for examination. If text is missing from the publicly available scan, it's missing from the reprint. These publishers just copy the PDFs and crop off the Google watermark, or they copy the OCR'd file and pour the text into a standard design with a standard cover. They're not buying the original rare book to use, they're not editing, they're not proofreading, they're not designing covers, they're not marketing individual titles.

Go find some really obscure title on Google Books or the Internet Archive, something you think would interest about a dozen people in the US and them all specialist researchers. Then look for the same title on the meta-book-search engine. You're likely to find a dozen of these reprint publishers, Kessinger and others, offering that same title. They may even be printing the books as one-offs, since their other costs are so low.

The important thing to consider is: Are these reprints of scans competing with your new book? In most cases not. Sure, there are, say, aficionados of Victorian literature out there. I'm one, but I'd rather buy the book in a better-quality edition, or cheap on the used market, if I can get it. The cost of the ink in my printer and a three-ring binder is at least as much as a new mass-market paperback. Most people don't compute that, so they download the files and print them. Still, they more likely do that than buy the book from Kessinger et al.

But again, this market is not likely to affect your new science fiction or fantasy book.

This is not, however, to say that the legitimate self-publishing market is not growing. It is. And there are good reasons for it, one of them being greater control of your e-rights in the current industry tug-of-war over e-profits.

Frances Grimble said...

I will also say that "POD" is otherwise a bad category. It is simply a printing method. As such, it is also used, among other things, to keep alive older titles from large publishers and to print new scholarly and scientific titles from university presses, etc., that are too specialized to ever have a large market.

Frances Grimble said...

By the way, I am not suggesting that Kessinger et al should not reprint public-domain works. I prefer, however, to buy reprints where the publisher has added value by providing introductions, footnotes, glossaries, indexes, translations from foreign languages, etc., and/or by carefully choosing and organizing works for an anthology.

All of which "added value" is definitely legally copyrightable. It's a shame that the public is coming to associate all reprints with versions that are inferior to the original work.

I also am disturbed by the recent trend of mushing together the concepts of "out of print" and "public domain." They are by no means the same thing. It was legal for Google to scan public-domain works. The so-called "orphan works" are another story. In any case, Google scanned many books that are actively in print, and they are still scanning books. Anyone who wants to keep up to date with the Google Settlement might want to visit law professor James Grimmelmann's blog at

ialmostlaugh said...

at the end they made us believe what they want

