NINCH SYMPOSIUM: April 8, 2003, New York City

The Price of Digitization:
New Cost Models for Cultural and Educational Institutions

Short Report
by Michael Lesk





More than 200 people attended a NINCH meeting about costs of digital conversion at the New York Public Library on April 8, 2003; it mixed some practical experiences with commercial sales pitches and searches for funding.

Costs reported for digitizing a book ranged from $4-5 up to more than $1000; too many of the speakers were so focussed on quality that I despair of their finding funding for their projects. Key points: (1) prices are too high, and we need the Henry Ford of digitization; (2) nobody knows values, and we're not convincing the funders that information is important.

Don Waters' excellent introduction tried to balance cost and benefit. In the non-commercial world of most libraries, people don't understand the three numbers: cost, price and value. Cost is what you spend to make a product; price is what you sell it for; value is what it is worth to the customer. If cost is above price, you're losing money; if price is above value, you won't have any sales.
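The relation among the three numbers can be written out as a small check. This is only an illustrative sketch: the function name `diagnose` and the sample dollar figures are hypothetical and were not part of the talk.

```python
def diagnose(cost, price, value):
    """Return the problems implied by a cost/price/value triple."""
    problems = []
    if cost > price:
        # You spend more to make the product than you sell it for.
        problems.append("losing money: cost exceeds price")
    if price > value:
        # The customer thinks the product is worth less than you charge.
        problems.append("no sales: price exceeds value to the customer")
    return problems


# Hypothetical figures, in dollars per item.
print(diagnose(cost=60, price=40, value=50))  # cost above price
print(diagnose(cost=10, price=40, value=25))  # price above value
print(diagnose(cost=10, price=20, value=30))  # neither problem: empty list
```

The hard part in a library setting is not the comparison but the inputs: cost is often hidden in unmonetized overhead, and value is rarely measured at all.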

Universities often fail to monetize or recognize some indirect costs (e.g., most university libraries don't pay rent for their building) and so a library may not realize, in deciding whether or not to buy an electronic publication in place of a paper one, the extent of shelving and cataloging costs that are saved by going electronic. On the other hand, few in this area measure the benefit of information, so that it isn't usually possible to put a numerical value on delivering information to the desktop instead of the library reading room.

Measuring the value of information has proven very difficult; Don King has spent his career doing this almost as a lone voice in the wilderness. Is the distribution of obscure 19th-century novels by the University of Virginia free E-text center an example of the rich resources electronics can make available, or a variant of Gresham's Law in which free books are displacing good books?

Don Waters mentioned a 19th-century example of a project to provide cheap books for the working class that got attacked either for distributing dangerous ideas or for distributing no ideas at all. Innovators will be familiar with this: whenever you try something new, some will object that it might not work and others that it might. In digitization, our worst problem may be that we can't tell, at least to the extent that in a world run by bean counters we cannot put a cash value on the information we make available.

We got actual price quotes for digitization projects in the next session. Maria Bonn gave an excellent talk on the University of Michigan's service center; they run some 20-27 cents per page ($60/book, say), of which 13 cents is scanning and the rest overhead, selection, and processing. The next two talks were sales pitches: one from Luna Imaging, which does photographic and other non-textual materials as a rule, running from $4 for a 35mm slide to $60 for a larger item, and one from the Systems Integration Group, whose example priced text pages at about $2 each and maps at $12. All speakers emphasized quality and planning. You can get quotes down to 4 cents a page or $10/book if you're willing to disbind, willing to ship to a lower-cost country, and not so fussy about the process and quality. All speakers said that you should think about your whole project and not just scan things without knowing why.
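To put the per-page quotes on a per-book footing, here is a rough sketch. The 250-page book length is an assumption, chosen because it is what "4 cents a page or $10/book" implies; the labels and the function name are mine, not the speakers'.

```python
PAGES_PER_BOOK = 250  # ASSUMPTION: implied by "4 cents a page or $10/book"

# Per-page quotes in dollars, as reported in the session.
QUOTES = {
    "Michigan, low end": 0.20,
    "Michigan, high end": 0.27,
    "Systems Integration Group, text": 2.00,
    "Disbound and shipped abroad": 0.04,
}


def cost_per_book(per_page, pages=PAGES_PER_BOOK):
    """Convert a per-page quote to a per-book cost, rounded to cents."""
    return round(per_page * pages, 2)


for name, rate in QUOTES.items():
    print(f"{name}: ${cost_per_book(rate):,.2f} per book")
```

Under that assumption the Michigan range works out to $50 to $67.50 per book, which brackets the $60 figure, while the $2/page quote comes to $500 per book.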

Steve Chapman gave a good talk about the costs of paper vs. digital repositories, using OCLC's Digital Archive and the Harvard Depository as examples. Harvard charges about $4/sq ft for standard space, so that 2202 volumes would cost $689; similarly, Iron Mountain charges $6 for a box of about 1 cubic foot, which would hold about 10 books, I think, so Harvard is somewhat cheaper but in the same range. OCLC seemed extremely expensive to me, with charges of $60/GB for up to 100 GB and still $15/GB for more than a terabyte; this when you can buy a 100-GB disk drive for under $100. There are not many competitors yet, however, and I can't find a comparable price quote. At the prices he sees, digital ASCII is cheaper than paper but image is more expensive; at the prices I see, everything is cheaper digitally.
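The paper-versus-digital comparison can be sketched per volume. The storage prices below are the ones quoted above; the file sizes per book (about 1 MB for ASCII, about 25 MB for page images) are my own hypothetical assumptions, and the sketch also assumes all charges cover the same billing period, which the figures above do not state.

```python
# Paper storage, dollars per volume, from the figures quoted above.
harvard_per_volume = 689 / 2202      # about $0.31
iron_mountain_per_volume = 6 / 10    # $6 per box, about 10 books per box

# Digital storage, dollars per gigabyte.
OCLC_SMALL = 60.0   # up to 100 GB
OCLC_LARGE = 15.0   # more than a terabyte
RAW_DISK = 1.0      # a 100-GB drive for under $100, so at most about $1/GB

# ASSUMED sizes per book (hypothetical, not from the talk).
ASCII_GB_PER_BOOK = 0.001   # roughly 1 MB of plain text
IMAGE_GB_PER_BOOK = 0.025   # roughly 25 MB of page images


def digital_cost_per_book(gb_per_book, dollars_per_gb):
    """Storage cost for one book at a given price per gigabyte."""
    return gb_per_book * dollars_per_gb


print(f"Harvard: ${harvard_per_volume:.2f}, Iron Mountain: ${iron_mountain_per_volume:.2f}")
for label, rate in [("OCLC small", OCLC_SMALL), ("OCLC large", OCLC_LARGE), ("raw disk", RAW_DISK)]:
    ascii_cost = digital_cost_per_book(ASCII_GB_PER_BOOK, rate)
    image_cost = digital_cost_per_book(IMAGE_GB_PER_BOOK, rate)
    print(f"{label}: ASCII ${ascii_cost:.3f}, image ${image_cost:.3f}")
```

With these assumed sizes, ASCII beats the Harvard rate at either OCLC price while images do not, and at raw disk prices both come in well under paper. Different size assumptions would move the crossover.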

Carrie Bickner talked about the visual archives at NYPL itself; it was very interesting material but relies on temporary soft money, as do many digitization projects. All too often libraries perceive digital information as important, but are reluctant to spend any core budget on it.

Steve Puglia gave the most numerical talk, full of the actual prices quoted to NARA and in proposals he's reviewed. Perhaps most interesting was that his projects broke down typically with 1/3 of the cost on digitization, 1/3 on cataloging, description, and indexing, and 1/3 on administrative costs, quality control, overhead and the like. In order to reduce the costs of the projects you need not only to work on scanning process flow, but also on the other categories of spending. The cost to scan a book, in his examples, ranged from perhaps $75/book at the low end up to $2500 at Questia. (The Questia example, computed by observing that they had spent $125M and digitized 50,000 books, was challenged with a claim that $90M or so of the $125M had been spent on advertising and other such activities.)
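The Questia arithmetic, and the effect of the challenge to it, can be worked through directly. The function and variable names are mine; the $90M figure is the challenger's "or so" estimate, so the adjusted result is approximate.

```python
def cost_per_book(total_spent, books):
    """Average spending per digitized book."""
    return total_spent / books


QUESTIA_TOTAL = 125_000_000         # total reported spending, dollars
QUESTIA_BOOKS = 50_000              # books digitized
QUESTIA_OTHER = 90_000_000          # claimed advertising etc. (approximate)

headline = cost_per_book(QUESTIA_TOTAL, QUESTIA_BOOKS)
adjusted = cost_per_book(QUESTIA_TOTAL - QUESTIA_OTHER, QUESTIA_BOOKS)

print(f"Headline: ${headline:,.0f} per book")   # 125M / 50,000
print(f"Adjusted: ${adjusted:,.0f} per book")   # 35M / 50,000

# The typical three-way split, applied to the $75 low-end example.
LOW_END = 75
for category in ("digitization", "cataloging and indexing", "administration and QC"):
    print(f"{category}: ${LOW_END / 3:.2f}")
```

Even if the challenge is accepted, the adjusted figure of about $700 per book is still nearly ten times the low-end example.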

I felt that all of his numbers were high and reflected (a) vendors believing that, with the Federal budget behind them, NARA and LC can be charged anything the vendors want, plus (b) excessive quality specs on the part of the buying organizations. I was glad that Steve gave comparisons of the online delivery of information vs. the traditional services; at NARA far more use is made of the website than of the reading rooms, and the same is true of many other groups, e.g. LC.

Tom Moritz gave an excellent talk explaining that digital information now includes things like museum specimens, not just traditional journals and monographs. He discussed revenue sources, but had no numbers for them.

Jane Sledge of the National Museum of the American Indian had a sad story of a failed RAID drive, which held images of their 800,000 items; but in the end, they still had two sets of DVDs, and lost only time and money. The British Library still has catalog entries marked "destroyed in 1940" (when a German bomb fell on the library), by contrast.

Christie Stephenson talked about getting more revenue by selling digitization services to others, which is mostly an internal reshuffle. Neil Smith at the British Library once pointed out to me that if libraries only sell things to each other, they might as well just digitize their stuff and give it away free; there's a need to get more money into the system from outside.

Kate Wittenberg of Columbia talked about planning projects in the context of a digital publication to be sold to readers, but didn't have numbers we could look at. I note that the ACM digital library is doing fairly well, with some 30,000-40,000 readers paying an individual rate of about $100 per year.

Jack Abuhoff, head of Innodata, whom we thank for funding the symposium, emphasized the need to plan, to preserve, and to expect changes in the future. Again, though, I fear an emphasis on planning and quality will make projects unaffordable, especially in today's economy.

I ended by quoting Voltaire, "the best is the enemy of the good," and urging people to go for more material at lower costs and quality levels. I also think we urgently need help demonstrating why we need these projects. Institutions don't quantify the value of new information and fear that it is used by those outside their community; we may need a new definition of community.

On balance I'm amazed at how much is being done despite high costs and no estimates of value. I can only hope that if we can make progress on those issues, we could get even more done; and at least the costs should decline as technology continues to improve. Would that economics improved at all, let alone at the same rate.