Google: 129 million different books have been published

Google estimates about 129 million books have been published, and it plans to digitize them all

For anyone who has ever wondered how many different books exist in the world, Google has an answer: 129,864,880, according to Leonid Taycher, a Google software engineer who works on the Google Books project.

Estimating the number of books in the world is more than an exercise in curiosity for the search giant: It also provides a roadmap of some of the work still left to be done in meeting the company's ambitious goal of organizing all the world's information.

"When you are part of a company that is trying to digitize all the books in the world, the first question you often get is: 'Just how many books are out there?'," Taycher explained in a blog post announcing the estimate.

To come up with a reasonable approximation, the company started by ingesting book information from multiple cataloging systems, such as the International Standard Book Number (ISBN) system.

Such catalogues, while helpful, do not provide a definitive count. For instance, ISBNs have been assigned to books only since the 1960s, and tend to be used only in Western countries.

Also, some ISBNs have been assigned to more than one book, and publishers have assigned ISBNs to items other than books, such as t-shirts and DVDs.

So Google engineers wrote programs to comb through about 150 such catalogues and directories and eliminate as many duplicate entries as they could find.
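The article does not describe how Google's programs work, so the following is only a rough sketch of the general idea of merging catalogue records and dropping duplicates. The record fields (`title`, `author`, `year`, `binding`), the normalization rules, and the function names are all assumptions made for illustration, not Google's actual method.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def record_key(record):
    """Build a comparison key from hypothetical catalogue fields."""
    return (
        normalize(record["title"]),
        normalize(record["author"]),
        record.get("year"),
        # Binding is part of the key, so different bindings stay distinct.
        normalize(record.get("binding", "")),
    )


def deduplicate(catalogs):
    """Merge several catalogues, keeping the first record seen for each key."""
    seen = {}
    for catalog in catalogs:
        for record in catalog:
            seen.setdefault(record_key(record), record)
    return list(seen.values())


# Made-up sample records from two hypothetical catalogues.
catalog_a = [
    {"title": "Hamlet", "author": "William Shakespeare", "year": 1992, "binding": "hardcover"},
    {"title": "Hamlet", "author": "William Shakespeare", "year": 1992, "binding": "paperback"},
]
catalog_b = [
    {"title": "HAMLET.", "author": "William  Shakespeare", "year": 1992, "binding": "Hardcover"},
]

unique = deduplicate([catalog_a, catalog_b])
print(len(unique))
```

Exact matching on a normalized key only catches duplicates that differ in case, punctuation, accents, or spacing. Real catalogue data would likely need fuzzier comparison to catch variant spellings and reordered author names.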

The company also had to make a number of tough decisions about what is and isn't a book, Taycher explained.

For instance, soft cover and hard cover editions of a text are counted as two books, as are the many different versions of a popular text, such as Shakespeare's "Hamlet," due to the forewords and commentaries they may contain. Serials may count as individual books or as a collected work.

As of June, the company has scanned 12 million books, according to a presentation given by Google Books engineering manager Jon Orwant at the USENIX Annual Technical Conference in Boston. These books have been written in about 480 languages (including three books in the Star Trek-originated Klingon language).

The company plans to complete the scanning of existing books within a decade. The resulting virtual collection will consist of four billion pages and two trillion words, Orwant said.

About 20 percent of the world's books are in the public domain, Orwant explained. About 10 to 15 percent of the world's books are in print. The remaining books -- the vast majority of all titles -- are still under copyright but out of print. Google is borrowing copies of these books from about 40 large libraries worldwide in order to digitize them.

It is this scanning of books that are out of print but still covered by copyright that has met with resistance from the publishing industry.

The company is now waiting for a judgement from the U.S. District Court for the Southern District of New York, on whether it can scan these books.

In 2005, the Authors Guild and the Association of American Publishers separately filed class-action lawsuits against the search giant, asserting that the company is infringing on authors' copyrights by scanning the books.

Google has claimed it wants to sell digital copies of these otherwise out-of-print books, and set aside royalties for the authors to claim. The company also hopes to reveal snippets of these books in Web searches, and claims this use falls under the U.S. Fair Use doctrine.

Scanning all the world's books will bring other benefits in addition to improving searches, Orwant explained. Once all these volumes are digitized, their contents can be subjected to analysis, which can lead to new insights. Linguists can discover when certain words came into widespread use, or who first started using these words.

Google Book Search could also help answer some outstanding historical questions: For instance, it could inform the debate over whether Isaac Newton or Gottfried Leibniz -- or someone else entirely -- invented calculus.

"We can search not just for a phrase but for a concept," Orwant explained. "We can take all the different ways [that the idea of] infinity can be inflected, translate that into different languages, and do a search in parallel."

"My hope is that as we start to expose a lot more of this collection, it will allow people to ask questions like this that they haven't been able to ask before," he said.

IDG News Service editor Juan Carlos Perez contributed to this report.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
