Google: 129 million different books have been published
- 07 August, 2010 06:07
For those who have ever wondered how many different books are out there in the world, Google has an answer for you: 129,864,880, according to Leonid Taycher, a Google software engineer who works on the Google Books project.
Estimating the number of books in the world is more than an exercise in curiosity for the search giant: It also provides a roadmap of some of the work still left to be done in meeting the company's ambitious goal of organizing all the world's information.
"When you are part of a company that is trying to digitize all the books in the world, the first question you often get is: 'Just how many books are out there?'," Taycher explained in a blog post announcing the estimate.
To come up with a reasonable approximation, the company started by ingesting book information from multiple cataloging systems, such as the International Standard Book Number (ISBN) system.
Such catalogues, while helpful, do not provide a definitive count. For instance, ISBNs have been assigned to books only since the 1960s, and tend to be used only in Western countries.
Also, multiple books have been assigned to individual ISBNs, and publishers have assigned ISBNs to items other than books, such as T-shirts and DVDs.
So Google engineers have written programs to comb through about 150 such catalogues and directories, and eliminate as many duplicate entries as could be found.
The company also had to make a number of tough decisions about what is and isn't a book, Taycher explained.
For instance, soft cover and hard cover editions of a text are counted as two books, as are the many different versions of a popular text, such as Shakespeare's "Hamlet," due to the forewords and commentaries they may contain. Serials may count as individual books or as a collected work.
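As a rough illustration of the de-duplication and counting rules described above, here is a minimal Python sketch. It is not Google's actual method, which the blog post summary does not detail. The `Record` fields, the `normalize` helper and the sample catalogue names are all hypothetical, chosen only to show how records from several catalogues might be merged while still counting different bindings and editions as separate books.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical catalogue entry; the field names are assumptions."""
    source: str   # catalogue the record came from
    title: str
    author: str
    binding: str  # e.g. "hardcover" or "softcover"
    edition: str  # e.g. a note about forewords or commentary


def normalize(text: str) -> str:
    """Lowercase the text and collapse punctuation and extra whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def book_key(rec: Record) -> tuple:
    """Key that identifies one 'book' under the counting rules in the article.

    The source catalogue is deliberately left out, so the same book listed
    in two catalogues collapses into one entry. Binding and edition are kept,
    so a hardcover and a softcover count as two books.
    """
    return (
        normalize(rec.title),
        normalize(rec.author),
        normalize(rec.binding),
        normalize(rec.edition),
    )


def count_distinct_books(records) -> int:
    """Count records after removing duplicates that share a key."""
    return len({book_key(rec) for rec in records})


# Made-up sample data: four catalogue entries describing three distinct books.
records = [
    Record("catalogue_a", "Hamlet", "William Shakespeare", "hardcover", "Annotated edition"),
    Record("catalogue_b", "HAMLET.", "william shakespeare", "Hardcover", "annotated edition"),
    Record("catalogue_a", "Hamlet", "William Shakespeare", "softcover", "Annotated edition"),
    Record("catalogue_c", "Hamlet", "William Shakespeare", "softcover", "With a new foreword"),
]

print(count_distinct_books(records))
```

In this sketch the first two entries differ only in capitalization and punctuation, so they merge; the softcover and the edition with a new foreword remain separate, mirroring the rule that different versions of "Hamlet" count as different books.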
As of June, the company has scanned 12 million books, according to a presentation given by Google Books engineering manager Jon Orwant at the USENIX Annual Technical Conference in Boston. These books have been written in about 480 languages (including three books in the Star Trek-originated Klingon language).
The company plans to complete the scanning of existing books within a decade. The resulting virtual collection will consist of four billion pages and two trillion words, Orwant said.
About 20 percent of the world's books are in the public domain, Orwant explained. Another 10 to 15 percent are in print. The remaining books -- the vast majority of all titles -- are still under copyright but out of print. Google is in the process of borrowing copies of these books from about 40 large libraries worldwide in order to digitize them.
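To put those shares in concrete terms, the short Python sketch below applies them to Google's estimate of 129,864,880 books. It assumes that every percentage refers to the full estimate, which the article implies but does not state outright, and the variable names are illustrative only. It also divides the quoted word total by the quoted page total to show the average page length those figures imply.

```python
TOTAL_BOOKS = 129_864_880  # Google's estimate, as quoted in the article

# Integer arithmetic keeps the results exact.
public_domain = TOTAL_BOOKS * 20 // 100
in_print_low = TOTAL_BOOKS * 10 // 100
in_print_high = TOTAL_BOOKS * 15 // 100

# Assumption: whatever is neither public domain nor in print is the
# out-of-print, still-in-copyright remainder.
remainder_low = TOTAL_BOOKS - public_domain - in_print_high
remainder_high = TOTAL_BOOKS - public_domain - in_print_low

# Orwant's projected collection size: four billion pages, two trillion words.
PAGES = 4_000_000_000
WORDS = 2_000_000_000_000
words_per_page = WORDS // PAGES

print(f"Public domain: {public_domain:,}")
print(f"In print: {in_print_low:,} to {in_print_high:,}")
print(f"Out of print, in copyright: {remainder_low:,} to {remainder_high:,}")
print(f"Implied words per page: {words_per_page}")
```

Under these assumptions the out-of-print, in-copyright group works out to between 65 and 70 percent of all titles, which is consistent with the article calling it the vast majority.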
It's this act of scanning in books that are out-of-print but still covered by copyright that has been met with some resistance by the publishing industry.
The company is now waiting for a judgement from the U.S. District Court for the Southern District of New York, on whether it can scan these books.
In 2005, the Authors Guild and the Association of American Publishers separately filed class-action lawsuits against the search giant, asserting that the company is infringing on author copyrights by scanning in the books.
Google has claimed it wants to sell digital copies of these otherwise out-of-print books, and set aside royalties for the authors to claim. The company also hopes to reveal snippets of these books in Web searches, and claims this use falls under the U.S. Fair Use doctrine.
Scanning in all the world's books will lead to other benefits in addition to improving searches, Orwant explained. Once all these volumes are digitized, their contents can be subjected to analysis, which can lead to new insights. Linguists can discover when certain words came into widespread use, or who first started using these words.
The Google Book Search could also help answer some outstanding historical questions: For instance, it could inform the debate over whether Isaac Newton and Gottfried Leibniz -- or someone else entirely -- invented calculus.
"We can search not just for a phrase but for a concept," Orwant explained. "We can take all the different ways [that the idea of] infinity can be inflected, translate that into different languages, and do a search in parallel."
"My hope is that as we start to expose a lot more of this collection, it will allow people to ask questions like this that they haven't been able to ask before," he said.
IDG News Service editor Juan Carlos Perez contributed to this report.
Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com