Ancestry.com preps for flood of census queries
- 31 March, 2012 03:49
For Ancestry.com, big data is about to get even bigger.
The subscription-based website for finding long-lost relatives already has 6.7 billion historical records and 4.8 billion people named in family trees on its website. But now it's adding the 1940 United States Federal Census, which the federal government will release on Monday.
The National Archives has turned the 1940 census paperwork into more than 3.8 million digital images. The online archive, being released after a 72-year waiting period, will be a gold mine for people just beginning to compile their family history, though it will become easier to use once the images are indexed.
When Ancestry.com's database and index are complete, users will be able to search more than 130 million census records using fields such as name, street address, county and state.
Scott Sorensen, Ancestry.com's senior vice president of engineering and its top IT executive, says that his staff has been busily preparing systems for the expected deluge of search requests.
The company learned its lesson two years ago from a huge spike in website traffic during the TV show "Who Do You Think You Are," in which celebrities such as Sarah Jessica Parker discover clues about their ancestors. At the first commercial break, many inspired viewers apparently dashed to their computers to try their hand at family research.
Ancestry.com had prepared for a 300 percent spike in traffic from TV viewers, but the website was slammed by traffic that was (in some cases) 21 times the usual pattern, which "brought us to our knees," Sorensen says.
Since then, the company has added servers and beefed up its network and infrastructure to support bigger surges in traffic, he says.
The company has nearly 5,000 servers at its data center and uses a variety of tools to handle its big data work, including Hadoop, the open-source framework for distributed data processing; traditional relational database software; statistical software called R; algorithms that employ machine learning, a form of artificial intelligence; and MongoDB, database software that creates linkages among the public family trees posted on the site.
The Provo, Utah-based company had about $400 million in sales last year and has about 1,000 employees, according to Hoovers.com. It currently has 1.7 million subscribers.
The key business goal at Ancestry.com is to broaden its customer base to include people who are curious about their ancestors but aren't experienced researchers. Sorensen's job is to use technology to make the discovery of ancestors as easy as possible, so that first-time searchers don't go away disappointed.
Consequently, his technology group works to improve customer metrics such as "time to first discovery" and (for long-time subscribers) "number of discoveries in a week." The company continues to enhance the "power-user tools" for sophisticated researchers, too, Sorensen says.
Three years ago, most ancestor discoveries were made through the company's custom search engine, but now more discoveries are made through "hinting," whereby Ancestry.com's artificial intelligence technology suggests likely connections or records.
"We take the massive amounts of data we have, and the billions of records that people have attached to the family trees, to do record linking and record matching," Sorensen says. "So you start with 40 million Smith names, and then 4 million John Smiths, but what you want are the four records about your great-great-grandfather John Smith. Our record-linking technology will try to surface those four records and give you a hint. We try to make those discoveries more automatic."
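The narrowing Sorensen describes, from every Smith down to a handful of likely matches, can be illustrated with a toy sketch in Python. This is not Ancestry.com's method, which the article does not detail. The `Record` fields, the `score` and `hints` functions, the two-year birth-year tolerance and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; real census records carry many more.
    given: str
    surname: str
    birth_year: int
    county: str
    state: str


def score(candidate: Record, known: Record) -> int:
    """Count how many fields of a candidate agree with what the tree already knows."""
    points = 0
    if candidate.surname.lower() == known.surname.lower():
        points += 1
    if candidate.given.lower() == known.given.lower():
        points += 1
    # Assumed tolerance: reported ages in old records are often off by a year or two.
    if abs(candidate.birth_year - known.birth_year) <= 2:
        points += 1
    if (candidate.county, candidate.state) == (known.county, known.state):
        points += 1
    return points


def hints(records: list[Record], known: Record, min_score: int = 4) -> list[Record]:
    """Return only the records that agree with the known ancestor on enough fields."""
    return [r for r in records if score(r, known) >= min_score]


# Invented sample data: what a family tree says about one ancestor.
ancestor = Record("John", "Smith", 1875, "Utah", "UT")

records = [
    Record("John", "Smith", 1876, "Utah", "UT"),  # agrees on all four fields
    Record("John", "Smith", 1901, "Utah", "UT"),  # birth year too far off
    Record("Mary", "Smith", 1875, "Utah", "UT"),  # different given name
    Record("John", "Smith", 1874, "Cook", "IL"),  # different place
    Record("John", "Jones", 1875, "Utah", "UT"),  # different surname
]

for match in hints(records, ancestor):
    print(match)
```

Each extra field that must agree cuts the candidate pool, which is the effect Sorensen describes when 40 million Smiths shrink to four records. A production system would presumably weigh fuzzy name matches and many more fields rather than counting exact agreements.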
What does the future hold? Sorensen says he envisions a time when the company adds socio-economic data to the classic genealogical data to provide more colorful information and context about ancestors. He offered this example: "I can see [from the 1930 census] that my great-great-grandfather had a radio, and was the only person on the block to have a radio. Then [with socio-economic data] here's the additional color that shows what percentage of people had a radio in that time and place."
Mitch Betts is CIO magazine's executive editor. Follow him on Twitter: @mitchbetts.