National Library of Australia assistant director-general of information technology, Mark Corbould
The National Library of Australia (NLA) has more than doubled its web services traffic to reach more than 2.5 billion hits in 2009.
Speaking at the CIO Summit 2010 in Sydney, NLA assistant director-general of information technology, Mark Corbould, attributed much of the organisation’s success to its use of open source solutions.
In 2006, across all of its online assets, including the public catalogue and the Trove search engine, the NLA served half a billion web server requests. This jumped to more than one billion in 2008 and to more than 2.5 billion last year.
“We are now probably the highest ranked cultural institution in Australia in terms of our web presence,” Corbould told the summit attendees.
In April, the NLA unveiled its Trove search engine, which is built on an open source platform.
The search engine provides access to more than 90 million items about Australians and Australia, sourced from more than 1000 libraries and cultural institutions across the country.
Corbould said that for an organisation that costs roughly $70 million a year to run, with 450 staff and 10 per cent of its budget going towards IT, there was little chance of securing a commercial software licence to do what the NLA does.
The Trove project’s team of five developers used Solr 1.4, which is built on Lucene 2.9, for the main bibliographic search database and the web page archive, and MySQL 5 to manage all data relationships.
“The overpowering characteristic was that it was the only one we could afford,” Corbould said.
The project team also chose Jetty as the web server, Nginx as the HTTP front end and reverse proxy, JavaServer Pages (JSP) for the newspapers section of the site, and Restlet and FreeMarker for the rest of the service.
One of the key steps was to store the Lucene indices on solid-state drives (SSDs), with four Intel X25-M 160GB drives in each machine, to achieve the necessary performance. Trove issues more than 8000 IOPS (input/output operations per second) to the SSDs, a rate the team says would be expensive to achieve with even the fastest SAN setup.
“A lot of what we do is not well supported by the mainstream market,” Corbould told the CIO Summit.
“When you want to harvest the web, or when you want to integrate access to the varieties of collections we have, the marketplace doesn’t service us well.”