eBay bids on big data challenge
- 09 May, 2013 11:39
Trying to make sense of the estimated 100 terabytes of new data received every day led eBay to start using big data platforms.
Speaking at the Teradata Big Data Analytics summit in Sydney today, eBay director of data and data infrastructure Alex Liang told delegates that the company's website has more than 50,000 product categories, with more than US$3500 worth of goods sold every second.
“We know that almost everyone is using a smartphone to browse listings on eBay which means we get more data. This also means we need to process more data.”
Liang said that his team was also under pressure from the finance department to provide better systems for increased analytics.
“For eBay, data is about value so if you cannot get value from big data you should not even work on it,” he said.
However, getting the value proved difficult because eBay’s integrated analytics environment has more than 100,000 data elements, 90 petabytes of stored data and tables containing 3.5 trillion rows of data.
According to Liang, this environment was not easy to navigate for the 12,000 internal business intelligence (BI) users who range from data scientists down to sales directors who want regular reports.
In 2011, the company began the rollout of three different platforms, each of which supports a particular type of analytics.
According to Liang, the company uses its enterprise data warehouse (EDW) platform for corporate BI reporting and a 40 petabyte Discovery platform called Singularity for website behaviour analytics.
“The reason we have Singularity is that you need a very powerful system to do deep data processing. We can’t do this on EDW or our Hadoop cluster.”
Its 40 petabyte Hadoop cluster is used for technical analytics such as counterfeit detection and image classification.
Turning to the second BI platform, he said that eBay built a Data Hub to provide a central information platform for access to all analytics and information, regardless of which BI platform is used to support it.
“Because the business environment is much more complex, you cannot have one analyst working independently. People must be working with each other to get deep data insight,” he said.
This information portal has been configured to drive collaboration between analysts by letting them share analytics built by anyone in the company. It provides definitional information about each report and can be searched or browsed by category.
According to Liang, the portal's design borrows heavily from the eBay website to make it easier for analysts to find the report they are looking for.
“We are facing very aggressive competition from other sites so data is the biggest advantage for eBay. Every business initiative we make is based on data,” he said.
Finally, eBay developed an integrated dashboard hub called QuickStrike. Common definitions are provided for dashboard metrics, and the same tools are used across all dashboards for ease of use.
In addition, the dashboards contain links to wiki sites which provide metric definitions including data lineage and SQL queries used to process the data.
Turning to the future, Liang said that the company was considering the development of machine learning techniques to drive more value from stored data.
“You don’t need to spend so much time finding different algorithms because once you have a big volume of data, machine learning will offer a higher rate of accuracy,” he said.
According to Liang, the future will be “live”, meaning real-time data loading and analytics.
“Coupled with forecasting and predicting future events, this will lead to even higher value being delivered by the analytics platforms,” he said.
Follow Hamish Barwick on Twitter: @HamishBarwick