If the mobile apps supporting half your online betting business fail on the day of the Melbourne Cup, one of the biggest gambling events in Australia, then you can bet your heart would be racing faster than Phar Lap's.
That’s what Alan Alderson experienced in 2013 as head of IT infrastructure and operations at William Hill, an online sports betting business. For six hours, Alderson and his team worked to get server infrastructure back up and running smoothly so punters could bet using the company's mobile apps just before the race started.
“At 6:30am, I got a phone call to say there is high CPU usage with the main transactional database. It was a very frustrating ... we didn’t have any real indication of the problem.
“The business at that time was a 50/50 split between mobile and web browsers, so it was a big chunk that was down. CPU returned to normal at about 2:00pm, an hour before the race. IT confidence across the business was probably at an all-time low after that,” he said during a webinar yesterday.
The problem was that the business lacked sophisticated tools providing clear visibility of its systems and detailed monitoring, so issues could not be spotted and acted on early, he said.
There were also too many siloed views and non-integrated tools, with no real unified visibility, he said.
“Back in the day, we were very reactive. Customers generally told us first there was a problem - [our] internal and external customers.”
Over the past two years, William Hill has undergone a transformation of its IT operations, which includes implementing the CA Unified Infrastructure Management, App Synthetic Monitor and Server Management tools, as well as Splunk.
Alderson said these tools have improved monitoring, visibility and reporting across its server infrastructure, showing CPU and disk memory usage, and website availability.
Alderson said he looked to CA because the tools required minimal management overhead. They automatically identify any part of William Hill’s infrastructure or website performance that deviates from the norm, and send out alerts to operations staff so they can act on issues quickly, he said.
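The idea of flagging anything that "deviates from the norm" can be illustrated with a minimal sketch. This is not how the CA tools work internally, which the webinar did not describe. It assumes a simple trailing-window baseline, and the function name, window size, threshold and sample CPU readings are all hypothetical.

```python
from statistics import mean, stdev


def find_deviations(samples, window=5, threshold=3.0):
    """Return (index, value) pairs that deviate from the trailing baseline.

    Hypothetical illustration: each reading is compared with the mean and
    standard deviation of the `window` readings that came before it.
    """
    alerts = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            deviates = samples[i] != mu
        else:
            deviates = abs(samples[i] - mu) / sigma > threshold
        if deviates:
            alerts.append((i, samples[i]))
    return alerts


# Made-up CPU utilisation readings (per cent), with one spike.
cpu_percent = [22, 25, 24, 23, 26, 24, 25, 97, 23, 24]
alerts = find_deviations(cpu_percent)
print(alerts)
```

In a real deployment the flagged readings would be routed to operations staff rather than printed, and the baseline would usually account for expected peaks such as race days.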
Alderson said he wanted to be proactive rather than reactive in solving mission-critical issues, so that problems are fixed early, before they grow into larger ones.
“It’s about knowing before our business and customers know. With all the monitoring and alerting we have in place, we are on it straight away and can get things fixed and sorted out quickly,” he said.
There is also more visual, real-time dashboarding of website performance, covering everything from uptime and availability to downloading.
“Customers want high availability in systems, or they will go elsewhere if they don’t get good performance, good customer experience,” Alderson said.
Since 2013, William Hill has experienced 99 per cent uptime of customer-facing systems and payments; 98 per cent uptime of customer support systems, data feeds, and payments; and 95 per cent email-to-ticket conversion (2 hours) in its service desk.
The Melbourne Cup ran smoothly in both 2014 and 2015, he said, and in 2014 the business broke the record for online bets per minute.
However, Alderson said William Hill’s infrastructure is not yet at an ideal stage. The tools are still quite infrastructure-centric, and do not yet allow staff to peel apart the different layers, such as separating mobile from browser or payments from databases.
“I want to make it more a service-centric dashboard rather than an infrastructure and application dashboard.”
But painful days like the Melbourne Cup in 2013 are behind him, and even though he still gets some surprises, he said they are small and not damaging to the business.