Application Performance Management (APM) is a big topic in IT shops across the board, and one of the hot players is Dynatrace. The company says its customer roster includes nine of the top 10 banks, nine of the top 10 retailers, and 386 of the Fortune 500. Network World Editor in Chief John Dix recently sat down with Dynatrace CEO John Van Siclen to get a sense of where APM is today, where it is going, and how the company is faring since private equity firm Thoma Bravo acquired Compuware and spun out its APM division as Dynatrace.
Where does APM stand today in terms of adoption? If you go into a large shop will they have APM support for a good portion of everything?
It’s gaining but there’s no way it’s saturated. In fact, there are more apps created per day than the top players can instrument. It’s one of those unique markets that just continues to expand. Every Global 1,000 company is doing more new apps now than they were doing five years ago.
While companies are still trying to consolidate their systems of record, there are a whole set of new apps being created, many of which are for connecting customers to those systems of records. And the new applications have different characteristics. They’re faster paced, they’re using cloud infrastructure, mobile is front and center. So they have different design points, but this is where we're seeing a huge explosion of applications.
For example, I was at an automotive insurance company and they’re thinking about how they can leverage the Uber concept. What if you put their app on your phone, turn on tracking and, depending on how you drive, where you drive, how many miles you drive, etc., they’ll adjust your insurance on sort of a micro insurance basis? So you have 100-year-old companies that are transforming the way they think about engaging with customers. When it starts to permeate older style industries, it’s quite the revolution.
What makes you folks different?
The industry has been moving beyond availability monitoring to performance monitoring because performance is a superset. People used to think availability was a superset and performance was an interesting addition, but now it’s all about performance, and increasingly performance as measured from the outside in. We’ve been one of the key drivers of this outside-in approach, where it’s not about the green lights in your data center, it’s about whether the user has a smile on their face at the end of the experience. If they don’t, how do you find that out, how do you react to it, and how do you fix it quickly on a proactive basis?
That focus has fueled our growth from about $25 million when Compuware acquired Dynatrace just a little over four years ago, to more than $400 million in sales today. Granted, there were some other assets combined along the way, but little Dynatrace has become big Dynatrace by showing how performance is key, that the user perspective is the only perspective that matters.
But the shift to digital performance is not just about the performance of the application, it’s also about conversion rates, revenue streams, complaint resolution, that kind of thing. Those are now key characteristics of the use cases we support. So instead of talking about APM, now we talk about Digital Performance Management. We spend as much time now talking to CMOs and line-of-business execs as we do with the operations and Dev teams, and that’s new in the last 18 months, but it’s a big part of our push going forward.
So you’re putting an emphasis on DPM vs. APM, and APM competitor New Relic is talking more about software analytics. How do you add it all up?
There’s no question the APM market is morphing. It’s interesting because what we all have is a granular set of information about what’s going on inside those applications that nobody has had before. HP never had it. CA never had it. IBM never had it. They all tried to build IT operation management off of averages and aggregations and correlations as opposed to the raw material about what’s actually happening.
We happen to have the highest fidelity of that data, but guys like New Relic have come in with an easy-to-use cloud app that gives you some monitoring of cloud apps. AWS apps, Azure apps, that kind of thing. That’s all they do, which hurts them when it comes to Tier 1 applications in healthcare, finance, insurance and the rest. But it’s a similar concept in that, if you have all this raw material, you can do more with it than just provide visibility into the plumbing of the apps for the operations guys. You can start surfacing information for the business people because they want real-time visibility in order to react to the market. We’re all thinking about the same thing. In fact, there are probably three of us who think about this market differently than the old generation: ourselves, New Relic and AppDynamics.
One of the unique characteristics of our type of digital performance management is that we use the same source of truth whether you look at it through the lens of the business, the lens of operations, or from the development point of view that gets you all the way down into the lines of code. If you have separate tools, no matter how good you are, you’re going to waste time firing up each tool trying to figure out what happened and when, and it just takes too long.
That real time notion is paramount for the business people. They lean forward at the table when you start talking about real time and the ability to dig in and hand off the results to the appropriate people to address problems by clicking a button.
As organizations spread resources among clouds -- their own and public providers -- I imagine they end up with a patchwork quilt of APM tools. Is that what you see?
They usually have a variety of pieces. Banks, for example, have hundreds. Even ecommerce shops, non-brick and mortar, will have dozens. They all want to consolidate them because when problems emerge everybody shows up with data from their own little tool and you can’t correlate and react quickly enough. That’s one of the things the APM platforms -- or the digital performance platforms in our case -- have been doing, which is gathering more and more data to provide the right level of analytics to make it easier to cover more use cases faster.
My prediction is in the next two or three years you’re going to see a change in the whole IT operations management landscape where the APM-born platforms are going to take over the next wave of IT ops in the cloud because it’s the only high-fidelity information that can take you from the business view all the way to the network and do it end-to-end. We have a platform we introduced last fall called Ruxit, for Real User Experience for IT.
Ruxit will automatically instrument all the applications, processes, and infrastructure behind an entire application. We’ve been beta testing at some of our largest customers, where they have tens of thousands of points of instrumentation, and in just a few minutes it will spider the whole thing and provide a map of all the dependencies. It’s layered with analytics and has significant intelligence, so it makes it easy to get answers: here are your key issues and what you should do about them.
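The "spidering" idea described above can be illustrated with a toy sketch: given which downstream services each instrumented component reports calling, a breadth-first walk of that graph yields the full dependency map for an application. All names and data here are illustrative assumptions, not Dynatrace's actual implementation.

```python
from collections import deque

def map_dependencies(calls: dict[str, list[str]], entry: str) -> set[str]:
    """Return every service reachable from the entry point.

    `calls` maps each service to the services it is observed calling,
    as reported by per-process instrumentation.
    """
    seen = {entry}
    queue = deque([entry])
    while queue:
        svc = queue.popleft()
        for dep in calls.get(svc, []):
            if dep not in seen:
                seen.add(dep)       # first time we see this dependency
                queue.append(dep)   # spider its dependencies too
    return seen

# Example: a web tier calling an API that fans out to a database and a cache.
calls = {"web": ["api"], "api": ["db", "cache"], "cache": []}
```

In a real deployment this walk would run continuously against live call data, which is what keeps the map current rather than a one-time snapshot.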
Think about it. It provides real-time change management on a massive infrastructure. You can’t do that unless you come at it from an application performance bent and have the fidelity of information, the automatic instrumentation, and the analytics we’ve built in.
If companies have a mixed bag of tools today, do they typically bring you in as an integration platform for those tools or to solve a specific problem?
A little bit of both, but most of the time the need to consolidate is the driver. But they start by applying it to one application to make sure that everything works as advertised, which we encourage. There’s way too much shelfware in the software industry.
So they’ll start with an app and once that is proven they see how quick and easy it was and they start to think about the next set of apps. Many of them already have it planned. We’re working with a big hospitality company right now where they acquired some of our software for a small set of apps and now less than six months later they’re coming back for a much larger transaction for a much wider set of applications, so probably 4-5x the size of their original transaction.
It happens that fast because the products are surprisingly easy to implement for the depth of visibility and the value you get from seeing everything from how customers are behaving on your site all the way down to how you can optimize your longest-running transactions to make them faster.
You mentioned IT operations management, where does APM fit into that?
Ten years ago if you were to talk to BMC or CA or HP they would say, “If you want to do IT operations management you’re going to need network monitoring, capacity planning, a change management database, and over here you’re going to need some deep application stuff, some of this, some of that.” They had these big maps of all the different little pieces and that would match up to the ITSM handbook that all the IT guys would carry around with them. But that was built in a static world where, if you made a change it was a physical change or it was an operating system upgrade or the addition of an application or whatever.
Now we’re in a highly elastic world of virtual machines on virtualized hardware and a virtualized network and a stack of applications made from microservices that either you built or which come from third-parties and dynamically bind at the time something is happening. How do you troubleshoot that? How do you even know what’s going on?
Splunk is trying to do it with high performance search of log information, because everybody needs to keep logs. It’s an SEC regulation. It’s a healthcare regulation. And they’re trying to add more and more value on top of logs. What we do is instrument the applications. We come at it through the application lens, looking both up towards the business transactions and conversions and all the way down to the network, whether it’s physical or virtual. And it turns out that approach is much better than the old one, which focused on physical componentry and assumed that if each component was healthy, everything was good.
Do you leverage Splunk somewhere?
Splunk provides value into our environment so it’s very complementary. We don’t compete for budget. They’re in a different category that companies have to spend money on and we’re in another category that companies need to spend money on. Most people don’t have a budget for application performance management. But they know they need to be able to monitor from the outside in to guarantee customer satisfaction and handle complaints in a very efficient manner. That’s why the share of business-side people we talk to in an organization is growing. I’d say it’s maybe 20% now, but it will go to about a third.
Some IT shops are looking for ways to automatically respond to some of the stuff coming at them from various systems. What have you seen in that regard?
It’s a good question. I’ll tell you what we hear and then give you my editorial on top of it. Nobody wants to install a black box and hope it’s going to work properly, but I don’t know if that’s just humans trying to safeguard their roles. But I don’t see why the world wouldn’t want to automate things they know, so if this happens here are the three scenarios of what caused it, figure out which one it is and automatically apply it. The APIs are there, the scripting technologies from Chef and Puppet and Ansible and those guys are there. It’s a matter of whether the people sitting there in the middle want it to automatically do things.
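The "known scenarios, automatic fix" pattern described above can be sketched as a small rule-based dispatcher: classify an alert into one of a few known causes, then apply the scripted remediation for that cause. Everything here (the `Alert` fields, thresholds, and remediation names) is a hypothetical illustration; in practice each remediation would invoke a Chef, Puppet, or Ansible playbook.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    service: str
    cpu_pct: float      # host CPU utilization, percent
    error_rate: float   # fraction of failing requests
    queue_depth: int    # pending work items

def diagnose(alert: Alert) -> str:
    """Map an alert to one of a few known cause scenarios."""
    if alert.error_rate > 0.05:
        return "bad_deploy"   # errors spiking after a release
    if alert.queue_depth > 1000:
        return "backlog"      # work piling up faster than it drains
    if alert.cpu_pct > 90:
        return "hot_host"     # a single saturated host
    return "unknown"          # fall back to a human

# Each known scenario maps to a scripted remediation; stubbed as strings here,
# but in a real setup these would trigger orchestration playbooks.
REMEDIATIONS = {
    "bad_deploy": "rollback last release",
    "backlog": "add worker instances",
    "hot_host": "recycle saturated host",
}

def remediate(alert: Alert) -> str:
    return REMEDIATIONS.get(diagnose(alert), "page the on-call engineer")
```

The key design choice is the explicit `"unknown"` fallback: automation handles only the causes operators have already seen and scripted, and anything novel still goes to a person, which addresses the "black box" concern raised above.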
It’s the concept of going from DevOps to NoOps because NoOps means I’ve written all the orchestration and all the recovery so I’m not going to have those issues and, if I have them, I’m going to come back and write another script so if I ever see this again it’s going to automatically do something else.
You mentioned shelfware and we all know various network tools end up on the shelf because alarms start ringing all over the place and eventually they get turned off or just go ignored. How do you fight that?
That’s the trick. If you don’t have good semantics you end up with lots of alerts that trigger all sorts of things. It could be one simple thing, but twelve systems are alerting because they all got hit at a different point, and people eventually just ignore them. The other thing that happens is someone sees something go red but, because it’s on the other side of some load balancer, it automatically transfers and you presume it didn’t affect anybody and you can go fix that server later.
The outside-in capability shows if that is in fact the case and also identifies the high-impact items by showing what’s happening to users. Then the trick is, how do you correlate and get rid of the false-positive alerts that might have been triggered by something that wasn’t really an issue, like having more traffic than normal. Anybody whose baselines are built purely on historical information keeps getting those alerts, so everybody discards them. We build in more intelligent analytics for automatic baselining, which makes it possible to take into account, for example, whether everything is rising at once. If traffic and CPU are rising together, I’m not going to alert; but if CPU is rising while traffic isn’t, I’m going to tell you about that. Then I’m going to tell you if any users are impacted. That’s the intelligence that’s built in.
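The correlated-baseline rule described above can be reduced to a short sketch: suppress the alert when CPU and traffic rise together, fire it when CPU rises alone. The threshold and function names are assumptions for illustration, not Dynatrace's actual logic.

```python
def should_alert(traffic_delta: float, cpu_delta: float,
                 rise_threshold: float = 0.2) -> bool:
    """Alert only when CPU rises without a matching rise in traffic.

    Deltas are fractional changes versus the rolling baseline
    (e.g. 0.3 means 30% above baseline); 0.2 is an arbitrary
    illustrative threshold.
    """
    cpu_rising = cpu_delta > rise_threshold
    traffic_rising = traffic_delta > rise_threshold
    # CPU and traffic rising together is expected load, not an anomaly.
    return cpu_rising and not traffic_rising
```

A static baseline would alert in both cases; conditioning one signal on the other is what removes the "more traffic than normal" class of false positives.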
Do you get that out of the box or is that after a year of customization?
It’s very much out of the box. Compare that to where we were even five years ago. You’d go talk to CA or IBM and six months or maybe a year later they were done configuring every little potential path through an app. Ours is automatic. And as we go forward, it’s getting to the point where all dependencies are mapped between everything in minutes. And it’s not just once, it’s always mapped all the time, which is unheard of.
On the cloud front, how are your SaaS offerings doing?
Our “as a service” business is about $150 million, between Synthetic and Ruxit, so a significant part of the installed base just wants to use APM as a service. It partly stems from the fact that it’s hard to find real experts in the full stack of monitoring today. We have lots of experts.
What has changed since you became an independent company?
One of the biggest changes is the culture. As an independent business we can take on a culture focused on innovation and customer success. This is now a meritocracy, where your results matter to the business as well as to yourself and your career, as opposed to the “who you know” kind of model. That’s really refreshing for the organization because it strips away clutter and confusion about who reports to who, who does what, etc., especially in the international spaces where we had literally six different businesses sitting in the same office.
That’s all cleaned up now. If you go to our French office it’s Dynatrace, if you go to the Sydney office it’s Dynatrace. It may sound like a little thing, but it’s fundamentally powerful because everybody now identifies with one thing and the success of one thing.
It also enables us as leaders to accelerate change, whether that’s reacting faster to market shifts, or the pace of product innovation, or branding or reengaging the market in terms of what we’re about, how we’re different, those kinds of things. The speed at which we can react has stepped up dramatically.
All of these things are really important to success in this very competitive, dynamic market. My hat is off to Thoma Bravo for recognizing that there was a diamond in the rough here and somebody needed to come in and carve it out and give it its own life and charter and ability to attack the market. They’ve been a great partner.
A great partner in what way?
Here’s one example. They looked at our support costs and said they were too high relative to their portfolio companies. So we looked at that and ended up coming up with a way of actually doing better than their benchmarks and increasing customer satisfaction at the same time. We reorganized and moved from a traditional support model focused on how many cases closed per month, and shifted it over to engineering, where it’s about reducing the total number of cases in the first place by building a better product. Now engineering spends less time on escalations because they are right in the middle of what’s coming in. They get to see what’s coming in on a daily basis and react to them in each and every product.
It just took re-spinning and thinking about it once they showed us the benchmark was off. We’ve probably done at least a half dozen of those transformational kinds of things and it’s made a world of difference in our efficiency, in the satisfaction of customers, in our speed to market. My hat is off to those guys.