It's 11 p.m. -- do you know what your Web stack is doing? Or not doing? If you're working for any serious firm, you have to know what's going wrong because the website is the front door, your mall presence, your receptionist, and your permanent booth in the big trade show called the Internet, all rolled into one. Even if the business is officially closed, insomniacs and people on the other side of the globe are going to come knocking.
The job of monitoring this public face is undergoing a transformation that would have been unthinkable just a few years ago. At the beginning, programmers started writing their own scripts that would ping a few pages, then send a text message if something wasn't right. After that, companies started selling monitoring tools you would install in your data center. Now a number of companies are popping up to offer monitoring as a service. These solutions sit out on the Internet, watching your site and setting off alarms if something doesn't work.
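That first generation of scripts was rarely more than a loop, a timeout, and a notification hook. A minimal Python sketch of the idea, with placeholder URLs and a print statement standing in for the text message:

    # A minimal sketch of the homegrown approach: poll a few pages, raise an
    # alarm when one stops answering. URLs are placeholders; early scripts
    # would fire off an email or SMS where the print statement sits.
    import urllib.request

    PAGES = ["https://example.com/", "https://example.com/login"]

    def is_up(url, timeout=10):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except OSError:  # covers URLError, timeouts, HTTP errors
            return False

    for url in PAGES:
        if not is_up(url):
            print(f"ALERT: {url} is not responding")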
In the past, companies simply didn't outsource monitoring. Oh sure, it might be worthwhile to have some agents ping your websites from around the world if you had a global presence. Otherwise, most enterprise managers wanted everything running in their own servers in their own racks safely behind their own locked doors.
The cloud has changed this. If your website is going to be knit together from a bunch of machines that may be anywhere, there's no real reason to insist on doing all of the monitoring from the same rack. Or if your Web application is going to live in the cloud, reacting to the traffic and load by growing and shrinking, it makes sense to consume monitoring services that can also grow and shrink with demand.
It's worth noting that watching your website to make sure it's up can also be accomplished with more traditional tools like New Relic and AppDynamics, which offer sophisticated agents that watch the entire Java stack. If you want to know which method is misfiring, the agents are tracking everything that's happening in the Java VM. Programmers often find it much easier to fix the problem when they can see what's going on in excruciating detail. But while that data is useful for the programmers, the rest of us often don't need it.
To get a feel for how the market is changing, I dug into four new monitoring packages -- Boundary, Circonus, and Librato's Metrics and Silverline -- that are delivered as a service. I set up a cloud of servers, installed some agents, and watched the pretty colored lines go up and down. While they all share a cloud-based delivery model, each takes a different approach to detecting problems.
Boundary doesn't drill down into your application, but watches the network traffic as it flows in and out of your servers. Just keeping track of these comings and goings can be surprisingly useful. A big, fat zero for some flows is a red flag that something is wrong.
Circonus offers a surprisingly large collection of "checks" that can probe your system and track its performance. You can deploy all sorts of metrics and standards to watch all of the most important parts of a Web app's performance.
Librato provides a highly automated pair of tools. Metrics sucks up data from your server stack and turns it into graphics. Silverline acts as a powerful sandbox that limits the resources your application stack can consume. Use Metrics to track your resource consumption and Silverline to put limits on the servers that try to gobble up too much.
Some of these options require embedding local agents, while others watch from afar. Once their collectors gather the information, they send it to a central server in the cloud where it's dressed up and presented on a Web-based dashboard. You get the information you need to track what your website is doing, but none of the hassles of running your own monitoring server.
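The collector pattern itself is simple: sample local numbers once a minute, ship them over HTTPS. Here's a rough sketch of that loop under a hypothetical endpoint and payload shape -- no vendor's actual API:

    # A rough sketch of the collector pattern, assuming a hypothetical
    # endpoint and payload shape -- not any vendor's actual API. An agent
    # samples local numbers once a minute and ships them over HTTPS.
    import json
    import os
    import time
    import urllib.request

    ENDPOINT = "https://collector.example.com/v1/samples"  # hypothetical

    def sample():
        load1, _, _ = os.getloadavg()  # Unix-only one-minute load average
        return {"host": os.uname().nodename,
                "ts": int(time.time()),
                "load1": load1}

    def push(payload):
        req = urllib.request.Request(
            ENDPOINT,
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)

    while True:
        push(sample())
        time.sleep(60)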
Boundary: Follow the network flows
Boundary is a monitoring tool based on the idea that sometimes you don't need to capture so much information. In the United States, for instance, the police need to convince a judge to issue a warrant before they can eavesdrop on a phone conversation, but they don't need one to get a list of the phone numbers a person called. While the list of numbers sounds boring, it can be surprisingly revealing despite not holding any information about the content of the calls. The list is simple, efficient, and free of all the small talk and gossip that fill up real phone conversations.
Boundary takes a similar approach to monitoring your server stack. While Circonus and Librato Silverline will poke and prod, looking for information about the OS and its applications, Boundary installs itself in the networking corner of the OS and quietly keeps track of all of the traffic between all of the machines. These flows are distilled into a concise illustration that explains where the data is going and who is doing the talking.
It's not exactly true that Boundary works like a "pen register," the name for the device police use to capture the numbers dialed on a phone. Boundary will also dig into the header of each packet and classify it according to the type of traffic. It's not doing deep packet inspection, so it won't tell you what lies beyond the header, but it can tell you something about the packet type and size.
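Header-level classification is simple enough to sketch. This snippet pulls the protocol and total length out of a raw IPv4 header without ever touching the payload; it illustrates the general technique, not Boundary's implementation:

    # A sketch of header-level classification: pull protocol and total
    # length out of a raw IPv4 header and ignore the payload entirely.
    # This shows the general technique, not Boundary's implementation.
    import struct

    PROTO_NAMES = {1: "icmp", 6: "tcp", 17: "udp"}

    def classify(packet: bytes):
        total_len = struct.unpack("!H", packet[2:4])[0]  # bytes 2-3
        proto = packet[9]                                # protocol field
        return PROTO_NAMES.get(proto, str(proto)), total_len

    # A fabricated 20-byte IPv4 header claiming TCP and 1,500 bytes:
    hdr = bytes([0x45, 0]) + struct.pack("!H", 1500) + bytes(5) + bytes([6]) + bytes(10)
    print(classify(hdr))  # -> ('tcp', 1500)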
This header-level information is surprisingly useful. One of Boundary's smarter features lets you group your machines into tiers. Boundary automatically starts watching the traffic between the groups, which lets you isolate the performance of each layer. Is there an abnormally large amount of traffic going to the database? Perhaps some of the queries are misfiring.
You end up with a general feel for the Web stack -- a pulse, you might call it. The graphs of traffic start falling into a pattern that indicates everything is healthy. Boundary also lets you write rules that will trigger alerts, but I'm not so sure this is an easy thing to do. It's much easier to write a simple rule for trouble when you are looking at the content of the packets and noticing that a database query is returning no lines when it should be finding a pile of information. It's bound to be harder when you're working with just some basic information about the flow.
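For what it's worth, a flow rule boils down to a threshold on byte counts, something like this sketch with made-up flow records:

    # A sketch of a flow rule: flag any tier-to-tier flow that flatlines.
    # The flow records and the threshold are illustrative.
    def flatline_alerts(flows, min_bytes=1):
        """flows maps (src_tier, dst_tier) to bytes seen in the last minute."""
        return [pair for pair, nbytes in flows.items() if nbytes < min_bytes]

    flows = {("web", "db"): 0, ("web", "cache"): 48213}
    for src, dst in flatline_alerts(flows):
        print(f"ALERT: no traffic flowing from {src} to {dst}")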
There are other advantages to just skimming. Packet headers don't hold personal information about you or your customers, so it's safer to export to a distant Web service owned by a third party.
I found the Boundary tool fascinating. Watching the flow of data move around the network is a bit counterintuitive because our instincts lead us to ask for as much data as possible. But the volume of data moving around any significant network is so large that we almost have to start thinking at this higher level.
The Boundary service is priced by the hour. However, the company isn't sharing the rates publicly yet.
Boundary's input and output graphs show which protocols are used to carry the data in the cloud.
Circonus: Monitoring from the ground up
Whereas Boundary paints the big picture with one brush, Circonus has a different brush for every color in the rainbow. The core of the system is your collection of "checks" -- that is, queries executed to test the system. The simplest may be loading a URL and tracking how long it takes for the data to arrive. The most complex may be elaborate queries on the database to track back-end performance.
There is a surprisingly large number of checks -- several dozen at least. Some dig into remote processes like databases, and others gather information from third-party sources like Google Analytics. There are even generic ones that fetch JSON blobs from URLs. The tool typically walks you through a wizardlike progression of forms that culminates in testing the check before saving it.
Circonus takes the data from these checks and turns them into metrics. My request for a URL, for instance, generated six values ranging from the time it takes for the first byte to arrive to the total number of bytes delivered. The price of your Circonus package will depend upon the number of metrics you store. Some checks deliver more metrics, and you can configure how they store the data.
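Two of those six values are easy to reproduce by hand from a single fetch: the time to first byte and the total bytes delivered. A sketch of the measurement, not Circonus's own code:

    # A sketch of what a URL check measures: time to first byte and total
    # bytes from one fetch -- two of the six values, not Circonus's code.
    import time
    import urllib.request

    def url_check(url):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=10) as resp:
            first = resp.read(1)                 # wait for the first byte
            ttfb = time.monotonic() - start
            body = first + resp.read()           # then drain the rest
        return {"ttfb_s": round(ttfb, 3),
                "total_s": round(time.monotonic() - start, 3),
                "bytes": len(body)}

    print(url_check("https://example.com/"))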
Once the data is in Circonus, you can start graphing the information and putting it on dashboards. Circonus has a great collection of graphing widgets, but they range in complexity and capability. The basic gauge, for instance, wanted me to specify a minimum and maximum value, a task it should do on its own. Ideally, it should be easy to configure the meter to start flashing when a value moves out of a historical range.
The map widget, on the other hand, will plot out locations on a global map. But it's not just any view of the world. You can choose a map based on the old Mercator projection that has distorted our view of the world since grade school, or you can pick among seven other options, including the Gall-Peters, an "equal area" view that will save you from overestimating the amount of land in Greenland. If only my website had readers in Greenland.
Circonus pricing is tiered. The lowest level, called Copper, covers two servers for $50 per month. The price per server falls as you add servers, running up to $750 per month to watch 50 servers. Each package includes a set number of metrics.
Circonus will install its monitoring solution inside your data center for a negotiated price, an option that might be more attractive than a hosted service to large enterprises and governments.
Circonus maintains a collection of agents around the world that can be summoned to ping your site.
Librato: Simple metrics or resource control
The Librato world comes in two parts. The first, called Metrics, is an online database for whatever you measure, connected to graphing and alerting engines. The second, Silverline, offers the instrumentation tools that dig up the numbers you might want to watch on your server. You can use one without the other or use them both, but they're independent tools that don't talk to each other. If you log into one, you won't automatically be logged into the other.
The Metrics system is a Web-based graphing package for any kind of data you want to collect. You gather together your information in a JSON package and POST it to the Metrics URL. The next thing you know, your graphs are changing as the new data arrives.
Metrics is a flexible system because almost any software today can send an HTTP message. JSON-based logging is becoming common, even for C-level programming. It took me only a few seconds to start sending messages with the curl command line. The biggest glitch seemed to be that I wasn't including decimal points in my values, sending bare integers instead.
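For the record, a single reading is one small POST. This sketch follows Librato's documented v1 endpoint and payload names as I recall them -- treat both as assumptions -- and note the decimal point in the value:

    # One gauge reading as one small POST. Endpoint and payload names follow
    # Librato's documented v1 API as I recall it -- treat them as assumptions.
    # Note the decimal point in the value; bare integers were my glitch.
    import base64
    import json
    import urllib.request

    USER, TOKEN = "me@example.com", "api-token"  # placeholders
    payload = {"gauges": [{"name": "login.latency", "value": 0.42}]}

    auth = base64.b64encode(f"{USER}:{TOKEN}".encode()).decode()
    req = urllib.request.Request(
        "https://metrics-api.librato.com/v1/metrics",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Basic " + auth})
    urllib.request.urlopen(req, timeout=5)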
The Metrics look is quite pretty and very simple by design. There are few buttons and few options, echoing the design style of companies like 37signals. Much of the complexity is hidden beneath the surface, and the Metrics folks clearly want to build something that "just works." If you POST a new measurement in your JSON package, Metrics lists it with the others. There's no need to update a schema or add another line to a configuration file -- whew! If the data shows up, Metrics assumes you want to track it.
This approach is ideal if all you want to do is keep track of basic statistics and watch them scroll across your screen. You get two types of data: gauges and counters. The graphs have a few basic date ranges, which confused me at first because my sample scripts carried old timestamps. While Metrics is smart enough to recognize new data fields, it quietly drops data stamped outside the current window.
In addition to JSON/curl and a number of open source collection agents, Librato points to several more libraries and packages for assisting in reporting. The collection is heavily skewed toward modern, trendy stacks such as Node.js, Clojure, and Erlang. But in a tip of the hat to those who graduated more than 10 minutes ago, Python and PHP are also supported. Java? There's some code for handling JMX, but you could be the first to contribute a full library.
The Silverline half of Librato is aimed at managing servers at the OS level. It burrows into the OS and sets itself up to watch over the applications. While other tools focus on the system as a whole, Silverline wants to drill down into the application.
The simplest thing that Silverline does is report resource information to its central server. It tracks basic numbers such as CPU load, network bandwidth, and disk bandwidth to help you get a handle on what the machines in your cluster are doing.
This is just the beginning -- Silverline also sneaks its way into the scheduling algorithms controlling the server. Once you put an application in a Silverline "container," Silverline can limit the resources it can consume, a process Librato calls "workload virtualization." Background tasks such as disk defragmentation can be given minimal headroom. High-computation tasks like converting photos or videos can also be limited so that a burst of uploads won't swamp your machines. You can control these spikes more easily at the OS level.
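Silverline doesn't publish its internals, but the underlying idea -- capping a workload without wrapping it in a full virtual machine -- can be sketched with ordinary Linux control groups. The paths assume cgroup v2 mounted at /sys/fs/cgroup and root privileges; everything here is illustrative:

    # Silverline's mechanism is proprietary; this sketches the same idea
    # with plain Linux cgroups (v2, mounted at /sys/fs/cgroup, run as root).
    import os

    CG = "/sys/fs/cgroup/batch"  # a new group for background work

    os.makedirs(CG, exist_ok=True)

    # Cap the group at 10% of one CPU: 10,000us of runtime per 100,000us.
    with open(os.path.join(CG, "cpu.max"), "w") as f:
        f.write("10000 100000")

    # Move this process (and its future children) into the capped group.
    with open(os.path.join(CG, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))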
Librato Silverline monitors the essential metrics of the system. This graph shows a rise in network writes.
This is, I think, a great idea. For a number of years, people have punted on all of the trouble with the OS configuration, instead setting up another virtual server running another copy of the OS. Every application was pushed off into its own virtual server even if this isolation wasn't required. This is largely a waste of OS copies, and it creates as many problems as it solves. Perhaps we can go back to one layer of the OS running directly on the iron without any virtualization sucking up CPU cycles.
The pricing of the Metrics service is set up like a taxi meter: You pay for every data item you store. Sending 50 measurements every minute costs $4.46 per month, for example. Librato provides an HTML5 calculator so that you can come up with your own measurements, but the rates seem to be entirely linear. Silverline is priced at $0.006 per CPU core hour. Librato's online calculator indicates that a month of virtual control over a single-core server would cost $4.38.
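The Silverline figure checks out against a 730-hour month:

    # Checking Librato's Silverline number: $0.006 per core-hour, one core,
    # for a 730-hour month (24 * 365 / 12).
    print(round(0.006 * 730, 2))  # -> 4.38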
Deploying monitoring as a service is also quite useful for navigating corporate politics. There are no data center czars citing every possible reason against setting up the new server. You just start paying by the hour; everything else is not your responsibility.
Of course there are worries too. Data about your site and its performance will be flowing into someone else's computers, and you'll probably never know where that data may end up. The monitoring service itself may not even know. This may be problematic if some of your customer data ends up in the log files being monitored. In most cases, the tools only capture information about speed and latency, but in other cases they even snarf SQL queries and the responses. The latest craze of embedding more and more parameters in URLs means that even lowly log files can hold sensitive info. These dangers can usually be avoided with care, but the potential for trouble remains.
Aside from these issues, the cloud-based monitoring tools are quite similar to the monitoring tools that we used to buy as software packages. They watch the network, ping the servers, download Web pages, and track the performance of URLs. If something seems wrong, they start sending emails or text messages. That's about it, but it may be all you need at 11 p.m.
This article, "Review: 3 Web stack monitors in the cloud," was originally published at InfoWorld.com. Follow the latest developments in applications and Web application development at InfoWorld.com. Get a digest of the key stories each day in the InfoWorld Daily newsletter. For the latest business technology news, follow InfoWorld on Twitter.
Read more about applications in InfoWorld's Applications Channel.
Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.