The trouble with network monitoring is that the more you know, the more you find you need to do. Some shops will make like an ostrich and do the bare minimum so that they can plead ignorance, while other shops use only what the vendor tells them they need to use. I wanted a solution that neither tied me to a single manufacturer nor hid its head in the sand. At the same time, I demanded a tool that wouldn't blithely send alert blasts I then had to sort through, but would put those alerts into context.
It turns out the solution I was looking for is ScienceLogic's EM7. It's a carrier-class monitoring system capable of scaling to truly staggering proportions, but still appropriate (and affordable enough) for my smaller network at the University of Hawaii School of Ocean and Earth Science and Technology's (SOEST) Research Computing Facility.
Although it has taken a while for me and the IT support group to get comfortable with the ScienceLogic EM7 system, its ability to create a multitenant environment will soon allow us to hand off monitoring duties to the individual labs that run the HPC clusters instead of forcing us to add more personnel. Plus, EM7 lets us easily build monitoring support for equipment that lacks MIBs (management information bases). And like many distributed organizations, we preferred to avoid purchasing a complete setup for every location, yet still wanted the ability to monitor remote sites through firewalls and over WAN links. To this end, we placed the EM7 collectors in key locations using either physical appliances or virtual machines. The collected data is forwarded to the main database en masse instead of clogging up our WAN pipes with constant SNMP traffic.
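The idea of forwarding collected data in bulk rather than as a constant trickle can be sketched in a few lines of Python. This is a generic illustration, not a description of how EM7 collectors work internally; the class name, the `send_batch` callback, and the batch size are all assumptions made for the example.

```python
class BatchingCollector:
    """Buffers locally polled samples and forwards them upstream in batches.

    Hypothetical sketch: send_batch stands in for whatever transport carries
    data from a remote collector to the central database.
    """

    def __init__(self, send_batch, batch_size=100):
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.buffer = []

    def record(self, device, metric, value):
        # Polling happens on the local network; nothing crosses the WAN yet.
        self.buffer.append((device, metric, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One upstream transfer carries every sample gathered so far.
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()
```

With this arrangement, the chatty per-device polling stays on the local side of the firewall, and only the occasional batch crosses the WAN link.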
As in many shops, the workload is distributed among several staff members. The EM7 dashboard system allows us to customize status screens with the information that makes sense for each member of the support team. Thus, the systems dashboard concentrates on Windows, Linux, and Sun machines, while the networking screen focuses on the backbone and distribution switches, and everyone gets a window into overall system health. As we grow more familiar with the system, we'll look at carving off new dashboards for certain labs to provide at-a-glance views of key information on their systems, while shielding them from other labs or our main systems.
Another huge challenge for us was gaining insight into our newer virtualization and cloud environments. Although most small-shop-monitoring systems can harvest SNMP and WMI information from the servers, we needed to know about the VMware and Hyper-V plumbing. Our legacy monitoring system couldn't provide this information. The EM7 system lets us examine the performance of the virtualization hosts and the physical nodes in the context of function and role within the organization.
Built for big networks
One point I want to make very clear is that EM7 should not be compared to WhatsUp Gold or other network monitoring systems designed for the single enterprise. This is a carrier-class system that was born from engineers working with national and international carriers that needed enough flexibility to handle hundreds of entities and a similar number of connected networks. The fact that EM7 can use national weather service map overlays to put potential trouble spots into perspective gives you a good idea of the system scope the ScienceLogic folks are used to. Even so, keep in mind that I've been using EM7 for the past year or so in a single college on a single campus at a single university. Pricing is a function of the number of systems you're monitoring.
In other words, the EM7 pricing structure doesn't differentiate between a flat enterprise network with 1,000 devices to monitor and a network like ours, which includes dozens of labs with projects behind their own firewalls. All of this sits on a collection of NAT Class C IPv4 subnets -- some with public addressing and a smaller number making the transition to IPv6. If you have projects behind NAT firewalls, you can put in a virtual machine collector that feeds systems information to the main EM7 database for the same price as a flat network with a single collector. If you're not virtualized yet, the database and collectors are also available as physical appliances. We make use of both physical and virtual appliances.
Initial setup is quite easy, though EM7 has its own way of doing things. At first, I was confused over where to find features and the use of a whole new terminology. Ultimately, the logic of EM7's naming conventions seems to come back to the multitenant nature of the solution.
For instance, the "registry" handles devices, device groups, networks, users, and just about anything you might typically call "assets," but with a multitenant twist. The same goes for "run book" (I might have used "action items"), which is a collection of items the system will run for notifications, scripts, and cascading actions. The run book is where the automation lives, based on scripts contributed by ScienceLogic and the EM7 user community. My favorite is an automation script we employ in the InteropNet NOC that uses SNMP set commands to turn on a power socket that drives either red or blue flashing lights to indicate major or critical alarms.
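To give a feel for what such a light-switching automation might involve, here is a minimal Python sketch that builds a Net-SNMP `snmpset` command line for a network-controlled power strip. It is not the actual InteropNet script. The OID, the outlet numbers, the mapping of severities to outlets, and the community string are all placeholders invented for illustration; a real power strip documents its own outlet-control OID.

```python
# Hypothetical placeholders: consult the power strip's MIB for the real values.
OUTLET_CONTROL_OID = "1.3.6.1.4.1.99999.1.1"       # made-up outlet-control OID
OUTLET_FOR_SEVERITY = {"major": 1, "critical": 2}  # one light per outlet (assumed)
OUTLET_ON = 1                                      # assumed "switch on" value


def build_snmpset_command(host, severity, community="private"):
    """Return the snmpset argument list that powers on the light for a severity."""
    outlet = OUTLET_FOR_SEVERITY[severity]
    oid = f"{OUTLET_CONTROL_OID}.{outlet}"
    # snmpset syntax: snmpset [options] AGENT OID TYPE VALUE ("i" = integer)
    return ["snmpset", "-v", "2c", "-c", community, host, oid, "i", str(OUTLET_ON)]
```

On a machine with the Net-SNMP tools installed, the returned list could be handed to `subprocess.run` to issue the request.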
The customizability of EM7 is a huge differentiator. After a year of learning the system, the team at Interop 2011 went to town with a wide variety of dashboards created both by the ScienceLogic team and by the InteropNet crew. One was designed to fit the massive 55-inch monitor in the NOC, and it allowed the network operations team to keep an eye on the status of various equipment groups from across the room. Because EM7 dashboards are customizable on an individual basis, we even created HTML5 dashboards so that the team could use an iPad to watch key components. (EM7 dashboards are currently a mix of HTML5 and Flash, but ScienceLogic is migrating more and more of the widgets to HTML5 and away from Flash.)
Another great feature is the ability to have both individual and shared dashboards based on templates that display only the information appropriate to the user account. At the SOEST Research Computing Facility, each research group can have its own set of dashboards to monitor its equipment, while all groups can have access to the school's dashboards that monitor key equipment further up in the network architecture. At Interop, we had different sets of dashboards so that the wireless, VoIP, and router folks could monitor what's important to them, while avoiding status information that might distract them from their mandate.
With a network that spans the entire continental United States, the InteropNet crew needs to know if a storm might affect our network performance. Above, a map of the CenturyLink Cyber Centers and our cross-country links is overlaid by a live National Oceanic and Atmospheric Administration weather feed. Below, an example of EM7 dashboard widgets devoted to different segments of the InteropNet by function.
More signal, less noise
EM7 provides similarly fine-grained control over alerting. Setting up the InteropNet involves putting together a large number of pieces in a short amount of time. To speed the process, the EM7 crew would discover assets and put them into maintenance mode in order to ignore errors as we tested parts of the network during construction. At one point, we knew we had a bunch of optical errors as the fiber crew cleaned and connected more of the show infrastructure, but we didn't want to be overwhelmed by alerts and flashing lights. EM7 allowed us to change the alerting so that small numbers of minor errors would be ignored, but increasing numbers of minor errors over a short time would automatically escalate from minor to major to critical.
The happy result: We weren't notified of optical errors until the routers started load balancing over both our major WAN links. Minor errors were ignored until they started skyrocketing due to the increase in the amount of traffic on our secondary link. This escalated alert got our attention and forced us to look for missed optical path issues. It turned out someone had plugged in a dirty fiber cable that drove some grit into the optics of a 10G interface module. The escalated alert gave us enough time to replace the damaged components before the open of the exhibit floor.
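The escalation rule in this story (ignore a trickle of minor errors, but raise the severity as they pile up within a short time) can be sketched as a sliding-window counter. The Python below is an illustration of the general technique only, not EM7's implementation or configuration syntax; the class name, the 60-second window, and the thresholds are assumptions chosen for the example.

```python
from collections import deque

# Hypothetical thresholds: error counts inside the window that trigger each level.
THRESHOLDS = [(50, "critical"), (20, "major"), (5, "minor")]


class ErrorEscalator:
    """Escalates severity based on how many minor errors fall inside a time window."""

    def __init__(self, window_seconds=60.0, thresholds=THRESHOLDS):
        self.window_seconds = window_seconds
        # Check the highest threshold first so the most severe match wins.
        self.thresholds = sorted(thresholds, reverse=True)
        self.timestamps = deque()

    def record(self, timestamp):
        """Record one minor error; return the alert severity, or None to stay quiet."""
        self.timestamps.append(timestamp)
        # Discard errors that have aged out of the window.
        while timestamp - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        count = len(self.timestamps)
        for limit, severity in self.thresholds:
            if count >= limit:
                return severity
        return None
```

Under these assumed thresholds, a handful of errors per minute produces no alert at all, while a burst like the one caused by the dirty fiber would climb from minor to major to critical as the count grows.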
EM7 has so many features and capabilities, it's hard to do it justice. Setup is easy, and the initial learning curve isn't too painful. It scales like crazy and distributes the load across lots of collectors and database appliances, physical or virtual. You can add any number of collectors -- one for every subnet, if you want -- as part of the license. Best of all, EM7 can be extended through templates and scripting to do just about anything you could possibly want from a management system, even if you don't have a full MIB for the device. In nearly two years of running this system, I haven't found anything it can't manage.
We chose EM7 at the University of Hawaii SOEST because we wanted each research lab to be able to handle its own monitoring and trouble ticketing, instead of managing it at the college or campus level. But along with the multitenant management and billing capabilities, as well as the staggering granularity of delegation, ScienceLogic EM7 has everything needed to manage your entire enterprise from soup to nuts. Its ability to correlate events across a huge variety of platforms, provide context-sensitive views, automatically generate trouble tickets, and flexibly scale without breaking the bank is simply a game changer.
Pros
- Multitenant management and billing
- Fine-grained delegated monitoring
- Context-sensitive, user-specific dashboards
- SNMP monitoring even without a fancy MIB
- Visibility into underlying virtualization platforms (VMware vSphere or Microsoft Hyper-V)
- VMotion-sensitive VMware monitoring
- Service-based performance monitoring (e.g. Exchange email delivery times)
- Vblock and FlexPod support
- Monitors multivendor videoconferencing networks (VTC- and VoIP-specific metrics)
- Installation under 15 minutes, from raw metal to running
- Updates have been flawless and hitless, no restarting required
Cons
- Dashboards currently based on Adobe Flash, but HTML5 coming
- Flow support is optional and part of an OEM agreement
- More expensive than WhatsUp Gold
This story, "Review: The best network monitoring system on earth," was originally published at InfoWorld.com.