It's the last phone call you want to get: A disgruntled ex-employee was caught sneaking out of your company's network operations centre. No one knows exactly what damage was done, but now the network is running haywire. Traffic is twice what it should be. The servers are so overloaded with bogus messages that some users are completely shut out of the network. If this system isn't fixed fast, business could grind to a costly halt. One of your staffers tried to find the bug, but -- talk about bad timing -- his pregnant wife just went into labour. Now what?
Not a hoax, not a dream, not an imaginary tale. The above scenario is a simulated disaster -- a practical exam, really -- thrust upon IS professionals at the end of a recent "Trouble-shooting TCP/IP Networks" class conducted by the American Research Group Inc. (ARG), a Cary, North Carolina-based IT training vendor. The problem is real -- a hacker has unleashed a nasty bug -- but the drama is played out not on a real company's information systems, but on a faux network rigged together in a hotel conference room in Schaumburg, Illinois. The trouble-shooters are 18 network professionals completing the two-day course, and the hacker -- no ex-IS employee gone postal -- is instructor Rick Gallaher, an engineer and educator with 26 years of networking and telecommunications experience. "By the way," Gallaher warns as he sets the scene, "no class has ever found this bug."
The clock starts; the trouble-shooters-in-training have one hour to solve the puzzle. For the network-savvy students who pride themselves on their innate abilities to sniff out problems, the exercise is a fun challenge. But for those rookies tapping into new skills, this is a real test. Gallaher, an imposing man with a soft demeanour, sits behind his Looney Tunes tie and smiles; he's been here before.
He knows that no matter what size company these students come from, no matter how technically proficient they are, they need more than just tools and methodologies to shoot this trouble. They need each other. Whether they reach this realisation before time expires -- that's the true test.
This isn't a class for CIOs, but it's one that CIOs should be aware of for their IT staffs. When the "Now what?" comes, the answer should be to send an IT professional who understands the ins and outs -- technical and emotional -- of trouble-shooting. More than just another expensive training session, the class is remarkable for the way the skills are taught. Students don't just sit through two days of lectures and readings, take a certification exam and then go back to work and promptly forget half of what they learned. Here the students still earn their valued certificates, but the lectures are minimal and the reading is supplementary. The real teaching is done via a live intranet network set up in the classroom to give students the chance to practise what's been preached. It's on this simulated intranet -- complete with two servers (one running Windows NT 4.0, the other Unix), a Cisco router, up to seven hubs and 12 PCs -- that students get hands-on experience with the tools that help prevent, detect and eradicate network errors. By participating in a series of network disaster drills, students develop practical skills that they retain and can apply back at their jobs. And if they fail to solve the problems -- or if they make them worse -- well, thank God it's a fake network. You wouldn't want network professionals trying these trouble-shooting tricks for the first time at home.
Know this, too: This is training that network professionals want. Networking skills are among the toughest to find in the competitive IS marketplace, and knowledge of the TCP/IP communications protocol, the lingua franca of the Internet, may be the hottest of the hot skills. Since April 1997, ARG has offered its trouble-shooting course anywhere from two to six times a month to groups of 20 or more students throughout the United States. (Earlier this year the course was introduced to the Pacific Rim countries and Europe.) Of the 18 individuals who signed up for this session, easily a third registered of their own volition just to improve their personal skills cachet. The unspoken message here is that if CIOs won't provide this training, then employees will go out and get it themselves -- or go to work for a company that will give them access to these skills.
These students represent a range of enterprises, and their reasons for taking the class are just as diverse. Christine Teno, for example, is a network technical analyst at Grubb & Ellis Co., a global real estate firm based in Northbrook, Illinois. Experienced with other protocols, she simply needs more hands-on experience with TCP/IP. David Spivey, the director of network services at On-Line Financial Services Inc. in Oak Brook, Illinois, worries that his company's network trouble-shooting is done mostly on the fly; he needs to learn more about tools and methodologies. Evan Bounous, a network specialist from U.S. Central Credit Union in Kansas City, Kansas, doesn't work with TCP/IP at all currently, but he wants the expertise.
And although each of these students has to take off at least two days from work to attend the class, clearly this is no vacation. There is a textbook to read and lectures to hear, but the bulk of the session will be devoted to dealing with simulated disasters. "Breaks are contingent upon how fast you trouble-shoot the network," Gallaher says. "[The time] you leave at night is contingent upon how fast you trouble-shoot the network. Sound like you're back at work? That's the idea."
The Tools and Tricks
Underlying the trouble-shooting curricula are some basic concepts that apply to networking or crashing applications or anything else: Pretend it's a fire drill and stay calm. In this instance, Gallaher describes two types of trouble-shooters. First is the superhero, who rushes into danger and saves innocent users from the ravages of a downed network. And then there's...hmm, how does one say this nicely? You know how in parades, when the horses and elephants march by, they're followed by a grim-faced man with a shovel? This guy represents the other type of network trouble-shooter. "If you control your network tightly, you're the superhero," Gallaher says. "But if you let people do what they want on the network and then just clean up after them, you're the guy behind the horses and elephants." The difference between the two archetypes is methodology: The superhero has one, while the shovel-wielder wings it. The superhero methodology boils down to seven basic steps that should be taped on the wall of every IS shop:
-- Define the problem.
Don't leap to conclusions -- or rely solely on a user's observations -- about a perceived network problem. Test for yourself.
-- Gather facts.
Did the system ever work? What changed? Have any new devices or services been added to the network? These simple questions can help unravel complex answers.
-- Consider the possibilities.
Think what could be wrong before you start pulling plugs and switching off servers. Use the process of elimination to sort through potential problems.
-- Create an action plan.
This is especially key in a large organisation where multiple people may be trouble-shooting a single network. Draft a detailed chart of procedures and assign individuals to specific tasks.
-- Implement the plan.
After each step of the plan is complete, test to see if the original problem has been solved. This process helps isolate areas affected by the problem and shows you when you're homing in on a solution.
-- Observe and document results.
Even if you haven't solved the original problem, by documenting the results of your plan you can identify and compile effective and ineffective trouble-shooting techniques.
-- Ask whether the problem is solved.
If yes, rejoice; if not, start over.
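The "implement, test, document" part of the steps above can be sketched in a few lines of Python. This is only an illustration, not anything taught in the class: the `Action`, `Log` and `run_plan` names are hypothetical, and the `network` dictionary is a toy stand-in for real connectivity tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    description: str           # what will be changed
    owner: str                 # who is assigned to do it (the action plan)
    apply: Callable[[], None]  # the change itself


@dataclass
class Log:
    entries: List[str] = field(default_factory=list)

    def record(self, text: str) -> None:
        self.entries.append(text)


def run_plan(actions: List[Action],
             problem_solved: Callable[[], bool],
             log: Log) -> bool:
    """Apply one action at a time, re-test after each, and document the result."""
    for action in actions:
        action.apply()
        solved = problem_solved()
        outcome = "solved" if solved else "not solved"
        log.record(f"{action.owner}: {action.description} -> {outcome}")
        if solved:
            return True
    return False  # plan exhausted: start over with new facts


# Toy network state (assumption): both faults must be fixed to restore service.
network = {"router_cable": False, "hub_port": False}


def fix(key: str) -> Callable[[], None]:
    def _apply() -> None:
        network[key] = True
    return _apply


plan = [
    Action("reconnect router cable", "team A", fix("router_cable")),
    Action("restore hub port", "team B", fix("hub_port")),
]
log = Log()
solved = run_plan(plan, lambda: all(network.values()), log)
```

Because the test runs after every single action, the log shows which change actually cleared the fault, and it remains useful even when the plan fails.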
Before the tyro trouble-shooters are turned loose, Gallaher introduces them to the fundamentals of the trade. The tools are basic network management systems -- LAN meters, protocol analysers and software packages (for example, Cinco Network Inc.'s NetXray) that allow analysts to measure network activity.
Corollary number one to the rules of network trouble-shooting is also fundamental: Don't get hung up on the tools. No network management device is worth its salt if it's not used within a strict trouble-shooting methodology.
Before he unleashes "The Big Bug," Gallaher tests the students with some simpler exercises. The first scenario: Users complain that they cannot communicate with anyone, anywhere. What's wrong?
The students pair off around the PCs, boot up NetXray, test their individual cables with LAN meters...and after half an hour have nothing to show for their efforts. The glitch was simple enough -- Gallaher disconnected a wire from the router and switched one at the hub -- but none of the pairs followed item number four from the trouble-shooting methodology; each tried to solve the problem solo. Had teams shared data with others working off different hubs, they would have isolated the real trouble in minutes.
Sobered, the students resolve to do better with the second exercise. The story line: Movers were in over the weekend, and they've completely reconfigured the network. Some users can communicate with others; some can't. The goal: Restore connectivity before Monday's productivity plummets.
This time, directed by Gallaher, three groups of six people form, assign roles and start tackling the problem by the book. Before long, after checking cable connections and network access at each workstation, the groups compare notes and easily isolate the problem: The movers (a.k.a. Gallaher) switched connections between workstations and their host hubs and servers. The solution is as simple as reconnecting a few wires.
Flush with this success, the students eagerly call for The Big Bug.
And they fail. Miserably.
Like every trouble-shooting class before them, these people ply the right tools and tricks -- they even work well (mostly) in their six-person teams. But this is a larger network problem that requires teamwork from the entire class (in the real world, the teamwork of all IS might be required). Rather than pool common data and observations among the other teams, each group tries to be the superteam. Spivey and Bounous work together on one team that tries to find the source of the network traffic surge, but simultaneously another group is busy unplugging workstations all along the network. Teno's squad splits into two camps, each seemingly determined to manage the trouble-shooting process.
Neither succeeds. Ultimately, rather than flying high as superheroes, each team ends up knee-deep in the stuff the parade guy shovels.
After an hour, Gallaher calls the exercise to a halt. Then he unmasks The Big Bug, which turns out to be a veritable ant: The hacker had switched cables between the two servers and used NetXray to generate a continuous spray of old network traffic. Period. Had the three teams coordinated their individual tasks, they would have quickly identified the duplicate traffic and isolated the culprit machine.
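To illustrate how pooled observations would have exposed the duplicate traffic, here is a minimal Python sketch. It assumes each team has reduced its capture to `(source, payload)` pairs; that representation, the addresses and the `find_replay_suspects` name are all hypothetical and do not reflect NetXray's actual output.

```python
from collections import Counter
from itertools import chain


def find_replay_suspects(frames, min_repeats=3):
    """Rank sources by how many duplicate frames they sent.

    frames: iterable of (source, payload) tuples.
    A frame seen at least min_repeats times counts as replayed traffic.
    """
    seen = Counter(frames)
    duplicates = Counter()
    for (source, _payload), count in seen.items():
        if count >= min_repeats:
            duplicates[source] += count - 1  # every copy after the first
    return duplicates.most_common()


# Each team sees only part of the flood on its own hub (made-up data).
team_a = [("10.0.0.5", b"old frame")] * 20 + [("10.0.0.7", b"hello")]
team_b = [("10.0.0.5", b"old frame")] * 20 + [("10.0.0.8", b"GET /")]
team_c = [("10.0.0.5", b"old frame")] * 10 + [("10.0.0.8", b"GET /")]

suspects = find_replay_suspects(chain(team_a, team_b, team_c))
```

With the three captures merged, the single machine replaying old traffic stands out at the top of the list, while ordinary repeats from other hosts stay below the threshold.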
Yet, as Gallaher points out, there was no central control because that's the way life is in the workplace. Network disasters don't come packaged with coaches to prod trouble-shooters when they stray from their methodology. If there is any one lesson to take away from this class, it's this: Theories and tools are fine, but real trouble-shooting requires fidelity to process and to leadership.
And if the students were grateful for any one thing, it's that they learned this lesson in a mock disaster, not in a real crisis that could have cost them far more than hurt pride.
Within a month of taking the class, the freshly certified trouble-shooters say they have already applied their new skills. Teno not only has the TCP/IP knowledge she sought, but she also finds herself sharing the knowledge with her organisation's other analysts, who form the first line of response to network emergencies. "I used to jump right in and try a variety of trouble-shooting [methods]," Teno says. "I now question [the analysts], asking them if they tried this or that and what were the results when they did. This teaches them to [stop and] think as well as be more proactive."
Spivey also imparts the trouble-shooting methodology to his staff. "Often, problems [in our network] exist that we do not have the time to investigate because we are a 24/7 shop; we just put out the fire and continue," Spivey says. "The methods illustrated and demonstrated in class help in formulating a plan of attack in the real networking world. Also, since I am the director here, if the problem has not been solved within a certain time period, I can follow the method from class and narrow the problem down to where the engineers have not followed through. Lately, [problems] have been with the documentation portion of the method."
For Bounous, the payoff is twofold: the certification, which he feels makes him more valuable in the IS field, and the network tools, which he hopes to apply to other situations. "I don't know that I'll utilise most of what I learned in the real world, at least in my current job," Bounous says. "[But] I still enjoyed the class and would go to more like it in a heartbeat."
Students who took earlier courses report even greater impact back at the workplace. Joan Vehock, a network tools administrator at Lucent Technologies Inc. in St. Petersburg, Florida, took the trouble-shooting class this past spring and, at the time, thought that she'd have no quick opportunity to ply the skills.
Yet almost immediately after returning to work she received some flawed network architecture specifications from a new corporate customer.
Vehock was able to spot the problem (the network addressing was botched) for the technicians who ultimately fixed it. Her efforts saved time and money for both Lucent and the customer. "I would not have been able to pick out that problem if I'd not gone through the class," she says. Since then, Lucent has designated this trouble-shooting class as its training benchmark within Vehock's organisation, and the TCP/IP analyst certification is being used as a selling point with prospective customers.
Vehock's and the other students' experiences form a good testimonial for simulated disaster training. Some executives undoubtedly question the merit of paying real money for network professionals to learn trouble-shooting on a virtual network. But here's a question for the executives: Can you afford to have these lessons learned by trial and error on your network? And remember, your people don't get just tools and techniques for the money; they get lessons in teamwork and leadership -- qualities that befit not just today's network professionals, but tomorrow's CIOs. Seems like a small investment for the yield.
How Many Users Does It Take to Screw Up a Network?
Get a bunch of network professionals together and what do they talk about? Users. Dumb users. At one point during the "Trouble-shooting TCP/IP Networks" class, students and instructor alike sit back to trade stories about dumb users.
Linda (not her real name), a network analyst at a major national insurance firm, tells about the company president who expects -- and gets -- a new PC every time his current machine crashes.
Then there's the senior executive -- and more than one network professional can lay claim to one of these -- who rousts network personnel out of bed at midnight because he has trouble dialling up his Internet investment sites. He thinks it's a network problem.
In his role as administrator of a hospital network, class instructor Rick Gallaher says he once got a call from the operating room during surgery. "The network is down!" the medical staffer said. Gallaher rushed to the OR and quickly diagnosed the problem on one of the PCs: It wasn't turned on. Gallaher laughs, and the students nod knowingly. They've been there.
Funny stories all of them, but beneath is a truism: Users can be a trouble-shooter's best friend. They rarely know what's wrong, and they hardly ever can explain why, but users do know when their computers aren't working as they should. If trouble-shooters take time to listen to users and try to understand their problems, Gallaher says, then network problems sometimes can be much easier to diagnose and fix.
And, of course, taking time to tell users of the importance of power switches and electrical plugs can't hurt either.
(Senior Writer Tom Field can be reached at firstname.lastname@example.org.)