How to salvage a (nearly) hopeless software project

Faulty foundations, AWOL contractors, bugs piling up -- here's what to do before taking a sledgehammer to a faltering pile of code

Like a carpenter called in to salvage a home repair gone wrong, developers who've been around the block are used to seeing a handful of the same problems. The code gets creaky; bug reports are filed at an ever-increasing clip; the time spent maintaining the project surpasses any ability to add features to it. At a certain point, the question arises: Can you rehab the code, or should you scrap it and rebuild from the ground up?

We talked with seasoned pros for insights on how they have addressed the most common types of software projects on the brink: projects with runaway costs, poorly architected projects, and ones that simply no longer work.

This is our wrecking crew:

  • Daniel Jacobson, vice president of edge engineering at Netflix, where he was brought on to lead the API team
  • Stefan Estrada, an engineering manager at Verizon who oversees a team of five developers working on the OnCue streaming TV service
  • Dave Sweeton, chief technologist of Stout Systems, a consultancy often called on to take over or repair projects gone awry at Fortune 500 firms, among others

Now let's take a look at our fixer-uppers.

The patch-up job

Not every software project kicks off with a bulletproof plan and a crew of fabled "10x developers" to execute it. Instead, you often get a kind of "homeowner's special" -- a pile of code cobbled together to solve an internal need using the skills of those on hand. In these cases, success can be a curse. Over time, this DIY effort, essential now for business purposes, becomes a tangled mess of bolt-on code and superfluous frameworks, and the only way forward is to bring in an experienced crew and hope for the best.

"There could be many culprits in a scenario like this," says Sweeton, of Stout Systems. "A common one is some fundamental flaw at the architectural level, so the framework of the house doesn't support what you're trying to build. That can be too little framework or too much framework -- too much is actually more common than you'd think."

If you're lucky and the bones are good, you may not have to tear down the whole endeavor and start anew.

"This is where the buzzword refactoring comes into play," Sweeton says, adding that the first step in refactoring code is to return to its requirements.

"I would understand the requirements, then review the code to see if it's meeting those requirements -- and unstated ones like quality and maintainability," Sweeton says. "If there are architectural problems, then sort out what it really should be and figure out the way to get the right architecture in place."

This doesn't always mean rip and replace. As with a house, when you run across problems with code, there are ways to improve the overall situation instead of merely fixing the previous owners' mistakes. Make incremental fixes and every time the code needs work, add an enhancement -- right the wrongs, Sweeton says. And refactor, refactor, refactor.

If it's a project that's only used internally instead of a consumer product, you have a better chance of sticking with it rather than scrapping it.

"But if it's being used by millions of people," Verizon's Estrada says, "then little changes can make a big difference in terms of cost. If it's really creating a bad user experience or costing a lot of money, then there's probably a better way. It's time to start over."

The accidental duplex

This common software project nightmare highlights the importance of strong vision and leadership. Stakeholders are assembled, and the squabbling begins. Users, managers, and engineers can't agree on how to go forward, so the approach devolves into trying to please everyone, with the project manager gathering requirements from everyone in the building. By the time the project gets handed off to engineering, the big picture is lost.

"This is a common problem," Estrada says. "What ends up happening is that you get requirements from almost everyone, then nobody knows what you're trying to accomplish or what should be done first. Of course it cascades into more problems and more problems."

In other instances, executives and project managers can get caught up in note-taking -- priorities and process -- rather than finding the best overall solution.

In any case, soon enough you have the equivalent of three kitchens purpose-built for prepping specific meals, two bathrooms right next to one another, and a garage no one can get into. You set out in pursuit of a unified solution and ended up with myriad pet projects housed under the same roof. The result satisfies nobody.

Before you can even think of what to do with the code on this kind of rehab job, you have to go back to square one -- this time, say our pros, with vision. There has to be one blueprint that everyone agrees is the best plan.

"You have to have a foundation in terms of the architecture to make sure it accomplishes what you are trying to do," Estrada says. "You have to get input from all the stakeholders, but someone from management on down has to drive that vision, or you get really bad architecture in terms of the software design."

With the new unified plan in place, you can return to the code with purpose and better prioritize how to fix the mess.

"When prioritizing, you're basically saying: 'Which things are we going to do today or this week?'" says Netflix's Jacobson. "The right question is: 'Do we care about this at all?' And if we do, then make it happen -- and if we don't, then make it gone."

The castaway

It's not uncommon for a software project to simply be left unfinished, with the developer AWOL. Perhaps the core has been sketched out, a few features have been implemented, but the contractor, dev shop, or employee who launched the project simply left one day, never to return. The code works, barely or not at all. Or perhaps the project was completed but lacked a maintenance plan and is now falling into disrepair. What next?

First, the bad news: If you did the hiring, there's not much point in finger-pointing.

"If one person created the whole thing, the problem may be the manager," Jacobson says. "And if you're the manager, you've done a poor job."

All of our pros offer the same tip: Avoid it in the first place. One way to plan for this contingency is to cross-train the team to ensure more than one person understands the source code.

"Let's say the team working the project has six developers," Estrada says. "One is working on some of the functionality for online payments, another person is doing personal account management or presenting data to the user. You switch them over. Rather than assign the person who built the original functionality, get a different person to fix problems or create new features. They get exposure to that area and the source code. If the person working on account management gets lost in some foreign trip and never comes back, somebody else can still pick it up without much struggle."

If consultants are creating the code, Sweeton says it's still important to get multiple developers in the loop: "Engaging a lone consultant has inherent risk if you don't have direct-hire developers on your staff. Make a direct hire or engage a consulting company that approaches the development project with resource redundancy."

The money pit

The bills for this makeover are so extreme it would make a spouse -- or accounts payable -- flip out. The architecture had problems from the start, and predictably, boatloads of cash didn't solve them. How do you stop throwing good money after bad?

Sometimes the instinct is to prioritize and dig your way out. But then you end up with a long list where everything -- and nothing -- is urgent. Time goes by, more issues are reported, and the result is a backlog that would take years to resolve.

"It becomes a game of tactical cat and mouse," says Jacobson. "You're not attacking anything with real focus."

Decide what's important, say our pros, and don't get lost in gradients of importance.

Jacobson explains: "I was having a conversation with one of my peers who was saying, 'We need to add this to the priority list because it keeps getting pushed down and not executed.' And I said, 'It's clearly not important; let's not add it to the priority list. If it were important, we'd be doing it. Or if it is important, then we have to stop doing everything else.'"

In some cases, managers see an outside firm as the solution to fix the problem, but consultants can compound the errors if the software's purpose isn't clearly articulated by those doing the hiring. The company keeps throwing money at the problem -- when the solution isn't about cost.

"The consultants will do what they're hired for," says Estrada, "but because there's not a clear vision as to what they're supposed to be doing -- say, they're adding more features, but that might not be the end goal of what the company is trying to achieve. You need a good set of program managers who can communicate between all the groups to make sure the right functionality is being created."

The total teardown

Sometimes a rehab job isn't a rehab job at all. The foundation has obvious cracks, and everybody can see them. The architecture is so poorly designed it needs to be scrapped and started anew. But how do you know when it's time to toss the entire project? Why not stick with code you have and exterminate the bugs?

"Here's the litmus test: If you're spending more time on bugs than you are on functionality, then you need to rethink the implementation and start over," says Verizon's Estrada. "If it's a complete patchwork that needs constant maintenance, then you're wasting a lot of time."
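Estrada's litmus test can be written down as a simple comparison. The sketch below is only an illustration: the `WorkLog` record, the `needs_rethink` function, and the hour figures are hypothetical, not something Estrada or Verizon described using. It assumes a team already logs hours against bugs and features separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkLog:
    """Hours logged in one period (hypothetical record, not from the article)."""
    bug_hours: float
    feature_hours: float


def needs_rethink(logs: List[WorkLog]) -> bool:
    """Apply the litmus test: True when bug time exceeds feature time overall."""
    bug_total = sum(log.bug_hours for log in logs)
    feature_total = sum(log.feature_hours for log in logs)
    return bug_total > feature_total


# Illustrative numbers only: three periods in which bug work keeps growing.
sprints = [WorkLog(30, 50), WorkLog(45, 35), WorkLog(60, 20)]
print(needs_rethink(sprints))
```

Summing across several periods rather than checking a single one is a design choice in this sketch, so that one rough week does not trigger a teardown on its own.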

It's a daunting task to throw out the old and start anew. Not everybody is going to be happy about it. If a demolition is necessary, our pros say communicating that effectively to the various workgroups is the first step.

"Within the first couple of months of being at Netflix," Jacobson recalls, "I basically said, 'This application that all of you are using, we're going to throw that away. We're going to go with a new approach.'"

But ditching the system immediately isn't usually an option. In Jacobson's project, too many legacy devices counted on the existing platform and it would take time to transition to the new tools.

"We said we're going to invest in a fundamentally different model," Jacobson says, "a more optimized model that is not one-size-fits-all -- and it's going to be at some cost to my teams to execute and to the other teams who have to consume from it rather than from the previous one. We need to take a chance here and have confidence in ourselves and go for it. That's what we did. And it worked."

The creaking new construction

New construction always leads to some settling. The pipes rattle. The beams creak. But when it keeps happening, you may start to wonder: Is this normal, or does it need repair? A wise product manager once said about newly launched code: "All new babies cry." She was letting users know the system wasn't broken -- it would eventually settle down.

"Parents learn the difference between cries," says Sweeton. "Some are best left alone. Others are the genuine article distress call. Run -- don't walk -- to handle. The same is true in software development."

Determine the minimum viable product, Sweeton says, and focus on that. While you're nailing it down, figure out what can wait. "Maintain a list of all other items that will need attention, but defer them until after the first release."

You have to be clear-eyed, says Jacobson. Look at the reality of the situation and assemble the evidence. Then make your recommendation. Here, metrics for assessing the health of the project -- and the cost of healing it -- are key.

"It's not just that I have this great idea, or I didn't build it, so I want to build my own thing. Say: 'We've talked to this number of developers, we're churning this number of development hours fighting against this system, we're exposing greater risk to our availability because we're handling this number of requests. We're actually producing complexity and bugginess,'" Jacobson says.

If the current model is faltering under its own weight, he adds, it's time to create a stronger and healthier option for the future. "Whatever those metrics are -- those data points that are exposing the weakness of the current system -- they're fuel to say, 'This needs to go away.'"

The retrofit

Increasingly common is the case in which the foundation is solid, but the techniques that produced it are dated. The technology used may have been the right choice at one time but no longer. Maybe too many dependencies are no longer supported. Or it relies on old tech that few people know how to support anymore. Can you prop up this project and keep it going, or is it time to call in the wrecking ball?

Estrada recalls working on an app where there was a push to use HTML5, which seemed like the latest and best choice. But as the project went on, inefficiencies became apparent -- it wasn't ready for prime time yet.

"We redid the project in C++, and it takes a lot more effort because C++ is a much lower level," Estrada said. "But we achieved better performance. If you create your platform correctly, you develop a process that makes it easy to add new features."

Consultants might be a good fit here, says Estrada, where you have an isolated technology that needs updating. But there's a caveat: He argues that bringing in consultants is typically a bad choice to fix an internal tool in need of new features. If the tool is tied into other systems, consultants won't have the big picture or a long-term investment in the outcome.

"Business process should drive the software," says Sweeton. "Sort out the right business process, then adapt the code to it."

If Sweeton's sentiments suggest a theme, there's good reason for it. Runaway projects tend to be missing one of three elements: a clear vision from management, project managers who communicate effectively, and strong tech leadership. Projects that don't have all three? They tend to need renovation on more than one level.
