Inside Cisco's private Cloud
- 05 March, 2014 19:14
Do private Clouds work? You bet they do, says Cisco, which has more than two years of experience under its belt with its Cisco IT Elastic Infrastructure Services (CITEIS) Cloud.
Today, 90 per cent of the company's 30,000 operating system instances are virtualised, including all 15,000 virtual machines in CITEIS, which is supported by Cisco Unified Computing System blade servers tied together with Nexus switches.
Cisco employees can dial up server resources in minutes, and even line up virtual datacentres at the click of a button, says John Manville, senior vice-president, Information Technology, Global Data Centers. Manville is responsible for all of the company's infrastructure, which includes compute, storage and network platforms as well as databases and middleware.
With the core now in place, the company is looking forward to rolling out its Application Centric Infrastructure, the Software Defined Networking (SDN) vision the company announced in November last year. Manville says some ACI components are already in beta and ACI should be fully deployed by calendar year 2017 (more on the expected benefits below).
One of the first datacentres to get ACI gear is the company's newest, a 160,000-square foot building in Allen, Texas, with 35,000 square feet of "raised floor" that was completed in 2011 (see our in-depth tour, in pictures, and our recent update). This datacentre was built from the ground up to use UCS and is the crown jewel in Cisco's Global Data Center Strategy, a multifaceted, multi-year plan to consolidate and modernize the company's datacentres to ensure Cisco had the capacity and resiliency needed to support the business.
"The first step on our journey into the Cloud was to get to an x86 architecture with UCS and Nexus where we could actually begin virtualizing in a large way," says James Cribari, Manager, Information Technology Global Infrastructure Services. "And then the next thing was to build intelligent automation to take it to the next level. Not just a Web portal, but a portal tied to an electronic service catalog with electronic orchestration built into the provisioning tools. Then the third step is to logically segment the work to make sure customers don't get their peanut butter in their jelly. You have to make sure you isolate and can manage that environment."
When the Global Data Center strategy was rolled out in 2008 Cisco had 51 datacentres, some for engineering and others for production, and the goal was to optimize the entire footprint to become more compact and dense, while balancing where critical workloads are assigned. "We're down to 36 today, and by the end of 2014 we hope to be down to 26," Cribari says.
But that number will always fluctuate as the company gains resources through acquisitions, Manville says. "Although we actively consolidate even the engineering data centers, we can't always do that because of latency considerations, the tools they're using, etc."
On the production side of the house Cisco today has two datacentres in the Dallas area, one in Amsterdam and one in Singapore. And it has still others that support both development and production work in: Raleigh, N.C.; San Jose, Mountain View; St Leonards, Australia; and Bangalore.
All of the company's production and backup environments are on UCS today, Manville says, and all of the virtual machines that support the business are being managed in the company's private cloud. Besides the flexibility achieved, the shift has reduced costs dramatically, he says.
With CITEIS, Cribari says Cisco set out to keep virtual machine subscription models simple: Internal IT customers coming in through a Web portal can either use CITEIS Express to acquire virtual machines provisioned for a 30-day lease, or sign up for Virtual Data Centers (VDCs) that require a quarterly commitment. VDCs come in Medium (75 VMs), Large (120 VMs), and Jumbo (360 VMs).
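To make the tiering concrete, here is a minimal Python sketch of how a request might be matched to a VDC size. Only the three tier sizes come from the article; the function name and the "smallest tier that fits" rule are assumptions for illustration, not a description of how CITEIS actually works.

```python
from typing import Optional

# Tier sizes as reported in the article; everything else is hypothetical.
VDC_TIERS = [("Medium", 75), ("Large", 120), ("Jumbo", 360)]


def smallest_vdc_tier(vm_count: int) -> Optional[str]:
    """Return the smallest tier whose VM quota covers vm_count.

    Returns None when the request is larger than the biggest tier
    (assumption: such a request would need more than one VDC).
    """
    if vm_count < 1:
        raise ValueError("vm_count must be at least 1")
    for name, quota in VDC_TIERS:  # ordered from smallest to largest
        if vm_count <= quota:
            return name
    return None
```

Under this rule a request for 100 VMs would land in the Large tier, since it exceeds the Medium quota of 75.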
"We built an easy to use Web portal based on an electronic service catalog of IT offerings" that lets users easily build sophisticated environments in short order, Cribari says. "Let's say you want a virtual data center with 100 VMs. You go in and select how much storage you want and in what increments you want to grow that storage, and then you select other things like, what kind of network do you need? Do you need a DMZ or an Internet facing network, or do you just want an internal production network." Users can then layer on Platform as a Service (PaaS) options, such as an Oracle database schema or an Apache server.
Cribari says Cisco can provision a standard CITEIS Express VM, which authorized users can order without approval, in less than 15 minutes (in a demonstration, a VM showed up in four minutes).
"If I click to provision one VM it pulls up my details based on my login and it shows my department ID and where it's going to be built, and then I can select VMware ESX or OpenStack KVM," Cribari says. "So I pick one and the service catalog goes back to orchestrate our inclusions, and soon I'll get an email saying 'Your VM is provisioned. It's ready to use. Here is your VM name and your IP.' And once you click that, you're up and running."
"Pre cloud, we would have to architect the server, then design it and find data center space, and then go through procurement, installation, configuration, and then secure it and deploy it," he says. "That process lasted anywhere from six to eight weeks. Now with CITEIS we're provisioning virtual machines in minutes, because you build them before they come. You're building the infrastructure, you're setting the policy standard, and then you're provisioning applications into the environment you constructed."
Therein lies a potential problem: Given the fluidity of the environment, how do you protect against runaway usage?
"The right sizing of components in a private Cloud is going to be a challenge for a lot of enterprises," Cribari says, "making sure you keep your finger on the pulse, knowing what's provisioned, and managing those resources."
It starts with basic controls.
"The good thing about a Web-based portal is you can manage who gets billed for that resource and even put in approvals," Cribari says. "So if Kirti orders a virtual machine and you don't want Kirti to order that machine because he doesn't have enough budget, you can, as his manager, say, 'No, that's not happening.' You can manage the environment so it doesn't get out of control."
With CITEIS Express, if you order a VM you are going to pay for it for 30 days, even if you only used it for a day. "When you get into Virtual Data Centers we provision for a quarter because it costs us to build and provision it," Cribari says. "So there are some guidelines and standard best practices that we're going to have to develop so our application owners understand."
To safeguard against resources getting stranded, Cisco does quarterly audits. "If someone says they need 32-gig of storage and they only use a fraction of that, we see that and take it back," Cribari says.
Anything under 30 per cent utilisation (40 per cent on production systems) is considered underutilized and is reclaimed after checking with the owner -- and put back in the resource pool. "Typically we see anywhere from 70 per cent to 100 per cent utilisation," he says. "Anything between 40 per cent to 70 per cent is an ideal state. And if anything gets to more than 70 per cent to 80 per cent, that's where we say we need more resources."
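The audit bands can be expressed as a short Python sketch. The 30 and 40 per cent reclaim floors and the 70 per cent upper bound come from the figures quoted above; the function name, the labels, and the choice of 70 (rather than 80) as the point where more resources are flagged are assumptions made for illustration.

```python
def classify_utilisation(pct: float, production: bool) -> str:
    """Place a resource in one of three hypothetical audit bands.

    Floors: 40% on production systems, 30% elsewhere (from the article).
    Ceiling: 70%, an assumption; the article quotes "70 to 80 per cent".
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    reclaim_below = 40 if production else 30
    if pct < reclaim_below:
        return "reclaim-candidate"  # reclaimed only after checking with the owner
    if pct > 70:
        return "needs-capacity"
    return "ok"
```

Note that the same 35 per cent reading is a reclaim candidate on a production system but acceptable on a non-production one, because the floors differ.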
Infrastructure capacity planning in this agile, virtual environment is done the same way as in traditional data centers, Cribari says. "If the engineers see that a virtual environment is hitting a threshold -- 70% in production, 60% in non-production -- we get a team together to figure out what's the next logical upgrade? How much do we need? How do we design it and provision it before we need it?"
It just tends to be easier, says Kirti Thakkar, a Cisco Information Technology Engineer: "There's a UCS cluster here in Allen that can support 1,000-2,000 VMs, but probably right now it's running 200. So when it reaches 80 per cent we can simply add another blade. We have a roadmap that keeps track of what's coming in the pipeline, so with the architecture and roadmap showing supply and demand, we can see what's needed in the next one to two quarters."
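Thakkar's example reduces to simple headroom arithmetic, sketched below in Python. The 80 per cent trigger and the 200 running VMs are his figures; the function name is hypothetical, and taking 1,000 VMs (the low end of his 1,000-2,000 range) as the cluster capacity is an assumption.

```python
def vms_until_threshold(capacity: int, running: int, threshold_pct: int = 80) -> int:
    """Return how many more VMs fit before the cluster hits the threshold.

    Integer arithmetic avoids floating-point rounding on the limit.
    """
    if capacity <= 0 or running < 0:
        raise ValueError("capacity must be positive and running non-negative")
    limit = capacity * threshold_pct // 100  # VM count at which to add a blade
    return max(0, limit - running)


# Low end of the quoted capacity range, with 200 VMs running today.
headroom = vms_until_threshold(capacity=1000, running=200)
```

With those inputs the cluster could take on another 600 VMs before reaching the 80 per cent mark, which is the kind of margin a supply-and-demand roadmap would track.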
Before Cisco started to aggressively pursue virtualization and build out its private cloud, Manville estimated that the total cost of ownership of a physical server inside a data center (taking into account hardware, operations costs, space, power, people, etc.) was about $3,600 per quarter.
In a Q&A we did with Manville in the fall of 2010, he estimated that virtualization, combined with the move to UCS and cloud technology, would push that cost down to around "$1600 - on average - per operating system instance per quarter." He said that, if the company got "a little bit more aggressive about virtualisation and squeezing applications down a little bit more, we think we can get the TCO down to about $US1200 per operating system instance per quarter."
The reality today? "We've blown that figure away," Manville says. The average virtual machine now costs Cisco $US300 to $US400 per quarter. "And that includes some aspects of the middleware, some aspects of databases, it includes what we would assume a middle-sized application would use in terms of storage, and obviously the network and compute layer."
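For a sense of scale, the quoted figures can be turned into percentage reductions with a few lines of Python. The dollar amounts are Manville's; the helper name is hypothetical, and comparing a physical server directly with a virtual machine assumes each hosts one operating system instance.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


PHYSICAL_TCO = 3600       # per quarter, pre-virtualization estimate
TARGET_2010 = 1200        # the more aggressive 2010 target
VM_COST_RANGE = (300, 400)  # current per-VM cost per quarter

vs_physical = [reduction_pct(PHYSICAL_TCO, cost) for cost in VM_COST_RANGE]
vs_target = [reduction_pct(TARGET_2010, cost) for cost in VM_COST_RANGE]
```

On those numbers the current cost is roughly 89 to 92 per cent below the original physical-server estimate, and about 67 to 75 per cent below even the aggressive 2010 target.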
These figures are based on normalized costs, Manville says, meaning he uses average prices that customers would actually see.
Layering on SDN
The next big step forward for Cisco IT will be adding the company's recently announced Application Centric Infrastructure (ACI) technology, Cisco's take on Software Defined Networking.
Manville expects a broad set of benefits.
In terms of application service migration, for example, he says ACI is "going to allow us to move not just the virtual machine, which is relatively easy these days, but all the services around it. And that, I think, is a major step forward."
And he expects to see gains in operational excellence because of information they will be able to collect from the switches themselves. "That's going to allow us to do a lot more than we can today," Manville says. "If a port on a switch is dropping too many packets we can take it out of service automatically rather than having some CCIE hunt around looking for the degradation. So we think there's an awful lot of things we can do to be much more proactive."
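The kind of automated check Manville describes might look something like the Python sketch below. It is purely illustrative: the counter format, the port names, the 1 per cent drop threshold, and the function name are all assumptions, and it does not use or represent any Cisco API.

```python
from typing import Dict, List, Tuple


def ports_to_disable(
    counters: Dict[str, Tuple[int, int]], max_drop_ratio: float = 0.01
) -> List[str]:
    """Return ports whose drop ratio exceeds max_drop_ratio.

    counters maps a port name to (packets_received, packets_dropped).
    Ports that have seen no traffic are skipped rather than flagged.
    """
    flagged = []
    for port, (received, dropped) in counters.items():
        if received > 0 and dropped / received > max_drop_ratio:
            flagged.append(port)
    return sorted(flagged)


# Hypothetical readings: only the second port exceeds a 1% drop ratio.
sample = {"eth1/1": (1000, 5), "eth1/2": (1000, 50), "eth1/3": (0, 0)}
flagged_ports = ports_to_disable(sample)
```

In practice the interesting part is what feeds the counters and what happens after a port is flagged, which is where the switch telemetry Manville mentions would come in.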
But one of the main advantages will be adoption of the Application Policy Infrastructure Controller. APIC will provide a high-level abstract language for programming configurations, making configuration much easier, Manville says. "For example, we won't have to get into identifying specific ACLs anymore. We'll be able to speak at a much higher level, and also do that in an automated way. The ACL thing is a major issue for us and customers. We have thousands of ACL lines, and that's very difficult to manage, to make sure that ACL is doing what you expect it to do. So one of the key advantages is around configuration."
How is that achieved? Manville says, "The APIC has the construct of endpoint groups, which are like closed user groups. You set them up and then put in virtual machines, VLANs and other capabilities, and specify what traffic can go to that endpoint group and what traffic can come out. And that's done at a much higher level than writing an ACL and saying, 'This IP address can only speak to that IP address on this port.' And because we can now automate a lot of these things we don't have to have as much back-and-forth with application teams. In many cases the application teams can actually decide what they want from the infrastructure themselves using a GUI."
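The difference between group-level policy and per-address ACLs can be illustrated with a toy Python model. This is a conceptual sketch only: the class, its methods, the group names, and the addresses are invented for illustration and are not the APIC object model or API.

```python
class GroupPolicy:
    """Toy model: rules name groups, not individual IP addresses."""

    def __init__(self) -> None:
        self._group_of = {}    # endpoint -> group name
        self._allowed = set()  # (src_group, dst_group, port)

    def add_member(self, group: str, endpoint: str) -> None:
        self._group_of[endpoint] = group

    def allow(self, src_group: str, dst_group: str, port: int) -> None:
        self._allowed.add((src_group, dst_group, port))

    def is_allowed(self, src: str, dst: str, port: int) -> bool:
        src_group = self._group_of.get(src)
        dst_group = self._group_of.get(dst)
        if src_group is None or dst_group is None:
            return False  # endpoints outside any group are denied by default
        return (src_group, dst_group, port) in self._allowed


policy = GroupPolicy()
for ip in ("10.0.1.11", "10.0.1.12"):
    policy.add_member("web", ip)
policy.add_member("app", "10.0.2.21")
policy.allow("web", "app", 8080)  # one rule covers every web endpoint
```

Adding a third web server here means one `add_member` call and no new rule, whereas an address-based ACL would need another line for each new source and destination pair.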
That should drive down OPEX costs, Manville says, but he is also looking for increased efficiencies: "If we can raise the abstraction layer so an application person doesn't have to know about subnets or why this IP address can't talk to that one, we should be able to significantly lower the friction, the time it takes to interact between the application and the infrastructure teams to provision and configure the infrastructure for whatever the application needs."
ACI deployment plans
Although Cisco has been testing ACI in the Allen data center, the first major implementation is going to be in an engineering data center in San Jose using the ACI switches in "standalone mode," Manville says.
"Probably around June'ish we'll have the APIC running in an engineering data center handling all the networking aspects in that data center," Manville says. "In the other data centers we're planning to implement the fabric almost standalone and then migrate applications and workloads one by one. And that's going to take a series of quarters for us to integrate with the release schedules of the applications, doing our testing of those applications, understanding the dependencies of those applications, so we can make best use of the endpoint group capability."
He anticipates ACI being deployed in most environments by calendar year 2017.
VMware pushed out?
When it comes to software controlled networks, Cisco is facing bold new competition from VMware with its NSX technology (see "SDN showdown: Examining the differences between VMware's NSX and Cisco's ACI").
VMware isn't shy about its intention of worming its way into the market by putting a network shim below virtual servers and relegating the physical network to the lowly job of creating tunnels between virtual resources (see "SDN will never happen, says VMware exec").
So, given that market threat, will Cisco migrate away from the VMware technology that underpins so much of the company's private Cloud?
"That's a great question," Manville says. "A part of CITEIS now has OpenStack as the environment and KVM as the hypervisor. At the moment we're giving CITEIS users a choice between an OpenStack or VMware environment. And there are definitely advantages to both.
"On a business level, there are parts of VMware that are aggressively competing with Cisco and we're aggressively competing with them. However, there are other parts of Cisco that have a reasonable relationship with VMware. VMware has certain advantages over OpenStack, but OpenStack is catching up pretty quickly. In a year or so it's going to be a much different landscape, and time will tell whether our users and maybe Cisco itself will choose to move more aggressively towards OpenStack."