Cloud architecture: Questions to ask for reliability

How do you know your cloud provider has the right web-services architecture? Gregory Machler offers these questions to ask.

I've been an architect on some complex applications, and I have a significant concern about assessing architectural risk for public and private cloud applications. Traditional risk assessments focus on external and internal access to confidential information such as social security numbers, credit card numbers and, for banks, ATM PINs. Access controls and network protection are high priorities because they reduce that risk.

I'm interested in something a little different -- I'll call it architectural reliability. The goal is to avoid single points of failure for critical applications so that catastrophic errors don't occur; such errors lead to huge financial losses and a diminished corporate brand. So, where would I start to shore up the architecture? Here are some diagnostic questions I would ask about the top-10 applications within a corporation. Some of the questions are pertinent to all applications, and some apply only within a given domain. I'm going to focus on just the storage and networking product domains that support those top-10 applications.

Storage Architecture -- All Applications

Is only one SAN vendor used for storage of all of the applications?

How is data de-duplication addressed?

Is only one SAN switch vendor used for all of the applications?

Is only one data replication vendor used?

Is only one encryption vendor used to encrypt data for all of the applications?

Which encryption algorithm is used for a given encryption tool?

Is only one PKI vendor used to manage certificates?

Where are the certificates related to data at rest encryption stored?

Storage Architecture -- Each Application

What storage subsystem does the application run on?

Which other applications run on the same subsystem?

Is the data on the storage subsystem replicated elsewhere or is this the only copy?

How is the need for more data storage addressed for a given application?

What SAN switch is used for traffic to/from the storage subsystem?

What network components are used to replicate SAN data from one data center to another remote data center?

What is the application that performs data replication?

What is the software version and release for the data replication application?

Which encryption vendor is used to encrypt Confidential data on a given storage subsystem?

Does the storage for the encryption tool also run on a SAN shared with other applications?

Can corruption of the encryption data affect multiple applications or just this application?

What PKI vendor is used?

What version and release of PKI software is deployed?

Network Architecture -- All Applications

Is there only one switch/router vendor?

Is there only one firewall vendor?

Is there only one Intrusion Prevention System/Intrusion Detection System (IPS/IDS) vendor?

Is there only one load balancer vendor?

Is there just one telecommunications vendor for internet and/or wide area network (WAN) access?

Network Architecture -- Each Application

Which switch/routers are used within the data center?

Which switch/router models are used?

Are the switch/routers in an architecturally redundant design?

What embedded software version and hardware model are used in each switch/router deployment?

Which firewall vendor is used?

What models of firewalls are deployed in the data center?

Is only a limited number of firewall permutations (embedded OS version, hardware model, features) deployed?

What intrusion prevention/detection products are deployed?

Which intrusion prevention/detection vendors are used?

What permutations of IPS/IDS are deployed in the data center?

What version of IPS/IDS software is deployed?

Which vendor's load balancers are used?

Which load balancer model is used?

What is the version of the load balancer's embedded software and model of hardware?

Are they used to steer traffic between different global data centers?

Are the load balancers redundant, so that one could instantly take the place of another?

What telecommunications vendors are used for internet access?

What WAN telecommunications vendor is used for traffic between data centers?

What WAN telecommunications vendor is used for traffic between offices and the data center?

Is the telecommunications equipment redundant?

Are the underground telecommunications fiber routes physically separate?
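
The permutation questions above (firewalls, IPS/IDS, load balancers) can be answered mechanically once the devices are written down. What follows is only a minimal sketch in Python: the Device record, its field names, and every vendor, model and version value are hypothetical placeholders, not data from any real environment.

```python
from collections import Counter
from typing import NamedTuple


class Device(NamedTuple):
    """One deployed device. All values used below are made-up examples."""
    kind: str        # e.g. "firewall", "ips_ids", "load_balancer"
    vendor: str
    model: str
    os_version: str


def permutations_by_kind(devices):
    """Map each device kind to a Counter of (vendor, model, os_version) combos."""
    result = {}
    for d in devices:
        combo = (d.vendor, d.model, d.os_version)
        result.setdefault(d.kind, Counter())[combo] += 1
    return result


# Hypothetical inventory, standing in for the answers to the questions above.
INVENTORY = [
    Device("firewall", "VendorA", "FW-100", "8.1"),
    Device("firewall", "VendorA", "FW-100", "8.1"),
    Device("firewall", "VendorA", "FW-200", "8.3"),
    Device("ips_ids", "VendorB", "IPS-10", "3.2"),
]

if __name__ == "__main__":
    for kind, combos in permutations_by_kind(INVENTORY).items():
        print(f"{kind}: {len(combos)} distinct permutation(s)")
        for combo, count in combos.most_common():
            print(f"  {count} x {combo}")
```

A small number of permutations per device kind keeps disaster recovery testing realistic; a long list suggests that each combination may never have been tested under failure.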

These questions cover a large chunk of the storage and networking diagnostics. I'm sure that I've missed some, but they should give a flavor of what the critical web applications are using within the infrastructure cloud layer. They give insight into whether or not a failure in a given product would affect multiple applications, and they help companies design and tune the architecture so that redundancy can be created in all products where possible. Then the failure of a given product does not cascade to multiple critical applications. It is very likely much cheaper to over-engineer, thereby anticipating and reacting well to failure, than it is to suffer very expensive cloud services downtime.

The questions about whether only one vendor is used for a given product type reveal a potential enterprise weakness. Full reliance on one vendor can lead to significant failure if a specific hardware/software release of a product has a flaw that surfaces only under stressful conditions. Then all cloud applications that use that product would be negatively affected. The other questions address what I'll call use congestion: multiple applications share the same component (storage subsystem, server, or firewall), so a failure of that component affects all of those applications simultaneously.
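
Both weaknesses can be checked from the collected answers. The sketch below is illustrative only and assumes the answers have been recorded as a simple mapping from application to product type to a (vendor, component) pair; the application names, vendors and component identifiers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical answers: application -> product type -> (vendor, component id)
APP_DEPENDENCIES = {
    "billing": {"san": ("VendorA", "san-01"), "firewall": ("VendorC", "fw-01")},
    "trading": {"san": ("VendorA", "san-01"), "firewall": ("VendorD", "fw-02")},
    "payroll": {"san": ("VendorA", "san-02"), "firewall": ("VendorC", "fw-01")},
}


def single_vendor_product_types(deps):
    """Return product types for which every application relies on one vendor."""
    vendors = defaultdict(set)
    for products in deps.values():
        for product_type, (vendor, _component) in products.items():
            vendors[product_type].add(vendor)
    return sorted(pt for pt, vs in vendors.items() if len(vs) == 1)


def shared_components(deps):
    """Return {component id: sorted apps} for components used by several apps."""
    users = defaultdict(set)
    for app, products in deps.items():
        for _product_type, (_vendor, component) in products.items():
            users[component].add(app)
    return {c: sorted(apps) for c, apps in users.items() if len(apps) > 1}


if __name__ == "__main__":
    print("Single-vendor product types:", single_vendor_product_types(APP_DEPENDENCIES))
    for component, apps in shared_components(APP_DEPENDENCIES).items():
        print(f"{component} is shared by: {', '.join(apps)}")
```

With this sample data, the first check flags the SAN as a single-vendor product type, and the second shows which storage subsystem and firewall would take down more than one application if they failed.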

In summary, this article focuses on architectural reliability. It presents a set of questions focused on products within the storage domain, encryption of data at rest, and the networking domain. Since products cost much less than application downtime, over-engineering is encouraged where possible. The need to deploy more product vendors must be balanced against the need to limit product and feature permutations so that realistic disaster recovery scenarios can be tested. Please see a previous article that I wrote on this. I'll visit other cloud layer diagnostic questions in the next article.
