How to Pick a CPU When Buying Servers
- 01 February, 2013 14:58
As 2013 rolls in and the economy stabilizes, many IT organizations are looking to upgrade their computational and storage systems. As with any IT purchasing decision, there are tradeoffs to consider and choices to make regarding hardware features and the technology available. When it comes to storage servers, the first step is understanding your CPU options.
Intel vs. AMD
For at least this year, the two server CPU choices remain Intel and AMD. ARM might solve some of the computational parts of some of the problems, but in 2013, ARM won't have enough I/O bandwidth with 10 Gigabit Ethernet ports and storage to make it a viable alternative. This might change for 2014, but it's too soon to predict as development of PCIe buses with enough performance capability is complex.
The latest AMD CPUs have 16 cores, but only if you are running integer operations. When it comes to floating-point operations, you have only eight cores. This, combined with the fact that the latest Intel server processors can read and write data from memory significantly faster than AMD processors, means that AMD processors should be relegated to operations with low computational intensity that do not require high memory bandwidth. You might think of things like VMs, but more on why this is not a good idea later.
Communications Between CPU Sockets
Another area where Intel has a major advantage is communications between CPU sockets. The current crop of Intel server CPUs supports 25.6 gigabits per second (Gbps) of I/O bandwidth between CPU sockets over the QuickPath Interconnect (QPI).
This performance combined with the per-socket memory bandwidth performance exceeds the current performance of AMD CPUs. On multi-socket machines, this has a dramatic impact on the performance for all of the sockets because a process might be making a request for which memory has been allocated on another socket.
PCIe Bus Drives Intel Ahead
PCIe is where the rubber meets the road on why the latest Intel processors are far ahead of their AMD competitors. The Intel technology on the latest server CPUs runs PCIe 3 with 40 lanes on each CPU.
That means that the PCIe bus and the CPU are capable of 40Gbps of I/O bandwidth, far more than is available on AMD processors. So if you need to do a lot of network I/O or disk I/O, Intel is the better choice for two reasons: PCIe 3 roughly doubles the bandwidth of PCIe 2.0, and the Intel CPU supports more PCIe lanes.
It's Intel's Year But There Are Still Issues
There is one problem with the new Intel CPUs that becomes more noticeable with quad-socket configurations. As mentioned earlier, the PCIe bus is on the CPU socket so with four sockets you have four PCIe buses with 40 lanes each for a total of 160 lanes of 1Gbps PCIe bandwidth. That is a lot of I/O bandwidth, but looking a bit deeper there is a problem:
The QPI connection between sockets is a dual-channel link at 12.8Gbps per channel, for a total of 25.6Gbps.
The PCIe bandwidth of a socket is 40 lanes at 1Gbps per lane, or 40Gbps of PCIe bandwidth to the socket.
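The two figures above can be put side by side with a few lines of arithmetic. This is only a sketch using the article's own numbers; the function name `socket_io_budget` and the constant names are illustrative, not part of any vendor tool.

```python
# Figures taken from the article; names are illustrative assumptions.
LANES_PER_SOCKET = 40        # PCIe 3 lanes on each CPU
GBPS_PER_LANE = 1.0          # article's per-lane figure
QPI_CHANNELS = 2             # dual-channel link between sockets
GBPS_PER_QPI_CHANNEL = 12.8  # article's per-channel figure


def socket_io_budget(sockets):
    """Summarize PCIe capacity against the QPI link for a given socket count."""
    pcie_per_socket = LANES_PER_SOCKET * GBPS_PER_LANE
    qpi_link = QPI_CHANNELS * GBPS_PER_QPI_CHANNEL
    return {
        "total_lanes": sockets * LANES_PER_SOCKET,
        "pcie_per_socket_gbps": pcie_per_socket,
        "qpi_link_gbps": qpi_link,
        # How much local PCIe bandwidth a remote socket cannot reach over QPI.
        "shortfall_gbps": round(pcie_per_socket - qpi_link, 1),
    }


budget = socket_io_budget(4)
print(budget["total_lanes"])     # 160
print(budget["shortfall_gbps"])  # 14.4
```

On these numbers, each socket's PCIe bus can deliver about 14.4Gbps more than a process on another socket can pull across QPI.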
Problems quickly arise when PCIe bandwidth exceeds 25.6Gbps and the process requesting access to a PCIe bus is not running on the socket that owns that bus. Some of the attempted workarounds lock processes to the socket whose PCIe bus they read or write, but this does not work for all applications. For example, applications with data coming in and going out of multiple locations, such as a striped file system, are still affected because you cannot split a request and move each piece to the socket that owns the corresponding PCIe bus.
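The article does not say how the socket-locking workarounds were implemented. As one possible illustration, on Linux a process can be restricted to a set of logical CPUs with Python's `os.sched_setaffinity`. The helper name `pin_to_cpus` is hypothetical, and which logical CPUs belong to which socket is machine-specific, so the caller is assumed to supply that list.

```python
import os


def pin_to_cpus(cpu_ids):
    """Restrict the current process to the given logical CPUs (Linux only).

    Returns the affinity set actually in effect after the call.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed on this platform")
    allowed = os.sched_getaffinity(0)   # CPUs this process may currently use
    target = set(cpu_ids) & allowed     # drop CPUs we are not allowed to use
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)     # pid 0 means the calling process
    return os.sched_getaffinity(0)


# Example (assumption: logical CPUs 0-7 sit on the socket that owns the
# PCIe bus of interest; check the machine's topology before relying on this):
# pin_to_cpus(range(8))
```

Pinning like this only helps when a process talks to devices on a single socket, which is why it falls short for workloads such as striped file systems.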
The real-world performance for general-purpose applications running on a four-socket system is likely to be about 90 percent of the QPI bandwidth between sockets (or 23Gbps) unless the data goes out on the socket with the PCIe bus. If I/Os are equally distributed, every fourth I/O will run at 40Gbps, so the average performance would be (3 x 23Gbps + 40Gbps)/4, or about 27.25Gbps per socket for a quad-socket system.
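The quad-socket estimate can be reproduced step by step. The sketch below uses only the article's figures; the function name `average_socket_bandwidth` is illustrative, and the even spread of I/O across sockets is the article's stated assumption.

```python
def average_socket_bandwidth(sockets, local_gbps, remote_gbps):
    """Mean bandwidth when I/O is spread evenly over every socket's PCIe bus.

    One request in `sockets` lands on the local bus; the rest cross QPI.
    """
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    remote_requests = sockets - 1
    return (remote_requests * remote_gbps + local_gbps) / sockets


QPI_GBPS = 25.6        # article's figure for the socket-to-socket link
QPI_EFFICIENCY = 0.90  # article's estimate of achievable QPI throughput
PCIE_GBPS = 40.0       # article's figure for one socket's PCIe bandwidth

# The article rounds 0.9 * 25.6 (about 23.04) to 23.
remote = float(round(QPI_GBPS * QPI_EFFICIENCY))

quad_average = average_socket_bandwidth(4, PCIE_GBPS, remote)
print(quad_average)  # 27.25
```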
This is, of course, the average based on equal distribution of the processes and I/O to the PCIe bus. Giving a process affinity to the processor that owns its PCIe bus will significantly improve that average, but it is often difficult to architect a system so that every task is tied to a PCIe bus and runs on the CPU with that bus. The probability of hitting this limitation is higher with a quad-socket system than with a dual-socket system.
The diagram below shows an example of a dual-socket system that, though having the same issues, reduces the potential of hitting that architectural limitation.
My estimate for performance for a dual-socket system is (23Gbps + 40Gbps)/2, or an average per-socket performance of 31.5Gbps. On a dual-socket system it is much easier to architect the system so that you can put the right I/O on the right CPU and achieve near-peak performance.
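To see how placement moves a dual-socket system from the 31.5Gbps average toward the 40Gbps peak, the estimate can be written as a blend of local and remote I/O. This is a simplified model, not a measurement: the function name `blended_bandwidth` and the idea of a single "local fraction" are assumptions layered on the article's 40Gbps and 23Gbps figures.

```python
def blended_bandwidth(local_fraction, local_gbps=40.0, remote_gbps=23.0):
    """Average bandwidth when `local_fraction` of I/O stays on the local socket.

    The defaults are the article's PCIe (local) and derated QPI (remote) figures.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_gbps + (1.0 - local_fraction) * remote_gbps


# 0.5 is the evenly distributed dual-socket case from the article.
for fraction in (0.5, 0.75, 1.0):
    print(fraction, blended_bandwidth(fraction))
# 0.5 31.5
# 0.75 35.75
# 1.0 40.0
```

With only two sockets there is just one remote bus to avoid, which is why steering I/O to the right CPU is a more tractable design problem than on a quad-socket machine.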
CPU Conclusions Are Counter-Intuitive
New Intel systems have far more I/O bandwidth than previous systems and they have more than anything available from AMD. ARM is not currently competitive if you need to move lots of data in and out of the system.
Quad-socket systems in the current Intel line will average about 27.25Gbps per socket unless significant work is done to architect the system so that processes run on the processors that own the PCIe buses they use. The IOPS performance of the system will, of course, be higher, as IOPS is not impacted by the QPI bandwidth limitation.
It is easier to get higher performance from dual-socket systems, and their average per-socket performance is about 4.25Gbps higher (31.5Gbps versus 27.25Gbps). So my conclusion is that you are better off using dual-socket systems rather than quad-socket systems for high I/O bandwidth requirements. This, of course, is clearly counterintuitive, but it is the best strategy given the current Intel architecture.
You will most likely see Ivy Bridge server processors in 2013, and the QPI bandwidth will go way up, so with Ivy Bridge, quad-socket systems will likely make sense. More on this after the Ivy Bridge server processors are released.
Henry Newman is CEO and CTO of Instrumental Inc., a consulting firm that specializes in high-performance computing and storage.