High port density, high throughput, and very low latency are bedrock requirements in the data center, and Force10's new S4810 top-of-rack switch delivers on all three counts.
At the same time, Clear Choice testing revealed some interesting limitations in the "merchant silicon" chips increasingly seen in data-center switches. Tests turned up anomalies in the areas of cut-through latency, media access control (MAC) address learning, and link aggregation failover handling. Beyond the switching silicon, the S4810 also turned in mixed results in multicast scalability.
The S4810 is a 1U top-of-rack switch with multiple interface options. It has 48 SFP+ ports for 1G/10G Ethernet (we tested it with 48 10G Ethernet transceivers) and four QSFP+ ports for 40G uplinks. With 10GBase-SR transceivers, the switch drew 202 watts when idle and 219 watts with its data plane fully loaded.
The switch runs the Force10 Operating System (FTOS), which includes a command-line interface (CLI) that's nearly a clone of Cisco's IOS. Experienced Cisco users will have no trouble configuring and managing this switch.
Although we tested the switch as a layer-2 data center device, it also supports layer-3 features, including major IPv4 routing protocols and static routing of IPv6 traffic, via a $2,000 software upgrade.
Significantly, the switch does not yet support some key data center protocols, according to a features questionnaire completed by Force10. These include the Data Center Bridging Exchange (DCBX) protocol, IEEE 802.1Qbb priority-based flow control (PFC), 802.1Qau congestion notification, and 802.1Qaz enhanced transmission selection (ETS). Force10 says these features are slated for release in the third quarter of 2011.
We used the same methodology to test the S4810 as in our January 2010 comparison of 10G Ethernet top-of-rack switches. The only difference this time was that we used 48 instead of 24 ports in measuring layer-2 unicast and multicast performance.
The S4810 put up solid numbers when it comes to basic unicast traffic handling. It delivers line-rate throughput, regardless of unicast frame size. Better still for delay-sensitive applications, the S4810 offers sub-microsecond average latency when configured in store-and-forward mode. This is one of the first store-and-forward switches we've tested to break the microsecond barrier.
We expected average latency to be lower still with the S4810 configured as a cut-through device, but that wasn't always the case. For frame sizes of 256 bytes and larger, cut-through latency was significantly higher than the equivalent test in store-and-forward mode. Further, cut-through latency increased with frame length.
Cut-through devices usually have two properties: They tend to be very fast (since they start forwarding a frame before it's fully received, unlike store-and-forward devices, which wait until the entire frame is buffered before switching it), and they have roughly the same average latency regardless of frame length.
With the S4810, these properties better described the store-and-forward results than cut-through ones.
This is partially explained by a characteristic of the Broadcom 56845 application-specific integrated circuit (ASIC) used in the S4810. According to Force10, the chip still acts in store-and-forward mode for frames shorter than 624 bytes, even when set for cut-through operation. This could explain higher cut-through latency for medium-length frames (say, between 256 and 624 bytes) but it's still puzzling why cut-through latency would be higher for longer frames. The testing RFCs require different measurement methods for store-and-forward and cut-through latency, and we checked and rechecked results to verify we'd used the appropriate methods for each. Force10 and other labs also have confirmed this behavior.
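A rough model shows how the chip's fallback could inflate cut-through results for medium-length frames. RFC 1242 times store-and-forward latency from the last bit in to the first bit out, but cut-through (bit-forwarding) latency from the first bit in to the first bit out, so any time a "cut-through" switch spends waiting for the rest of a frame lands in the result. The Python sketch below assumes a 10-Gbit/s port, takes the 624-byte threshold from Force10's description, and assumes a hypothetical 64-byte header that a genuine cut-through path waits for. It is an illustration under those assumptions, not Broadcom's or Force10's logic.

```python
LINE_RATE_BPS = 10_000_000_000  # 10G Ethernet data rate, bits per second
CUT_THROUGH_MIN_BYTES = 624     # per Force10: shorter frames are stored first
HEADER_BYTES = 64               # hypothetical bytes a true cut-through path waits for


def serialization_ns(num_bytes: int) -> float:
    """Time to clock num_bytes onto the wire at 10 Gbit/s, in nanoseconds."""
    return num_bytes * 8 / LINE_RATE_BPS * 1e9


def fallback_penalty_ns(frame_bytes: int) -> float:
    """Extra first-in, first-out delay when a 'cut-through' frame is stored.

    Frames at or above the threshold are assumed to cut through after the
    header, so they add nothing. Shorter frames fall back to store-and-forward,
    so the bytes after the header must also arrive before forwarding starts,
    and that wait shows up in a first-bit-in measurement.
    """
    if frame_bytes >= CUT_THROUGH_MIN_BYTES:
        return 0.0
    return serialization_ns(max(frame_bytes - HEADER_BYTES, 0))


for size in (64, 128, 256, 512, 623, 624, 1518):
    print(f"{size:>5} bytes: +{fallback_penalty_ns(size):6.1f} ns")
```

At 10 Gbit/s each byte takes 0.8 ns to arrive, so under this model a 256-byte frame picks up about 154 ns of extra measured delay and a 623-byte frame about 447 ns, while frames at or above the threshold pick up none. The model therefore fits the medium-frame results but cannot account for the rising cut-through latency the tests found for longer frames.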
Given the latency results, we'd recommend leaving the switch in its default store-and-forward mode. There's a performance advantage to doing so, and users also get the error checking that store-and-forward operation provides.
MAC address capacity
Another anomaly appeared in tests of MAC address capacity, which determines how many devices can be attached to a switch. This metric is especially important for virtualization and cloud computing, where virtual machine counts in a single broadcast domain can rise into the tens of thousands.
The S4810's data sheet states its MAC capacity as 128,000; in practice, we found the limit to be slightly lower, averaging 117,145 addresses, with the exact figure depending on which set of pseudorandom addresses we used. The switch ASIC's hashing algorithm accounts for the difference. To save memory and speed lookup times, ASICs store a hash of each MAC address. With a particular set of addresses perfectly matched to a given hashing algorithm, no two hashes will ever overlap or "collide." In practice, vendors cannot predict what addresses customers will use, so some collisions are inevitable, and when too many addresses hash to the same table location, the switch has no room to store the extras and cannot learn them.
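The shortfall can be reproduced with a toy model. The sketch below assumes a hypothetical bucketed table (32,768 buckets of four entries each, for a nominal 131,072-entry capacity) indexed by a CRC-32 of each address. The Broadcom chip's real table geometry and hash function aren't described in this review, so the numbers are illustrative only. Offering exactly as many random unicast addresses as the table has slots leaves some buckets overflowing while others sit partly empty, so fewer addresses are learned than the nominal capacity.

```python
import random
import zlib

NUM_BUCKETS = 32_768  # hypothetical table geometry
WAYS = 4              # hypothetical entries per bucket; capacity = 131,072


def random_unicast_macs(count: int, seed: int) -> list[bytes]:
    """Generate distinct pseudorandom unicast MAC addresses (6 bytes each)."""
    rng = random.Random(seed)
    macs: set[bytes] = set()
    while len(macs) < count:
        value = rng.getrandbits(48) & ~(1 << 40)  # clear the I/G (multicast) bit
        macs.add(value.to_bytes(6, "big"))
    return list(macs)


def learn(macs: list[bytes], num_buckets: int = NUM_BUCKETS, ways: int = WAYS) -> int:
    """Return how many addresses fit in a bucketed hash table."""
    fill = [0] * num_buckets
    learned = 0
    for mac in macs:
        bucket = zlib.crc32(mac) % num_buckets
        if fill[bucket] < ways:
            fill[bucket] += 1
            learned += 1
        # Otherwise the bucket is full: a collision, and the address is not learned.
    return learned


capacity = NUM_BUCKETS * WAYS
for seed in (1, 2, 3):
    offered = random_unicast_macs(capacity, seed)
    print(f"seed {seed}: learned {learn(offered):,} of {capacity:,}")
```

Each seed stands in for a different set of pseudorandom test addresses, which is why the learned count varies slightly from run to run, much as the measured capacity varied with the address set.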
What's more, the actual number of addresses the switch can learn in production is likely to be far lower than 117,000. Typically, address capacity tests are conducted using only three ports. When we configured the Spirent TestCenter traffic generator to offer a set of nearly 100,000 pseudorandom addresses across 48 ports, the switch learned only about 94,000 of these due to hash collisions. Through trial and error, we found that the switch would learn at most around 25,000 addresses without hash collisions when we distributed addresses across 48 ports.
To be sure, 25,000 addresses is still a huge number, more than enough for the vast majority of data centers. Then again, some heavy users of virtualization already are pushing above this figure. Further, we think data-sheet numbers should give users meaningful guidance on the limits of switch performance, not theoretical best-case estimates.
Link aggregation fairness
The S4810 allows up to eight ports to be combined into a link aggregation group (LAG) and uses the link aggregation control protocol (LACP) to dynamically add and remove LAG members. We took one LAG member offline, as might occur in the event of a link or transceiver failure, to see how the switch would distribute that port's traffic across remaining members of the LAG.
Traffic distribution was not uniform in this failover test. After we disabled a port, the switch redistributed all of its traffic to the first two ports in the LAG. On a lightly loaded network this wouldn't be a problem, but it could result in oversubscription and frame loss on a heavily loaded LAG. Still, this is an improvement over LAG behavior we saw on some switches in last year's test, where all traffic from a failed LAG port was redistributed to just one other LAG member.
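Whether an uneven failover causes loss depends on how heavily the LAG was loaded beforehand. The arithmetic below assumes a hypothetical eight-member LAG with equal per-member load, and assumes a failed member's traffic is split evenly among whichever members absorb it. It compares one absorbing member (last year's worst case), two (the behavior seen here), and a uniform spread across all seven survivors.

```python
from fractions import Fraction


def peak_member_load(per_member_load: Fraction, absorbers: int) -> Fraction:
    """Load on each absorbing member after one member fails (1 = line rate).

    The failed member's traffic is assumed to split evenly across
    `absorbers` surviving members.
    """
    return per_member_load + per_member_load / absorbers


def max_safe_load(absorbers: int) -> Fraction:
    """Highest pre-failure per-member load that avoids oversubscription.

    Solves L * (1 + 1/absorbers) <= 1 for L.
    """
    return Fraction(absorbers, absorbers + 1)


MEMBERS = 8  # hypothetical LAG size (the S4810 allows up to eight)
for label, absorbers in (("one member", 1), ("first two members", 2),
                         ("all survivors", MEMBERS - 1)):
    peak = peak_member_load(Fraction(7, 10), absorbers)
    print(f"{label:>17}: safe up to {float(max_safe_load(absorbers)):.1%} per member; "
          f"at 70% load, absorbers run at {float(peak):.0%}")
```

With traffic moved to just two members, any per-member load above two-thirds of line rate before the failure leads to oversubscription; spreading the traffic across all seven survivors would raise that ceiling to 87.5 percent.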
As a final test of unicast performance, we checked the S4810 for "forward pressure," a mechanism some switches use to avoid congestion by forwarding frames illegally fast. The S4810 doesn't have that problem. Its clock is set to run at 40 parts per million (ppm) faster than Ethernet's theoretical line rate, but that's well within the 100-ppm tolerance allowed in the Ethernet specification.
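The ppm figures translate into concrete rates. The sketch below takes the 10-Gbit/s Ethernet data rate as the nominal reference and the 100-ppm tolerance the review cites, and assumes 20 bytes of preamble and minimum inter-frame gap per frame. A clock running 40 ppm fast works out to 400,000 extra bits per second, or about 595 additional 64-byte frames per second at line rate.

```python
NOMINAL_BPS = 10_000_000_000  # 10G Ethernet data rate
TOLERANCE_PPM = 100           # tolerance cited for the Ethernet specification
OVERHEAD_BYTES = 20           # preamble/start delimiter (8) + minimum inter-frame gap (12)


def offset_ppm(actual_bps: float, nominal_bps: float = NOMINAL_BPS) -> float:
    """Clock offset in parts per million relative to nominal."""
    return (actual_bps - nominal_bps) / nominal_bps * 1e6


def frames_per_second(frame_bytes: int, bps: float) -> float:
    """Maximum rate for back-to-back frames of frame_bytes at bps."""
    return bps / ((frame_bytes + OVERHEAD_BYTES) * 8)


fast_bps = NOMINAL_BPS * (1 + 40e-6)  # a clock running 40 ppm fast
print(f"offset: {offset_ppm(fast_bps):.1f} ppm, within spec: "
      f"{abs(offset_ppm(fast_bps)) <= TOLERANCE_PPM}")
print(f"extra 64-byte frames/s: "
      f"{frames_per_second(64, fast_bps) - frames_per_second(64, NOMINAL_BPS):.0f}")
```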
We measured the S4810's multicast performance with tests of Internet Group Management Protocol (IGMP) group capacity; group join and leave times; and throughput and latency. The first two of these stress the switch's control plane (its software and CPU), while the throughput and latency tests stress the data plane (the ASIC).
Using IGMP snooping, the switch learned 3,000 multicast groups in our capacity test. That's higher than all but one top-of-rack switch tested last year, and a useful figure for trading and videoconferencing applications that require a large number of multicast groups.
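IGMP snooping lets a layer-2 switch forward multicast only to ports whose hosts have asked for a group, by watching IGMP membership reports and leave messages. The Python sketch below is a hypothetical, heavily simplified model of that bookkeeping, not FTOS code. It keeps a per-group set of member ports, enforces a group limit (3,000 here, mirroring the capacity result), and, because it uses a set, cannot list the same port twice for one group.

```python
class SnoopingTable:
    """Toy IGMP snooping table: multicast group -> set of member ports.

    Real switches also track VLANs, querier state, and membership timers;
    those are omitted here.
    """

    def __init__(self, max_groups: int):
        self.max_groups = max_groups
        self.members: dict[str, set[int]] = {}

    def join(self, group: str, port: int) -> bool:
        """Handle a membership report; return False if the table is full."""
        if group not in self.members and len(self.members) >= self.max_groups:
            return False
        self.members.setdefault(group, set()).add(port)  # duplicates collapse
        return True

    def leave(self, group: str, port: int) -> None:
        """Handle a leave message, pruning the group when its last port goes."""
        ports = self.members.get(group)
        if ports is None:
            return
        ports.discard(port)
        if not ports:
            del self.members[group]

    def egress_ports(self, group: str) -> set[int]:
        """Ports that should receive traffic for this group."""
        return set(self.members.get(group, ()))


table = SnoopingTable(max_groups=3_000)
table.join("239.1.1.1", 2)
table.join("239.1.1.1", 2)  # repeated report from the same port
table.join("239.1.1.1", 5)
print(sorted(table.egress_ports("239.1.1.1")))  # [2, 5]
```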
The switch's join/leave times were another story. With all receivers subscribed to 989 multicast groups, the S4810 took an average of 21.7 seconds to join each group and 18.3 seconds to leave. That's much higher than most switches in last year's test, which also handled 989 groups. The S4810's maximum join and leave times were higher still, at 49.8 and 53.7 seconds respectively. These high IGMP processing times suggest an overload of the switch's CPU.
More evidence of an overload came in a buffer-overflow message we saw when running this test (and the group capacity test) immediately after a switch reboot. The fact that the switch did not display this message on the second and subsequent test iterations suggests an issue with initial loading of a multicast software module into memory when large group counts are involved. Another issue we saw (on all iterations, not just the first one) is that the switch's CLI erroneously reported the same port twice as a member of a given multicast group.
Force10 said it replicated these results in-house, and found much lower join and leave times, of 1 second or less, when 100 groups were involved instead of nearly 1,000. The vendor also says it's doing more optimization work on this new platform.
The final set of multicast tests examined switch throughput and latency, again using 989 groups. In these tests, we configured the Spirent TestCenter traffic generator to transmit multicast traffic to one port, and act as multicast subscribers on the 47 remaining ports.
The switch offered line-rate throughput of multicast traffic, with the exception of jumbo frames. With these 9,216-byte frames, the highest zero-loss rate was around 98.5 percent of line rate. That's a bit of a surprise, since most data-center switches deliver line-rate throughput in all cases, unicast and multicast alike. On the other hand, jumbo frames are more common for unicast than for multicast transport (think backup and disaster-recovery applications); thus, the multicast jumbo throughput result probably isn't a concern for most users.
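To put the jumbo-frame result in frame-rate terms, the sketch below computes the theoretical maximum rate for 9,216-byte frames on a 10G port, assuming 20 bytes of preamble and minimum inter-frame gap per frame, and the rate that 98.5 percent of line rate corresponds to. The 98.5 percent figure is the review's approximate result; the exact measured frame rate isn't given here.

```python
LINE_RATE_BPS = 10_000_000_000  # 10G Ethernet data rate
OVERHEAD_BYTES = 20             # preamble/start delimiter (8) + minimum inter-frame gap (12)


def max_frame_rate(frame_bytes: int, rate_bps: int = LINE_RATE_BPS) -> float:
    """Theoretical line-rate frames per second for a given frame size."""
    return rate_bps / ((frame_bytes + OVERHEAD_BYTES) * 8)


def percent_of_line_rate(measured_fps: float, frame_bytes: int) -> float:
    """Express a measured zero-loss frame rate as a percentage of line rate."""
    return 100 * measured_fps / max_frame_rate(frame_bytes)


jumbo = 9_216
line_fps = max_frame_rate(jumbo)
print(f"line rate: {line_fps:,.0f} frames/s per port")
print(f"98.5% of line rate: {0.985 * line_fps:,.0f} frames/s per port")
```

Line rate for these frames is roughly 135,340 frames per second per port, so the shortfall amounts to about 2,030 frames per second on each egress port.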
Average and maximum multicast latencies were roughly comparable to unicast with the switch in store-and-forward mode.
For network managers whose foremost switch requirements are high port density and very low latency, the S4810 is a good fit. The S4810 still has more work to do in the areas of data center features support and multicast processing speeds. These involve software fixes, and Force10 says they're already in the works. Hardware anomalies, such as those involving MAC address learning and link aggregation failover, are harder to fix and may take longer to address.
Network World gratefully acknowledges the support of vendors that supplied key test bed infrastructure and support. Spirent Communications supplied its Spirent TestCenter traffic generator/analyzer with 10-gigabit Ethernet test ports and Spirent's Mike Kanada, Timmons Player, Michael Lynge, and Chris Chapman provided engineering and logistical support. Fluke Corp. supplied its Fluke 335 TrueRMS clamp meter for power measurement.
Newman is a member of the Network World Lab Alliance and president of Network Test, an independent test lab and engineering services consultancy. He can be reached at firstname.lastname@example.org.