Selecting an AES Solution

Contributor: High Tech Marketing

The Advanced Encryption Standard (AES) is an open standard selected through an open competition. It is based on the Rijndael algorithm, which combines an extremely high level of security with computational efficiency. The algorithm consists of Exclusive-OR functions combined with matrix operations and is a mathematically "clean" design, which avoids the risk of "back doors" for unauthorized users. The elegance and efficiency of the system make it suitable for either hardware or software implementation.

Low data rates can be accomplished by software-only solutions. Hardware solutions are, of course, much faster and are often specified because implementing the critical security components in hardware isolates them from software threats such as viruses. This avoids the need to carry out a detailed and costly security analysis of all the software components in the system.

To achieve higher data throughput, designers can use an SOC (ASIC) or FPGA platform to provide hardware acceleration. This is where another feature of AES comes into play: the scalability of the algorithm. Figure 1 gives a typical trade-off between throughput and equivalent ASIC gate count and exhibits a nearly linear relationship between complexity and data rate. A low-gate-count design will use a narrow data width (down to 8 bits) and process each 128-bit block over multiple cycles. A classic cost/performance trade-off is achieved by increasing the hardware resources to give a wider data path, which needs fewer cycles for a given throughput. Implementations using 32-bit data paths often offer an optimal trade-off because of the way the AES algorithm has been defined.

Figure 1. Example area/performance trade-off for AES configurations.


FPGA applications can exploit a similar trade-off. For example, a wireless application requiring 100Mbps can be realized using a small core with 16-bit data width and a 70-MHz system clock. Optical networking needing 10Gbps is achieved by increasing the data width to 128 bits, adding pipelining and winding the clock up to 156 MHz. Trade-offs within this 100:1 range provide intermediate solutions that span the needs of military, broadcast, communications and storage applications.

Figure 2. Simplified block diagram showing area/performance trade-off.


Technical and commercial factors to weigh

The choice between an SOC/ASIC and an FPGA solution is usually clear-cut. Compared to ASIC solutions, FPGAs carry overheads that have an impact on technical and commercial performance. The programmable interconnect on an FPGA adds RC delays that reduce the performance compared to the custom metal of an ASIC. An FPGA also needs additional transistors to provide the programmability, which raises the cost. Over recent years, however, FPGAs have reduced these disadvantages and closed the gap significantly, to the point where they are routinely used for volume production. A programmable solution is, without question, the fastest way to market and for this reason has become ubiquitous.

Much has been written about the total SOC project costs of mask sets, design time and tool suites for a 65-nm chip, with estimates starting at $5 million and going stratospheric. Staggering costs like these eliminate all but a handful of designs, as witnessed by the dwindling number of ASIC starts. Recent reports also suggest that close to 9 out of 10 projects overrun their deadlines, making long development schedules even worse. That said, if the choice is for a masked solution because of unit cost or extreme performance requirements, then there is a wide selection from numerous IP vendors.

From Figure 1, the cost of the silicon for the AES function will be extremely low, with the largest design occupying only 1 or 2 cents worth of silicon. AES cores can be highly optimized for ASIC. What will be more significant are the engineering costs of integrating the function into the design and especially design verification. Here is where comprehensive test benches which cover all the "corner cases" start to pay off. Some applications may require a FIPS validation of the AES implementation, with the attendant risk of a respin.

When the target technology is an FPGA, the choice of IP appears to be equally wide. Many vendors offer netlists which can be input into the FPGA design flow. A faint warning bell should be ringing at this point unless the design has been specifically targeted at the FPGA architecture. The reason is that the ASIC world is different from the FPGA world in subtle ways. An ASIC builds up the functions that the designer specifies from a rich cell library, and there is no need to consider the impact of implementation. In contrast, FPGAs have fixed resources onto which the design must map efficiently.

Small details such as using asynchronous resets rather than synchronous ones can have disproportionate impacts. The details of the memory design can have a large influence, as memory resources are fixed and finite, and an unsympathetic design could double the resources consumed. This would not be significant if the costs per gate were similar to ASICs, but that is not the case. If the design needs only moderate performance and the production quantity is low, then any inefficiency can probably be accommodated. These architectural differences should have been taken into account if the design is specifically built for FPGA implementation.

Even within designs built for FPGAs there can be large differences. You would expect vendor-to-vendor differences, but there can be surprises within vendor portfolios. For example, the premium to implement a data encrypter/decrypter over an encrypter-only design can range from a modest 10% with lots of resource sharing to around twice the size in the worst case. Another significant variable relates to the key expansion system that is used in the AES process to encrypt the plain text. The algorithm to calculate these keys can be implemented either in the FPGA hardware or in software on an external processor. A software approach may be suitable for low-throughput schemes with spare processing cycles, but this will not be the case for high performance and is counter-intuitive if you have decided to use hardware acceleration. However, the key expander can be resource intensive and can double the design size, although it is normally included in the resource estimates.

Vendor-to-vendor differences can be even more surprising. A comparison of data sheets for a similar configuration can reveal significant throughput differences from similar resources. These can be explained, to some degree at least, by the data width used or the features provided.

Another variable that is worth considering is the clock frequency used to achieve the throughput. The relationship is given by:

Throughput (Mbps) = 128 * fclk (MHz) / number of cycles

The number of cycles needed to process each 128-bit block depends on the key size and the data width and ranges from 1 to over 600.
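As a rough illustration of this relationship, the short Python sketch below evaluates it in both directions. The function names are hypothetical, and the clock is assumed to be expressed in MHz so that the result comes out in Mbps. The two calls reuse the wireless and optical-networking figures quoted earlier in this article.

```python
BLOCK_BITS = 128  # AES always processes 128-bit blocks


def throughput_mbps(fclk_mhz, cycles_per_block):
    """Throughput in Mbps for a clock in MHz and a given cycle count per block."""
    return BLOCK_BITS * fclk_mhz / cycles_per_block


def cycle_budget(fclk_mhz, target_mbps):
    """Largest (possibly fractional) cycle count per block that still meets the target."""
    return BLOCK_BITS * fclk_mhz / target_mbps


# Wireless example: 100Mbps from a 70-MHz clock
print(cycle_budget(70, 100))      # 89.6 cycles per block available
# Optical networking example: 10Gbps from a 156-MHz clock
print(cycle_budget(156, 10_000))  # roughly 2 cycles per block available
```

A narrow core therefore has tens of cycles to spend on each block at 100Mbps, while a 10Gbps design has to complete a block about every two clocks, which is why the wide, pipelined data path is needed.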

The engineering costs of a high clock speed are higher power consumption and more difficult timing closure. FIFOs and flow control of data may be required if the core runs at a different speed from the rest of the design, adding both cost and complexity. If an IP core is optimized by locking down the physical layout, it reduces the flexibility of placement for the remainder of the design, which can worsen the overall timing-closure problems. Remember, the very essence of an encryption algorithm is randomness. So the combination of high clock speed, high complexity and a "random" interconnect pattern is a recipe for difficulties in meeting timing closure.

Power consumption in FPGAs has become a major criterion for users in recent years because it affects the overall system cooling regime and costs, as well as raising concerns about reliability. FPGA vendors have successfully adopted design techniques to minimize quiescent power in larger devices. Fortuitously, one consequence of Moore’s Law is that it has driven operating voltages down to 1 V, but dynamic power will always be proportional to clock frequency, so a lower clock is better.
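The dependence on clock frequency can be sketched with the usual first-order CMOS model, P = a * C * V^2 * f, where a is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. The model choice, the function name and the numeric values below are illustrative assumptions and are not taken from any FPGA data sheet.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, fclk_hz):
    """First-order CMOS dynamic power estimate in watts: a * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * fclk_hz


# Hypothetical values chosen only to show the trend
fast = dynamic_power_w(0.15, 2e-9, 1.0, 300e6)
slow = dynamic_power_w(0.15, 2e-9, 1.0, 150e6)
print(slow / fast)  # halving the clock halves the dynamic power
```

Under this model a core that reaches the required throughput from a slower clock saves dynamic power in direct proportion, all else being equal.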

Verification is the number one headache in system design. AES was standardized by NIST with a number of different operating modes. NIST also provides a large number of "known-answer" test patterns and a specification for tests to be used in implementation validation. For validation, the test vectors are generated by a NIST-approved test lab and are not known in advance. You can save quite a lot of work by selecting a vendor who offers a comprehensive test bench implementing all AESAVS tests, ideally with additional vectors as specified in FIPS 197 and Special Publication SP 800-38A.
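To make the idea of a known-answer test concrete, the sketch below is a deliberately small, unoptimized AES-128 encryption model in plain Python, checked against the example vector in Appendix C.1 of FIPS 197. It is an illustration only, not the AESAVS suite and not any vendor's test bench. The function names are hypothetical, and a model like this is far too slow and too exposed to side channels for production use.

```python
def xtime(a):
    """Multiply by x (i.e. 2) in GF(2^8) with the AES reduction polynomial."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8) using shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Derive the S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Expand a 16-byte key into 11 round keys of 16 bytes each."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def mix_columns(state):
    """Multiply each 4-byte column by the fixed AES matrix."""
    out = []
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(col[r], 2) ^ gmul(col[(r + 1) % 4], 3)
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return out


def encrypt_block(key, block):
    """Encrypt one 128-bit block with a 128-bit key (10 rounds)."""
    round_keys = expand_key(key)
    state = [a ^ b for a, b in zip(block, round_keys[0])]
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                          # SubBytes
        state = [state[4 * ((c + r) % 4) + r]                     # ShiftRows
                 for c in range(4) for r in range(4)]
        if rnd != 10:
            state = mix_columns(state)                            # not in last round
        state = [a ^ b for a, b in zip(state, round_keys[rnd])]   # AddRoundKey
    return bytes(state)


# Known-answer check using the FIPS 197 Appendix C.1 example vector
key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
expected = bytes.fromhex("69c4e0d86a7b0430d8cdb78070b4c55a")
assert encrypt_block(key, plaintext) == expected
print("known-answer test passed")
```

In a hardware flow the same comparison would be made between the simulated output of the core and the expected ciphertext, repeated over the full set of vectors supplied with the test bench.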

Commercial considerations

These are best illustrated by example. Factors to consider include core cost, silicon cost, core support and modification costs, license restrictions, and learning curve.

Algotronix, Ltd., for example, offers a range of AES cores, including a flagship 10Gbps AES-GCM design. The core is very competitively priced, and "ticks all the boxes" in terms of being a compact design that delivers 10Gbps from only a 156-MHz clock. This is an example of a complex core with additional logic to provide enhanced functionality. The AES-XTS core is dedicated to data storage applications, and the AES-CCM core is used in wireless applications such as IEEE 802.11n and WiMAX. So far, so good.

The next cost to consider is the silicon. Unlike with an ASIC, the efficiency of implementation has an impact on the total cost. For most systems, it is valid to assume that the system will need an FPGA for interfacing or custom logic, and that the AES core can co-exist in this device. The Algotronix G3 core will fit into a Spartan or Virtex series device from Xilinx or a Cyclone or Stratix device from Altera. Select a core that can be targeted at multiple FPGA families or vendors so that you have more flexibility to reduce silicon costs. The 100+ quantity list price for the Xilinx Spartan XC3S2000-4FG456C is $45.90. The resources for the core logic provide a throughput of over 350Mbps and require 1.3% of the FPGA. As a result, the cost of the silicon real estate occupied by the core is only $0.57.
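The silicon-cost arithmetic is simply the device price multiplied by the fraction of the device the core occupies, as the sketch below shows. The function name is hypothetical. Using the rounded 1.3% figure gives about $0.60; the small gap from the $0.57 quoted above presumably comes from rounding in the published figures.

```python
def core_silicon_cost(device_price_usd, fraction_of_device):
    """Share of the device price attributable to the area the core occupies."""
    return device_price_usd * fraction_of_device


# Figures quoted in the text: $45.90 device, core uses 1.3% of it
print(f"${core_silicon_cost(45.90, 0.013):.2f}")  # $0.60
```

The same calculation can be repeated for each candidate device family to see where the core is cheapest to host.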

IP vendors typically provide customers with either HDL or netlist versions of their cores. It is not practical to trace how many FPGAs have been shipped with the IP, so expect no royalty payments. One "future proof" consideration is that at least one vendor allows the IP to be moved to ASIC at no additional cost. This provides an easy cost-reduction path for successful products, or an easy way to prototype an ASIC solution.

Figure 3. A simplified decision flow.


To cover the range of options and modes supported in AES, customers can license and edit HDL code, but be aware that this often comes at a steep price premium over a netlist. In addition to on-line support (ranging from 30 days to a year), most vendors offer a customization service where the exact functionality can be set. The advantage of licensing HDL code becomes clear in two circumstances. The first is where marketing changes the specification late in the project. (Never happens in your company?) The second relates to learning curve and reuse issues.

Engineers need to understand how things work, and it is much easier to read HDL source than a netlist. HDL code lets them play "what-if" scenarios to arrive at the optimum design. For example, users can change VHDL generic parameters and recompile to evaluate trade-offs such as data path widths. They will also be expected to verify the core and create test vectors for the final product. The final task (and the least exciting) is to document how the design works. To put the cost of the learning curve into perspective, assume that the annual cost of salary, benefits and tools for an engineer runs at $110k, yielding a cost of $500 per day. The experience of Algotronix is that customers are actively working within a day on "plain vanilla" cores, but it also offers a safety net for more demanding designs with consultancy options. Better still, customers can sign up to get a free-of-charge evaluation copy to convince themselves that it is the right core.
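The daily-cost figure can be checked, and extended to other ramp-up times, with a few lines of Python. The annual and daily costs come from the paragraph above; the function name and the ten-day case are hypothetical and serve only as a comparison point.

```python
ANNUAL_COST_USD = 110_000   # salary, benefits and tools, from the text
DAILY_COST_USD = 500        # daily figure quoted in the text

# Working days per year implied by the two figures above
implied_working_days = ANNUAL_COST_USD / DAILY_COST_USD
print(implied_working_days)  # 220.0


def learning_curve_cost(days, daily_cost_usd=DAILY_COST_USD):
    """Engineering cost of the time spent getting a core up and running."""
    return days * daily_cost_usd


print(learning_curve_cost(1))   # 500
print(learning_curve_cost(10))  # 5000, a hypothetical two-week ramp-up
```

Set against the price of a core, the difference between a one-day and a multi-week learning curve can easily outweigh differences in license fees.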

One special consideration for encryption IP relates to confidence that the security has not been compromised. A concern in a high-security design is to ensure that so-called "back door" features have not been maliciously included. It is important, therefore, to select a reputable vendor. Avoid purchasing source code that is untraceable, lacks provenance or has anonymous authors. Doing so greatly reduces the risk that criminal hackers or a hostile intelligence agency have "contributed" malware, as could happen in an open source project. It is also risky to purchase encryption IP from a vendor in a country with a less developed legal framework or one which has political disagreements with your own. Owning source code gives users the option of analysing the design and archiving it.

Finally, a very important consideration is reuse. In reality, most customers will continue to include the security systems they develop in future products, because reuse of blocks has doubled over the last decade. The AES standards will not change, so choosing a vendor who offers multiple-use or low-cost extensions to the license can be a shrewd move. Larger companies will also look for either a site-wide license or one covering their whole company division so that they have the flexibility to operate efficiently.

While a number of technical and commercial aspects have been discussed in this article, there is no substitute for a full evaluation.

By Paul Dillien

Paul Dillien founded the high-technology marketing consultancy company High Tech Marketing. He has worked in the semiconductor industry for over 30 years, including various Sales and Marketing roles working for Xilinx, Plessey and Ferranti.


Reprinted from SOCcentral.com, your first stop for ASIC, FPGA, EDA, and IP news and design information.
Copyright 2002 - 2011 Tech Pro Communications, 1209 Colts Circle, Lawrenceville, NJ 08648