April 27, 2012 -- Content — we want it all, and we want it now! At work and at play, we expect our devices to provide easy access to media, games, web sites, email, Facebook, Twitter. We demand that our devices stay connected to the Internet, and thus our content is increasingly stored in the cloud. And now we're welcoming a host of new content into the cloud as billions of appliances, sensors and garage doors begin to populate the Internet of Things. While it may not be entirely obvious today what will be done with all this content, SOC designers are nonetheless challenged with delivering innovative technologies that drive and power the next generation of cloud-connected devices.
SOCs that are connected to the cloud have to capture, store, process, display and/or communicate increasingly varied, higher-definition content. For an application processor, the required functionality is immense: high-performance, multicore CPUs and GPUs, HD video encode and decode, multi-standard audio, capable camera and display subsystems, telecom and datacom interfaces, Universal Flash Storage (UFS) and multi-channel DRAM systems. The next generation of devices will add voice and gesture recognition, 3D and even more sensors. On the server side of the cloud, it's all about performance per watt for SOCs that embed multiple cache-coherent clusters of CPUs and (soon) GPUs as computing engines, again with multi-channel DRAM systems. Even my garage door opener's microcontroller will need to grow a baseband subsystem, complete with a full TCP/IP stack, plus a few more sensors to join the cloud.
Speed and power concerns dominate the SOC architect's choices. For many subsystems, speed is a requirement to meet the real-time processing demands of streaming content. By operating a subsystem at higher speed, more functionality can be delivered for the same die area, provided the rest of the SOC can keep up. A faster subsystem also finishes a given function more quickly, enabling the SOC to switch off the subsystem's power sooner and eliminate leakage current. For application processors, the thermal impact of peak power can limit performance, so it is critical to keep the silicon "dark" for as long as possible.
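The "finish fast, then go dark" argument can be made concrete with a small back-of-the-envelope model. The sketch below is only illustrative: the function name, the power figures and the assumption that doubling the clock at a fixed voltage doubles dynamic power are all hypothetical, not measurements from any real SOC.

```python
def window_energy_j(work_cycles, freq_hz, dynamic_w, leakage_w,
                    window_s, gate_when_idle):
    """Energy (joules) a subsystem uses over a fixed time window.

    The subsystem runs `work_cycles` at `freq_hz`, then idles for the rest
    of the window. While busy it burns dynamic plus leakage power. While
    idle it burns leakage only, or nothing if its power is switched off.
    """
    busy_s = work_cycles / freq_hz
    if busy_s > window_s:
        raise ValueError("work does not fit in the time window")
    idle_s = window_s - busy_s
    idle_w = 0.0 if gate_when_idle else leakage_w
    return busy_s * (dynamic_w + leakage_w) + idle_s * idle_w


# Hypothetical numbers: 10 million cycles of work per 100 ms frame.
WORK_CYCLES = 10_000_000
WINDOW_S = 0.1

# Slow clock: busy for the whole window, so there is no idle time to gate.
slow = window_energy_j(WORK_CYCLES, 100e6, 0.10, 0.05, WINDOW_S, True)
# Fast clock (assumed 2x dynamic power), powered off once the work is done.
fast_gated = window_energy_j(WORK_CYCLES, 200e6, 0.20, 0.05, WINDOW_S, True)
# Fast clock, but left powered (leaking) while idle.
fast_ungated = window_energy_j(WORK_CYCLES, 200e6, 0.20, 0.05, WINDOW_S, False)

print(f"slow: {slow:.4f} J, fast+gated: {fast_gated:.4f} J, "
      f"fast+ungated: {fast_ungated:.4f} J")
```

Under these assumptions the faster subsystem only saves energy if its power is actually switched off after the work completes; left leaking, it gives the advantage back. Raising the voltage to reach the higher clock would change the picture, which is why the model is only a sketch.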
On the speed front, SOC speeds between subsystems have normally been limited by access to external DRAM. Computer system designers know this as the "memory wall": a processor that attempts to operate at higher bandwidth than its primary memory spends an increasing amount of time waiting for that memory to provide service. Many of the cost benefits of SOC integration stem from the sharing of most subsystems' primary memory via a unified DRAM subsystem. Since this sharing reduces the memory bandwidth available to each subsystem, the memory wall problem grows with integration level. The modest improvements in DRAM bandwidth are barely enough to keep up with the individual subsystems. To take advantage of all this processing speed, it is critical for the architect to design an on-chip network, or NoC, that can keep pace with the subsystems. Operating at gigahertz speeds does no good if you are waiting for access to memory.
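To see why sharing one DRAM subsystem makes the memory wall worse, consider a deliberately simple model in which oversubscribed bandwidth is divided in proportion to demand. The function name, the proportional-sharing policy and the bandwidth figures below are assumptions chosen for illustration; real memory schedulers use far more elaborate arbitration.

```python
def share_dram(demands_gbps, dram_gbps):
    """Split DRAM bandwidth across subsystems in proportion to demand.

    Returns {name: (granted_gbps, stall_fraction)}. When total demand fits
    within the DRAM bandwidth, everyone gets what they ask for. Otherwise
    each subsystem is scaled back equally, and the shortfall is treated as
    the fraction of time it spends waiting on memory.
    """
    total = sum(demands_gbps.values())
    scale = 1.0 if total <= dram_gbps else dram_gbps / total
    return {name: (demand * scale, 1.0 - scale)
            for name, demand in demands_gbps.items()}


# Hypothetical figures: a 12.8 GB/s DRAM system.
DRAM_GBPS = 12.8

# A CPU with the DRAM to itself is never starved in this model.
alone = share_dram({"cpu": 6.0}, DRAM_GBPS)
# Integrating a GPU and a video engine oversubscribes the same DRAM.
shared = share_dram({"cpu": 6.0, "gpu": 8.0, "video": 2.0}, DRAM_GBPS)

for name, (granted, stall) in shared.items():
    print(f"{name}: granted {granted:.1f} GB/s, waiting {stall:.0%} of the time")
```

In this sketch the CPU goes from never waiting to waiting about a fifth of the time simply because more subsystems were integrated behind the same memory, with no change to the CPU itself.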
In addition, the next generation of multicore CPUs extends cache coherence outside the cluster to enable larger core counts, heterogeneous coherence and I/O coherence. Since the CPU's primary memory is effectively the cache system (the CPU's answer to the memory wall), the speed of the coherence network can impact performance more than DRAM bandwidth. With application processor CPUs clocking at 2GHz or more, it is easy to imagine a new requirement for the on-chip network to operate above 1GHz.
Active power management is the key to controlling SOC power consumption. Each layer of the technology stack is developed with power in mind, from the underlying process technology through the operating system. We can, therefore, assume that our SOC building blocks (subsystems, IP cores and memories) are each designed for low operating power. The SOC architecture groups these building blocks together into power domains that are characterized by clock and voltage choices. A domain's power use can be varied by changing the clock frequency or voltage level, stopping the clock, switching off the voltage or combinations of these. SOC power management strives to dynamically control the power domains to achieve application performance and responsiveness goals while minimizing power consumption.
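The controls available for a power domain can be sketched as a small data structure. The model below uses the textbook approximation that dynamic power scales with switched capacitance times voltage squared times frequency, and treats leakage as a fixed figure whenever the domain is powered. The class, its field names and the numbers are hypothetical, and real leakage also varies with voltage and temperature.

```python
from dataclasses import dataclass


@dataclass
class PowerDomain:
    """Illustrative power domain with the four controls described above."""
    name: str
    switched_cap_f: float   # effective switched capacitance (farads)
    leakage_w: float        # leakage while powered (assumed constant)
    voltage_v: float
    freq_hz: float
    clock_on: bool = True
    power_on: bool = True

    def power_w(self):
        """Estimated power draw in watts for the current settings."""
        if not self.power_on:
            return 0.0              # voltage switched off: no leakage either
        if not self.clock_on:
            return self.leakage_w   # clock stopped: leakage only
        dynamic = self.switched_cap_f * self.voltage_v ** 2 * self.freq_hz
        return dynamic + self.leakage_w


# Hypothetical video domain: 1 nF switched, 1.0 V, 1 GHz, 100 mW leakage.
video = PowerDomain("video", 1e-9, 0.1, 1.0, 1e9)
full_speed = video.power_w()

video.voltage_v, video.freq_hz = 0.8, 5e8   # lower the voltage and clock
scaled = video.power_w()

video.clock_on = False                      # stop the clock
clock_gated = video.power_w()

video.power_on = False                      # switch off the voltage
powered_off = video.power_w()

print(full_speed, scaled, clock_gated, powered_off)
```

The ordering of the four results mirrors the options in the text: scaling frequency and voltage trims dynamic power, stopping the clock removes it, and only switching off the voltage removes leakage as well.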
Since the on-chip network is at the center of the SOC and "sees" all the traffic, it is ideal for assisting the power-management units with safe and fast wake-up and shut-down of the subsystems. When a power-down is requested, the network can easily fence off traffic to the subsystem and then acknowledge that it is safe to shut down. This is much faster than can be done in software and ensures that no data is lost. This scheme saves significant power since domains can be left in "normally off" states and powered up quickly when required to service the device. Hardware approaches lend themselves better to local control and can, therefore, effectively support many more domains and a higher percentage of "dark" silicon. Finally, by accomplishing most power transitions without CPU assistance, the hardware-based power manager allows the highest-leakage subsystem on the SOC to stay dark.
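The fence-and-acknowledge handshake can be modeled in a few lines. This is a behavioral sketch only, not a description of any particular vendor's hardware: the class, its method names and the choice to hold (rather than reject) requests that arrive while the domain is fenced are all assumptions made for illustration.

```python
class DomainFence:
    """Toy model of a network port guarding one power domain."""

    def __init__(self):
        self.fenced = False
        self.outstanding = 0   # requests forwarded but not yet completed
        self.held = []         # requests that arrived while fenced

    def issue(self, request):
        """Forward a request to the domain; return False if it was held."""
        if self.fenced:
            self.held.append(request)
            return False
        self.outstanding += 1
        return True

    def complete(self):
        """Record that the domain finished one forwarded request."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding request to complete")
        self.outstanding -= 1

    def request_power_down(self):
        """Power manager asks to shut the domain down: stop new traffic."""
        self.fenced = True

    def safe_to_power_down(self):
        """Acknowledge only once fenced and all in-flight traffic drained."""
        return self.fenced and self.outstanding == 0

    def power_up(self):
        """Domain is powered again: lift the fence and replay held requests."""
        self.fenced = False
        released, self.held = self.held, []
        self.outstanding += len(released)
        return released


fence = DomainFence()
fence.issue("read A")                       # in flight before the fence
fence.request_power_down()
ack_before_drain = fence.safe_to_power_down()
forwarded = fence.issue("read B")           # arrives late, so it is held
fence.complete()                            # "read A" finishes
ack_after_drain = fence.safe_to_power_down()
replayed = fence.power_up()                 # "read B" is delivered on wake-up
print(ack_before_drain, forwarded, ack_after_drain, replayed)
```

The point of the sketch is that the acknowledgment is withheld until in-flight traffic has drained and that late requests are not dropped, which is the "no data is lost" property the text describes.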
Clearly, the need for cloud connectivity is pushing the SOC architecture to the limit. As the nerve center of the SOC, the on-chip network must keep pace with all the high-performance subsystems and not become the system bottleneck. The increased bandwidth capabilities of multi-channel DRAM systems, coupled with the emergence of system-level coherence across subsystems, are driving on-chip network frequencies above 1GHz. For most application processors, this is more than a 2X speed increase. As a result, the on-chip network will continue to take on an even more critical role in SOC design as integration, headroom and complexity requirements increase, in the cloud and beyond.

By Drew Wingard.
Drew Wingard is Chief Technical Officer, Sonics, Inc.
Go to the Sonics, Inc. website to learn more.