Are You Building Your ESL Design Flow on Sand?  
Contributor: Bluespec, Inc.

To date, behavioral synthesis solutions based on sequential programming languages, e.g. C/C++/SystemC, have been the only high-level options above RTL. While these approaches raise the level of abstraction of design, they have significant limitations, including poor quality of synthesis results except for the narrow application spaces that they can efficiently address. Consequently, except for niche applications, C/C++ and SystemC have primarily been used for functionality modeling, performance assessment and verification. It is the rare chip development team that does not write RTL to produce real silicon.

You have to ask yourself: when was the last time you saw a benchmark from someone doing behavioral synthesis that did not involve math algorithms, such as imaging and filters? Well, you probably haven't seen one -- the reason lies in how these solutions raise the level of abstraction -- and the challenges of synthesizing these higher level constructs.

Bluespec is reinventing hardware design by offering a fundamentally new approach to high-level synthesis that elevates all applications without affecting the quality of results relative to hand-coded RTL.

We contrast SystemC and Bluespec across the following dimensions to highlight where each approach fits:
  • Structure
  • Resources
  • Concurrency/Coordination
  • Communication

The ESL Vision: Elevating Design above RTL

ESL design is design at a level above RTL with a straightforward goal: to improve the productivity and quality of the complete chip deliverable, including both hardware and software. With chip complexity finally outpacing the ability of verification and timing closure teams to keep up, whether measured in resource costs, time-to-market or quality, there are two primary emphases:
  • Accelerating software and firmware development
  • Elevating RTL design to reduce time-to-market, resource costs, and quality issues in chip development

There are a lot of EDA tools targeting ESL applications, covering many categories -- over a dozen were part of Gartner Dataquest's 2003 ESL Landscape. Effective ESL solutions need to ensure that they:
  • Integrate with and leverage existing EDA tools and methodology
  • Integrate specification, modeling and implementation work
  • Deliver the Quality of Results (QoR) of hand-coded RTL, in area, frequency, latency or power
  • Optimize the project, not solely sub-components
  • Maintain design predictability, control and transparency

ESL Synthesis: The Foundation

Behavioral synthesis provides ESL's foundation. Without it, modeling work is detached from hardware implementation - work must be manually re-created at the RTL-level, wasting resources and introducing divergence and errors.

Behavioral synthesis solutions have been based on software programming language environments, predominantly C, C++ and SystemC. C/C++ synthesis involves mapping a serial description into a parallel one, using Control Data Flow Graphs (CDFGs). SystemC layers parallel hardware constructs onto C++, delivering a hybrid sequential/parallel environment.

The remainder of this paper focuses on SystemC, because it augments C/C++ with parallel constructs to address the limitations of pure software-based hardware design environments. Much of the discussion applies to C and C++ synthesis as well.

SystemC, Bluespec and Two Different Tales of Abstraction: With Fundamentally Different QoR

When used at an RTL-level, SystemC has been effectively translated into Verilog. But, what happens when designs are done at a higher level of abstraction? Let's look at how SystemC and Bluespec elevate design from four different perspectives: structure, resources, concurrency/coordination, and communication. This analysis will highlight SystemC's dependence on sequential descriptions for abstraction -- which limits effective synthesis of high-level designs to vectorizable or VLIW-mappable math algorithms.

The following figure summarizes the different ways in which SystemC and Bluespec raise the level of abstraction for design:


Structure

SystemC's primary source of high-level design productivity lies in the structure in which designs are written. With its parallel hardware constructs, SystemC can be used at an RTL level, at a similar level of design abstraction to VHDL and Verilog. In fact, people have used it to design at the RTL level, with the ability to translate these detailed implementations to Verilog or VHDL. But, this neither saves time nor is it a higher-level of abstraction than RTL - why would you want to do this? SystemC delivers power in expression when designers deviate from a detailed RTL mapping and write more succinctly at a transaction level - regressing largely to sequential algorithm descriptions that provide the desired functional effects, while ignoring the granularity and parallelism of hardware structure.

In both SystemC and Bluespec SystemVerilog, you can design at an abstract, transactional level, and refine it to a synthesizable implementation. However, in SystemC, this refinement often involves:
  • A change in computation models (from, say, a message-passing model to RTL signaling), with concomitant changes in interfaces
  • A low-level synthesizable endpoint - RTL-level SystemC

With SystemC, development efficiency is derived by writing high-level designs, which deviate from hardware structure and include a significant amount of sequential software coding wrapped by parallel constructs. When looking at the underlying technology for performing this behavioral synthesis, it can be seen that while this type of abstraction is a terrific base for modeling, it is a terrible one for synthesis as it puts two burdens on the synthesis tool:
  • As the tool is in control of the implementation, it needs to determine the architecture and micro-architecture of an abstract, largely sequential algorithm.
  • In addition, the tool must convert a succinct transactional representation into an efficient, granular implementation.

SystemC based high-level synthesis, given its reliance on abstract, sequential code for its high level, depends on serial to parallel technology, which has significant limitations:

First, C specifications are sequential. Hardware is inherently parallel. Researchers in the field of parallel programming have largely abandoned the superficially seductive goal of synthesizing sequential programs (C/C++) into parallel implementations. They began over 35 years ago, in the fields of vectorization and automatic parallelization, and have come to realize that sequential programs are a fundamentally bad specification for parallel implementations - they specify sequentialization where none is algorithmically necessary, and it is generally computationally intractable for compilers/synthesis tools to undo this unnecessary sequentialization. Although hardware synthesis has a different target (hardware, instead of vector machines), the analysis techniques are essentially the same, with essentially the same pitfalls and intractabilities.

Second, while proving the lack of dependencies in serial designs is intractable in general, parallelization has been efficient for a very limited scope of problems: those described by simple, nested for-loops with simple, linear indices. These are algorithms that can be easily vectorized or have a straightforward mapping to VLIW engines. What does this mean? Basically, math algorithms with these characteristics, e.g. filters and Viterbi decoders, can be translated to efficient hardware from sequential software implementations. This means that software synthesis based on C/C++ delivers acceptable results for only a narrow market, DSP algorithms; for other applications, C/C++ will be relegated to modeling.
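
To make the distinction concrete, here is a minimal C sketch. The function names and signatures are illustrative assumptions, not taken from any tool's benchmark suite. The first function is a FIR filter written as simple nested for-loops with linear indices, so every output sample can be computed independently. The second walks a table in which each index is the value loaded in the previous step, so the sequential order is inherent to the algorithm rather than an accident of the language.

```c
#include <assert.h>
#include <stddef.h>

/* FIR filter: y[n] = sum over k of h[k] * x[n + k].
 * Indices are linear in n and k, and no outer-loop iteration reads
 * anything written by another iteration, so the loop nest can be
 * unrolled, pipelined, or vectorized.
 * The caller must supply n_out + taps - 1 input samples in x. */
void fir_filter(const int *x, size_t n_out, const int *h, size_t taps, int *y)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t n = 0; n < n_out; n++) {
        int acc = 0;
        for (size_t k = 0; k < taps; k++) {
            acc += h[k] * x[n + k];
        }
        y[n] = acc;
    }
}

/* Table walk: the index used in each step is the value loaded in the
 * previous step. The dependence chain is real, so dependence analysis
 * alone cannot turn this loop into parallel hardware.
 * Returns the index reached after `steps` hops from `start`. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    assert(next != NULL);
    size_t i = start;
    for (size_t s = 0; s < steps; s++) {
        i = next[i];
    }
    return i;
}
```

Filters look like the first function; lookups, arbitration, and control logic tend to look like the second.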

Finally, SystemC is based on C++, a sequential software programming language. As such it will suffer from the same limitations for high-level synthesis as generic C/C++ based approaches.

Let's look at the following high-level description of an IP lookup algorithm for a network router. It takes the IP address of a packet and looks up the correct egress port on which to forward it. In this simple C-based implementation, the lookup follows a sparse tree structure to perform one to three memory table lookups per address:

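#include <stdint.h>

typedef uint32_t IPA;      /* assumed: a 32-bit IPv4 address */
extern int RAM[];          /* lookup table memory, defined elsewhere */
extern int isLeaf(int p);  /* nonzero if entry p is a leaf, defined elsewhere */

int longest_prefix_match (IPA ipa) /* Up to 3 memory lookups */
{
   int p;
   p = RAM [(ipa >> 16) & 0xFFFF];     /* Level 1: bits 31:16 */
   if (isLeaf(p)) return p;

   p = RAM [p + ((ipa >> 8) & 0xFF)];  /* Level 2: bits 15:8 */
   if (isLeaf(p)) return p;

   p = RAM [p + (ipa & 0xFF)];         /* Level 3: bits 7:0 */
   return p; /* must be a leaf */
}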

While simple to express in C, how would you map this to hardware? How many cycles should it take to implement? What architecture is the most extensible for supporting different memory latencies? How many lookups should occur in parallel? All of these are choices that a C-based synthesis tool must make, because they aren't specified in the design.
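
One way to see how much the C leaves unspecified is to write down even the simplest hardware-like schedule by hand. The sketch below assumes a single-ported memory that serves one access per step, and restates the lookup as an explicit state machine. The type `lookup_t`, the functions `lookup_start` and `lookup_step`, and the `LEAF_FLAG` table encoding are all hypothetical names introduced for illustration; the original algorithm does not say how leaves are encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_FLAG 0x40000000  /* assumed encoding: this bit marks a leaf entry */

typedef enum { LEVEL1, LEVEL2, LEVEL3, DONE } lookup_state_t;

typedef struct {
    lookup_state_t state;
    uint32_t ipa;     /* address being looked up */
    int32_t  p;       /* last table entry read */
    int      cycles;  /* memory accesses issued so far */
} lookup_t;

static int is_leaf(int32_t p) { return (p & LEAF_FLAG) != 0; }

void lookup_start(lookup_t *l, uint32_t ipa)
{
    assert(l != NULL);
    l->state = LEVEL1;
    l->ipa = ipa;
    l->p = 0;
    l->cycles = 0;
}

/* Advance by one step, issuing at most one memory access.
 * Returns 1 once the result is available in l->p. */
int lookup_step(lookup_t *l, const int32_t *ram)
{
    assert(l != NULL && ram != NULL);
    switch (l->state) {
    case LEVEL1:
        l->p = ram[(l->ipa >> 16) & 0xFFFF];
        l->cycles++;
        l->state = is_leaf(l->p) ? DONE : LEVEL2;
        break;
    case LEVEL2:
        l->p = ram[l->p + ((l->ipa >> 8) & 0xFF)];
        l->cycles++;
        l->state = is_leaf(l->p) ? DONE : LEVEL3;
        break;
    case LEVEL3:
        l->p = ram[l->p + (l->ipa & 0xFF)];
        l->cycles++;
        l->state = DONE;  /* must be a leaf */
        break;
    case DONE:
        break;
    }
    return l->state == DONE;
}
```

Even this toy version fixes decisions the original function never mentions: one access per step, one lookup in flight, and a variable latency of one to three steps. Supporting several lookups in flight, or a different memory latency, means a different structure, not a different compiler flag.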

Consider three example architectures: a static pipeline, characterized by inefficient memory usage but simple design; a linear pipeline, characterized by efficient memory usage through a memory port replicator; and a circular pipeline, characterized by efficient memory usage but with the most complex control. How is a tool going to take the above serial description and derive these alternative architectures? Who do you want in control of hardware architecture: a software tool or you?

With Bluespec, the designer controls 100% of the hardware structure, both the architecture and micro-architecture. Bluespec's high-level abstraction is focused on speeding the development and eliminating the errors of these implementations versus what is possible with RTL.

Both SystemC and Bluespec SystemVerilog also leverage the expressiveness and succinctness of high-level types, such as structures, unions, and enumerations. High-level types allow designers to work with familiar data objects for their problem, with fewer lines of code and fewer errors.

Unfortunately, the main source of SystemC's power is, in the case of synthesis, also its Achilles' heel. Do you want significant abstraction or efficient synthesis? Except with math algorithms, SystemC cannot deliver both simultaneously.

Resources

In creating structure, designers typically use two kinds of building blocks: state elements and operators, such as shift, add, and multiply. These are resources - and, as a first cousin to structure, are an area where SystemC abstracts from RTL. Instead of describing explicit hardware state elements or specifying specific processing building blocks like adders, barrel shifters or SRT dividers, C/C++ expressibility allows specific decisions to be left to synthesis. There are several implications to this:

First, in order to make these tradeoffs in the translation from software to hardware, SystemC behavioral synthesis tools need technology libraries, where different operation implementations have been pre-characterized for area and speed. This allows the synthesis tool to assess the implications of utilizing different resource configurations and opportunities for resource sharing. Unfortunately, the availability of these libraries can pre-determine your options for ASIC or FPGA silicon vendors.

Second, continuing the theme from structure, is the synthesis tool really the best one to make choices as to pipeline register stages, state elements, and operator implementations and configurations? In a November 8, 2004 EE Times contributed article entitled "Getting an Algorithm Ready for Reuse", Ketul Patel describes the challenges of porting C++ algorithms across different processor platforms. He writes, "Everyone assumes that algorithms written in C++ are easily portable across platforms. This is not true in practice." If software compilers cannot efficiently optimize the same C++ algorithm for different processors, how is a hardware synthesis tool going to make the right architecture and micro-architecture implementation choices? He describes an image-processing algorithm that, when re-compiled for the target hardware, a TI TMS320C6205 DSP, was 50x slower than required.

To make the algorithm usable, they made big changes:
  • Converted it from C++ to C
  • Ensured efficient use of DMA control
  • Performed floating point to fixed point conversion
  • Eliminated division in the fixed point implementation

Without pre-seeding the compiler with the final implementation, the compiler was unable to come close to the target performance.
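
The last two items in that list are the kind of rewrite sketched below. This is not code from the EE Times article; it is a minimal illustration, under assumed names (`to_q15`, `q15_mul`, `div3_approx`) and an assumed Q15 format, of what converting to fixed point and eliminating division look like in C.

```c
#include <assert.h>
#include <stdint.h>

#define Q15_SHIFT 15  /* Q15 format: real value = integer / 2^15 */

/* Convert a float in [-1, 1) to Q15, rounding to nearest.
 * In a real port this is done once, offline, for each constant. */
int16_t to_q15(float f)
{
    assert(f >= -1.0f && f < 1.0f);
    float scaled = f * (float)(1 << Q15_SHIFT);
    return (int16_t)(scaled >= 0 ? scaled + 0.5f : scaled - 0.5f);
}

/* Q15 * Q15 -> Q15 using a 32-bit intermediate product.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;
    return (int16_t)(prod >> Q15_SHIFT);
}

/* Division by the constant 3 replaced by a multiply and a shift:
 * 10923 is 1/3 in Q15, rounded up. This is an approximation whose
 * error grows with x, so it suits small operands only. */
uint16_t div3_approx(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 10923u) >> Q15_SHIFT);
}
```

Every one of these choices (word width, rounding, acceptable error) is a resource decision the developer made by hand, which is the point of the example in the article.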

Third, when a synthesis tool makes resource and structure trade-offs, the RTL will look foreign to the algorithm developer. Designers will have considerable challenges working with the tool's RTL output - the only option is to stay away from the RTL for simulation and debug. What are the chances that you can avoid the RTL?

Finally, what about other considerations such as power trade-offs, complex clock structures, multiple clock domains, and IP integration? Where and how are these going to be controlled?

With Bluespec, resource choices are made explicitly by the designer, who is the person best placed to make them. This gives the designer control over the architecture and micro-architecture and ensures that the results will be reasonable, as well as familiar. To elevate resources, Bluespec provides mechanisms, including high-level types, that allow more creative resource implementations with fewer lines of code - again, higher abstraction, but 100% under designer control.

Concurrency/Coordination

While SystemC raises the level of abstraction above RTL in the areas of structure and resources, it does nothing for the management of concurrency and coordination across FSMs. Concurrency semantics were specifically added to SystemC to provide parallel constructs in a sequential programming environment.

But, as with RTL, SystemC designers must explicitly delineate and manage parallelism. These mechanisms add nothing for designers - in fact, if anything, SystemC's semantics are less succinct than those of Verilog or VHDL.

In contrast, Bluespec revolutionizes concurrency and coordination, using new mechanisms for expressing behavior: rules and methods. A rule updates state behavior within a module - it is described as a set of conditions that, when true, execute a set of actions (see the following example). Methods are protocol behavior for interfaces, and operate to compose rule behavior across modules.

Rule syntax:

rule rule_name (list of conditions);
   action_a;
   action_b;
   …;
   action_n;
endrule;

Rule example:

rule stage1_leaf (unpack (sramResp.first) matches tagged leaf (.v));
   sramResp.deq;
   let {ip.lutag, itag} = fifo.first;
   fifo.deq;
   completionBuffer.complete.put(tuple2(itag,tuple2(unpack({0,v}),lutag)));
endrule;

You can think of a rule as a smart always block with safety interlocks. The interlocks come from the fact that, unlike RTL and SystemC, Bluespec manages the following:
  • Automatically identifying and avoiding potential race conditions
  • Automatically identifying resource contentions
  • Managing multiplexing, connectivity, and, if desired, arbitration of common resources
  • Ensuring that multiple actions which must happen together, stay together

Bluespec's underlying Term Rewriting System (TRS) technology makes the synthesis of these capabilities straightforward and transparent to designers. At the same time, the compiler's decisions are clearly communicated to the designer and easily controlled. In contrast, these areas must be manually tracked, managed and explicitly written for both RTL and SystemC.
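
For readers who think in software terms, the "conditions, then atomic actions" behavior of a rule can be modeled in a few lines of C. The sketch below is only an analogy: it models one-rule-at-a-time firing, whereas a hardware compiler can fire several non-conflicting rules in the same clock. The GCD computation, the `rule_t` type, and `run_rules` are illustrative assumptions, not Bluespec code or Bluespec's scheduler.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned x, y; } gcd_state_t;

/* A rule pairs a guard (the condition) with an action (the state update).
 * The action runs only when the guard holds, and runs as one atomic unit. */
typedef struct {
    const char *name;
    int  (*guard)(const gcd_state_t *);
    void (*action)(gcd_state_t *);
} rule_t;

static int  swap_guard(const gcd_state_t *s)  { return s->x > s->y && s->y != 0; }
static void swap_action(gcd_state_t *s)       { unsigned t = s->x; s->x = s->y; s->y = t; }

static int  sub_guard(const gcd_state_t *s)   { return s->x <= s->y && s->y != 0; }
static void sub_action(gcd_state_t *s)        { s->y -= s->x; }

static const rule_t rules[] = {
    { "swap",     swap_guard, swap_action },
    { "subtract", sub_guard,  sub_action  },
};

/* Fire enabled rules one at a time until no guard is true.
 * Requires x != 0; on return, x holds gcd of the initial x and y.
 * Returns the number of rule firings. */
unsigned run_rules(gcd_state_t *s)
{
    assert(s != NULL && s->x != 0);
    const size_t n = sizeof rules / sizeof rules[0];
    unsigned fired = 0;
    for (;;) {
        size_t i;
        for (i = 0; i < n; i++) {
            if (rules[i].guard(s)) {
                rules[i].action(s);
                fired++;
                break;
            }
        }
        if (i == n) {
            return fired;  /* no rule enabled: computation is done */
        }
    }
}
```

Notice that the two guards are mutually exclusive, so the two rules never contend for `x` and `y`. In this model that property has to be checked by inspection; the article's claim is that a rule-based compiler detects and manages such conflicts for you.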

With SystemC, there are two mechanisms for coordinating access to shared data members. You can explicitly design multiplexing and arbitration logic, which requires manual, detailed work. Alternatively, you can use mutexes and locks to coordinate access to shared data for modeling purposes, with unclear mapping to hardware.
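
As a rough idea of the "manual, detailed work" involved, the following C sketch models the arbitration and multiplexing a designer has to write by hand when several requesters share one resource. The round-robin policy, the four-requester limit, and the function names are assumptions chosen for illustration; the article does not prescribe a policy.

```c
#include <assert.h>

#define NUM_REQ 4  /* assumed number of requesters sharing the resource */

/* Round-robin arbiter: bit i of req_mask is set when requester i wants
 * the resource. The search starts just after the last granted index so
 * that no requester is starved. Returns the granted index, or -1 when
 * nobody is requesting. */
int rr_arbitrate(unsigned req_mask, int last)
{
    assert(last >= 0 && last < NUM_REQ);
    for (int off = 1; off <= NUM_REQ; off++) {
        int i = (last + off) % NUM_REQ;
        if (req_mask & (1u << i)) {
            return i;
        }
    }
    return -1;
}

/* Multiplexer: route the granted requester's data to the shared port,
 * or drive idle_value when there is no grant. */
int mux_select(const int data[NUM_REQ], int grant, int idle_value)
{
    assert(grant < NUM_REQ);
    return (grant >= 0) ? data[grant] : idle_value;
}
```

Adding a fifth requester means touching the constant, the mask width, and every place the grant is consumed, which is the kind of ripple the article is describing.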

With increased design complexity, concurrency and coordination have increasingly become a significant source of errors, which are often subtle, hard-to-stimulate and hard-to-find. With SystemC, designers get no additional facilities over RTL to stem these errors.

With rules and methods, Bluespec elevates both the specification and management of concurrency and inter-FSM coordination. You have to ask yourself: what is the source of the most numerous and subtle bugs and complexity: structure and resources or concurrency and coordination?

Communication

As designs have grown in size, solutions are being composed from large numbers of components and hierarchies of components. However, RTL techniques for scaling designs are inadequate. Today, interfaces are very simplistic:
  • They consist of wire definitions
  • The protocols for communicating between modules are typically customized with each interface, and are part of the core logic of the module
  • The protocols are documented in informal specifications, e.g. timing diagrams and text, that must be interpreted and explicitly coded to each time a module is used
  • Modules are connected by hooking adjacent module interfaces together, with no guarantee that protocols across the interfaces properly interwork

SystemC uses the same old model for inter-module communication. Although there are some benefits in succinctness, the capabilities and issues are the same as RTL.

Bluespec is built around new techniques for composing components that drive scalability and re-use:
  • Interfaces include protocol, in addition to a more succinct "signal" definition
  • Interface methods allow modules to change state within other modules, without requiring the manual coding, configuring, and connecting of external ports
  • Interface protocol ties elastically to internal module state, allowing interfaces and external modules to be immune to changes in core processing latency - for example, no re-design if a function now produces a result one cycle faster
  • Interfaces encapsulate use caveats - i.e. conditions under which they can or cannot be used. Designers are not stuck reading a spec - and verification teams need not worry about whether an interface was properly designed to.
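
The idea of an interface that carries its own use conditions can be approximated in C, with the important difference that here the caller must remember to check the condition, while the article describes a toolset that enforces it. The two-entry FIFO, its function names, and `consumer_step` below are hypothetical and exist only to illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 2  /* assumed depth, kept small for illustration */

typedef struct {
    int data[FIFO_DEPTH];
    int head;   /* index of the oldest element */
    int count;  /* number of stored elements */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* The "use caveats": when each method may legally be called. */
bool fifo_can_enq(const fifo_t *f) { return f->count < FIFO_DEPTH; }
bool fifo_can_deq(const fifo_t *f) { return f->count > 0; }

void fifo_enq(fifo_t *f, int v)
{
    assert(fifo_can_enq(f));  /* calling when full is a protocol error */
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
}

int fifo_deq(fifo_t *f)
{
    assert(fifo_can_deq(f));  /* calling when empty is a protocol error */
    int v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* A consumer written only against the guard: it stalls when no data is
 * ready, so it keeps working if the producer becomes faster or slower.
 * Returns true when a value was consumed and added to *sum. */
bool consumer_step(fifo_t *f, int *sum)
{
    if (!fifo_can_deq(f)) {
        return false;  /* not an error, just not ready yet */
    }
    *sum += fifo_deq(f);
    return true;
}
```

Because `consumer_step` depends on the guard rather than on a cycle count, changing the producer's latency does not require changing the consumer. That is the elastic behavior the list above attributes to Bluespec interfaces.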

The following is a SystemC code excerpt showing what it takes to manually implement some of these capabilities. This code excerpt is missing the top-level module connecting the two example interfaces together. In the Bluespec that follows, example_interface_A directly instantiates example_interface_B, so it does not require a top-level module connecting them.

SystemC code example



Bluespec SystemVerilog code example

These two code examples illustrate a simple scenario. Now imagine changing the design in only a small way so that two modules, mod_A and mod_A', want to talk to the same interface on mod_B. With Bluespec, the change is trivial as the toolset will identify the resource contention and generate the proper connectivity and control logic to manage it. With SystemC, this small addition impacts the design by requiring a myriad of detailed changes in connectivity, multiplexing, and arbitration control.

With Bluespec's communication capabilities, there is both rapid composition and re-configuration. Design re-use means more than using off-the-shelf IP -- it means designers need less effort and complex control logic to hook up to an interface. Interfaces become self-documenting to use, speed connectivity and significantly reduce errors in use. Again, where do you see more issues: structure and resources or communication?

Conclusion

There are different approaches to raising the level of design above RTL. While each can significantly speed up design, the ways in which they elevate design have very different implications for synthesis and for reducing the number of errors in the design.

In order to provide material improvements in performance and time-to-market, ESL solutions need to:
  • Elevate and accelerate all aspects of RTL design. Accelerating and providing good Quality of Results for only a sub-component (math algorithms) may offer targeted resource improvements and some increased flexibility, but SoC projects will be limited by the weakest link: the bulk of RTL development and verification that cannot be leveraged. ESL should raise all boats - the goal isn't to accelerate a piece of the system development, but the development of the overall system.
  • Integrate well with RTL-level tools and verification. ESL tools cannot ignore:
    • Existing IP
    • The existence of well understood, established EDA methodologies
    • The lack of equivalency tools to compare high-level designs to RTL

Design teams need to work at the RTL-level for verification and debug, and will need to do so for many years. ESL tools must be pragmatic and deliver productivity not only at higher levels, but at the RTL-level as well.

If you are looking at SystemC-based behavioral synthesis, challenge the vendor to show you area, speed, latency and lines of code benchmarks comparing RTL to SystemC for the following types of designs:
  • Processors
  • Bus arbitration
  • Network processing
  • Complex control logic applications
  • Any applications that aren't easily vectorized or mapped to a VLIW engine

Providing a strong modeling environment is fine if that is all you are planning. If the goal is hardware implementation, then a common, elevated environment for modeling and HW implementation is far preferable. One without the other ensures that specifications and implementations will be disconnected and work will be duplicated.

Unlike SystemC, Bluespec is reinventing hardware design with elevated design that delivers Quality of Results comparable to hand-coded RTL across all applications.



By George Harper, VP Marketing, Bluespec, Inc.




Go to the Bluespec, Inc. website to learn more.

Keywords: SOCcentral, Bluespec, ESL,
488/11326 2/1/2005 13691 13691
Copyright 2002 - 2004 Tech Pro Communications, P.O. Box 1801, Merrimack, NH 03054