

## USING HARD MACROS TO REDUCE FPGA COMPILATION TIME

Christopher Lavin, Marc Padilla, Subhrashankha Ghosh, Brent Nelson, Brad Hutchings, and Michael Wirthlin







Example QPSK Design

# HOW DO HARD MACROS MAKE FPGA COMPILATION FAST?

**Benefits of Hard Macros**

- No need for synthesis (XST), technology mapping (NGDBuild), or packing (MAP)
  - Skipping these steps saves a significant part of the build time
- Potentially faster placement times
  - Only placing 10's of hard macros instead of *1000's* of primitives
- Faster routing times
  - Hard macros contain all necessary internal routing
  - Hard macro designs only need routing to connect hard macros to each other and to IOs
- Offer significant design reuse
  - Once a hard macro is built, it can be reused in many other designs without the need to rebuild it

### Hard Macro Experiments

- Experiment #1: Determine Xilinx hard macro support
  - Can Xilinx tools implement designs created purely of hard macros?
- Experiment #2: Determine obtainable speedup

#### **Experiment #1: Xilinx + Hard Macro Support**

- Three different results
  1. Design placed and routed fine, but took longer than the regular design
  2. PAR failed: could not find a valid placement
  3. Placement succeeded, but no valid routing could be found
- Conclusions
  - Xilinx PAR is not suitable for hard macro-based designs
  - Placement of hard macros is the source of the problems
  - Implies creation of our own hard macro-optimized placer
  - Experiment #2 tests whether a custom placer would speed up compilation time

#### **Experiment #2: Obtainable Speedup**

- Place hard macros by hand to determine obtainable speedup
- Conclusions
  - 3X obtainable speedup
  - Custom router will also increase speedup
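To see why both a custom placer and a custom router matter, a rough Amdahl-style estimate can be applied to the Mult-tree stage times reported in the poster's runtime table. This is only a sketch: `router_gain` is a hypothetical what-if factor rather than a measured result, and the function name is illustrative.

```python
# Mult-tree runtimes in seconds, copied from the poster's runtime table.
BASELINE_TOTAL_S = 137.5
PLACER_S, XDL2NCD_S, PAR_ROUTE_S = 4.3, 14.3, 25.7


def estimated_speedup(router_gain=1.0):
    """Speedup over baseline if routing alone ran `router_gain` times faster.

    router_gain is a hypothetical what-if factor, not a measurement.
    """
    if router_gain <= 0:
        raise ValueError("router_gain must be positive")
    # Only the routing stage is scaled; placer and xdl2ncd times are unchanged.
    total = PLACER_S + XDL2NCD_S + PAR_ROUTE_S / router_gain
    return BASELINE_TOTAL_S / total


print(f"measured flow:    {estimated_speedup(1.0):.1f}X")
print(f"router 2x faster: {estimated_speedup(2.0):.1f}X (hypothetical)")
```

Routing accounts for 25.7 s of the 44.3 s hard macro flow, so it is the stage where a faster tool would have the largest effect on total runtime.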



**Conventional Designs (Baseline Runtimes)**

| Design        | XST   | NGDBuild | MAP   | PAR   | Total Runtime |
|---------------|-------|----------|-------|-------|---------------|
| Mult-tree     | 46.7s | 9.0s     | 17.7s | 64.2s | 137.5s        |
| Heterogeneous | 80.2s | 4.9s     | 10.4s | 35.5s | 131.0s        |

**Hard Macro-based Designs**

| Design        | Custom Placer | XDL2NCD | PAR (routing) | Total Runtime | Speedup (over baseline) |
|---------------|---------------|---------|---------------|---------------|-------------------------|
| Mult-tree     | 4.3s          | 14.3s   | 25.7s         | 44.3s         | 3.1X                    |
| Heterogeneous | 4.3s          | 12.1s   | 25.5s         | 41.9s         | 3.1X                    |

Hard macro flow (from the figure): Hard Macros → .xdl → Custom Placer → xdl2ncd → PAR -p → .ncd

- XST/NGDBuild/MAP = 0 secs, PAR reduced
- Assume placer can run very quickly (only a few blocks vs. 1000's)
- XDL design assembler will be fast (trivial operation)
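The speedup figures can be reproduced from the stage times. The sketch below is a minimal check, assuming the four baseline stage columns are XST, NGDBuild, MAP, and PAR in that order (inferred from the flow the poster describes). The dictionary and function names are illustrative only.

```python
# Stage runtimes in seconds, copied from the runtime tables.
# Assumption: the baseline stages are XST, NGDBuild, MAP, PAR in that order.
BASELINE = {
    "Mult-tree": {"XST": 46.7, "NGDBuild": 9.0, "MAP": 17.7, "PAR": 64.2},
    "Heterogeneous": {"XST": 80.2, "NGDBuild": 4.9, "MAP": 10.4, "PAR": 35.5},
}
BASELINE_TOTAL = {"Mult-tree": 137.5, "Heterogeneous": 131.0}  # as reported
HARD_MACRO = {
    "Mult-tree": {"placer": 4.3, "xdl2ncd": 14.3, "par_route": 25.7},
    "Heterogeneous": {"placer": 4.3, "xdl2ncd": 12.1, "par_route": 25.5},
}


def hard_macro_total(design):
    """Total runtime of the hard macro flow: placer + xdl2ncd + routing."""
    return sum(HARD_MACRO[design].values())


def speedup(design):
    """Reported baseline total divided by the hard macro flow total."""
    return BASELINE_TOTAL[design] / hard_macro_total(design)


def front_end_share(design):
    """Fraction of reported baseline runtime spent in XST + NGDBuild + MAP."""
    stages = BASELINE[design]
    front = stages["XST"] + stages["NGDBuild"] + stages["MAP"]
    return front / BASELINE_TOTAL[design]


for design in BASELINE:
    print(f"{design}: total={hard_macro_total(design):.1f}s "
          f"speedup={speedup(design):.1f}X "
          f"front-end share={front_end_share(design):.0%}")
```

Under that column assumption, the steps that a hard macro flow skips account for roughly 53% (Mult-tree) and 73% (Heterogeneous) of the baseline runtime, which is consistent with both designs reaching about 3.1X.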




# WHAT KIND OF DESIGN FLOW WILL USE HARD MACROS?



#### Speedup of a hard macro-based flow

- Hand place hard macros to get accurate timing of the routing