A novel predictable segmented FPGA routing architecture
Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, FPGA '98 (1998)
- ISBN: 0897919785
- DOI: 10.1145/275107.275111
Emil S. Ochotta, Patrick J. Crotty, Charles R. Erickson, Chih-Tsung Huang,
Rajeev Jayaraman, Richard C. Li, Joseph D. Linoff, Luan Ngo, Hy V. Nguyen,
Kerry M. Pierce, Douglas P. Wieland, Jennifer Zhuang, and Scott S. Nance
Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124 USA
408-559-7778
emil@xilinx.com
1. ABSTRACT
In the development of new FPGA architectures, a designer must
balance speed, density, and routing flexibility. In this paper,
we discuss a new FPGA architecture based on a patented [1],
novel, segmented routing fabric that is targeted to high
performance and predictability but does not sacrifice
routability or area efficiency. Current segmented architectures
allow much flexibility in routing, but incur large delay
penalties when a signal has high fanout or must traverse medium
to long distances to reach its target. Reducing the number of
programmable interconnect points (PIPs) that a signal must
traverse to reach its target, while eliminating the RC delay
buildup due to signal fanout, improves design performance and
offers highly predictable signal delays.
1.1 Keywords
FPGA, Programmable Logic, Routing.
2. INTRODUCTION
Since the introduction of the FPGA in 1985 [2], the number
and scope of applications in which FPGAs are used have
increased dramatically. One reason for this dramatic growth
has been the shrinking gap between FPGAs and application
specific integrated circuits (ASICs) in terms of capacity,
cost, usability, and performance. It follows that the goal of
architects creating a new FPGA is to reduce and eventually
eliminate this dwindling gap. This paper examines one
aspect of FPGA architectural design, design of the routing
fabric, and presents a patented [1], novel, segmented routing
architecture that has significant advantages in performance
and usability while balancing the demands of cost and
capacity.
Permission to make digital/hard copies of all or part of this material for
personal or classroom use is granted without fee provided that the copies
are not made or distributed for profit or commercial advantage, the
copyright notice, the title of the publication and its date appear, and notice is
given that copyright is by permission of the ACM, Inc. To copy otherwise,
to republish, to post on servers or to redistribute to lists, requires specific
permission and/or fee.
FPGA 98 Monterey CA USA
Copyright 1998 ACM 0-89791-978-5/98/01...$5.00
We organize the balance of this paper as follows: In the next
section, we describe the goals and key ideas behind our new
routing architecture. We then set the context for the routing
architecture by providing an overview of the FPGA of which
it is a part and a brief description of how the routing
architecture evolved. This is followed by the core of the
paper, which describes the routing fabric and presents some
results from our analysis of it. Finally we close with some
conclusions.
3. GOALS AND BACKGROUND
Our new FPGA routing architecture is designed to maximize
performance and predictability, while maintaining high
routability and efficient area utilization. These new goals -
performance, predictability, routability, and area efficiency
- result from applying the more general goals of capacity,
cost, usability, and performance to the FPGA routing design
problem. For some of these goals, the effect is readily
apparent: routing performance, the delay of the interconnect
from one logic element to the next, directly influences the
overall performance of a user’s design in an FPGA. In
FPGAs, routing delays have always been important in
computing overall performance, and as process minimum
feature sizes continue to shrink, routing delays increasingly
dominate logic delays. Similarly, the effect of area efficiency
on cost is direct, since the cost of an FPGA is proportional to
its die area. For predictability, the relationship to the more
general goals is not so obvious. Predictability is the ease and
accuracy with which interconnect delay can be estimated for
a design when the gates have been placed in the logic
elements on the FPGA but the routing has not been
completed. Good predictability makes it easier to write
software that can quickly implement a user’s design on the
FPGA, significantly impacting its usability. Routability also
influences usability. Routability is a measure of how easy it
is to interconnect the necessary logic elements to complete
the implementation of the user’s design on the FPGA. Again,
greater routability makes it easier to write efficient routing
software, thereby increasing usability. Routability also
impacts cost and capacity. If there are insufficient routing
resources to interconnect all the placed logic elements,
routability can be the limiting factor in the capacity of a
device. Similarly, since cost is directly proportional to die
area, for a given amount of logic that can be interconnected,
a routing architecture that consumes more area is more
expensive.
Now that we have established the relationship between our
routing-specific goals - performance, predictability,
routability, and area efficiency - and our more general
goals for FPGA architecture design, it is important to
identify specific areas for improvement in existing FPGA
routing architectures. One of the most important of these is
the build-up of RC signal delay through unbuffered
segments of the programmable interconnect. This delay can
have a dramatic negative effect on performance, particularly
for high-fanout nets, where each load contributes additional
delay. Signal delay also can be exacerbated by poor
routability because a connection may be relegated to a
suboptimal path due to signal congestion in the local area.
The non-linear nature of RC trees also makes the delays on
routes with many unbuffered segments difficult to predict.
When the problem is further complicated by the possibility
that a signal may be forced to route around congestion,
predictability becomes nearly impossible. Because of these
weaknesses, a key idea behind our work is to eliminate
unbuffered segments from the routing architecture.
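To make the RC build-up concrete, the following is a minimal sketch based on the first-order Elmore delay model. This is a standard textbook approximation and not necessarily the analysis used for this architecture. The per-segment resistance and capacitance, the buffer delay, and the function names are all assumed values chosen for illustration.

```python
# Illustrative only: R_SEG, C_SEG and T_BUF are assumed values, not data from the paper.
R_SEG = 1_000.0   # ohms per segment (pass transistor plus wire), assumed
C_SEG = 100e-15   # farads per segment, assumed
T_BUF = 150e-12   # intrinsic buffer delay in seconds, assumed


def elmore_unbuffered(n_segments: int) -> float:
    """Elmore delay to the far end of a chain of n identical unbuffered RC segments.

    The capacitance at node k is charged through k series resistances, so the
    total is sum_{k=1..n} (k * R) * C = R * C * n * (n + 1) / 2 (quadratic in n).
    """
    return sum(k * R_SEG * C_SEG for k in range(1, n_segments + 1))


def delay_buffered(n_segments: int) -> float:
    """Delay when every segment is driven by its own buffer.

    Each stage sees only one segment's RC, so the delay grows linearly in n.
    """
    return n_segments * (T_BUF + R_SEG * C_SEG)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} segments: "
              f"unbuffered {elmore_unbuffered(n) * 1e9:6.2f} ns, "
              f"buffered {delay_buffered(n) * 1e9:6.2f} ns")
```

Under these assumed numbers a buffer costs time on a very short route, but the unbuffered delay grows quadratically with route length while the buffered delay grows linearly, which is also what makes the buffered case easier to estimate before routing.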
Another key idea in our routing architecture comes from
trying to capture the best features of interconnect structures
that are too costly to build. The ideal FPGA routing
architecture for performance, predictability, and routability
is a fully populated crossbar switch, which allows any logic
element to connect to any other logic element on the FPGA
through a minimum number of programmable
interconnection points (PIPs), the programmable
connections between wires. Unfortunately, the amount of
wiring needed for a crossbar grows quadratically with the
number of logic elements, so a full crossbar switch is not
area efficient. For an FPGA with a capacity of more than a
few thousand gates, a crossbar is too expensive; however,
we have captured some of the flavor of a crossbar in our
routing architecture without paying the high area cost.
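The quadratic growth can be seen with a quick count. The sketch below assumes a simplified, fully populated crossbar in which every logic element output crosses every logic element input at exactly one PIP. The function name and the default of one input and one output per element are assumptions made for illustration.

```python
def crossbar_pips(n_elements: int,
                  inputs_per_element: int = 1,
                  outputs_per_element: int = 1) -> int:
    """PIPs in a fully populated crossbar: one at every output/input crossing."""
    total_outputs = n_elements * outputs_per_element
    total_inputs = n_elements * inputs_per_element
    return total_outputs * total_inputs


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"{n:5d} logic elements -> {crossbar_pips(n):8d} PIPs")
```

Doubling the number of logic elements quadruples the crosspoint count in this model, which is why a full crossbar stops being practical as capacity grows.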
Our new routing architecture combines the key idea of
buffering segments with the flavor of a crossbar structure. In
a somewhat whimsical allusion to the Gordian knot, the
architecture is called “Alexander”, a name we shall use for
convenience in the balance of this paper. Our Gordian knot
was the problem of achieving our performance and
predictability objectives while maintaining routability and
area efficiency. However, Alexander cut through his knot
problem and so shall we. We begin the description of our
solution with the most general view of the Alexander
routing architecture and the FPGA of which it was a part,
then motivate some of the features in the architecture by
outlining its evolution, and finally describe the architecture
in detail.
4. FPGA OVERVIEW
Before we describe the Alexander routing architecture in
detail, we first provide a context for the routing architecture
by describing the rest of the FPGA of which it was a part.
The general arrangement of the Alexander FPGA is shown
in Fig. 1 and is similar to the coarse grain static RAM
architecture of the Xilinx XC4000 Family [3]. The
architecture consists of a two-dimensional array of logic
elements called Configurable Logic Blocks (CLBs) that are
interconnected by the Alexander routing. A single CLB and
its associated routing resources are collectively referred to
as a tile. As shown in Fig. 1, the CLB tiles form the core of
the FPGA and are surrounded by a ring of programmable I/O
buffers. The CLBs implement the user’s logic, and the I/O
buffers provide the interface between the FPGA core and
the external world.
As shown in Fig. 2, each CLB consists of two logic cells.
These logic cells are largely independent: they have separate
data inputs and outputs but share the control signals on the
flip flops. As shown in Fig. 2, each logic cell consists of a
function generator that can be configured to produce any
function of its four inputs, and an edge-triggered D flip flop
that can act as a storage element.
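As a rough behavioral sketch of a single logic cell, the function generator can be modeled as a 16-entry lookup table feeding a D flip flop. The lookup-table representation is the usual way such generators are built but is an assumption here, as are the class and method names. The shared flip-flop control signals are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicCell:
    """Hypothetical model: a 4-input function generator feeding a D flip flop."""
    truth_table: List[int]  # 16 bits, one per combination of the four inputs
    q: int = 0              # current flip-flop state

    def __post_init__(self) -> None:
        if len(self.truth_table) != 16 or any(b not in (0, 1) for b in self.truth_table):
            raise ValueError("truth_table must hold exactly 16 bits")

    def function_generator(self, a: int, b: int, c: int, d: int) -> int:
        """Combinational output: look up the configured bit for these inputs."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return self.truth_table[index]

    def clock_edge(self, a: int, b: int, c: int, d: int) -> int:
        """On a clock edge, the flip flop captures the function generator output."""
        self.q = self.function_generator(a, b, c, d)
        return self.q


if __name__ == "__main__":
    # Configure the cell as a 4-input parity (XOR) function.
    parity = [bin(i).count("1") & 1 for i in range(16)]
    cell = LogicCell(parity)
    print(cell.function_generator(1, 0, 0, 0))
    print(cell.clock_edge(1, 1, 1, 0), cell.q)
```

Because any 16-bit table is allowed, the same structure covers every possible function of four inputs, which matches the description of the function generator above.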
Fig. 3 presents a somewhat more detailed view of a logic
cell. Combinational logic is implemented in the function
generator, and the flip flop can store a single bit of state
information. The function generator and flip flop in a logic
cell are arranged in series such that the output of the
function generator can be used as the input to the flip flop.
To provide fast and compact arithmetic, the logic cell
Figure 1. Architecture Layout.
Figure 2. Configurable Logic Block (CLB).


