Designing a dataflow processor using CλaSH
             Anja Niedermeier, Rinse Wester, Kenneth Rovers, Christiaan Baaij, Jan Kuper, Gerard Smit
                              University of Twente, Department of Computer Science
                                            Enschede, The Netherlands

Abstract—In this paper we show how a simple dataflow processor can be fully implemented using CλaSH, a high level HDL based on the functional programming language Haskell. The processor was described in Haskell, and the CλaSH compiler was then used to translate the design into fully synthesisable VHDL code. The VHDL code was synthesised with 90 nm TSMC libraries and placed and routed. Simulation of the final netlist showed correct behaviour. We conclude that Haskell and CλaSH are well-suited to define hardware on a very high level of abstraction which is close to the mathematical description of the desired architecture. By using CλaSH, the designer does not have to care about internal implementation details, as is necessary when designing with VHDL. The complete processor was described in 300 lines of code; some snippets are shown as illustration.

I. INTRODUCTION

As current embedded systems become increasingly complex, new ways of modelling these systems are being investigated. Different levels of abstraction can be used to keep the system manageable. In this paper, CλaSH [1] is used as a language for specifying hardware at a high level of abstraction. CλaSH is a recently developed hardware description system which compiles specifications written in the functional programming language Haskell into VHDL. Haskell offers higher abstraction mechanisms than existing hardware description languages, thus the designer does not have the burden of specifying several implementation details.

This paper presents the design process of a complete architecture using CλaSH. The designed architecture is a simple dataflow processor, which is taken as an example because its functionality can be easily modelled in a functional programming language, as the principles are very close to mathematics. The processor is designed based on previously presented dataflow processors. The focus of this paper is thus not the designed architecture itself but rather a presentation of a design method which has a short design time and is very close to the initial concepts of the architecture.

II. DATAFLOW PROCESSORS

Dataflow processors can directly execute dataflow graphs [2], which are mathematical representations of programs in which nodes represent operations and edges (called arcs) represent the dependencies between these operations. Arcs carry the data to other nodes as tokens. A node can only execute when all required data is available (the firing rule). During firing the node consumes all tokens on the input and produces result-tokens on the output. Several nodes may fire at the same time, which may also trigger firing of other nodes.

Dataflow machines do not use a central program counter as in von Neumann architectures, but use the firing rules of dataflow nodes to trigger the execution of operations. The first machine capable of executing dataflow graphs is the static dataflow machine developed at MIT [3]. The dataflow graph resides in a special memory containing instruction cells which implement the firing rule. When a cell fires, the operands and instruction are sent to an execution unit which executes the instructions and sends the result back to the memory.

The Monsoon [4] is the first implementation with an explicit token store (ETS) to provide a more efficient token storage. In an ETS, every node in the dataflow graph is assigned a unique memory location. When a token is sent to a node, it is checked whether there is already a token present at the corresponding address in the token store. If not, the token is stored at that address. If yes, a match occurred, i.e. the firing rule of the node is satisfied and the execution is triggered. A presence bit is used to indicate whether an address in the token store is occupied.

III. DESIGNING HARDWARE USING HASKELL AND THE CλaSH COMPILER

This section gives a short introduction to designing hardware using Haskell and CλaSH, the CAES¹ Language for Synchronous Hardware. The CλaSH compiler was recently developed at the CAES group at the University of Twente. It translates a Haskell description of a design to fully synthesisable VHDL. It is directly integrated into ghc [5], an open source Haskell compiler. A detailed description of the working principle of CλaSH can be found in [1], and several design examples in [6]. The idea behind CλaSH is that electronic circuits can be seen as a mathematical function: for a certain set of inputs, a determined output is produced. An electronic circuit can thus intuitively be modelled in a functional programming language.

With Haskell as well as with other functional languages, it is possible to achieve a very concise description of the desired architecture. Before CλaSH, other approaches were presented to describe hardware with the help of Haskell, like Lava [7], which is an HDL embedded in Haskell, and ForSyDe [8], which uses Haskell for system modelling. In contrast to CλaSH, they do not directly use a subset of Haskell but use Haskell to define their syntax. That has the disadvantage that many of Haskell's features like control structures (e.g. guards, if-else) or polymorphism are not supported, whereas they are fully supported in CλaSH.

The CλaSH compiler produces fully synthesisable VHDL code from a given Haskell description which is compliant with the CλaSH restrictions described in [1] (e.g. no dynamic lists but vectors, a state of a function is marked with the State keyword). Higher order Haskell functions like map or zip are fully supported, as are control structures like guards

¹ Computer Architecture for Embedded Systems (University of Twente)




978-1-4244-8971-8/10$26.00 © 2010 IEEE
or pattern matching and polymorphism. As CλaSH is integrated into ghc, simulation of the design is very fast compared to a full VHDL simulation.

[Figure 1. Schematic of the proposed architecture. Labelled blocks: processor, core, router, matcher, alu, fifo, control, TSt, PMem; labelled signals: token, extoken, data, full, read.]

[Figure 2. Example: Graph for the expression (i0 + i1) ∗ (i2 − (i3 + i4)). Nodes: 0 (+), 1 (∗), 2 (−), 3 (+); inputs i0 to i4; output out.]

                 Table I
   LIST OF TOKENS FOR GRAPH IN FIGURE 2

        Value     Destination
        3         0, L
        4         0, R
        7         2, L
        20        3, L
        15        3, R
The clock does not have to be explicitly defined. The designer describes the desired functionality of a module between two clock cycles as a transition from the current state to the next.

IV. PROPOSED ARCHITECTURE

The proposed architecture is based on the principles of dataflow processors found in literature [2]. It is implemented as a static dataflow machine like [3], but with the explicit token store (ETS) principle presented in [4].

An overview is displayed in Figure 1. The processor consists of three main modules, namely a router, which arbitrates data from both the external and the internal input, a matcher, which is responsible for the matching process, i.e. the central principle of a dataflow machine, and an arithmetic logic unit (ALU), which performs calculations on the data sent by the matcher. Tokens travelling through the processor contain a data value and the destination address.

The whole processor can itself be considered a dataflow graph. This means that every connection between the modules corresponds to an arc on which tokens can be stored. In the proposed architecture, buffers at the input of each module are used to store those tokens. The same holds for the modules: every module in the processor corresponds to a node which operates using the firing rules of dataflow.

The router is responsible for managing incoming data from the outside (the external input) and data from within the processor (the internal input). Data which is present in the buffer of the internal input has priority over data in the external input. Also, the router can send data out of the processor.

The matcher consists of the token storage (TSt), which implements the ETS principle, the program memory (PMem), which stores the operation in the form of an opcode and the destination address(es) for every node in the graph, and a control unit that takes care of the matching process. For each incoming token from the router it is checked whether it can be matched with a token already in the token storage. If not, the token is stored. If a match is found, the values of both tokens, i.e. the stored one and the incoming one, are sent to the ALU together with the opcode and the destination address(es) from the program memory. The token which was stored in the token storage is then deleted from the storage.

The ALU can perform either an addition, a subtraction or a multiplication. The opcode determines which operation has to be performed. Each computation takes one clock cycle, i.e. there is no pipelining and the result is immediately sent to the output.

By connecting the modules as shown in Figure 1, a complete (though limited) dataflow processor is constructed.

A. Execution of dataflow graphs

The dataflow processor is programmed by defining the destination of each node in the graph. Suppose a graph like the one shown in Figure 2. The graph represents the expression out = (i0 + i1) ∗ (i2 − (i3 + i4)) with i0 = 3, i1 = 4, i2 = 7, i3 = 20, i4 = 15 as example input values.

In order to calculate the result for a given set of inputs, the input values are sent to the corresponding inputs in the form of tokens. In order to calculate (3 + 4) ∗ (7 − (20 + 15)), which is also used in Figure 2, 3 has to be sent to the left input of node 0, 4 to the right input of node 0 and so on. The list of tokens is shown in Table I.

The temporary data values, i.e. the values resulting from one computation and travelling to the next computation, have to be sent to the correct destination. The destination is determined from the program memory. For the example graph, the program memory is shown in Table II.

V. IMPLEMENTATION

In this section, a brief introduction to the implementation of the processor is given.

                 Table II
   PROGRAM MEMORY FOR GRAPH IN FIGURE 2

        Node   Operation   Destination
        0      ADD         1, L
        1      MUL         out
        2      SUB         1, R
        3      ADD         2, R
[Figure 3. General implementation of the modules. Labelled elements: module, input, output, fifo control, fifo state, module control, internal state; the pair (fifo state, internal state) is fed back.]

A. Buffers

To implement the buffers at the inputs, fifo buffers were used. The input of the fifo consists of a token wrapped in the so-called Maybe type², i.e. either a new token was received or not, and a read-signal from the module which indicates if a value has been read from the fifo and can be erased. The output is a Maybe token and a full-signal indicating if the fifo is full. As flow control mechanism back pressure is used, i.e. when the buffer is full, its full line is set to true, which notifies the sending module that no more data should be sent.

² A Haskell datatype which can have the values Just x, indicating a valid value x, or Nothing, indicating no value.

B. Tokens

Tokens consist of a value and a destination. The destination consists of an address which represents the node in the graph and the input of the node, i.e. left or right. The implementation in Haskell is as follows:

   data Side  = L | R
   type Word  = Int16
   type Dest  = (Int7, Side)
   type Token = (Word,Dest)

The keyword data defines new data values. The keyword type is used to define a new data type by using existing data types.

The data which is sent from the matcher to the ALU is an extended token which combines two data values, the opcode and four destinations which are wrapped in the Maybe type. The Haskell implementation of the extended token type looks as follows:

   data Op      = ADD | SUB | MUL
   type ExToken = (Word,Word,Op,
                   (Vector 4 (Maybe Dest)))

C. General implementation of the modules

The general implementation of the modules is shown in Figure 3. The structure is similar for all modules, as they all have an internal state (internal state), a state of the fifo(s) (fifo state), and data input and output. The internal state and the fifo state are controlled by their respective control functions module ctrl and fifo ctrl. fifo ctrl depends on the data input and the state of the fifo, module ctrl is a function of the current output of the fifo and the internal state of the module. Both control functions resemble a Mealy machine. Both state elements are then combined and fed back into the module, as Haskell natively does not have a notion of state. The implementation looks as follows:

   module (State (fs,ms)) i = ((State (fs',ms')),o)
     where
       (fs',fo) = fifo_control   fs i
       (ms',o)  = module_control ms fo

where fs and ms denote the current fifo and internal state, fs' and ms' are the new states, i and o are input and output and fo is the current output of the fifo. The type of the output is a Maybe type to distinguish between a token present at the output and no token present.

D. Matcher

As the matcher is the central module which implements the dataflow principle, its implementation is described in more detail. The structure is implemented as shown in Figure 3. The internal state of the matcher consists of the token storage TSt and the program memory PMem. The state input to module_control consists of the states of TSt and PMem; the state output is only the new state of TSt, as the program memory does not change during execution. The data input consists of the token sent by the router (Just (v,(u,s)), where v is the value and (u,s) is the destination) and the full-signal from the ALU input buffer (fi). The data output consists of the read-signal to the fifo, which indicates if a value was taken out of the fifo, and the data packet which is sent to the ALU. The Haskell implementation of module_control looks as follows:

   module_control (tst,pmem) (Just (v,(u,s)),fi)
     | not (check_match u tst) = (tst',(True,Nothing))             -- (1) no match: store the token
     | check_match u tst && fi = (tst,(False,Nothing))             -- (2) match, but ALU buffer full
     | otherwise               = (tst'',(True,(Just (l,r,op,d))))  -- (3) match: send to the ALU
     where
       (l,r) | s == L    = (v,(tst!u))
             | otherwise = ((tst!u),v)
       tst'   = addToken tst (v,u)
       tst''  = clearPBit tst u
       (op,d) = pmem!u

The function is divided into three cases. (1) No match is found for the incoming token; (2) a match is found but the ALU buffer is full, i.e. it cannot receive any new data; or (3) a match is found and the ALU can receive data.

The different cases are modelled with Haskell guards; their principle is illustrated using the example of an inverter:

   f :: Bool -> Bool
   f i
     | i         = False
     | otherwise = True

If the input i is True, i.e. the first case, the result is False. If i is False, which is modelled in the otherwise case (which is the default case if all other cases evaluate to False), the result is True.

Furthermore, some helper functions are used in module_control. check_match u tst is True if there is a match for the current token at address u in tst, otherwise it is False. addToken tst (v,u) adds a value v to tst at index u, and clearPBit tst u clears the presence bit in tst at index u, i.e. sets the content to Nothing. The ! operator is the indexing operator in a CλaSH vector.
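The bodies of these helper functions are not shown here. A minimal sketch of how they could behave, written in plain Haskell, is given below. It rests on several assumptions: an ordinary list stands in for the fixed-size CλaSH vector (so !! replaces the ! operator), addresses are plain Int values, the value type is called Value instead of Word to avoid a clash with the Prelude, and emptyTSt and update are hypothetical names introduced only for this illustration.

```haskell
import Data.Int (Int16)

type Value = Int16           -- corresponds to the paper's Word type
type TSt   = [Maybe Value]   -- assumption: a list instead of a CλaSH vector

-- Hypothetical helper: a token storage with n empty slots.
emptyTSt :: Int -> TSt
emptyTSt n = replicate n Nothing

-- Hypothetical helper: replace the element at index u
-- (u is assumed to be a valid index).
update :: Int -> Maybe Value -> TSt -> TSt
update u x tst = take u tst ++ [x] ++ drop (u + 1) tst

-- True if a token is already waiting at address u (presence bit set).
check_match :: Int -> TSt -> Bool
check_match u tst = case tst !! u of
  Just _  -> True
  Nothing -> False

-- Store value v at address u.
addToken :: TSt -> (Value, Int) -> TSt
addToken tst (v, u) = update u (Just v) tst

-- Clear the presence bit at address u, i.e. set the content to Nothing.
clearPBit :: TSt -> Int -> TSt
clearPBit tst u = update u Nothing tst
```

With these definitions, storing a value with addToken makes check_match report a match at that address, and clearPBit restores the empty slot, which mirrors cases (1) and (3) of module_control.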
Table III
                   R ESULTS FOR ASIC SYNTHESIS                         The Haskell implementation of the complete processor was
                                                                    300 lines of code, including comments and type definitions.
             area          gates   cells   power                    Such a compact description also greatly simplifies debugging.
             70850 µm2     33470   7460    6.9 mW                      To evaluate the outcome of the CλaSH implementation, the
                                                                    same processor was also implemented using pure VHDL. The
E. Program memory
                                                                    details are presented in [9] and are not discussed here, only a
   The program memory is implemented as a vector of length          brief summary is given. Both implementations were synthesised
128, i.e. currently 128 nodes in the dataflow graph are possible.    for an FPGA and also for ASIC. For the FPGA synthesis, the
Each element of the vector contains the opcode for the              results in terms performance were similar with only small
corresponding node in the graph and one up to four destinations.    deviations. The ressource utilisation differed for the number
The destinations are wrapped in the Maybe datatype to               of registers, we assume it could be caused by the way CλaSH
distinguish between valid and invalid destinations as nodes in a    implements registers (e.g. no write-enable signal is used).
dataflow graph usually have a different number of destinations.         The ASIC synthesis (without clock gating) results were
The number of the node is used as the address, i.e. to index        surprising, as the VHDL implementation required considerably
the vector. The Haskell implementation is as follows:               more area (roughly a factor of 2.5) and consequently also more
type PMem = Vector 128                                              power (a factor of 3). At the moment, we are not sure what
            (Op,(Vector 4 (Maybe Dest)))                            the reason for that is. However, when clock gating is enabled,
                                                                    the VHDL design is more power efficient, roughly by a factor
   where the keyword Vector denotes the CλaSH datatype
                                                                    of 8. The reason for that is that clock gating is not inserted
for a vector with a defined size.
                                                                    during synthesis for the CλaSH implementation. We assume
F. Token storage                                                    that this is again due to the missing write-enable signal.
   The token storage is also a vector of 128 elements, each                    VII. C ONCLUSION AND F UTURE W ORK
element has space for one data value. The rest of the data            In this paper we presented the complete design of a dataflow
token, i.e. the destination, is not stored as it can be derived processor by descriping the architecture in Haskell and then
from the second token which is matched with the token value using CλaSH to translate the description to fully synthesisable
stored in the token storage. Each element is a Maybe datatype VHDL code. The VHDL was synthesised and placed and
to distinguish between empty and occupied spots, i.e. to model routed, its correct functionality was verified by simulation. Due
a presence bit required for the ETS principle. Like in the to full support of Haskell’s features like control structures and
program memory, the number of the node is used as address. polymorphism in CλaSH, the processor could be described in
The implementation in Haskell looks as follows:                   a very compact way. In Haskell, different levels of abstraction
type TSt = Vector 128 (Maybe Word)                                can be used, in general the level of abstraction level is very high
                                                                  and close to the mathematical description of a design. Internal
G. Arcs between the modules                                       implementation details like timing are completely hidden from
   The modules are connected using Haskell’s so-called arrow- the designer. Summarising, we can say that it is possible to
abstraction. Each module is wrapped into an arrow together design a complex architecture using CλaSH with relatively
with its initial state. Several modules can be grouped by defining little effort. However, there are still many open issues which
a new arrow where the arrows of the modules are connected. are being investigated at the moment, like the problem with
The arrow for the processor is as follows:                        clock gating.
                                                                                            R EFERENCES
processorA = proc (di,fi) -> do
  rec (d,d2,f,f2) <- routerA -< (di,d1,fi,f1)                       [1] C. Baaij, M. Kooijman, J. Kuper, A. Boeijink, and M. Gerards, “Cλash:
      (d1,f1)     <- coreA   -< (d2,f2)                                 Structural descriptions of synchronous hardware using haskell,” in
 returnA -< (d,f)                                                       Proceedings of the 13th EUROMICRO Conference on Digital System
                                                                        Design, Architectures, Methods and Tools, Nice, France, September 2010.
where routerA and coreA are arrows for the router and the           [2] A. H. Veen, “Dataflow machine architecture,” ACM Comput. Surv., vol. 18,
                                                                        no. 4, pp. 365–396, 1986.
core, respectively, di and fi are the inputs to the processor, d    [3] J. B. Dennis and D. P. Misunas, “A preliminary architecture for a basic data-
and f are the outputs. d1,d2,f1, and f2 are the connections             flow processor,” in ISCA ’75: Proceedings of the 2nd annual symposium on
between the router and the core.                                        Computer architecture. New York, NY, USA: ACM, 1975, pp. 126–132.
[4] G. M. Papadopoulos, “Monsoon: an explicit token-store architecture,” in Proc. of the 17th Annual Int. Symp. on Comp. Arch., 1990, pp. 82–91.
[5] http://www.haskell.org/ghc/.
[6] J. Kuper, C. Baaij, M. Kooijman, and M. Gerards, “Exercises in architecture specification using CλaSH,” in Proceedings of the Forum on Specification and Design Languages, 2010.
[7] P. Bjesse, K. Claessen, M. Sheeran, and S. Singh, “Lava: hardware design in Haskell,” in Proceedings of the third ACM SIGPLAN international conference on Functional programming. ACM, 1998, pp. 174–184.
[8] I. Sander and A. Jantsch, “System modeling and transformational design refinement in ForSyDe,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 1, p. 17, 2004.
[9] A. Niedermeier, R. Wester, K. Rovers, C. Baaij, J. Kuper, and G. Smit, “Comparing CλaSH and VHDL by implementing a dataflow processor,” in Proceedings of the Program for Research on Embedded Systems and Software (PROGRESS), 2010.

VI. RESULTS AND DISCUSSION

  The generated VHDL code was verified using a four-point FFT and several different mathematical expressions. The simulations showed that the design worked as expected, i.e. the VHDL code generated by CλaSH reproduces the functionality of the Haskell description.
  The design was fully synthesised, placed and routed using the TSMC 90 nm low-power library for a clock speed of 100 MHz. A four-point FFT was used to extract power numbers. The results for both the area and the power consumption are shown in Table III.
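The paper does not describe how the four-point FFT test was mapped onto the processor. As a rough illustration of why it is a convenient test case, the sketch below computes a four-point FFT in plain Haskell using only additions and subtractions, since the only twiddle factor that is not ±1 is -j, which reduces to swapping the real and imaginary parts and negating one of them. The names Cplx, add, sub, mulNegJ and fft4, and the representation of a complex sample as a pair of 16-bit words, are assumptions made for this sketch and are not taken from the authors' code.

```haskell
import Data.Int (Int16)

-- A complex sample as (real, imaginary), using a 16-bit word as in the paper.
-- This representation is an assumption made for illustration.
type Cplx = (Int16, Int16)

add, sub :: Cplx -> Cplx -> Cplx
add (a, b) (c, d) = (a + c, b + d)
sub (a, b) (c, d) = (a - c, b - d)

-- Multiplication by -j: (a + bj) * (-j) = b - aj.
mulNegJ :: Cplx -> Cplx
mulNegJ (a, b) = (b, negate a)

-- Radix-2 four-point FFT: X[k] = sum over n of x[n] * exp(-2*pi*j*n*k/4).
fft4 :: (Cplx, Cplx, Cplx, Cplx) -> (Cplx, Cplx, Cplx, Cplx)
fft4 (x0, x1, x2, x3) = (y0, y1, y2, y3)
  where
    e0 = x0 `add` x2            -- even samples, sum
    e1 = x0 `sub` x2            -- even samples, difference
    o0 = x1 `add` x3            -- odd samples, sum
    o1 = mulNegJ (x1 `sub` x3)  -- odd samples, difference times twiddle -j
    y0 = e0 `add` o0
    y1 = e1 `add` o1
    y2 = e0 `sub` o0
    y3 = e1 `sub` o1

main :: IO ()
main = print (fft4 ((1, 0), (2, 0), (3, 0), (4, 0)))
```

For the real input sequence 1, 2, 3, 4 the transform is 10, -2+2j, -2, -2-2j. A pure function of this kind could serve as a reference against which simulation output is compared, although whether the authors checked their results this way is not stated.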
