RULE EVALUATION ON A MOTOROLA SIMD
Melti n Bell: 512-505-8125, rzvy60@email.sps.mot.com
&
Rod Goke: 512-505-8121, rod_goke@oakqm3.sps.mot.com
Motorola Parallel Scalable Processors/Center for Emerging Computer Technology
505 Barton Spgs. Rd. Suite 1055, MD: F30, Austin, TX 78704
FAX: 512-505-8100
ABSTRACT
Fuzzification, rule evaluation and defuzzification in most fuzzy logic systems are computationally expensive tasks. Many systems using a sequential processor will scan the rules/knowledge base and fetch or recompute the fuzzy inputs even if one of them is zero. Due to the nature of fuzzy AND-OR inference processing, this leads to unnecessary fetches and/or computations, negatively impacting execution time and hardware resources. This paper presents an algorithm applied to the Association Engine (AE) Single Instruction Multiple Data (SIMD) machine that attempts to make this fuzzy inference process more efficient by minimizing the number of fetches and computations when fuzzy inputs are zero. Although this algorithm may be applied to fuzzy logic systems using sequential processors, analyzing the fuzzy inputs before scanning the knowledge base will highlight the scalable computing power of the AE as well as support Motorola's data oriented processing excellence in the fuzzy logic market.
BACKGROUND

Although fuzzy logic has been around for more than 20 years, it's taken a long time for it to gain acceptance in the engineering community. Over time, many people have addressed the potential drawbacks of fuzzy logic so that it is now seen as an invaluable tool in many of today's systems. Even though fuzzy logic is not generally suited for use in linear systems, it's projected that the fuzzy logic market will increase by 76% every year into a billion dollar business through 1998 [St93]. The factors responsible for such market projections are related to what makes fuzzy logic invaluable in many nonlinear systems: faster and lower cost development, adaptiveness, smoother and simpler controls, fault tolerance, improved product performance, maintenance and extensibility, etc.

Fuzzy logic is also popular because it more closely emulates our reasoning abilities and knowledge modelling capabilities than traditional boolean logic systems [Ba93]. The first stage of a typical fuzzy logic system, fuzzification, deals with finding the degree/grade to which crisp system inputs fit within the membership functions (MF) of the fuzzy inputs. The second part, rule evaluation or fuzzy inference, uses these fuzzy input grades and the rules describing the desired behavior of the system to produce fuzzy output grades. This is the key stage of the process that models our knowledge reasoning capabilities, and, consequently, is responsible for much of the computation in most fuzzy systems. The last part of a fuzzy logic system, defuzzification, takes the fuzzy output data of the second stage and converts it into a crisp output.

The process of taking the usually small set of fuzzy input grades and combining them with the rules for producing fuzzy outputs closely matches our reasoning abilities and partly explains why fuzzy logic systems often take less code and/or execute faster than traditional boolean logic systems. The basis for this second stage of the fuzzy logic process is the fuzzy MIN-MAX inference method most frequently applied to fuzzy set logical computation [Ar92]. This method computes the fuzzy AND of multiple fuzzy input grades by taking the minimum grade of each individual fuzzy input type used in a rule. The rule weight giving the grade of one of the fuzzy outputs for such a rule is the same as the minimum grade value of the fuzzy inputs. The method then computes the fuzzy OR of multiple rule weights by taking the maximum of the rule weights associated with a particular fuzzy output. Mathematically, this method may be summarized by

• fuzzy_out_typeX.ruleY = MIN(ruleY.fuzzy_in_type1.grade ... ruleY.fuzzy_in_typeN.grade)
• fuzzy_out_typeX = MAX(fuzzy_out_typeX.rule1 ... fuzzy_out_typeX.ruleN)

where rule.fuzzy_in_type.grade is the grade of a particular fuzzy input type associated with a rule, fuzzy_out_type.rule is the grade of a fuzzy output type associated with a particular rule and fuzzy_out_type is the highest grade for a particular fuzzy output type.
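
The MIN-MAX method summarized above can be sketched in a few lines of Python. This is only an illustration on a sequential machine, not the AE implementation; the function name `min_max_inference`, the rule tuple layout and the sample grades are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical rule layout: (grades of the fuzzy inputs used by the rule, output MF name).
Rule = Tuple[List[float], str]


def min_max_inference(rules: List[Rule]) -> Dict[str, float]:
    """Fuzzy AND is the MIN over a rule's input grades; fuzzy OR is the MAX over rules."""
    outputs: Dict[str, float] = {}
    for grades, out_mf in rules:
        rule_weight = min(grades)  # fuzzy AND of the rule's inputs
        # fuzzy OR: keep the highest rule weight seen so far for this output MF
        outputs[out_mf] = max(outputs.get(out_mf, 0.0), rule_weight)
    return outputs


# Two rules feed output PS and one rule feeds output ZZ (illustrative grades only).
example_rules = [([0.5, 0.25], "PS"), ([0.75, 0.5], "PS"), ([0.0, 0.75], "ZZ")]
result = min_max_inference(example_rules)
```

Note that the rule with a zero input grade yields a zero rule weight regardless of its other input, which is the property the rest of the paper exploits.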
MOTIVATION

Many fuzzy logic systems spend most of their computation time during the fuzzy inference stage because of the large number of fuzzy inputs and rules that must be scanned during the fuzzy AND-OR operations. Since a fuzzy input grade of zero for a rule means a corresponding zero fuzzy output value for that rule and 75% of the fuzzy input grades of many fuzzy systems characteristically have zero values, significant computation time and resources are wasted scanning the rules and performing fuzzy AND-OR/MIN-MAX operations on zero values. This paper will address this significant drawback to typical fuzzy logic systems with an algorithm written for a Motorola SIMD that improves the performance factor directly impacting Motorola's ability to successfully compete in the expanding fuzzy logic market.

The example fuzzy logic application for this algorithm is the Inverted Pendulum Problem while the target architecture is the AE. The Inverted Pendulum Problem fuzzy logic parameters are given in the following section and derived in the reference [Ko92]. The section after the Inverted Pendulum Problem description gives information on the AE related to the example. The next section will cover the specifics of the algorithm itself (the sorting of the fuzzy inputs, the representation of rules/knowledge base format, the knowledge base scanning/generation of fuzzy outputs) and illustrate data oriented processing's effect on algorithm design. The section following the algorithm description will analyze and summarize the performance of this algorithm for the Inverted Pendulum Problem as well as larger fuzzy logic applications. The last section acknowledges those who have contributed to this paper.

INVERTED PENDULUM PROBLEM

Balancing an inverted pendulum in two dimensions is a classic control problem. A motor is used to move the base of the inverted pendulum. Motion in only one dimension is assumed for this example to simplify the problem to two inputs. These inputs are the angle the pendulum makes with the vertical (A) and the per second rate at which the angle changes (AC). The positive or negative amount of current (C) supplied to the motor is the output that will balance the pendulum. The system is shown in the following figure:

Figure 1: Inverted Pendulum

There are seven triangular membership functions per input for this example. Three of the membership functions represent positive values: Positive_Large (PL), Positive_Medium (PM), and Positive_Small (PS). Three more membership functions represent negative values: Negative_Large (NL), Negative_Medium (NM) and Negative_Small (NS). The last membership function is Zero (ZZ). Each edge of these membership functions is prohibited from overlapping with more than one other membership function edge so that each crisp system input will be described by no more than 2 nonzero fuzzy inputs (out of 7 possible). Although three points are enough to define triangular membership functions, four points (P1, P2, P3, and P4) are used in this example so that the application will be general enough to be applied to fuzzy logic systems using trapezoidal membership functions as well as triangular ones. Unlike the input membership functions, singletons are used for the seven output membership functions (PL, PM, PS, ZZ, NL, NM, NS) so that only one point (P1) is needed.

With the input and output membership functions defined, common sense and some engineering analysis may be used to generate the rules and membership function point values describing the behavior of the system. For example, if the pendulum falls to the right, a negative current should make the motor compensate. Conversely, if the pendulum falls to the left, the output current should be positive. If the pendulum is balanced at the vertical, the output current should be zero. The full set of rules describing the behavior of the system follows:
(1) IF A IS NL AND AC IS ZZ THEN C IS PL
(2) IF A IS NM AND AC IS ZZ THEN C IS PM
(3) IF A IS NS AND AC IS ZZ THEN C IS PS
(4) IF A IS NS AND AC IS PS THEN C IS PS
(5) IF A IS ZZ AND AC IS NL THEN C IS PL
(6) IF A IS ZZ AND AC IS NM THEN C IS PM
(7) IF A IS ZZ AND AC IS ZZ THEN C IS ZZ
(8) IF A IS ZZ AND AC IS PS THEN C IS NS
(9) IF A IS ZZ AND AC IS PM THEN C IS NM
(10) IF A IS ZZ AND AC IS PL THEN C IS NL
(11) IF A IS PS AND AC IS NS THEN C IS NS
(12) IF A IS PS AND AC IS ZZ THEN C IS NS
(13) IF A IS PM AND AC IS ZZ THEN C IS NM
(14) IF A IS PL AND AC IS ZZ THEN C IS NL
(15) IF A IS ZZ AND AC IS NS THEN C IS PS

The following tables apply engineering analysis techniques for relating the crisp system input or output points to their respective membership functions:

Table 1: ANGLE MF POINTS
MF     P1    P2    P3    P4
NL    -90   -90   -54   -36
NM    -54   -36   -36   -16
NS    -36   -19   -18     0
ZZ    -18     0     0   +20
PS      0   +17   +18   +36
PM    +18   +36   +36   +56
PL    +36   +56   +90   +90

Table 2: ANGLE CHANGE MF POINTS
MF     P1    P2    P3    P4
NL    -90   -90   -72   -49
NM    -72   -49   -48   -25
NS    -48   -25   -24    -1
ZZ    -24    -1     0   +23
PS      0   +23   +24   +47
PM    +24   +47   +48   +71
PL    +48   +71   +90   +90

Table 3: CURRENT MF POINTS
MF     P1
NL    -18
NM    -12
NS     -6
ZZ      0
PS     +6
PM    +12
PL    +18

To summarize, the Inverted Pendulum Problem may be
described as a 2-input, 1-output fuzzy logic system with 7
membership functions per input or output, a maximum of 4
nonzero fuzzy inputs and a total of 15 rules.
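
As an illustration of how the four points (P1 through P4) define a membership function, the following Python sketch computes the grade of a crisp angle against the Table 1 points. It is a sketch under stated assumptions: grades are floating point values between 0 and 1 rather than the bytes an AE would use, inputs outside P1..P4 are given a grade of zero, and the names `trapezoid_grade` and `ANGLE_MF` are hypothetical.

```python
def trapezoid_grade(x, p1, p2, p3, p4):
    """Grade of crisp input x for a trapezoid rising P1->P2, flat P2->P3, falling P3->P4."""
    if x < p1 or x > p4:
        return 0.0                      # outside the membership function (assumption)
    if p2 <= x <= p3:
        return 1.0                      # on the plateau (a single point for a triangle)
    if x < p2:
        return (x - p1) / (p2 - p1)     # rising edge, p1 < p2 is guaranteed here
    return (p4 - x) / (p4 - p3)         # falling edge, p3 < p4 is guaranteed here


# ANGLE MF points copied from Table 1.
ANGLE_MF = {
    "NL": (-90, -90, -54, -36),
    "NM": (-54, -36, -36, -16),
    "NS": (-36, -19, -18, 0),
    "ZZ": (-18, 0, 0, 20),
    "PS": (0, 17, 18, 36),
    "PM": (18, 36, 36, 56),
    "PL": (36, 56, 90, 90),
}

# Fuzzify a crisp angle of +10: only ZZ and PS produce nonzero grades.
grades = {name: trapezoid_grade(10, *points) for name, points in ANGLE_MF.items()}
nonzero = sorted(name for name, grade in grades.items() if grade > 0)
```

For this input, five of the seven angle grades are zero, which is the situation the rule evaluation algorithm is designed to exploit.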
THE ASSOCIATION ENGINE
The AE is a single-chip SIMD coprocessor intended for data oriented processing environments and parallel computing applications requiring significant compute power, such as for pattern recognition, image compression and decompression, neural networks, and fuzzy logic [AE93]. Although many AEs may be linked together in arrays for MIMD and/or large SIMD processing, only one AE is required for the Inverted Pendulum example. This example will demonstrate the scalar engine which handles sequential program execution, process control, exception processing and other traditional scalar operations as well as the vector engine consisting of 64 processing elements (PEs) for efficient execution of parallel or vector processing algorithms. The following figures show all of the major AE modules explained in this section:

Figure 2: Modules of the AE

Each of the scalar and vector PEs (65 per AE) contains a dedicated 8-bit ALU enabling each AE to deliver 1.3 billion signed, unsigned or multibyte operations per second at a 20MHz clock frequency. The PEs receive their commands from the Sequence Controller which in turn accesses them from the 256 byte Instruction Cache (IC). Vector engine PEs execute the same instruction simultaneously, in lock-step, each accessing the Input Data Register (IDR), Coefficient Memory Array (CMA), or vector data registers (v0-v7) associated with it while the scalar engine PE executes instructions that access the IDR, CMA, and scalar global and pointer registers (g0-g7, p0-p7).

In combination with the scalar and vector engines, the CMA and IDR are other major AE modules that demonstrate the AE's flexibility. The 64 by 64 (=4K) bytes of CMA SRAM function as the general memory storage for instructions, stack space, jump tables, working data and data arrays. A row of 64 bytes is allocated to each of the 64 PEs so that a CMA column of 64 bytes is available for vector/parallel operations. The CMA can also interact with the IDR when the AE is in Run (vs. Host) mode (e.g. the AE is processing instructions instead of interacting with a host processor for random and/or stream accesses).

Figure 3: A Vector Engine Row

Figure 4: The Scalar Engine PE

The IDR is the only input data path for the AE when the AE is in Run mode. An input tagging feature allows the IDR to access individual bytes of data out of a byte stream while an input replication feature allows the individual bytes to be copied to more than one of the 64 IDR elements. These individual bytes enter from any of the 4 AE ports (North, South, East and West) and go directly into the IDR. Up to 64 bytes of data may then be accessed from the IDR by the scalar and vector engines during AE program execution. The scalar engine can access an element/byte out of the IDR while the vector engine can access all 64 elements/bytes of the IDR.

Although other features of the AE include many control registers not yet defined and a rich instruction set where many operations take 1 clock cycle, the Vector Process Control Register (VPCR) and the instructions listed in this section are used to solve the fuzzy inference portion of the Inverted Pendulum Problem. A VPCR is contained in each of the 64 PEs of the vector engine. Only two of the 8 bits in the VPCR apply to this example. Although the Vector Conditional True (VT) bit is usually used to evaluate if-then-else conditions, the locmax instruction uses it to deactivate PEs that don't have the highest value among all vector register (v0-v7 and IDR) elements. The Valid Input Data (VID) bit indicates that the associated IDR element has data that is valid for use.

Besides the locmax instruction, the following instructions may be used for implementing efficient rule evaluation on AEs:

• vmov
• movi
• mov
• dskip
• skipne
• skipnvt
• repeat
• repeate
• vwritel
• locmin
• rowmin
• rowmax
• colmin
• colmax
• bra
• vifgt
• vifne
• vifeq
• vendif
• vor
• add
• getpe
• get
• put
• inc
• dec
• dsrot

The reader should consult the reference [AE93] for instruction execution times and further explanation of instructions, registers, or other AE features.

THE ALGORITHM

As the second stage of a fuzzy logic system, rule evaluation requires

• the fuzzy input grades of the first stage and
• the rules describing the mapping of fuzzy inputs to the fuzzy outputs

in order to generate the fuzzy output weights required for the third stage. As implied earlier, most fuzzy logic systems start the fuzzy AND-OR/MIN-MAX operations by scanning the rules and then fetching or computing the fuzzy inputs. This means that rule processing will be proportional not only to the number of rules, but also to the number of fuzzy inputs possible in a system. With the 7 membership functions/system input, 2 system inputs and 15 rule Inverted Pendulum Problem, rule processing will be proportional to 7 * 2 * 15 = 210 membership function * rules even though a majority of the fuzzy inputs are zero.

By analyzing the fuzzy inputs and their impact on the fuzzy AND-OR operations before the process of scanning the rules, this data oriented processing exercise changes the focus of computing from scanning all the rules and performing fuzzy MIN-MAX computations on every fuzzy input to determining the useful fuzzy inputs and then minimizing the amount of computation performed on them. With a maximum of 2 nonzero membership functions/system input, 2 system inputs and 15 rules, rule processing under such a data oriented paradigm reduces the execution time so that it is proportionally bounded by 2 * 2 * 15 = 60 membership function * rules.

The data oriented processing emphasis of this algorithm is atypical of many fuzzy logic systems because the processing and space limitations of Single Instruction Single Data (SISD) chips, no matter how well or highly pipelined, require that all fuzzy AND-OR computations, the scanning of rules, and the recomputation or storing and retrieving of intermediate results be performed by the single sequential processor. During all phases of this algorithm, the data flow architecture of the AE and the compute power available from its 65 processors stress the performance improvement over SISD chips of using Motorola data oriented processing engines, such as AEs, for fuzzy logic solutions.

The first part of this algorithm will sort the fuzzy inputs and maintain/track the relationship of fuzzy input to membership function so that the nonzero fuzzy inputs and rules using them will facilitate efficient scanning of the knowledge base. The second part of this algorithm, generating the fuzzy outputs from the sorted fuzzy inputs by efficiently scanning the rules/knowledge base, is closely related to the rule knowledge base
format so this is given after the sorting and before the scanning. For the remainder of this discussion, the fuzzy input grades from the fuzzification stage will be stored in 14 elements of a vector register and in the IDR, the fuzzy input membership functions will be stored in a vector register, the rules will be stored in a 7 by 14 byte space within the CMA and the fuzzy output rule weights will be computed in a second vector register. The register map for these and other values used for intermediate calculations follows:

Table 4: Register Map
Value                                       Register
Fuzzy Input Grades                          v1, IDR
PE With the Largest Fuzzy Input Grade       p3
The Largest Fuzzy Input Grade               g4
Sorted Fuzzy Input Grades                   v0
Sorted Fuzzy Input Grades Index Pointer     p4
Number of Nonzero Fuzzy Inputs              g5
Tracked Fuzzy Input MFs                     v2
Fuzzy Input MF Pointer Into CMA             p0
Rules                                       CMA[0,3] through CMA[6,16]
Fuzzy Input MF Column Offset Into CMA       g7
Number of Fuzzy Input MFs for Example       g6
Zero                                        g3
Pointer Into IDR/Fuzzy Input Grades         p2
Latches Bit Vector                          v3
Fuzzy Output Weights                        v4

Sorting The Fuzzy Inputs

With the instructions listed above, many sorting options are available on the AE. Some of the options apply theory from sorting algorithms for conventional SISD processors, but offer a significant performance improvement when applied to the AE. For example, a good sorting algorithm for a sequential processor would have a performance proportional to O(N * log(N)) where N is the number of items to be sorted. Although there are theoretically faster sorting algorithms for sequential processors, the hardware or software overhead usually makes them undesirable or inefficient for small N. The application of an O(N * log(N)) conventional SISD sorting algorithm to an AE, however, can result in linear performance, O(N), practically impossible to achieve on any conventional sequential processor [Kn73, Be93]. Though either linear sorting algorithm would be sufficient for the Inverted Pendulum Problem, a routine based on the locmax instruction will be used to demonstrate the diversity and uniqueness of the AE instruction set and architecture.

The first part of this routine will initialize the Sorted Fuzzy Input Grades vector (v0), Tracked Fuzzy Input MFs vector (v2), Zero global (g3), and the Sorted Fuzzy Input Grades Index Pointer (p4) registers to zero. The next part of the routine is a loop that selects the largest fuzzy input grade from the Fuzzy Input Grades vector (v1) register, inserts that value into increasing locations of the Sorted Fuzzy Input Grades vector (v0), and then replaces the largest fuzzy input grade in the Fuzzy Input Grades vector (v1) with zero. The AE assembly code of this descending values sorting routine follows:

            vmov    #0, v0
            vmov    #0, v2
            movi    #0, g3
            movi    #0, p4
    TOP:    locmax  #8, v1
            skipnvt
            bra     BOTTOM
            getpe   p3
            get     v1, pe[p3], g4
            put     g4, pe[p4], v0
            put     p3, pe[p4], v2
            put     g3, pe[p3], v1
            inc     #1, p4
            vendif
            bra     TOP
    BOTTOM:
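
The effect of the sorting routine can be modelled on a sequential machine with the short Python sketch below. It is not AE code: the function name `locmax_style_sort` is hypothetical, ties are assumed to be resolved in favor of the lowest numbered PE, and the loop is assumed to end once no nonzero grade remains, so that the final index (the role played by p4) equals the number of nonzero fuzzy inputs.

```python
def locmax_style_sort(fuzzy_input_grades):
    """Return (sorted nonzero grades, MF/PE index of each grade), largest grade first."""
    work = list(fuzzy_input_grades)   # plays the role of v1 (destroyed by the sort)
    sorted_grades = []                # plays the role of v0
    tracked_mfs = []                  # plays the role of v2
    while True:
        largest = max(work, default=0)      # locmax: find the largest remaining grade
        if largest == 0:
            break                           # assumed exit: only zero grades are left
        pe = work.index(largest)            # getpe: PE holding that grade (first on ties)
        sorted_grades.append(largest)       # put g4 into the next element of v0
        tracked_mfs.append(pe)              # put the PE number into the same element of v2
        work[pe] = 0                        # replace the selected grade with zero in v1
    return sorted_grades, tracked_mfs


# 14 fuzzy input grades (7 angle MFs followed by 7 angle change MFs), illustrative values.
grades = [0] * 14
grades[3], grades[4] = 128, 150       # angle: ZZ and PS
grades[10], grades[11] = 200, 55      # angle change: ZZ and PS
sorted_grades, tracked_mfs = locmax_style_sort(grades)
```

The tracked indices are what later allow the rule scan to visit only the CMA columns belonging to nonzero fuzzy inputs.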

The locmax-based sorting routine given above may be easily modified for sorting across multiple AEs by substituting rowmax, rowmin, colmax, or colmin instructions for locmax and then writing the result out to a port for further processing by [an]other AEs or other hardware.

Rules Knowledge Base Format

The format for representing the rules of a fuzzy logic application written for the AE further illustrates one of the data oriented processing edges over conventional function oriented processors. This format was chosen to make the storage of the knowledge base very compact and the scanning of these rules highly efficient. Each rule stored within the CMA will take up a subrow of bits in the CMA. The length of the subrow will be the number of fuzzy input MFs. All subrows contributing to a fuzzy output must be grouped together in a CMA row so that a total of 8 rules may affect a fuzzy output. For the Inverted Pendulum example, the fuzzy inputs MF relationship to fuzzy output MFs requires 14 columns and 7 rows of CMA space and can represent a maximum of 7 fuzzy outputs * 8 rules per fuzzy output = 56 rules with the limitation that no more than 8 rules contribute to a fuzzy output.

Since there are only 15 rules describing the Inverted Pendulum Problem, there will be 56 - 15 = 41 subrows that will not contribute to a fuzzy output. These excess subrows must be filled with 1's to facilitate a latching mechanism described later. The other subrows identifying fuzzy input MFs contributing to a fuzzy output MF will be filled with 1's and 0's. For each of the 15 rules, the bits within each subrow set to 1 will identify the fuzzy input MFs contributing to a fuzzy output while those fuzzy input MFs not contributing to the fuzzy output for this same subrow/rule will be set to 0. For this example, exactly two bits will be set in a subrow for a rule because each rule uses both fuzzy inputs. The following bit map of the CMA representing this format for the Inverted Pendulum rules shows the CMA columns and subrows identifying the fuzzy inputs and associated MFs that contribute to a particular fuzzy output:

Table 5: Inverted Pendulum Problem Rules Knowledge Base
(The excess subrows of each CMA row are all 1's and are shown as a single line.)

Input:             ANGLE                   ANGLE CHANGE
MF:         NL NM NS ZZ PS PM PL    NL NM NS ZZ PS PM PL    Subrow  Rule  CMA Row/Output
CMA Col:     3  4  5  6  7  8  9    10 11 12 13 14 15 16

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    0-5     none  0/NL
             0  0  0  0  0  0  1     0  0  0  1  0  0  0    6       14
             0  0  0  1  0  0  0     0  0  0  0  0  0  1    7       10
Hex:        FC FC FC FD FC FC FE    FC FC FC FE FC FC FD

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    8-13    none  1/NM
             0  0  0  0  0  1  0     0  0  0  1  0  0  0    14      13
             0  0  0  1  0  0  0     0  0  0  0  0  1  0    15      9
Hex:        FC FC FC FD FC FE FC    FC FC FC FE FC FD FC

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    16-20   none  2/NS
             0  0  0  0  1  0  0     0  0  0  1  0  0  0    21      12
             0  0  0  0  1  0  0     0  0  1  0  0  0  0    22      11
             0  0  0  1  0  0  0     0  0  0  0  1  0  0    23      8
Hex:        F8 F8 F8 F9 FE F8 F8    F8 F8 FA FC F9 F8 F8

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    24-30   none  3/ZZ
             0  0  0  1  0  0  0     0  0  0  1  0  0  0    31      7
Hex:        FE FE FE FF FE FE FE    FE FE FE FF FE FE FE

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    32-36   none  4/PS
             0  0  0  1  0  0  0     0  0  1  0  0  0  0    37      15
             0  0  1  0  0  0  0     0  0  0  0  1  0  0    38      4
             0  0  1  0  0  0  0     0  0  0  1  0  0  0    39      3
Hex:        F8 F8 FB FC F8 F8 F8    F8 F8 FC F9 FA F8 F8

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    40-45   none  5/PM
             0  0  0  1  0  0  0     0  1  0  0  0  0  0    46      6
             0  1  0  0  0  0  0     0  0  0  1  0  0  0    47      2
Hex:        FC FD FC FE FC FC FC    FC FE FC FD FC FC FC

Table 5 (continued)

Input:             ANGLE                   ANGLE CHANGE
MF:         NL NM NS ZZ PS PM PL    NL NM NS ZZ PS PM PL    Subrow  Rule  CMA Row/Output
CMA Col:     3  4  5  6  7  8  9    10 11 12 13 14 15 16

             1  1  1  1  1  1  1     1  1  1  1  1  1  1    48-53   none  6/PL
             0  0  0  1  0  0  0     1  0  0  0  0  0  0    54      5
             1  0  0  0  0  0  0     0  0  0  1  0  0  0    55      1
Hex:        FD FC FC FE FC FC FC    FE FC FC FD FC FC FC

Although each subrow contributing to a fuzzy output is placed at the end of a CMA row, the actual order of subrows within a CMA row is not important (just that all subrows affecting a fuzzy output be grouped in the same row). With so many excess subrows, however, it helps in generating the hexadecimal CMA bytes if the upper or lower 4 bits are all 1's (i.e. F).

Besides being a compact representation of the rules knowledge base, this format allows for a latching mechanism to be employed when scanning the rules so that bits within the latch are set for excess subrows and when fuzzy input MFs contribute to a fuzzy output MF. The bits within the latch will never be cleared so that a fuzzy output MF weight is known when all bits in a byte of the latch are set to 1. This weight, however, may not be the correct weight because more than one fuzzy output MF weight is possible and the fuzzy OR operation requires that the highest weight be chosen.

Generating Fuzzy Output MF Weights From Rules and Sorted Fuzzy Input Grades

As stated above, the latching mechanism supported by the rules knowledge base format is not enough to guarantee that the correct fuzzy output weight will be generated when scanning the knowledge base. The fuzzy input grades are sorted from highest value to lowest value partly because of this problem. The main concept behind this phase of the algorithm is a method of using the fuzzy inputs, sorted fuzzy inputs and associated MFs for efficiently scanning the knowledge base so that the fuzzy AND-OR operations are preserved and the correct fuzzy output weight is generated for each fuzzy output MF.

Since a majority of the fuzzy input grades are zero, this method must evaluate all of the rules depending on these zero fuzzy input grades and generate zero fuzzy output weights appropriately. With the IDR holding a copy of the fuzzy input grades, this operation is relatively easy to perform and understand compared to calculating the nonzero fuzzy output weights. For these reasons and the fact that finding the zero fuzzy output weights facilitates calculating the nonzero fuzzy output weights, this part of the fuzzy MIN/AND evaluation is performed as the first computations for generating the fuzzy output weights.

Just as with the sorting routine, the first part of generating the fuzzy output weights will initialize a number of registers with appropriate values. The Fuzzy Output Weights (v4) vector, Pointer Into IDR/Fuzzy Input Grades (p2), and the Latches Bit Vector (v3) registers will be initialized to zero while the Fuzzy Input MF Column Offset Into CMA (g7) global and Fuzzy Input MF Pointer Into CMA (p0) registers will be set to 3. The Number of Fuzzy Input MFs for Example (g6) global register will be set to 14. After the initializations, the CMA will be scanned and bits within the Latches Bit Vector (v3) register will be set to reflect fuzzy input MFs with zero weights and other excess subrows not contributing to a fuzzy output MF weight. Any PEs containing a Latches Bit Vector (v3) element/byte with all bits set will be deactivated so that rule weights of zero will not be changed by subsequent processing. The AE assembly code for the first part of generating the fuzzy output MF weights follows (fuzzy MIN/AND operation):

            vmov    #0, v4
            movi    #3, g7
            movi    g7, p0
            movi    #0, p2
            vmov    #0, v3
            movi    #14, g6
            repeate #2, g6
            vifeq   IDR[p2++], v4
            vor     CMA[p0++], v3
            vifne   #-1, v3

The next part of generating the fuzzy output MF weights initializes the Number of Nonzero Fuzzy Inputs (g5) global register and sets the Sorted Fuzzy Input Grades Index Pointer (p4) register to point to the last element (e.g. lowest grade) of the Sorted Fuzzy Input Grades vector (v0) register. These initializations are done so that the Sorted Fuzzy Input Grades (v0) vector may be traversed from smallest grade to largest grade as part of this algorithm's fuzzy AND-OR/MIN-MAX inference processing. The fuzzy AND-OR/MIN-MAX inference processing loop involves

• extracting the MF number of the lowest fuzzy input grade not yet processed into the Fuzzy Input MF Pointer Into CMA (p0) register,
• extracting the lowest fuzzy input grade not yet processed (continuance of the fuzzy MIN/AND operation which started with computing the zero fuzzy output membership function weights),
• adding the Fuzzy Input MF Column Offset Into CMA (g7) register to the MF number of the lowest fuzzy input grade not yet processed (p0) register,
• ORing the rules using the lowest fuzzy input MF with the Latches Bit Vector (v3) register,
• moving the lowest fuzzy input grade into the active elements of the Fuzzy Output Weights (v4) vector register,
• setting up the Sorted Fuzzy Input Grades Index Pointer (p4) register to point to the next lowest fuzzy input grade not yet processed, and
• deactivating the PEs with all the bits set in their Latches Bit Vector (v3) register (fuzzy MAX/OR operation)

The AE assembly code for this last part of generating the fuzzy output MF weights follows:

            mov     p4, g5
            dec     #1, p4
            repeat  #7, g5
            get     v2, pe[p4], p0
            get     v0, pe[p4], g4
            add     g7, p0
            vor     CMA[p0], v3
            vmov    g4, v4
            dec     #1, p4
            vifne   #-1, v3
            vendif

The vendif reactivates all the PEs that were deactivated during the fuzzy AND-OR/MIN-MAX inference processing so that the third stage of fuzzy logic processing, defuzzification, doesn't have to worry about the state of the PEs.

It should also be noted that the theoretical execution time estimate given for the algorithm earlier was under the assumption that the knowledge base would only be scanned once. Since the rules knowledge base is scanned twice during this last phase of the algorithm (once for processing zero fuzzy output weights and once for processing nonzero fuzzy output weights), the theoretical execution time is proportionally bounded by 2 nonzero membership functions/system input * 2 system inputs * 15 rules * 2 = 120 membership function * rules. This is still just under twice as fast as is possible on a conventional processor. In practice, however, the theoretical execution time can be proportional to as little as 60 membership function * rules when there is only 1 nonzero membership function/system input. For comparison's sake, let's assume that the average theoretical execution time of this algorithm will be proportional to (120 + 60) / 2 = 90 membership function * rules. This represents a theoretical 210/90 = 233% performance improvement over most fuzzy logic systems.

PERFORMANCE AND SUMMARY

Though the performance estimates given above for the algorithm are impressive, they do not give the exact amount of time it takes for the algorithm to execute on the AE nor do they illustrate the AE's suitability for solving fuzzy logic problems of varying sizes. This section will give the algorithm's worst execution time in clock cycles for the Inverted Pendulum Problem and larger fuzzy logic systems based on an unpipelined AE and instruction cycle times given in the reference [AE93]. The calculations used to generate the number of clock cycles in the following table reduce to 2 * I * IO + 72 * I + 41, where I is the number of system inputs for the fuzzy logic problem, IO is the number of fuzzy input or output membership functions per system input or output and the maximum number of rules supported is IO * the number of fuzzy outputs * 8. Since the number of CMA columns accessed by this algorithm is only dependent on the number of fuzzy input MFs, this algorithm has the added benefit of allowing for a constant execution time when the number of rules is less than the maximum number of rules supported.

Table 6: Performance of Algorithm for Different Fuzzy Logic Systems
Fuzzy Logic System    I    O    IO    Max Rules    Cycles
Inverted Pendulum     2    1    7     56           213
2/1                   2    1    8     64           217
4/2                   4    2    8     128          393
6/3                   6    3    8     192          569
8/4                   8    4    8     256          745
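
The entries of Table 6 can be reproduced from the two expressions given in this section, 2 * I * IO + 72 * I + 41 for the worst case clock cycles and IO * O * 8 for the maximum number of rules. The Python sketch below is only a check of that arithmetic; the function names are chosen for the example and are not taken from [AE93].

```python
def worst_case_cycles(num_inputs, mfs_per_io):
    """Worst case clock cycles: 2 * I * IO + 72 * I + 41."""
    return 2 * num_inputs * mfs_per_io + 72 * num_inputs + 41


def max_rules(mfs_per_io, num_outputs):
    """Maximum number of rules supported: IO * O * 8 (8 subrows per CMA byte)."""
    return mfs_per_io * num_outputs * 8


# (system name, I, O, IO) for each row of Table 6.
systems = [
    ("Inverted Pendulum", 2, 1, 7),
    ("2/1", 2, 1, 8),
    ("4/2", 4, 2, 8),
    ("6/3", 6, 3, 8),
    ("8/4", 8, 4, 8),
]

table = [(name, max_rules(io, o), worst_case_cycles(i, io)) for name, i, o, io in systems]
for name, rules, cycles in table:
    print(f"{name:18s} max rules = {rules:3d}  cycles = {cycles}")

# Cycle difference between consecutive systems with IO = 8 (2/1, 4/2, 6/3, 8/4).
steps = [table[k + 1][2] - table[k][2] for k in range(1, 4)]
```

Because the expression is linear in I for a fixed IO, each step of two additional system inputs costs the same number of cycles.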

The difference between the 2/1 and 4/2, 4/2 and 6/3, 6/3 and 8/4 fuzzy logic systems in the above table is exactly 176 clock cycles. This data shows that the AE scales linearly with the size of fuzzy logic systems and provides an excellent example of a chip well designed for scalable computing performance. Note also that even for the largest fuzzy logic system, 8/4, half of the CMA rows are empty. This implies that the AE can support larger fuzzy logic applications requiring more rules and/or fuzzy output MFs with slightly modified (if modified at all) code. This is important to note because although the problem size may increase, the code size may very well stay the same without adding significantly to execution time.

In summary, this algorithm is particularly exemplary of data oriented processing enhancements available with applications using the AE. It shows how solving smaller parts of a fuzzy logic problem on the AE with data oriented partitioning elegance creates an interdependence among all phases of a problem solution allowing for greater overall efficiency and scalability than can be attained with conventional processors. These factors will give Motorola a clear performance advantage in fuzzy logic markets.

ACKNOWLEDGEMENTS

The authors would like to recognize William Archibald as the first and only other individual (to the authors' knowledge) to develop the basic algorithm and apply it to any other hardware [Ar92] as well as for his time in helping us to understand the algorithm. Alex DeCastro also provided the figure used in this paper and the source for one of the references.

BIBLIOGRAPHY

[AE93] Motorola Parallel Scalable Processors Group, "Association Engine (AE) Software Manual", Motorola MCTG Publications, 1993.
[Ar92] Archibald, W., "FLIPPER Architectural and Algorithmic Notes", Not yet published.
[Ba93] Barron, J., "Putting Fuzzy Logic Into Focus", Byte, April 1993, pp. 111 - 118.
[Be93] Bell, M., "Sorting on the AE", Not yet published.
[Kn73] Knuth, D., "Sorting and Searching", The Art of Computer Programming, Vol. 3, Addison-Wesley Publishing Company, Menlo Park, CA, 1973.
[Ko92] Kosko, B., "Neural Networks and Fuzzy Systems", Prentice-Hall, Inc., Englewood Cliffs, NJ, 1992.
[St93] Stevens, T., "Fuzzy Logic Makes Sense", Industry Week, March 1, 1993, pp. 36 - 42.