VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System
Sagun Man Singh Shrestha¹, Labu Manandhar², Ritesh Bhattarai³
Department of Electronics and Computer Engineering,
Tribhuvan University – Kathmandu Engineering College, Nepal
Gmail: ¹sagunms, ²laburocks, ³reittes | github.com/sagunms/vocorobo
Abstract: This work implements real-time speech recognition using DSP algorithms such as
Chebyshev IIR filters, accelerometer-based tilt sensing, and a secure short-range wireless link
using the ARC4 cipher, all on low-cost 8-bit ATmega microcontrollers. The system uses a simple
but effective algorithm that compares the spoken word with a dictionary of fingerprints using a
modified Euclidean distance calculation. It can also securely control the navigation of multiple
robots at remote locations wirelessly from the Control Module, gather the environmental data
collected by the Robot Modules, and display it back at the Control Module. Given the
computationally heavy, time-critical algorithms and the variety of sensors interfaced in the
system, this project demonstrates how an expandable multi-robotic system can be built from
cheap and ubiquitous electronics.
Keywords: Speech Recognition, Chebyshev, Digital Signal Processing, Euclidean Distance, ARC4
Cryptography, ATMega16/32, nRF24L01+ Wireless Transceiver, MMA7260Q Accelerometer
I. INTRODUCTION
VoCoRoBo stands for Voice Controlled RoBot, a system in which the user can wirelessly
control multiple robots either with a voice command or by tilting the controls towards the
desired direction. In addition, each robot relays temperature and light sensor data securely
back to the user station.
1.1 HARDWARE
A microcontroller is an integrated circuit composed of a microprocessor unit, memory, and
input/output peripheral devices. The Atmel ATmega32/16 is a low-power CMOS 8-bit
microcontroller based on the AVR RISC architecture, and is used to implement the voice
recognition, tilt-sensing, wireless and cryptography algorithms. An accelerometer measures
the magnitude and direction of proper acceleration, i.e. the acceleration experienced relative
to freefall, and can be used to sense orientation. Controlling the robots with fun and intuitive
tilt gestures was made possible by the Freescale MMA7260Q 3-axis accelerometer. The two
parts of the system, the control and robot modules, are linked wirelessly using the popular
Nordic nRF24L01+ radio transceiver. It operates in the 2.4 - 2.5 GHz ISM band with an air
data rate of up to 2 Mbps, has ultra low power operation, and is well suited to remote control
and data acquisition. The L293D H-bridge IC is a quad push-pull driver capable of delivering
output currents of up to 600 mA per channel. The differential drive technique was used, so
that each robot turns simply through the speed difference between the wheels on either side.
1.2 SOFTWARE
Speech recognition is the process of capturing an acoustic signal with a microphone and
identifying the spoken word from the sound. Because the system is speaker dependent, it
needs to be trained before use. Digital signal processing is concerned with the representation
of signals by sequences of numbers and with their processing. Infinite impulse response (IIR)
is a property of signal processing systems whose impulse response is non-zero over an
infinite length of time. Chebyshev Type II filters are an example of IIR filters; they have a
steeper roll-off and more stop band ripple than Butterworth filters. They minimize the error
between the idealized and the actual filter characteristic over the range of the filter.
1.2.1 Speech Analysis
In speech recognition, the frequency content of the detected word has to be analyzed.
Several 4th-order Chebyshev band pass filters are created by cascading two 2nd-order
filters, each using the following Direct Form II Transposed realization of the difference
equations.
$$y(n) = b_1 x(n) + z_1(n-1)$$
$$z_1(n) = b_2 x(n) - a_2 y(n) + z_2(n-1)$$
$$z_2(n) = b_3 x(n) - a_3 y(n)$$
The coefficients a and b used in the above equations were obtained using the following
Matlab call.
[B,A] = cheby2(2,40,[Freq1, Freq2]);
cheby2 designs a Chebyshev Type II digital filter from the given specifications: 2 is the
order argument, which yields a 4th-order filter for a band pass design; 40 is the stop band
attenuation in dB; and Freq1 and Freq2 are the normalized band edge frequencies. The tf2sos
function is then used to convert the transfer function of the filter into 2nd-order sections.
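The cascade of 2nd-order sections described above can be modelled on a host computer before moving to fixed-point assembly. The following is a minimal floating-point sketch in Python, not the authors' AVR code; the names Biquad and cascade are hypothetical, and the coefficient ordering (b1, b2, b3) and (a1, a2, a3) with a1 = 1 is assumed to follow the Matlab convention.

```python
class Biquad:
    """One 2nd-order IIR section in Direct Form II Transposed (illustrative model)."""

    def __init__(self, b, a):
        # b = (b1, b2, b3), a = (a1, a2, a3); a1 is assumed to be 1 (Matlab-style).
        if a[0] != 1:
            raise ValueError("coefficients must be normalized so that a1 == 1")
        self.b = tuple(b)
        self.a = tuple(a)
        self.z1 = 0.0  # first delay state
        self.z2 = 0.0  # second delay state

    def step(self, x):
        """Process one input sample and return one output sample."""
        y = self.b[0] * x + self.z1
        self.z1 = self.b[1] * x - self.a[1] * y + self.z2
        self.z2 = self.b[2] * x - self.a[2] * y
        return y


def cascade(sections, samples):
    """Run samples through the sections in series (two sections give a 4th-order filter)."""
    out = []
    for x in samples:
        for section in sections:
            x = section.step(x)
        out.append(x)
    return out


# Toy check with a one-pole section y(n) = x(n) + 0.5 y(n-1), not a speech filter.
impulse = [1, 0, 0, 0]
print(cascade([Biquad((1, 0, 0), (1, -0.5, 0))], impulse))
```

In practice the two sections returned by tf2sos for one band would be passed to cascade together, one Biquad per row of the section matrix.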
1.2.2 Voice-fingerprint Calculation
Due to the limited RAM on the ATMega32, the relevant information of each spoken word had to
be encoded in the form of a 'fingerprint'. To compare fingerprints, the following pseudo
Euclidean distance formula was used between the stored fingerprint and the fingerprint of the
sampled word to find the correct word.
$$D(P, Q) = \sum_{i=1}^{n} |p_i - q_i|$$
where P = (p1, p2, ..., pn) is the dictionary fingerprint and Q = (q1, q2, ..., qn) is the
sampled word fingerprint; pi and qi are the fingerprint data points. To decide which word was
spoken, the distance between the sampled fingerprint and each dictionary fingerprint is
computed, and the dictionary word with the minimum distance is taken as the match. The
original Euclidean distance requires squaring the difference between each pair of points. In
fixed point arithmetic these squares become too large and cause the variables to overflow.
Thus a modified formula was used that omits both the square and the square root, which
showed satisfactory results in practice.
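The matching step can be illustrated with a short Python sketch. This is a host-side model under stated assumptions rather than the authors' implementation: the function names and the toy three-point fingerprints are hypothetical (real fingerprints have 128 points).

```python
def fingerprint_distance(p, q):
    """Sum of absolute differences between two fingerprints of equal length."""
    if len(p) != len(q):
        raise ValueError("fingerprints must have the same number of points")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def recognize(sample, dictionary):
    """Return (word, distance) for the dictionary entry closest to the sample.

    dictionary maps each trained word to its stored fingerprint.
    """
    scored = ((word, fingerprint_distance(fp, sample)) for word, fp in dictionary.items())
    return min(scored, key=lambda pair: pair[1])


# Toy dictionary with made-up data points.
toy_dictionary = {"front": [10, 0, 5], "stop": [0, 8, 1]}
print(recognize([9, 1, 5], toy_dictionary))
```

Because only additions and absolute values are involved, the accumulated distance grows linearly with the point values, which is what keeps it within fixed-point range.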
1.2.3 ARC4 Cryptography
ARC4 is one of the most widely used software stream
ciphers in many encryption schemes, including WEP,
WPA, and SSL. The main factors in ARC4's success
over such a wide range of applications are its speed,
simplicity and efficiency in software and hardware.
3. DESIGN AND IMPLEMENTATION
3.1 HARDWARE ARCHITECTURE
Figure 3.1: Overall Hardware Architecture
[Block diagram: ATmega32 @ 16 MHz (Speech Recognition and MMA7260Q Tilt Sensing) with
x, y, z inputs on the ADC (Port A), LCD and LEDs; ATmega16 @ 8 MHz (nRF24L01+ wireless
interface with ARC4 Cryptography) with an nRF24L01 module on SPI (Port B); ATmega16 @
8 MHz (nRF24L01+ with ARC4 and H-Bridge interface) with an nRF24L01 module on SPI
(Port B), LEDs, and an L293D H-Bridge on PD0-PD3 driving two motors; 2.4 GHz wireless
link with 2 bytes (control byte + count byte) payload. Other pin labels in the figure: Port C,
PB0-PB3, PD3-PD5, PA0-PA2.]
The system is divided into two broad subsystems: the Control Module and the Multi-Robot
Module. The Control Module is further divided into two layers: the topmost layer and the
second layer.
3.1.1 Control Module
The topmost layer of the control module consists of the ATMega32, which handles speech
recognition, MMA7260Q accelerometer sensing and output to the 16x2 text LCD. The 2nd
layer consists of the ATMega16, which implements the nRF24L01 wireless routine as well as
encryption and decryption with the ARC4 cipher. The bridge protocol between the 1st and
2nd layers of the control module (Fig. 3.1 and 3.2) is designed such that three output pins of
PORTD of the ATMega32, viz. PD2, PD3 and PD4, are connected to the respective input pins
of PORTA of the ATMega16, viz. PA0, PA1 and PA2. When the 1st layer recognizes the
spoken word (front, back, left, right or stop), the equivalent bit combination is applied to
PORTA of the 2nd layer via these bridge lines. The 2nd layer then sends out the
corresponding control byte wirelessly via the SPI port. When one of the robots receives this
control byte, it decodes it into the matching differential drive motor combination, which
moves the robot physically in the commanded direction.
FUNCTION   Equivalent received   PIN A (Connected to Layer 1)
           control byte          BINARY            HEX
                                 PA2  PA1  PA0
STOP       S                     0    0    0       00H
FRONT      F                     0    0    1       01H
BACK       B                     0    1    0       02H
LEFT       L                     0    1    1       03H
RIGHT      R                     1    0    0       04H
SPD_UP     U                     1    0    1       06H
SPD_DN     D                     1    1    1       07H
Table 3.1: Function control byte to be sent out via Wireless (SPI port) and corresponding
bit combination inputted to the second layer of Control Module (PINA).
3.1.2 Robot Module
The robot module consists of two identical robots (A and B), which can be positioned at
different locations provided they are within the signal range of the Control Module. Each
robot consists of an ATMega16 with sensors that collect environmental data, specifically an
LM35 temperature sensor and a light dependent resistor. A 2.4 GHz nRF24L01 wireless
transceiver is also available on board to receive control data and to transmit the remote
sensor data for data acquisition. With the L293D H-Bridge driver, two differential drive
motors are controlled independently so that the robot can navigate front, back, left or right.
The four input pins of the L293D, viz. IN1, IN2, IN3 and IN4, are connected to four output
pins of PORTD of the ATMega16, viz. PD0, PD1, PD2 and PD3 respectively.
FUNCTION   Equivalent received   PORT D (Connected to H-bridge)
           control byte          BINARY                    HEX
                                 IN4  IN3  IN2  IN1 (LSB)
STOP       S                     0    0    0    0          00H
FRONT      F                     0    1    1    0          06H
BACK       B                     1    0    0    1          09H
LEFT       L                     0    0    1    1          03H
RIGHT      R                     1    1    0    0          0CH
SPD_UP     U
SPD_DN     D
Table 3.2: Function control byte received via Wireless (SPI port) and corresponding bit
combinations outputted to H-bridge (PORTD).
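The decoding of a received control byte into the H-bridge inputs can be sketched as a small lookup. The Python model below uses the IN4..IN1 bit patterns from Table 3.2 with IN1 on PD0 as the least significant bit; the function name portd_value and the choice to treat any unknown byte as STOP are assumptions made for illustration, not behaviour documented for the robot firmware.

```python
# (IN4, IN3, IN2, IN1) bit patterns for each ASCII control byte, from Table 3.2.
HBRIDGE_BITS = {
    "S": (0, 0, 0, 0),  # stop
    "F": (0, 1, 1, 0),  # front
    "B": (1, 0, 0, 1),  # back
    "L": (0, 0, 1, 1),  # left
    "R": (1, 1, 0, 0),  # right
}


def portd_value(control_byte):
    """Return the low nibble to drive on PD3..PD0 for a control character.

    Assumption: unrecognized control bytes fall back to the STOP pattern.
    """
    in4, in3, in2, in1 = HBRIDGE_BITS.get(control_byte, HBRIDGE_BITS["S"])
    return (in4 << 3) | (in3 << 2) | (in2 << 1) | in1


for command in "SFBLR":
    print(command, format(portd_value(command), "04b"))
```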
3.2 SOFTWARE ARCHITECTURE
3.2.1 Input, Processing, Output
Figure 3.2: Input, Processing and Output block diagram for speech recognition
[Block diagram: Speech -> ADC -> Band Pass Filters -> Generate Voice Fingerprints ->
COMPARE (against Fingerprint Templates) -> Control Signals -> Output to the Robot]
At a rate of 4 kHz, the algorithm checks the ADC input for an audio signal. If the ADC value
exceeds the threshold value, it is taken as the start of a half-second-long word. The sampled
word passes through 8 band pass filters to be encoded into a fingerprint. The words to be
matched are stored as fingerprints in a dictionary so that newly sampled fingerprints can be
compared with them later. The modified Euclidean distance calculation finds the fingerprint
that is the closest match, and a control signal is then sent to the robot to perform operations
such as left, right, front, back and stop.
3.2.2 Initial-Threshold Calculation
The background sound level at startup is taken as a base value, which improves the accuracy
of the speech recognition. At startup, the algorithm reads the ADC input, paced by the
ATMega32 Timer/Counter0, and accumulates its value 256 times. By interpreting each ADC
reading as a fixed point number in units of 1/256 and accumulating 256 readings, the average
ADC value is obtained without a multiply or divide. Three such averages are taken, with a
16.4 ms delay between the samples. The threshold is then set to four times the median of the
three averages. The threshold value is used to detect whether a word has been spoken or not.
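The multiply-free averaging and the threshold rule can be modelled as follows. This Python sketch assumes 8-bit ADC readings and a 16-bit accumulator whose high byte is read back as the average; both are assumptions about the implementation, and the function names are hypothetical.

```python
def average_256(samples):
    """Average 256 8-bit readings by summing them and keeping the high byte.

    The largest possible sum, 256 * 255 = 65280, still fits in 16 bits,
    so the accumulator cannot overflow.
    """
    if len(samples) != 256:
        raise ValueError("exactly 256 samples are expected")
    acc = 0
    for s in samples:
        acc = (acc + (s & 0xFF)) & 0xFFFF  # 16-bit accumulator
    return acc >> 8  # high byte equals sum / 256


def initial_threshold(block_a, block_b, block_c):
    """Threshold = four times the median of three 256-sample averages."""
    averages = sorted(average_256(block) for block in (block_a, block_b, block_c))
    median = averages[1]
    return median << 2  # multiply by four with a shift


# Synthetic background levels, for illustration only.
print(initial_threshold([10] * 256, [12] * 256, [50] * 256))
```

Taking the median rather than the mean of the three averages keeps a single noisy block (the third one above) from inflating the threshold.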
3.2.3 Voice-fingerprint Generation
Figure 3.3: Filter implementation block diagram for the
generation of fingerprints
The program considers a word detected if a sample value from the ADC is greater than the
threshold value. Once a word has been detected, every ADC sample is stored in an integer
variable Ain and passed through eight 4th-order band pass filters for 2000 samples (half a
second). Each filter output is squared and the square is added to an accumulator for that
filter. After every 125 samples the accumulated value is stored as a data point in the
fingerprint of that word, the accumulator is cleared, and the process begins again. After 2000
samples, 16 points have been generated from each filter, so every sampled word is divided
into 16 parts. Our assembly language code uses 8 filters, each producing 16 data points, so
every fingerprint is made up of 128 data points.
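The accumulation scheme can be written out as a short Python model. It is a sketch under stated assumptions: the filters are passed in as plain callables (stand-ins for the eight band pass filters), no fixed-point scaling is applied, and the filter-major ordering of the 128 points in the returned list is an assumed layout.

```python
SAMPLES_PER_WORD = 2000   # half a second at 4 kHz
SAMPLES_PER_POINT = 125   # one fingerprint point per filter every 125 samples


def make_fingerprint(samples, filters):
    """Build a fingerprint from squared, accumulated filter outputs.

    filters is a list of callables; each takes one input sample and
    returns one output sample (it may keep internal state).
    """
    points = [[] for _ in filters]
    acc = [0] * len(filters)
    for n, x in enumerate(samples[:SAMPLES_PER_WORD], start=1):
        for k, band_filter in enumerate(filters):
            y = band_filter(x)
            acc[k] += y * y          # accumulate signal intensity
        if n % SAMPLES_PER_POINT == 0:
            for k in range(len(filters)):
                points[k].append(acc[k])
                acc[k] = 0           # restart accumulation for the next point
    # Assumed layout: all 16 points of filter 1, then filter 2, and so on.
    return [p for per_filter in points for p in per_filter]


# Stand-in "filters" that only scale the input, to show the bookkeeping.
toy_filters = [lambda x, g=g: g * x for g in range(1, 9)]
fingerprint = make_fingerprint([1] * 2000, toy_filters)
print(len(fingerprint))
```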
3.2.4 Filter Design and Implementation
Figure 3.4: Band pass Filter 200-400 Hz
Figure 3.5: Band pass Filter 1600-1800 Hz
3.2.5 Digital Filter Implementation
The 4th-order Chebyshev digital filter with 40 dB stop band attenuation was chosen for its
very sharp transition beyond the cutoff frequency. Most of the important frequency content
in speech lies within the first 2 kHz, as this range usually contains the first and second
speech formants. Thus 8 band pass filters (BPFs) covering 0.2 to 1.8 kHz were designed, as
shown in the magnitude and phase plots. This also permitted sampling at 4 kHz (satisfying
the Nyquist criterion for voice frequencies up to 2 kHz) while leaving enough time to run 8
filters per sample. For sufficient frequency resolution to properly identify words, the
bandwidth of each filter is set to 200 Hz.
Each 4th-order filter is created in assembly code by cascading two 2nd-order IIR filters
whose coefficients are generated using Matlab (Section 1.2.1). The floating point coefficients
are converted to fixed point by multiplying them by 256 and rounding to the nearest integer.
Fixed point was used instead of floating point (which would have been more accurate)
because floating point calculations on the ATMega32 are too slow to run all the filters within
the 4 kHz sample period.
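The conversion to fixed point amounts to scaling by 2^8 and rounding. A minimal Python sketch is given below; the helper names are hypothetical, and the example value 1.7617 is only an illustrative coefficient chosen so that it rounds to 451, one of the tabulated integers.

```python
import math

SCALE = 256  # 8 fractional bits


def to_fixed(coeff):
    """Convert a floating point coefficient to a scaled integer (round half up)."""
    return math.floor(coeff * SCALE + 0.5)


def to_float(fixed):
    """Recover the value that a scaled integer represents."""
    return fixed / SCALE


def fixed_mul(a_fixed, b_fixed):
    """Multiply two scaled integers and rescale the product back to 8 fractional bits."""
    return (a_fixed * b_fixed) >> 8  # arithmetic shift; floors negative products


print(to_fixed(1.7617), to_float(451))
print(fixed_mul(to_fixed(0.5), to_fixed(0.5)))  # 0.25 in fixed point
```

The rounding error per coefficient is at most 1/512, which is the accuracy given up in exchange for integer-only arithmetic.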
The ATMega32 has only 2 KB of RAM, and a word sampled at 4 kHz for half a second would by
itself require essentially the entire 2 KB. To build a fingerprint from a word instead, each
ADC sample has to pass through all the filters within the ADC sample period of 250 µs. The
output of each filter was squared in order to store the intensity of the sound rather than just
the amplitude. Ten 200 Hz bands would be needed to cover the full 0 to 2 kHz range, but the
memory and cycle time of the ATMega32 are not sufficient to implement all ten filters. Since
the lowest and highest bands could be neglected without noticeable degradation in the
accuracy of speech recognition, 8 BPFs were sufficient to compartmentalize the frequencies
between 200 Hz and 1.8 kHz.
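The memory and timing constraints above can be checked with a few lines of arithmetic. The Python sketch below uses the 16 MHz clock shown for the ATmega32 in Figure 3.1 and assumes one byte per raw sample; the per-filter cycle figure is an upper bound that ignores ADC handling and other overhead.

```python
F_CPU_HZ = 16_000_000      # ATmega32 clock, from Figure 3.1
SAMPLE_RATE_HZ = 4_000     # ADC sampling rate
WORD_SAMPLES = 2_000       # half a second of speech
NUM_FILTERS = 8
RAM_BYTES = 2 * 1024       # ATmega32 SRAM
BYTES_PER_RAW_SAMPLE = 1   # assumption: 8-bit samples

sample_period_us = 1_000_000 // SAMPLE_RATE_HZ          # time available per sample
cycles_per_sample = F_CPU_HZ // SAMPLE_RATE_HZ          # CPU cycles per sample
cycles_per_filter = cycles_per_sample // NUM_FILTERS    # upper bound per 4th-order filter
raw_word_bytes = WORD_SAMPLES * BYTES_PER_RAW_SAMPLE    # storing the raw word
fingerprint_points = NUM_FILTERS * (WORD_SAMPLES // 125)

print(sample_period_us, cycles_per_sample, cycles_per_filter)
print(raw_word_bytes, "of", RAM_BYTES, "bytes;", fingerprint_points, "fingerprint points")
```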
3.2.7.1 Chebyshev II filter coefficients
#                      Filter 1      Filter 2      Filter 3      Filter 4
f, kHz                 0.2 – 0.4     0.4 – 0.6     0.6 – 0.8     0.8 – 1
1st 2nd-order coeff.   A112: 451     A212: 355     A312: 224     A412: 72
                       A113: -248    A213: -248    A313: -248    A413: -248
                       B111: 21      B211: 27      B311: 31      B411: 34
                       B112: -32     B212: -29     B312: -15     B412: 4
                       B113: 21      B213: 27      B313: 31      B413: 34
2nd 2nd-order coeff.   A122: 458     A222: 366     A322: 239     A422: 88
                       A123: -248    A223: -248    A323: -248    A423: -248
                       B121: 2225    B221: 1090    B321: 762     B421: 633
                       B122: -4285   B222: -1826   B322: -965    B422: -464
                       B123: 2225    B223: 1090    B323: 762     B423: 633
Gain                   G1 = 80       G2 = 120      G3 = 140      G4 = 160
Table 3.3: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 1-4
#                      Filter 5      Filter 6      Filter 7      Filter 8
f, kHz                 1 – 1.2       1.2 – 1.4     1.4 – 1.6     1.6 – 1.8
1st 2nd-order coeff.   A512: -72     A622: -239    A712: -355    A812: -451
                       A513: -248    A623: -248    A713: -248    A813: -248
                       B511: 34      B621: 762     B711: 27      B811: 21
                       B512: -4      B622: 965     B712: 29      B812: 32
                       B513: 34      B623: 762     B713: 27      B813: 21
2nd 2nd-order coeff.   A522: -88     A622: 458     A722: -366    A822: -458
                       A523: -248    A723: -248    A723: -248    A823: -248
                       B521: 633     B721: 2225    B721: 1090    B821: 2225
                       B522: 464     B722: -4285   B722: 1826    B822: 4285
                       B523: 633     B723: 2225    B723: 1090    B823: 2225
Gain                   G5 = 160      G6 = 140      G7 = 120      G8 = 80
Table 3.4: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 5-8
3.2.6 Wireless Packet Format
The preamble byte, composed of alternating zeros and ones, is sent first, followed by a
five-byte address field. The data payload of user-settable length (1-32 bytes) is sent next.
Two versions of the payload were implemented: a 2-byte payload, holding only the encrypted
byte and a count byte, was used primarily, while an 18-byte payload was designed for data
acquisition from the temperature and light sensors at a remote location. The final part is the
two-byte CRC.
3.2.7.2 Wireless Data Payload format
The data payload for the control module is of two types, transmitter mode and receiver mode,
both with an 18-byte payload width. The control module has to transmit data packets to
individual robots and also receive sensor data from the replying robots, so it has to hold the
entire payload of each robot (two in our case) for both transmission and reception. Two
18-byte char arrays, data_tx1 and data_tx2, store the transmitter mode payloads, while two
further arrays, data_rx1 and data_rx2, store the receiver mode payloads. Both payload sizes
are PAYLOAD_SIZE (18 bytes), defined in the wireless routine of the ATmega16.
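Figure 3.6: Transmitter Mode Payload
[Payload layout, PAYLOAD_SIZE = 18 bytes (data_tx1, data_tx2): bytes 0-15 hold the data
text (data_text1, data_text2; 16 bytes); bytes 16-17 hold the control byte (data_control;
1 byte) and the packet count (1 byte). The data text and control byte form the encrypted
block.]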
Of the three blocks, the first 16-byte block holds the data text to be sent from the control
module to the robot modules. RealTerm is used to send this text from the computer to the
MCU via UART, so that text messages can be sent to the individual robots at different
locations. The control block is formed by the 1-byte data_control, which stores one of the
ASCII characters 'F', 'B', 'L', 'R' and 'S', representing the control information for front,
back, left, right and stop. When the targeted robot receives the control information in the
transmitter payload, it interprets the ASCII control byte as the corresponding robot movement
command.
The ARC4 cipher is used to encrypt the control byte and data text block. Because ARC4 is a
stream cipher, the sender and receiver must stay at the same position in the keystream
(missing a packet would otherwise result in incorrect decryption from that point on), so a
packet count byte is added to each packet. This allows the receiving unit to catch up to the
correct byte in the PRGA (assuming the targeted robot misses fewer than 256 packets in a
row).
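Figure 3.7: Receiver Mode Payload
[Payload layout, PAYLOAD_SIZE = 18 bytes: bytes 0-1 hold the temperature (2 bytes); bytes
2-3 hold the light reading (2 bytes); bytes 4-15 are padding (12 bytes); bytes 16-17 hold the
speed setting (1 byte) and the packet count (1 byte).]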
The receiver mode payload is used to receive the encrypted data payload sent by the
individual robots, decrypt the encrypted block after syncing with the help of the packet
count, separate the sensor readings of temperature (2 bytes), light (2 bytes) and speed
setting (1 byte), and store them in their respective variables for data logging. For data
acquisition, the payload is divided into temperature and light blocks. The readings from the
two sensors in each robot are stored in their respective integer variables and sent to the
control module in packet format.
3.2.7.3 Source and destination data pipe addressing
Using switches connected to PD4 and PD5 of the ATMega16, the user can select whether the
control byte generated by the roboControl function is written to data_control1 or
data_control2, which are concatenated to the respective data packets for each robot. The
user can thus select the robot to which the current command is directed. This technique
enables a multi-robot control paradigm from a single control module.
To implement a minimalistic star network topology, the receiving pipes of the control
module, Robot1 and Robot2 are 0, 1 and 2 respectively, and the corresponding pipe addresses
are E7:E7:E7:E7:E7, C2:C2:C2:C2:C2 and C2:C2:C2:C2:C3. The remaining five data pipes in
each of the three modules are disabled to block reception of packets destined elsewhere.
Prior to transmitting a data packet, the destination address must be set.
Figure 3.8: Minimalistic Star Network Topology for
establishing communication link between Control and
Robot Agent modules and their respective destination
multi-pipe addressing
3.2.7 ARC4 Cryptography
ARC4 generates a pseudorandom stream of bits (keystream) which, for encryption, is combined
with the plaintext using bit-wise xor; decryption is performed in the same way (since xor
with the same keystream is its own inverse). To generate the keystream, the cipher makes use
of a secret internal state which consists of two parts:
- A permutation of all 256 possible bytes (denoted "S").
- Two 8-bit index-pointers (denoted "i" and "j").
The permutation is initialized with a variable length key, typically between 40 and 256 bits,
using the key-scheduling algorithm (KSA). After this, the stream of bits is generated using
the pseudo-random generation algorithm (PRGA). The ARC4 cipher is implemented in
conjunction with the wireless routine of the ATMega16 in both the control and robot modules.
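The KSA and PRGA steps, together with the packet-count catch-up used to keep the two ends in step, can be sketched in Python as follows. The KSA and PRGA follow the standard published description of the cipher; the receive helper, the key b"demo-key", and the assumption that every packet carries the same number of encrypted bytes are illustrative choices, not details taken from the project firmware.

```python
class ARC4:
    """Minimal ARC4 model: KSA in the constructor, PRGA in next_byte."""

    def __init__(self, key):
        # Key-scheduling algorithm (KSA): shuffle S under control of the key.
        self.S = list(range(256))
        j = 0
        for i in range(256):
            j = (j + self.S[i] + key[i % len(key)]) & 0xFF
            self.S[i], self.S[j] = self.S[j], self.S[i]
        self.i = 0
        self.j = 0

    def next_byte(self):
        # Pseudo-random generation algorithm (PRGA): one keystream byte per call.
        S = self.S
        self.i = (self.i + 1) & 0xFF
        self.j = (self.j + S[self.i]) & 0xFF
        S[self.i], S[self.j] = S[self.j], S[self.i]
        return S[(S[self.i] + S[self.j]) & 0xFF]

    def skip(self, count):
        """Advance the keystream without using the bytes."""
        for _ in range(count):
            self.next_byte()

    def crypt(self, data):
        """Encrypt or decrypt (the operation is the same xor)."""
        return bytes(b ^ self.next_byte() for b in data)


def receive(cipher, expected_count, packet_count, ciphertext):
    """Decrypt one packet, first skipping keystream for any missed packets.

    Assumption: every packet carries len(ciphertext) encrypted bytes.
    Returns (plaintext, next expected packet count).
    """
    missed = (packet_count - expected_count) & 0xFF
    cipher.skip(missed * len(ciphertext))
    return cipher.crypt(ciphertext), (packet_count + 1) & 0xFF


tx, rx = ARC4(b"demo-key"), ARC4(b"demo-key")
packets = [(count, tx.crypt(bytes([c]))) for count, c in enumerate(b"SFL")]
# The receiver misses packets 0 and 1 and only hears packet 2.
print(receive(rx, 0, *packets[2]))
```

The modulo-256 subtraction is why the scheme tolerates at most 255 consecutive lost packets before the two keystreams can no longer be realigned.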
3.2.8 MMA7260Q Tilt Sensing
Figure 3.9: Overall accelerometer tilt sensing algorithm
The MMA7260Q has three sensor output pins, X, Y and Z, connected to three of the ADC inputs
of the ATMega32, viz. PA3, PA4 and PA5. The robot functions (front, back, left and right) are
controlled in either Speech or Accelerometer mode. In the latter, the tilt-sensing algorithm
first samples the X, Y and Z values for the origin into xyzOrigin, and then rapidly stores the
subsequent readings into xyzADCArray. These arrays are used by the three decision blocks to
determine the speeds in the individual directions. In the speed and decision block, once the
speed in either the positive or negative direction (depending on accelerometer orientation) is
determined, the algorithm decides whether the function to be interpreted is front, back, left,
right or stop. For this, the calculated speed in either X or Y has to exceed a predefined
threshold for the movement data to be considered valid. The decision is sent to the
roboControl function, which ultimately conveys it to one of the robots.
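The decision logic can be sketched as a comparison of each axis against its origin and a threshold. In the Python model below, the mapping of positive X tilt to front and positive Y tilt to right, as well as the rule that the axis with the larger displacement wins, are assumptions made for illustration; the function name and the ADC values are hypothetical.

```python
def decide(x_adc, y_adc, x_origin, y_origin, threshold):
    """Map current X/Y readings to a control character.

    Speeds are displacements from the origin sampled at startup.
    Returns 'F', 'B', 'L', 'R', or 'S' when neither axis exceeds the threshold.
    """
    x_speed = x_adc - x_origin
    y_speed = y_adc - y_origin
    # Assumed tie-break: the axis with the larger displacement decides.
    if abs(x_speed) > threshold and abs(x_speed) >= abs(y_speed):
        return "F" if x_speed > 0 else "B"
    if abs(y_speed) > threshold:
        return "R" if y_speed > 0 else "L"
    return "S"


# Made-up 10-bit ADC readings around an origin of 512.
for reading in [(600, 512), (400, 512), (512, 600), (512, 400), (520, 505)]:
    print(reading, decide(*reading, 512, 512, threshold=50))
```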
[Flowchart of Figure 3.9: START -> initialize the origin and speed variables for x, y and z
(xOrigin = yOrigin = zOrigin = 0, xSpeed = ySpeed = zSpeed = 0), configure ADC pins 3 to 5
and initialize the LCD -> ADC conversion -> sample X, Y, Z values for the origin into the
xyzOrigin array -> store the remaining values into xyzADCArray -> determine xSpeed, ySpeed
and zSpeed (reference axis) -> decision of robot function -> send appropriate control signal.]
Figure 3.10: Flowchart showing xSpeed determination and decision making of robot functions
(FRONT and BACK)
[Flowchart: if xADCArray > xOrigin, xSpeed = xADCArray - xOrigin (+ve speed value);
otherwise xSpeed = xOrigin - xADCArray (-ve speed value). Each branch checks whether
xSpeed > threshold and whether AXIS = 1 to choose between Decision = FRONT, BACK, STOP
or LT/RT -> send decision to roboControl function.]
4. RESULTS
4.1 Time domain waveform
The figures depict the time domain waveforms of the spoken words, plotted in Matlab. The
spoken words front, left and right each have a duration of approximately 4s. The word back
has the shortest duration, 2s, and as a result is recognized with the lowest accuracy of the
five words, while stop has the longest duration, 5s, and the highest accuracy.
Figure 4.1: Time domain representation of Back
Figure 4.2: Time domain representation of Stop
4.2 Frequency domain waveform
These figures depict the spectral analysis (discrete Fourier transform) of the sampled time
domain data, generated using Matlab.
Figure 4.3: FFT of the word Back
Figure 4.4: FFT of the word Stop
4.3 Dictionary data points for voice fingerprints
Table 4.1: Dictionary data points for the word FRONT stored in the flash memory
#    Filter 1  Filter 2  Filter 3  Filter 4  Filter 5  Filter 6  Filter 7  Filter 8
1    731       177       3120      474       7662      1564      385       704
2    831       346       3704      1188      4377      789       183       764
3    723       307       4341      1966      3991      4137      306       796
4    2343      364       1001      539       2200      1752      171       950
5    4838      95        1957      167       1639      1311      553       2347
6    2514      59        5105      184       347       1629      163       1998
7    7815      10        288       78        561       52        3         489
8    1085      0         51        0         134       5         56        665
9    681       0         156       30        0         34        72        266
10   1025      0         31        52        23        68        123       379
11   707       0         0         30        20        123       68        137
12   1057      35        732       193       1309      728       219       138
13   625       4         175       0         874       343       196       729
14   309       0         4         0         0         120       42        944
15   172       0         0         0         0         77        41        1400
16   672       0         44        0         0         76        37        516
128 data points for each of the five words are logged via RealTerm in a similar manner
during the training stage and stored as the dictionary in the flash memory.
4.4 Speech Recognition
Figure 4.5: Recognition Probability Comparison
[Bar chart of recognition probability, number of tests = 20: Front 95%, Back 90%, Left 95%,
Right 95%, Stop 100%]
The accuracy of the speech recognition was 90% or above for every word, which is within the
acceptable range we initially expected from the system design. However, given the basic
speech algorithm, recognition is valid only for the same person who performed the
preliminary voice training to initialize the dictionary fingerprints. For convenience, the
recorded voice of the Oxford dictionary software, stored as a .wav file, was played in
relatively quiet surroundings.
4.5 Euclidean Distance Comparison
Figure 4.6: Euclidean Distance Comparison
UART logging was done through RealTerm, and the Euclidean distance of the sampled word to
each of the five fingerprints already stored in the EEPROM was logged. As expected, the word
was recognized as the one with the least distance among the five fingerprints.
4.6 Wireless Transmit and Receive
4.6.1 Correct ARC4 Key Encryption/Decryption
The logged data from RealTerm is presented below. It depicts encryption and decryption with
the correct ARC4 key. If the private key matches in both the control and robot modules, as
shown below, the encrypted data is decrypted back to the original data once the PRGA of the
robot agent has updated 12 times to catch up with the PRGA of the Control module.
CONTROL Initialized!
== Control Module ==
Private Key = SaGuN
- TX to Robot I -
Destination:
C2:C2:C2:C2:C2(Pipe1)
Original:
data_tx1[0]= S
data_tx1[1]=0
Encrypted:
data_tx1[0]= ‘
data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot I -
Destination:
C2:C2:C2:C2:C2(Pipe1)
Original:
data_tx1[0]= S
data_tx1[1]=1
Encrypted:
data_tx1[0]= ,
data_tx1[1]=1
Packet sent!
Current Sequence = 2

ROBOT Initialized!
== Robot Module I==
Private Key = SaGuN
-RX from Control-
Packet received!
Encrypted
data[0]= ‘
data[1]=0
No. of PRGA updates = 12 times
Decrypted
data[0]= S
data[1]=0
Current Sequence = 1
-RX from Control-
Packet received!
Encrypted
data[0]= ,
data[1]=1
Decrypted
data[0]= S
data[1]=1
Current Sequence = 2
4.6.2 Incorrect ARC4 Key Encryption/Decryption
If the private key is not matched between the two
modules then the encrypted data cannot be decrypted
back to its original data as shown below.
CONTROL Initialized!
= Control Module =
Private Key= VoCoRoBo
- TX to Robot II -
Destination:
C2:C2:C2:C2:C3(Pipe2)
Original
data_tx1[0]= S
data_tx1[1]=0
Encrypted
data_tx1[0]= j
data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot II -
Destination:
C2:C2:C2:C2:C3(Pipe2)
Original
data_tx1[0]= S
data_tx1[1]=1
Encrypted
data_tx1[0]= D
data_tx1[1]=1
Packet sent!
Current Sequence = 2
ROBOT Initialized!
=Robot Module II=
Private Key = SaGuN
- RX from Control-
Packet received!
Encrypted
data[0]= j
data[1]=0
No. of PRGA updates = 7 times
Decrypted
data[0]= ƒ
data[1]=0
Current Sequence = 1
- RX from Control-
Packet received!
Encrypted
data[0]= D
data[1]=1
Decrypted
data[0]= ~
data[1]=1
Current Sequence = 2
5. CONCLUSION
This project implemented real-time speech recognition using DSP algorithms such as
Chebyshev IIR filters, accelerometer-based tilt sensing, and a secure short-range wireless
link using the ARC4 cipher, all on ubiquitous low-cost 8-bit microcontrollers. With a speech
recognition accuracy of 90% or above, it shows that the system is feasible for low-cost
real-time applications. It was observed that words with greater pronunciation stress were
recognized better. Although for now the recognition is accurate only for the person who
trained the system, it could be made speaker independent through further research on storing
and retrieving the voice fingerprints from a different medium. A multi-channel wireless link
with ARC4 was also successfully implemented to exchange control and sensor data. As the
nRF24L01 is capable of higher-speed data transmission, the system can also be expanded to
incorporate other sensors, such as audio or video sensors, for richer data acquisition.
7. PICTURES
Figure 7.1: Overall System
Figure 7.2: Schematic Diagram of Control Module
Figure 7.3: Schematic Diagram of a single Robot Module

Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
 
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
Design of Adjustable Reconfigurable Wireless Single Core CORDIC based Rake Re...
 
Generation and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codesGeneration and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codes
 
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEMA NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
A NOVEL APPROACH FOR LOWER POWER DESIGN IN TURBO CODING SYSTEM
 
Ijeee 33-36-surveillance system for coal mines based on wireless sensor network
Ijeee 33-36-surveillance system for coal mines based on wireless sensor networkIjeee 33-36-surveillance system for coal mines based on wireless sensor network
Ijeee 33-36-surveillance system for coal mines based on wireless sensor network
 
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
 
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
Implementation of XOR Based Pad Generation Mutual Authentication Protocol for...
 
Software Design of Digital Receiver using FPGA
Software Design of Digital Receiver using FPGASoftware Design of Digital Receiver using FPGA
Software Design of Digital Receiver using FPGA
 
Hv3414491454
Hv3414491454Hv3414491454
Hv3414491454
 
simulation of turbo encoding and decoding
simulation of turbo encoding and decodingsimulation of turbo encoding and decoding
simulation of turbo encoding and decoding
 
Design and Implement Any Digital Filters in Less than 60 Seconds
Design and Implement Any Digital Filters in Less than 60 SecondsDesign and Implement Any Digital Filters in Less than 60 Seconds
Design and Implement Any Digital Filters in Less than 60 Seconds
 
Design of a Digital Baseband Processor for UWB Transceiver on RFID Tag
Design of a Digital Baseband Processor for UWB Transceiver on RFID TagDesign of a Digital Baseband Processor for UWB Transceiver on RFID Tag
Design of a Digital Baseband Processor for UWB Transceiver on RFID Tag
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System

VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System

Sagun Man Singh Shrestha1, Labu Manandhar2, Ritesh Bhattarai3
Department of Electronics and Computer Engineering,
Tribhuvan University – Kathmandu Engineering College, Nepal
Gmail: 1 sagunms, 2 laburocks, 3 reittes | github.com/sagunms/vocorobo

Abstract: This work implements real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, tilt sensing with an accelerometer, and a secure short-range wireless link using the ARC4 cipher, all on low-cost 8-bit ATmega microcontrollers. The system implements a simple but effective algorithm that compares the spoken word with a dictionary of fingerprints using a modified Euclidean distance calculation. It also lets the user securely and wirelessly control the navigation of multiple robots at remote locations from the Control Module, and gathers the environmental data collected by the Robot Modules for display back at the Control Module. Given the time-critical, computation-heavy algorithms and the variety of sensors interfaced in the system, this project demonstrates how an expandable multi-robot system can be built from cheap and ubiquitous electronics.

Keywords: Speech Recognition, Chebyshev, Digital Signal Processing, Euclidean Distance, ARC4 Cryptography, ATMega16/32, nRF24L01+ Wireless Transceiver, MMA7260Q Accelerometer

I. INTRODUCTION

VoCoRoBo stands for Voice Controlled RoBot. The user can wirelessly control multiple robots either with a voice command or by tilting the controls towards the desired direction. In addition, each robot securely relays temperature and light sensor data back to the user station.

1.1 HARDWARE

A microcontroller is an integrated circuit composed of a microprocessor unit, memory, and input/output peripheral devices.
The Atmel ATmega32/16 is a low-power CMOS 8-bit microcontroller based on the AVR RISC architecture; it is used to implement the voice recognition, tilt-sensing, wireless and cryptography algorithms.

An accelerometer measures the magnitude and direction of proper acceleration, that is, acceleration relative to free fall, and can be used to sense orientation. The Freescale MMA7260Q 3-axis accelerometer made it possible to control the robots with fun and intuitive tilt gestures.

The two parts of the system, the control and robot modules, are linked wirelessly using the popular Nordic nRF24L01+ radio transceiver. It operates in the 2.4 - 2.5 GHz ISM band with an air data rate of up to 2 Mbps, has ultra-low-power operation, and is well suited to remote control and data acquisition.

The L293D H-bridge IC is a quad push-pull driver capable of delivering output currents of up to 600 mA per channel. A differential drive technique was used, so each robot turns simply through the speed difference between the wheels on either side.

1.2 SOFTWARE

Speech recognition is the process of capturing an acoustic signal with a microphone and identifying the spoken word from it. Because the system is speaker dependent, it needs to be trained before use.

Digital signal processing is concerned with representing signals as sequences of numbers and processing those sequences. An infinite impulse response (IIR) system is one whose impulse response is non-zero over an infinite length of time. Chebyshev Type II filters are one example of IIR filters; they have a steeper roll-off and more stop-band ripple than Butterworth filters. They minimize the error between the idealized and the actual filter characteristic over the range of the filter.
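A second-order IIR section of the kind discussed above can be sketched in C as follows. This is only an illustration, not the project's assembly routine: the names biquad_t, biquad_step and bandpass4_step, the scaling of coefficients by 256, and the sign convention (feedback terms subtracted) are assumptions made for this example. It uses a Direct Form II Transposed structure and cascades two sections into a 4th-order filter.

```c
#include <stdint.h>

/* One 2nd-order IIR section in Direct Form II Transposed.
 * Assumption: coefficients are real values multiplied by 256 and rounded,
 * and the recurrence is y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
 *                              - a1*y[n-1] - a2*y[n-2]. */
typedef struct {
    int16_t b0, b1, b2;   /* feed-forward coefficients, scaled by 256 */
    int16_t a1, a2;       /* feedback coefficients, scaled by 256 */
    int32_t z1, z2;       /* state, kept at the scaled-by-256 precision */
} biquad_t;

/* Process one input sample and return one output sample. */
static int32_t biquad_step(biquad_t *f, int32_t x)
{
    /* Division (not a shift) keeps the behavior defined for negative values. */
    int32_t y = ((int32_t)f->b0 * x + f->z1) / 256;

    f->z1 = (int32_t)f->b1 * x - (int32_t)f->a1 * y + f->z2;
    f->z2 = (int32_t)f->b2 * x - (int32_t)f->a2 * y;
    return y;
}

/* A 4th-order filter built by feeding one section into the next. */
static int32_t bandpass4_step(biquad_t sections[2], int32_t x)
{
    return biquad_step(&sections[1], biquad_step(&sections[0], x));
}
```

On an 8-bit AVR the 32-bit multiplications are comparatively slow, so a real implementation would likely narrow the word sizes; the wide types here only keep the sketch free of overflow for small test inputs.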
1.2.1 Speech Analysis

In speech recognition, the frequency content of the detected word has to be analyzed. Several 4th-order Chebyshev band-pass filters are created, each by cascading two 2nd-order filters using a Direct Form II Transposed realization of the difference equations.

[Equations: difference equations of the 2nd-order sections; not recoverable from the extracted text]

The coefficients a and b used in these equations were obtained using the following syntax in Matlab:

    [B,A] = cheby2(2,40,[Freq1, Freq2]);

cheby2 designs a Chebyshev Type II digital filter from the given specifications: 2 yields a 4th-order band-pass filter, 40 is the stop-band attenuation in dB, and Freq1 and Freq2 are the normalized band-edge frequencies. The tf2sos function is then used to convert the transfer function of the filter into second-order sections.

1.2.2 Voice-fingerprint Calculation

Due to the limited RAM of the ATMega32, the relevant information of each spoken word had to be encoded in the form of a 'fingerprint'. To find the correct word, the stored and sampled fingerprints are compared using the following pseudo-Euclidean distance:

$$D(P, Q) = \sum_{i=1}^{n} |p_i - q_i|$$

where P = (p1, p2, ..., pn) is the dictionary fingerprint, Q = (q1, q2, ..., qn) is the sampled word fingerprint, and pi and qi are the fingerprint data points. The distance between the sampled word and each word in the dictionary is computed, and the dictionary word with the minimum distance is taken as the match. The true Euclidean distance requires squaring the difference between each pair of points; in fixed-point arithmetic this produces numbers large enough to overflow the variables. The modified formula therefore omits both the square and the square root, which gave satisfactory results in practice.

1.2.3 ARC4 Cryptography

ARC4 is one of the most widely used software stream ciphers, found in many encryption schemes including WEP, WPA, and SSL.
The main factors in ARC4's success over such a wide range of applications are its speed, simplicity and efficiency in both software and hardware.

3. DESIGN AND IMPLEMENTATION

3.1 HARDWARE ARCHITECTURE

[Figure 3.1: Overall Hardware Architecture. Control Module: ATmega32 @ 16 MHz (speech recognition and MMA7260Q tilt sensing, with LCD and LEDs) bridged to an ATmega16 @ 8 MHz (nRF24L01+ wireless interface with ARC4 cryptography). Robot Module: ATmega16 @ 8 MHz (nRF24L01+ with ARC4 and L293D H-bridge interface driving two motors). The modules are joined by a 2.4 GHz wireless link with a 2-byte payload (control byte + count byte).]
The system is divided into two broad subsystems: the Control Module and the Multi-Robot Module. The Control Module is further divided into two layers: the topmost layer and the second layer.

3.1.1 Control Module

The topmost layer of the control module consists of an ATMega32, which handles speech recognition, MMA7260Q accelerometer sensing and output to a 16x2 text LCD. The 2nd layer consists of an ATMega16, which implements the nRF24L01 wireless routine as well as encryption and decryption with the ARC4 cipher.

The bridge protocol between the 1st and 2nd layers in the control module (Fig. 3.1 and 3.2) is designed such that three output pins of PORTD of the ATMega32, viz. PD2, PD3 and PD4, are connected to the respective input pins of PORTA of the ATMega16, viz. PA0, PA1 and PA2. When the 1st layer recognizes the spoken word (front, back, left, right or stop), the equivalent bit combination is applied to PORTA of the 2nd layer via these bridge lines. The 2nd layer then sends out the corresponding control byte wirelessly via the SPI port. When one of the robots receives this control byte, it decodes it into the matching differential-drive motor combination that moves the robot in the commanded direction.

FUNCTION   Control byte   PA2   PA1   PA0   HEX
STOP       S              0     0     0     00H
FRONT      F              0     0     1     01H
BACK       B              0     1     0     02H
LEFT       L              0     1     1     03H
RIGHT      R              1     0     0     04H
SPD_UP     U              1     0     1     06H
SPD_DN     D              1     1     1     07H

Table 3.1: Function control byte to be sent out via Wireless (SPI port) and corresponding bit combination inputted to the second layer of Control Module (PINA, connected to Layer 1).

3.1.2 Robot Module

The Robot Module consists of two identical robots (A and B), which can be positioned at different locations provided they are within the signal range of the Control Module. Each robot consists of an ATMega16 with sensors that collect environmental data, specifically an LM35 temperature sensor and a light-dependent resistor.
A 2.4 GHz nRF24L01 wireless transceiver is also on board to receive control data and to transmit the remote sensor data for data acquisition. With the L293D H-bridge driver, two differential-drive motors are controlled independently so that the robot can navigate front, back, left or right. The four input pins of the L293D, viz. IN1, IN2, IN3 and IN4, are connected to four output pins of PORTD of the ATMega16, viz. PD0, PD1, PD2 and PD3 respectively.

FUNCTION   Control byte   IN4   IN3   IN2   IN1 (LSB)   HEX
STOP       S              0     0     0     0           00H
FRONT      F              0     1     1     0           06H
BACK       B              1     0     0     1           09H
LEFT       L              0     0     1     1           03H
RIGHT      R              1     1     0     0           04H
SPD_UP     U              -     -     -     -           -
SPD_DN     D              -     -     -     -           -

Table 3.2: Function control byte received via Wireless (SPI port) and corresponding bit combinations outputted to H-bridge (PORTD).

3.2 SOFTWARE ARCHITECTURE

3.2.1 Input, Processing, Output

[Figure 3.2: Input, Processing and Output block diagram for speech recognition. Blocks: Speech, ADC, Band Pass Filters, Generate Voice Fingerprints, Compare (against Fingerprint Templates), Control Signals, Output to the Robot.]

The algorithm checks the ADC input for an audio signal at a rate of 4 kHz. If the ADC value exceeds the threshold value, it is taken as the start of a half-second-long word. The sampled word passes through 8 band-pass filters to be encoded into a fingerprint. The words to be matched are stored as fingerprints in a dictionary so that newly generated sampled fingerprints can be compared with them later. The modified Euclidean distance calculation finds the closest matching fingerprint, and a control signal is then sent to the robot to perform operations such as left, right, front, back and stop.

3.2.2 Initial-Threshold Calculation

The background sound at start-up is taken as a base value, which improves the accuracy of the speech recognition. At start-up, the algorithm reads the ADC input using the ATMega32 Timer/Counter0 and accumulates its value 256 times. By interpreting each ADC reading as a fixed-point number between 1/256 and 1 and accumulating 256 readings, the average ADC value is obtained without a multiply or divide. Three such averages are taken, with a 16.4 ms delay between them. The threshold is then set to four times the median of the three averages. The threshold is used to detect whether a word has been spoken.

3.2.3 Voice-fingerprint Generation

Figure 3.3: Filter implementation block diagram for the generation of fingerprints

The program considers a word detected when a sample value from the ADC is greater than the threshold value. Once a word has been detected, every ADC sample is stored in an integer variable Ain and passed through eight 4th-order band-pass filters, for 2000 samples (half a second). Each filter's output is squared, and that value is accumulated with the previous squares of the same filter's output. After 125 samples the accumulated value is stored as one data point in the fingerprint of that word. The accumulator is then cleared and the process begins again. After 2000 samples, 16 points have been generated from each filter, so every sampled word is divided into 16 parts.
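The fingerprint generation and matching steps described above can be sketched in C. This is a hedged illustration rather than the project's assembly code: the type fp_builder_t, the functions fp_reset, fp_feed, fp_distance and fp_match, the frame-major storage order, the saturating accumulator and the 32-bit point width are all assumptions for the example. The filter outputs are taken as an input array, so the filters themselves are outside this sketch.

```c
#include <stdint.h>
#include <string.h>

#define NUM_FILTERS 8                    /* band-pass filters */
#define FRAME_LEN   125                  /* samples per data point */
#define NUM_FRAMES  16                   /* data points per filter */
#define FP_POINTS   (NUM_FILTERS * NUM_FRAMES)   /* 128 */

typedef struct {
    uint32_t acc[NUM_FILTERS];     /* running sum of squared filter outputs */
    uint16_t in_frame;             /* samples seen in the current frame */
    uint8_t  frame;                /* completed frames so far */
    uint32_t points[FP_POINTS];    /* finished fingerprint, frame-major */
} fp_builder_t;

static void fp_reset(fp_builder_t *b) { memset(b, 0, sizeof *b); }

/* Feed one sample's worth of filter outputs.
 * Returns 1 once all 2000 samples (16 frames) have been consumed. */
static int fp_feed(fp_builder_t *b, const int16_t out[NUM_FILTERS])
{
    if (b->frame >= NUM_FRAMES) return 1;
    for (int f = 0; f < NUM_FILTERS; f++) {
        uint32_t sq  = (uint32_t)((int32_t)out[f] * (int32_t)out[f]);
        uint32_t sum = b->acc[f] + sq;
        b->acc[f] = (sum < sq) ? UINT32_MAX : sum;   /* saturate on overflow */
    }
    if (++b->in_frame == FRAME_LEN) {
        for (int f = 0; f < NUM_FILTERS; f++) {
            b->points[b->frame * NUM_FILTERS + f] = b->acc[f];
            b->acc[f] = 0;
        }
        b->in_frame = 0;
        b->frame++;
    }
    return b->frame >= NUM_FRAMES;
}

/* Modified distance: sum of absolute differences, no square, no root. */
static uint64_t fp_distance(const uint32_t *p, const uint32_t *q)
{
    uint64_t d = 0;
    for (int i = 0; i < FP_POINTS; i++)
        d += (p[i] > q[i]) ? (uint64_t)(p[i] - q[i]) : (uint64_t)(q[i] - p[i]);
    return d;
}

/* dict holds `words` fingerprints back to back; returns the closest index. */
static int fp_match(const uint32_t *dict, int words, const uint32_t *sample)
{
    int best = -1;
    uint64_t best_d = UINT64_MAX;
    for (int w = 0; w < words; w++) {
        uint64_t d = fp_distance(dict + (size_t)w * FP_POINTS, sample);
        if (d < best_d) { best_d = d; best = w; }
    }
    return best;
}
```

Because each data point is the energy of one band over one 125-sample frame, only 128 numbers per word need to be kept instead of the 2000 raw samples.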
The assembly language code is built around 8 filters; since each one produces 16 data points, every fingerprint is made up of 128 data points.

[Figure 3.3 block labels: ADC; Filter 2 to Filter 9; one accumulator per filter; Voice Fingerprint]

3.2.4 Filter Design and Implementation

Figure 3.4: Band pass Filter 200-400 Hz

Figure 3.5: Band pass Filter 1600-1800 Hz

3.2.5 Digital Filter Implementation

The 4th-order Chebyshev digital filter with a 40 dB stop band was chosen for its very sharp transition after the cutoff frequency. Most of the important frequency content in speech lies within the first 2 kHz, which usually contains the first and second speech formants. Thus 8 BPFs covering 0.2 to 1.8 kHz were designed, as shown in the magnitude and phase plots. This also permitted sampling at 4 kHz (satisfying the Nyquist criterion for the first 2 kHz of voice frequencies) and left enough time to run 8 filters. For sufficient frequency resolution to identify words properly, the bandwidth of each filter is set to 200 Hz.

Each 4th-order filter is created in assembly code by cascading two 2nd-order IIR filters whose coefficients are generated using Matlab (Section 1.2.1). The floating-point coefficients are converted to fixed point by multiplying them by 256 and rounding to the nearest integer, so that the filters can run in real time. Fixed point was used instead of floating point (which would have been more accurate) because floating-point calculations on the ATMega32 are too slow to run all the filters within one 4 kHz sample period. The ATMega32 has only 2 KB of RAM, and a word sampled at 4 kHz for half a second would require the entire 2 KB. The fingerprint is therefore built on the fly, which requires each ADC sample to pass through all the filters within the ADC sample time of 250 µs. The output of each filter is squared in order to store the intensity of the sound rather than just the amplitude. Because the lowest and highest frequency bands could be dropped without noticeable degradation in recognition accuracy, and because the memory and cycle time of the ATMega32 would not allow all ten filters, eight BPFs were used to cover the frequencies between 200 Hz and 1.8 kHz.

3.2.7.1 Chebyshev II filter coefficients

#                      Filter 1      Filter 2      Filter 3      Filter 4
f, kHz                 0.2 - 0.4     0.4 - 0.6     0.6 - 0.8     0.8 - 1
1st 2nd-order coeff.   A112: 451     A212: 355     A312: 224     A412: 72
                       A113: -248    A213: -248    A313: -248    A413: -248
                       B111: 21      B211: 27      B311: 31      B411: 34
                       B112: -32     B212: -29     B312: -15     B412: 4
                       B113: 21      B213: 27      B313: 31      B413: 34
2nd 2nd-order coeff.   A122: 458     A222: 366     A322: 239     A422: 88
                       A123: -248    A223: -248    A323: -248    A423: -248
                       B121: 2225    B221: 1090    B321: 762     B421: 633
                       B122: -4285   B222: -1826   B322: -965    B422: -464
                       B123: 2225    B223: 1090    B323: 762     B423: 633
Gain                   G1=80         G2=120        G3=140        G4=160

Table 3.3: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 1-4

#                      Filter 5      Filter 6      Filter 7      Filter 8
f, kHz                 1 - 1.2       1.2 - 1.4     1.4 - 1.6     1.6 - 1.8
1st 2nd-order coeff.   A512: -72     A622: -239    A712: -355    A812: -451
                       A513: -248    A623: -248    A713: -248    A813: -248
                       B511: 34      B621: 762     B711: 27      B811: 21
                       B512: -4      B622: 965     B712: 29      B812: 32
                       B513: 34      B623: 762     B713: 27      B813: 21
2nd 2nd-order coeff.   A522: -88     A622: 458     A722: -366    A822: -458
                       A523: -248    A723: -248    A723: -248    A823: -248
                       B521: 633     B721: 2225    B721: 1090    B821: 2225
                       B522: 464     B722: -4285   B722: 1826    B822: 4285
                       B523: 633     B723: 2225    B723: 1090    B823: 2225
Gain                   G5=160        G6=140        G7=120        G8=80

Table 3.4: MATLAB filter coefficients for Chebyshev II (40 dB stop band) Filters 5-8

3.2.6 Wireless Packet Format

The preamble byte, composed of alternating zeros and ones, is sent first, followed by a five-byte address field. The data payload, of user-settable length (1-32 bytes), is sent next. Two versions of the payload were implemented: a 2-byte payload, used primarily, which holds only the encrypted byte and a count byte; and an 18-byte payload designed for acquiring temperature and light sensor data from a remote location. The final part is the two-byte CRC.

3.2.7.2 Wireless Data Payload format

The data payload for the control module is of two types, transmitter mode and receiver mode, both with a payload width of 18 bytes. The control module has to transmit data packets to individual robots and also receive sensor data from the replying robots, so it has to hold the entire payload of each robot (two in our case) for both transmission and reception. Two 18-byte char arrays, data_tx1 and data_tx2, store the transmitter mode payloads, while two further arrays, data_rx1 and data_rx2, serve the receiver. All payloads are of size PAYLOAD_SIZE (18 bytes), defined in the wireless routine of the ATmega16.
Figure 3.6: Transmitter Mode Payload

Of the three blocks, the first, 16 bytes long, holds the data text to be sent from the control module to the robot modules. To input this text, RealTerm is used to send it from the computer to the MCU via UART, so that text messages can be sent to the individual robots at different locations. The control block is formed by the 1-byte data_control, which stores one of the ASCII characters 'F', 'B', 'L', 'R' and 'S', representing the control information for front, back, left, right and stop. When the targeted robot receives the control information in the transmitter payload, it interprets the ASCII control byte as the corresponding robot movement command.

The ARC4 cipher is used to encrypt the control byte and the data text block. Because ARC4 is a stream cipher, the keystream position must be kept in step on both sides (missing a packet would result in incorrect decryption from that point on), so a packet count byte is added to each packet. This allows the receiving unit to catch up to the correct byte in the PRGA, assuming the targeted robot misses fewer than 256 packets in a row.

Figure 3.7: Receiver Mode Payload

The receiver mode payload is needed to receive the encrypted data payload sent by the individual robots, decrypt the encrypted block after synchronizing with the help of the packet count, separate the sensor readings of temperature (2 bytes), light (2 bytes) and speed setting (1 byte), and store them in their respective variables for data logging. For data acquisition, the payload is divided into temperature and light blocks. The readings from the two sensors on each robot are stored in their respective integer variables and sent to the control module in packet format.
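The packet-count resynchronization described above can be sketched in C on top of a standard ARC4 implementation. The cipher functions follow the well-known KSA and PRGA; everything around them is an assumption for illustration: the type link_t, the functions link_init, link_encrypt and link_decrypt, and the rule that every packet consumes a fixed number of keystream bytes are not taken from the project's source.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint8_t s[256]; uint8_t i, j; } arc4_t;

/* Key-scheduling algorithm (KSA): permute S according to the key. */
static void arc4_init(arc4_t *c, const uint8_t *key, size_t keylen)
{
    uint8_t j = 0;
    for (unsigned k = 0; k < 256; k++) c->s[k] = (uint8_t)k;
    for (unsigned k = 0; k < 256; k++) {
        j = (uint8_t)(j + c->s[k] + key[k % keylen]);
        uint8_t t = c->s[k]; c->s[k] = c->s[j]; c->s[j] = t;
    }
    c->i = 0;
    c->j = 0;
}

/* Pseudo-random generation algorithm (PRGA): one keystream byte. */
static uint8_t arc4_next(arc4_t *c)
{
    c->i = (uint8_t)(c->i + 1);
    c->j = (uint8_t)(c->j + c->s[c->i]);
    uint8_t t = c->s[c->i]; c->s[c->i] = c->s[c->j]; c->s[c->j] = t;
    return c->s[(uint8_t)(c->s[c->i] + c->s[c->j])];
}

/* Hypothetical link state: cipher plus the next expected packet count. */
typedef struct { arc4_t cipher; uint8_t count; } link_t;

static void link_init(link_t *l, const uint8_t *key, size_t keylen)
{
    arc4_init(&l->cipher, key, keylen);
    l->count = 0;
}

/* Encrypt len bytes in place; returns the count byte to send alongside. */
static uint8_t link_encrypt(link_t *l, uint8_t *buf, size_t len)
{
    for (size_t k = 0; k < len; k++) buf[k] ^= arc4_next(&l->cipher);
    return l->count++;
}

/* Decrypt len bytes in place, first skipping the keystream of any
 * packets that were missed (fewer than 256 in a row). */
static void link_decrypt(link_t *l, uint8_t *buf, size_t len, uint8_t count)
{
    uint8_t missed = (uint8_t)(count - l->count);
    for (size_t m = 0; m < (size_t)missed * len; m++) (void)arc4_next(&l->cipher);
    for (size_t k = 0; k < len; k++) buf[k] ^= arc4_next(&l->cipher);
    l->count = (uint8_t)(count + 1);
}
```

Since the count byte is 8 bits wide, the difference wraps modulo 256, which is why the scheme only recovers when fewer than 256 consecutive packets are lost.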
[Figure 3.6 layout, PAYLOAD_SIZE = 18 bytes (data_tx1, data_tx2): data text (data_text1, data_text2), 16 bytes, in bytes 15-0; control byte (data_control), 1 byte, and packet count, 1 byte, in bytes 17-16; the data text and control byte form the encrypted block]

[Figure 3.7 layout, PAYLOAD_SIZE = 18 bytes: temperature, 2 bytes, in bytes 1-0; light, 2 bytes, in bytes 3-2; padding, 12 bytes, in bytes 15-4; speed, 1 byte, and packet count, 1 byte, in bytes 17-16]

3.2.7.3 Source and destination data pipe addressing

Using switches connected to PD4 and PD5 of the ATMega16, the user can select whether the control byte generated by the roboControl function is directed to data_control1 or data_control2, which are concatenated to the respective data packets for each robot. The user can thus select the robot to which the current command is directed. This technique enables multi-robot control from a single control module. To implement a minimalistic star network topology, the receiving pipes of the control module, Robot1 and Robot2 are 0, 1 and 2 respectively, and the corresponding pipe addresses are E7:E7:E7:E7:E7, C2:C2:C2:C2:C2 and C2:C2:C2:C2:C3. The remaining five data pipes in each of the three modules are disabled, which effectively blocks reception of packets addressed elsewhere. Prior to transmitting a data packet, the destination address should be set.

Figure 3.8: Minimalistic Star Network Topology for establishing communication link between Control and Robot Agent modules and their respective destination multi-pipe addressing

[Figure 3.8 labels: Control Module, Pipe 0, E7:E7:E7:E7:E7; Robotic Agent I, Pipe 1, C2:C2:C2:C2:C2; Robotic Agent II, Pipe 2, C2:C2:C2:C2:C3; each module has pipes P0-P5 and a TX path; arrows show the communication link (pipe destination)]

3.2.7 ARC4 Cryptography

ARC4 generates a pseudorandom stream of bits (the keystream) which, for encryption, is combined with the plaintext using bit-wise xor; decryption is performed in the same way (since xor is a symmetric operation). To generate the keystream, the cipher makes use of a secret internal state which consists of two parts:

  • A permutation of all 256 possible bytes (denoted "S" below).
  • Two 8-bit index-pointers (denoted "i" and "j").

The permutation is initialized with a variable-length key, typically between 40 and 256 bits, using the key-scheduling algorithm (KSA). After this, the stream of bits is generated using the pseudo-random generation algorithm (PRGA). The ARC4 cipher is implemented in conjunction with the wireless routine of the ATMega16 in both the control and robot modules.

3.2.8 MMA7260Q Tilt Sensing

Figure 3.9: Overall accelerometer tilt sensing algorithm

The MMA7260Q has three sensor output pins, X, Y and Z, connected to three of the ADC inputs of the ATMega32, viz. PA3, PA4 and PA5. The robot functions (front, back, left and right) are controlled in either Speech or Accelerometer mode. In the latter, the tilt-sensing algorithm first samples the X, Y, Z values for the origin into xyzOrigin, and then rapidly stores the remaining samples into xyzADCArray. These arrays are used by the three decision blocks to determine the speeds in the individual directions. Once the speed in the positive or negative direction (depending on accelerometer orientation) has been determined, the speed and decision block decides whether the function to be interpreted is front, back, left, right or stop. For the movement data to be considered valid, the calculated speed in either X or Y has to exceed a predefined threshold. The command decided by the algorithm is sent to the roboControl function, which ultimately conveys it to one of the robots.
[Figure 3.9 flowchart steps: START; initialize the origin and speed variables for x, y and z to 0; configure ADC pins 3 to 5; initialize the LCD; ADC conversion; sample the X, Y, Z values for the origin into the xyzOrigin array; store the remaining values into xyzADCArray; determine xSpeed, ySpeed and zSpeed (reference axis); decide the robot function; send the appropriate control signal]

Figure 3.10: Flowchart showing xSpeed determination and decision making of robot functions (FRONT and BACK)

[Figure 3.10 flowchart labels: inputs from xOrigin and xADCArray; decision "Is xADCArray > ..."; xSpeed = xADCArray - xOrigin (+ve speed value); xSpeed = xOrigin - xADCArray (-ve speed value); decisions "Is xSpeed > threshold" and "AXIS = 1?"; outcomes FRONT, BACK, LT/RT and STOP; final step: send decision to roboControl function]
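The tilt decision logic can be sketched in C as below. This is an assumption-laden illustration, not the project's routine: the function name tilt_decide, the mapping of the X axis to front/back and the Y axis to left/right, the choice of which sign means which direction, and the rule that the axis with the larger deflection wins are all choices made for the example. The returned characters are the ASCII control bytes used in the payload.

```c
#include <stdint.h>

/* Decide a command from one X/Y accelerometer reading.
 * x0 and y0 are the origin readings taken at rest; threshold is the
 * minimum deflection (in ADC counts) for a movement to count as valid. */
static char tilt_decide(uint16_t x, uint16_t y,
                        uint16_t x0, uint16_t y0, uint16_t threshold)
{
    int32_t xs = (int32_t)x - (int32_t)x0;   /* signed speed along X */
    int32_t ys = (int32_t)y - (int32_t)y0;   /* signed speed along Y */
    int32_t xm = (xs < 0) ? -xs : xs;        /* magnitudes */
    int32_t ym = (ys < 0) ? -ys : ys;

    if (xm <= (int32_t)threshold && ym <= (int32_t)threshold)
        return 'S';                          /* no valid tilt: stop */
    if (xm >= ym)
        return (xs > 0) ? 'F' : 'B';         /* assumed: +X is front */
    return (ys > 0) ? 'R' : 'L';             /* assumed: +Y is right */
}
```

Subtracting the origin first means the controller does not need to be held perfectly level when the origin is sampled; only the change from that resting pose matters.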
4. RESULTS

4.1 Time domain waveform

The figures depict the time domain waveforms of the spoken words, generated using Matlab. The spoken words front, left and right each last approximately 4 s. The word back has the shortest duration, 2 s, which is why it is recognized with the lowest accuracy of the five words, while stop has the longest duration, 5 s, and the highest accuracy.

Figure 4.1: Time domain representation of Back
Figure 4.2: Time domain representation of Stop

4.2 Frequency domain waveform

These figures depict the spectral analysis (discrete Fourier transform) of the sampled time domain data, generated using Matlab.

Figure 4.3: FFT of the word Back
Figure 4.4: FFT of the word Stop

4.3 Dictionary data points for voice fingerprints

Table 4.1: Dictionary data points for the word FRONT stored in the flash memory

Filter 1  Filter 2  Filter 3  Filter 4  Filter 5  Filter 6  Filter 7  Filter 8
731       831       723       2343      4838      2514      7815      1085
681       1025      707       1057      625       309       172       672
177       346       307       364       95        59        10        0
0         0         0         35        4         0         0         0
3120      3704      4341      1001      1957      5105      288       51
156       31        0         732       175       4         0         44
474       1188      1966      539       167       184       78        0
30        52        30        193       0         0         0         0
7662      4377      3991      2200      1639      347       561       134
0         23        20        1309      874       0         0         0
1564      789       4137      1752      1311      1629      52        5
34        68        123       728       343       120       77        76
385       183       306       171       553       163       3         56
72        123       68        219       196       42        41        37
704       764       796       950       2347      1998      489       665
266       379       137       138       729       944       1400      516

128 data points for each of the five words are logged via RealTerm in a similar manner during the training stage and stored as the dictionary in the flash memory.

4.4 Speech Recognition

Figure 4.5: Recognition Probability Comparison (number of tests = 20)
Recognition probability: Front 95%, Back 90%, Left 95%, Right 95%, Stop 100%
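Matching a spoken word against the dictionary of fingerprints can be sketched in Python as follows. The sketch uses a plain squared Euclidean distance as a stand-in, since the authors' modified distance calculation is not spelled out here; the function names and the short four-point fingerprints are hypothetical, whereas the real dictionary holds 128 data points per word.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(sample, dictionary):
    """Return the word whose stored fingerprint is closest to the sample."""
    return min(dictionary, key=lambda word: squared_distance(sample, dictionary[word]))


if __name__ == "__main__":
    # Hypothetical four-point fingerprints, for illustration only.
    dictionary = {
        "FRONT": [700, 800, 700, 2300],
        "BACK": [100, 900, 50, 10],
        "STOP": [2000, 40, 10, 5],
    }
    sample = [690, 820, 710, 2250]
    for word, fingerprint in dictionary.items():
        print(word, squared_distance(sample, fingerprint))
    print("recognized:", recognize(sample, dictionary))
```

Skipping the square root does not change which fingerprint is nearest, and it keeps the comparison in integer arithmetic, which suits an 8-bit microcontroller.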
The speech recognition accuracy was above 90%, which is within the acceptable range set by our initial expectations for the system design. However, given the basic speech algorithm, recognition is valid only for the same person who underwent the preliminary voice training to initialize the dictionary fingerprints. For convenience, the recorded voice of the Oxford dictionary software, stored as a .wav file, was played in relatively quiet surroundings.

4.5 Euclidean Distance Comparison

Figure 4.6: Euclidean Distance Comparison

The Euclidean distance comparison against all five fingerprints already stored in the EEPROM was logged over UART using RealTerm. As expected, the word was recognized as the one with the least distance among the five fingerprints.

4.6 Wireless Transmit and Receive

4.6.1 Correct ARC4 Key Encryption/Decryption

The data logged from RealTerm is presented below. It depicts ARC4 encryption and decryption with the correct key. If the private key matches in both the control and robot modules, as shown below, the encrypted data is decrypted back to the original data, as the PRGA of the robot agent updates 12 times to catch up with the PRGA of the Control module.

Control module log:
CONTROL Initialized!
== Control Module ==
Private Key = SaGuN
- TX to Robot I -
Destination: C2:C2:C2:C2:C2(Pipe1)
Original: data_tx1[0]= S data_tx1[1]=0
Encrypted: data_tx1[0]= ‘ data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot I -
Destination: C2:C2:C2:C2:C2(Pipe1)
Original: data_tx1[0]= S data_tx1[1]=1
Encrypted: data_tx1[0]= , data_tx1[1]=1
Packet sent!
Current Sequence = 2

Robot module log:
ROBOT Initialized!
== Robot Module I==
Private Key = SaGuN
-RX from Control-
Packet received!
Encrypted data[0]= ‘ data[1]=0
No. of PRGA updates = 12 times
Decrypted data[0]= S data[1]=0
Current Sequence = 1
-RX from Control-
Packet received!
Encrypted data[0]= , data[1]=1
Decrypted data[0]= S data[1]=1
Current Sequence = 2

4.6.2 Incorrect ARC4 Key Encryption/Decryption

If the private key does not match between the two modules, the encrypted data cannot be decrypted back to the original data, as shown below.

Control module log:
CONTROL Initialized!
= Control Module =
Private Key= VoCoRoBo
- TX to Robot II -
Destination: C2:C2:C2:C2:C3(Pipe2)
Original data_tx1[0]= S data_tx1[1]=0
Encrypted data_tx1[0]= j data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot II -
Destination: C2:C2:C2:C2:C3(Pipe2)
Original data_tx1[0]= S data_tx1[1]=1
Encrypted data_tx1[0]= D data_tx1[1]=1
Packet sent!
Current Sequence = 2

Robot module log:
ROBOT Initialized!
=Robot Module II=
Private Key = SaGuN
- RX from Control-
Packet received!
Encrypted data[0]= j data[1]=0
No. of PRGA updates = 7 times
Decrypted data[0]= ƒ data[1]=0
Current Sequence = 1
- RX from Control-
Packet received!
Encrypted data[0]= D data[1]=1
Decrypted data[0]= ~ data[1]=1
Current Sequence = 2

5. CONCLUSION

This project is based on the implementation of real-time speech recognition using DSP algorithms such as Chebyshev IIR filters, an accelerometer for tilt-sensing and the establishment of a short-range wireless
secure link with ARC4 cipher, all using ubiquitous low-cost 8-bit microcontrollers. With a speech recognition accuracy above 90%, the system demonstrates its feasibility for low-cost real-time applications. It was observed that words with greater pronunciation stress were recognized better. Although recognition is currently accurate only for the person who trained the system, it can be made speaker independent through further research on storing and retrieving the voice fingerprints from different media. A multi-channel wireless link with ARC4 was also successfully implemented to exchange control and sensor data. As the nRF24L01 is capable of higher-speed data transmission, the system can also be expanded to incorporate other sensors, such as audio or video sensors, for richer data acquisition.

6. REFERENCES

[1] T. Aamodt (2003, April). “Speech Recognition Algorithm”, University of British Columbia. http://www.eecg.toronto.edu/%7Eaamodt/ece341/speech-recognition
[2] X. Lu, S. Lee (2006). “Voice Recognition Security System”, Cornell University.
[3] A. Harison, C. Shah (2006). “Voice Recognition Car”, Cornell University.
[4] B. R. Land. “Fixed Point mathematical function in GCC and assembler; Optimized 2nd order IIR code”, Cornell University.
[5] B. R. Land (2008, September). “Fast Digital Filtering”, Circuit Cellar, Issue 218, p. 40.
[6] Application Note AVR201: “Using the AVR® Hardware Multiplier”, Atmel Corporation.
[7] IIR Design: nauticom.net/www/jdtaft/iir.htm
[8] Brennen Ball (2007). “Specializing in the NXP LPC2148 and Microchip PIC18F452 microcontrollers and the Nordic Semiconductor nRF24L01 2.4 GHz RF link”, diyembedded.com
[9] “Interfacing nRF2401 with SPI” (White Paper), Nordic Semiconductor.
[10] T. Igoe (2006, January 16). “MMA7260Q 3-Axis Accelerometer Report for PIC 18F252 using PicBasic Pro”, Sensor Workshop at ITP.
[11] Application Note AN3447: “Implementing Auto-zero calibration technique for accelerometers”, Freescale Semiconductor.

7. PICTURES

Figure 7.1: Overall System
Figure 7.2: Schematic Diagram of Control Module
Figure 7.3: Schematic Diagram of a single Robot Module