VoCoRoBo: Remote Speech Recognition and Tilt Sensing Multi-Robotic System

Sagun Man Singh Shrestha1, Labu Manandhar2, Ritesh Bhattarai3
Department of Electronics and Computer Engineering,
Tribhuvan University – Kathmandu Engineering College, Nepal
Gmail: 1 sagunms, 2 laburocks, 3 reittes | github.com/sagunms/vocorobo
Abstract: This work implements real-time speech recognition using DSP algorithms such as
Chebyshev IIR filters, accelerometer-based tilt sensing, and a short-range secure wireless link
using the ARC4 cipher, all on low-cost 8-bit ATmega microcontrollers. The system uses a simple
but effective algorithm that compares the spoken word against a dictionary of fingerprints using
a modified Euclidean distance calculation. It can also securely control the navigation of multiple
robots at remote locations wirelessly from the Control Module, gather the environmental data
collected by the Robot Modules, and display it back at the Control Module. Given the time-critical,
computation-heavy algorithms and the variety of sensors interfaced in the system, this project
demonstrates how an expandable multi-robot system can be built from cheap and ubiquitous electronics.
Keywords: Speech Recognition, Chebyshev, Digital Signal Processing, Euclidean Distance, ARC4
Cryptography, ATMega16/32, nRF24L01+ Wireless Transceiver, MMA7260Q Accelerometer
I. INTRODUCTION
VoCoRoBo stands for Voice Controlled RoBot. The user
can wirelessly control multiple robots either with a
voice command or by tilting the controls towards the
desired direction. In addition, each robot securely
relays temperature and light sensor data back to the
user station.
1.1 HARDWARE
A microcontroller is an integrated circuit composed
of a microprocessor unit, memory, and input/output
peripheral devices. The Atmel ATmega32/16, a low-power
CMOS 8-bit microcontroller based on the AVR RISC
architecture, is used to implement the voice
recognition, tilt-sensing, wireless and cryptography
algorithms. An accelerometer measures the magnitude
and direction of proper acceleration, that is,
acceleration relative to free fall, and can therefore
be used to sense orientation. The Freescale MMA7260Q
3-axis accelerometer made it possible to control the
robots with fun and intuitive tilt gestures. The two
parts of the system, the control and robot modules,
are linked wirelessly using the popular Nordic
nRF24L01+ radio transceiver. It operates in the
2.4 - 2.5 GHz ISM band with an air data rate of up to
2 Mbps, has ultra low power operation and is ideally
suited for remote control and data acquisition. The
L293D H-bridge IC is a quad push-pull driver capable
of delivering output currents of up to 600 mA per
channel. A differential drive technique was used, in
which each robot turns simply through the speed
difference between the wheels on either side.
1.2 SOFTWARE
Speech recognition is the process of capturing an
acoustic signal with a microphone and identifying the
spoken word from the sound. Because the system is
speaker dependent, it needs to be trained before use.
Digital signal processing is concerned with the
representation of signals by sequences of numbers and
with their processing. An infinite impulse response
(IIR) system is a signal processing system whose
impulse response is non-zero over an infinite length
of time. Chebyshev Type II filters are an example of
IIR filters; they have a steeper roll-off and more
stop band ripple than Butterworth filters. They
minimize the error between the idealized and the
actual filter characteristic over the range of the
filter.

1.2.1 Speech Analysis
In speech recognition, the frequency content of the
detected word has to be analyzed. Several 4th-order
Chebyshev band pass filters are created by cascading
two 2nd-order filters using the following Direct Form
II Transposed realization of difference equations.
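$$y(n) = b_1 x(n) + w_1(n-1)$$
$$w_1(n) = b_2 x(n) - a_2 y(n) + w_2(n-1)$$
$$w_2(n) = b_3 x(n) - a_3 y(n)$$
Here x(n) is the input sample, y(n) is the output
sample, and w_1 and w_2 are the internal states of one
2nd-order section.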
The coefficients a and b used in the above equations
were obtained using the following syntax in Matlab.
[B,A] = cheby2(2,40,[Freq1, Freq2]);
cheby2 designs a Chebyshev Type II digital filter from
the given specifications: 2 is the order argument,
which yields a 4th-order filter for a band pass
design, 40 defines the stop band attenuation in dB,
and Freq1 and Freq2 are the normalized stop band edge
(cutoff) frequencies. The tf2sos function is then used
to convert the transfer function of the filter to
2nd-order sections.
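The cascade described above can be modeled on a host computer before committing to assembly. The following is a minimal Python sketch, assuming floating point arithmetic and the standard Direct Form II Transposed update for one 2nd-order section; the names make_biquad and make_cascade are hypothetical and are not taken from the project's firmware.

```python
def make_biquad(b, a):
    """Return a stateful Direct Form II Transposed 2nd-order section.

    b = (b1, b2, b3) are the numerator coefficients and a = (a2, a3) the
    denominator coefficients, with the leading denominator term equal to 1.
    """
    b1, b2, b3 = b
    a2, a3 = a
    state = [0.0, 0.0]  # holds w1(n-1) and w2(n-1)

    def step(x):
        y = b1 * x + state[0]
        state[0] = b2 * x - a2 * y + state[1]
        state[1] = b3 * x - a3 * y
        return y

    return step


def make_cascade(sections):
    """Chain 2nd-order sections; two sections give one 4th-order filter."""
    steps = [make_biquad(b, a) for b, a in sections]

    def step(x):
        for section in steps:
            x = section(x)
        return x

    return step
```

Feeding an impulse (1 followed by zeros) through a section is a quick way to check a coefficient set against the Matlab design.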
1.2.2 Voice-fingerprint Calculation
Due to the limited RAM on the ATmega32, the relevant
information of each spoken word had to be encoded in
the form of a 'fingerprint'. To find the correct word,
the fingerprint of the sampled word is compared with
each stored fingerprint using the following pseudo
Euclidean distance formula.
$$d(P, Q) = \sum_{i=1}^{n} |p_i - q_i|$$
where P = (p1, p2, ..., pn) is the dictionary
fingerprint, Q = (q1, q2, ..., qn) is the sampled word
fingerprint, and pi and qi are the fingerprint data
points. To see whether two words are the same, the
distance between the sampled word and every word in
the database is computed, and the database word with
the minimum distance is considered to be the matching
word. The original Euclidean distance requires
squaring the difference between each pair of points.
In fixed point arithmetic this produces numbers large
enough to overflow the variables. A modified formula
was therefore used, neglecting the square and the
square root, which showed satisfactory results in
practice.
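The matching step can be sketched in a few lines of Python. This is an illustrative model only: the dictionary is assumed to be a mapping from word to fingerprint list, and the names fingerprint_distance and recognize are hypothetical.

```python
def fingerprint_distance(p, q):
    """Sum of absolute differences between two fingerprints."""
    if len(p) != len(q):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def recognize(sample, dictionary):
    """Return the dictionary word whose fingerprint is closest to sample."""
    return min(dictionary,
               key=lambda word: fingerprint_distance(dictionary[word], sample))
```

Because no squaring is involved, each term stays within the range of the data points themselves, which is what keeps the accumulated distance small in fixed point.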
1.2.3 ARC4 Cryptography
ARC4 is one of the most widely used software stream
ciphers in many encryption schemes, including WEP,
WPA, and SSL. The main factors in ARC4's success
over such a wide range of applications are its speed,
simplicity and efficiency in software and hardware.
3. DESIGN AND IMPLEMENTATION
3.1 HARDWARE ARCHITECTURE
Figure 3.1: Overall Hardware Architecture
[Block diagram. Control Module: an ATmega32 @ 16 MHz
(Speech Recognition and MMA7260Q Tilt Sensing, with
the accelerometer x, y, z outputs on the Port A ADC,
an LCD and LEDs) bridged to an ATmega16 @ 8 MHz
(nRF24L01+ wireless interface with ARC4 Cryptography,
nRF24L01 module on the Port B SPI). Robot Module: an
ATmega16 @ 8 MHz (nRF24L01+ with ARC4 and H-Bridge
interface, nRF24L01 module on the Port B SPI) driving
two motors through the L293D H-Bridge on PD0-PD3. The
modules are joined by a 2.4 GHz wireless link with a
2 bytes (control byte + count byte) payload.]

The system is divided into two broad subsystems: the
Control Module and the Multi-Robot Module. The Control
Module is further divided into two layers: the topmost
layer and the second layer.
3.1.1 Control Module
The topmost layer of the control module consists of
the ATmega32, which handles speech recognition,
MMA7260Q accelerometer sensing and output to the 16x2
text LCD. The 2nd layer consists of the ATmega16,
where the nRF24L01 wireless routine as well as
encryption and decryption with the ARC4 cipher are
implemented. The bridge protocol between the 1st and
2nd layers in the control module (Fig. 3.1 and 3.2) is
designed such that three output pins of PORTD of the
ATmega32, viz. PD2, PD3 and PD4, are connected to the
respective input pins of PORTA of the ATmega16, viz.
PA0, PA1 and PA2. When the 1st layer recognizes the
spoken word (front, back, left, right or stop), the
equivalent bit combination is applied to PORTA of the
2nd layer via these bridge lines. The 2nd layer then
sends out the corresponding control byte wirelessly
via the SPI port. When one of the robots receives this
control byte, it decodes it into the matching
differential drive motor combination that moves the
robot physically in the commanded direction.
FUNCTION   Equivalent received   PIN A (Connected to Layer 1)
           control byte          BINARY              HEX
                                 PA2   PA1   PA0
STOP       S                     0     0     0       00H
FRONT      F                     0     0     1       01H
BACK       B                     0     1     0       02H
LEFT       L                     0     1     1       03H
RIGHT      R                     1     0     0       04H
SPD_UP     U                     1     0     1       06H
SPD_DN     D                     1     1     1       07H
Table 3.1: Function control byte to be sent out via Wireless
(SPI port) and corresponding bit combination inputted to
the second layer of Control Module (PINA).
3.1.2 Robot Module
The Robot Module consists of two identical robots (A
and B) which can be positioned at different locations,
provided they are within the signal range of the
Control Module. Each robot consists of an ATmega16
with sensors that collect environmental data,
specifically an LM35 temperature sensor and a light
dependent resistor. A 2.4 GHz nRF24L01 wireless
transceiver is also available on board to receive
control data and to transmit the remote data for data
acquisition. With the L293D H-bridge driver, two
differential drive motors are controlled independently
so that the robot can navigate front, back, left or
right. The four input pins of the L293D, viz. IN1,
IN2, IN3 and IN4, are connected to four output pins of
PORTD of the ATmega16, viz. PD0, PD1, PD2 and PD3
respectively.
FUNCTION   Equivalent received   PORT D (Connected to H-bridge)
           control byte          BINARY                    HEX
                                 IN4   IN3   IN2   IN1
                                                   (LSB)
STOP       S                     0     0     0     0       00H
FRONT      F                     0     1     1     0       06H
BACK       B                     1     0     0     1       09H
LEFT       L                     0     0     1     1       03H
RIGHT      R                     1     1     0     0       0CH
SPD_UP     U
SPD_DN     D
Table 3.2: Function control byte received via Wireless
(SPI port) and corresponding bit combinations outputted to
H-bridge (PORTD).
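The decoding on the robot side amounts to a table lookup. Below is a minimal Python model of that lookup, using the IN4..IN1 bit patterns from Table 3.2; the function name decode_control_byte, the choice to preserve the upper nibble of PORTD, and the treatment of unknown bytes as STOP are assumptions made for illustration.

```python
# IN4 IN3 IN2 IN1 bit patterns from Table 3.2 (IN1 is the LSB, on PD0).
HBRIDGE_PATTERN = {
    "S": 0b0000,  # STOP
    "F": 0b0110,  # FRONT
    "B": 0b1001,  # BACK
    "L": 0b0011,  # LEFT
    "R": 0b1100,  # RIGHT
}


def decode_control_byte(control, portd=0x00):
    """Return the new PORTD value for a received control character.

    Only PD0..PD3 drive the H-bridge, so the upper nibble of portd is kept.
    Unknown control bytes fall back to STOP (an assumption of this sketch).
    """
    pattern = HBRIDGE_PATTERN.get(control, 0b0000)
    return (portd & 0xF0) | pattern
```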
3.2 SOFTWARE ARCHITECTURE
3.2.1 Input, Processing, Output
Figure 3.2: Input, Processing and Output block diagram for
speech recognition
[Block diagram: Speech, ADC, Band Pass Filters,
Generate Voice Fingerprints, Compare with Fingerprint
Templates, Control Signals, Output to the Robot]
The algorithm checks the ADC input for an audio signal
at a rate of 4 kHz. If the ADC value exceeds the
threshold value, it is taken as the start of a
half-second-long word. The sampled word passes through
8 band pass filters to be encoded into a fingerprint.
The words to be matched are stored as fingerprints in
a dictionary so that newly generated sampled
fingerprints can be compared with them later. The
modified Euclidean distance calculation finds the
fingerprint that is the closest match, and a control
signal is then sent to the robot to perform operations
like left, right, front, back and stop.
3.2.2 Initial-Threshold Calculation
The background sound at startup is taken as a base
value, which improves the accuracy of the speech
recognition. At startup, the algorithm reads the ADC
input using ATmega32 timer/counter 0 and accumulates
its value 256 times. Each ADC reading is interpreted
as a fixed point number scaled by 1/256, so
accumulating 256 readings gives the average ADC value
without a multiply or a divide. Three average values
are taken, each with a 16.4 ms delay between the
samples. The threshold value is then set to four times
the median of the three averages. The threshold value
is used to detect whether a word has been spoken or
not.
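The arithmetic behind the threshold can be sketched as follows. This is a host-side Python model under stated assumptions: read_adc is a hypothetical callable that returns one ADC reading, the 16.4 ms timing is not modeled, and the divide by 256 is written as a right shift by 8 bits, which is equivalent to keeping only the upper byte of the accumulator.

```python
def average_256(samples):
    """Average 256 readings by summing them and dropping the low 8 bits."""
    if len(samples) != 256:
        raise ValueError("exactly 256 samples are expected")
    return sum(samples) >> 8


def initial_threshold(read_adc):
    """Take three 256-sample averages and return four times their median."""
    averages = [average_256([read_adc() for _ in range(256)])
                for _ in range(3)]
    median = sorted(averages)[1]
    return 4 * median
```

Using the median of three averages rather than a single average makes the threshold less sensitive to one noisy burst during startup.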
3.2.3 Voice-fingerprint Generation
Figure 3.3: Filter implementation block diagram for the
generation of fingerprints
[Block diagram: the ADC feeds the band pass filters,
each filter feeds its own accumulator, and the
accumulators produce the voice fingerprint.]
The program considers a word detected when a sample
value from the ADC is greater than the threshold
value. Once a word has been detected, every ADC sample
is stored in an integer variable Ain and passed
through eight 4th-order band pass filters, for 2000
samples (half a second). The output of each filter is
squared and accumulated with the previous squares of
that filter's output. After 125 samples the
accumulated value is stored as a data point in the
fingerprint of that word. The accumulator is then
cleared and the process begins again. After 2000
samples, 16 points have been generated from each
filter, so every sampled word is divided into 16
parts. Our assembly language code is built around 8
filters; since each one gives an output of 16 data
points, every fingerprint is made up of 128 data
points.
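The accumulation scheme can be modeled as below. This Python sketch assumes each filter is a callable taking one sample and returning one output, and it stores the 8 accumulator values of each 125-sample window side by side; the actual ordering of the 128 points in the project's memory is not stated, so that ordering, like the name make_fingerprint, is an assumption.

```python
SAMPLES_PER_WORD = 2000  # half a second at 4 kHz
WINDOW = 125             # samples accumulated per data point
NUM_FILTERS = 8


def make_fingerprint(samples, filters):
    """Return 128 data points: 16 windows x 8 filters of summed squares."""
    if len(samples) != SAMPLES_PER_WORD or len(filters) != NUM_FILTERS:
        raise ValueError("expected 2000 samples and 8 filters")
    accumulators = [0] * NUM_FILTERS
    fingerprint = []
    for n, ain in enumerate(samples, start=1):
        for k, band_filter in enumerate(filters):
            out = band_filter(ain)
            accumulators[k] += out * out  # store intensity, not amplitude
        if n % WINDOW == 0:
            fingerprint.extend(accumulators)
            accumulators = [0] * NUM_FILTERS
    return fingerprint
```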
3.2.4 Filter Design and Implementation
Figure 3.4: Band pass Filter 200-400 Hz
Figure 3.5: Band pass Filter 1600-1800 Hz
3.2.5 Digital Filter Implementation
The 4th-order Chebyshev digital filter with a 40 dB
stop band was chosen for its very sharp transition
after the cutoff frequency. Most of the important
frequency content in speech is found within the first
2 kHz, as this range usually contains the first and
second speech formants. Thus 8 BPFs with frequencies
ranging from 0.2 to 1.8 kHz were designed, as shown in
the magnitude and phase plots. This also permitted
sampling at 4 kHz (to satisfy the Nyquist criterion
for voice frequencies up to 2 kHz) and left enough
time to implement 8 filters. For sufficient frequency
resolution to properly identify words, the bandwidth
of each filter is set to 200 Hz.
Each 4th-order filter is created in assembly code by
cascading two 2nd-order IIR filters whose coefficients
are generated using Matlab (Section 1.2.1). The
floating point coefficients are converted to fixed
point by multiplying them by 256 and rounding to the
nearest integer. Fixed point was used instead of
floating point (which would have been more accurate)
because floating point calculations on the ATmega32
are too slow to run all the filters at the 4 kHz
sample rate.
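The conversion to fixed point can be illustrated with a short Python sketch. The helper names to_fixed and fixed_mul are hypothetical, and the floating point inputs used in the usage note are illustrative values chosen to land on entries of the coefficient tables, not the authors' Matlab output.

```python
SCALE = 256  # 8 fractional bits


def to_fixed(coefficient):
    """Scale a floating point coefficient by 256 and round to an integer."""
    return int(round(coefficient * SCALE))


def fixed_mul(coeff_fixed, value):
    """Multiply an integer sample by a fixed point coefficient.

    The product carries 8 extra fractional bits, which the right shift
    removes. In Python the shift rounds toward negative infinity.
    """
    return (coeff_fixed * value) >> 8
```

For example, to_fixed(1.7617) gives 451 and to_fixed(-0.96875) gives -248, so fixed_mul(451, 100) approximates 1.7617 * 100 using integer operations only.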
The ATmega32 has only 2 KB of RAM, and a word sampled
at 4 kHz for half a second would require the entire
2 KB. To build the fingerprint of a word on the fly
instead, the ADC output has to pass through all the
filters in less than the ADC sample time of 250 µs.
The output of each filter is squared in order to store
the intensity of the sound rather than just the
amplitude. Since the lowest and highest frequencies
could be neglected without noticeable degradation in
the accuracy of speech recognition, and since the
memory and cycle time of the ATmega32 would not be
sufficient to implement all ten filters, 8 BPFs were
sufficient to compartmentalize the frequencies between
200 Hz and 1.8 kHz.
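The memory and timing budget follows from simple arithmetic, worked through below in Python. The 16 MHz clock is the ATmega32 clock shown in Figure 3.1; the assumption of one byte per raw sample and the even split of cycles across the filters are simplifications made for this sketch.

```python
F_CPU_HZ = 16_000_000     # ATmega32 clock from Figure 3.1
SAMPLE_RATE_HZ = 4_000
WORD_SECONDS = 0.5
NUM_FILTERS = 8
RAM_BYTES = 2 * 1024

samples_per_word = int(SAMPLE_RATE_HZ * WORD_SECONDS)    # 2000 samples
raw_word_bytes = samples_per_word * 1                    # 1 byte each (assumed)
sample_period_us = 1_000_000 // SAMPLE_RATE_HZ           # 250 us
cycles_per_sample = F_CPU_HZ // SAMPLE_RATE_HZ           # 4000 cycles
cycles_per_filter = cycles_per_sample // NUM_FILTERS     # 500 cycles
ram_left_bytes = RAM_BYTES - raw_word_bytes              # 48 bytes
```

Storing the raw word would leave almost no RAM for anything else, and each 4th-order filter has a budget of roughly 500 clock cycles per sample, which is why the fingerprint is computed sample by sample in fixed point.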
3.2.7.1 Chebyshev II filter coefficients
#                  Filter 1       Filter 2       Filter 3       Filter 4
f, kHz             0.2 – 0.4      0.4 – 0.6      0.6 – 0.8      0.8 – 1
1st 2nd-order      A112: 451      A212: 355      A312: 224      A412: 72
coeff.             A113: -248     A213: -248     A313: -248     A413: -248
                   B111: 21       B211: 27       B311: 31       B411: 34
                   B112: -32      B212: -29      B312: -15      B412: 4
                   B113: 21       B213: 27       B313: 31       B413: 34
2nd 2nd-order      A122: 458      A222: 366      A322: 239      A422: 88
coeff.             A123: -248     A223: -248     A323: -248     A423: -248
                   B121: 2225     B221: 1090     B321: 762      B421: 633
                   B122: -4285    B222: -1826    B322: -965     B422: -464
                   B123: 2225     B223: 1090     B323: 762      B423: 633
Gain               G1=80          G2=120         G3=140         G4=160
Table 3.3: MATLAB filter coefficients for Chebyshev II
(40 dB stop band) Filters 1-4

#                  Filter 5       Filter 6       Filter 7       Filter 8
f, kHz             1 – 1.2        1.2 – 1.4      1.4 – 1.6      1.6 – 1.8
1st 2nd-order      A512: -72      A622: -239     A712: -355     A812: -451
coeff.             A513: -248     A623: -248     A713: -248     A813: -248
                   B511: 34       B621: 762      B711: 27       B811: 21
                   B512: -4       B622: 965      B712: 29       B812: 32
                   B513: 34       B623: 762      B713: 27       B813: 21
2nd 2nd-order      A522: -88      A622: 458      A722: -366     A822: -458
coeff.             A523: -248     A723: -248     A723: -248     A823: -248
                   B521: 633      B721: 2225     B721: 1090     B821: 2225
                   B522: 464      B722: -4285    B722: 1826     B822: 4285
                   B523: 633      B723: 2225     B723: 1090     B823: 2225
Gain               G5=160         G6=140         G7=120         G8=80
Table 3.4: MATLAB filter coefficients for Chebyshev II
(40 dB stop band) Filters 5-8
3.2.6 Wireless Packet Format
The preamble byte, composed of alternating zeros and
ones, is sent first, followed by a five-byte address
field. The data payload of user-settable length (1-32
bytes) is sent next. Two versions of the payload were
implemented: a 2-byte payload, containing only the
encrypted byte and a count byte, was primarily used,
while an 18-byte payload version was designed for data
acquisition from the temperature and light sensors at
a remote location. The final part is the two-byte CRC.
3.2.7.2 Wireless Data Payload format
The data payload of the control module is of two
types, the transmitter mode payload and the receiver
mode payload, both having an 18-byte payload width.
The control module has to transmit data packets to the
individual robots and also receive sensor data from
the replying robots, so it has to hold the entire
payload of each robot (two in our case) for both
transmission and reception. Two 18-byte char arrays,
data_tx1 and data_tx2, store the transmitter mode
payloads, while two other arrays, data_rx1 and
data_rx2, are for the receiver. All of these payloads
have the size PAYLOAD_SIZE (18 bytes) defined in the
wireless routine of the ATmega16.

Figure 3.6: Transmitter Mode Payload
[Payload layout of data_tx1 and data_tx2, PAYLOAD_SIZE
(18 bytes): bytes 0 to 15 hold the data text
(data_text1, data_text2; 16 bytes), byte 16 holds the
control byte (data_control) and byte 17 holds the
packet count. The data text and the control byte form
the encrypted block.]
Of the three blocks, the first 16-byte block holds the
data text to be sent from the control module to the
robot modules. The text is entered in RealTerm on the
computer and sent to the MCU via UART, so that text
messages can be sent to the individual robots at
different locations. The control block is formed by
the 1-byte data_control, which stores one of the ASCII
characters 'F', 'B', 'L', 'R' and 'S', representing
the control information for front, back, left, right
and stop. When the targeted robot receives the control
information in the transmitter payload, it interprets
the ASCII control byte as the corresponding robot
movement command.
The ARC4 cipher is used to encrypt the control byte
and the data text block. Because ARC4 is a stream
cipher, the byte count must be kept up to date
(missing a packet will result in an incorrect
decryption from that point on), so a packet count byte
is added to each packet. This allows the local unit to
catch up to the correct byte in the PRGA (assuming the
targeted robot misses fewer than 256 packets in a
row).
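The transmitter mode payload can be assembled as in the following Python sketch, which models the layout only and leaves out the encryption step. The function name build_tx_payload and the choice of padding short text with spaces are assumptions of this sketch; the byte positions follow the description of the three blocks above.

```python
PAYLOAD_SIZE = 18
TEXT_SIZE = 16
VALID_CONTROLS = ("F", "B", "L", "R", "S")


def build_tx_payload(text, control, packet_count):
    """Return 18 bytes: 16 text bytes, 1 control byte, 1 packet count byte."""
    if control not in VALID_CONTROLS:
        raise ValueError("unknown control character")
    # Truncate or pad the text to exactly 16 bytes (space padding assumed).
    text_bytes = text.encode("ascii")[:TEXT_SIZE].ljust(TEXT_SIZE, b" ")
    # The count wraps at 256, matching a single count byte.
    payload = text_bytes + control.encode("ascii") + bytes([packet_count & 0xFF])
    assert len(payload) == PAYLOAD_SIZE
    return payload
```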
Figure 3.7: Receiver Mode Payload
[Payload layout, PAYLOAD_SIZE (18 bytes): bytes 0 to 1
hold the temperature (2 bytes), bytes 2 to 3 hold the
light reading (2 bytes) and bytes 4 to 15 are padding
(12 bytes); the remaining two bytes hold the speed
setting (1 byte) and the packet count (1 byte).]
The receiver mode payload is used to receive the
encrypted data payload sent by the individual robots,
decrypt the encrypted block by syncing with the help
of the packet count, separate the sensor readings of
temperature (2 bytes), light (2 bytes) and speed
setting (1 byte), and store them in their respective
variables for data logging. For data acquisition, the
payload is divided into temperature and light blocks.
The readings from the two sensors in each robot are
stored in their respective integer variables and sent
to the control module in packet format.
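Splitting an already decrypted receiver mode payload into its fields can be sketched with Python's struct module. Several details here are assumptions, since the text does not state them: temperature is placed in bytes 0 to 1 and light in bytes 2 to 3, the 2-byte readings are taken as little-endian, and the speed byte is placed before the packet count byte. The name parse_rx_payload is hypothetical.

```python
import struct

# 2-byte temperature, 2-byte light, 12 padding bytes, speed, packet count.
RX_LAYOUT = struct.Struct("<HH12xBB")


def parse_rx_payload(payload):
    """Unpack a decrypted 18-byte receiver mode payload into named fields."""
    temperature, light, speed, packet_count = RX_LAYOUT.unpack(payload)
    return {
        "temperature": temperature,
        "light": light,
        "speed": speed,
        "packet_count": packet_count,
    }
```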
3.2.7.3 Source and destination data pipe addressing
Using switches connected to PD4 and PD5 of the
ATmega16, the user can select whether the control byte
generated by the roboControl function is directed to
data_control1 or data_control2, which are concatenated
to the respective data packets for each robot. The
user can thus select the robot to which the current
command is directed. This technique enables
multi-robot control from a single control module.
To implement a minimalistic star network topology, the
receiving pipes of the control module, Robot1 and
Robot2 are 0, 1 and 2 respectively, and the
corresponding pipe addresses are E7:E7:E7:E7:E7,
C2:C2:C2:C2:C2 and C2:C2:C2:C2:C3. The remaining five
data pipes in each of the three modules are disabled,
to effectively block reception of packets whose
destination is elsewhere. Prior to transmitting a data
packet, the destination address should be set.
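The addressing scheme can be captured in a small lookup table. The Python sketch below is only a model: the names MODULES, rx_pipe_mask and destination_address are hypothetical, the mask mirrors the one-enable-bit-per-pipe idea of the nRF24L01 receive pipe configuration, and the byte order in which an address is written to the radio is not modeled.

```python
# Receive pipe number and 5-byte address of each module, as listed above.
MODULES = {
    "control": (0, bytes.fromhex("E7E7E7E7E7")),
    "robot1": (1, bytes.fromhex("C2C2C2C2C2")),
    "robot2": (2, bytes.fromhex("C2C2C2C2C3")),
}


def rx_pipe_mask(module):
    """Bit mask with only this module's receive pipe enabled (bit n = pipe n)."""
    pipe, _ = MODULES[module]
    return 1 << pipe


def destination_address(target):
    """Address to set as the transmit address before sending to target."""
    return MODULES[target][1]
```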
Figure 3.8: Minimalistic Star Network Topology for
establishing communication link between Control and
Robot Agent modules and their respective destination
multi-pipe addressing
3.2.7 ARC4 Cryptography
ARC4 generates a pseudorandom stream of bits
(keystream) which, for encryption, is combined with
the plaintext using bit-wise XOR; decryption is
performed in the same way (since XOR is a symmetric
operation). To generate the keystream, the cipher
makes use of a secret internal state which consists of
two parts:
• A permutation of all 256 possible bytes (denoted
"S").
• Two 8-bit index-pointers (denoted "i" and "j").
The permutation is initialized with a variable length
key, typically between 40 and 256 bits, using the
key-scheduling algorithm (KSA). After this, the stream
of bits is generated using the pseudo-random
generation algorithm (PRGA). The ARC4 cipher is
implemented in conjunction with the wireless routine
on the ATmega16 of both the control and robot modules.
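The KSA and PRGA steps can be sketched in Python as below. This follows the widely published description of the cipher rather than the project's firmware; the function names are hypothetical, and the skip helper assumes one keystream byte is consumed per packet, as in the 2-byte payload version.

```python
def ksa(key):
    """Key-scheduling algorithm: build the permutation S from the key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s


def prga(s):
    """Pseudo-random generation algorithm: yield keystream bytes forever."""
    i = j = 0
    while True:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) & 0xFF]


def arc4(key, data):
    """Encrypt or decrypt data (the same XOR operation does both)."""
    stream = prga(ksa(key))
    return bytes(byte ^ next(stream) for byte in data)


def skip(stream, missed):
    """Discard keystream bytes so a receiver catches up after missed packets."""
    for _ in range(missed):
        next(stream)
```

With the commonly cited test vector, arc4(b"Key", b"Plaintext").hex() gives "bbf316e8d940af0ad3". A receiver that missed two single-byte packets would call skip(stream, 2) before decrypting the third.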
3.2.8 MMA7260Q Tilt Sensing
Figure 3.9: Overall accelerometer tilt sensing algorithm
[Flowchart: initialize the origin and speed variables
for x, y and z to 0, configure ADC pins 3 to 5 and
initialize the LCD; sample the X, Y, Z values for the
origin into the xyzOrigin array; store the remaining
values into xyzADCArray; determine xSpeed, ySpeed and
zSpeed; decide the robot function; send the
appropriate control signal.]
The MMA7260Q has three sensor output pins, viz. X, Y
and Z, connected to three of the ADC inputs of the
ATmega32, viz. PA3, PA4 and PA5. The robot functions
(front, back, left and right) are controlled in either
Speech or Accelerometer mode. In the latter, the
tilt-sensing algorithm first samples the X, Y, Z
values for the origin into xyzOrigin, and then rapidly
stores the subsequent values into xyzADCArray. These
arrays are used by the three decision blocks to
determine the speeds in the individual directions. In
the speed and decision block, once the speed in either
the positive or the negative direction (depending on
the accelerometer orientation) is determined, the
algorithm decides whether the function to be
interpreted is front, back, left, right or stop. For
the movement data to be considered valid, the
calculated speed in either X or Y has to exceed a
predefined threshold. The decision interpreted by the
algorithm is sent to the roboControl function, which
ultimately conveys it to one of the robots.
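The decision step can be sketched as follows. This Python model makes several assumptions that the text leaves open: a positive X speed is mapped to front and a negative one to back, a positive Y speed is mapped to right and a negative one to left, and when both axes exceed the threshold the larger magnitude wins. The function name decide is hypothetical.

```python
def decide(x_sample, y_sample, x_origin, y_origin, threshold):
    """Return 'F', 'B', 'L', 'R' or 'S' from one tilt reading."""
    x_speed = x_sample - x_origin  # signed deviation from the origin
    y_speed = y_sample - y_origin
    if abs(x_speed) <= threshold and abs(y_speed) <= threshold:
        return "S"  # neither axis moved far enough to be valid
    if abs(x_speed) >= abs(y_speed):
        return "F" if x_speed > 0 else "B"
    return "R" if y_speed > 0 else "L"
```

Measuring relative to the origin sampled at startup means the controller can be held at any comfortable resting angle before tilting.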
Figure 3.10: Flowchart showing xSpeed determination and
decision making of robot functions (FRONT and BACK)
[Flowchart: if xADCArray is greater than xOrigin,
xSpeed = xADCArray - xOrigin (+ve speed value);
otherwise xSpeed = xOrigin - xADCArray (-ve speed
value). In each branch xSpeed is compared with the
threshold and the AXIS = 1 check is made, giving a
decision of FRONT, BACK, LT/RT or STOP. The decision
is sent to the roboControl function.]

4. RESULTS
4.1 Time domain waveform
The figures depict the time domain waveforms of the
spoken words, generated using Matlab. The spoken words
front, left and right each have a duration of
approximately 4 s. The word back has the shortest
duration, 2 s, and is therefore recognized with the
lowest accuracy of the five words, while stop has the
longest duration, 5 s, and the highest accuracy.
Figure 4.1: Time domain representation of Back
Figure 4.2: Time domain representation of Stop
4.2 Frequency domain waveform
These figures depict the spectral analysis (discrete
Fourier transform) of the sampled time domain data,
generated using Matlab.
Figure 4.3: FFT of the word Back
Figure 4.4: FFT of the word Stop
4.3 Dictionary data points for voice fingerprints
Table 4.1: Dictionary data points for the word FRONT
stored in the flash memory
#    Filter 1  Filter 2  Filter 3  Filter 4  Filter 5  Filter 6  Filter 7  Filter 8
1    731       177       3120      474       7662      1564      385       704
2    831       346       3704      1188      4377      789       183       764
3    723       307       4341      1966      3991      4137      306       796
4    2343      364       1001      539       2200      1752      171       950
5    4838      95        1957      167       1639      1311      553       2347
6    2514      59        5105      184       347       1629      163       1998
7    7815      10        288       78        561       52        3         489
8    1085      0         51        0         134       5         56        665
9    681       0         156       30        0         34        72        266
10   1025      0         31        52        23        68        123       379
11   707       0         0         30        20        123       68        137
12   1057      35        732       193       1309      728       219       138
13   625       4         175       0         874       343       196       729
14   309       0         4         0         0         120       42        944
15   172       0         0         0         0         77        41        1400
16   672       0         44        0         0         76        37        516
128 data points for each of the five words are logged
via RealTerm in similar manner during the training
stage and stored as dictionary in the flash memory.
4.4 Speech Recognition
Figure 4.5: Recognition Probability Comparison
[Bar chart of recognition probability, Number of
Testing = 20: Front 95%, Back 90%, Left 95%, Right
95%, Stop 100%]

The accuracy of the speech recognition was 90% or
above, which is within the acceptable range set by our
initial expectations of the system design. However,
given the basic speech algorithm, recognition is valid
only for the same person who performed the preliminary
voice training to initialize the dictionary
fingerprints. For convenience, the recorded voice of
the Oxford dictionary software, stored as a .wav file,
was played in relatively quiet surroundings.
4.5 Euclidean Distance Comparison
Figure 4.6: Euclidean Distance Comparison
The Euclidean distance comparison against all five
fingerprints already stored in the EEPROM was logged
over UART using RealTerm. As expected, the word was
recognized as the one with the least distance among
the five fingerprints.
4.6 Wireless Transmit and Receive
4.6.1 Correct ARC4 Key Encryption/Decryption
The data logged from RealTerm is presented below. It
shows ARC4 encryption and decryption with the correct
key. If the private key matches in both the control
and robot modules, as shown below, the encrypted data
is decrypted back to the original data, as the PRGA of
the robot agent updates 12 times to catch up with the
PRGA of the Control module.
CONTROL Initialized!
== Control Module ==
Private Key = SaGuN
- TX to Robot I -
Destination:
C2:C2:C2:C2:C2(Pipe1)
Original:
data_tx1[0]= S
data_tx1[1]=0
Encrypted:
data_tx1[0]= ‘
data_tx1[1]=0
Packet sent!
Current Sequence = 1
- TX to Robot I -
Destination:
Original:
data_tx1[0]= S
data_tx1[1]=1
Encrypted:
data_tx1[0]= ,
data_tx1[1]=1
Packet sent!

ROBOT Initialized!
== Robot Module I==
Private Key = SaGuN
-RX from Control-
Packet received!
Encrypted
data[0]= ‘
data[1]=0
No. of PRGA updates = 12 times
Decrypted
data[0]= S
data[1]=0
-RX from Control-
Packet received!
Encrypted
data[0]= ,
data[1]=1
Decrypted
data[0]= S
data[1]=1
4.6.2 Incorrect ARC4 Key Encryption/Decryption
If the private key is not matched between the two
modules then the encrypted data cannot be decrypted
back to its original data as shown below.
CONTROL Initialized!
= Control Module =
Private Key= VoCoRoBo
- TX to Robot II -
Destination:
Original
data_tx1[0]= S
data_tx1[1]=0
Encrypted
data_tx1[0]= j
data_tx1[1]=0
Packet sent!
- TX to Robot II -
Destination:
Original
data_tx1[0]= S
data_tx1[1]=1
Encrypted
data_tx1[0]= D
data_tx1[1]=1
Packet sent!
ROBOT Initialized!
=Robot Module II=
Private Key = SaGuN
- RX from Control-
Packet received!
Encrypted
data[0]= j
data[1]=0
No. of PRGA updates = 7 times
Decrypted
data[0]= ƒ
data[1]=0
- RX from Control-
Packet received!
Encrypted
data[0]= D
data[1]=1
Decrypted
data[0]= ~
data[1]=1
5. CONCLUSION
This project implements real-time speech recognition
using DSP algorithms such as Chebyshev IIR filters,
accelerometer-based tilt sensing and a short-range
secure wireless link using the ARC4 cipher, all on
ubiquitous low-cost 8-bit microcontrollers. With a
speech recognition accuracy of 90% or above, it shows
that the system is feasible for low-cost real-time
applications. It was observed that words with greater
pronunciation stress were recognized better. Although
for now the recognition is accurate only for the same
person who trained the system, it can be expanded to
make the system speaker independent through further
research on the storing and retrieval of the voice
fingerprint from a different medium. A multi-channel
wireless link with ARC4 was also successfully
implemented to exchange control and sensor data. As
the nRF24L01 is capable of higher speed data
transmission, the system can also be expanded to
incorporate other sensors, such as audio or video
sensors, for richer data acquisition.
7. PICTURES
Figure 7.1: Overall System
Figure 7.2: Schematic Diagram of Control Module
Figure 7.3: Schematic Diagram of a single Robot Module
