University of Augsburg
Faculty of Applied Computer Science
Department of Computer Science
Bachelor’s Program in Computer Science
Bachelor’s Thesis
Full Body Immersion in Augmented Reality
using Motion Capturing suit and video see-through
technologies.
submitted by
Ali Said
on 31.07.2014
Supervisor:
Prof. Dr. Elisabeth André
Adviser:
Ionut Damian
Reviewers:
Prof. Dr. Elisabeth André
Abstract
The purpose of this study is to immerse the user in augmented reality. We
aim at achieving realistic interactions between the user and the augmented
reality scene; to that end we provide an augmented environment fitted with
several virtual characters. The user controls the characters within the scene
using body movements and gestures to experience concepts such as natural
interaction and action mirroring. This is accomplished using the x-sens
full body motion capture suit and the Oculus Rift head-mounted display,
along with two web cameras to capture the real environment. The
virtual environment is generated using the Unity game engine with C#.
We describe an evaluation study of the showcase application featuring an
interactive scenario with a virtual agent, where several subjects were fitted
with the x-sens suit and the Oculus Rift display to evaluate the system and
give feedback via survey questionnaires. We conclude the study with the
results and findings of the evaluation survey and give recommendations for
future work.
Acknowledgments
I would like to express my gratitude to my advisor Ionut Damian for the con-
tinuous support of my research, for his patience, motivation and knowledge.
I would also like to thank Prof. Dr. Elisabeth André for her encouragement
and inspiration.
I would like to thank my family and my friends, for the stimulating
discussions and the constant help.
Statement and Declaration of
Consent
Statement
Hereby I confirm that this thesis is my own work and that I have docu-
mented all sources used.
Ali Said
Augsburg, 31.07.2014
Declaration of Consent
Herewith I agree that my thesis will be made available through the library
of the Computer Science Department.
Ali Said
Augsburg, 31.07.2014
Chapter 1
Introduction
The field of augmented reality has existed for over a decade and has been
growing steadily ever since. The basic goal of most augmented reality
applications is to enhance the user's perception of the real world and introduce
new interactions using virtual objects that co-exist in the same space as
the user. While the definition of augmented reality broadens beyond visual
augmentation, this thesis is mainly concerned with optical augmented reality.
[1]
Virtual reality environments supply the user with a surrounding virtual
scene in which the user's real position, orientation and motion are occluded
and thus not perceived by the user. Augmented reality is the middle ground
between the real world and the virtual environment: the real world
is still the premise of the application, but it is populated by virtual objects.
The user perceives the real world either via projection displays, where the
virtual information is projected directly onto the physical objects that are
to be augmented, or via see-through displays, where the user wears a
head-mounted display that renders virtual data on top of a live feed of the real
world. This study uses video see-through via a head-worn display.
To integrate 3D virtual objects into the real environment in real
time, either optical see-through displays or video see-through displays can
be used. Virtual objects in conventional optical see-through displays do not
occlude real objects in the background but rather appear as low-opacity
images on top, and control over the field of view is limited. Video
see-through, however, can implement the designed scene completely, since the
user's whole field of view is computer generated rather than projected. To
supply the video see-through system with live data at low latency, the
stream of video input has to be fed to the head-mounted display with
minimal processing to avoid stuttering or delays. To simulate realistic and
intuitive perception of the world, this study uses two web cameras placed
over the display to simulate the eyes. The system has to ensure that the
set-up is parallax free, meaning that the user's eyes and the cameras share
the same path in terms of camera alignment, field of view and angle. [2]
The user needs to be placed as a first-person viewer of the augmented
world, and thus arises the need to track the user's body, orientation and
location. User tracking is crucial for augmented reality registration.
AR systems typically use tracking techniques such as video tracking,
magnetic tracking or markers. Video tracking involves processing the video
of the scene using computer vision algorithms, which can lead to significant
delay. Magnetic tracking requires pre-building the virtual environment
on top of a known physical location (typically indoors), which allows
designing all objects of interest within the given area for augmented
interactions. Using markers demands modifying the real environment with
colored fiducial markers and sensors at known locations, then monitoring
the user's interactions with mounted video cameras that track the markers.
This study introduces another technique to track the user: full body
tracking via a motion capture suit that allows head orientation and body
movement tracking as well as limb location and gesture recognition. As a
result, the location of most virtual objects was designed to be bound to the
user, rendering the need to set up a controlled environment unnecessary. [3]
1.1 Motivation
Augmented reality enhances the user's perception of and interaction with
the real world, allowing users to perform tasks that may not be available
to their senses otherwise. This thesis introduces the combination of several
recent augmented reality techniques and showcases the result in an
interactive application. The methods used are full body tracking and video
see-through augmentation. Positive results can be a step forward towards
various applications such as simulation and entertainment. Simulation
applications can vary from medical experiment simulations to full body
applications such as dancing simulations; this can be used widely as a
teaching methodology, with the teacher manifested as a virtual character
within the same environment as the user. Entertainment applications may
include immersion in gaming, where the user can exist within the game
environment with his real body occluded by armor and certain gestures
used to fire weapons or execute commands. With full body tracking
technology deployed, the possibilities remain vast.
1.2 Material and Methods
To meet the specifications of the application, a certain standard of hardware
was required for capturing and previewing results. For video see-through
augmentation, the selected hardware was the Oculus Rift, a head-mounted
display screen that shows a computer generated video stream. While the
Rift itself provides head tracking data, this was not used within the project,
for better synchronization. The motion detection hardware chosen for user
tracking was the x-sens full body motion tracking suit; the x-sens MVN
suit is fitted with sensors and outputs a skeleton model of the user's body.
To capture the real world and stream it to the Oculus Rift, two Logitech
C920 web cameras were used for their field of view and low-latency
capturing, and because they match the Rift's display: the cameras do not
need to match human eye standards, only the Rift's field of view and
output quality. The supporting software selected was the Unity game
engine, as a medium to integrate all the hardware over their respective software.
1.3 Objectives
The aim of the project is to create an augmented reality system in which
the user's whole physical body is immersed in the augmented environment.
The application is situated in the real world, the point of view is completely
determined by the user's body location and orientation, and the virtual
information is relayed by a head-worn video see-through display. The user
is mobile and able to move freely, and no extra equipment or devices are
needed for the virtual interface.
1.4 Outline
This thesis is divided into five chapters. The first chapter explains the
needed theoretical background, followed by the implementation chapter
covering both software and hardware integration. The third chapter is
concerned with the created augmented environment. The fourth chapter
discusses the evaluation process and the findings. The final chapter contains
the summary, conclusion and future work recommendations. The flow of the
thesis follows the actual course of work taken during the study.
Chapter 2
Theoretical Background
2.1 Augmented Reality
By definition, augmented reality is a variation of virtual reality in which the
user isn't fully immersed in the virtual world; rather, virtual objects
are superimposed on the real world. The ultimate goal of augmented
reality applications is to surround the user's real world with virtual objects.
Thus all augmented reality systems need real-time rendering methods for
generating the virtual scene elements so that they can overlay the real
objects. Augmented reality systems typically face several problems and
challenges, ranging from selecting the blending approach to handling
registration and sensing errors.
2.1.1 Augmentation Methods
The main question when designing an augmented reality system is the
method of augmentation: how to merge the virtual and real worlds. The
basic choices are optical and video technologies. Optical see-through
head-mounted displays work by placing partially transmissive optical combiners
in front of the user's eyes so that vision of the real world isn't occluded;
the combiners are also partially reflective, so that the user can see virtual
images bounced off them. On the other hand, video see-through displays are
closed-view displays mounted with cameras. The video cameras provide the
user with vision of the real world, and the video is combined with the virtual
images generated by the software to achieve the blending. The selected
method was video see-through, since it allows better control over the user's
view: both the virtual and real images are available for editing in digital
form, whereas the optical method gains control over only the virtual side of
the scene, and problems such as synchronization and obscuring issues can arise.
Figure 2.1: User wearing the proposed AR set-up consisting of the motion
capture suit and video see-through head display
2.1.2 Registration
For the illusion that the virtual and real worlds are merged to hold, the
objects in both worlds have to be properly aligned. Registration addresses
this problem of proper alignment; registration errors can cause several
problems, ranging from failure to achieve the task at hand in the augmented
application to motion sickness. Registration errors are difficult to solve due
to the numerous sources of error. Static errors, such as optical distortion
and tracking errors, arise even when the user and the objects in the
environment are still. Dynamic errors only arise when objects or the user
start moving and are usually concerned with the system's end-to-end delay.
Accurate positioning and registration require exact tracking of the user.
Environment sensing is a tracking method where the positions of all objects
of interest in the environment are tracked using cameras and fiducial
markers; computer vision techniques can be used to identify the markers
accurately. Another method is user body tracking, where either a part of
the user, such as the head and hands, or the full body can be tracked using
a full body motion detection suit; the latter method is used in this study.
[7]
2.2 Oculus Rift
One of the most conventional virtual reality device set-ups, and thus
augmented reality set-ups, is the head-mounted display: a device worn over
the head covering the eyes. There are several commercially produced
displays that support stereoscopic display and tracking systems. There is
one display screen per eye, and thus stereoscopic vision is achieved by
creating two virtual cameras in the software, one for each display.
The Oculus Rift is a headset built for gaming. It provides a large field
of view of 110 degrees, stereoscopic vision and head tracking. Although the
Rift's head tracking wasn't used in this study, due to the reliance on the
x-sens MVN body tracking, the Rift's light weight of 379 g ensured comfort
and mobility for the user. Also, the frontmost surface of the Rift is only 5
cm in front of the user's eyes, meaning that mounting cameras on top will
only yield a small offset. The Rift creates two virtual cameras that stream
the virtual environment to the display screens on scene render; the software
applies shaders and filters both pre-rendering and post-rendering to ensure
lens corrections in color and curvature, allowing for customization of
the execution pipeline. [4]
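The per-eye camera arrangement can be sketched in Unity as follows. This is a minimal illustration, not the Oculus plug-in's actual API: the class and method names are invented for this sketch, the inter-pupillary distance of 6.4 cm is an assumed typical value, and the real plug-in additionally applies the lens-correction shaders.

```csharp
using UnityEngine;

// Sketch: one Unity camera per eye, separated by an assumed
// inter-pupillary distance and rendering to the left/right half
// of a side-by-side stereo display.
public class StereoRigSketch : MonoBehaviour
{
    const float ipd = 0.064f; // assumed inter-pupillary distance in meters

    void Start()
    {
        CreateEye("LeftEye",  -ipd / 2f, new Rect(0f,   0f, 0.5f, 1f));
        CreateEye("RightEye",  ipd / 2f, new Rect(0.5f, 0f, 0.5f, 1f));
    }

    void CreateEye(string name, float xOffset, Rect viewport)
    {
        var eye = new GameObject(name).AddComponent<Camera>();
        eye.transform.SetParent(transform, false);
        eye.transform.localPosition = new Vector3(xOffset, 0f, 0f);
        eye.fieldOfView = 110f;   // matches the Rift's stated field of view
        eye.rect = viewport;      // side-by-side stereo layout
    }
}
```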
2.3 X-sens
The usual approach to data input in augmented reality systems focuses on
motion recognition. A device that can read and process natural movements
is a very intuitive method of human-computer interaction. One of the
biggest challenges when developing an AR system is body registration, that
is, the geometric alignment of the virtual and real bodies' coordinate systems.
The selected solution was the x-sens MVN motion capture suit for
full-body human motion capture. It is based on miniature inertial sensors,
biomechanical models and sensor fusion algorithms. The use of the MVN suit
eliminates the need for markers and external cameras, it can be used both
outdoors and indoors, and it can capture any type of body movement, such
as running or jumping. The user wears the suit fixed with mechanical
trackers utilizing fixed or flexible goniometers, angle-measuring devices that
provide joint angle data to kinematic algorithms. Accurate orientation
estimation requires the use of signals from gyroscopes, accelerometers and
magnetometers: accelerometers provide the vertical direction by sensing
acceleration due to gravity, while magnetometers provide horizontal stability
by sensing the direction of the earth's magnetic field.
Figure 2.2: The tracked skeleton by the motion capture suit used within
unity over a virtual model

The suit consists of 17 MTx inertial and magnetic 3D sensors connected
to 2 Xbus masters, which synchronize and power the sensors as well as
handle wireless streaming to the computer, where the stream is played out
by the MVN Studio software. MVN Studio is a graphical application that
allows the user to observe a real-time stream of the subject or play back
previously recorded tracking. MVN Studio has an output stream that can
be directed over the network to a given port, thus allowing the tracked
skeleton to be used on another computer for a real-time application. [6]
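The receiving side of such a network stream can be sketched generically in C#. The port number below is an assumption and the datagram layout is defined by x-sens; a real application would parse the packets according to the MVN network protocol documentation or rely on the Unity plug-in.

```csharp
using System.Net;
using System.Net.Sockets;

// Sketch of a generic receiver for a skeleton stream sent over UDP,
// such as the one MVN Studio directs to a given port. Port and packet
// layout are placeholders, not the documented MVN protocol.
public class SkeletonStreamReceiver
{
    readonly UdpClient client;

    public SkeletonStreamReceiver(int port)
    {
        client = new UdpClient(port); // e.g. the port configured in MVN Studio
    }

    public byte[] ReceiveFrame()
    {
        var remote = new IPEndPoint(IPAddress.Any, 0);
        return client.Receive(ref remote); // blocks until one datagram arrives
    }
}
```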
Chapter 3
Implementation
3.1 Software Integration
3.1.1 Engine Selection
Several engine options were explored initially before settling on Unity, but
each had its own problems in the form of compatibility and implementation
challenges. Unity was found to be the most suitable option for creating the
virtual environment. Unity is a portable program available on most
platforms, allowing for a range of choices for future applications. Unity also
has extensive physics engine capabilities and a very flexible execution
pipeline that allows control over the rendering process at various stages.
Most importantly, Unity supports both the x-sens full body motion tracking
suit and the Oculus Rift head display via plug-ins, along with several
available plug-ins for virtual models and animations, which were used within
the showcase to simulate most of the interactive tasks.
3.1.2 Character Design
X-sens provides its own Unity plug-in that links to the x-sens motion
tracking software, MVN Studio. The plug-in grants access to the MVN
motion capture data stream, allowing real-time viewing and manipulation of
the tracked skeleton. Oculus Rift also provides a Unity plug-in, which
supplies a virtual set of OVR cameras that capture the virtual environment
normally; the OVR cameras then stream the captured scene to the Oculus
Rift head display screens, populating the user's field of view with the
captured virtual content.
The two plug-ins were combined in a humanoid model: the MVN skeleton
was bound to our virtual model's body overlay so that they share the same
movements and gestures, and the virtual OVR cameras were installed on
the head of the virtual character to represent the eyes. While the Oculus
Rift plug-in includes head-tracking capabilities, we used the x-sens skeleton
for all required motion tracking, for clearer and more synchronized data.
This set-up produced a humanoid virtual model that mirrors the user
completely, creating a copy of the user in the virtual world. Since the real
Oculus Rift is mounted on the user's head and the virtual Oculus Rift is
mounted on the model's head, and the model mimics the user's movements
in the real world, the model effectively occludes the user's body within the
application, meaning that the user sees the virtual model's hand over his
real hand when looking down. This allows gesture recognition to trigger
on-body effects, such as a control pad mounted on the user's hands used to
control the environment.
3.2 Hardware Modifications
3.2.1 Camera Setup
Figure 3.1: The camera set-up installation over the Oculus Rift display
Augmented reality requires the blending of the real and augmented worlds:
cameras need to capture the world from the user's perspective so that the
real-time video can be processed, augmented with the graphical content and
viewed by the user. The camera requirements should not match human
vision but rather the head display screen.
There are several key requirements to match between the Oculus Rift
and the cameras:
1. 800x600 pixel resolution per camera to match the Rift's per-eye resolution.
2. 1.33:1 sensor aspect ratio, since the Rift uses side-by-side stereo with a
1.33:1 vertical-to-horizontal ratio, which means the cameras have to be
mounted in portrait mode.
3. 120° field of view lens to match the Rift's.
4. 60 Hz capture rate to match the Rift's refresh rate.
The selected cameras were Logitech C920 web cameras, slightly modified
to match the stated requirements. Wrongly mounted cameras can lead to
several problems such as convergence issues and motion sickness. The
cameras had to be mounted in a way that ensures comfort and smooth
precision, but more importantly intuitively, such that eye convergence to
focus and eye rotation in the virtual plane match the real process. [5]
3.2.2 Hardware Adjustments
The main problem was achieving the required degree of precision. The first
components were 3D-printed arms that can slide onto the front of the Rift,
making the cameras adjustable horizontally. The cameras then needed to
be mounted on top of the 3D-printed arms. There are two main ways to
mount a stereo camera rig: parallel or toed-in. Designing a parallel camera
set-up requires physically modifying the set-up by horizontally shifting the
lenses. Toed-in cameras are rotated inwards such that their optical axes
intersect at a given point midway into the scene rather than at infinity.
While each set-up has its own problems, the selected method was toeing in
the cameras, for simplicity. To achieve that, further 3D pads were printed
and the cameras were mounted parallel on top of the pads; the pads were
then installed on the 3D-printed arms such that they create a horizontally
adjustable angle with the arms. With both opposite pads rotated towards
the center, they create a set-up with a focus point 1 meter in front of the
user's eyes.
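The geometry behind the toed-in set-up can be worked out directly: each camera is rotated inwards by atan((baseline / 2) / convergence distance). The 1 m convergence distance is the one used in this set-up; the camera baseline below is an assumed value, not a measurement from the rig.

```csharp
using System;

// Sketch of the toe-in angle computation for a converging stereo rig.
class ToeInAngle
{
    static void Main()
    {
        double baseline = 0.064;  // assumed camera separation in meters
        double convergence = 1.0; // focus point 1 m in front of the eyes
        double angleDeg = Math.Atan((baseline / 2) / convergence)
                          * 180.0 / Math.PI;
        Console.WriteLine($"toe-in per camera: {angleDeg:F2} degrees");
    }
}
```

With these values each camera is rotated inwards by roughly 1.8 degrees, which illustrates why the adjustable pads only needed a small, precise rotation.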
3.2.3 Software Implementation
The game engine Unity’s offering method when dealing with live input from
webcameras is a texturing technique called WebcamTexture, where unity
can list all connected camera devices and play a selected stream as an output
to a texture. A texture is a render of an image that can be applied over
a given surface. The initial approach was applying the WebcamTexture
to a plane that lies directly in front of the Oculus rift’s virtual cameras
(OVRcamera) thus populating it’s field of view and restricting it to the
texture. A problem arose because this technique meant that the surface
holding the texture had to be bound to the Oculus’s virtual camera, making
the distance at which this surface is present and the movements a dynamic
variable which needs adjustment per scene. The initial approach intended
to leave the execution pipeline of the Oculus unity plug-in intact, since that
introduced other problems another solution had to be found.
Unity also offers a view stream within the execution pipeline, a stack
which can hold different images per frame and composite them all together
to create the actual scene frame. The second approach involved
manipulating the OVRCamera rendering pipeline such that the WebCamTexture is
applied pre-rendering into the OVRCamera view stream; on rendering, the
OVRCamera simply views the virtual scene created by Unity in front of it
and adds it to the view stream. This effectively makes the WebCamTexture
the background of the scene and the virtual objects the foreground of that
same scene, which simulates augmentation. Several transformations had to
be applied to the WebCamTexture so that it fits appropriately into the
stream and is dynamically adjustable via custom shaders.
Algorithm 3.1 Virtual OVRCamera integration algorithm with the real
Logitech cameras into the Unity execution pipeline.

/* Code in parent class OVRCameraController */
// getting all connected webcam devices
WebCamDevice[] devices = WebCamTexture.devices;
// initiating devices with dimensions and frame rate
WebCamTexture left  = new WebCamTexture(devices[0].name, 1280, 720, 30);
WebCamTexture right = new WebCamTexture(devices[1].name, 1280, 720, 30);
// passing the WebCamTextures to the children
CameraRight.GetComponent<OVRCamera>().SetWebcam(right);
CameraLeft.GetComponent<OVRCamera>().SetWebcam(left);

/* Code in child class OVRCamera */
// clearing the execution pipeline
GL.Clear(true, true, camera.backgroundColor);
// start recording from the real camera
logitechCameraTexture.Play();
// applying the appropriate shaders to the texture material
rotation = GetComponent<OVRLensCorrection>().GetRotMaterial(yAdd, xAdd, Zoom);
// merging the WebCamTexture with the virtual CameraTexture that
// views the virtual scene
Graphics.Blit(logitechCameraTexture, CameraTexture, rotation);
Chapter 4
Augmented Environment
4.1 The Scene
Figure 4.1: An overview of the virtual scene populated by the virtual
characters and models within unity
The augmented environment is populated with several virtual characters
and virtual interactions. The main control panel is mounted on the user's
body, and thus a virtual model of the user's physical body had to be
designed. The user's model is a one-to-one replica of the actual user, taking
its skeleton animation as a stream via the x-sens full body motion tracking
suit. Two extra versions of the same model are duplicated, each with a 90
degree rotation so as to look sideways from the main model's perspective,
to simulate real-time one-to-one mirroring of the user. Directly in front of
the user is a virtual agent acting as a guide within the augmented premise,
guiding the user through the available options and tasks. Also present
are two virtual teddy bears designed to simulate automated behavior; the
teddy bears are programmed to follow the user in the augmented
environment as he moves around. Finally there are two virtual characters to be
controlled: the first is a virtual character that can either jump or wave
based on user input, the second is a virtual auto-bot that moves around in
a square per user command. All the virtual characters are controlled via
the main control panel projected over the user's left wrist and accessible
via the user's right hand.
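The teddy bears' follow behavior described above can be sketched as a simple per-frame script. This is an illustrative reconstruction, not the project's actual code; speed and stopping distance are assumed values.

```csharp
using UnityEngine;

// Sketch: each frame the teddy bear moves towards the user's tracked
// position but stops at a comfortable distance.
public class FollowUser : MonoBehaviour
{
    public Transform user;           // root of the tracked user model
    public float speed = 1.5f;       // meters per second (assumed)
    public float stopDistance = 1f;  // keep this far from the user (assumed)

    void Update()
    {
        Vector3 toUser = user.position - transform.position;
        toUser.y = 0f; // stay on the ground plane
        if (toUser.magnitude > stopDistance)
        {
            transform.position += toUser.normalized * speed * Time.deltaTime;
            transform.rotation = Quaternion.LookRotation(toUser);
        }
    }
}
```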
4.2 Interactions
4.2.1 Actions
Figure 4.2: The home pad of the main control panel used to manipulate
the virtual objects
The main control panel is the basic method for manipulating all the events
and characters in the augmented scene. To enable the panel, the user has
to touch his left wrist with his right hand; this pops up the control panel,
which provides several options as multi-level displays. The base display
simply offers navigation to the three other displays or closing the panel.
The first display provides control over the first virtual character, where the
user can either make the character jump or wave. The second display grants
control over the virtual auto-bot, to either toggle its visibility in the virtual
scene or trigger its square-based movement. The third display offers the
user the option to change his virtual model's outfit, effectively changing
what the user seems to be wearing. It also allows enabling the virtual teddy
bears to either follow the user or stand still.
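The multi-level display logic amounts to a small state machine: a base display that routes to three sub-displays, each mapping a touched button to an action. The sketch below is a hypothetical reconstruction with invented names, not the project's code.

```csharp
using UnityEngine;

// Sketch of the control panel as a state machine over its displays.
public class ControlPanelSketch : MonoBehaviour
{
    enum Display { Hidden, Base, Character, AutoBot, Outfit }
    Display current = Display.Hidden;

    // touching the left wrist toggles the panel on and off
    public void OnWristTouched()
    {
        current = (current == Display.Hidden) ? Display.Base : Display.Hidden;
    }

    // button indices 0..2 on the base display navigate to the sub-displays
    public void OnButton(int button)
    {
        switch (current)
        {
            case Display.Base:
                current = (Display)(button + 2); // Character, AutoBot or Outfit
                break;
            case Display.Character:
                // button 0: jump, button 1: wave (dispatch to the character)
                break;
            case Display.AutoBot:
                // button 0: toggle visibility, button 1: square movement
                break;
            case Display.Outfit:
                // button selects an outfit or toggles the teddy bears
                break;
        }
    }
}
```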
4.2.2 The Task
The user is expected to complete a virtual task that he is guided through
by the virtual agent. After briefly introducing the virtual environment and
characters, the agent asks the user to enable the main control panel and
try manipulating the first virtual character, then suggests changing his
virtual outfit. After that the user is encouraged to experiment with all the
available options. Last, the user should complete the interactive task of
kicking a virtual ball towards the agent; upon hitting the agent the task is
completed and the user is free to further explore the environment.
Chapter 5
Evaluation
5.1 The Survey
The survey questionnaire included items that evaluate the realism and
effect of augmented immersion, as well as believability and spatial presence.
The survey subjects were selected randomly from diverse backgrounds. The
subjects were introduced to the augmented reality application and expected
to follow the descriptions and instructions provided by the virtual guide to
accomplish a given task. The survey questions focused on how believable
the virtual environment was and how the interactions and the system's
perception of movement felt. It also asked about the effect on the experience
of the virtual agent guiding the user. The experiment procedure was
explained to the users beforehand; a vaguer description of the task and the
environment was also given. The task, as presented by the virtual guide,
required the test subjects to explore the environment by following the given
description; events marked by the agent would trigger the state advancement
of the task. The final state required the user to kick a virtual ball at the
agent; upon succeeding, the virtual guide goes idle so as to leave the subject
to explore the surrounding environment freely. Answers to the questions
are provided on a 5-point scale ranging from strongly disagree to strongly
agree.
5.2 The Results
Several conclusions can be derived from the survey. The majority of the
subjects agreed that the experience felt realistic and that they achieved
immersion in the virtual environment with all the characters, which suggests
that the experience set-up was successful and can be used in future
applications. However, the virtual guide's helpfulness was not captivating
enough: subjects stated that the guide was distracting from the task,
showing the need for better human-computer interaction to allow for better
human simulation by the program. The subjects agreed that the character
animations were normal and that the control panel interactions were
natural; they also agreed that there was neither a delay nor a premature
response in the virtual interactions, thanks to the optimization of the
pipeline to achieve smoother execution. Despite a few comments on the head
display's pixelation, most users agreed that the display's quality and
consistency were acceptable; however, room for improvement in the form of
better cameras and higher quality displays still stands. The extensive
amount of wiring made movement limited, but the subjects agreed that the
hardware was comfortable.
The main point of disagreement was the body movement tracking. The
task of kicking the ball proved challenging because the virtual body,
responsible for all interactions in the real world, didn't align perfectly with the
real body. While this didn't introduce difficulty when interacting with the
control panel using the virtual hand, kicking the ball with the virtual feet
proved very difficult. The subjects commented that the difference between
the virtual and real bodies was the main reason for failure to complete the
task, and that the model needed better alignment. Some users also
commented on the environment, since the experiment was carried out in a
closed room with no fixed settings, making some virtual characters pass
through walls or real objects. The survey conclusion shows that while better
equipment and programming would prove helpful, the main point that
requires perfecting is alignment: full body tracking aligned the user correctly,
but video see-through technology introduced other problems such as
scaling, displacement and the need for better limb positioning.
The proposed solution is to design real body measurements into the
virtual model so that the virtual and real bodies share the same height and
dimensions, and also to position the virtual replicas of the head display
cameras according to the real ones, since the real cameras are offset from
the user's actual eyes by 4 centimeters that should be accounted for,
making the whole virtual set-up an exact replication of the user in real
life, including all the mounted hardware. Another conclusion is that
pre-designing the virtual environment into the real one seemed crucial. To
solve the environment merging problem without using markers or designing
a fixed environment, depth mapping technology can be implemented, using
image processing algorithms for better efficiency and minimal delay, to
build a depth map of the real world. A depth map is a 3D map that
represents the position of each object by the intensity of that object in a
video stream, usually captured by a depth camera. The depth map can be
used to create a 3D model of the view in front of it, efficiently turning the
captured real terrain into a virtual model, so that collisions and interactions
between the real and the virtual can be easily handled.
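The first part of the proposed fix, matching the virtual body to the user's measurements, can be sketched as a simple calibration step. The names and the designed model height are assumptions for illustration; the real MVN workflow also takes the subject's body dimensions into account on its own side.

```csharp
using UnityEngine;

// Sketch: scale the virtual model so that it shares the user's measured
// height, bringing the virtual limbs closer to the real ones.
public class AvatarCalibration : MonoBehaviour
{
    public Transform avatar;
    public float avatarHeight = 1.80f; // model height as designed, meters (assumed)

    public void Calibrate(float userHeightMeters)
    {
        float scale = userHeightMeters / avatarHeight;
        avatar.localScale = Vector3.one * scale;
    }
}
```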
Chapter 6
Summary
Augmented reality concerns merging the real and virtual worlds to
enhance the user's perception of the surrounding environment. This thesis
introduces an approach to augmented reality using full body tracking
and a video see-through head-mounted display. The selected tracking
hardware was the x-sens motion capture suit, and the selected head-mounted
display was the Oculus Rift; the hardware integration took place in the
Unity game engine. A virtual model was designed to reflect the user's body
in the virtual environment, matching the user's motion, and was fitted
with the Rift's camera controller to act as a point of view into the virtual
world. Merging the virtual and real worlds required capturing the real view
in front of the user and playing it back to him via the Rift: two cameras
were fitted on top of the Rift and their output was used as the background
video for the virtual scene.
To evaluate the system, a showcase application was designed in which the
user had several virtual characters to interact with and a virtual guide to
explain the surroundings and clarify the required task. Several users took
part as test subjects and were presented with a questionnaire to evaluate
the experience. The main conclusion is that an exact replication of the
user's real body in the virtual environment is crucial, such that
the user’s real and virtual bodies completely align. Since the virtual limps
were responsible for all interactions within the augmented reality, any slight
mis calibration or noticeable difference would cause confusion and failure
to complete the task by the user. Another conclusion is the need of control-
ling the surrounding environment so that the virtual characters don’t walk
through real walls or objects. Since the reason for using full body track-
23
36. 24 CHAPTER 6. SUMMARY
ing was to eliminate the need of designing the virtual environment to fit
a specific real place for application use, the suggested approach to solving
that problem was depth mapping, this was to be achieved by using a depth
camera to produce a depth map of the real world that can be transformed
into a 3D space and built into the virtual world.
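The depth-mapping step suggested above can be sketched as a back-projection from a depth image to 3D points using the pinhole camera model. This is an illustrative Python sketch rather than the Unity/C# implementation the thesis describes; the intrinsic parameters (fx, fy, cx, cy) are assumed values, not measurements from the set-up.

```python
# Minimal sketch of the suggested depth-mapping step: back-projecting a
# depth image into 3D points with the pinhole camera model. Intrinsics
# here are illustrative placeholders, not calibrated values.

def depth_to_points(depth, fx, fy, cx, cy):
    """Convert a depth image (meters per pixel) into a list of 3D points.

    depth      -- 2D list, depth[v][u] is the distance at pixel (u, v)
    fx, fy     -- focal lengths in pixels
    cx, cy     -- principal point in pixels
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # no depth reading at this pixel
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# A 2x2 depth image whose four pixels all lie 1 m from the camera:
cloud = depth_to_points([[1.0, 1.0], [1.0, 1.0]],
                        fx=500.0, fy=500.0, cx=0.5, cy=0.5)
```

The resulting point cloud is what would then be meshed into a virtual model of the real terrain so that collisions between real and virtual objects can be resolved.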
6.1 Future Work
As part of future work towards better immersion in augmented reality
set-ups, several aspects can be improved or introduced. A device that could
save a lot of processing time and avoid hardware issues is a head worn
display with embedded cameras for video see-through, where two cameras are
already positioned over the eyes and stream directly to the device's own
screen, with the position and zoom of the camera feed designed into the
device. A useful addition is the use of depth cameras: these support
building a depth map of the real world and can also be used to calibrate
the user's real body dimensions, allowing hybrid tracking alongside full
body tracking for optimal registration. The tracked body can further be fed
into a gesture recognizer to react to the user's movements and postures. A
fully wireless set-up is easy to achieve and would improve the user's
mobility and comfort. Finally, extending augmented reality to another sense
would improve body immersion; this can be achieved through sound
augmentation, playing surrounding sounds back to the user through headsets
and microphones after mixing in the virtually created sounds.
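The gesture-recognizer idea above can be sketched as a simple rule over tracked joint positions. This Python sketch is purely illustrative (the described system integrates with Unity in C#); the joint names and the single hand-raised rule are assumptions, not part of the implemented system.

```python
# Illustrative sketch of feeding tracked joints into a gesture recognizer:
# classify a simple posture from joint positions supplied by full body
# tracking. Joint names and the one rule below are assumed for illustration.

def classify_posture(joints):
    """Return a posture label from a dict of joint name -> (x, y, z).

    A hand held above the head is reported as "hand_raised"; anything
    else is reported as "neutral".
    """
    head_y = joints["head"][1]          # vertical (y) coordinate in meters
    if joints["right_hand"][1] > head_y:
        return "hand_raised"
    return "neutral"

# Example frame: the right hand is 25 cm above the head.
frame = {"head": (0.0, 1.70, 0.0), "right_hand": (0.3, 1.95, 0.1)}
label = classify_posture(frame)
```

A real recognizer would look at sequences of frames rather than single postures, but even rules of this form let virtual characters react to the user's movements.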
Bibliography
[1] Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon
Julier, and Blair MacIntyre. Recent advances in augmented reality. Computer
Graphics and Applications, IEEE, 21(6):34–47, 2001. [cited at p. 1]
[2] Ronald T Azuma et al. A survey of augmented reality. Presence, 6(4):355–
385, 1997. [cited at p. 2]
[3] Hirokazu Kato and Mark Billinghurst. Marker tracking and HMD calibration
for a video-based augmented reality conferencing system. In Proceedings of
the 2nd IEEE and ACM International Workshop on Augmented Reality (IWAR'99),
pages 85–94. IEEE, 1999. [cited at p. 2]
[4] Paul Milgram, Haruo Takemura, Akira Utsumi, and Fumio Kishino. Aug-
mented reality: A class of displays on the reality-virtuality continuum. In
Photonics for Industrial Applications, pages 282–292. International Society
for Optics and Photonics, 1995. [cited at p. 7]
[5] Ye Pan, William Steptoe, and Anthony Steed. Comparing flat and spherical
displays in a trust scenario in avatar-mediated interaction. In Proceedings of
the 32nd annual ACM conference on Human factors in computing systems,
pages 1397–1406. ACM, 2014. [cited at p. 11]
[6] Daniel Roetenberg, Henk Luinge, and Per Slycke. Xsens MVN: full 6DOF
human motion tracking using miniature inertial sensors. Xsens Motion
Technologies BV, Tech. Rep., 2009. [cited at p. 8]
[7] Bruce Thomas, Benjamin Close, John Donoghue, John Squires, Phillip
De Bondi, Michael Morris, and Wayne Piekarski. ARQuake: An outdoor/indoor
augmented reality first person application. In Wearable Computers, The
Fourth International Symposium on, pages 139–146. IEEE, 2000. [cited at p. 6]
List of Figures
2.1 User wearing the proposed AR set-up consisting of the motion
capture suit and video see-through head display . . . . . . . . . 6
2.2 The skeleton tracked by the motion capture suit, used within
Unity over a virtual model . . . . . . . . . . . . . . . . . . . . 8
3.1 The camera set-up installation over the Oculus Rift display . . 10
4.1 An overview of the virtual scene populated by the virtual char-
acters and models within Unity . . . . . . . . . . . . . . . . . 15
4.2 The home pad of the main control panel used to manipulate the
virtual objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
List of Algorithms
3.1 Integration of the virtual OVRCamera with the real Logitech
camera into the Unity execution pipeline . . . . . . . . . . . 13