Invention Title
Means and Methods of Acquisition and Animation
[001] Cross-Reference to Related Applications
[002] Not Applicable
[003] Background of the Invention
[004] (1) Field of the Invention
[005] The invention generally relates to visual special effects. More
particularly, the invention relates to means and methods of capturing human face
attributes and projecting such attributes upon a stationary or moving
three-dimensional model.
[006] (2) Description of the Related Art
[007] The known related art fails to anticipate or disclose the principles of the
present invention.
[008] Other systems of facial or human capture use markers to map face
geometry to projected or digitally created images. Systems of the prior art are
computationally intensive, expensive and fail to provide realistic depictions of
human expressions. Thus, there is a need in the art for the presently
disclosed embodiments.
[009] Summary of the Invention
[0010] The present invention overcomes shortfalls in the related art by
presenting an unobvious and unique combination and configuration of methods
and components to allow film makers and others to create static or moving high
resolution images of human faces having changing expressions. The present
invention overcomes shortfalls in the art by using a single camera to efficiently
capture facial attributes and changing facial expressions.
2. First Named Inventor: Zebediah Ysable De Soto; Docket DeSoto101P
Pat. Atty. Steven A. Nielsen, Reg. No. 54,699 Customer No. 45004
[0011] Embodiments of the present invention use a helmet with a camera
pointed at the actor to capture the actor's facial attributes and dynamic facial
expressions. The invention overcomes shortfalls in the art by introducing new
efficiencies with a single-camera, helmet-mounted system that eschews the
complex multi-camera and 3D systems of the prior art. Other advantages over
the prior art are discussed herein.
[0012] After a single helmet camera records facial expressions, facial
attributes, dynamic facial expressions, and other data obtained from the actor,
the recorded two-dimensional data may be efficiently projected upon a
three-dimensional model of the actor's face and/or body. The use of
two-dimensional capture data transposed to a three-dimensional model achieves
exceptional results with low hardware and labor costs. The achieved results
surpass in quality the more expensive and labor-intensive means and methods of
the prior art.
[0013] The invention has many applications, including the capture of
persons who are no longer available. For example, two-dimensional images may
be obtained from deceased actors and then reproduced in the form of new
moving images with realistic facial expressions.
[0014] The invention overcomes shortfalls in the related art by presenting
various embodiments enabling a neophyte user to create accurate one-to-one
mapping of an actor’s face to a fully editable digital model.
[0015] The invention overcomes shortfalls in the art by providing means and
methods of simultaneously capturing and projecting data while an actor is on
stage.
[0016] These and other objects and advantages will be made apparent when
considering the following detailed specification taken in conjunction with
the drawings.
Brief Description of the Drawings
[0017] Fig. 1 depicts means and methods of image capture
[0018] Fig. 2 depicts a topography netting placed upon a digital representation
of an actor
[0019] Reference Numerals in the Drawings.
[0020] 100 reserved
[0021] 101 reserved
[0022] 200 reserved
[0023] 300 reserved
[0024] 400 reserved
DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
[0025] The following detailed description is directed to certain specific
embodiments of the invention. However, the invention can be embodied in a
multitude of different ways as defined and covered by the claims and their
equivalents. In this description, reference is made to the drawings wherein like
parts are designated with like numerals throughout.
[0026] Unless otherwise noted in this specification or in the claims, all of the
terms used in the specification and the claims will have the meanings normally
ascribed to these terms by workers in the art.
[0027] Unless the context clearly requires otherwise, throughout the
description and the claims, the words "comprise," "comprising" and the like are to
be construed in an inclusive sense as opposed to an exclusive or exhaustive
sense; that is to say, in a sense of "including, but not limited to." Words using the
singular or plural number also include the plural or singular number, respectively.
Additionally, the words "herein," "above," "below," and words of similar import,
when used in this application, shall refer to this application as a whole and not to
any particular portions of this application.
[0028] The above detailed description of embodiments of the invention is not
intended to be exhaustive or to limit the invention to the precise form disclosed
above. While specific embodiments of, and examples for, the invention are
described above for illustrative purposes, various equivalent modifications are
possible within the scope of the invention, as those skilled in the relevant art will
recognize. For example, while steps are presented in a given order, alternative
embodiments may perform routines having steps in a different order. The
teachings of the invention provided herein can be applied to other systems, not
only the systems described herein. The various embodiments described herein
can be combined to provide further embodiments. These and other changes can
be made to the invention in light of the detailed description.
[0029] Any and all of the above references and U.S. patents and applications
are incorporated herein by reference. Aspects of the invention can be modified,
if necessary, to employ the systems, functions and concepts of the various
patents and applications described above to provide yet further embodiments of
the invention.
[0030] These and other changes can be made to the invention in light of the
above detailed description. In general, the terms used in the following claims
should not be construed to limit the invention to the specific embodiments
disclosed in the specification, unless the above detailed description explicitly
defines such terms.
[0031] General Architecture
[0032] Disclosed embodiments present new and unobvious means, methods,
and approaches to the acquisition and animation of high- and low-resolution facial
geometry and skin. The first step is acquiring a reasonable static 3D
representation of the actor's face at the highest possible quality in terms of
general overall scale. We then augment a traditional helmet with a single camera
positioned directly at the center of the actor's face to track expression. In its
optimal form, the resulting model comprises high-resolution geometry embedded with
shape-deforming motion capture data and tissue capture data. The end result is
an eerily accurate one-to-one match of the original subject's facial movements
transplanted onto a fully editable digital model. We present the results of our
approach for performance replay as well as tissue capture editing.
[0033] 1 Introduction
[0034] The ability to accurately capture the likeness and dynamic
performance of a human face with all its infinite subtleties has been considered
the Rosetta Stone of the VFX industry.
[0035] From birth, mankind's very physiology has been designed to detect
and recognize subtle facial expressions in an effort to facilitate and support the
collaborative nature of our survival instinct: anticipation and approximation of
potential outcomes, all based on our ability to read each other's facial
expressions [Essa and Pentland 1997].
[0036] Though great progress has been made within the entertainment
community to replicate the performance of a face in the real world, it has not
come without financial cost, in the form of skilled artists needed to nuance and
mold traditionally captured, inaccurate data into the intended final output, a long
and inconsistent process that once again relies on a group of artists to uniformly
interpret the intended final outcome.
[0037] The process I have developed and documented herein is fairly simple
in its approach to accurately acquiring and animating a 3D model of a subject's
face: a markerless process which removes, as much as possible, any variable that
relies on interpretation by humans or by faulty software.
[0038] Oftentimes, the appearance and movement of facial skin has been
inaccurately represented by a hierarchy of complex interactions of skin
components based on their geometric scale and optical properties. The latter is
the least important aspect, increasing the render time of simulations while
drawing attention away from the more important goal of recreating an accurate
one-to-one representation of the 3D model's real-world counterpart [Igarashi et
al. 2005].
[0039] Performance is timing. Take for example the animated feature film
"Beowulf", in which the actor John Malkovich portrayed the role of "Unferth". To
accomplish this task, a highly detailed model was created with his likeness.
Markers were then applied to his face to drive the animation of his 3D character's
face. Unfortunately, such a process is fraught with errors in the capture
procedure and limitations in terms of spatial capture space, all due to the size of
the markers in relation to the face and the software's ability to accurately track
the movement of all those markers. To rectify these issues, highly trained and
expensive artists are brought in to interpret the captured data and are sent off to
polish and refine the animation by hand. Unfortunately, this is a process of many
animators (not actors) charged with the responsibility of pantomiming the
complex nuances and timings of Malkovich's performance.
[0040] With my process we now leave the acting to the actors.
[0041] Scale is measured from the smallest details, such as pores, to more
obvious features such as the eyes, nose, etc.
[0042] Within that scale we have the aforementioned complex interactions,
including unseen interactions taking place below the surface, such as muscle
tissue contracting and expanding to create expressions on the surface skin [Wu et
al. 1996].
[0043] To accurately characterize facial motion, it must be measured at
multiple time scales; this is one of the most critical aspects in capturing the
nuance of performance and the facial tissue detail unique to each actor.
[0044] I prefer to break these down into three (3) basic tiers of movement:
[0045] Tier I: Movement involving small-scale facial features such as pores
and fine-line wrinkles, which occurs within fractions of a second and is usually not
(consciously) detected by the human eye.
[0046] Tier II: Movement which occurs within a one- to two-second time
period, involving much larger facial landmarks such as the eyes, nose, and
mouth, which are much easier to perceive visually.
[0047] Tier III: The largest scale of movement, classified as
movement of the human jaw and head as a whole.
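For purposes of illustration only, the three movement tiers above may be summarized as a small lookup structure. The structure and the duration-based classifier below are explanatory sketches (the tier boundaries are the time ranges stated in the text; treating Tier III as "longer than two seconds" is a simplification), not part of the disclosed process:

```python
# Illustrative summary of the three movement tiers described above.
MOVEMENT_TIERS = {
    "I": {"features": "pores, fine-line wrinkles",
          "time_scale_s": (0.0, 1.0),      # fractions of a second
          "consciously_visible": False},
    "II": {"features": "eyes, nose, mouth",
           "time_scale_s": (1.0, 2.0),     # one- to two-second period
           "consciously_visible": True},
    "III": {"features": "jaw and head as a whole",
            "time_scale_s": (2.0, None),   # largest-scale movement
            "consciously_visible": True},
}

def tier_for_duration(seconds):
    """Classify a movement by its duration, per the tiers above
    (a simplification: Tier III is really defined by scale, not time)."""
    if seconds < 1.0:
        return "I"
    if seconds <= 2.0:
        return "II"
    return "III"
```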
[0048] Herein I shall present a three-dimensional face model that can
precisely represent the different facial landmarks and three-tier motion scales
that are necessary for the animation of a photo-realistic dynamic face mesh.
[0049] A principal design element of our model has been to remove any form
of interpretation of these specific element details and allow the natural nuances
to play through unfiltered.
[0050] Illustration space
Please refer to the illustration in Figure 1 for a visual breakdown of our
concept. The first step in our process is to acquire a static low- or high-resolution
model of our subject's head. We then place our custom-made facial capture
helmet onto our actor's head. Then, we record the actor's performance. From the
video data, we begin processing the footage into its core Tier I and Tier II
animated components, which are applied to our actor's 3D head mesh. From
there, the head is processed through its Tier III animated component and is
ready for final rendering.
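The workflow described above may be sketched, for illustration only, as a sequence of placeholder steps. Every function and stage name here is hypothetical and stands in for the actual capture and processing tools; the sketch exists only to make the ordering of the process explicit:

```python
def run_pipeline(subject):
    """Hypothetical end-to-end sketch of the capture pipeline described
    above. Each step is a stub standing in for the real process."""
    log = []

    def step(name):
        log.append(name)

    step("acquire static head mesh")           # low- or high-res 3D scan
    step("mount helmet capture camera")        # single frontal camera
    step("record performance video")
    step("process Tier I / Tier II elements")  # fine detail + landmarks
    step("apply elements to 3D head mesh")
    step("process Tier III (jaw/head)")
    step("final render")
    return log

stages = run_pipeline("actor")
```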
[0052] The final result is a highly detailed 3D animated head mesh,
complete with all the individual nuances of an actor's performance, minus the
intercession of an expert 3D animator to refine the animation.
[0053] 2 Related Work
[0054] Facial Animation and Model Acquisition have become a major focus
within the motion graphics and computer vision community [Noh and Neumann
1999].
[0055] Within this section we explore the related work within these fields.
[0056] Marker-Based Motion Capture. The concept of marker-based facial
capture data being used to drive the animation of a 3D head mesh dates back to
[Williams 1990]. Though many companies large and small now populate the
market, one company towers above the rest of the field: Vicon. Vicon's marker-
based facial-capture system, like many others, is capable of acquiring
data with incredible temporal resolution (up to 450 Hz). But because the
process is marker-based, and therefore limited to roughly 200 markers on
average, such systems are not capable of, or even designed for, capturing data
such as skin pore deformation.
[0057] Structured Light Systems are amazing systems capable of fairly decent
real-time facial capture [Zhang et al. 2004]. Not only do they capture motion, they
also capture color as well as geometry. Through utilization of optical flow,
acquired depth maps are applied to a mesh template [Wang et al. 2004].
Unfortunately, static face scans are vastly superior to these Structured Light
System scans in terms of overall resolution [Borshukov and Lewis 2003; Sifakis
et al. 2005]. Due to the intense processing involved with Structured Light Systems,
the marker-based system of animation is unequivocally faster.
[0058] Model-Based Animation from Video. Great strides have been taken in
the development of fitting deformable 3D face models to video (e.g., [Li et al.
1993; Essa et al. 1996; DeCarlo and Metaxas 1996; Pighin et al. 1999]). Through
the use of linear [Blanz et al. 2003] and multi-linear [Vlasic et al. 2005] morphable
models, the resulting quality of motion and geometry is below the standard
typically found with motion-captured data from other, more standard methods.
[0059] Image-Based Methods with 3D Geometry.
[0060] [Guenter et al. 1998] and [Borshukov et al. 2003] utilize a deformable
face model fitted with a texture map created from multiple video sources. Using a
projector in combination with a high-speed camera [Wenger et al. 2005], [Jones et
al. 2006] use the USC Light Stage to produce one of the most photo-realistic
examples of facial capture, through the acquisition of the 3D geometry of a face
and its reflectance field. While the rig runs, the cameras record an image of the
person under every possible direction of illumination. They can then recombine
this set of images to generate a rendering of what the person would look like
under any complex form of illumination. Unfortunately, this is another example of
a system regulated by capture parameters which limit its use and constrain the
actor's performance to a very limited and specific performance space, and it is an
extremely labor-intensive and expensive process (i.e., the USC Light Stage).
[0061] Anatomical Face Models [Sifakis et al. 2005]. A marker-based motion
capture procedure that utilizes very few markers to generate data driving the
movement of a prefabricated, deformable template model of a simulated bone
and muscle system, which in turn drives the facial movements of a scanned
actor's face fitted to this model.
[0062] High-quality Passive Facial Performance Capture Using Anchor
Frames [Beeler et al. 2011] is a new technique for passive and markerless
facial performance capture based on anchor frames. This method starts with
high-resolution per-frame geometry acquisition using state-of-the-art stereo
reconstruction and proceeds to establish a single triangle mesh that is
propagated through the entire performance. Leveraging the fact that facial
performances often contain repetitive subsequences, they identify anchor frames
as those which contain similar facial expressions to a manually chosen reference
expression. Anchor frames are automatically computed over one or even multiple
performances. They introduce a robust image-space tracking method that
computes pixel matches directly from the reference frame to all anchor frames,
and subsequently to the remaining frames in the sequence via sequential
matching. This allows them to propagate one reconstructed frame to an entire
sequence in parallel, in contrast to previous sequential methods. Their anchored
reconstruction approach also limits tracker drift and robustly handles occlusions
and motion blur. The parallel tracking and mesh propagation offer low
computation times. Their technique will even automatically match anchor frames
across different sequences captured on different occasions, propagating a single
mesh to all performances. Unfortunately, this is another example of a system
regulated by capture parameters which limit its use and constrain the actor's
performance to a very limited and specific performance space.
[0063] 3 The Acquisition of Data and Its Processing Time
[0064] The static, low- or high-resolution face mesh is acquired using Singular
Inversions' commercial software FaceGen, which specializes in statistical
modeling of the shape and appearance of human faces. By integrating our own
polygonal mesh and UV layout into FaceGen, we are able to quickly mass-
produce multiple models for multiple actors to our facial capture process
specifications.
[0065] By taking a series of photographs of the front, left, and right profiles of a
subject's head, the program is able to approximate a digital duplicate of the
performer's head shape and facial features.
[0066] With my process we now leave the acting to the actors in the most
appropriate setting possible.
[0067] The stage.
[0068] From there, we then place our custom-built facial capture helmet on
our performer and begin recording the actor at any preferred progressive frame
rate as they navigate a motion capture stage to freely express themselves in the
most natural way they’ve grown accustomed to.
[0069] This stage aspect of the facial capture is important because it generates
appropriate eye-lines for the facial performance, which is captured in tandem
with the body performance, limiting the need for any real performance editing of
the face.
[0070] The facial video data is then collected and taken into our compositing
package for processing of our Tier I and Tier II animation.
[0071] Fine-detail animation of Tier I and general animation of Tier II are
accomplished through a simple automated process of converting and rendering
the collected facial capture into three elements: (1) an RGB JPEG sequential
image sequence; (2) a normal-map sequential image sequence; and (3) an RGB
alpha-channel sequential image sequence. The RGB alpha channel supplies us
with an alpha silhouette isolating the facial landmarks of the inside of the mouth
and the eyes separately.
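As an illustration of how one captured frame might be separated into two of the three elements above, the following sketch assumes frames arrive as rows of (R, G, B, A) tuples and uses an assumed alpha threshold of 127; neither assumption comes from the original text, and normal-map generation, which would come from the renderer, is not shown:

```python
def split_capture_frame(frame_rgba):
    """Split one captured RGBA frame (rows of (r, g, b, a) tuples) into
    the RGB element and the alpha-silhouette element described above.
    The 127 threshold is an illustrative assumption."""
    rgb_rows, silhouette_rows = [], []
    for row in frame_rgba:
        rgb_rows.append([(r, g, b) for r, g, b, a in row])
        # White where the matte is opaque (mouth interior, eyes), black elsewhere.
        silhouette_rows.append([255 if a > 127 else 0 for _, _, _, a in row])
    return rgb_rows, silhouette_rows
```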
[0072] Tier I and Tier II processing takes approximately fifteen minutes of
setup and about 30 minutes of rendering for Tier I and Tier II's three elements,
for a five-minute facial capture sequence at 24 fps and 1280 by 720 resolution,
on a computer running an Intel Core 2 Duo P8700 CPU @ 2.53 GHz with 4 GB
RAM and an NVIDIA GeForce 9800M GTS. Render times for this phase of the
animation, as well as Tier III, can be drastically reduced to a few minutes with
hardware upgrades.
[0073] The model is then imported, along with two main facial morph targets,
into the application of choice for Tier III animation. The Tier I and Tier II
animation elements are applied to the 3D mesh via a frontal virtual projection
camera onto the face.
[0074] The lens settings of the virtual camera, and its physical position and
orientation in space, are matched one-to-one to the real-world settings of the
actual facial capture camera as it sat in relation to the actor's face. Image details
are projected back onto the digital actor model. Tier III up-and-down jaw
movement is easily keyframed by hand using two morph targets.
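The one-to-one camera match described above amounts to the standard conversion of lens and sensor settings into a pinhole intrinsic matrix. The sketch below uses illustrative parameter values (a 35 mm lens on a 36 x 24 mm sensor at 1280 x 720), not values from the actual rig:

```python
def matched_intrinsics(focal_mm, sensor_w_mm, sensor_h_mm, img_w_px, img_h_px):
    """Pinhole intrinsic matrix for a virtual projection camera whose lens
    and sensor settings are matched one-to-one to the physical helmet
    camera. This is the standard focal-length-to-pixels conversion; the
    specific parameter values below are illustrative."""
    fx = focal_mm * img_w_px / sensor_w_mm   # focal length in pixels (x)
    fy = focal_mm * img_h_px / sensor_h_mm   # focal length in pixels (y)
    cx, cy = img_w_px / 2.0, img_h_px / 2.0  # principal point at image center
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

# Example: 35 mm lens, full-frame 36 x 24 mm sensor, 1280 x 720 image.
K = matched_intrinsics(35.0, 36.0, 24.0, 1280, 720)
```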
[0075] Target 1 for a wide-open vertical mouth (phoneme position for "Aah")
on the Y axis.
[0076] Target 2 for a wide-open horizontal mouth (phoneme position for "Eee")
on the X axis.
[0077] These two positions provide depth for the inner mouth.
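The two-dial jaw animation described in the surrounding paragraphs can be modeled, under the common delta-blendshape convention (assumed here, not stated in the original text), as a linear combination of the two morph targets:

```python
def blend_jaw(base, target_aah, target_eee, dial_y, dial_x):
    """Delta-blendshape combination for the two jaw morph targets:
    dial_y drives the vertical 'Aah' target on the Y axis, dial_x the
    horizontal 'Eee' target on the X axis. The linear-sum formula is a
    common convention assumed here. Vertices are (x, y, z) tuples."""
    out = []
    for b, a, e in zip(base, target_aah, target_eee):
        # Add each target's offset from the base, scaled by its dial.
        out.append(tuple(bc + dial_y * (ac - bc) + dial_x * (ec - bc)
                         for bc, ac, ec in zip(b, a, e)))
    return out
```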
[0078] A full five-minute sequence is animated within an hour and a half,
requiring the artist to operate only two dials, a process so simple that most
children over the age of 12 can operate these functions with no background in
the process.
[0079] The first beta version of our software package is set for release in May
2013 and will fully automate the entire process, eliminating the already limited
need for any third-party applications during the Tier animation phases.
[0080] Though the fact that this process already rivals or surpasses the
quality of the more expensive, labor-intensive processes currently employed
within the industry, at a fraction of the cost and in a fraction of the time, should
be regarded as more than noteworthy, the fact that we are able to capture
simultaneously on a stage with a moving actor is miraculous and solves many
shortfalls in the prior art.