2. Topics to Cover:
• Facial Animation Parameters (FAP)
• Facial Definition Parameters (FDP)
• Face Model
• Coding of FAPs
• Integration of Face Animation and Text-to-Speech (TTS) synthesis
• Binary Format for Scenes (BIFS) for Facial Animation
3. • What are Facial Animation Parameters (FAPs)?
They are based on the minimal perceptual actions of human beings, such as expressions and emotions, and are closely related to muscle actions.
• What are Facial Definition Parameters (FDPs)?
They allow the user to configure the 3D facial model to be used at the receiver (either by adapting the model already at the decoder or by introducing a fresh model).
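The FAP/FDP split above can be pictured as two kinds of data: FAPs carry per-frame animation values, while FDPs carry one-time model configuration. The following is an illustrative sketch only; the class and field names are invented here, not taken from the MPEG-4 syntax.

```python
from dataclasses import dataclass, field

# Hypothetical containers illustrating the FAP/FDP split; the actual
# bitstream syntax is defined by the MPEG-4 standard, not by these names.

@dataclass
class FAPFrame:
    """One time instant of animation: per-parameter displacements."""
    time_ms: int
    values: dict  # FAP index -> displacement (in FAP units)

@dataclass
class FDP:
    """One-time configuration of the receiver's face model."""
    feature_points: dict = field(default_factory=dict)  # point id -> (x, y, z)
    texture_coords: dict = field(default_factory=dict)  # point id -> (u, v)

# A FAP stream is then just a sequence of FAPFrame objects over time,
# while at most one FDP configures the model before animation starts.
frame = FAPFrame(time_ms=0, values={3: 120, 14: -40})
</imports>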
5. Face Model:
• Every MPEG-4 terminal that is able to decode FAP streams has to provide a face model for animation.
• This model is proprietary to the decoder.
• The encoder does not know what the decoder's face model looks like.
• Using an FDP node, MPEG-4 allows the encoder to completely specify the face model to animate.
• The FDP node can also be used to calibrate the proprietary model of the decoder.
7. • The encoder may choose to specify the locations of all or some feature points.
• Given these feature points, the decoder adapts its own proprietary face model so that it conforms to the feature point positions.
• Face model adaptation also allows texture maps for the face to be downloaded.
• Texture coordinates are associated with each feature point.
• To specify the mapping of the texture map onto the face model, the encoder sends these texture coordinates for each feature point.
8. • This process is encoder-specific.
• Adapting the feature point locations of the face model according to encoder specifications is referred to as Face Model Calibration.
• It is sometimes also called Face Model Adaptation.
10. Simplified scene graph for a head model:
Root Group
 └─ Head Transform X
     └─ Head Transform Y
         └─ Head Transform Z
             ├─ Face, Hair, Tongue, Teeth
             ├─ Left Eye Transform X → Left Eye Transform Y → Left Eye
             └─ Right Eye Transform X → Right Eye Transform Y → Right Eye
11. • A root node is a collection of objects.
• For objects to move together as a group, they need to be in the same transform group.
• When nested transform nodes contain different transforms, the transformations have a cumulative effect.
• A transform node defines geometric 3D transformations such as scaling and rotation.
• An IndexedFaceSet is used to define the geometry and the surface attributes (color and texture) of an object.
• The rotations for the left eye and right eye are also embedded in their transform nodes.
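The cumulative effect of nested transform nodes can be sketched as follows. This is not BIFS syntax; it is a minimal illustration (in 2D, with invented helper names) of how a point in a child node is transformed by the innermost transform first and then by each ancestor's transform in turn.

```python
import math

def rotate_z(theta):
    """Return a function rotating a 2D point by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def translate(dx, dy):
    """Return a function translating a 2D point by (dx, dy)."""
    return lambda p: (p[0] + dx, p[1] + dy)

def apply_chain(transforms, point):
    """Apply a root-to-leaf chain of transforms: innermost first."""
    for t in reversed(transforms):
        point = t(point)
    return point

# A head transform (translation) wrapping an eye transform (rotation):
# the eye point is rotated in its local frame, then moved with the head.
p = apply_chain([translate(10, 0), rotate_z(math.pi / 2)], (1.0, 0.0))
```

Changing only the outer (head) transform moves every child with it, which is exactly why grouped objects must share a transform group.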
12. Coding of Facial Animation Parameters (FAPs):
• Tools used for coding:
1) Arithmetic coding (low delay)
2) DCT-based coding (high delay)
15. • The first set of FAP values, FAP(0), is coded without prediction (at time instant zero).
• The value of a FAP at time instant k, FAP(k), is predicted from the previously encoded value FAP(k-1).
• The prediction error e is quantized with a step size equal to QP multiplied by the quantization parameter FAP_QUANT, where 0 ≤ FAP_QUANT ≤ 31.
• The quantized prediction error e' is arithmetically encoded using a separate adaptive probability model for each FAP.
• FAP_QUANT > 15 is usually not used because the quality of the animation degrades.
• At the decoder, the received data is arithmetically decoded, dequantized, and added to the previously decoded value.
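The predictive scheme above can be sketched as follows. The arithmetic coding step is omitted (the quantized errors would be the symbols fed to the arithmetic coder), and the QP value here is an arbitrary illustrative step size, not one taken from the standard. Note that the encoder predicts from the decoder's reconstruction, not from the original value, so quantization errors do not accumulate.

```python
def encode(fap_values, qp=2, fap_quant=3):
    """Quantized prediction errors for one FAP (arithmetic coding omitted)."""
    step = qp * fap_quant
    symbols, recon = [], 0
    for k, v in enumerate(fap_values):
        e = v if k == 0 else v - recon      # FAP(0) is coded without prediction
        q = round(e / step)                 # quantized prediction error e'
        symbols.append(q)                   # would be arithmetically coded
        recon = q * step if k == 0 else recon + q * step  # decoder's view
    return symbols

def decode(symbols, qp=2, fap_quant=3):
    """Dequantize each error and add it to the previously decoded value."""
    step = qp * fap_quant
    out, recon = [], 0
    for k, q in enumerate(symbols):
        recon = q * step if k == 0 else recon + q * step
        out.append(recon)
    return out

decoded = decode(encode([10, 16, 4]))
```

With step size 6, each reconstructed value stays within half a step of a reachable level, and a larger FAP_QUANT coarsens the animation in exactly the way the slide warns about.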
16. DCT:
• Applied to 16 consecutive values of each FAP.
• Hence, it introduces a significant delay in the coding and decoding processes.
• After computing the DCT of 16 consecutive values of one FAP, the DC and AC coefficients are coded separately.
• DC coefficients are coded using the prediction method.
• AC coefficients are coded directly.
• DC and AC coefficients are quantized separately.
• The quantized coefficients are encoded with one VLC word giving the number of zero coefficients preceding the next non-zero coefficient, and another VLC for the amplitude of that non-zero coefficient.
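The transform step can be sketched with an orthonormal DCT-II over a 16-sample segment of one FAP's values. Quantization and the VLC run-length coding are omitted; this only shows why the scheme must buffer 16 frames (hence the delay) and how the DC coefficient separates from the AC coefficients.

```python
import math

N = 16  # temporal segment length used for FAP DCT coding

def dct(x):
    """Orthonormal DCT-II of a length-16 segment of one FAP's values."""
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

# 16 consecutive values of one FAP must be collected before any output:
segment = [float(i % 4) for i in range(N)]
coeffs = dct(segment)
dc, ac = coeffs[0], coeffs[1:]  # coded separately, as described above
```

Because the transform is orthonormal, it preserves the segment's energy, and a slowly varying FAP concentrates that energy in the DC and first few AC coefficients, which is what makes the run-length VLC effective.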
19. Integration of Face Animation and Text-to-Speech (TTS) synthesis
• Synchronization of a FAP stream with a TTS synthesizer through the TTS Interface (TTSI) is only possible if the encoder sends timing information.
• This is because a conventional TTS synthesizer is an asynchronous source.
• DECODER: decodes the text and passes it to the proprietary speech synthesizer.
20. • SYNTHESIZER: creates speech samples that are handed to the compositor.
• COMPOSITOR: provides the audio or video output to the user.
• The second output interface of the synthesizer sends the phonemes of the synthesized speech, along with the start time and duration of each phoneme, to the FAP converter.
• The converter translates the phonemes and timing information into FAPs that the face renderer can use to animate the face model.
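The FAP converter's job can be sketched as a lookup from phonemes to visemes carrying the timing through. The phoneme symbols and viseme groupings below are illustrative examples only; MPEG-4 defines its own fixed viseme set for the high-level viseme FAP.

```python
# Illustrative phoneme-to-viseme table: phonemes with the same visible
# mouth shape (e.g. the bilabials p/b/m) share one viseme index.
PHONEME_TO_VISEME = {
    "p": 1, "b": 1, "m": 1,
    "f": 2, "v": 2,
    "a": 10,
}

def phonemes_to_faps(phonemes):
    """Turn (phoneme, start_ms, duration_ms) triples into timed viseme FAPs."""
    faps = []
    for ph, start, dur in phonemes:
        viseme = PHONEME_TO_VISEME.get(ph, 0)  # 0: neutral/unknown shape
        faps.append({"fap": 1, "viseme": viseme,
                     "start_ms": start, "duration_ms": dur})
    return faps

# Timing from the synthesizer's second output interface drives the faces:
timed = phonemes_to_faps([("m", 0, 80), ("a", 80, 120)])
```

The per-phoneme start times and durations are exactly the timing information the slide says the encoder must supply; without them the converter could produce the right mouth shapes but not place them on the audio timeline.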
21. • Bookmarks in the TTS text are used to animate facial expressions.
• When the TTS synthesizer finds a bookmark in the text, it sends the bookmark to the FAP converter.
• The FAP converter transforms the phonemes into visemes and, using the timing information, into FAPs.
• The bookmark defines the start point and the duration of the transition to a FAP amplitude.
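A bookmark-driven transition can be sketched as ramping a FAP amplitude from its current value to the bookmark's target over the given duration. The linear ramp here is an assumption for illustration; the renderer is free to use a different transition curve.

```python
def amplitude_at(t_ms, start_ms, duration_ms, from_amp, to_amp):
    """FAP amplitude at time t during a bookmark transition (linear ramp)."""
    if t_ms <= start_ms:
        return from_amp                      # transition not started yet
    if t_ms >= start_ms + duration_ms:
        return to_amp                        # transition finished
    frac = (t_ms - start_ms) / duration_ms   # fraction of the way through
    return from_amp + frac * (to_amp - from_amp)

# A bookmark at 100 ms requesting a 100 ms transition to full amplitude:
mid = amplitude_at(150, 100, 100, 0.0, 1.0)
```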
23. Integration with MPEG-4 Systems (BIFS):
• To use face animation in an MPEG-4 system, a BIFS scene graph has to be transmitted.
• The minimum scene graph should contain a Face node and a FAP node.
• The FAPs may include the high-level FAPs, such as visemes and expressions.
• This scene graph enables the encoder to animate the proprietary face model of the decoder.
• Downloading a face model to the decoder additionally requires an FDP node.
• The FDP node is further divided into children, viz. the Face Definition Table (Fdef), Face Definition Mesh (FDM), and Face Definition Transform (FDT).
24. Nodes of the BIFS scene that are used to describe and animate a face