Sound Texture: Wavelet Tree Learning and Tiling and Stitching
Antonio De Sena and Pietro Polotti
desena@sci.univr.it, polotti@sci.univr.it


Università degli Studi di Verona




Goals
 Illustrate different definitions of sound textures
 proposed by different authors.
 Present the basic ideas of two different approaches
 to analyzing and synthesizing sound textures.
 Stimulate the audience to propose a
 definition/classification of audio/sound texture.




Definitions       (1)


 Definition by Dubnov et al., Hebrew University,
 Jerusalem, Israel (2002) [1]:
    “We can describe sound textures as a set of
   repeating structural elements (sound grains) subject
   to some randomness in their time appearance and
   relative ordering but preserving certain essential
   temporal coherence and across-scale localization.”
   Ex: “. . . natural and artificial sounds such as rain, a
   waterfall, traffic noises, people babble, machine
   noises, and so on.”
   Fundamental assumption: “. . . the sound signals are
   approximately stationary at some scale.”
   Comment: this definition is tied to a specific analytical
   tool.
Definitions       (2)


 Definition by Dubnov and Tishby, Hebrew
 University, Jerusalem, Israel (1997) [2]:
    “Sound texture can be considered as stationary
   acoustical phenomena that obtain their acoustical
   effects from internal variations in the sound
   structure.”

   Variations like:
    “. . . micro-fluctuations in the harmonics of a pitched
   sound or statistical properties of random excitation
   source in an acoustic system.”




Definitions       (3)


 Definition by Parker and Behm, University of
 Calgary, Canada (2004) [3]:
    “A sound texture can be described as having a
   somewhat random character (?), but a recognizable
   quality. Any small (?) sample of a sound texture
   should sound very much like, but not identical to, any
   other small sample.”
   Comment: This is very qualitative.
 Definition by Norris and Denham, University of
 Plymouth (2003) [4]:
   “A sound texture may be loosely defined as a sound
   which may have some local structure, but has no
   perceptually obvious long-term structure.”
   Comment: This is rather vague.
Definitions       (4)


 Definition by Athineos and Ellis, Columbia
 University, U.S.A. (2003) [5]:
    “. . . we look at a third class of sounds we call sound
   textures that are distinct from speech and music.”

    “. . . textures should have an undetermined extent
   (duration) with consistent properties (at some level),
   and be readily identifiable from a small (?) sample.”

   Comment: “. . . consistent properties” is a bit vague.
   Comment: “. . . identifiable from a small sample”
   seems to be a perceptual criterion (?).
   They consider the existence of a global structure in
   time.
Two different approaches
  Creating Sound Textures by Example: Tiling and
  Stitching.
    Starting from image processing methods (tiling and
    stitching), Parker and Behm [3] developed a new
    method for creating sound textures.
  Creating Sound Textures through Wavelet Tree
  Learning.
    Starting from an image processing method
    developed in [6], Dubnov et al. extend this method to
    the case of audio signals for the creation of sound
    textures.



Creating Audio Texture by Example: Tiling and Stitching (1)

    Definition by Parker and Behm, University of
    Calgary, Canada (2004) [3]:
        “A sound texture can be described as having a
       somewhat random character, but a recognizable
       quality. Any small sample of a sound texture should
       sound very much like, but not identical to, any other
       small sample.”
       Comment: This is very qualitative.
       Examples: waterfall, rain, traffic noises . . .
       For every chunk, the frequency distribution should
       not change, nor should any rhythmical pattern or
       timbre characterization.



Creating Audio Texture by Example: Tiling and Stitching (2)

       Tiling and Stitching based methods.
           Image Quilting (image processing).
              Square sample blocks of fixed size.
              Overlap between adjacent blocks.
              Blocks are selected so that they show some significant
              measure of agreement with their neighbors (*).
              Edges are smoothed to reduce the “mosaic” effect (*);
              see the sketch below.

  (*) No further information provided.
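As a purely illustrative aid, here is a minimal numpy sketch of an image-quilting-style block choice. The `quilt_next_block` helper, the SSD error, and the 10% tolerance are assumptions of this sketch, since the slide gives no details of the agreement measure.

```python
import numpy as np

def quilt_next_block(placed_edge, blocks, overlap, rng=None):
    """Image-quilting-style block choice (illustrative sketch).

    `placed_edge` is the overlap strip of the block already on the canvas,
    `blocks` a list of square candidate blocks. The SSD error and the 10%
    tolerance are assumptions; the slide gives no agreement measure.
    """
    rng = rng or np.random.default_rng()
    # sum-of-squared-differences between each candidate's left strip and the placed strip
    errors = np.array([np.sum((b[:, :overlap] - placed_edge) ** 2) for b in blocks])
    near_best = np.flatnonzero(errors <= errors.min() * 1.1 + 1e-12)
    return blocks[int(rng.choice(near_best))]
```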




Creating Audio Texture by Example: Tiling and Stitching (3)

    Tiling and Stitching based methods.
       Chaos Mosaic (image processing).
          Start with only one block.
          The image is created by copying the block (tiling) to fill the
          requested size.
          A chaos transformation needs to be applied, e.g. Arnold’s Cat Map:

              x_{l+1} = (x_l + y_l) mod m
              y_{l+1} = (x_l + 2·y_l) mod m

          where the image size is m × m and l is the iteration index.
          This transformation maps the image onto itself.
          It is applied to blocks of pixels (not to single pixels), to
          preserve local features.
          Edges are smoothed (or faded) to reduce the “mosaic” effect
          (see the sketch below).
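A minimal numpy sketch of the Cat-Map shuffling at block granularity, matching the idea on this slide; the block size, iteration count, and function name are arbitrary assumptions, not values from the papers.

```python
import numpy as np

def cat_map_blocks(image, block=8, iterations=1):
    """Arnold's Cat Map applied at block granularity, as on this slide.

    Blocks (not single pixels) are shuffled so local features survive.
    `block` and `iterations` are illustrative values, not from the papers.
    """
    m, m2 = image.shape
    assert m == m2 and m % block == 0, "square image, side divisible by block size"
    n = m // block                                            # blocks per side
    grid = image.reshape(n, block, n, block).swapaxes(1, 2)   # (n, n, block, block)
    for _ in range(iterations):
        shuffled = np.empty_like(grid)
        for x in range(n):
            for y in range(n):
                # (x, y) -> (x + y, x + 2y) mod n on block coordinates
                shuffled[(x + y) % n, (x + 2 * y) % n] = grid[x, y]
        grid = shuffled
    return grid.swapaxes(1, 2).reshape(m, m)
```

Edge smoothing or cross-fading between neighboring blocks would be applied on top of this shuffling, as the slide notes.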

Creating Audio Texture by Example: Tiling and Stitching (4)

    Stitching based methods: generation from a sound
    texture sample.
       The sound texture is first split into blocks of
       equal duration.
       Using these blocks, a longer sample can be created
       (see the sketch below).
          A least-squares measure is used to find blocks whose head
          (first 15%) is similar to the tail (last 15%) of the previous one.
          Blocks are chosen using an LRU (Least Recently Used) policy
          (in combination with the least-squares measure) to force the
          procedure to pick up all the chunks.
          Chunks are cross-faded (15%).
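A rough sketch of the stitching step. The 15% head/tail least-squares match and cross-fade follow the slide; the LRU bookkeeping is simplified to "least recently used among the few best matches", and the shortlist size of 5 is an assumption.

```python
import numpy as np

def stitch_texture(blocks, n_blocks_out, overlap_frac=0.15, rng=None):
    """Chain equal-length blocks into a longer texture (reduced sketch)."""
    rng = rng or np.random.default_rng()
    ov = int(len(blocks[0]) * overlap_frac)
    last_used = {i: -1 for i in range(len(blocks))}        # LRU timestamps
    out = list(blocks[int(rng.integers(len(blocks)))])
    for step in range(n_blocks_out - 1):
        tail = np.asarray(out[-ov:])
        # least-squares distance between the previous tail and each block's head
        cost = np.array([np.sum((b[:ov] - tail) ** 2) for b in blocks])
        shortlist = np.argsort(cost)[:5]
        best = int(min(shortlist, key=lambda i: last_used[int(i)]))
        last_used[best] = step
        # cross-fade the 15% overlap region
        fade = np.linspace(1.0, 0.0, ov)
        out[-ov:] = tail * fade + blocks[best][:ov] * (1.0 - fade)
        out.extend(blocks[best][ov:])
    return np.asarray(out)
```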




Creating Audio Texture by Example: Tiling and Stitching (5)

       Chunk size can be determined using amplitude
       peaks: the entire source sample is analyzed for
       RMS amplitude, and peaks in amplitude more than
       1.5 standard deviations above the baseline are
       recorded. The mean and standard deviation of the
       observed distances between these peaks are used to
       generate the size of each chunk (see the sketch below).
       Hopefully, then, each chunk will contain one “feature”
       that a listener can recognize.
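A hedged sketch of this chunk-size recipe. Only the "peaks more than 1.5 standard deviations above the baseline" rule and the mean/std of inter-peak distances come from the slide; the frame length and the Gaussian draw per chunk are assumptions.

```python
import numpy as np

def chunk_size_generator(x, frame=1024, rng=None):
    """Estimate chunk sizes from RMS amplitude peaks, following the recipe above."""
    rng = rng or np.random.default_rng()
    n_frames = len(x) // frame
    rms = np.sqrt(np.mean(x[: n_frames * frame].reshape(n_frames, frame) ** 2, axis=1))
    baseline, spread = rms.mean(), rms.std()
    peak_frames = np.flatnonzero(rms > baseline + 1.5 * spread)
    if len(peak_frames) < 3:
        raise ValueError("not enough peaks to estimate chunk sizes")
    gaps = np.diff(peak_frames) * frame                    # inter-peak distances, in samples
    mu, sigma = gaps.mean(), gaps.std()
    def next_size():
        return max(frame, int(rng.normal(mu, sigma)))      # one chunk size per call
    return next_size
```

Each call to the returned generator yields one chunk length in samples, so each chunk hopefully spans roughly one amplitude "feature".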




Creating Audio Texture by Example: Tiling and Stitching (6)

    Tiling based methods: (chaos mosaic) generation
    from a sound texture sample.
       Build a matrix whose rows are exactly large enough to hold
       one period at the “dominant” (?) frequency, or an
       integer number of periods.
       Fill the matrix with the sample, row by row
       (see the sketch below).
       Partition it into rectangular regions. The width of these
       regions is computed from the dominant frequency
       (e.g. width = n · Fd , with n 150).
       The corners of the regions are randomly perturbed using
       a normal distribution (with d = 15% of the box size).
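A small sketch of the matrix construction; the FFT-bin estimate of the dominant frequency and the `periods_per_row` parameter are assumptions made for the example.

```python
import numpy as np

def sample_matrix(x, sr, periods_per_row=1):
    """Lay the source sample out as a matrix whose rows hold whole periods
    of the dominant frequency, as described above.
    """
    spectrum = np.abs(np.fft.rfft(x))
    spectrum[0] = 0.0                                      # ignore DC
    f_dom = np.argmax(spectrum) * sr / len(x)              # crude dominant frequency
    row_len = max(1, int(round(periods_per_row * sr / f_dom)))
    n_rows = len(x) // row_len
    return x[: n_rows * row_len].reshape(n_rows, row_len), f_dom
```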




Creating Audio Texture by Example: Tiling and Stitching (7)

       Arnold’s Cat Map is applied with blocks half the normal size
       to create a background.
          The background is necessary because the next step can
          leave “holes” in the wave.
       Arnold’s Cat Map is then applied with normal-size blocks
       (overlapped onto the background, without adding).




Creating Audio Texture by Example: Tiling and Stitching (8)

    Comments.
       Idea: handicraft work.
       Two ideas readapted from image processing.
       Textures: not bad, but there are some problems, such as
       rhythmical patterns, time-envelope artifacts, and
       repetitions.




Creating Audio Texture by Example: Tiling and Stitching (9)

    Appendix: Synthesis with Gaussian Pyramid.
       Again, an idea taken from image processing.
       A wavelet-like pyramid (MRA tree) built with a
       Gaussian filter (lowpass), the details being the difference
       between the original and the filtered signal (bandpass);
       see the sketch below.
       No full description is available; only a brief one-page
       explanation at
       http://pages.cpsc.ucalgary.ca/~parker/gamesresearch/tsketch-texture.pdf .
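Since no full description is available, this is only one plausible reading of the Gaussian pyramid, sketched for a 1-D signal; the number of levels and the filter width are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def gaussian_pyramid(x, levels=4, sigma=2.0):
    """One plausible reading of the Gaussian pyramid for a 1-D signal.

    At each level the lowpass is a Gaussian-filtered, decimated copy and the
    detail is the difference between the signal and its lowpass version.
    """
    lowpass, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        smooth = gaussian_filter1d(lowpass, sigma)
        details.append(lowpass - smooth)                  # bandpass-like detail
        lowpass = smooth[::2]                             # decimate for the next level
    return lowpass, details
```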




Sounds examples and comments

   Creating Audio Texture by Example: Tiling and
   Stitching.
      Crowd (audience): macro-evident repetitiveness (the examples
      sound like juxtaposed, reiterated patterns).
      Time-envelope problems: “volume discontinuity”.
      Fire: less macro-evident repetitiveness (the examples still sound
      like juxtaposed repeated patterns).
      No time-envelope problems.
      Water: the “chaos” example is the best.
      The other examples have time-envelope problems, with a feeling of
      “volume discontinuities”.
      In general it is the least repetition-like.
      Surf and gulls: block copy, obtained with small windows, hence
      less repetition of temporal (almost rhythmical) patterns.



Synthesizing Sound Textures through Wavelet Tree Learning (1)


    Definition by Dubnov et al., Hebrew University,
    Jerusalem, Israel [1]:
        “We can describe sound textures as a set of
       repeating structural elements (sound grains) subject
       to some randomness in their time appearance and
       relative ordering but preserving certain essential
       temporal coherence and across-scale localization.”
       Ex: “. . . natural and artificial sounds such as rain, a
       waterfall, traffic noises, people babble, machine
       noises, and so on.”
       Fundamental assumption: “. . . the sound signals are
       approximately stationary at some scale” (?).
        Comment: this definition is tied to a specific analytical
        tool.
Synthesizing Sound Textures through Wavelet Tree Learning (2)


     Gabor theory: sound is perceived as a series of
     short discrete bursts of energy.
    A further assumption: in a sound texture, a
    statistical characterization of the joint
    time-frequency and/or time-scale relations is
    possible.




Synthesizing Sound Textures through Wavelet Tree Learning (3)


    Original idea developed for image (2D) and video
    (3D) textures [6].
       Examples on next slides extracted from:
       http://www.cs.huji.ac.il/labs/cglab/papers/texsyn/2dtexsyn/

     The audio (1D) case is an adaptation of the original
     studies.
     More work is needed to avoid silence gaps, overly
     similar portions, . . .




Synthesizing Sound Textures through Wavelet Tree Learning (4)




Original texture, same-size synthesized texture, and a 4-times-larger synthesized texture.




Synthesizing Sound Textures through Wavelet Tree Learning (5)




Original texture, same-size synthesized texture, and a 4-times-larger synthesized texture.




Synthesizing Sound Textures through Wavelet Tree Learning (6)


    Statistical Learning.
        Estimating the stochastic source from a training
        example (a “sample” of the source).
       El-Yaniv algorithm: generate new random
       sequences that could have been generated from the
       source of the sample.
          The new sequences are generated by synthetic wavelet
          coefficients.
          The wavelet coefficients are obtained by following some
          statistically constrained paths in the analysis wavelet tree.




Synthesizing Sound Textures through Wavelet Tree Learning (7)


    Wavelet MRA Tree.
        Using a Daubechies wavelet, an analysis tree is built
        (see the sketch below).
           The Daubechies wavelet has been chosen because
           “this wavelet has several superior properties compared to
           other orthonormal wavelets, especially with respect to
           translation and rotation invariance, aliasing, and robustness
           due to its nonorthogonality and redundancy” (?).
        Each MRA tree node stores the coefficients of the
        Daubechies wavelet at a specific scale.
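A brief sketch of how such an MRA tree could be laid out with the PyWavelets package (assumed here); the `db4` wavelet, the number of levels, and the parent/child indexing convention are illustrative choices, not specifics from [1].

```python
import pywt

def wavelet_mra_tree(x, wavelet="db4", levels=6):
    """Dyadic MRA layout from a Daubechies decomposition (PyWavelets assumed).

    `details[k]` holds the coefficients of level k (coarse to fine); node j of
    one level has the two coefficients 2j and 2j+1 of the next finer level as
    children, ignoring the extra boundary coefficients the filters introduce.
    """
    coeffs = pywt.wavedec(x, wavelet, level=levels)
    approx, details = coeffs[0], coeffs[1:]               # approximation + details
    def children(level, index):
        # the two coefficients at the next finer scale covering the same time span
        return details[level + 1][2 * index], details[level + 1][2 * index + 1]
    return approx, details, children
```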




Synthesizing Sound Textures through Wavelet Tree Learning (8)


    Learning.
        Each coefficient depends on its scale ancestor
        (upper level) and temporal predecessors (those to its
        left).
        Using an algorithm by El-Yaniv [1], the conditional
        probability along the tree path (scale) can be learnt.
        A second learning step is performed on the neighboring
        nodes (time) to preserve the temporal structure.




Synthesizing Sound Textures through Wavelet Tree Learning (9)


    Synthesizing.
        Thus, the signal can be viewed as a collection of
        paths from the root of the tree toward the leaves.
        The goal is to generate a new tree whose paths are
        typical sequences generated by the same source, by
        creating new (candidate) children nodes for each node v_i.
           First the algorithm copies the root and the level-1 nodes
           into the new tree.
           Now assume that the first i levels of the new tree have already
           been generated. To generate the next level, two children nodes
           must be added to each node v_i in level i.
           The algorithm searches among all nodes at the i-th level of the
           analysis tree for nodes w_i with maximal-length ε-similar
           (El-Yaniv; ε is a user threshold) path suffixes
           w_{i−1}, w_{i−2}, . . . , w_j.

Synthesizing Sound Textures through Wavelet Tree Learning (10)


    Synthesizing.
           Among these candidates, the algorithm looks for nodes whose
           k-th predecessors (k is a user parameter; the nodes to the left
           in the same level) resemble those of v_i's children.
           The algorithm then randomly chooses a candidate and copies its
           values into the newly created children of v_i (see the sketch below).
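A heavily reduced sketch of one synthesis step. The ε-test, the neighbor comparison, and the candidate pruning below are crude stand-ins for the maximal-suffix search of El-Yaniv described in [1], so this illustrates the flavor of the procedure rather than the actual algorithm.

```python
import numpy as np

def synthesize_next_level(new_levels, src_levels, level, eps=0.1, k=4, rng=None):
    """One synthesis step, heavily reduced for illustration.

    `src_levels` are the analysis coefficients per level, `new_levels` the
    levels generated so far. For each node we gather source nodes with an
    eps-close value, keep those whose k left neighbors resemble ours, and
    copy the children of a randomly chosen candidate.
    """
    rng = rng or np.random.default_rng()
    parents, src = new_levels[level], src_levels[level]
    children = np.empty(2 * len(parents))
    scale = np.abs(src).mean() + 1e-12
    for i, v in enumerate(parents):
        cand = np.flatnonzero(np.abs(src - v) <= eps * scale)
        if len(cand) == 0:
            cand = np.array([np.argmin(np.abs(src - v))])
        def neighbor_cost(j):
            a, b = src[max(0, j - k):j], parents[max(0, i - k):i]
            n = min(len(a), len(b))
            return float(np.sum((a[len(a) - n:] - b[len(b) - n:]) ** 2)) if n else 0.0
        best = sorted(cand, key=neighbor_cost)[: max(1, len(cand) // 4)]
        j = min(int(rng.choice(best)), len(src_levels[level + 1]) // 2 - 1)
        # copy the chosen candidate's two children into the new tree
        children[2 * i: 2 * i + 2] = src_levels[level + 1][2 * j: 2 * j + 2]
    new_levels.append(children)
    return new_levels
```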




Synthesizing Sound Textures through Wavelet Tree Learning (11)


    Comments.
       Theoretical approach: very interesting mathematical
       background.
        Experimental results: silence gaps and pattern
        repetitions.
       Results with images are better, but probably because
       image perception is different from audio perception.




Sounds examples

   Synthesizing Sound Textures through Wavelet Tree
   Learning.
     Baby crying.
     Shores.
     Traffic jam.
      These textures have a strong rhythmical or temporal articulation.
      All the examples show the same problem: the macro-tiles
      seem to be generated by the same set of “randomly” chosen
      coefficients, resulting in unnatural-sounding repetitions.




Sound Texture Modelling with CTFLP (1)

   Definition by Athineos and Ellis, Columbia
   University, U.S.A. [5]:
       “. . . we look at a third class of sounds we call sound
      textures that are distinct from speech and music.”

       “. . . textures should have an undetermined extent
      (duration) with consistent properties (at some level),
      and be readily identifiable from a small sample.”




Sound Texture Modelling with CTFLP (2)

       Idea: model the texture as rapidly modulated noise
       by using two linear predictors in cascade (*).
           The first, operating in the time domain, is a normal
           LPC analysis and captures the spectral envelope.
           The second, in the frequency domain (operating on
           the residual of the previous LPC analysis), captures
           the time envelope, i.e. the temporal structure.
       Textures can be synthesized from filtered Gaussian
       noise, which feeds the cascade of filters whose
       coefficients were obtained from the analysis of the
       original texture sample (see the sketch below).

  (*) A nearly identical idea can be found in [7].
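A rough two-stage LPC sketch inspired by this cascade; the LPC orders, the autocorrelation-method solver, and the use of the real FFT in place of the paper's exact frequency-domain transform are assumptions, so this is a sketch of the idea, not the CTFLP implementation of [5].

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """Autocorrelation-method LPC; returns the analysis filter A(z), a[0] = 1."""
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    coefs = solve_toeplitz((r[:-1], r[:-1]), r[1:])        # normal equations
    return np.concatenate(([1.0], -coefs))

def ctflp_analyze(x, t_order=20, f_order=40):
    """Two LPC stages in cascade, loosely following the slide."""
    a_time = lpc(x, t_order)                               # spectral envelope
    residual = lfilter(a_time, [1.0], x)                   # whitened residual
    spec = np.fft.rfft(residual).real                      # frequency-domain signal (simplified)
    a_freq = lpc(spec, f_order)                            # temporal envelope
    return a_time, a_freq

def ctflp_synthesize(a_time, a_freq, n, rng=None):
    """Drive the two inverse (all-pole) filters with Gaussian noise."""
    rng = rng or np.random.default_rng()
    noise_spec = lfilter([1.0], a_freq, rng.standard_normal(n // 2 + 1))
    excitation = np.fft.irfft(noise_spec, n)               # impose the time envelope
    return lfilter([1.0], a_time, excitation)              # impose the spectral envelope
```

In the actual method the analysis is done frame by frame; here a single frame stands in for the whole texture.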


Sound Texture Modelling with CTFLP (3)




CTFLP analysis (top) and synthesis (bottom) block diagrams.




References             (1)


 [1] Dubnov, S.; Bar-Joseph, Z.; El-Yaniv, R.; Lischinski, D.; Werman,
 M.;: Synthesizing sound textures through wavelet tree learning.,
 Computer Graphics and Applications, IEEE , Volume: 22 , Issue: 4,
 pp. 38-48, (July-Aug. 2002).
 [2] Dubnov, S.; Tishby, N.;: Analysis of sound textures in musical and
 machine sounds by means of higher order statistical features.,
 Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997
 IEEE International Conference on , Volume: 5, pp. 3845-3848,
 (21-24 April 1997).
 [3] Parker, J.R.; Behm, B.;: Creating audio textures by example: tiling
 and stitching., Acoustics, Speech, and Signal Processing, 2004.
 Proceedings. (ICASSP ’04). IEEE International Conference on ,
 pp:iv-317 - iv-320 vol.4, (17-21 May 2004).



References             (2)


 [4] Norris, M.; Denham, S.;: Sound texture detection using Self
 Organizing Maps., Centre for Theoretical and Computational
 Neuroscience, University of Plymouth, UK, (Nov 2003).
 [5] Athineos, M.; Ellis, D.P.W.;: Sound texture modelling with linear
 prediction in both time and frequency domains., Acoustics, Speech,
 and Signal Processing, 2003. Proceedings. (ICASSP ’03). 2003
 IEEE International Conference on , Volume: 5, pp. 648-51, (6-10
 April 2003).
 [6] Z. Bar-Joseph et al.;: Texture Mixing and Texture Movie Synthesis
 Using Statistical Learning., IEEE Trans. Visualization and Computer
 Graphics, vol. 7, no. 2, pp. 120-135, (Apr.-Jun. 2001).
 [7] Zhu, X.L.; Wyse, L.;: Sound texture modeling and time-frequency
 LPC., Proceedings of the Conf. on Digital Audio Effects (DAFX-04),
 Naples, Italy, (5-8 October 2004).

