What Shazam doesn't want you to know

This is the presentation I gave at Devoxx 2011 called: "What Shazam doesn't want you to know"

  1. What Shazam doesn't want you to know (Roy van Rijn, Software Craftsman)
  2. Imagine this...
  3. Friday afternoon
  4. Getting inspiration
  5. Singing along...
  6. Meet Stewie!
  7. Music matching
  8. Music matching:
      - Shazam is magic... alien technology!
      - Start the application.
      - Let it listen for about 20 seconds.
      - It tells you which song is playing.
  11. Saturday morning
  12. The microphone: specifying the audio format:

          private AudioFormat getFormat() {
              float sampleRate = 44100;     // samples per second
              int sampleSizeInBits = 8;     // one signed byte per sample
              int channels = 1;             // mono
              boolean signed = true;
              boolean bigEndian = true;     // byte order is irrelevant for 8-bit samples
              return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
          }
  13. The microphone: getting access to it:

          final AudioFormat format = getFormat();
          DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
          // getLine and open both throw the checked LineUnavailableException
          final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
          line.open(format);
          line.start();
  14. The microphone: reading the sound:

          // 'out' (a ByteArrayOutputStream) and 'running' are fields; set running = false
          // elsewhere (e.g. from another thread) to stop recording
          byte[] buffer = new byte[1024]; // read buffer; the size is an illustrative choice
          out = new ByteArrayOutputStream();
          running = true;
          try {
              while (running) {
                  int count = line.read(buffer, 0, buffer.length);
                  if (count > 0) {
                      out.write(buffer, 0, count);
                  }
              }
              out.close();
          } catch (IOException e) {
              throw new RuntimeException(e);
          }
  15. Inside the byte array: 0 0 0 1 2 5 0 -1 -3 -4 -5 -2 0 1 2 0 2 (etc.)
  16. Plotting a graph
  17. The human ear: membrane, cochlea, brain
  18. Frequencies...?
      - Hertz (Hz): the number of cycles per second.
      - One cycle is one full period of a sine wave.
  21. Time vs. frequency: to match music we need frequencies, not waves.
  22. Fourier Transformation
  23. Fourier Transformation
  24. Fourier Transformation
  25. Fourier Transformation
  26. Fourier Transformation
  27. Fourier Transformation: excellent explanation by Stuart Riffle (http://altdevblogaday.com)
  28. Frequency domain: we've lost track of time!
  29. Windowing: the solution is to apply the transformation to small pieces (slices) of the audio:

          byte[] audio = out.toByteArray();
          final int amountSlices = audio.length / SLICE_SIZE; // leftover samples are dropped
          Complex[][] results = new Complex[amountSlices][];
          for (int slice = 0; slice < amountSlices; slice++) {
              // Copy one slice of samples into complex numbers with imaginary part 0
              Complex[] complex = new Complex[SLICE_SIZE];
              for (int i = 0; i < SLICE_SIZE; i++) {
                  complex[i] = new Complex(audio[(slice * SLICE_SIZE) + i], 0);
              }
              // Spectrum of this slice; a radix-2 FFT needs SLICE_SIZE to be a power of two
              results[slice] = FFT.fft(complex);
          }
  30. Spectrum analyzer: from Wikipedia: "A spectrum analyzer or spectral analyzer is a device used to examine the spectral composition of some electrical, acoustic, or optical waveform."
  31. Demo: Aphex Twin
  32. Sunday
  33. Matching the song: determine the key points (in ranges). Each row holds the key points of one slice:

          34  41   92  129  186
          39  41  117  130  218
          40  42  106  129  191
          40  47  117  121  217
          40  53   81  129  208
          40  48  109  132  260
          39  45   89  135  247
          40  42   84  125  251
          40  41   81  121  232
          38  42  113  131  245
          (etc.)
  44. Something to match against: playing/decoding MP3 files:
      - JLayer (real-time MP3 decoder): jl1.0.1.jar
      - MP3SPI (Java plugin, based on JLayer): mp3spi1.9.4.jar
      - Tritonus (implementation of the Java Sound API): tritonus_share.jar
  45. Something to match against: harvesting my music collection:

          public void harvest(File rootDirectory) {
              String[] itemsInDirectory = rootDirectory.list();
              if (itemsInDirectory == null) {
                  return; // not a directory, or it could not be read
              }
              for (String itemInDirectory : itemsInDirectory) {
                  File item = new File(rootDirectory, itemInDirectory);
                  if (itemInDirectory.endsWith(".mp3")) {
                      // Assume mp3 file; captureAudio is not shown on the slides
                      captureAudio(item);
                  } else if (item.isDirectory()) {
                      // Directory? Recurse!
                      harvest(item);
                  }
              }
          }
  46. What we have now:
      - A set of about 3000 reference files (songs).
      - A way of capturing key moments with the microphone.
      Let's do some matching!
  48. Hash function: create a single hash per slice:

          private static final int FUZ_FACTOR = 2;

          private long hash(String line) {
              String[] p = line.split("\t");
              long p1 = Long.parseLong(p[0]);
              long p2 = Long.parseLong(p[1]);
              long p3 = Long.parseLong(p[2]);
              long p4 = Long.parseLong(p[3]);
              // long p5 = Long.parseLong(p[4]); // Not using the fifth point currently
              // Round each point down to a multiple of FUZ_FACTOR to tolerate small jitter,
              // then pack the four points into one long: p1 lowest, p4 highest
              return (p4 - (p4 % FUZ_FACTOR)) * 100000000
                   + (p3 - (p3 % FUZ_FACTOR)) * 100000
                   + (p2 - (p2 % FUZ_FACTOR)) * 100
                   + (p1 - (p1 % FUZ_FACTOR));
          }
  49. Matching algorithm #1:
      - Load all the reference hashes.
      - Listen to the microphone and generate hashes.
      - Find all matching hashes.
      - Return the reference song with the most hits.
      This worked (a bit), but produced a lot of mis-hits. How can we improve this?
  54. Matching algorithm #2:
      - Microphone sample #1 matches with:
        - Song 4, sample #4
        - Song 6, sample #9
      - Microphone sample #2 matches with:
        - Song 4, sample #6
        - Song 6, sample #10
        - Song 8, sample #4
      - Microphone sample #3 matches with:
        - Song 4, sample #5
        - Song 5, sample #2
  59. Matching algorithm #2: for each hit, compute the offset (reference sample minus microphone sample):
      - Microphone sample #1 matches with:
        - Song 4, sample #4: 4 - 1 = 3
        - Song 6, sample #9: 9 - 1 = 8
      - Microphone sample #2 matches with:
        - Song 4, sample #6: 6 - 2 = 4
        - Song 6, sample #10: 10 - 2 = 8
        - Song 8, sample #4: 4 - 2 = 2
      - Microphone sample #3 matches with:
        - Song 4, sample #5: 5 - 3 = 2
        - Song 5, sample #2: 2 - 3 = -1
  64. Matching algorithm #2: now we group the results:

          2x: Song 6 with offset 8
          1x: Song 4 with offset 2
          1x: Song 4 with offset 3
          1x: Song 4 with offset 4
          1x: Song 8 with offset 2
          1x: Song 5 with offset -1

      Song 6 wins: two microphone samples line up with it at the same offset, while every other (song, offset) pair has a single hit.
  65. Recap:
      - Music matching (Shazam, SoundHound) isn't magic.
      - Searching can be done very fast:
        - Searching for hashes: expected O(1) per lookup in a hash table
        - Align and match
      - The Fourier Transformation is like a spirograph.
  68. Demo
  69. Other uses for this algorithm:
      - Speech recognition? Probably not.
      - Detecting duplicate songs in your music collection? Yes! A crude implementation took 5 minutes.
      - Subtitle synchronisation in India!
  70. Email from Landmark Digital Services:

      ... Landmark Digital Services owns the patents that cover the algorithm used as the basis for your recently posted “Creating Shazam In Java”. While it is not Landmark’s intention to alienate those in the Open Source and Music Information Retrieval community, Landmark must request that you do not ship, deploy or post the code presented in your post. Landmark also requests that in the future you do not ship, deploy or post any portions or versions of this code in its current state or in any modified state. We hope you understand our position and that we would be legally remiss not to make this request. We appreciate your immediate attention and response. ...
      Landmark Digital Services
  71. Getting information: after this email I contacted:
      - Arnoud Engelfriet (Dutch IT lawyer, patent attorney)
      - Free Software Foundation
      - And others.
  74. Now the blogpost? From another email:

      As I'm sure you are aware, your blogpost may be viewed internationally. As a result, you may contribute to someone infringing our patents in any part of the world. While we trust your good intentions, yes, we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm.
  75. No way... My reply was short and concise:

      I'm sorry, I can't comply. The blogpost will absolutely not be removed. Good luck.
  77. Questions?
  78. Any improvements? There are a couple I haven't tried:
      - Non-linear time scales
      - Better 'important'-point selection (not static ranges)
      - HTML5 recording, processing with JavaScript
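Slides 18–20 define frequency in hertz as the number of cycles per second. To make that concrete, here is a minimal Java sketch, assuming the 44100 Hz, 8-bit signed, mono format from slide 12: it synthesizes a pure tone into a byte array shaped like the one the microphone code fills, and counts upward zero crossings as a rough cycle counter. `ToneGenerator`, its methods, and the amplitude of 100 are illustrative choices, not code from the talk.

```java
// Hypothetical helper: synthesizes a pure sine tone in the 8-bit signed mono format of slide 12.
final class ToneGenerator {
    static final double SAMPLE_RATE = 44100.0; // samples per second, as in getFormat()

    /**
     * Returns 'count' samples of a sine wave at 'frequencyHz'.
     * Amplitude 100 keeps every sample inside the signed byte range [-128, 127].
     */
    static byte[] sine(double frequencyHz, int count) {
        byte[] samples = new byte[count];
        for (int n = 0; n < count; n++) {
            double t = n / SAMPLE_RATE; // time of sample n, in seconds
            double value = Math.sin(2 * Math.PI * frequencyHz * t);
            samples[n] = (byte) Math.round(100 * value);
        }
        return samples;
    }

    /** Counts negative-to-non-negative transitions; for a clean tone this approximates the cycles. */
    static int cycles(byte[] samples) {
        int crossings = 0;
        for (int n = 1; n < samples.length; n++) {
            if (samples[n - 1] < 0 && samples[n] >= 0) {
                crossings++;
            }
        }
        return crossings;
    }
}
```

One second (44100 samples) of a 440 Hz tone gives about 440 upward zero crossings. Real microphone input is far noisier than this, which is why the talk moves to the frequency domain instead of inspecting the waveform.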
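Slides 21–28 argue that matching needs frequencies rather than waveforms, and introduce the Fourier transformation. The talk's code calls an FFT; the hedged sketch below swaps in a plain O(N²) discrete Fourier transform because it fits in a few lines and computes the same spectrum, only more slowly. It shows that a tone completing exactly k cycles in a window of N samples appears as a peak in bin k. `NaiveDft` and its methods are illustrative names.

```java
// Naive discrete Fourier transform, O(N^2): an illustrative stand-in for the FFT in the talk.
final class NaiveDft {
    /** Magnitude |X_k| of every frequency bin k = 0..N-1 of a real-valued signal. */
    static double[] magnitudes(double[] signal) {
        int n = signal.length;
        double[] result = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                // X_k = sum over t of x_t * e^(-2*pi*i*k*t/N)
                double angle = -2 * Math.PI * k * t / n;
                re += signal[t] * Math.cos(angle);
                im += signal[t] * Math.sin(angle);
            }
            result[k] = Math.hypot(re, im);
        }
        return result;
    }

    /** Strongest bin among 0..N/2; for real input the upper half mirrors the lower half. */
    static int loudestBin(double[] magnitudes) {
        int best = 0;
        for (int k = 1; k <= magnitudes.length / 2; k++) {
            if (magnitudes[k] > magnitudes[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

For 64 samples of a sine that completes 5 cycles, `loudestBin(magnitudes(signal))` returns 5. With real audio, bin k corresponds to k × sampleRate / N Hz, so with 44100 Hz audio and a hypothetical slice of 4096 samples each bin is about 10.8 Hz wide.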
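The listing on slide 33 reduces each slice to a handful of key points chosen within frequency ranges. The slides do not show how, so the following is an assumption-labeled sketch: for each range of FFT bins, keep the bin with the largest magnitude, then write the points as one tab-separated line, the kind of line a line-based hash like the one on slide 48 parses. The range boundaries are hypothetical parameters; slide 78 lists "better 'important'-point selection (not static ranges)" as an open improvement, which suggests the talk used fixed ranges.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch: reduce one slice's spectrum to one "key point" per frequency range.
// The range boundaries are supplied by the caller; any concrete values are hypothetical.
final class KeyPoints {
    /**
     * For each range [bounds[r], bounds[r + 1]) returns the bin with the largest magnitude.
     * bounds must be strictly ascending, with the last bound at most magnitudes.length.
     */
    static int[] strongestPerRange(double[] magnitudes, int[] bounds) {
        int[] points = new int[bounds.length - 1];
        for (int r = 0; r < points.length; r++) {
            int best = bounds[r];
            for (int bin = bounds[r] + 1; bin < bounds[r + 1]; bin++) {
                if (magnitudes[bin] > magnitudes[best]) {
                    best = bin;
                }
            }
            points[r] = best;
        }
        return points;
    }

    /** Formats the key points of one slice as a single tab-separated line. */
    static String toLine(int[] points) {
        return Arrays.stream(points)
                .mapToObj(p -> Integer.toString(p))
                .collect(Collectors.joining("\t"));
    }
}
```

With bounds {30, 40, 80, 120, 180, 300} (made up for this example), a spectrum whose strongest bins in those ranges are 35, 50, 100, 150 and 200 produces the line `35\t50\t100\t150\t200`.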
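Matching algorithm #1 (slides 49–53) looks up every microphone hash and returns the song with the most hits. A hedged sketch, assuming songs are identified by int IDs and hashes are longs like those produced on slide 48: a `HashMap` index gives expected O(1) lookups, the "Searching for hashes: O(1)" point from slide 66. `HitCounter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "matching algorithm #1": count hash hits per song, ignoring timing.
// Names and the int song IDs are illustrative assumptions.
final class HitCounter {
    // hash -> IDs of the songs containing that hash; HashMap lookups are expected O(1)
    private final Map<Long, List<Integer>> index = new HashMap<>();

    void addReference(int songId, long[] hashes) {
        for (long h : hashes) {
            index.computeIfAbsent(h, k -> new ArrayList<>()).add(songId);
        }
    }

    /** Returns the song with the most matching hashes, or -1 if no hash matched. */
    int bestMatch(long[] microphoneHashes) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (long h : microphoneHashes) {
            List<Integer> songs = index.getOrDefault(h, Collections.emptyList());
            for (int songId : songs) {
                hits.merge(songId, 1, Integer::sum);
            }
        }
        int best = -1;
        int bestHits = 0;
        for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
            if (e.getValue() > bestHits) {
                best = e.getKey();
                bestHits = e.getValue();
            }
        }
        return best;
    }
}
```

Because this ignores when each hash occurs, songs that merely share common hashes collect hits too, which is the mis-hit problem slide 52 describes.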
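Matching algorithm #2 (slides 54–64) also records where each hash occurs and votes on (song, offset) pairs, where offset = reference sample number minus microphone sample number. A genuine match keeps a constant offset, so its votes pile up on one pair while chance hits scatter. The sketch below is one possible Java rendering, not the talk's code; all names are illustrative and sample numbers are 0-based.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of "matching algorithm #2": vote on (song, time offset) pairs.
// All names are illustrative; sample numbers are 0-based.
final class OffsetMatcher {
    /** One occurrence of a hash inside a reference song. */
    private static final class Occurrence {
        final int songId;
        final int sampleIndex;

        Occurrence(int songId, int sampleIndex) {
            this.songId = songId;
            this.sampleIndex = sampleIndex;
        }
    }

    /** The winning song, the offset its samples line up at, and how many samples agreed. */
    static final class Match {
        final int songId;
        final int offset;
        final int votes;

        Match(int songId, int offset, int votes) {
            this.songId = songId;
            this.offset = offset;
            this.votes = votes;
        }
    }

    // hash -> every (song, sample) where that hash occurs
    private final Map<Long, List<Occurrence>> index = new HashMap<>();

    void addReference(int songId, int sampleIndex, long hash) {
        index.computeIfAbsent(hash, k -> new ArrayList<>()).add(new Occurrence(songId, sampleIndex));
    }

    /** micHashes[i] is the hash of microphone sample i; returns null if nothing matched. */
    Match bestMatch(long[] micHashes) {
        // song -> (offset -> number of microphone samples agreeing on that offset)
        Map<Integer, Map<Integer, Integer>> votes = new HashMap<>();
        for (int i = 0; i < micHashes.length; i++) {
            List<Occurrence> hits = index.getOrDefault(micHashes[i], Collections.emptyList());
            for (Occurrence o : hits) {
                int offset = o.sampleIndex - i;
                votes.computeIfAbsent(o.songId, k -> new HashMap<>()).merge(offset, 1, Integer::sum);
            }
        }
        Match best = null;
        for (Map.Entry<Integer, Map<Integer, Integer>> song : votes.entrySet()) {
            for (Map.Entry<Integer, Integer> entry : song.getValue().entrySet()) {
                if (best == null || entry.getValue() > best.votes) {
                    best = new Match(song.getKey(), entry.getKey(), entry.getValue());
                }
            }
        }
        return best;
    }
}
```

Loading the example from slides 54–64 with every sample number shifted down by one (which leaves the offsets unchanged) and querying the three microphone hashes returns song 6 at offset 8 with 2 votes, the same winner as the grouped result on slide 64.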
