
@IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

Video available at: https://www.youtube.com/watch?v=z4JTjUp3NC0

To scale the building of decision trees on large amounts of Indeed job search data, we created a system called Imhotep. In addition to being a crucial tool for building these machine learning models, Imhotep has proven to be applicable to many different analytics problems. The core of Imhotep is a distributed system that manages the parallel execution of queries across a set of time-sharded inverted indices.

This talk covers Imhotep’s primitive operations that allow us to build decision trees, drill into data, build graphs, and even execute SQL-like queries in IQL (Imhotep Query Language). We will also discuss what makes Imhotep fast, highly available, and fault tolerant.


@IndeedEng: Imhotep - Large Scale Analytics and Machine Learning at Indeed

  1. go.indeed.com/IndeedEngTalks
  2. Imhotep: Large Scale Analytics and Machine Learning at Indeed
  3. Jeff Plaisance, Engineering Manager
  4. I help people get jobs.
  5. Indeed is a Search Engine for Jobs
  6. Indeed is a data-driven organization
  7. Indeed is a data-driven organization. Data-driven organizations need great tools.
  8-9. What does Imhotep allow you to do? ● Decision Tree Building ● Analytics
  10. Indeed’s Analytics Philosophy: analytics systems should be 1. Interactive 2. Not Sampled 3. Not Approximate
  11. Imhotep answers questions: What was the weekly average query time in the last quarter from people doing the query “software”?
  12. Imhotep answers questions: What percent of job search results pages are for page 2 and beyond?
  13. Imhotep answers questions: What are the 5 most common queries in each country?
  14. Total Job Searches From 2014-03-09 to 2014-03-23: ?
  15. Query
  16. Query, Location
  17. Query, Location, Impression
  18. Document: query: “indeed software engineer”, location: “austin”, impressions: 10, clicks: 2, time: 2014-03-17T12:00:00
  19-20. Shard [diagram: one shard holding docs 0 through 14]
  21-22. Server [diagram: one server holding shards 2014/03/02, 2014/03/09, 2014/03/11, 2014/03/12, 2014/03/22, 2014/03/24, each a set of documents]
  23-24. Cluster [diagram: shard 2014-03-02 on Server A, 2014-03-03 on Server B, 2014-03-04 on Server C]
  25. Cluster [diagram: the same servers, with a Client holding a Session across Server A, Server B, and Server C]
  26. Total Job Searches From 2014-03-09 to 2014-03-23: secret
  27. Total Job Searches From 2014-03-09 to 2014-03-23, Per Day [graph over 2014-03-09, 2014-03-16, 2014-03-23]
  28. Metrics ● 64-bit integers ● Exactly one value per doc ● Random access by doc id
  29. Metrics ● Time ● Clicks ● Impressions ● Revenue ● … or anything else that is a number
  30. Groups ● Documents are placed into numbered groups ● Every document starts in group 1 ● Group 0 means “filtered out”
  31. Groups ● Groups are stateful and scoped to a session ● Regroup operations update the group for each doc in a shard
  32. Metric Regroup ● Iterate over the doc_id->metric lookup ● Set group to (value - start) / bucket_width ● Useful for making graphs (buckets on x-axis) [diagram: buckets 1 through 5 spanning start to end]
  33. Get Group Stats ● For each group, sums a metric over all docs in that group
  34. Bucket By Day: 1. Regroup on the time metric 2. Get Group Stats for the count metric (always 1)
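To make these two primitives concrete, here is a minimal single-shard sketch in C. The layout (a `doc_group` array plus one 64-bit metric value per doc) follows the slides, but the names and the +1 group offset are illustrative assumptions, not Imhotep's actual code.

```c
#include <stdint.h>
#include <string.h>

/* Metric regroup: bucket every doc in group 1 by a metric value.
   Docs outside [start, end) go to group 0 ("filtered out"). */
void metric_regroup(int32_t *doc_group, const int64_t *metric, int n_docs,
                    int64_t start, int64_t end, int64_t bucket_width) {
    for (int d = 0; d < n_docs; d++) {
        if (doc_group[d] == 0) continue;            /* already filtered out */
        int64_t v = metric[d];
        if (v < start || v >= end) { doc_group[d] = 0; continue; }
        doc_group[d] = (int32_t)((v - start) / bucket_width) + 1;  /* groups 1..k */
    }
}

/* Get group stats: sum one metric per group. For "count", pass a
   metric that is 1 for every doc. */
void get_group_stats(const int32_t *doc_group, const int64_t *metric,
                     int n_docs, int64_t *sums, int n_groups) {
    memset(sums, 0, (size_t)n_groups * sizeof *sums);
    for (int d = 0; d < n_docs; d++)
        sums[doc_group[d]] += metric[d];
}
```

Bucket By Day is then a metric regroup on the time metric with a bucket_width of one day, followed by group stats on the constant-1 count metric.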
  35. Total Job Searches From 2014-03-09 to 2014-03-23, Per Day [graph over 2014-03-09, 2014-03-16, 2014-03-23]
  36. Total and US Job Searches From 2014-03-09 to 2014-03-23, Per Day [graph over 2014-03-09, 2014-03-16, 2014-03-23]
  37. Inverted Indexes
  38. Inverted Index ● Like the index in the back of a book ● words = terms, page numbers = doc ids ● Term list is sorted ● Doc list for each term is sorted
  39. Standard Index:
      doc id | query    | country | impressions | clicks
      0      | software | Canada  | 10          | 1
      1      | blank    | Canada  | 10          | 0
      2      | sales    | US      | 5           | 0
      3      | software | US      | 8           | 1
      4      | blank    | US      | 10          | 1
  40-41. Constructing an Inverted Index [animation: the doc table transposed into a (field, term) vs. doc id matrix, one checkmark per (term, doc) pair]
  42. Inverted Index:
      field       | term     | doc list
      query       | blank    | 1, 4
      query       | sales    | 2
      query       | software | 0, 3
      country     | Canada   | 0, 1
      country     | US       | 2, 3, 4
      impressions | 5        | 2
      impressions | 8        | 3
      impressions | 10       | 0, 1, 4
      clicks      | 0        | 1, 2
      clicks      | 1        | 0, 3, 4
  43. Inverted Indexes allow you to: ● Quickly find all documents containing a term ● Intersect several terms to perform boolean queries
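Sorted doc lists are what make boolean intersection cheap: an AND of two terms is a linear merge of their lists. A generic sketch (not Imhotep's implementation):

```c
#include <stdint.h>

/* Intersect two sorted doc id lists (AND of two terms).
   Returns the number of doc ids written to out. */
int intersect(const int32_t *a, int na, const int32_t *b, int nb, int32_t *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { out[k++] = a[i]; i++; j++; }   /* doc contains both terms */
    }
    return k;
}
```

With the index above, country:US is {2, 3, 4} and clicks:1 is {0, 3, 4}, so US searches with a click intersect to {3, 4}.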
  44. Lucene ● Open source inverted index implementation ● Reasonably fast ● Widely used, well tested
  45. Global and US Job Searches From 2014-03-09 to 2014-03-23, Per Day [graph over 2014-03-09, 2014-03-16, 2014-03-23]
  46-47. Searches in the US only [the inverted index from slide 42 with the country:US doc list (2, 3, 4) highlighted]
  48. Searches in the US only:
      field   | term   | doc list
      country | Canada | 0, 1
      country | US     | 2, 3, 4
  49. Query Regroup ● Regroup all docs which do not match a boolean query to group zero
  50. Term Regroup: splits the docs in a group into one of two new groups based on presence/absence of a term; group 1 splits into group 2 (country:US) and group 3 (everything else)
  51. Multiterm Regroup: generalization of term regroup to N terms and N+1 new groups; group 1 splits into group 2 (country:US), group 3 (country:CA), group 4 (country:FR), and group 5 (everything else)
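A sketch of how a term regroup can be driven by a term's sorted doc list against the per-doc group array from earlier; the names and signature are mine, as a sketch of the idea rather than Imhotep's internals.

```c
#include <stdint.h>

/* Term regroup: within parent group g, docs on the term's sorted doc
   list move to g_match; all other docs in group g move to g_other. */
void term_regroup(int32_t *doc_group, int n_docs,
                  const int32_t *doc_list, int n_list,
                  int32_t g, int32_t g_match, int32_t g_other) {
    int j = 0;
    for (int d = 0; d < n_docs; d++) {
        if (doc_group[d] != g) continue;
        while (j < n_list && doc_list[j] < d) j++;   /* advance in the doc list */
        doc_group[d] = (j < n_list && doc_list[j] == d) ? g_match : g_other;
    }
}
```

Multiterm regroup generalizes this by walking N doc lists in the same pass and assigning N+1 child groups.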
  52. Total and US Job Searches From 2014-03-09 to 2014-03-23, Per Day [graph over 2014-03-09, 2014-03-16, 2014-03-23]
  53. Inverted Index Compression: size of the Organic dataset for the last 5 months ● Original: 102 TB ● Inverted: 51 TB
  54. Inverted Index Optimizations ● Compressed data structures ○ Better use of RAM and processor cache ○ Better use of memory bandwidth ○ Increased CPU usage and time ● Micro-optimizations matter!
  55. Delta / Varint Encoding ● Doc id lists are sorted ● The delta between a doc id and the previous doc id is sufficient ● Deltas are usually small integers ● Use fewer bits for small integers and more bits for large integers
  56. Delta Encoding: field query, term nursing, doc list 34, 86, 247, 301, 674, 714
  57. Delta Encoding: doc list 34, 86, 247, 301, 674, 714 becomes deltas 34, 52, 161, 54, 373, 40
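The same transformation as two tiny helpers (generic code, not Imhotep's):

```c
#include <stdint.h>

/* Replace each doc id with its delta from the previous one:
   34, 86, 247, 301, 674, 714  ->  34, 52, 161, 54, 373, 40 */
void delta_encode(int64_t *ids, int n) {
    for (int i = n - 1; i > 0; i--)
        ids[i] -= ids[i - 1];
}

/* Prefix-sum the deltas back into absolute doc ids. */
void delta_decode(int64_t *deltas, int n) {
    for (int i = 1; i < n; i++)
        deltas[i] += deltas[i - 1];
}
```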
  58-59. Small Integer Compression ● Golomb/Rice ● Varint ● Bit Packing ● PForDelta
  60-74. Varint Encoding (animation): 9838 in 32-bit binary is 00000000 00000000 00100110 01101110. Emit the low 7 bits (1101110) with the top continuation bit set: 11101110. Then emit the next 7 bits (1001100) with the continuation bit clear: 01001100. So 9838 encodes in two bytes: 11101110 01001100.
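The byte layout on the slides is the standard little-endian varint: 7 payload bits per byte, low bits first, top bit set on every byte except the last. A generic encoder/decoder consistent with those bytes:

```c
#include <stdint.h>

/* Encode v as a varint; returns the number of bytes written (1-10 for 64-bit). */
int varint_encode(uint64_t v, uint8_t *out) {
    int n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v & 0x7F) | 0x80;  /* 7 payload bits + continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;                      /* final byte: continuation bit clear */
    return n;
}

/* Decode one varint; advances *p past it. */
uint64_t varint_decode(const uint8_t **p) {
    uint64_t v = 0;
    int shift = 0;
    uint8_t b;
    do {
        b = *(*p)++;
        v |= (uint64_t)(b & 0x7F) << shift;
        shift += 7;
    } while (b & 0x80);
    return v;
}
```

varint_encode(9838, buf) emits 0xEE 0x4C, i.e. 11101110 01001100, the two bytes from the slides.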
  75. Inverted Index Compression: size of the Organic dataset for the last 5 months ● Original: 102 TB ● Inverted: 51 TB ● Delta / Varint: 17 TB
  76. Flamdex ● Two files per field (terms/docs) ● Can add fields without rebuilding the index ● Faster varint decoding ● No TF or positions (or wasted time decoding them)
  77. Varints. Pros: ● Compression ● Can fit more of the index in RAM ● Higher information throughput per byte read from disk
  78. Varints. Cons: ● Decodes one byte at a time ● Lots of branch mispredictions ● Not fast to decode
  79. Vectorized Varint Decoding: input bytes 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001
  80-82. pmovmskb extracts the top bit of each byte: 010010100111. That 12-bit pattern is looked up in a 4096-entry lookup table.
  83-91. The pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of the varints ● length of the longest varint in bytes ● number of bytes to consume
  92. Decoding options for: ● up to twelve 1-byte varints ● six 1-2 byte varints ● four 1-3 byte varints ● two 1-5 byte varints
  93. Vectorized Varint Decoding ● Decode six 1-2 byte varints in parallel ● Need to pad out all 1-byte varints to 2 bytes ● pshufb: Intel SSSE3 instruction to shuffle bytes
  94-96. Decode 6 varints from 9 bytes: pad the 1-byte ints to 2 bytes, giving 01001010 00000000 11001000 01110001 01001110 00000000 10011011 01101010 10110101 00010111 01110110 00000000
  97-98. Reverse the bytes in each 2-byte varint, giving 00000000 01001010 01110001 11001000 00000000 01001110 01101010 10011011 00010111 10110101 00000000 01110110
  99-100. Mask out the leading (continuation) 1s, giving 00000000 01001010 01110001 01001000 00000000 01001110 01101010 00011011 00010111 00110101 00000000 01110110
  101-102. Shift the top byte of each varint 1 bit right (mask/shift/or), giving 00000000 01001010 00111000 11001000 00000000 01001110 00110101 00011011 00001011 10110101 00000000 01110110
  103. ● ~10 instructions ● No branches ● Less than 2 instructions per varint
  104-105. ● Imhotep spends ~40% of its CPU time decoding varints ● The vectorized decoder is ~3-5x faster ○ Decompresses at 1.5 GB per second ○ ~2x overall system performance
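A compilable sketch of the core trick (C with SSE2/SSSE3 intrinsics; build with -mssse3), using the exact bytes from the slides. Two simplifications versus the slides: the shuffle mask that the 4096-entry table would supply is hand-built here for this one pattern, and each varint is kept little-endian in its 16-bit lane, so a mask-and-shift replaces the byte-reversal step while producing the same six values.

```c
#include <stdio.h>
#include <stdint.h>
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 */

int main(void) {
    /* The 9 bytes consumed on the slides: six varints with leading-bit
       pattern 010010100 (sizes 1, 2, 1, 2, 2, 1), zero-padded to 16. */
    uint8_t in[16] = {0x4A, 0xC8, 0x71, 0x4E, 0x9B, 0x6A,
                      0xB5, 0x17, 0x76, 0, 0, 0, 0, 0, 0, 0};
    __m128i chunk = _mm_loadu_si128((const __m128i *)in);

    /* pmovmskb: extract the top bit of each byte. The real decoder indexes
       the 4096-entry table with the low 12 bits of this mask. */
    int pattern = _mm_movemask_epi8(chunk) & 0xFFF;
    printf("pattern = 0x%03x\n", pattern);   /* 0x052: 010010100 plus zero pad */

    /* The shuffle mask the table would return for this pattern: each varint
       lands in its own 16-bit lane, low byte first; 0x80 entries make
       pshufb emit a zero byte (the 1-byte varint padding). */
    const __m128i shuf = _mm_setr_epi8(
        0, (char)0x80,        /* varint 1: in[0]    */
        1, 2,                 /* varint 2: in[1..2] */
        3, (char)0x80,        /* varint 3: in[3]    */
        4, 5,                 /* varint 4: in[4..5] */
        6, 7,                 /* varint 5: in[6..7] */
        8, (char)0x80,        /* varint 6: in[8]    */
        (char)0x80, (char)0x80, (char)0x80, (char)0x80);
    __m128i lanes = _mm_shuffle_epi8(chunk, shuf);

    /* Drop the continuation bit of each low byte, then shift the high
       byte's 7 payload bits down next to it: value = low7 | (high7 << 7). */
    __m128i low  = _mm_and_si128(lanes, _mm_set1_epi16(0x007F));
    __m128i high = _mm_srli_epi16(_mm_and_si128(lanes, _mm_set1_epi16(0x7F00)), 1);
    __m128i vals = _mm_or_si128(low, high);

    uint16_t out[8];
    _mm_storeu_si128((__m128i *)out, vals);
    for (int i = 0; i < 6; i++)
        printf("%u\n", out[i]);   /* 74, 14536, 78, 13595, 2997, 118 */
    return 0;
}
```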
  106. Top 5 Locations
  107. Term Stats: atlanta 49, austin 14, boston 25, chicago 28, dallas 13, houston 36, new york 68, san francisco 54
  108-109. Term Stats Iterator ● For each term in a field, sum metrics across all docs containing that term ● How do we compute this across many machines?
  110-135. (animation) Each shard produces its terms in sorted order with per-shard sums: shard 1 yields atlanta 16, austin 3, boston 12, dallas 5; shard 2 yields atlanta 12, austin 4, chicago 19, dallas 8; shard 3 yields atlanta 21, austin 7, boston 13, chicago 9. A streaming merge repeatedly takes the smallest remaining term across all streams at once and sums the stats for matching terms: atlanta 16+12+21 = 49, austin 3+4+7 = 14, boston 12+13 = 25, chicago 19+9 = 28, dallas 5+8 = 13.
  136. Term Stats 1-6 [diagram: streams TS 1 through TS 6 merging into one]
  137. TS 1-6, TS 7-12, TS 13-18
  138. TS 1-6, TS 7-12, TS 13-18 merge into Term Stats 1-18
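A sketch of the streaming merge that produces those sums. The struct names are mine, and production code would use a heap-based k-way merge rather than a linear scan over streams; slides 136-138 show the same merge applied as a tree.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One sorted (term, stat) stream per shard. */
typedef struct { const char *term; int64_t stat; } TermStat;
typedef struct { const TermStat *items; int len, pos; } Stream;

void merge_term_stats(Stream *s, int n_streams) {
    for (;;) {
        const char *min_term = NULL;
        for (int i = 0; i < n_streams; i++)      /* find the smallest head term */
            if (s[i].pos < s[i].len &&
                (!min_term || strcmp(s[i].items[s[i].pos].term, min_term) < 0))
                min_term = s[i].items[s[i].pos].term;
        if (!min_term) return;                   /* all streams exhausted */
        int64_t sum = 0;
        for (int i = 0; i < n_streams; i++)      /* sum and advance matching heads */
            if (s[i].pos < s[i].len &&
                strcmp(s[i].items[s[i].pos].term, min_term) == 0)
                sum += s[i].items[s[i].pos++].stat;
        printf("%s %lld\n", min_term, (long long)sum);
    }
}

int main(void) {
    TermStat a[] = {{"atlanta",16},{"austin",3},{"boston",12},{"dallas",5}};
    TermStat b[] = {{"atlanta",12},{"austin",4},{"chicago",19},{"dallas",8}};
    TermStat c[] = {{"atlanta",21},{"austin",7},{"boston",13},{"chicago",9}};
    Stream s[] = {{a,4,0},{b,4,0},{c,4,0}};
    merge_term_stats(s, 3);  /* atlanta 49, austin 14, boston 25, chicago 28, dallas 13 */
    return 0;
}
```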
  139. Amdahl’s Law ● The speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program
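For reference, the usual statement: if a fraction p of the work parallelizes across n processors, the achievable speedup is

```latex
S(n) = \frac{1}{(1-p) + p/n}, \qquad \lim_{n\to\infty} S(n) = \frac{1}{1-p}
```

so however many shards merge in parallel, a single-threaded final merge caps the total speedup.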
  140. Amdahl’s Law ● The sequential part of FTGS (the term/group stats operation) is the last step in the merge ● Can we distribute some part of the final merge?
  141. Hash Partition + Interleave ● Send all stats for each unique term to the same thread, based on a hash of the term ● Interleave the merged terms
  142. TS 1-6, TS 7-12, TS 13-18 merge into Term Stats 1-18 [the tree from slide 138 again, with the final merge now partitioned]
  143. Shard Distribution
  144-173. (animation, illustrating the hash partition + interleave from slide 141) The per-shard term stats are split by term hash across two merge threads: one merges atlanta 16+12+21 = 49, boston 12+13 = 25, dallas 5+8 = 13; the other merges austin 3+4+7 = 14, chicago 19+9 = 28. The two sorted outputs are then interleaved into one stream: atlanta 49, austin 14, boston 25, chicago 28, dallas 13.
  174. Shard Distribution ● Lots of datasets for different event types ● Each dataset is split into one shard per hour or day ● Each shard has 2 replicas for fault tolerance ● How do we assign shards to machines?
  175. Shard Distribution Considerations ● Space ● Load ● Hot Spots ● Adding/removing machines
  176. Homogeneous vs. Heterogeneous Systems ● Must decide how, or if, you will handle heterogeneous hardware ● Cannot balance for both space and load on heterogeneous hardware
  177. Homogeneous vs. Heterogeneous [diagram: a 1 TB machine and a 3 TB machine]
  178-179. Homogeneous vs. Heterogeneous: balancing for space, the 3 TB machine holds 12 shards and the 1 TB machine holds 4 shards, both at 50% capacity used, and the 12-shard machine becomes a read hotspot
  180. Homogeneous vs. Heterogeneous: balancing for load, each machine holds 8 shards, leaving the 3 TB machine at 33% capacity used and the 1 TB machine at 100%: wasted space
  181-182. Hot Spots: when accessing any subset of a dataset, evenly spread the load across CPUs, drives, and network cards. This is hard.
  183-184. Hot Spots: maybe random is good enough? On average about 10% more data is read from the most loaded machine than from the least.
  185-186. Two Choice Randomized Load Balancing ● 2 replicas of each shard to choose from ● Greedily choose the machine that currently has the least load from this client ● On average about 1% more data is read from the most loaded machine than from the least
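A client-side sketch of the two-choice heuristic; the types and the unit-weight load accounting are illustrative assumptions (a real client would likely weight by shard size).

```c
#include <stdint.h>

typedef struct { int replica[2]; } Shard;   /* machine ids of the 2 replicas */

/* For each shard, greedily pick whichever of its two replicas this
   client has assigned the least load to so far. */
void assign_shards(const Shard *shards, int n_shards,
                   int64_t *machine_load, int *choice) {
    for (int i = 0; i < n_shards; i++) {
        int a = shards[i].replica[0], b = shards[i].replica[1];
        int pick = (machine_load[a] <= machine_load[b]) ? a : b;
        machine_load[pick] += 1;            /* or the shard's size in bytes */
        choice[i] = pick;
    }
}
```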
  187. Rendezvous Hashing ● Assignment of a shard to machines is determined only by the machines that exist in the cluster ● Hash all pairs of shard ID and machine ID and pick the largest two
  188. Rendezvous Hashing. Shard ID: organic.2014-03-02T06:00:00. H(Shard ID + m1) = 0.592624, H(Shard ID + m2) = 0.294647, H(Shard ID + m3) = 0.736681, H(Shard ID + m4) = 0.647578, H(Shard ID + m5) = 0.835598
  189-191. Rendezvous Hashing [diagram: the five hashes placed on a number line from 0 to 1; m5 and m3, the two largest, win the shard]
  192. Rendezvous Hashing ● No coordination required: a deterministic algorithm determines assignment ● No centralized storage for the shard-to-machine assignment
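A self-contained sketch of the replica selection; FNV-1a stands in for whatever hash function Imhotep actually uses, and the names are mine.

```c
#include <stdio.h>
#include <stdint.h>

/* Rendezvous (highest random weight) hashing: hash (shard id + machine id)
   for every machine and keep the two largest; those machines own the
   shard's two replicas. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 1469598103934665603ULL;
    for (; *s; s++) { h ^= (uint8_t)*s; h *= 1099511628211ULL; }
    return h;
}

/* Writes the indices of the two winning machines into out[0..1]. */
void pick_replicas(const char *shard_id, const char **machines, int n, int out[2]) {
    uint64_t best = 0, second = 0;
    out[0] = out[1] = -1;
    for (int i = 0; i < n; i++) {
        char key[256];
        snprintf(key, sizeof key, "%s%s", shard_id, machines[i]);
        uint64_t h = fnv1a(key);
        if (out[0] < 0 || h > best) {
            second = best; out[1] = out[0];   /* old winner becomes runner-up */
            best = h; out[0] = i;
        } else if (out[1] < 0 || h > second) {
            second = h; out[1] = i;
        }
    }
}
```

Because the winners depend only on the shard id and the set of machine ids, any client recomputes the same assignment with no coordination, and a newly added machine only takes over the shards for which it now hashes into the top two.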
  203. Rendezvous Hashing: with n machines and uniform hashes in [0, 1], the expected max hash for a shard is n/(n+1)
  204. Rendezvous Hashing: the probability that a newly added machine's hash beats the current maximum, so that it takes over the shard, is 1/(n+1)
  205. Imhotep answers questions: What was the weekly average query time in the last quarter from people doing the query “software”?
  206. 1. Query Regroup on query:software 2. Metric Regroup on time, width 7 days 3. Get Group Stats on query time and count, divide after summing
  207. Ramses [chart of the result]
  208. Imhotep answers questions: What percent of job search results pages are for page 2 and beyond?
  209. 1. Get Group Stats on count 2. Query Regroup on “-page:1” 3. Get Group Stats on count 4. Divide the -page:1 count by the total count
  210. Ramses [chart of the result]
  211. Imhotep answers questions: What are the 5 most common queries in each country?
  212. 1. Multiterm Regroup on all values of country 2. Term Group Stats Iteration on query
  213-217. IQL: select count() from jobsearch ‘2014-01-01’ ‘2014-03-26’ group by country, query[5], annotated on successive slides: count() is the metric, jobsearch is the dataset, group by country is a regroup, and query[5] is a term group stats iteration
  218. Imhotep: Large Scale Analytics and Machine Learning
  219. Imhotep: Large Scale Analytics and Machine Learning ● Varint Decoding: High Performance Vector Instructions ● Stream Merging: Hash Partition + Interleave ● Shard Distribution: Rendezvous Hashing
  220. We’re Open Sourcing Imhotep
  221. How You Can Use Imhotep ● Data Ingestion: TSV Uploader, Hadoop ● Data Access: Imhotep Primitives, IQL
  222. Next @IndeedEng Talk: Large Scale Interactive Analytics with Imhotep. Tom Bergman, Product Manager; Zak Cocos, Manager of Marketing Sciences. April 30, 2014. http://engineering.indeed.com/talks
  223. Q&A
  224. More Questions? David James
