1 
Box + Solr = Content Search for 
Business 
June 2014
2 
Wei Zhao 
Box backend engineer 
wzhao@box.com
3 
Box mission 
to make organizations more productive, 
competitive and collaborative by connecting 
people and their most important information
4 
25MM+ 
Users 
225K+ 
Businesses 
99% 
Fortune 500
5 
Box search mission is to make user content 
easy to discover.
6 
10Billion+ 
Documents 
10TB+ 
Index size 
100M+ 
Daily requests 
Box uses Solr for search
7 
Quick Search
8 
Quick Search
9 
Full Search
10 
Agenda 
1 Sharding – splitting the index 
2 Highly available search 
3 A few more things 
4 Currently working on 
5 Q&A 
11 
We shard things
12 
Shard ID = File ID % Total Shards
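The routing rule on this slide can be written as a one-line function. This is a minimal sketch, not Box's code: the function name `shard_for` and the shard count of 8 are assumptions made for illustration, since the talk does not state the real number of shards.

```python
def shard_for(file_id: int, total_shards: int) -> int:
    """Return the shard index for a file: Shard ID = File ID % Total Shards."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# Hypothetical shard count, chosen only for this example.
TOTAL_SHARDS = 8

# File ID 12345 is the example ID from the "typical Solr document" slide.
shard_id = shard_for(12345, TOTAL_SHARDS)  # 12345 % 8 == 1
```

One consequence of modulo routing is that changing the total number of shards changes the shard assignment of most files, so resizing the cluster implies moving or reindexing documents.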
13 
Multi-tenant – One big logical index for all users 
Solr index 
Shard1 Shard2 Shard3 ShardN
14 
Search scope
15 
A typical Solr Document 
File ID: 12345 
OwnerID: user1 
Parent Folder IDs: folder1, folder2 
File Name: Solr.ppt 
File Content: blah ......
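As a rough illustration, the document on this slide could be serialized for Solr's JSON update format, which accepts an array of documents. The field names below (`file_id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) are assumptions derived from the slide labels; the actual schema is not shown in the talk.

```python
import json


def build_solr_document(file_id, owner_id, parent_folder_ids, file_name, file_content):
    """Build one document as a plain dict. Field names are hypothetical."""
    return {
        "file_id": file_id,
        "owner_id": owner_id,
        # Multi-valued field: one entry per parent folder listed on the slide.
        "parent_folder_ids": list(parent_folder_ids),
        "file_name": file_name,
        "file_content": file_content,
    }


doc = build_solr_document(12345, "user1", ["folder1", "folder2"], "Solr.ppt", "blah")

# JSON body that could be posted to a Solr update handler.
payload = json.dumps([doc])
```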
16 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: Folder1 Folder4
17 
User1 with no shared folder 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: Folder1 Folder4
18 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: Folder1 Folder4
19 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder2 
Owner: User1 
Parent: Folder1 Folder4
20 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder5 
Owner: User1 
Parent: Folder1 Folder4 
Removed out of Folder2
21 
User2 shares Folder2 with User1 
File 1 File 2 
Owner: User1 
Parent: Folder1 
File 3 File 4 
Owner: User2 
Parent: Folder3 
Owner: User2 
Parent: Folder5 
Owner: User1 
Parent: Folder1 Folder4 
Removed out of Folder2
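The search-scope slides suggest that a user's results are limited to files they own plus files under folders shared with them. The sketch below shows one way such a scope could be expressed as a Solr filter string, with an in-memory check that mirrors it. The field names and both helper functions are assumptions for illustration; the talk does not show the real filter.

```python
def scope_filter(user_id, shared_folder_ids):
    """Build a Solr-style filter: owned by the user OR under a shared folder."""
    clauses = [f'owner_id:"{user_id}"']
    if shared_folder_ids:
        folders = " OR ".join(f'"{f}"' for f in sorted(shared_folder_ids))
        clauses.append(f"parent_folder_ids:({folders})")
    return " OR ".join(clauses)


def is_visible(doc, user_id, shared_folder_ids):
    """In-memory equivalent of scope_filter, used to walk through the slides."""
    owned = doc["owner_id"] == user_id
    shared = bool(set(doc["parent_folder_ids"]) & set(shared_folder_ids))
    return owned or shared


# A file owned by User2 that sits in Folder2.
doc = {"owner_id": "User2", "parent_folder_ids": ["Folder2"]}

before_share = is_visible(doc, "User1", set())        # User1 has no shared folder
after_share = is_visible(doc, "User1", {"Folder2"})   # User2 shares Folder2 with User1

# The file is moved out of Folder2 into Folder5; only the document changes.
moved = {"owner_id": "User2", "parent_folder_ids": ["Folder5"]}
after_move = is_visible(moved, "User1", {"Folder2"})
```

Keeping owner and parent folder IDs on each document means a share only changes the filter built for the user, while a move only changes the affected documents.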
22 
Highly Available Search
23 
• Index is highly available 
• Search functionality is highly available
24 
Index workflow
25 
[Diagram: Box Front End, Upload, Index Queue, Queue 1, Queue 2, Queue 3, Indexer 1, Indexer 2, Indexer 3, MySQL, Index1, Index2, Index2]
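One plausible reading of this diagram is that each upload event is fanned out from a single index queue to several per-index queues, with one indexer draining each queue into its own copy of the index. That reading is an assumption, as are the names `IndexFanOut` and `run_indexer`. The sketch uses in-process queues and dicts in place of real queues and Solr indexes.

```python
from queue import Queue


class IndexFanOut:
    """Copy every index event to one queue per index copy (assumed design)."""

    def __init__(self, copies):
        self.queues = [Queue() for _ in range(copies)]

    def publish(self, event):
        for q in self.queues:
            q.put(event)


def run_indexer(q, index):
    """Drain one queue into one index copy, modelled as a dict keyed by file ID."""
    while not q.empty():
        event = q.get()
        index[event["file_id"]] = event


fan_out = IndexFanOut(copies=3)
fan_out.publish({"file_id": 1, "file_name": "a.txt"})
fan_out.publish({"file_id": 2, "file_name": "b.txt"})

indexes = [{} for _ in fan_out.queues]
for q, index in zip(fan_out.queues, indexes):
    run_indexer(q, index)
```

With separate queues, a slow or failed indexer delays only its own index copy, and its queue retains the backlog until it catches up.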
26 
Search workflow
27 
[Diagram: two data centers separated by a "Data center boundary". Labels on each side: Box Front End, query, HA Proxy, Head node, HA Proxy, 1 2 3 N]
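The diagram shows the same query path in two data centers. A client-side view of that redundancy could look like the sketch below, which tries each entry point in order and falls back when one is unreachable. The function name, the `(name, send)` pairs, and the use of `ConnectionError` are assumptions for illustration; the talk does not describe how failover is actually triggered.

```python
def search_with_failover(query, backends):
    """Try each (name, send) backend in order; return the first successful result."""
    last_error = None
    for name, send in backends:
        try:
            return name, send(query)
        except ConnectionError as exc:
            # Remember the failure and move on to the next data center.
            last_error = exc
    raise RuntimeError("all search backends failed") from last_error


def primary_down(query):
    """Stand-in for an unreachable entry point in the first data center."""
    raise ConnectionError("primary data center unreachable")


def secondary_ok(query):
    """Stand-in for a healthy entry point in the second data center."""
    return [f"result for {query}"]


served_by, results = search_with_failover(
    "solr", [("dc1", primary_down), ("dc2", secondary_ok)]
)
```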
28 
A few more things
29 
File Content Search
30 
[Diagram: Box Front End, Upload, MySQL, Box File Storage, Indexer, Solr Index, Text Extraction, Extracted Text]
31 
Multi-language support
32 
[Diagram: Raw file content, Language detector, English tokenizer, Spanish tokenizer, Japanese tokenizer, German tokenizer, file_content_en, File_content_es {hola}, file_content_ja, File_content_de]
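The diagram suggests that detected language decides which per-language content field receives the text, so that each field can use its own tokenizer. The sketch below shows only the routing step. `detect_language` is a toy stand-in invented for this example, not a real detector, and the lowercase field naming `file_content_<lang>` follows the labels in the diagram.

```python
def detect_language(text: str) -> str:
    """Toy detector for illustration only; a real system would use a proper library."""
    # Hiragana and Katakana code points indicate Japanese.
    if any("\u3040" <= ch <= "\u30ff" for ch in text):
        return "ja"
    words = set(text.lower().split())
    if words & {"hola", "el", "la", "que"}:
        return "es"
    if words & {"der", "das", "und", "nicht"}:
        return "de"
    return "en"


def content_field(text: str) -> dict:
    """Place the text in the field whose analyzer matches its language."""
    lang = detect_language(text)
    return {f"file_content_{lang}": text}


spanish = content_field("hola mundo")
japanese = content_field("こんにちは")
english = content_field("hello world")
```

Routing a whole document to a single field is also why mixed-language documents are harder to support: one detected language picks one tokenizer for all of the text.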
33 
To Dos 
• Scale language support 
• Support documents with mixed languages
34 
Search Warm-up
35 
• Front end informs backend to warm up on keyboard focus 
• Backend prepares the search filter and caches it in a search session 
• Backend sends a warm-up query to Solr
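The warm-up steps can be sketched as a small session object: on keyboard focus the backend builds the user's filter once, caches it, and sends a warm-up query; the real search then reuses the cached filter. The class and method names are hypothetical, and using Solr's match-all query `*:*` as the warm-up query is an assumption, since the talk does not say what the warm-up query contains.

```python
class SearchSession:
    """Caches each user's search filter so the first real query starts warm."""

    def __init__(self, build_filter, send_query):
        self._build_filter = build_filter  # expensive: resolves the user's scope
        self._send_query = send_query      # sends (query, filter) to Solr
        self._filters = {}

    def warm_up(self, user_id):
        """Called when the front end reports keyboard focus on the search box."""
        self._filters[user_id] = self._build_filter(user_id)
        # Assumed warm-up query: match-all, restricted by the user's filter.
        self._send_query("*:*", self._filters[user_id])

    def search(self, user_id, text):
        """Run a real search, reusing the cached filter when one exists."""
        flt = self._filters.get(user_id)
        if flt is None:
            flt = self._filters[user_id] = self._build_filter(user_id)
        return self._send_query(text, flt)
```

A caller would invoke `warm_up` from the focus event handler and `search` when the user submits a query. If the focus event never arrives, `search` still works and simply pays the filter-building cost itself.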
36 
What we are working on
37 
• Search suggestions 
• Search operators 
• Use machine learning to influence ranking 
• Logical sharding 
Things we are working on
38 
Questions?
39 
We are hiring! 
Contact: wzhao@box.com
Box + Solr = Content Search for Business

SFBay Area Solr Meetup - June 18th. "Box + Solr = Content Search for Business" - Wei Zhao, Box.

  • markpeng

    Dec. 22, 2014
