Box + Solr = Content Search for Business
June 2014
Box mission
to make organizations more productive,
competitive and collaborative by connecting
people and their most important information
Box search mission is to make user content
easy to discover.
Agenda
1. Sharding – splitting the index
2. Highly available search
3. A few more things
4. Currently working on
5. Q&A
Multi-tenant – one big logical index for all users
The Solr index is split into shards: Shard1, Shard2, Shard3 … ShardN
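One way to picture a single logical index split into N shards is a router that assigns every document to exactly one shard with a stable hash. This is a minimal sketch under stated assumptions: the routing key (the file ID), the MD5 hash, and the shard count of 4 are all hypothetical, since the slides do not say how documents are placed. For comparison, SolrCloud's own compositeId router hashes the document ID with MurmurHash3 rather than MD5.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; the slide only says Shard1 ... ShardN


def shard_for(doc_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document key (assumed here to be the file ID) to a shard with a stable hash."""
    digest = hashlib.md5(doc_key.encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_query(num_shards: int = NUM_SHARDS) -> list:
    """A user's visible files (own plus shared) may sit on any shard, so a search fans out to all shards."""
    return list(range(num_shards))
```

Because the hash depends only on the key, re-indexing the same file always lands on the same shard. The flip side of one multi-tenant index is that a search cannot be pinned to one shard once users see each other's shared files.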
A typical Solr document
• File ID: 12345
• OwnerID: user1
• Parent Folder IDs: folder1, folder2
• File Name: Solr.ppt
• File Content: blah
• ...
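A document like the one above can be built as a plain dictionary and serialized for Solr's JSON update handler, which accepts a JSON array of documents. The field names (`id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) and the helper `to_solr_doc` are hypothetical stand-ins for whatever Box's schema actually uses.

```python
import json


def to_solr_doc(file_id, owner_id, parent_folder_ids, file_name, file_content):
    """Build one Solr document; field names are hypothetical and mirror the slide."""
    return {
        "id": str(file_id),
        "owner_id": owner_id,
        "parent_folder_ids": list(parent_folder_ids),  # multi-valued field
        "file_name": file_name,
        "file_content": file_content,
    }


doc = to_solr_doc(12345, "user1", ["folder1", "folder2"], "Solr.ppt", "blah")
# The JSON update handler takes a JSON array of documents as the request body.
body = json.dumps([doc])
```

Keeping the parent folder IDs as a multi-valued field is what lets a search filter on folder membership, not just on ownership.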
• File 1 (Owner: User1, Parent: Folder1)
• File 2 (Owner: User2, Parent: Folder3)
• File 3 (Owner: User2, Parent: Folder2)
• File 4 (Owner: User1, Parent: Folder1, Folder4)
User1 with no shared folders
User2 shares Folder2 with User1
User2 shares Folder2 with User1; File 3 is then moved out of Folder2 into Folder5
• File 1 (Owner: User1, Parent: Folder1)
• File 2 (Owner: User2, Parent: Folder3)
• File 3 (Owner: User2, Parent: Folder5; removed out of Folder2)
• File 4 (Owner: User1, Parent: Folder1, Folder4)
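The sharing scenario above comes down to a per-user visibility rule: a file matches if the user owns it or if one of its parent folders is shared with the user. The sketch below models that rule in memory and builds an equivalent Solr filter query string. The field names, the `FileDoc` type, and the helpers are hypothetical, and the four files use the example owners and folders from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class FileDoc:
    file_id: str
    owner_id: str
    parent_folder_ids: list[str] = field(default_factory=list)


def visible(doc: FileDoc, user: str, shared_folders: set[str]) -> bool:
    """Visible if the user owns the file or any of its parent folders is shared with the user."""
    return doc.owner_id == user or any(f in shared_folders for f in doc.parent_folder_ids)


def build_filter_query(user: str, shared_folders: set[str]) -> str:
    """Equivalent Solr fq string; owner_id and parent_folder_ids are hypothetical field names."""
    clauses = [f"owner_id:{user}"]
    if shared_folders:
        clauses.append("parent_folder_ids:(" + " OR ".join(sorted(shared_folders)) + ")")
    return " OR ".join(clauses)


files = [
    FileDoc("file1", "User1", ["Folder1"]),
    FileDoc("file2", "User2", ["Folder3"]),
    FileDoc("file3", "User2", ["Folder2"]),
    FileDoc("file4", "User1", ["Folder1", "Folder4"]),
]
```

With no shares, User1 sees file1 and file4. Once Folder2 is shared, file3 matches too. After file3 moves to Folder5, its indexed parent folder IDs change, so the document has to be reindexed before the filter stops matching it.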
• Index is highly available
• Search functionality is highly available
• Box Front End → Upload → Index Queue
• Index Queue: Queue 1, Queue 2, Queue 3
• Indexers: Indexer 1, Indexer 2, Indexer 3
• MySQL
• Indexes: Index1, Index2, Index3
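One reading of the three queues, three indexers, and three indexes is a fan-out: every index event goes to every queue, and each indexer builds its own copy of the index, so losing one indexer or one index does not stop indexing or search. The one-queue-per-copy wiring is an assumption; the slide shows the components but not how they connect. In this sketch, `publish`, `run_indexer`, and the in-memory dictionaries standing in for Solr indexes are all hypothetical.

```python
import queue
import threading

NUM_COPIES = 3  # Queue 1..3, Indexer 1..3, Index1..3 on the slide

index_queues = [queue.Queue() for _ in range(NUM_COPIES)]
indexes = [dict() for _ in range(NUM_COPIES)]  # stand-ins for independent Solr indexes


def publish(event):
    """Upload path: put the same index event on every queue."""
    for q in index_queues:
        q.put(event)


def run_indexer(n):
    """Indexer n drains only its own queue and writes only its own index copy."""
    q = index_queues[n]
    while True:
        event = q.get()
        if event is None:  # shutdown sentinel
            break
        indexes[n][event["file_id"]] = event


publish({"file_id": "12345", "file_name": "Solr.ppt"})
publish({"file_id": "67890", "file_name": "notes.txt"})
for q in index_queues:
    q.put(None)  # tell each indexer to stop after the queued events

threads = [threading.Thread(target=run_indexer, args=(n,)) for n in range(NUM_COPIES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each copy has its own queue, a slow or failed indexer only falls behind on its own backlog; the other copies stay current.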
Query HA
• Box Front End → (query) → HA Proxy → Head node → shards 1, 2, 3 … N
• The same Box Front End, HA Proxy, head node and shard stack runs on the other side of the data center boundary
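The point of putting an HA Proxy in front of a head node in each data center is that a query still succeeds when one side is unreachable. The sketch below imitates that behavior with client-side failover: it tries each head node in order (local data center first, as an assumption) and returns the first successful result. It is not HAProxy configuration, and `query_with_failover`, `AllHeadNodesDown`, and the fake head nodes are hypothetical.

```python
from typing import Callable, Sequence


class AllHeadNodesDown(Exception):
    pass


def query_with_failover(q: str, head_nodes: Sequence[Callable[[str], list]]) -> list:
    """Try each head node in order and return the first successful result."""
    errors = []
    for send in head_nodes:
        try:
            return send(q)
        except ConnectionError as exc:
            errors.append(exc)  # remember the failure and try the next head node
    raise AllHeadNodesDown(f"{len(errors)} head node(s) failed")


def local_down(q):
    raise ConnectionError("local data center unreachable")


def remote_ok(q):
    return [f"result for {q!r} from remote data center"]


results = query_with_failover("solr", [local_down, remote_ok])
```

Here the local head node fails, so the query is answered by the head node across the data center boundary.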
• Box Front End → Upload → MySQL, Box File Storage
• Indexer → Solr Index
• Text Extraction → Extracted Text
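The components on this slide suggest an indexer that combines file metadata with extracted text before writing a single Solr document. In this sketch, plain dictionaries stand in for MySQL, Box File Storage, and the Solr index, and a callable stands in for the Text Extraction service. The function `index_file` and its parameters are hypothetical, and the order of steps is an assumption.

```python
def index_file(file_id, metadata_db, file_storage, extract_text, solr_index):
    """Hypothetical indexer step: metadata + extracted text -> one Solr document."""
    meta = metadata_db[file_id]   # stand-in for a MySQL lookup
    raw = file_storage[file_id]   # stand-in for reading from Box File Storage
    text = extract_text(raw)      # stand-in for the Text Extraction service
    doc = {"id": file_id, **meta, "file_content": text}
    solr_index[file_id] = doc     # stand-in for a Solr update
    return doc


metadata_db = {"12345": {"owner_id": "user1", "file_name": "Solr.ppt"}}
file_storage = {"12345": b"blah"}
solr_index = {}

doc = index_file("12345", metadata_db, file_storage, lambda raw: raw.decode("utf-8"), solr_index)
```

Keeping extraction as a separate step means the indexer only handles plain text, whatever the original file format was.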
Raw file content → Language detector → per-language tokenizer → per-language field
• English tokenizer → file_content_en
• Spanish tokenizer → file_content_es (e.g. {hola})
• Japanese tokenizer → file_content_ja
• ...
• German tokenizer → file_content_de
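The routing step above detects a document's language and then writes the content into the field whose tokenizer matches that language. The detector here is a toy stop-word counter and a hypothetical stand-in for a real language detector. It skips Japanese, because whitespace splitting does not segment Japanese text. In Solr, each `file_content_*` field would be mapped to a language-specific field type; Solr's sample schemas ship types such as `text_en` and `text_ja`.

```python
# Toy stop-word lists; a real system would use a trained language detector.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "es": {"hola", "el", "la", "y", "es"},
    "de": {"der", "die", "und", "ist"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Pick the language whose stop words appear most often; fall back to the default."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def content_field(text: str) -> dict:
    """Put the raw content into the per-language field, e.g. file_content_es."""
    return {f"file_content_{detect_language(text)}": text}
```

For example, `content_field("hola amigo")` routes the text to `file_content_es`, where the Spanish analyzer would tokenize it.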
To Dos
• Scale language support
• Support documents with mixed languages
• When the search box gets keyboard focus, the front end tells the backend to warm up
• The backend prepares the search filter and caches it in a search session
• The backend sends a warm-up query to Solr
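The warm-up flow above can be sketched as a per-session cache. On focus, the backend builds the user's search filter once, stores it in the session, and sends a cheap query with that filter so Solr can cache it before the real search arrives. The class, its methods, the 300-second TTL, and the `*:*` warm-up query are all assumptions; the fake collaborators stand in for the filter builder and Solr.

```python
import time


class SearchSessionCache:
    """Per-session cache of the prepared search filter (all names are hypothetical)."""

    def __init__(self, build_filter, send_to_solr, ttl_seconds=300.0):
        self.build_filter = build_filter  # callable(user) -> filter query string
        self.send_to_solr = send_to_solr  # callable(q, fq) -> results
        self.ttl = ttl_seconds
        self.sessions = {}                # session_id -> (fq, created_at)

    def _prepare(self, session_id, user):
        fq = self.build_filter(user)
        self.sessions[session_id] = (fq, time.monotonic())
        return fq

    def warm_up(self, session_id, user):
        """Called when the search box gets keyboard focus."""
        fq = self._prepare(session_id, user)
        # A cheap query with the same fq gives Solr a chance to cache the filter result.
        self.send_to_solr("*:*", fq)

    def search(self, session_id, user, q):
        entry = self.sessions.get(session_id)
        if entry is None or time.monotonic() - entry[1] > self.ttl:
            fq = self._prepare(session_id, user)  # no warm session: build the filter now
        else:
            fq = entry[0]
        return self.send_to_solr(q, fq)


# Usage with fake collaborators.
calls, builds = [], []


def fake_filter(user):
    builds.append(user)
    return f"owner_id:{user}"


def fake_solr(q, fq):
    calls.append((q, fq))
    return [q]


cache = SearchSessionCache(fake_filter, fake_solr)
cache.warm_up("s1", "user1")
results = cache.search("s1", "user1", "solr")
```

The filter is built only once per session, during the time the user spends typing, so the first real query does not pay for building the sharing filter.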
Things we are working on
• Search suggestions
• Search operators
• Use machine learning to influence ranking
• Logical sharding
Speaker notes: This has changed many of the access patterns to data in the enterprise.