1 
Box + Solr = Content Search for Business
June 2014
2 
Wei Zhao 
Box backend engineer 
wzhao@box.com
3 
Box mission: to make organizations more productive, competitive and collaborative by connecting people and their most important information.
4 
25M+ users
225K+ businesses
99% of the Fortune 500
5 
Box search mission: make user content easy to discover.
6 
10 billion+ documents
10 TB+ index size
100M+ daily requests
Box uses Solr for search.
7 
Quick Search
8 
Quick Search
9 
Full Search
10 
Agenda
1. Sharding: splitting the index
2. Highly available search
3. A few more things
4. Currently working on
5. Q&A
11 
We shard things
12 
Shard ID = File ID % Total Shards
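The routing rule is plain modular arithmetic on the file ID. Below is a minimal Python sketch that assumes integer file IDs and a fixed shard count. `shard_for_file` is a hypothetical name, not Box code.

```python
def shard_for_file(file_id: int, total_shards: int) -> int:
    """Shard ID = File ID % Total Shards."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# With 4 shards, consecutive file IDs are spread round-robin across shards.
assignments = {fid: shard_for_file(fid, 4) for fid in (12345, 12346, 12347, 12348)}
print(assignments)  # {12345: 1, 12346: 2, 12347: 3, 12348: 0}
```

With this scheme a file always maps to the same shard. However, changing the total shard count remaps most existing files, so growing the number of shards means moving documents between them.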
13 
Multi-tenant: one big logical index for all users
[Diagram: the Solr index, split into Shard1, Shard2, Shard3, …, ShardN]
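Every user's documents live in one logical index that is split across shards. As a result, a query has to run on every shard, and the per-shard results have to be merged. Solr's distributed search can perform this fan-out and merge itself. The sketch below illustrates only the merge step. It assumes each shard returns its top hits sorted by descending score, and that scores are comparable across shards. Comparability is not automatic, because per-shard term statistics can differ. `Hit` and `merge_shard_results` are hypothetical names.

```python
import heapq
from dataclasses import dataclass
from itertools import islice
from typing import List, Sequence


@dataclass(frozen=True)
class Hit:
    file_id: int
    score: float


def merge_shard_results(per_shard: Sequence[List[Hit]], k: int) -> List[Hit]:
    """Merge per-shard hit lists into a global top-k.

    Each input list must already be sorted by descending score, the way a
    shard would return its own top hits.
    """
    merged = heapq.merge(*per_shard, key=lambda h: h.score, reverse=True)
    return list(islice(merged, k))


shard_a = [Hit(1, 0.9), Hit(5, 0.4)]
shard_b = [Hit(2, 0.8), Hit(6, 0.7)]
top = merge_shard_results([shard_a, shard_b], k=3)
print([h.file_id for h in top])  # [1, 2, 6]
```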
14 
Search scope
15 
A typical Solr document
File ID: 12345
Owner ID: user1
Parent folder IDs: folder1, folder2
File name: Solr.ppt
File content: blah
...
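The same document can be expressed as a Solr JSON update payload. This is a sketch with two limits. First, the field names (`file_id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) are assumptions, not Box's schema. Second, it does not send the body. That step would be an HTTP POST to the collection's `/update` endpoint with `Content-Type: application/json`.

```python
import json
from typing import List


def build_solr_doc(file_id: int, owner_id: str, parent_folder_ids: List[str],
                   file_name: str, file_content: str) -> dict:
    """Build one Solr document; all field names are hypothetical."""
    return {
        "file_id": file_id,
        "owner_id": owner_id,
        # Multi-valued field holding the file's parent folder IDs.
        "parent_folder_ids": list(parent_folder_ids),
        "file_name": file_name,
        "file_content": file_content,
    }


# Solr's JSON update handler accepts a JSON array of documents.
body = json.dumps([
    build_solr_doc(12345, "user1", ["folder1", "folder2"], "Solr.ppt", "blah"),
])
print(body)
```

Storing the parent folder IDs as a multi-valued field is what makes a folder-based permission filter possible at query time.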
16 
[Diagram: four files, File 1 to File 4, each labeled with an owner and parent folder(s). The owner / parent pairs shown are: User1 / Folder1; User2 / Folder3; User2 / Folder2; User1 / Folder1 and Folder4.]
17 
User1 with no shared folders
[Diagram: the same four files with their owners and parent folders]
18 
User2 shares Folder2 with User1
[Diagram: the same four files with their owners and parent folders]
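One reading of the scenarios above is that a user can search two groups of files: those they own, and those whose parent folders have been shared with them. That rule can be expressed as a Solr filter query. The sketch below assumes the hypothetical field names `owner_id` and `parent_folder_ids`, and IDs that need no query escaping. It also evaluates the same rule in memory so the rule can be checked against the diagram's files.

```python
from typing import Iterable, Set


def scope_filter(user_id: str, shared_folder_ids: Iterable[str]) -> str:
    """Build a Solr filter query (fq) for the files user_id may search.

    Assumed rule: files the user owns, plus files whose parent folders are
    shared with the user. Field names are hypothetical.
    """
    clauses = [f"owner_id:{user_id}"]
    folders = sorted(set(shared_folder_ids))
    if folders:
        clauses.append("parent_folder_ids:(" + " OR ".join(folders) + ")")
    return " OR ".join(clauses)


def visible(doc: dict, user_id: str, shared_folder_ids: Set[str]) -> bool:
    """In-memory version of the same rule, for checking it against examples."""
    return doc["owner_id"] == user_id or bool(shared_folder_ids & set(doc["parent_folder_ids"]))


# Owner/folder pairs from the diagram; the file numbering here is assumed.
docs = [
    {"file_id": 1, "owner_id": "user1", "parent_folder_ids": ["folder1"]},
    {"file_id": 2, "owner_id": "user2", "parent_folder_ids": ["folder3"]},
    {"file_id": 3, "owner_id": "user2", "parent_folder_ids": ["folder2"]},
    {"file_id": 4, "owner_id": "user1", "parent_folder_ids": ["folder1", "folder4"]},
]

print(scope_filter("user1", []))           # owner_id:user1
print(scope_filter("user1", ["folder2"]))  # owner_id:user1 OR parent_folder_ids:(folder2)
print([d["file_id"] for d in docs if visible(d, "user1", set())])        # [1, 4]
print([d["file_id"] for d in docs if visible(d, "user1", {"folder2"})])  # [1, 3, 4]
```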
20 
User2 shares Folder2 with User1
[Diagram: the same four files, except that the User2 file that was in Folder2 has been moved out of Folder2; its parent is now Folder5]
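When a file moves out of a shared folder, the set of users who can find it changes. The file's indexed parent folder IDs must therefore be updated; until they are, a folder-based filter still matches the stale entry. The sketch below assumes the hypothetical `parent_folder_ids` field and a single direct parent. A Solr atomic update (a `set` operation on one field) is one way to send the change without resending the file content. It only works if the schema meets Solr's atomic-update requirements; otherwise the whole document is reindexed.

```python
from typing import Dict, Set


def move_file(doc: Dict, old_folder: str, new_folder: str) -> Dict:
    """Return a copy of doc with old_folder replaced by new_folder in its parent folder IDs."""
    parents = [new_folder if f == old_folder else f for f in doc["parent_folder_ids"]]
    return {**doc, "parent_folder_ids": parents}


def atomic_update(doc: Dict) -> Dict:
    """Solr atomic-update payload that replaces only the parent folder field.

    Assumes file_id is the collection's uniqueKey (hypothetical).
    """
    return {"file_id": doc["file_id"], "parent_folder_ids": {"set": doc["parent_folder_ids"]}}


def visible_via_share(doc: Dict, shared_folder_ids: Set[str]) -> bool:
    """True if any of the doc's parent folders is shared with the searcher."""
    return bool(shared_folder_ids & set(doc["parent_folder_ids"]))


doc = {"file_id": 3, "owner_id": "user2", "parent_folder_ids": ["folder2"]}
moved = move_file(doc, "folder2", "folder5")
print(visible_via_share(doc, {"folder2"}))    # True: the stale index entry still matches
print(visible_via_share(moved, {"folder2"}))  # False once the index is updated
print(atomic_update(moved))  # {'file_id': 3, 'parent_folder_ids': {'set': ['folder5']}}
```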
22 
Highly Available Search
23 
• Index is highly available 
• Search functionality is highly available
24 
Index workflow
25 
[Diagram: Box Front End (Upload) → Index Queue (Queue 1, Queue 2, Queue 3) → Indexer 1, Indexer 2, Indexer 3 → Solr indexes (Index1, Index2); MySQL also appears in the flow]
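The sketch below models one reading of the index workflow with in-memory stand-ins. Upload events are spread across several queues. Each queue is drained by an indexer that writes every document to more than one index copy. The partitioning by file ID, the two replicas, and the retry behavior are assumptions made for illustration. MySQL's role is left out because the slide does not spell it out.

```python
from collections import deque
from typing import Dict, List


class Replica:
    """In-memory stand-in for one copy of the Solr index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[int, dict] = {}
        self.up = True

    def add(self, doc: dict) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        # Keyed by file ID, so re-sending the same doc is harmless (idempotent).
        self.docs[doc["file_id"]] = doc


def enqueue(queues: List[deque], doc: dict) -> None:
    """Route an upload event to a queue by file ID (assumed partitioning)."""
    queues[doc["file_id"] % len(queues)].append(doc)


def run_indexer(queue: deque, replicas: List[Replica]) -> None:
    """Drain one queue, writing each doc to every replica.

    If a replica write fails, the doc goes back to the front of the queue
    and this indexer stops until its next run.
    """
    while queue:
        doc = queue.popleft()
        try:
            for replica in replicas:
                replica.add(doc)
        except ConnectionError:
            queue.appendleft(doc)
            return


queues = [deque() for _ in range(3)]            # Queue 1..3
replicas = [Replica("Index1"), Replica("Index2")]
for fid in range(6):
    enqueue(queues, {"file_id": fid})

replicas[1].up = False                          # lose one index copy
for q in queues:
    run_indexer(q, replicas)
pending = sum(len(q) for q in queues)           # 6: no event is dropped

replicas[1].up = True                           # the copy comes back
for q in queues:
    run_indexer(q, replicas)
print(pending, sorted(replicas[1].docs))        # 6 [0, 1, 2, 3, 4, 5]
```

If a write succeeds on Index1 and then fails on Index2, the two copies are briefly out of step. Because writes are keyed by file ID, replaying the event once Index2 is back brings the copies together again without duplicates.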
26 
Search workflow
27 
[Diagram: the same query path in two data centers, separated by a data center boundary. In each, a query from the Box Front End goes through an HA Proxy to a head node, which queries shards 1, 2, 3, …, N.]
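The sketch below shows one way to fail over between the two data centers. It assumes each data center's entry point can be called like a function that either returns results or raises a connection or timeout error. The local-first ordering and the endpoint stand-ins are assumptions. The slide does not say which component makes the failover decision.

```python
from typing import Callable, List, Sequence

# A search entry point (e.g. one data center's HA Proxy) modeled as a callable.
SearchEndpoint = Callable[[str], List[str]]


def search_with_failover(query: str, endpoints: Sequence[SearchEndpoint]) -> List[str]:
    """Try endpoints in order (local data center first) and return the first success."""
    errors: List[Exception] = []
    for endpoint in endpoints:
        try:
            return endpoint(query)
        except (ConnectionError, TimeoutError) as exc:
            errors.append(exc)
    raise RuntimeError(f"all {len(endpoints)} search endpoints failed: {errors}")


def local_dc(query: str) -> List[str]:
    """Hypothetical local data center whose head node is unreachable."""
    raise ConnectionError("local head node unreachable")


def remote_dc(query: str) -> List[str]:
    """Hypothetical remote data center that answers normally."""
    return [f"remote result for {query}"]


print(search_with_failover("budget", [local_dc, remote_dc]))  # ['remote result for budget']
```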
28 
A few more things
29 
File Content Search
30 
[Diagram: Box Front End → Upload → MySQL and Box File Storage → Indexer → Text Extraction → Extracted Text → Solr Index]
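The sketch below illustrates the indexer's text-extraction step, assuming a hypothetical registry of extractors keyed by file extension. It handles only plain text. Formats such as .ppt would need a real extraction library, which is not shown. Indexing a file with an empty content field when no extractor exists is also an assumption, chosen so the file stays findable by name.

```python
import os
from typing import Callable, Dict, Optional

Extractor = Callable[[bytes], str]

# Hypothetical registry: file extension -> function turning raw bytes into text.
EXTRACTORS: Dict[str, Extractor] = {
    ".txt": lambda raw: raw.decode("utf-8", errors="replace"),
}


def extract_text(file_name: str, raw: bytes) -> Optional[str]:
    """Return extracted text, or None when no extractor handles this file type."""
    ext = os.path.splitext(file_name)[1].lower()
    extractor = EXTRACTORS.get(ext)
    return extractor(raw) if extractor else None


def build_index_doc(file_id: int, file_name: str, raw: bytes) -> dict:
    """Combine metadata with extracted text into one document for the Solr index."""
    text = extract_text(file_name, raw)
    return {"file_id": file_id, "file_name": file_name, "file_content": text or ""}


print(build_index_doc(1, "notes.txt", b"hello solr"))
print(build_index_doc(2, "Solr.ppt", b"\x00\x01"))  # no extractor: empty content
```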
31 
Multi-language support
32 
[Diagram: raw file content → language detector → English, Spanish, Japanese, German, … tokenizers → per-language fields file_content_en, file_content_es (for example the token {hola}), file_content_ja, …, file_content_de]
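The routing step works in two stages. First it detects the language. Then it writes the text into the matching per-language field, so that field's tokenizer handles it. In the sketch below, `detect_language` is a toy heuristic standing in for a real detector: it checks for Japanese kana, then compares against tiny stop-word lists. The fallback field `file_content` is hypothetical.

```python
from typing import Dict

SUPPORTED = {"en", "es", "ja", "de"}

# Toy stop-word lists; a real system would use a trained language detector.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "es": {"el", "la", "y", "es", "hola"},
    "de": {"der", "die", "und", "ist"},
}


def detect_language(text: str) -> str:
    """Toy detector: Japanese by script, otherwise the stop-word list with the most hits."""
    if any("\u3040" <= ch <= "\u30ff" for ch in text):  # Hiragana and Katakana blocks
        return "ja"
    words = set(text.lower().split())
    best = max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))
    return best if words & STOPWORDS[best] else "unknown"


def route_content(text: str) -> Dict[str, str]:
    """Put the content in file_content_<lang>, or a generic field if unsupported."""
    lang = detect_language(text)
    field = f"file_content_{lang}" if lang in SUPPORTED else "file_content"
    return {field: text}


print(route_content("hola"))                 # {'file_content_es': 'hola'}
print(route_content("the cat and the dog"))  # {'file_content_en': 'the cat and the dog'}
```

A query may be in any language, so one option is to search all the language fields at once, for example by listing them in the edismax `qf` parameter.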
33 
To do
• Scale language support
• Support documents with mixed languages
34 
Search Warm-up
35 
• The front end tells the backend to warm up when the search box gets keyboard focus
• The backend prepares the search filter and caches it in a search session
• The backend sends a warm-up query to Solr
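The sketch below models the warm-up steps with hypothetical stand-ins. `compute_filter` builds a user's search filter, which is assumed to be the expensive part. `send_to_solr` represents the HTTP call to Solr. The warm-up query pairs a match-all query with the user's filter. Presumably this also lets Solr's filter cache hold that filter before the real query arrives, but that rationale is an assumption.

```python
from typing import Callable, Dict, List, Tuple


class SearchSessions:
    """Hypothetical per-session cache of prepared search filters."""

    def __init__(self, compute_filter: Callable[[str], str],
                 send_to_solr: Callable[[str, str], List[str]]) -> None:
        self._compute_filter = compute_filter
        self._send = send_to_solr
        self._filters: Dict[str, str] = {}

    def _filter_for(self, session_id: str, user_id: str) -> str:
        if session_id not in self._filters:
            self._filters[session_id] = self._compute_filter(user_id)
        return self._filters[session_id]

    def warm_up(self, session_id: str, user_id: str) -> None:
        """Called when the search box gets keyboard focus."""
        fq = self._filter_for(session_id, user_id)
        self._send("*:*", fq)  # warm-up query; results are ignored

    def search(self, session_id: str, user_id: str, query: str) -> List[str]:
        """Real query: reuses the cached filter if warm-up already ran."""
        return self._send(query, self._filter_for(session_id, user_id))


filter_calls: List[str] = []
sent: List[Tuple[str, str]] = []


def compute_filter(user_id: str) -> str:
    filter_calls.append(user_id)
    return f"owner_id:{user_id}"


def send_to_solr(q: str, fq: str) -> List[str]:
    sent.append((q, fq))
    return []


sessions = SearchSessions(compute_filter, send_to_solr)
sessions.warm_up("s1", "user1")            # on keyboard focus
sessions.search("s1", "user1", "budget")   # filter is computed only once
print(filter_calls, sent)
```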
36 
What we are working on
37 
Things we are working on
• Search suggestions
• Search operators
• Using machine learning to influence ranking
• Logical sharding
38 
Questions?
39 
We are hiring! 
Contact: wzhao@box.com


Speaker notes

This has changed many of the access patterns to data in the enterprise.