SlideShare une entreprise Scribd logo
1  sur  10
Indexing with solr search server and
hadoop framework.
indexing
• indexing collects, parses, and stores data to facilitate fast and
accurate information retrieval.
• The purpose of storing an index is to optimize speed and performance in
finding documents.
• Without an index, the search engine would scan every document.
• The additional computer storage required to store the index, as well as the
considerable increase in the time required for an update to take place, are
traded off for the time saved during information retrieval.
Why hadoop + solr ?
• Data set outgrows the storage capacity of a single physical machine.
• Distributed filesystems more complex than regular disk filesystems.
• Biggest challenges is making the filesystem tolerate node failure without
suffering data loss.
• Hadoop comes with a distributed filesystem called HDFS.
• HDFS is built around the idea that the most efficient data processing
pattern is a write-once, read-many-times pattern.
• Hadoop doesn’t require expensive, highly reliable hardware to run on.
Continue…
• A program written in other frameworks may require large amounts of
refactoring when scaling from ten to one hundred or one thousand
machines.
• This may involve having the program be rewritten several times
• Hadoop is specifically designed to have a very flat scalability curve.
• In Hadoop very little--if any--work is required for that same program to
run on a much larger amount of hardware.
• Hadoop platform will manage the data and hardware resources and
provide dependable performance growth proportionate to the number of
machines available.
Continue…
• Highly fault-tolerant
• Suitable for applications with large data sets
• A HTTP browser can be used to browse the files of a HDFS instance.
• Detection of faults and quick, automatic recovery from them is a core
architectural goal of HDFS.
Solr
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Linearly scalable, auto index replication, auto failover and recovery
• Near Real-time indexing
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
Solr cloud
• New in Solr 4.0
• Easier scaling
• Centralized config
• Fault tolerant indexing and querying
• Using Apache ZooKeeper as registry
slave
slave
slave
Solr server
Solr server
Solr server
master ZooKee
per
Solr cloud
Technology and Platform
Technology: Hadoop, Solr
Front End: Solr
Back End: Hadoop Framework, solr search
server
Thank you

Contenu connexe

Tendances

Hadoop: The elephant in the room
Hadoop: The elephant in the roomHadoop: The elephant in the room
Hadoop: The elephant in the roomcacois
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Messaging architecture @FB (Fifth Elephant Conference)
Messaging architecture @FB (Fifth Elephant Conference)Messaging architecture @FB (Fifth Elephant Conference)
Messaging architecture @FB (Fifth Elephant Conference)Joydeep Sen Sarma
 
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive QueryInteractive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive QueryAshish Thapliyal
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in AzureMostafa
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012Joydeep Sen Sarma
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Alluxio, Inc.
 
Big data solutions in azure
Big data solutions in azureBig data solutions in azure
Big data solutions in azureMostafa
 
Basic Hadoop Architecture V1 vs V2
Basic  Hadoop Architecture  V1 vs V2Basic  Hadoop Architecture  V1 vs V2
Basic Hadoop Architecture V1 vs V2VIVEKVANAVAN
 
ImpalaToGo design explained
ImpalaToGo design explainedImpalaToGo design explained
ImpalaToGo design explainedDavid Groozman
 
Introduction to apache hadoop copy
Introduction to apache hadoop   copyIntroduction to apache hadoop   copy
Introduction to apache hadoop copyMohammad_Tariq
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 

Tendances (20)

Hadoop: The elephant in the room
Hadoop: The elephant in the roomHadoop: The elephant in the room
Hadoop: The elephant in the room
 
Hadoop
HadoopHadoop
Hadoop
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
What database
What databaseWhat database
What database
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
Cosmos db
Cosmos dbCosmos db
Cosmos db
 
Messaging architecture @FB (Fifth Elephant Conference)
Messaging architecture @FB (Fifth Elephant Conference)Messaging architecture @FB (Fifth Elephant Conference)
Messaging architecture @FB (Fifth Elephant Conference)
 
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive QueryInteractive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
 
Concepts on Hadoop
Concepts on HadoopConcepts on Hadoop
Concepts on Hadoop
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
 
Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...Scalable and High available Distributed File System Metadata Service Using gR...
Scalable and High available Distributed File System Metadata Service Using gR...
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
Big data solutions in azure
Big data solutions in azureBig data solutions in azure
Big data solutions in azure
 
Basic Hadoop Architecture V1 vs V2
Basic  Hadoop Architecture  V1 vs V2Basic  Hadoop Architecture  V1 vs V2
Basic Hadoop Architecture V1 vs V2
 
ImpalaToGo design explained
ImpalaToGo design explainedImpalaToGo design explained
ImpalaToGo design explained
 
Introduction to apache hadoop copy
Introduction to apache hadoop   copyIntroduction to apache hadoop   copy
Introduction to apache hadoop copy
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-Sensitive Queries for Presto at Facebook: A Collaboration ...
 

Similaire à Indexing with solr search server and hadoop framework

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & SolrLucidworks
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Hadoop storage
Hadoop storageHadoop storage
Hadoop storageSanSan149
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdataTom Rogers
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Dataconomy Media
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basicssaili mane
 

Similaire à Indexing with solr search server and hadoop framework (20)

Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Cloudera Hadoop Distribution
Cloudera Hadoop DistributionCloudera Hadoop Distribution
Cloudera Hadoop Distribution
 
Hadoop storage
Hadoop storageHadoop storage
Hadoop storage
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Search On Hadoop
Search On HadoopSearch On Hadoop
Search On Hadoop
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 

Dernier

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Dernier (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Indexing with solr search server and hadoop framework

  • 1. Indexing with solr search server and hadoop framework.
  • 2. indexing • indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. • The purpose of storing an index is to optimize speed and performance in finding documents. • Without an index, the search engine would scan every document. • The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.
  • 3. Why hadoop + solr ? • Data set outgrows the storage capacity of a single physical machine. • Distributed filesystems more complex than regular disk filesystems. • Biggest challenges is making the filesystem tolerate node failure without suffering data loss. • Hadoop comes with a distributed filesystem called HDFS. • HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. • Hadoop doesn’t require expensive, highly reliable hardware to run on.
  • 4. Continue… • A program written in other frameworks may require large amounts of refactoring when scaling from ten to one hundred or one thousand machines. • This may involve having the program be rewritten several times • Hadoop is specifically designed to have a very flat scalability curve. • In Hadoop very little--if any--work is required for that same program to run on a much larger amount of hardware. • Hadoop platform will manage the data and hardware resources and provide dependable performance growth proportionate to the number of machines available.
  • 5. Continue… • Highly fault-tolerant • Suitable for applications with large data sets • A HTTP browser can be used to browse the files of a HDFS instance. • Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
  • 6. Solr • Advanced Full-Text Search Capabilities • Optimized for High Volume Web Traffic • Standards Based Open Interfaces - XML, JSON and HTTP • Comprehensive HTML Administration Interfaces • Linearly scalable, auto index replication, auto failover and recovery • Near Real-time indexing • Flexible and Adaptable with XML configuration • Extensible Plugin Architecture
  • 7. Solr cloud • New in Solr 4.0 • Easier scaling • Centralized config • Fault tolerant indexing and querying • Using Apache ZooKeeper as registry
  • 8. slave slave slave Solr server Solr server Solr server master ZooKee per Solr cloud
  • 9. Technology and Platform Technology: Hadoop, Solr Front End: Solr Back End: Hadoop Framework, solr search server