Indexing with Solr search server and Hadoop framework


keval dalasaniya

Why combine Hadoop and Solr, two cutting-edge open source technologies.


Indexing with Solr search server and Hadoop framework

Indexing
• Indexing collects, parses, and stores data to facilitate fast and
accurate information retrieval.
• The purpose of storing an index is to optimize speed and performance in
finding documents.
• Without an index, the search engine would have to scan every document for
each query.
• An index needs additional storage and makes updates considerably slower;
both costs are traded off for the time saved during information retrieval.
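
To make the trade-off concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration: the sample documents and the function names `build_index` and `search` are assumptions, not part of the slides, and a real engine such as Solr also handles tokenization, ranking, and on-disk storage.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Look the word up directly instead of scanning every document."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample documents, for illustration only.
docs = {
    1: "Hadoop stores large data sets",
    2: "Solr searches indexed documents",
    3: "Hadoop and Solr work together",
}
index = build_index(docs)
print(search(index, "hadoop"))  # [1, 3]
```

Building the index costs extra memory and must be redone when documents change, but each lookup then touches only one dictionary entry.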

Why Hadoop + Solr?
• Data sets outgrow the storage capacity of a single physical machine.
• Distributed filesystems are more complex than regular disk filesystems.
• One of the biggest challenges is making the filesystem tolerate node
failure without suffering data loss.
• Hadoop comes with a distributed filesystem called HDFS.
• HDFS is built around the idea that the most efficient data processing
pattern is a write-once, read-many-times pattern.
• Hadoop doesn’t require expensive, highly reliable hardware to run on.
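
The idea of tolerating node failure without data loss can be sketched with a toy replication model. This is a simplification under stated assumptions: the node names are hypothetical and the round-robin placement is invented for illustration, whereas real HDFS placement is rack-aware. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # HDFS replicates each block three times by default

def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    return {
        block: {nodes[(block + i) % len(nodes)] for i in range(replication)}
        for block in range(num_blocks)
    }

def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [block for block, holders in placement.items() if holders - failed]

# Hypothetical four-node cluster, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(6, nodes)
print(len(readable_blocks(placement, failed={"node-b"})))  # 6
```

Because every block lives on three of the four nodes, losing one or even two nodes leaves every block readable, which is why commodity hardware is acceptable.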

Continued…
• A program written in other frameworks may require large amounts of
refactoring when scaling from ten to one hundred or one thousand
machines.
• It may even have to be rewritten several times.
• Hadoop is specifically designed to have a very flat scalability curve.
• In Hadoop, very little work, if any, is required for that same program to
run on a much larger amount of hardware.
• The Hadoop platform manages the data and hardware resources and
provides dependable performance growth proportionate to the number of
machines available.
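
The flat scalability curve comes from writing a job as independent map and reduce steps. The sketch below imitates that pattern in plain Python, not in the Hadoop API: the function names and sample lines are assumptions, and `num_workers` only simulates partitions on one machine. The same two functions run unchanged whatever the number of partitions.

```python
from collections import Counter
from functools import reduce

def map_partition(lines):
    """Map step: count the words in one partition of the input."""
    return Counter(word for line in lines for word in line.lower().split())

def merge(left, right):
    """Reduce step: combine two partial word counts."""
    return left + right

def word_count(lines, num_workers):
    """Split the input into `num_workers` partitions and merge the partial counts."""
    partitions = [lines[i::num_workers] for i in range(num_workers)]
    return reduce(merge, map(map_partition, partitions), Counter())

# Hypothetical input lines, for illustration only.
lines = ["hadoop solr", "solr index", "hadoop hdfs solr"]
print(word_count(lines, 1) == word_count(lines, 3))  # True
```

Only the partition count changes between the two calls; the program logic does not, which is the property the slide attributes to Hadoop.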

Continued…
• Highly fault-tolerant
• Suitable for applications with large data sets
• A web browser can be used to browse the files of an HDFS instance over HTTP.
• Detection of faults and quick, automatic recovery from them is a core
architectural goal of HDFS.
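
Besides the browser view, HDFS files can be reached over HTTP through the WebHDFS REST API, if it is enabled on the cluster. The sketch below only builds such a URL and makes no network call. The host name and path are hypothetical, and the default port is an assumption: 9870 is the NameNode HTTP port in Hadoop 3, while Hadoop 2 used 50070.

```python
from urllib.parse import quote, urlencode

def webhdfs_url(host, path, op="LISTSTATUS", port=9870):
    """Build a WebHDFS REST URL for an operation on an HDFS path."""
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode({'op': op})}"

# Hypothetical NameNode host and directory, for illustration only.
print(webhdfs_url("namenode.example.com", "/user/data"))
# http://namenode.example.com:9870/webhdfs/v1/user/data?op=LISTSTATUS
```

Opening a URL of this shape in a browser or HTTP client would return the directory listing as JSON.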

Solr
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards-Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Linearly scalable, auto index replication, auto failover and recovery
• Near Real-Time Indexing
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
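
As an illustration of the HTTP and JSON interfaces, a Solr search is an ordinary HTTP GET against a core's `select` handler. The sketch below only builds the request URL. The base URL uses Solr's default port 8983, while the core name `articles` and the field `title` are hypothetical.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, query, rows=10, wt="json"):
    """Build a Solr select URL that asks for results in the given format."""
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{base_url}/solr/{core}/select?{params}"

# Hypothetical core and field names, for illustration only.
print(solr_select_url("http://localhost:8983", "articles", "title:hadoop"))
# http://localhost:8983/solr/articles/select?q=title%3Ahadoop&rows=10&wt=json
```

Changing `wt` to `xml` would request the same results as XML.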

SolrCloud
• New in Solr 4.0
• Easier scaling
• Centralized config
• Fault-tolerant indexing and querying
• Uses Apache ZooKeeper as a registry

[Diagram: SolrCloud, showing a master, three slaves, three Solr servers, and ZooKeeper]

Technology and Platform
Technology: Hadoop, Solr
Front End: Solr
Back End: Hadoop framework, Solr search server

Thank you
