The Power of Elasticsearch
A horizontally-scalable, distributed database built on Apache’s Lucene that delivers a full-featured
search experience across terabytes of data with a simple yet powerful API.

What is Search?
Search is a feature that can make or break an application or service. The ability to do full-text
search across a body of text, to narrow search queries by values or ranges of specific fields, and to
use advanced features like faceted search, geo queries, or similarity searching can add a wealth of
functionality and act as a differentiator for your product. When search fails, whether because of
inflexible query parameters, irrelevant results, or an inability to scale to meet demands of volume or
usage, users notice immediately, and they will be upset.

What is Elasticsearch?
Elasticsearch is a new database built to handle huge volumes of data with very high availability and
to distribute itself across many machines to be fault-tolerant and scalable, all the while maintaining
a simple but powerful API that gives applications in any language or framework access to the
database.




© 2012 Infochimps, Inc. All rights reserved.                                                          1
Built on Lucene
Apache’s Lucene is an open-source Java library for text search. The Lucene project has been
growing for more than a decade and has now become the standard reference for how to build a
powerful yet easy to integrate, open-source search library. Its feature set includes but is not limited
to:


  Performance/Scalability         Search Features                       Accessibility

  Can index over 95 GB/hr/node    Ranked/relevance searching with       Completely open-source
                                  fine-grained control

  Low RAM overhead (1MB           Allows querying on phrases,           Implemented in Java so
  heap)                           wildcards, geographical               inherently cross-platform
                                  proximity, variable ranges, &c.

Wrapping Lucene for Big Data


Lucene, as a search library, must be wrapped with an interface to allow its features to be used by an
application. Many such interfaces have been built for different platforms and use cases. One of the
most popular is Apache’s own SOLR project, which creates an interface around Lucene tailored for
something like a traditional web application.


An interface like SOLR, however, is designed for a world in which a single server can handle the full
workload of indexing and querying the data. When the data volume begins to increase past this limit,
SOLR (like similar interfaces to Lucene) becomes unwieldy to use: the same problems of sharding,
replication, and query dispatching that occur in RDBMS systems begin to occur again in this context.
And just as various methods exist for dealing with these difficulties in the RDBMS world, various tools
exist for shard creation and distribution around SOLR.


But just as the right solution to big data databases means moving away from RDBMS into NoSQL
technologies, the right solution to scaling Lucene is to move away from tools like SOLR and use a tool
built from the ground-up to work with terabytes of data in a horizontally scalable, distributed, and fault-
tolerant way: Elasticsearch!




Elasticsearch Features
Elasticsearch is best thought of as an interface to Lucene designed for big data from the ground up.
The complex feature set that Lucene provides for searching data is directly available through
Elasticsearch, as Lucene is ultimately the library that’s used for indexing and querying data. This
also means that plugins that work with Lucene will work with Elasticsearch out of the box.


The features that Elasticsearch itself provides around Lucene are designed to make it the perfect tool
for full-text search on big data:


  Performance/Scalability           Robustness                               Ease of Use

 An 8-node cluster can provide      No single point of failure               Simple, JSON-based
 sub-200ms response latency                                                  REST API means any
 when performing complex                                                     language can index
 searches on 10B+ records!                                                   or query records in an
                                                                             Elasticsearch cluster.

 Add or subtract nodes on the       Automatically back up all data in the    Java and Thrift APIs
 fly to dynamically scale the       cluster to local disk or permanent,      exist for finer-grained
 cluster to the current load        remote storage (like AWS’ S3             or more performant
                                    service).                                access.

 Ability to independently scale     Tune the replication factor of data on   Flexible schemas allow
 the indexing and querying          a per-index level                        for complex treatments
 performance of the cluster to                                               of types like dates
 deal with different sorts of use                                            without forcing all
 cases                                                                       documents in a table to
                                                                             be identical.

                                    Data will automatically be migrated      Multiple indices enable
                                    through the cluster if a node fails to   multi-tenancy out of the
                                    maintain performance and replication     box.
                                    factor.




Example Use Cases
There’s a lot you can do with Elasticsearch besides just searching for phrases. The following
examples act as a quick guide to just a few of the features Elasticsearch provides.


Powerful Query Syntax
The simplest way to interface with Elasticsearch is also one of the most powerful: the query string.
Elasticsearch exposes the full Lucene query syntax through query strings that can be passed from a
user in an application directly to the database to be evaluated.


  Feature                       Query String                                    Notes

  Boolean logic                 (coke OR pepsi) AND health

  Wildcards                     apple AND ip*d                                  Wildcards can be applied for
                                                                                a single character (?) or for
                                                                                groups of characters (*).

  Specific search fields        coffee AND author:Smith                         Can search on deeply nested
                                                                                fields like "author.lastName"
                                                                                as well.

  Search within a range         apple AND date:[20100101 TO 20100201]

  Boost results in relevance    taxicab AND ("New York"^2 OR "San Francisco")   Boosting can also be
                                                                                configured at index time.
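
As a rough illustration of how an application might hand one of these query strings to Elasticsearch, the sketch below builds the JSON body of a `query_string` search request. The helper name `query_string_body` and the default field `text` are assumptions made for this example, and the body would still need to be sent to a cluster's search endpoint by an HTTP client of your choice.

```python
import json


def query_string_body(query, default_field=None):
    """Wrap a Lucene-style query string in a search request body."""
    inner = {"query": query}
    if default_field is not None:
        # Field searched when a term in the query string names no field.
        inner["default_field"] = default_field
    return {"query": {"query_string": inner}}


# Hypothetical usage: the boosted query from the table above.
body = query_string_body(
    'taxicab AND ("New York"^2 OR "San Francisco")', default_field="text"
)
print(json.dumps(body, indent=2))
```

Because the query string comes straight from the user, an application would normally validate it or catch parse errors before relying on the results.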




Records can be Complex Documents
A record in Elasticsearch doesn’t have to be flat like a record in a traditional RDBMS. Elasticsearch
allows documents to be hierarchical, and for sub-fields within a document to themselves have
hierarchical structure. This makes data modeling very flexible. An example of how one might store a
blog post:


          {
                "id": 1001,
                "author": {
                  "name": "Alexander Hamilton",
                  "id":   3874
                },
                "date": "1787-10-07 12:31:00 -0600 CST",
                "title": "The Federalist Papers",
                "subtitle": "Paper #1",
                "text": "AFTER an unequivocal experience of the inefficiency...",
                "similar_posts": [ 1002, 1003, 1005 ],
                "comments": [
                   { "author": "John Adams", "text": "I must beg to differ..." },
                   …
                ]
          }

We could query these records using “author.name” or even “comments.text”, giving us a great deal of
flexibility in how we choose to denormalize and access the data in the database.
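
To make the dotted-path idea concrete, here is a minimal sketch in plain Python of how a path such as “comments.text” can address values inside a nested document. The helper `values_at` is hypothetical and only mimics the addressing scheme; it is not how Elasticsearch itself is implemented, and the document below is a shortened copy of the blog post example.

```python
def values_at(doc, path):
    """Collect every value reachable at a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for node in current:
            # A list of sub-documents is searched element by element.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    # Flatten list-valued fields so the result is a flat list of values.
    result = []
    for value in current:
        if isinstance(value, list):
            result.extend(value)
        else:
            result.append(value)
    return result


post = {
    "id": 1001,
    "author": {"name": "Alexander Hamilton", "id": 3874},
    "similar_posts": [1002, 1003, 1005],
    "comments": [{"author": "John Adams", "text": "I must beg to differ..."}],
}

print(values_at(post, "author.name"))
print(values_at(post, "comments.text"))
```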




Geo Queries
Elasticsearch understands geography. Geolocations can be stored within records as (latitude,
longitude) pairs or as geohashes. In either case, Elasticsearch provides the ability to query using a
variety of geo-methods:




Figure: Geo queries defined with a bounding box

Figure: Geo queries defined by distance range from a given point
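
The two kinds of geo query pictured above might be expressed as request bodies along the following lines. This is a sketch under assumptions: the field name `location` and the coordinates are invented for illustration, the helper functions are hypothetical, and the body layout follows the `geo_bounding_box` and `geo_distance` queries of recent Elasticsearch releases, so the exact syntax may differ on other versions.

```python
def bounding_box_query(field, top_left, bottom_right):
    """Match documents whose point lies inside a lat/lon rectangle."""
    return {"query": {"bool": {"filter": {"geo_bounding_box": {field: {
        "top_left": {"lat": top_left[0], "lon": top_left[1]},
        "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
    }}}}}}


def distance_query(field, center, distance):
    """Match documents whose point lies within `distance` of `center`."""
    return {"query": {"bool": {"filter": {"geo_distance": {
        "distance": distance,
        field: {"lat": center[0], "lon": center[1]},
    }}}}}


# Hypothetical usage: points stored in an assumed "location" field.
box = bounding_box_query("location", (30.5, -98.0), (30.0, -97.5))
near = distance_query("location", (30.27, -97.74), "10km")
print(box)
print(near)
```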




Time Series
Things change. It’s important to see how. Elasticsearch understands dates and times and can return
time series data which represent an aggregation of the search results binned by time interval.




Figure: Raw tweets stored in Elasticsearch can be binned into a time series on the fly at query time.
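
One way to request such a time series is a date histogram over the matching documents. The sketch below builds that request body; the field name `created_at`, the aggregation name `over_time`, and the helper itself are assumptions for illustration. The `aggs` / `date_histogram` / `calendar_interval` layout follows recent Elasticsearch releases, and older releases name the interval parameter `interval` instead.

```python
def time_series_body(query, date_field, interval):
    """Build a search body that bins matching documents by time interval."""
    return {
        # Only the binned counts are wanted, not the matching documents.
        "size": 0,
        "query": {"query_string": {"query": query}},
        "aggs": {
            "over_time": {
                "date_histogram": {
                    "field": date_field,
                    "calendar_interval": interval,
                }
            }
        },
    }


# Hypothetical usage: hourly counts of tweets that mention coffee.
series_request = time_series_body("coffee", "created_at", "hour")
print(series_request)
```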




Application Support
Elasticsearch isn’t just a search engine; it’s a full-fledged database, and you can build an entire
frontend application on top of it.


Elasticsearch supports multiple indices (databases) and multiple mappings (tables) per index. This
feature, combined with the complex document structure Elasticsearch allows, lets you build the
complex data models that support applications.


And, in addition to being able to execute rich search queries across the data, Elasticsearch allows
the more “traditional” operations that define an application database: listing records, creating records,
updating records, and deleting records. These features give you what you need to build a traditional
database-driven, read/write application on top of the same database that lets you do full-text search
and complex queries, all with horizontal scalability built-in from the ground up.
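
The “traditional” operations above map naturally onto HTTP requests against the REST API. The sketch below shows one possible mapping; the helper `crud_request` and the index name `posts` are hypothetical, and the paths follow the `_doc`, `_update`, and `_search` endpoints of recent Elasticsearch releases, so older versions (which also put a mapping type in the path) will differ.

```python
def crud_request(operation, index, doc_id=None):
    """Return the (HTTP method, path) pair for a basic record operation."""
    if operation == "list":
        # Listing is a search over the whole index.
        return ("GET", f"/{index}/_search")
    if doc_id is None:
        raise ValueError(f"operation {operation!r} needs a document id")
    routes = {
        # PUT indexes the document, creating it or replacing an existing one.
        "create": ("PUT", f"/{index}/_doc/{doc_id}"),
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    if operation not in routes:
        raise ValueError(f"unknown operation {operation!r}")
    return routes[operation]


# Hypothetical usage with an assumed "posts" index.
for op in ("list", "create", "read", "update", "delete"):
    print(op, crud_request(op, "posts", 1001))
```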


Administration & Monitoring
Elasticsearch also exposes a complete administrative and monitoring interface over the same API
that powers the indexing, retrieval, and search of data.


Creating indices, updating their indexing or storage properties, defining rules for dealing with specific
fields in specific mappings, &c. can all be accomplished via this same API.


Getting detailed information about the cluster’s availability state, health, individual nodes’ memory
footprint, &c. is also available through this API, making monitoring of Elasticsearch easy.
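
As a small example of what monitoring through the API can look like, the sketch below summarizes a cluster health response of the kind returned by the `_cluster/health` endpoint. The helper `summarize_health` is hypothetical, and the sample response is made-up illustrative data rather than output from a real cluster.

```python
def summarize_health(health):
    """Return (is_healthy, one-line summary) for a cluster health response."""
    status = health["status"]
    summary = "{name}: {status}, {nodes} node(s), {unassigned} unassigned shard(s)".format(
        name=health["cluster_name"],
        status=status,
        nodes=health["number_of_nodes"],
        unassigned=health["unassigned_shards"],
    )
    # "green" means every primary and replica shard is allocated.
    return status == "green", summary


# Made-up sample response, for illustration only.
sample = {
    "cluster_name": "demo",
    "status": "yellow",
    "number_of_nodes": 8,
    "unassigned_shards": 3,
}
healthy, line = summarize_health(sample)
print(healthy, line)
```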




About Infochimps
                                    Our mission is to make the world’s data more accessible.
                                    Infochimps helps companies understand their data. We provide
                                    tools and services that connect their internal data, leverage the
                                    power of cloud computing and new technologies such as Hadoop,
                                    and provide a wealth of external datasets, which organizations
                                    can connect to their own data.


                                    Contact Us
                                    Infochimps, Inc.
                                    1214 W 6th St. Suite 202
                                    Austin, TX 78703


                                    1-855-DATA-FUN (1-855-328-2386)


                                    www.infochimps.com
                                    info@infochimps.com


                                    Twitter: @infochimps




                      Get a free Big Data consultation
                          Let’s talk Big Data in the enterprise!

     Get a free conference with the leading big data experts regarding your enterprise big data
     project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop
     about your project objectives, design, infrastructure, tools, etc. Find out how other
     companies are solving similar problems. Learn best practices and get recommendations, free.





KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 

The Power of Elasticsearch

  • 1. The Power of Elasticsearch

    A horizontally scalable, distributed database built on Apache’s Lucene that delivers a full-featured search experience across terabytes of data with a simple yet powerful API.

    What is Search?

    Search is a feature that can make or break an application or service. The ability to do full-text search across a body of text, to narrow search queries by values or ranges of specific fields, and to use advanced features like faceted search, geo queries, or similarity searching can add a wealth of functionality and act as a differentiator for your product. When search fails, whether because of inflexible query parameters, irrelevant results, or an inability to scale to meet demands of volume or usage, users notice immediately, and they will be upset.

    What is Elasticsearch?

    Elasticsearch is a new database built to handle huge volumes of data with very high availability and to distribute itself across many machines to be fault-tolerant and scalable, all while maintaining a simple but powerful API that gives applications written in any language or framework access to the database.

    © 2012 Infochimps, Inc. All rights reserved.
  • 2. Built on Lucene

    Apache’s Lucene is an open-source Java library for text search. The Lucene project has been growing for more than a decade and has become the standard reference for how to build a powerful yet easy-to-integrate, open-source search library. Its feature set includes but is not limited to:

    Performance/Scalability
      - Can index over 95 GB/hr/node
      - Low RAM overhead (1MB heap)

    Search Features
      - Ranked/relevance searching with fine-grained control
      - Allows querying on phrases, wildcards, geographical proximity, variable ranges, &c.

    Accessibility
      - Completely open-source
      - Implemented in Java so inherently cross-platform

    Wrapping Lucene for Big Data

    Lucene, as a search library, must be wrapped with an interface to allow its features to be used by an application. Many such interfaces have been built for different platforms and use cases. One of the most popular is Apache’s own SOLR project, which creates an interface around Lucene tailored for something like a traditional web application.

    An interface like SOLR, however, is designed for a world in which a single server can handle the full workload of indexing and querying the data. When the data volume grows past this limit, SOLR (and similar interfaces to Lucene) become unwieldy to use: the same problems of sharding, replication, and query dispatching that occur in RDBMS systems begin to occur again in this context. And just as various methods exist for dealing with these difficulties in the RDBMS world, various tools exist for shard creation and distribution around SOLR.

    But just as the right solution to big data databases means moving away from RDBMS into NoSQL technologies, the right solution to scaling Lucene is to move away from tools like SOLR and use a tool built from the ground up to work with terabytes of data in a horizontally scalable, distributed, and fault-tolerant way: Elasticsearch!
  • 3. Elasticsearch Features

    Elasticsearch is best thought of as an interface to Lucene designed for big data from the ground up. The complex feature set that Lucene provides for searching data is directly available through Elasticsearch, as Lucene is ultimately the library that’s used for indexing and querying data. This also means that plugins that work with Lucene will work with Elasticsearch out of the box.

    The features that Elasticsearch itself provides around Lucene are designed to make it the perfect tool for full-text search on big data:

    Performance/Scalability
      - An 8-node cluster can provide sub-200ms response latency when performing complex searches on 10B+ records!
      - Add or subtract nodes on the fly to dynamically scale the cluster to the current load.
      - Ability to independently scale the indexing and querying performance of the cluster to deal with different sorts of use cases.

    Robustness
      - No single point of failure.
      - Automatically back up all data in the cluster to local disk or permanent, remote storage (like AWS’ S3 service).
      - Tune the replication factor of data on a per-index level.
      - Data will automatically be migrated through the cluster if a node fails, to maintain performance and replication factor.

    Ease of Use
      - Simple, JSON-based REST API means any language can index or query records in an Elasticsearch cluster.
      - Java and Thrift APIs exist for finer-grained or more performant access.
      - Flexible schemas allow for complex treatments of types like dates without forcing all documents in a table to be identical.
      - Multiple indices enable multi-tenancy out of the box.
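The per-index replication factor mentioned above can be made concrete with a small sketch. It assumes the `number_of_shards` and `number_of_replicas` index settings as the tuning knobs; the index names and the helper functions are hypothetical, and nothing is sent to a cluster. The arithmetic shows how many shard copies the cluster has to place across its nodes for each choice.

```python
def index_settings(shards: int, replicas: int) -> dict:
    """Build an index-creation body with per-index shard and replica counts."""
    if shards < 1 or replicas < 0:
        raise ValueError("need shards >= 1 and replicas >= 0")
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def total_shard_copies(body: dict) -> int:
    """Primary shards plus all replica copies that must be placed on nodes."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


# Hypothetical indices: one tuned for availability, one tuned for low storage cost.
search_index = index_settings(shards=4, replicas=2)
logs_index = index_settings(shards=4, replicas=0)

print("search:", total_shard_copies(search_index), "shard copies")
print("logs:", total_shard_copies(logs_index), "shard copies")
```

A higher replica count buys redundancy and read capacity at the price of storage, which is why it is useful to set it per index rather than once for the whole cluster.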
  • 4. Example Use Cases

    There’s a lot you can do with Elasticsearch besides just searching for phrases. The following examples act as a quick guide to just a few of the features Elasticsearch provides.

    Powerful Query Syntax

    The simplest way to interface with Elasticsearch is also one of the most powerful: the query string. Elasticsearch exposes the full Lucene query syntax through query strings that can be passed from a user in an application directly to the database to be evaluated.

    Feature                    | Query String                                   | Notes
    Boolean logic              | (coke OR pepsi) AND health                     |
    Wildcards                  | apple AND ip*d                                 | Wildcards can be applied for a single character (?) or for groups of characters (*).
    Specific search fields     | coffee AND author:Smith                        | Can search on deeply nested fields like "author.lastName" as well.
    Search within a range      | apple AND date:[20100101 TO 20100201]          |
    Boost results in relevance | taxicab AND ("New York"^2 OR "San Francisco")  | Boosting can also be configured at index time.
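As a minimal sketch of how an application might hand such a query string to the database, the snippet below wraps each example in a search request body. It assumes the `query_string` query of the JSON query DSL; the helper name `query_string_body` is hypothetical, and the bodies are only built and printed, not sent to a cluster.

```python
import json


def query_string_body(user_query: str, size: int = 10) -> dict:
    """Wrap a raw Lucene-syntax query string in a search request body."""
    return {"size": size, "query": {"query_string": {"query": user_query}}}


# The query strings from the feature table above.
examples = [
    "(coke OR pepsi) AND health",
    "apple AND ip*d",
    "coffee AND author:Smith",
    "apple AND date:[20100101 TO 20100201]",
    'taxicab AND ("New York"^2 OR "San Francisco")',
]

bodies = [query_string_body(q) for q in examples]
print(json.dumps(bodies[-1], indent=2))
```

Passing user input straight through is convenient, but a malformed query string can be rejected by the parser, so an application would normally handle that error case.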
  • 5. Records can be Complex Documents

    A record in Elasticsearch doesn’t have to be flat like a record in a traditional RDBMS. Elasticsearch allows documents to be hierarchical, and for sub-fields within a document to themselves have hierarchical structure. This makes data modeling very flexible. An example of how one might store a blog post:

    {
      "id": 1001,
      "author": {
        "name": "Alexander Hamilton",
        "id": 3874
      },
      "date": "1787-10-07 12:31:00 -0600 CST",
      "title": "The Federalist Papers",
      "subtitle": "Paper #1",
      "text": "AFTER an unequivocal experience of the inefficiency...",
      "similar_posts": [1002, 1003, 1005],
      "comments": [
        {
          "author": "John Adams",
          "text": "I must beg to differ..."
        },
        …
      ]
    }

    We could query these records using "author.name" or even "comments.text", giving us a great deal of flexibility in how we choose to denormalize and access the data in the database.
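To make the dotted field names concrete, here is a small illustration of how paths such as "author.name" and "comments.text" address the leaves of a nested document. It is plain Python over an abbreviated copy of the blog post, not a description of how Elasticsearch stores documents internally; the function name `dotted_paths` is hypothetical.

```python
def dotted_paths(doc, prefix=""):
    """Yield (path, value) pairs for every leaf value of a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from dotted_paths(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            # List items share the path of the field that holds the list.
            yield from dotted_paths(item, prefix)
    else:
        yield prefix, doc


# Abbreviated version of the blog post document.
post = {
    "id": 1001,
    "author": {"name": "Alexander Hamilton", "id": 3874},
    "title": "The Federalist Papers",
    "similar_posts": [1002, 1003, 1005],
    "comments": [{"author": "John Adams", "text": "I must beg to differ..."}],
}

paths = list(dotted_paths(post))
for path, value in paths:
    print(f"{path} = {value!r}")
```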
  • 6. Geo Queries

    Elasticsearch understands geography. Geolocations can be stored within records as (latitude, longitude) pairs or as geohashes. In either case, Elasticsearch provides the ability to query using a variety of geo-methods:

      - Geo queries defined with a bounding box
      - Geo queries defined by distance range from a given point
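The two kinds of geo query can be illustrated with an in-memory sketch of what each one selects. This is not Elasticsearch code: the function names, the sample cities, and their approximate coordinates are assumptions made for illustration, and distance is computed with the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, top, left, bottom, right):
    """True when the point lies inside the box (box must not cross the antimeridian)."""
    return bottom <= lat <= top and left <= lon <= right


def within_distance(lat, lon, center_lat, center_lon, min_km, max_km):
    """True when the point lies in the distance range [min_km, max_km] from the center."""
    return min_km <= haversine_km(lat, lon, center_lat, center_lon) <= max_km


# Approximate coordinates, for illustration only.
cities = {
    "Austin": (30.27, -97.74),
    "Dallas": (32.78, -96.80),
    "Houston": (29.76, -95.37),
}

in_box = sorted(
    name for name, (lat, lon) in cities.items()
    if in_bounding_box(lat, lon, top=31.0, left=-98.0, bottom=29.0, right=-95.0)
)
near_austin = sorted(
    name for name, (lat, lon) in cities.items()
    if within_distance(lat, lon, *cities["Austin"], min_km=1.0, max_km=250.0)
)
print(in_box, near_austin)
```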
  • 7. Time Series

    Things change. It’s important to see how. Elasticsearch understands dates and times and can return time series data which represent an aggregation of the search results binned by time interval.

    Raw tweets stored in Elasticsearch can be binned into a time series on the fly at query time.
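What "binned by time interval" means can be sketched in a few lines of plain Python. This is an in-memory illustration of the idea, not the server-side feature itself; the function name `bin_by_interval` and the sample tweet timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_by_interval(timestamps, interval):
    """Count timezone-aware timestamps per fixed-width bin, keyed by bin start."""
    counts = Counter()
    for ts in timestamps:
        whole_intervals = (ts - EPOCH) // interval  # integer number of intervals
        counts[EPOCH + whole_intervals * interval] += 1
    return dict(sorted(counts.items()))


# Hypothetical tweet timestamps (UTC).
tweets = [
    datetime(2012, 1, 1, 12, 5, tzinfo=timezone.utc),
    datetime(2012, 1, 1, 12, 40, tzinfo=timezone.utc),
    datetime(2012, 1, 1, 13, 10, tzinfo=timezone.utc),
    datetime(2012, 1, 1, 15, 59, tzinfo=timezone.utc),
]

hourly = bin_by_interval(tweets, timedelta(hours=1))
for start, count in hourly.items():
    print(start.isoformat(), count)
```

Note that this sketch omits empty bins (there is no entry for 14:00 here); a charting layer would typically fill those gaps with zeros.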
  • 8. Application Support

    Elasticsearch isn’t just a search engine; it’s a full-fledged database, and you can build an entire frontend application on top of it.

    Elasticsearch supports multiple indices (databases) and multiple mappings (tables) per index. This feature, combined with the complex document structure Elasticsearch allows, lets you build the complex data models that support applications.

    And, in addition to being able to execute rich search queries across the data, Elasticsearch allows the more “traditional” operations that define an application database: listing records, creating records, updating records, and deleting records.

    These features give you what you need to build a traditional database-driven, read/write application on top of the same database that lets you do full-text search and complex queries, all with horizontal scalability built in from the ground up.

    Administration & Monitoring

    Elasticsearch also exposes a complete administrative and monitoring interface over the same API that powers the indexing, retrieval, and search of data. Creating indices, updating their indexing or storage properties, defining rules for dealing with specific fields in specific mappings, &c. can all be accomplished via this same API. Detailed information about the cluster’s availability state, health, individual nodes’ memory footprint, &c. is also available through this API, making monitoring of Elasticsearch easy.
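The record operations and the monitoring call described above all travel over the same JSON REST API, which the sketch below illustrates by building (but never sending) the HTTP requests. Several things here are assumptions rather than facts from this paper: the base URL of a local node, the helper names, the hypothetical "blog" index, and the typeless `/{index}/_doc/{id}` paths of recent Elasticsearch releases (the 2012-era releases addressed documents as `/{index}/{type}/{id}`).

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the JSON REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


def create_or_replace(index, doc_id, doc):
    """Create a record, or replace the whole record if the id already exists."""
    return es_request("PUT", f"/{index}/_doc/{doc_id}", doc)


def read(index, doc_id):
    return es_request("GET", f"/{index}/_doc/{doc_id}")


def delete(index, doc_id):
    return es_request("DELETE", f"/{index}/_doc/{doc_id}")


def list_records(index, size=10):
    """List records by searching the index with a match-everything query."""
    return es_request("POST", f"/{index}/_search", {"query": {"match_all": {}}, "size": size})


def cluster_health():
    return es_request("GET", "/_cluster/health")


requests_to_send = [
    create_or_replace("blog", 1001, {"title": "The Federalist Papers"}),
    read("blog", 1001),
    list_records("blog"),
    delete("blog", 1001),
    cluster_health(),
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Each request could be passed to `urllib.request.urlopen` against a running cluster; keeping construction separate from sending makes the mapping from operation to HTTP call easy to inspect.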
  • 9. About Infochimps

    Our mission is to make the world’s data more accessible. Infochimps helps companies understand their data. We provide tools and services that connect their internal data, leverage the power of cloud computing and new technologies such as Hadoop, and provide a wealth of external datasets, which organizations can connect to their own data.

    Contact Us

    Infochimps, Inc.
    1214 W 6th St. Suite 202
    Austin, TX 78703
    1-855-DATA-FUN (1-855-328-2386)
    www.infochimps.com
    info@infochimps.com
    Twitter: @infochimps

    Get a free Big Data consultation

    Let’s talk Big Data in the enterprise! Get a free conference with the leading big data experts regarding your enterprise big data project. Meet with leading data scientists Flip Kromer and/or Dhruv Bansal to talk shop about your project objectives, design, infrastructure, tools, etc. Find out how other companies are solving similar problems. Learn best practices and get recommendations, free.