[2D1] Elasticsearch Performance Optimization

DEVIEW 2014 [2D1] Elasticsearch Performance Optimization

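  1. Understanding Elasticsearch and Optimizing Its Performance. Howook Jeong (정호욱), Principal Engineer, BigDataPlatform Team, Gruter
  2. About me: Howook Jeong; BigdataPlatform, Gruter Corp; hwjeong@gruter.com; http://jjeong.tistory.com; e-book: 실무 예제로 배우는 Elasticsearch 검색엔진 - 입문편 (Learning the Elasticsearch Search Engine through Practical Examples: Introductory Volume)
  3. Contents: 1. Understanding Elasticsearch; 2. Understanding Elasticsearch performance optimization; 3. Using Elasticsearch for big data
  4. 1. Understanding Elasticsearch: 1.1. Elasticsearch and how it works; 1.2. Installing and running; 1.3. Modeling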
  5. [1.1] What is Elasticsearch? An open-source search engine built on Lucene. Features: easy; real-time search and analytics; distributed and highly available search engine.
  6. [1.1] Elasticsearch layout (diagram). Physical layout: Cluster > Node > Index > Shard. Logical layout: Index > Type > Document > field:value.
  7. [1.1] Elasticsearch node types: master node (node.master: true); data node (node.data: true); search load balancer node (node.master: false, node.data: false); client node (node.client: true).
  8. [1.1] Example node configurations:
      - Case 1, all-round players: three nodes, each with node.master: true and node.data: true.
      - Case 2, dedicated master and data nodes: two nodes with node.master: true and node.data: false; two nodes with node.master: false and node.data: true.
      - Case 3, dedicated master, data and search load balancer nodes: two nodes with node.master: true and node.data: false; two nodes with node.master: false and node.data: true; two nodes with node.master: false and node.data: false.
  9. [1.1] Elasticsearch vs. RDBMS (relational database term → Elasticsearch term): Database → Index; Table → Type; Row → Document; Column → Field; Index → Analyze; Primary key → _id; Schema → Mapping; Physical partition → Shard; Logical partition → Route; Relational → Parent/Child, Nested; SQL → Query DSL.
  10. [1.1] Elasticsearch shard replication: PUT /my_index/_settings { "number_of_replicas": 1 }, then PUT /my_index/_settings { "number_of_replicas": 2 }. Source: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/replica-shards
  11. [1.1] Creating, indexing and deleting a document. Source: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html
  12. [1.1] Retrieving, querying and fetching a document. Sources: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-read.html, http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_query_phase.html, http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_fetch_phase.html
  13. [1.2] Installing: download, then extract the archive. Running: start Elasticsearch, then test it: create an index, add a document, get a document, search for a document.
  14. [1.3] Index/type design: time-based or user-based data; relational data; 1TB. Field design: fields to search on; fields to analyze; fields to sort on; fields to store; primary key field.
  15. [1.3] Modeling example (diagram): indices Indice1 to Indice3 and IndiceA to IndiceC, each holding types; parent types relate to child types 1 : N.
  16. [1.3] Shard design: number_of_shards >= number_of_data_nodes; number_of_replicas <= number_of_data_nodes - 1. Shard sizing: at most 200 shards per index; maximum size per shard: 20 to 50GB; minimum size per shard: ~ 3GB.
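      The rules of thumb on this slide can be checked mechanically. The following is a minimal sketch, assuming the limits quoted above (200 shards per index, 50GB as the upper bound per shard) and an evenly distributed index; the class name `ShardDesignCheck` and its method are hypothetical helpers, not part of Elasticsearch.

      ```java
      import java.util.ArrayList;
      import java.util.List;

      /** Checks an index layout against the rules of thumb on this slide. */
      class ShardDesignCheck {

          static final int MAX_SHARDS_PER_INDEX = 200;   // slide: at most 200 shards per index
          static final double MAX_SHARD_SIZE_GB = 50.0;  // slide: upper end of the 20 to 50GB range

          /** Returns one message per violated rule; an empty list means the layout passes. */
          static List<String> check(int dataNodes, int primaryShards, int replicas, double indexSizeGb) {
              List<String> problems = new ArrayList<>();
              if (primaryShards < dataNodes) {
                  problems.add("number_of_shards should be >= number_of_data_nodes");
              }
              if (replicas > dataNodes - 1) {
                  problems.add("number_of_replicas should be <= number_of_data_nodes - 1");
              }
              if (primaryShards > MAX_SHARDS_PER_INDEX) {
                  problems.add("more than " + MAX_SHARDS_PER_INDEX + " shards in one index");
              }
              // Assumption: data is spread evenly, so the average shard size is a fair estimate.
              double perShardGb = indexSizeGb / primaryShards;
              if (perShardGb > MAX_SHARD_SIZE_GB) {
                  problems.add("average shard size exceeds " + MAX_SHARD_SIZE_GB + " GB");
              }
              return problems;
          }

          public static void main(String[] args) {
              System.out.println(check(3, 6, 1, 240.0)); // 3 data nodes, 6 shards, 1 replica, 240GB
              System.out.println(check(3, 2, 3, 240.0)); // too few shards, too many replicas
          }
      }
      ```

      The check only covers the numeric rules; the capacity-planning advice later in the deck (measure with real documents and queries) still decides the final shard count.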
  17. [1.3] Hash partition test:

          public class EsHashPartitionTest {
              @Test
              public void testHashPartition() {
                  // ... omitted ...
                  // Route one million sequential ids and count how many land on each shard.
                  for (int i = 0; i < 1000000; i++) {
                      int shardId = MathUtils.mod(hash(String.valueOf(i)), shardSize);
                      shards.add(shardId, (long) ++partSize[shardId]);
                  }
                  // ... omitted ...
              }

              public int hash(String routing) {
                  return hashFunction.hash(routing);
              }
          }
  18. 2. Understanding Elasticsearch performance optimization: 2.1. Factors that affect performance; 2.2. Configuration tuning; 2.3. Indexing optimization; 2.4. Query optimization
  19. [2.1] Hardware: network bandwidth? disk I/O? RAM? CPU cores? Documents: document size? total index data size? data growth? retention period? Service: analyzer? analyzed fields? indexed field size? boosting? real time or batch? queries?
  20. [2.1] From the Elasticsearch site: "If 1 shard is too few and 1,000 shards are too many, how do I know how many shards I need? This is a question that is impossible to answer in the general case. There are just too many variables: the hardware that you use, the size and complexity of your documents, how you index and analyze those documents, the types of queries that you run, the aggregations that you perform, how you model your data, etc., etc." Source: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
  21. [2.1] From the Elasticsearch site: "Fortunately, it is an easy question to answer in the specific case: yours. (1) Create a cluster consisting of a single server, with the hardware that you are considering using in production. (2) Create an index with the same settings and analyzers that you plan to use in production, but with only one primary shard and no replicas. (3) Fill it with real documents (or as close to real as you can get). (4) Run real queries and aggregations (or as close to real as you can get)." Source: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
  22. [2.2] Operating system: increase the file descriptor limit; avoid swap. Search engine: avoid swap; thread pool; segment merge; index buffer size; storage device; use a recent version.
  23. [2.2] Cluster restart: optimize (max segments: 5); close indices; set "disable_allocation: true" before restarting; increase the recovery limits.
  24. [2.3] Modeling: disable the "_all" field; disable the "_source" field as far as possible; choose suitable "_id" values; set "store" to false on fields as far as possible.
  25. [2.3] Sizing: use indices as the unit for managing data size. The number of primary shards per index should be greater than or equal to the number of data nodes (number_of_shards >= number_of_data_nodes). Keep each index under 200 shards. Keep each shard under 50GB.
  26. [2.3] Client: use the Bulk API. Check hardware performance. Check for exceptions. Check the thread pools. Test with a 1-1-1-0 setup (1 node, 1 index, 1 shard, 0 replicas). Use flush and refresh instead of optimize.
  27. [2.3] Bulk indexing: 5 to 15MB per request; 1,000 to 5,000 documents per request; server bulk thread pool size no larger than core count × 5; client bulk connection pool size of 3 to 10 × number_of_data_nodes; client ping timeout of 30 to 90 seconds; client node sampler interval of 30 to 90 seconds; client transport sniff set to true; client network TCP blocking set to false.
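      The per-request limits above (document count and payload size) can be enforced on the client before anything is sent. The following is a minimal sketch using only the JDK, assuming documents are already serialized as JSON strings; `BulkBatcher` is a hypothetical helper, and actually sending each batch through the Bulk API is left out.

      ```java
      import java.nio.charset.StandardCharsets;
      import java.util.ArrayList;
      import java.util.List;

      /** Splits serialized documents into bulk-sized batches. */
      class BulkBatcher {

          /**
           * Groups docs so that each batch holds at most maxDocs documents and at most
           * maxBytes of UTF-8 payload. A single document larger than maxBytes still
           * gets a batch of its own rather than being dropped.
           */
          static List<List<String>> split(List<String> docs, int maxDocs, long maxBytes) {
              List<List<String>> batches = new ArrayList<>();
              List<String> current = new ArrayList<>();
              long currentBytes = 0;
              for (String doc : docs) {
                  long size = doc.getBytes(StandardCharsets.UTF_8).length;
                  boolean full = current.size() >= maxDocs || currentBytes + size > maxBytes;
                  if (full && !current.isEmpty()) {
                      batches.add(current);          // close the current batch
                      current = new ArrayList<>();
                      currentBytes = 0;
                  }
                  current.add(doc);
                  currentBytes += size;
              }
              if (!current.isEmpty()) {
                  batches.add(current);              // flush the last partial batch
              }
              return batches;
          }

          public static void main(String[] args) {
              List<String> docs = new ArrayList<>();
              for (int i = 0; i < 2500; i++) {
                  docs.add("{\"id\":" + i + "}");
              }
              // Slide guidance: 1,000 to 5,000 documents and 5 to 15MB per request.
              System.out.println(split(docs, 1000, 15L * 1024 * 1024).size());
          }
      }
      ```

      Each returned batch would then become one bulk request; the thread pool and connection pool figures on the slide bound how many such requests should be in flight at once.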
  28. [2.3] Bulk indexing: disable refresh_interval; disable replicas; use flush and refresh (instead of optimize). Bulk indexing flow: update settings, send bulk requests, flush and refresh, update settings again.
  29. [2.4] Shards: increase the number of shards to spread the data; increase the number of replica shards. Data distribution: use routing; check _id; shardId = hash(_id) % number_of_primary_shards.
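      The routing formula above can be tried out without a cluster. The following is a minimal sketch in which `String.hashCode()` stands in for the hash function; that is an assumption for illustration only, since Elasticsearch uses its own hash, and `ShardRouter` is a hypothetical class.

      ```java
      import java.util.Arrays;

      /** Illustrates shardId = hash(_id) % number_of_primary_shards. */
      class ShardRouter {

          /** Maps a routing value (by default the document _id) to a shard number. */
          static int shardId(String routing, int numberOfPrimaryShards) {
              // Stand-in hash: String.hashCode() is NOT the hash Elasticsearch uses.
              // floorMod keeps the result in [0, numberOfPrimaryShards) for negative hashes too.
              return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
          }

          /** Counts how many of the ids "0" .. "docCount - 1" land on each shard. */
          static long[] distribution(int docCount, int numberOfPrimaryShards) {
              long[] perShard = new long[numberOfPrimaryShards];
              for (int i = 0; i < docCount; i++) {
                  perShard[shardId(String.valueOf(i), numberOfPrimaryShards)]++;
              }
              return perShard;
          }

          public static void main(String[] args) {
              // Prints the number of documents per shard for one million sequential ids.
              System.out.println(Arrays.toString(distribution(1_000_000, 5)));
          }
      }
      ```

      Printing the per-shard counts for a realistic set of _id or routing values is a quick way to spot skew, which is what "check _id" on the slide is about.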
  30. [2.4] Query: do not let every query hit the same node; reduce zero-hit queries; cache query results; avoid deep pagination. Sorting cost: number_of_shards × (from + size). In scripts, use doc['field'] instead of _source or _fields. Search types: query and fetch; query then fetch; count; scan.
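      The sorting cost above explains why deep pagination is expensive: every shard returns from + size entries, and the coordinating node has to sort all of them. A small worked sketch of that arithmetic follows; `DeepPaginationCost` is a hypothetical helper.

      ```java
      /** Worked arithmetic for number_of_shards × (from + size). */
      class DeepPaginationCost {

          /** Number of entries the coordinating node must sort for one page request. */
          static long entriesToSort(int numberOfShards, int from, int size) {
              // Widen to long first so large offsets cannot overflow int arithmetic.
              return (long) numberOfShards * ((long) from + size);
          }

          public static void main(String[] args) {
              System.out.println(entriesToSort(5, 0, 10));      // first page: 50
              System.out.println(entriesToSort(5, 10_000, 10)); // page 1,001: 50050
          }
      }
      ```

      With five shards, the first page of ten hits sorts 50 entries, while the same page size at offset 10,000 sorts 50,050 entries to return the same ten hits.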
  31. [2.4] Queries vs. filters: use a filtered query with a filter instead of a plain query; use the bool filter instead of and/or/not filters. Queries: relevance; full text; not cached; slower. Filters: binary yes/no; exact values; cached; faster. Example: "query": { "match_all": {} } becomes "query": { "filtered": { "query": { "match_all": {} } } }.
  32. 3. Using Elasticsearch for big data: 3.1. Hadoop integration; 3.2. SQL on Elasticsearch
  33. [3.1] Using Elasticsearch with Hadoop: a tool for big data analysis; a snapshot and restore repository; the Elasticsearch Hadoop plugin is provided as tooling.
  34. [3.1] Indexing. Elasticsearch Hadoop plugin: read raw data; integrate natively; bulk indexing. Java client application: BulkRequestBuilder; REST API; control over concurrent requests.
  35. [3.1] Indexing with the Elasticsearch Hadoop plugin (MapReduce):

          Configuration conf = new Configuration();
          // ... omitted ...
          // Target cluster and index/type, using the ConfigurationOptions constants.
          conf.set(ConfigurationOptions.ES_NODES, "localhost:9200");
          conf.set(ConfigurationOptions.ES_RESOURCE, "blog/post");
          // ... omitted ...
          Job job = new Job(conf);
          job.setInputFormatClass(TextInputFormat.class);
          job.setOutputFormatClass(EsOutputFormat.class);
          job.setMapOutputValueClass(LinkedMapWritable.class);
          job.setMapperClass(TabMapper.class);
          job.setNumReduceTasks(0);  // map-only job
          File fl = new File("blog/post.txt");
          long splitSize = fl.length() / 3;  // aim for about three input splits
          TextInputFormat.setMaxInputSplitSize(job, splitSize);
          TextInputFormat.setMinInputSplitSize(job, 50);
          boolean result = job.waitForCompletion(true);
  36. [3.1] Indexing with a Java client application (MapReduce), driver:

          public static void main(String[] args) throws Exception {
              // ... omitted ...
              settings = Connector.buildSettings(esCluster);
              client = Connector.buildClient(settings, esNodes.split(","));
              runBeforeConfig(esIndice);
              Job job = new Job(conf);
              // ... omitted ...
              for (String distJar : esDistributedCacheJars) {
                  DistributedCache.addFileToClassPath(
                      new Path(esDistributedCachePath + "/" + distJar),
                      job.getConfiguration());
              }
              // ... omitted ...
              if ("true".equalsIgnoreCase(esOptimize)) {
                  runOptimize(esIndice);
              } else {
                  runRefreshAndFlush(esIndice);
              }
              runAfterConfig(esIndice, replica);
          }
  37. [3.1] Indexing with a Java client application (MapReduce), mapper:

          public void map(Object key, Object value, Context context) throws Exception {
              // ... omitted ...
              IndexRequest indexRequest = new IndexRequest();
              indexRequest = indexRequest.index(esIndice)
                                         .type(esType)
                                         .source(doc);
              // ... omitted ...
              bulkRequest.add(indexRequest);
              // ... omitted ...
              bulkResponse = bulkRequest.setConsistencyLevel(QUORUM)
                                        .setReplicationType(ASYNC)
                                        .setRefresh(false)
                                        .execute()
                                        .actionGet();
              // ... omitted ...
          }
  38. [3.1] Searching. Elasticsearch Hadoop plugin: integrate natively; query request. Java client application: query request.
  39. [3.1] Searching with the Elasticsearch Hadoop plugin (MapReduce):

          public static class SearchMapper extends Mapper {
              @Override
              public void map(Object key, Object value, Context context)
                      throws IOException, InterruptedException {
                  Text docId = (Text) key;
                  LinkedMapWritable doc = (LinkedMapWritable) value;
                  System.out.println(docId);
              }
          }

          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // ... omitted ...
              Job job = new Job(conf);
              // ... omitted ...
              // Job copies conf when it is created, so set the query on the job's own configuration.
              job.getConfiguration().set(ConfigurationOptions.ES_QUERY,
                      "{ \"query\" : { \"match_all\" : {} } }");
              job.setNumReduceTasks(0);
              boolean result = job.waitForCompletion(true);
          }
  40. [3.1] Searching with a Java client application:

          SearchResponse searchResponse;
          MatchAllQueryBuilder matchAllQueryBuilder = new MatchAllQueryBuilder();
          searchResponse = client.prepareSearch(esIndice)
                                 .setQuery(matchAllQueryBuilder)
                                 .execute()
                                 .actionGet();
          System.out.println(searchResponse.toString());
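  41. [3.2] What is Elasticsearch SQL? It offers easy access and a data analysis tool. It translates standard SQL syntax into Query DSL. Standard SQL syntax can be used to run CRUD operations against the search engine. It provides a JDBC driver and a CLI. It uses the SQL analyzer built for Apache Tajo.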
  42. [3.2] Elasticsearch JDBC driver (diagram): Client Application → JDBC Driver → Elasticsearch. Driver stages: SQL Analyzer, Algebra Expression, Query DSL Planner, Query Execution. The driver takes SQL in and sends DSL out.
  43. [3.2] Elasticsearch SQL syntax: create database/table; drop database/table; select/insert/upsert/delete; use database; show databases/tables; desc table.
  44. [3.2] Elasticsearch analytics (aggregations) SQL: min/max/sum/avg/stats/extended_stats; value_count/percentiles/cardinality; global_*; terms/range/date_range.
  45. [3.2] Elasticsearch SQL vs. Query DSL:
      - SQL: SELECT * FROM type_name LIMIT 0/10. Query DSL: "match_all": {} … "from": 0, "size": 10
      - SQL: SELECT field1, field2 FROM type_name WHERE search_field = 'elasticsearch'. Query DSL: "term": { "search_field": { "value": "elasticsearch" } } … "fields": [ "field1", "field2" ]
  46. [3.2] Elasticsearch SQL vs. Query DSL:
      - SQL: SELECT * FROM type_name WHERE search_field > '20140624235959' ORDER BY search_field DESC. Query DSL: "range": { "search_field": { "gt": "20140624235959" } } … "sort": [ { "search_field": { "order": "desc" } } ]
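  47. SQL on Elasticsearch: demo
  48. Wrapping up. Understanding Elasticsearch: a distributed search engine built on Lucene. Understanding Elasticsearch performance optimization: there is no single right answer, but always use good hardware and the latest version; design modeling and sizing that can scale; always monitor the bottlenecks; use queries and filters according to their purpose; use the Bulk API. Using Elasticsearch for big data: with Hadoop and SQL it is easy to use as an analysis tool.
  49. Q&A. E-mail: sophistlv@gmail.com
  50. Thank you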
