Solr Cluster installation tool "Anuenue"
                   and
     "Did You Mean?" for Japanese



              Takahiko Ito
               mixi, Inc.

mixi?
£ One of the largest social networking services in Japan.
£ Many services to promote communication among users.
    ¢ Blog, news, game platform, etc.
    ¢ Most of the services come with search
£ 15M monthly active users



Our current (urgent) project …
Replace our in-house search engines with an up-to-date search
platform!
    We have
     ¢  selected Apache Solr as the search platform!
     ¢  created a simple OSS package (Anuenue) which
         wraps Solr

Project URL: http://code.google.com/p/anuenue-wrapper/




Why we made Anuenue
Deploying and operating a Solr search cluster day to day is a bit
difficult for ordinary engineers.
     ¢ We need to edit the configuration files of every Solr
        instance individually
     ¢ Commands for whole clusters are not provided
          •  We need to write client commands ourselves
          •  In contrast, Hadoop provides utility commands for clusters
             E.g., start-all.sh (start processes), fsck (check the
             health of the file system), balancer (rebalance the data blocks)
What does Anuenue provide?
£ Handy configuration of search clusters
£ Commands for clusters
    ¢ Simple commands (post, delete, update, commit etc)
    ¢ Start and stop commands for processes in the cluster
£ Japanese support
    ¢ Implementation of Japanese Did-You-Mean facilities
    ¢ Japanese tokenizers (Sen and Kuromoji)




Today’s Topics
£ Anuenue
    ¢ Handy configuration of search clusters
    ¢ Commands for search clusters

£ Did-You-Mean facilities for Japanese queries
    ¢ Common problem in Did-You-Mean implementation
    ¢ Mining a Japanese Did-You-Mean dictionary from
       query log data




Cluster configuration with Anuenue
£  Cluster setup is done with a special configuration file

£  Anuenue assigns one or more roles to each instance.
     ¢  Roles are the functions an instance performs in a cluster
     ¢  Anuenue supports three roles (Master, Slave,
         Merger)




Role: master
£ Index input data.

NOTE: Anuenue provides a command to distribute the input
data to the master instances (build Solr shard indexes).




[Figure: the input data is distributed to Master-1, Master-2 and Master-3, which build the shard indexes.]

Role: slave
Has three functions
  ¢ Copy (replicate) the index from its master
  ¢ Accept queries from mergers and then search its own index
  ¢ Return the results to the merger instance

[Figure: Master-1 and Master-2 index the input data; Slave-1 and Slave-2 replicate the indexes; Merger-1 submits queries to the slaves.]

Role: merger
£  Forwards queries from clients to slaves.
     ¢  Note: clients do not need to know the slave instances
         (the merger adds the ‘shards’ parameter listing the slave instances)
£  Merges the results from all the slave instances and
    returns the merged results.

[Figure: Client-1 and Client-2 submit queries to the Merger, which forwards the queries to Slave-1 and Slave-2.]
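
A minimal sketch of the forwarding step, assuming stock Solr conventions: the `/solr/select` path, the `host:port/solr` shard format and the helper name `build_distributed_query` are illustrative assumptions, not taken from Anuenue itself. The host names aa, dd and ee come from the example cluster in this talk.

```python
from urllib.parse import urlencode


def build_distributed_query(merger, slaves, query):
    """Build a Solr select URL that asks the merger to query every slave.

    merger: (host, port) of the merger instance.
    slaves: list of (host, port) pairs of slave instances.
    """
    # Solr's distributed search takes a comma-separated "shards" parameter.
    shards = ",".join(f"{host}:{port}/solr" for host, port in slaves)
    params = urlencode({"q": query, "shards": shards})
    host, port = merger
    # The /solr/select path is an assumption (stock Solr layout).
    return f"http://{host}:{port}/solr/select?{params}"


url = build_distributed_query(("aa", 8983), [("dd", 8983), ("ee", 8983)], "python")
print(url)
```

The client only ever talks to the merger; the list of slaves lives in one place.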


Example: Anuenue cluster
The cluster consists of five machines
   ¢ Each has one Anuenue instance

Instances
    ¢ Merger: aa
    ¢ Master: bb, cc
    ¢ Slave: dd, ee

[Figure: Client-1 and Client-2 send queries to the merger aa, which forwards the queries to the slaves; the slaves replicate the indexes of the masters, which index the input data.]
How to assign roles to instances?

Edit the cluster configuration file, anuenue-nodes.xml.
    •  Add three elements (mergers, slaves and masters)
    •  In each element, add the information (machine name and
       port number) of one or more instances.




Configuration example
Case: there is one merger instance on machine aa (port 7000)

<mergers>
  <merger>
    <host>aa</host>
    <port>7000</port>
  </merger>
</mergers>




Specify the index to replicate
<masters>
  <!-- Name the master instance with the iname attribute -->
  <master iname="master1">
    <host>aaaa</host>
    <port>8983</port>
  </master>
</masters>
<slaves>
  <slave>
    <host>bbbb</host>
    <port>8983</port>
    <!-- Specify the master instance whose index is copied by adding a replicate element -->
    <replicate>master1</replicate>
  </slave>
</slaves>
Example: simple cluster settings
<mergers>
  <merger>
    <host>aa</host>
    <port>8983</port>
  </merger>
</mergers>
<masters>
  <master iname="master1">
    <host>bb</host>
    <port>8983</port>
  </master>
</masters>
<slaves>
  <slave>
    <host>cc</host>
    <port>8983</port>
    <replicate>master1</replicate>
  </slave>
</slaves>

[Figure: Client-1 and Client-2 send queries to the merger aa, which forwards the queries to the slave cc; cc replicates the index of the master bb, which indexes the input data.]
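
To make the structure of the configuration concrete, here is a rough sketch that reads such a file and checks that every slave replicates from a master that exists. The `<nodes>` root element and the function names are assumptions made for this example; the slides only show the three inner elements, and this is not how Anuenue itself parses the file.

```python
import xml.etree.ElementTree as ET

# The three elements from the slide, wrapped in a hypothetical <nodes> root.
CONFIG = """
<nodes>
  <mergers>
    <merger><host>aa</host><port>8983</port></merger>
  </mergers>
  <masters>
    <master iname="master1"><host>bb</host><port>8983</port></master>
  </masters>
  <slaves>
    <slave><host>cc</host><port>8983</port><replicate>master1</replicate></slave>
  </slaves>
</nodes>
"""


def load_roles(xml_text):
    """Return {"merger": [...], "master": [...], "slave": [...]}."""
    root = ET.fromstring(xml_text.strip())
    roles = {}
    for group, tag in (("mergers", "merger"), ("masters", "master"), ("slaves", "slave")):
        roles[tag] = []
        for node in root.findall(f"{group}/{tag}"):
            entry = {"host": node.findtext("host"), "port": int(node.findtext("port"))}
            if node.get("iname") is not None:
                entry["iname"] = node.get("iname")
            if node.findtext("replicate") is not None:
                entry["replicate"] = node.findtext("replicate")
            roles[tag].append(entry)
    return roles


def unknown_replication_targets(roles):
    """List slave hosts whose <replicate> names no configured master."""
    names = {master.get("iname") for master in roles["master"]}
    return [slave["host"] for slave in roles["slave"] if slave.get("replicate") not in names]


roles = load_roles(CONFIG)
print(roles["merger"])
print(unknown_replication_targets(roles))
```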
	
Cluster setup with Anuenue
£ Flexible: supports various types of search cluster.

£ For example…




Assign multiple roles


[Figure: a single instance holds all the roles; Client1 and Client2 submit queries to it, and it also indexes the input data.]




Large clusters to handle huge data with
high QPS
[Figure: Client1 … ClientN submit queries to Merger1, Merger2 and Merger3, which forward them to Slave1 through Slave6; the slaves replicate the indexes of Master1 through Master6, which index the input data.]
After setting up the cluster
We can make use of commands for clusters.
 Anuenue provides
  ¢  start / stop commands
  ¢  commands to manipulate the index
Start and stop clusters
Users can start / stop a cluster with a single command
(anuenue-distdaemon.sh).

Usage:
  $ sh bin/anuenue-distdaemon.sh [start|stop]
Simple commands for clusters
Anuenue also provides basic commands (‘post’, ‘delete’,
‘commit’, ‘optimize’ and ‘update’) for search clusters
   ¢ The commands are multi-threaded

E.g.,
   $ sh bin/anuenue-distcommands.sh post -arg inputDir
Today’s Topics
£ Anuenue
    ¢ Handy configuration of search clusters
    ¢ Commands for search clusters

£ Did-You-Mean facilities for Japanese queries
    ¢ Common problem in Did-You-Mean implementation
    ¢ Mining a Japanese Did-You-Mean dictionary from
       query log data




What is a Did-You-Mean service?
£ Suggests the correct spelling when users submit queries with
   mistakes
£ Increases the usability of the search service




Example: Did-You-Mean service



                 (English: Ugly Betty)




Common implementation
Many search engines (including Solr) apply distance
measures such as Edit Distance [Levenshtein, 1965]

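Edit Distance: a measure of distance between two sequences, the
minimum number of single-character insertions, deletions and
substitutions needed to turn one into the other.
Simply speaking, the more characters two sequences have in
common, the smaller the distance.

E.g.,
   like ⇔ likes (small distance: 1)
   like ⇔ foobar (large distance: 6)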



Common procedure: Did-You-Mean
When a user submits a query,
1.  The Did-You-Mean service computes the edit distance between
    the input query and the words in the index.
2.  If there is a word whose distance is small, the
    Did-You-Mean handler suggests it.

E.g., when a user submits the query “pthon”, the Did-You-Mean
service suggests “python”, a word in the index with a small
distance.
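
The procedure above can be sketched as follows. The threshold `max_distance=2` and the function name `suggest` are assumptions for this example; a real search engine would avoid scanning every word in the index.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(query, index_words, max_distance=2):
    """Return the closest index word, or None when the query is
    already in the index or nothing is close enough."""
    best = min(index_words, key=lambda w: edit_distance(query, w), default=None)
    if best is None or best == query:
        return None
    return best if edit_distance(query, best) <= max_distance else None


index_words = ["python", "java", "tomcat"]
print(suggest("pthon", index_words))
```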




Problem: Japanese queries

Simply applying edit distance does not work for Japanese.
Misspelled queries are sometimes totally different from
the correct one (large distance).
    E.g.,
    ¢                 (correct:          )
    ¢                 (correct:              )

è These cases are derived from Japanese input method.




Typing in Japanese query
We input Japanese (query) words in two steps.
    1.  Type the reading of the Japanese word in the Latin
        alphabet.
    2.  Select the desired word from the list of candidates

  Step 2 can cause a spelling mistake whose distance from
  the correct spelling is too large




Example: Typing in Japanese queries
Assume a user wants to submit a query:
         (Obama)

1.  Type in the reading in Latin alphabet.
    reading: obama

2.  Select correct spelling.
    Possible candidates:         (correct),   ,   etc.




Japanese Did-You-Mean dictionary
£  Because of the large distance problem, simple distance
    measures (edit distance) do not work.

£  To handle this problem, Anuenue supports a special
    dictionary for Japanese Did-You-Mean service.




Dictionary for Japanese Did-You-Mean service

Dictionary has two columns
   1. Query with mistakes
   2. Correct query




Implementing Did-You-Mean service with the dictionary
When a user submits a query that is listed in the dictionary as
a query with mistakes, the Did-You-Mean service suggests the
corresponding correct query.

NOTE: Anuenue provides handlers for the dictionary format.
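
A dictionary-based suggestion is a plain lookup. The sketch below assumes a tab-separated file with the two columns described earlier; the file format, the function names and the sample entry are made up for illustration and are not Anuenue's actual handler or data.

```python
import csv
import io


def load_dictionary(lines):
    """Read 'query with mistakes <TAB> correct query' rows into a dict."""
    return {row[0]: row[1]
            for row in csv.reader(lines, delimiter="\t")
            if len(row) >= 2}


def did_you_mean(query, dictionary):
    """Return the correct query for a known mistake, else None."""
    return dictionary.get(query)


# Hypothetical sample entry: both strings are read "obama".
sample = io.StringIO("小浜\tオバマ\n")
dictionary = load_dictionary(sample)
print(did_you_mean("小浜", dictionary))
```

No distance computation happens at query time, so pairs with a large edit distance are handled as easily as any other.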



Problem…
How can we create the dictionary?
We can use Oluolu, a query log mining tool.




Oluolu
£ Creates a spelling correction dictionary from query log
£ Extracts pairs of queries (query with spelling mistakes,
   query with correct spelling)
    ¢ Supports Japanese spelling mistakes (from version
       0.2)
£ Runs on the Hadoop framework

Project URL: http://code.google.com/p/oluolu/




Input to Oluolu: query log
Three columns
    1.  User Id
    2.  Query string
    3.  Time of query submission

User Id   Query         Time
438904    Pthon         2009-11-21 11:16:12
34443     Java          2009-11-21 12:16:13
438904    Python        2009-11-21 12:16:20
8975      Java Tomcat   2009-11-21 12:16:25


Procedure: creating Japanese Did-You-Mean dictionary with Oluolu
Oluolu extracts the elements of the Japanese Did-You-Mean
dictionary in two steps.
     1.  Extract all the query pairs in the same session
     2.  Validate the query pairs




Step 1: extract query pairs
£ Oluolu extracts pairs of queries in the same session.
   E.g., Oluolu extracts the pair (Pthon, Python).
£ Queries in the same session: a set of queries submitted by
   the same user within a small time range.
£ An extracted pair can be a misspelled query and its correct
   query.

User ID   Query    Time
438904    Pthon    2009-11-21 12:16:12
34443     Java     2009-11-21 12:16:13
438904    Python   2009-11-21 12:16:20
8975      Tomcat   2009-11-21 12:16:25
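
Step 1 can be sketched in a single process as below. Oluolu itself runs on Hadoop; the 60-second session window, the restriction to consecutive queries and the function name `extract_pairs` are assumptions of this example, not details given in the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Query log from the slide: (user id, query, time of submission).
LOG = [
    ("438904", "Pthon", "2009-11-21 12:16:12"),
    ("34443", "Java", "2009-11-21 12:16:13"),
    ("438904", "Python", "2009-11-21 12:16:20"),
    ("8975", "Tomcat", "2009-11-21 12:16:25"),
]


def extract_pairs(log, window=timedelta(seconds=60)):
    """Pair consecutive, different queries of the same user that were
    submitted within `window` of each other."""
    by_user = defaultdict(list)
    for user, query, ts in log:
        by_user[user].append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), query))
    pairs = []
    for entries in by_user.values():
        entries.sort()  # order each user's queries by time
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            if q1 != q2 and t2 - t1 <= window:
                pairs.append((q1, q2))
    return pairs


print(extract_pairs(LOG))
```

The pairs produced here are only candidates; step 2 decides which of them are real corrections.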
	
Step 2: validate candidate pairs
£ Oluolu validates all the query pairs extracted in step 1.
£ In the validation phase (step 2), Oluolu makes use of the
   readings of the queries.




Reading of Japanese words
£ Japanese words can be converted into their readings in the
   Latin alphabet.
    ¢           (reading: konnichiha)
    ¢    (reading: itou)

FACT: even when a Japanese query with spelling mistakes is
totally different from the correct query, the readings are the
same or the distance between them is small!

   	



Validate candidate pair with reading
Given a query pair, Oluolu validates it in two steps
   1. Convert the queries into readings in the Latin alphabet
   2. Compute the edit distance between the two readings
       When the distance is small, the pair is extracted as
       an element of the Did-You-Mean dictionary.




Example: step 2
Given a pair of queries: (             ,           )

1.  Convert them into readings
     The readings are the same, “sumitomofudousan”.

2.  Compute the distance between the readings
     The distance is zero, so the pair is extracted as an
     element of the Did-You-Mean dictionary.
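
A minimal sketch of the validation step. Converting Japanese text to its reading normally needs a morphological analyzer; here a small hand-written `READINGS` table stands in for it. The table, the threshold `max_distance=1` and the function names are assumptions of this example, not Oluolu's implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical reading table standing in for a real reading converter.
READINGS = {
    "オバマ": "obama",
    "小浜": "obama",
    "東京": "toukyou",
}


def validate_pair(q1, q2, readings, max_distance=1):
    """Accept a candidate pair when the readings are close."""
    r1, r2 = readings.get(q1), readings.get(q2)
    if r1 is None or r2 is None:
        return False  # no reading available, so the pair cannot be validated
    return edit_distance(r1, r2) <= max_distance


print(validate_pair("小浜", "オバマ", READINGS))
print(validate_pair("小浜", "東京", READINGS))
```

The two strings in the first pair share no characters, so edit distance on the surface forms would reject them; comparing the readings accepts them.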




Creating Japanese Did-You-Mean dictionary with Oluolu
£ Installation requirements
    ¢ Java 1.6.0 or greater
    ¢ Hadoop 0.20.0 or greater
    ¢ Oluolu 0.2.0 or greater
£ Copy the input query log into HDFS
£ Run the spellcheck task of Oluolu
   $ bin/oluolu spellcheck \
                 -input testInput.txt \
                 -output output \
                 -inputLanguage ja



Preliminary experiments	
£ Experimental settings
    ¢ Input data: log file from a mixi service (community
       search).
         •  5 GB data

£ Extracted dictionary
    ¢  the number of elements is over 100,000
    ¢  succeeded in extracting query pairs with a large edit
      distance.
         •  ( Ν,        )
         •  (      ,          )
Current status
£ Finished functional tests and stress tests.
£ Now replacing an in-house search engine in a small
   search service with Anuenue.
£ In the next phase, we will apply Anuenue to a search
   service with large data and high QPS.




Future work
£ Integrate with SolrCloud and ZooKeeper
    ¢ Support failover and rebalancing of the index

£ Kuromoji, a new OSS Japanese tokenizer	




Summary
£ Introduced Anuenue
£ Described a Did-You-Mean facility for Japanese queries




Thank you for your attention!





 
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
AWS re:Invent 2016: [JK REPEAT] Deep Dive on Amazon EC2 Instances, Featuring ...
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Galera Multi Master Synchronous My S Q L Replication Clusters
Galera  Multi Master  Synchronous  My S Q L  Replication  ClustersGalera  Multi Master  Synchronous  My S Q L  Replication  Clusters
Galera Multi Master Synchronous My S Q L Replication Clusters
 
Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017 - ...
Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017  - ...Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017  - ...
Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017 - ...
 
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...Orchestration tool roundup  - OpenStack Israel summit - kubernetes vs. docker...
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker...
 
Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?
 

Dernier

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Dernier (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 

Lucene revolution 2011

  • 1. Solr Cluster installation tool "Anuenue" and "Did You Mean?" for Japanese
      Takahiko Ito, mixi, Inc.
  • 2. mixi?
      - One of the largest social networking services in Japan.
      - Many services to promote communication among users.
          - Blog, news, game platform, etc.
          - Most of the services come with search.
      - 15M monthly active users.
  • 3. Our current (urgent) project
      Replace in-house search engines with an up-to-date search platform!
      We have
      - selected Apache Solr as the search platform
      - created a simple OSS package (Anuenue) which wraps Solr
      Project URL: http://code.google.com/p/anuenue-wrapper/
  • 4. Why we made Anuenue
      Deployment and daily operation of a Solr search cluster are a bit difficult for ordinary engineers.
      - We need to edit the configuration files of each Solr instance individually.
      - Commands for whole clusters are not provided.
          - We need to write client commands ourselves.
          - By contrast, Hadoop provides utility commands for clusters, e.g., start-all.sh (start processes), fsck (check all disks), balancer (rebalance the data blocks).
  • 5. What does Anuenue provide?
      - Handy configuration of search clusters
      - Commands for clusters
          - Simple commands (post, delete, update, commit, etc.)
          - Start and stop commands for the processes in a cluster
      - Japanese support
          - Implementation of Japanese Did-You-Mean facilities
          - Japanese tokenizers (Sen and Kuromoji)
  • 6. Today's Topics
      - Anuenue
          - Handy configuration of search clusters
          - Commands for search clusters
      - Did-You-Mean facilities for Japanese queries
          - Common problem in Did-You-Mean implementation
          - Mining a Japanese Did-You-Mean dictionary from query log data
  • 7. Cluster configuration with Anuenue
      - Cluster setup is done with a special configuration file.
      - Anuenue assigns one or more roles to each instance.
          - Roles are the functions an instance performs in a cluster.
          - Anuenue supports three roles (Master, Slave, Merger).
  • 8. Role: master
      - Indexes input data.
      NOTE: Anuenue provides a command to distribute the input data across the master instances (building Solr shard indexes).
      Diagram: Input Data → Master-1, Master-2, Master-3 (build shard indexes).
  • 9. Role: slave
      Has three functions:
      - Copy (replicate) the index from a master.
      - Accept queries from mergers and then search its own index.
      - Return the results to the merger instance.
      Diagram: Merger-1 submits queries to Slave-1 and Slave-2; the slaves replicate the index from Master-1 and Master-2, which index the input data.
  • 10. Role: merger
      - Forwards queries from clients to slaves.
          - Note: clients do not need to know the slave instances (the merger adds the 'shards' parameter listing the slave instances).
      - Merges the results from all the slave instances and returns the merged results.
      Diagram: Client-1 and Client-2 submit queries to the Merger, which forwards them to Slave-1 and Slave-2.
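The merger's use of the `shards` parameter can be sketched as follows. This is a minimal illustration, not Anuenue's actual code: the function name `build_distributed_query`, the `/solr` path, and port 8983 are assumptions, while the host names `aa`, `dd`, and `ee` are borrowed from the example cluster in the talk. Solr's `shards` value is a comma-separated list of `host:port/path` entries.

```python
from urllib.parse import urlencode


def build_distributed_query(merger, slaves, query, core_path="/solr"):
    """Build the URL for a distributed Solr query sent to a merger.

    merger: (host, port) of the instance that receives the client query.
    slaves: list of (host, port) pairs that hold the shard indexes.
    core_path is an assumed default; adjust it to the real deployment.
    """
    # Comma-separated host:port/path list, as Solr's `shards` parameter expects.
    shards = ",".join(f"{host}:{port}{core_path}" for host, port in slaves)
    params = urlencode({"q": query, "shards": shards})
    host, port = merger
    return f"http://{host}:{port}{core_path}/select?{params}"


# The client only names the merger; the slave list is filled in on its behalf.
url = build_distributed_query(("aa", 8983), [("dd", 8983), ("ee", 8983)], "python")
print(url)
```

Because the slave list is added on the merger side, clients keep working unchanged when slaves are added or removed.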
  • 11. Example: Anuenue cluster
      The cluster consists of five machines.
      - Each has one Anuenue instance.
      - Instances:
          - Merger: aa
          - Master: bb, cc
          - Slave: dd, ee
      Diagram: Client-1 and Client-2 query aa; aa forwards queries to dd and ee; dd and ee replicate the index from bb and cc, which index the input data.
  • 12. How to assign roles to instances?
      Edit the cluster configuration file, anuenue-nodes.xml.
      - Add three elements (mergers, slaves and masters).
      - In each element, add the information for one or more instances (machine name and port number).
  • 13. Configuration example
      Case: there is one merger instance on machine aa (port 7000).
      <mergers>
        <merger>
          <host>aa</host>
          <port>7000</port>
        </merger>
      </mergers>
  • 14. Specify the index to replicate
      Name each master instance with the iname attribute:
      <masters>
        <master iname="master1">
          <host>aaaa</host>
          <port>8983</port>
        </master>
      </masters>
      Specify the master instance whose index a slave copies by adding a replicate element:
      <slaves>
        <slave>
          <host>bbbb</host>
          <port>8983</port>
          <replicate>master1</replicate>
        </slave>
      </slaves>
  • 15. Example: simple cluster settings
      <mergers>
        <merger>
          <host>aa</host>
          <port>8983</port>
        </merger>
      </mergers>
      <masters>
        <master iname="master1">
          <host>bb</host>
          <port>8983</port>
        </master>
      </masters>
      <slaves>
        <slave>
          <host>cc</host>
          <port>8983</port>
          <replicate>master1</replicate>
        </slave>
      </slaves>
      Diagram: Client-1 and Client-2 query aa; aa forwards queries to cc; cc replicates the index from bb, which indexes the input data.
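To make the structure of the settings above concrete, here is a small sketch that reads such a configuration and checks that every slave's replicate element names a declared master. The enclosing `<nodes>` root element and the helper names `read_roles` and `check_replication` are assumptions made for this illustration; the slides show only the three inner elements, and Anuenue's own loader may work differently.

```python
import xml.etree.ElementTree as ET

# The <nodes> wrapper is an assumed root element; the slides show only its children.
CONFIG = """
<nodes>
  <mergers>
    <merger><host>aa</host><port>8983</port></merger>
  </mergers>
  <masters>
    <master iname="master1"><host>bb</host><port>8983</port></master>
  </masters>
  <slaves>
    <slave><host>cc</host><port>8983</port><replicate>master1</replicate></slave>
  </slaves>
</nodes>
"""


def read_roles(xml_text):
    """Return {"mergers": [...], "masters": [...], "slaves": [...]}."""
    root = ET.fromstring(xml_text)
    roles = {}
    for group, tag in (("mergers", "merger"), ("masters", "master"), ("slaves", "slave")):
        roles[group] = [
            {
                "host": node.findtext("host"),
                "port": int(node.findtext("port")),
                "iname": node.get("iname"),              # set on masters only
                "replicate": node.findtext("replicate"),  # set on slaves only
            }
            for node in root.iter(tag)
        ]
    return roles


def check_replication(roles):
    """Return the slaves whose <replicate> does not name a declared master."""
    known = {master["iname"] for master in roles["masters"]}
    return [slave for slave in roles["slaves"] if slave["replicate"] not in known]


roles = read_roles(CONFIG)
print(roles["slaves"], check_replication(roles))
```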
  • 16. Cluster setup with Anuenue
      - Flexible; supports various types of search cluster.
      - For example…
  • 17. Assign multiple roles
      Diagram: Client1 and Client2 submit queries to a single instance, which also indexes the input data.
  • 18. Large clusters to handle huge data with high QPS
      Diagram: Client1 … ClientN query Merger1 to Merger3; the mergers forward queries to Slave1 to Slave6; the slaves replicate from Master1 to Master6, which index the input data.
  • 19. After setting up the cluster
      We can make use of commands for clusters. Anuenue provides
      - start / stop commands
      - commands to manipulate the index
  • 20. Start and stop clusters
      Users can start / stop a cluster with a single command (anuenue-distdaemon.sh).
      Usage:
      $ sh bin/anuenue-distdaemon.sh [start|stop]
  • 21. Simple commands for clusters
      Anuenue also provides basic commands ('post', 'delete', 'commit', 'optimize' and 'update') for search clusters.
      - The commands are multi-threaded.
      E.g.,
      $ sh bin/anuenue-distcommands.sh post -arg inputDir
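The idea of a multi-threaded cluster command can be sketched as a simple fan-out. This is only an illustration of the pattern, not how anuenue-distcommands.sh is implemented: `run_on_cluster` and `fake_send` are hypothetical names, and `fake_send` stands in for whatever call would actually deliver the command to an instance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_cluster(instances, command, send):
    """Apply `command` to every instance concurrently.

    `send(instance, command)` is supplied by the caller and performs the real
    work for one instance. Returns {instance: result}.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(instances))) as pool:
        futures = {inst: pool.submit(send, inst, command) for inst in instances}
        # result() re-raises any exception from the worker thread.
        return {inst: future.result() for inst, future in futures.items()}


def fake_send(instance, command):
    """Hypothetical stand-in for contacting one instance."""
    host, port = instance
    return f"{command} ok on {host}:{port}"


results = run_on_cluster([("bb", 8983), ("cc", 8983)], "commit", fake_send)
print(results)
```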
  • 22. Today's Topics
      - Anuenue
          - Handy configuration of search clusters
          - Commands for search clusters
      - Did-You-Mean facilities for Japanese queries
          - Common problem in Did-You-Mean implementation
          - Mining a Japanese Did-You-Mean dictionary from query log data
  • 23. What is a Did-You-Mean service?
      - Suggests the correct spelling when users submit queries with mistakes.
      - Increases the usability of the search service.
  • 24. Example: Did-You-Mean service (English: Ugly Betty)
  • 25. Common implementation
      Many search engines (including Solr) apply distance measures such as edit distance [Levenshtein, 1965].
      Edit distance: a measure of the distance between two sequences, namely the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. Simply speaking, the more characters two sequences have in common, the smaller the distance.
      E.g.,
      - like ↔ likes (small distance)
      - like ↔ foobar (large distance)
  • 26. Common procedure: Did-You-Mean
      When a user submits a query,
      1. The Did-You-Mean service computes the edit distance between the input query and the words in the index.
      2. If there is a word whose distance is small, → the Did-You-Mean handler suggests it.
      E.g., when a user submits the query "pthon", the Did-You-Mean service suggests "python", a word in the index with a small distance.
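The procedure above can be sketched in a few lines. This is a plain dynamic-programming Levenshtein distance with a linear scan over the vocabulary, written for illustration only; it is not Solr's spellchecker, and the `max_distance` threshold of 2 is an arbitrary assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(query, vocabulary, max_distance=2):
    """Return the closest indexed word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda word: edit_distance(query, word), default=None)
    if best is None or best == query:
        return None  # empty index, or the query is already spelled correctly
    return best if edit_distance(query, best) <= max_distance else None


print(suggest("pthon", ["python", "java", "tomcat"]))
```

A real index would be far too large for a linear scan; the sketch only shows the distance test that decides whether a suggestion is made.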
  • 27. Problem: Japanese queries
      Simple application of edit distance does not work for Japanese.
      → Misspelled queries are sometimes totally different from the correct ones (large distance).
      E.g., [two misspelled/correct query pairs; the Japanese text was not preserved]
      → These cases are caused by the Japanese input method.
  • 28. Typing in a Japanese query
      We input Japanese (query) words in two steps.
      1. Type the reading of the Japanese word in the Latin alphabet.
      2. Select the desired word from the list of candidates.
      Step 2 can introduce a spelling mistake whose edit distance from the correct spelling is large.
  • 29. Example: Typing in Japanese queries
      Assume a user wants to submit a query for "Obama" [Japanese text not preserved].
      1. Type in the reading in the Latin alphabet. Reading: obama
      2. Select the correct spelling from the possible candidates (the correct one and several others) [Japanese text not preserved].
  • 30. Japanese Did-You-Mean dictionary
      - Because of the large distance problem, simple distance measures (edit distance) do not work.
      - To handle this problem, Anuenue supports a special dictionary for the Japanese Did-You-Mean service.
  • 31. Dictionary for Japanese Did-You-Mean service
      The dictionary has two columns:
      1. Query with mistakes
      2. Correct query
  • 32. Implementing Did-You-Mean service with the dictionary
      When a user submits a query that appears in the "query with mistakes" column of the dictionary,
      → the Did-You-Mean service suggests the corresponding correct query.
      NOTE: Anuenue provides handlers for the dictionary format.
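A dictionary-based lookup of this kind can be sketched as a plain table lookup. The tab-separated layout, the function names, and the two sample entries are assumptions made for this illustration; the slides do not specify the on-disk format, and Anuenue's handlers may differ.

```python
def load_dictionary(lines):
    """Parse 'query with mistakes<TAB>correct query' lines into a lookup table.

    The tab-separated layout is an assumption for this sketch.
    """
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        mistake, correct = line.split("\t", 1)
        table[mistake] = correct
    return table


def did_you_mean(query, table):
    """Return the correct query if `query` is a known mistake, else None."""
    return table.get(query)


# Hypothetical entries: a hiragana spelling mapped to the usual katakana one.
table = load_dictionary(["おばま\tオバマ\n", "pthon\tpython\n"])
print(did_you_mean("おばま", table))
```

Unlike the edit-distance approach, the lookup does not care how different the two spellings look, which is the property the Japanese case needs.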
  • 33. Problem…
      How can we create the dictionary?
      → We can use Oluolu, a query log mining tool.
  • 34. Oluolu
      - Creates a spelling correction dictionary from a query log.
      - Extracts pairs of queries (query with spelling mistakes, query with correct spelling).
          - Supports Japanese spelling mistakes (from version 0.2).
      - Runs on the Hadoop framework.
      Project URL: http://code.google.com/p/oluolu/
  • 35. Input to Oluolu: query log
      Three columns:
      1. User Id
      2. Query string
      3. Time of query submission

      User Id   Query         Time
      438904    Pthon         2009-11-21 11:16:12
      34443     Java          2009-11-21 12:16:13
      438904    Python        2009-11-21 12:16:20
      8975      Java Tomcat   2009-11-21 12:16:25
  • 36. Procedure: creating a Japanese Did-You-Mean dictionary with Oluolu
      Oluolu extracts the elements of the Japanese Did-You-Mean dictionary in 2 steps.
      1. Extract all the query pairs in the same session.
      2. Validate the query pairs.
  • 37. Step 1: extract query pairs
      - Oluolu extracts pairs of queries in the same session. E.g., Oluolu extracts the pair (Pthon, Python).
      - Queries in the same session: a set of queries submitted by the same user within a small time range.
      - An extracted pair may consist of a misspelled query and its correct query.

      User ID   Query    Time
      438904    Pthon    2009-11-21 12:16:12
      34443     Java     2009-11-21 12:16:13
      438904    Python   2009-11-21 12:16:20
      8975      Tomcat   2009-11-21 12:16:25
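Step 1 can be sketched on a single machine as follows. Oluolu itself runs on Hadoop, so this is only an illustration of the session idea: the 60-second window, the choice to pair only consecutive queries, and the name `extract_pairs` are all assumptions. The log rows are the ones shown on the slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (user id, query, time) rows taken from the slide.
LOG = [
    ("438904", "Pthon", "2009-11-21 12:16:12"),
    ("34443", "Java", "2009-11-21 12:16:13"),
    ("438904", "Python", "2009-11-21 12:16:20"),
    ("8975", "Tomcat", "2009-11-21 12:16:25"),
]


def extract_pairs(log, window=timedelta(seconds=60)):
    """Return (earlier query, later query) pairs issued by one user within `window`."""
    by_user = defaultdict(list)
    for user, query, stamp in log:
        by_user[user].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), query))

    pairs = []
    for events in by_user.values():
        events.sort()  # chronological order per user
        # Pair each query with the next one from the same user.
        for (t1, q1), (t2, q2) in zip(events, events[1:]):
            if q1 != q2 and t2 - t1 <= window:
                pairs.append((q1, q2))
    return pairs


print(extract_pairs(LOG))
```

The pairs produced here are only candidates; many of them will be unrelated queries, which is why a validation step follows.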
  • 38. Step 2: validate candidate pairs
      - Oluolu validates all the query pairs extracted in step 1.
      - In the validation phase (step 2), Oluolu makes use of query readings.
  • 39. Reading of Japanese words
      - Japanese words can be converted into their readings in the Latin alphabet.
          - [Japanese word not preserved] (reading: konnichiha)
          - [Japanese word not preserved] (reading: itou)
      FACT: even when a Japanese query with spelling mistakes is totally different from the correct query,
      → the readings are the same or the distance between them is small!
  • 40. Validate candidate pair with reading
      Given a query pair, Oluolu validates it in 2 steps.
      1. Convert the queries into readings in the Latin alphabet.
      2. Compute the edit distance between the two readings.
      → When the distance is small, the two queries are extracted as an element of the Did-You-Mean dictionary.
  • 41. Example: step 2
      Given a pair of queries [Japanese text not preserved]:
      1. Convert them into readings. → The readings are the same, "sumitomofudousan".
      2. Compute the distance between the readings. → The distance is zero.
      → The pair is extracted as an element of the Did-You-Mean dictionary.
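The validation step can be sketched as below. A real system would obtain readings from a Japanese morphological analyzer; here the hand-written `READINGS` table is a hypothetical stand-in, and the sample words, the `max_distance` threshold of 1, and the function names are assumptions for illustration rather than Oluolu's implementation.

```python
# Hypothetical reading table standing in for a morphological analyzer.
READINGS = {
    "おばま": "obama",  # hiragana spelling
    "オバマ": "obama",  # katakana spelling
    "ジャバ": "jaba",
}


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def validate_pair(q1, q2, readings, max_distance=1):
    """Accept a candidate pair when the readings of both queries are close."""
    r1, r2 = readings.get(q1), readings.get(q2)
    if r1 is None or r2 is None:
        return False  # no reading available, so the pair cannot be validated
    return edit_distance(r1, r2) <= max_distance


# The surface forms share no characters, yet the readings are identical.
print(edit_distance("おばま", "オバマ"), validate_pair("おばま", "オバマ", READINGS))
```

Comparing readings instead of surface strings is what lets a pair with a large surface distance still be recognized as a misspelling and its correction.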
  • 42. Creating a Japanese Did-You-Mean dictionary with Oluolu
      - Installation requirements
          - Java 1.6.0 or greater
          - Hadoop 0.20.0 or greater
          - Oluolu 0.2.0 or greater
      - Copy the input query log into HDFS.
      - Run the spellcheck task of Oluolu:
      $ bin/oluolu spellcheck -input testInput.txt -output output -inputLanguage ja
  • 43. Preliminary experiments
      - Experimental settings
          - Input data: log file from a mixi service (community search)
              - 5 GB of data
      - Extracted dictionary
          - The number of elements is over 100,000.
          - Oluolu succeeded in extracting query pairs with large edit distance [two example pairs; the Japanese text was not preserved].
  • 44. Current status
      - Finished functional tests and stress tests.
      - Now replacing an in-house search engine in a small search service with Anuenue.
      - In the next phase, we will apply Anuenue to search services with large data and high QPS.
  • 45. Future work
      - Integrate SolrCloud and ZooKeeper
          - Support failover and index rebalancing
      - Kuromoji, a new OSS Japanese tokenizer
  • 46. Summary
      - Introduced Anuenue.
      - Described a Did-You-Mean facility for Japanese queries.
  • 47. Thank you for your attention!