Using Lucene/Solr
to build Advertising Systems
Hide (Hatayama Hideharu)
Big Data Department, Targeting Section, Advertising Group
Rakuten, Inc. May 2nd 2013
2
Intro
Agenda | www.lucenerevolution.org
http://www.lucenerevolution.org/2013/agenda
35 min... orz
- my talk is NOT about... m(_ _)m
  - NRT
  - SolrCloud
  - complicated queries
  - or other Solr hot topics
- my talk is just about
  - an overview of Solr and its most common features
  - our empirical knowledge about Solr
4
Agenda
1 Introduction of Me & Rakuten
2 Solr centered Advertising Systems
3 Solr performance
4 Solr plug-in
5 (Solr with Japanese language)
10
Who am I?
- Hatayama Hideharu (call me Hide)
- M.Eng., Tokyo Institute of Technology, Japan
- Worked on advertising systems at Rakuten for 3 years
  - ad management system development
  - ad distribution system development
  - system architecture design
  - improving the performance of systems
  - improving the profitability of ad services
- A user of Solr, not an implementer
http://6109.hidepiy.com/
11
Who are we?
- Rakuten, Inc.
- Internet services company
- Founded: Feb. 7th, 1997, Tokyo, Japan
- The first service: Rakuten Ichiba (shopping mall)
12
Who are we?
13
Rakuten in Japan
14
Rakuten Ichiba
- Ichiba: the largest online shopping mall in Japan
[Screenshot of the Rakuten Ichiba top page with callouts: user info, campaign, other services, item search, category navigation, personalized item, item history, sale event, shop history, bookmarked item, service tab, ...]
15
Rakuten's Global Expansion
[World map with location markers. Legend: E-Commerce, eBook, Travel, Other services & businesses, Development center]
17
Types of advertisements on Rakuten Ichiba [1/3]
- Listing Ad (search-word-related ad)
[Screenshot of an item search with callouts: item search, searched ads, searched items]
18
Types of advertisements on Rakuten Ichiba [2/3]
- Display Ad (placement-related ad)
  where, when, ...
- Targeting Ad (user-related ad)
  sex, age, browsing history, ...
19
Types of advertisements on Rakuten Ichiba [3/3]
- ... Ad?
120 ads on 1 page... orz
20
ad system function landscape
[Diagram of the ad system's functions; the labels are grouped below as far as the layout can be recovered.]
- Tool users: Rakuten staff, merchants, other staff
- Media: Rakuten owned media (Web/Email), owned ad network, external ADNW and AdEx
- Ad products: Tenancy Ad (fixed placement/fee/term), P4P Ad (CPM/CPC/CPA etc.)
- ad management: ad placement def., sales mgmt., creative mgmt., campaign mgmt., budget mgmt., bidding, delivery mgmt., reporting, merchant tool, targeting/media
- ad distribution: ad server, log processing, targeting (placement, keyword, behavioral, demographic, etc.), beacon server, redirect server
- Additional functions: big data analysis (attribution, behavior, optimization), advanced targeting, creative optimization, connection to affiliate networks, programmatic media buying
- Devices: PC, mobile, smartphone, tablet
21
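ad distribution system [1/2]
[Diagram: a request with parameters (placement, keyword, ad type, ...) and a cookie goes into the ad distribution system, shown as "???", which responds with JSON, HTML, or JavaScript.]
Inside the system:
- ad searching
- ad filtering
- ad sorting
- logging
- ...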
22
ad distribution system [2/2]
- needs high performance and high availability
  e.g., more than 7,000 req/sec on 1 server with 100.00% availability
- collect & analyze logs, then improve profitability
- the basic architecture is the same across our various ad types
- using...
  Kyoto Tycoon
23
system design: few years ago [1/5]
[Architecture diagram: front clusters of web servers (web svr) and app servers (app svr), with master and slave servers behind them. Legend: one box = 1 physical server; "..." = SLB; one group = 1 server cluster. The "x4" and "x2" labels give the number of servers in a group.]
24
system design: few years ago [2/5]
[Same diagram, highlighting one cluster: web server x 4, app server x 5.]
25
system design: few years ago [3/5]
[Same diagram, highlighting the SLB that connects app <-> Solr.]
26
system design: few years ago [4/5]
[Same diagram as [1/5].]
- high availability, robust
- a simplified task for each server
  - the web server only does "Apaching"
  - the Solr server only does searching
  - ...
- makes full use of resources, with on-demand provisioning
  - e.g., add 1 front cluster
  - e.g., swap out a broken Apache server
  - e.g., after performance tuning, reduce the app servers from 5 to 3
27
system design: few years ago [5/5]
[Same diagram as [1/5].]
- so many servers, so many configurations
  - we didn't have automatic deployment or operation tools
- so many external network connections
  - Apache <-> Tomcat
  - app <-> Solr
  - ...
- Apache, Tomcat, Solr, and Redis never died, but performance was our biggest issue.
28
system design: little bit changed [1/4]
[Architecture diagram: clusters of x4 servers, plus master and slave servers. Legend: one box = 1 physical server; "..." = SLB; one group = 1 server cluster.]
29
system design: little bit changed [2/4]
[Same diagram, highlighting the merged web & app server: 1 physical server contains both Apache & Tomcat.]
30
system design: little bit changed [3/4]
[Same diagram.]
- the whole system network is easy to understand
- easy to operate
- easy to deploy or change configurations
31
system design: little bit changed [4/4]
[Same diagram.]
- Solr is still far from the apps
32
system design: current [1/4]
[Architecture diagram: app servers in groups of x2 and a master (x2), connected through SLBs. Legend: one box = 1 physical server; "..." = SLB.]
33
system design: current [2/4]
[Same diagram, highlighting that the Solr slave is included in the app server.]
34
system design: current [3/4]
[Same diagram, highlighting the SLB that connects master <-> slave.]
35
system design: current [4/4]
[Same diagram.]
- no SPOF (the Solr master is redundant)
- the whole system process is easy to understand
- easy to operate
- easy to deploy or change configurations
- easy to scale out
- good performance (7,000 req/sec on 1 server)
- but we can't make full use of server resources
  e.g., we want 0.7 Solr instances per 1 app instance...
36
system design: in the near future
- server instance
  - physical on-premise, private cloud, public cloud
  - PaaS
- Apache or Nginx?
- shared cache
- master <-> slave or SolrCloud?
- Solr or Elasticsearch?
- abolish the servlet & Tomcat style?
- collaborate more with Hadoop family members
m(_ _)m UNDER CONSTRUCTION
38
operation e.g. Solr schema update [1/8]
[Diagram, repeated on every slide of this sequence: three app server groups and the master (x2 each), connected through an SLB. Legend: one box = 1 physical server; "..." = SLB.]
39
operation e.g. Solr schema update [2/8]
Stop replication of Solr & Redis.
40
operation e.g. Solr schema update [3/8]
Separate one app server group from the net.
(Service IN / Service IN / Service OUT)
41
operation e.g. Solr schema update [4/8]
Update schema & app.
(Service IN / Service IN / Service OUT)
42
operation e.g. Solr schema update [5/8]
Update schema.
(Service IN / Service IN / Service OUT)
43
operation e.g. Solr schema update [6/8]
Restart replication.
(Service IN / Service IN / Service OUT)
44
operation e.g. Solr schema update [7/8]
Test app functions with a reverse proxy.
(Service IN / Service IN / Service OUT)
45
operation e.g. Solr schema update [8/8]
Connect the group to the net again.
(Service IN / Service IN / Service IN)
47
Solr cache
- the various kinds of Lucene/Solr caches
  - fieldCache (Lucene level)
  - fieldValueCache
  - documentCache
  - filterCache
  - queryResultCache
  - HTTP cache
  - and user-defined caches
48
filter cache
- we use it to cache the results of filter queries
<!-- default in solrconfig.xml -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>
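The idea of a size-bounded LRU cache, as configured by `size` above, can be sketched in plain Java. This is only an illustration built on `java.util.LinkedHashMap`; it is not how Solr's `FastLRUCache` is implemented. The class name `LruFilterCache`, the filter strings used as keys, and the doc-id sets used as values are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical LRU cache keyed by filter query string; not Solr's FastLRUCache. */
class LruFilterCache extends LinkedHashMap<String, Set<Integer>> {
    private final int maxSize;

    LruFilterCache(int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Set<Integer>> eldest) {
        // Evict the least recently used entry once the cache exceeds maxSize.
        return size() > maxSize;
    }
}

class LruFilterCacheDemo {
    public static void main(String[] args) {
        LruFilterCache cache = new LruFilterCache(2);
        cache.put("status:active", Set.of(1, 2, 3));
        cache.put("placement:footer", Set.of(2, 5));
        cache.get("status:active");              // touch: now the most recently used
        cache.put("adtype:listing", Set.of(7));  // evicts "placement:footer"
        System.out.println(cache.keySet());      // prints [status:active, adtype:listing]
    }
}
```

With `autowarmCount="0"`, as in the configuration above, no entries are carried over when a new searcher is opened, so the cache starts empty after each index update.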
49
query result cache
- we used to enable it to avoid redundant searches
<!-- default in solrconfig.xml -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
50
application cache
- about caching on the app side
  processing time excluding searching is 0 to 1 msec
  -> converting from doc to DTO is relatively wasteful
  -> SolrJ with javabin works well, but...
51
sizing & memory usage
- monitoring -> tuning configuration, memory allocation
  - server: traffic, load, CPU, memory, page, swap
  - Apache: busy, rps, bps, CPU, state, processing time
  - Tomcat: thread, rps, bps, eps, memory, JMX
  - Solr: index size, doc num, memory, cache hit ratio
    (admin page, admin/Luke, replication?command=details, ...)
[Screenshots: server monitoring; GrowthForecast; Solr admin, command, Luke]
52
avoid Full GC
- Full GC
  if we allocate 2 GB for a Tomcat heap
  -> a "Stop the World" pause can last more than 1 sec
- Concurrent GC (we're still struggling with the tuning)
  e.g.)
HEAP_OPTS="-Xmx2g -Xms2g -Xss512k"
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
FULL_GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=32 -XX:TargetSurvivorRatio=90"
JMX_OPTS="-Dcom.sun.management.config.file=${CATALINA_HOME}/conf/management.properties"
CATALINA_OPTS="-server ${HEAP_OPTS} ${GC_LOG_OPTS} ${FULL_GC_OPTS} ${JMX_OPTS}"
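One way to check whether settings like these keep long pauses rare is to read the collectors' counters from inside the JVM through the standard `java.lang.management` API, which exposes the same data that JMX tools show. The following is a minimal sketch; the class name `GcStats` is hypothetical, and the collector names it prints depend on the JVM and the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that reports per-collector GC counts and accumulated times. */
class GcStats {
    /** Returns one line per garbage collector registered in this JVM. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            lines.add(String.format("%s: count=%d, time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Sampling these counters periodically and plotting the differences shows how often the old-generation collector runs and how much time it takes in total.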
54
Solr plugin
- RequestHandler, SearchHandler
- SearchComponent, QueryComponent
- QParserPlugin, PostFilter
- QueryResponseWriter
-> we implemented these classes for our own use
55
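RequestHandler & SearchHandler
- for logging
- for health checks
  like /admin/, which calls the AdminHandlers

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OurRequestHandler extends RequestHandlerBase {
    /** Logger */
    private static Logger log = LoggerFactory.getLogger(OurRequestHandler.class);

    @Override
    public void init(NamedList args) {
        super.init(args);
    }

    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        // Log every incoming request.
        log.info(req.toString());
        // Do not let HTTP caches keep this handler's responses.
        rsp.setHttpCaching(false);
        ...
    }
}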
56
Solr index situation [1/2]
- Solr indexing is hugely expensive, we thought (just thought...)
  -> so we separated the data into these two kinds
  - basic stable data
  - additional unstable data
[Image: two data-store logos joined by "or"]
57
Solr index situation [2/2]
- Solr index: for searching
  - keyword, placement data (Japan, Ichiba, footer...)
  - a few GB
- Redis data (previously MySQL): for filtering or sorting
  - ad status (active or not)
  - ad price, ad rank (based on CTR, CVR...)
  - and ad contents data (image path, link, text...)
  - 100 MB to 10 GB (depends on the advertisement type)
58
searching: handle ads in app [1/2]
[Diagram of the processing flow:]
handle req -> search -> filter -> sort -> ...
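The app-side flow on this slide (search, then filter, then sort) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `AdInfo` class and its fields, the list of ids standing in for a Solr response, and the map standing in for the Redis data. The real system talks to Solr and Redis.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical ad attributes kept outside the Solr index. */
final class AdInfo {
    final boolean active;
    final double rank;

    AdInfo(boolean active, double rank) {
        this.active = active;
        this.rank = rank;
    }
}

class AppSidePipeline {
    /**
     * searchedIds: ad ids returned by the search step (stand-in for a Solr response).
     * store: per-ad status and rank (stand-in for the Redis data).
     */
    static List<String> filterAndSort(List<String> searchedIds, Map<String, AdInfo> store, int rows) {
        List<String> result = new ArrayList<>();
        for (String id : searchedIds) {
            AdInfo info = store.get(id);
            // filter: drop ads that are unknown or not active
            if (info != null && info.active) {
                result.add(id);
            }
        }
        // sort: highest rank first
        result.sort(Comparator.comparingDouble((String id) -> store.get(id).rank).reversed());
        // keep at most `rows` ads
        return new ArrayList<>(result.subList(0, Math.min(rows, result.size())));
    }

    public static void main(String[] args) {
        Map<String, AdInfo> store = Map.of(
                "ad1", new AdInfo(true, 0.2),
                "ad2", new AdInfo(false, 0.9),
                "ad3", new AdInfo(true, 0.7));
        // prints [ad3, ad1]
        System.out.println(filterAndSort(List.of("ad1", "ad2", "ad3", "ad4"), store, 2));
    }
}
```

In this arrangement every searched document is sent to the app before it can be filtered out, which is the part the following slides move into Solr.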
59
searching: handle ads in Solr [2/2]
[Diagram of the processing flow:]
handle req -> search -> ...
60
Solr with Redis data handling [1/2]
- ResponseWriter
  -> unsuitable for searching or filtering
- SearchComponent
  -> easy to implement and configure
  -> basic processing is already handled in QueryComponent
61
Solr with Redis data handling [2/2]
- modify QueryComponent
  -> a good position in terms of functionality
  -> the base for default searching
  -> a relatively big component
- ConstantScoreQuery with our own Filter?
62
QueryParserPlugin & PostFilter [1/2]
e.g.)
<!-- solrconfig.xml -->
<!-- put jar file here -->
<lib dir=".../orochi_search" />
<!-- define implemented class -->
<queryParser name="redis" class="...orochi.search.ExtendedQParserPlugin" />

public class ExtendedQParserPlugin extends QParserPlugin {
    public void init(NamedList args) { /* NOOP */ }

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
        return new QParser(qstr, localParams, params, req) {
            ...
            @Override
            public Query parse() throws ParseException {
                // The "parsed query" is our post filter itself.
                return new RedisPostFilter(rows, preview, currentTimeMillis);
            }
        };
    }
}
63
QueryParserPlugin & PostFilter [2/2]
public class RedisPostFilter extends ExtendedQueryBase implements PostFilter {
    public RedisPostFilter(int rows, long preview, long currentTimeMillis) {
        // A post filter must not be cached.
        setCache(false);
        ...
    }

    /** Returns whether the document is valid. */
    public boolean isValid(int docId, IndexSearcher indexSearcher) throws IOException {
        document = indexSearcher.doc(docId, fieldSelector);
        ...
    }

    @Override
    public DelegatingCollector getFilterCollector(final IndexSearcher indexSearcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int docId) throws IOException {
                // docId is relative to the current segment; adding docBase (kept by
                // DelegatingCollector) gives the index-wide id that IndexSearcher.doc() expects.
                if (isValid(docBase + docId, indexSearcher)) {
                    super.collect(docId);
                    ...
                }
            }
        };
    }

    // With caching off, a cost of 100 or more makes Solr run this as a post filter.
    @Override
    public int getCost() { return Math.max(super.getCost(), 100); }
    ...
}
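The delegating-collector pattern used by `RedisPostFilter` can be illustrated without Solr on the classpath. Everything in this sketch is a stand-in and an assumption: `DocCollector` plays the role of a Lucene collector, `FilteringCollector` the role of `DelegatingCollector`, and a `Map` replaces the Redis lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

/** Stand-in for a Lucene collector: receives the ids of matching documents. */
interface DocCollector {
    void collect(int docId);
}

/** Stand-in for DelegatingCollector: forwards only documents that pass a check. */
class FilteringCollector implements DocCollector {
    private final DocCollector delegate;
    private final IntPredicate isValid;

    FilteringCollector(DocCollector delegate, IntPredicate isValid) {
        this.delegate = delegate;
        this.isValid = isValid;
    }

    @Override
    public void collect(int docId) {
        // Called only for documents the main query already matched, so the
        // expensive check runs on as few documents as possible.
        if (isValid.test(docId)) {
            delegate.collect(docId);
        }
    }
}

class PostFilterDemo {
    /** matchedDocs: ids matched by the main query; adActive: stand-in for the Redis data. */
    static List<Integer> run(int[] matchedDocs, Map<Integer, Boolean> adActive) {
        List<Integer> collected = new ArrayList<>();
        DocCollector chain = new FilteringCollector(
                docId -> collected.add(docId),
                docId -> adActive.getOrDefault(docId, false));
        for (int docId : matchedDocs) {
            chain.collect(docId);
        }
        return collected;
    }

    public static void main(String[] args) {
        // prints [1, 3]
        System.out.println(run(new int[] {1, 2, 3}, Map.of(1, true, 2, false, 3, true)));
    }
}
```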
64
Merge Solr & Redis
[Diagram of the processing flow:]
handle req -> search -> ...
66
Japanese linguistics
すもももももももも
(pronunciation) sumomomomomomomomo
すもも も もも も もも
(words) sumomo mo momo mo momo
李も桃も桃
(meaning) Both plums and peaches are peaches
67
Japanese linguistics
最中を食べている最中ですm(_ _)m
(pronunciation) monakawotabeteirusaichudesu
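(meaning) I'm in the middle of eating monaka (a Japanese sweet). (excuse me)
The same word 最中 appears twice with different readings: monaka and saichu.
How should we split this sentence into tokens for indexing?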
68
Tokenize approach: N-gram
最中を食べている最中ですm(_ _)m
- unigram
  最 中 を 食 べ て い る 最 中 で す m ( _ _ ) m
- bigram
  最中 中を を食 食べ べて てい いる る最 最中 中で です すm m( (_ _ _ _) )m
- trigram
  最中を 中を食 を食べ 食べて べてい ている いる最 る最中 最中で 中です ですm すm( m(_ (_ _ _ _) _)m
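The n-gram lists above come from sliding a window of n characters over the text one position at a time. A minimal sketch follows, with a hypothetical class name `NGramTokenizer`; it is not Lucene's N-gram tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

class NGramTokenizer {
    /** Returns all character n-grams of text in order; empty if text is shorter than n. */
    static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Work on code points so that characters outside the BMP are not split in half.
        int[] codePoints = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            grams.add(new String(codePoints, i, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [最中, 中を, を食, 食べ, べて]
        System.out.println(ngrams("最中を食べて", 2));
    }
}
```

No dictionary is needed, but a text of k characters produces k - n + 1 tokens for each n, which is why the index grows large.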
69
Tokenize approach: Morphological Analysis [1/2]
最中を食べている最中ですm(_ _)m
- using a dictionary
  最中 を 食べ て いる 最中 です m(_ _)m

text | partOfSpeech | pronunciation
最中 | noun-common | monaka
を | particle-case-misc | o
食べ | verb-main | tabe
て | particle-conjunctive | te
いる | verb-auxiliary | iru
最中 | noun-adverbial | saichu
です | auxiliary-verb | desu
m(_ _)m | - | -
70
Tokenize approach: Morphological Analysis [2/2]
最中を食べている最中ですm(_ _)m
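To show why dictionary-based segmentation is harder than it looks, here is a sketch of the simplest dictionary method, greedy longest match. This is a deliberately naive stand-in: analyzers such as kuromoji choose among candidate segmentations with a cost model rather than with this greedy rule. The class name `GreedyTokenizer` and the three-word dictionary are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class GreedyTokenizer {
    /** Splits text by always taking the longest dictionary word at the current position. */
    static List<String> tokenize(String text, Set<String> dictionary) {
        int maxLen = 1;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = pos + 1; // fall back to a single character if nothing longer matches
            for (int len = Math.min(maxLen, text.length() - pos); len > 1; len--) {
                if (dictionary.contains(text.substring(pos, pos + len))) {
                    end = pos + len;
                    break;
                }
            }
            tokens.add(text.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("すもも", "もも", "も");
        // The plum-and-peach phrase from the "Japanese linguistics" slide,
        // assembled from its intended words.
        String text = "すもも" + "も" + "もも" + "も" + "もも";
        // prints [すもも, もも, もも, もも]
        System.out.println(tokenize(text, dictionary));
    }
}
```

The greedy result differs from the intended segmentation すもも / も / もも / も / もも, because it never considers the particle も when a longer word also fits. Resolving that kind of ambiguity is what the part-of-speech information and costs in a morphological analyzer are for.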
71
Tokenize approach: compare 2 ways
 | N-gram | Morphological Analysis
index size | big | small
preparation | not needed | make & maintain a word dictionary
implementation | very easy | hard (NLP, ML, statistics)
new word | no problem | update dictionary, re-index
search relevancy | no omissions, but includes trivial matches | some omissions, but human-like
processing time | ... | ...
72
Solr with Morphological Analysis
- up to ver. 3.5: set up the component & dictionary manually
  - Sen
  - Lucene gosen
  - ...
- ver. 3.6 and later: the field type text_ja works well
  - "kuromoji" is built in
73
issues of kuromoji
- some adjustments are needed for migration
  - the supported dictionaries may differ between the previous engine & kuromoji
- half-width & full-width characters
  - Ｗｉｎｄｏｗｓ８ <-> Windows8
  - ＡＫＢ４８ <-> AKB48
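The width problem can be demonstrated with the JDK alone: Unicode NFKC normalization maps full-width Latin letters and digits to their ordinary forms, and half-width katakana to full-width katakana. This is only an illustration under that assumption; NFKC also changes other compatibility characters, so it is not identical to what Lucene's `CJKWidthFilter` does. The class name `WidthNormalizer` is hypothetical.

```java
import java.text.Normalizer;

class WidthNormalizer {
    /** Applies Unicode NFKC normalization, which folds character width variants. */
    static String normalizeWidth(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(normalizeWidth("ＡＫＢ４８"));       // prints AKB48
        System.out.println(normalizeWidth("Ｗｉｎｄｏｗｓ８")); // prints Windows8
    }
}
```

Applying the same normalization at index time and at query time lets both spellings of a product name match the same documents.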
74
Japanese Analyzer
- JapaneseTokenizer
- JapaneseBaseFormFilter
- JapanesePartOfSpeechStopFilter
- CJKWidthFilter
- StopFilter
- JapaneseKatakanaStemFilter
- LowerCaseFilter
76
Thank you, San Diego
any question?
any comment?
any advice?
If you have some, let’s talk later (not now...?)
Hide (Hatayama Hideharu)
Big Data Department, Targeting Section, Advertising Group
Rakuten Inc.
blog: http://6109.hidepiy.com
facebook: http://www.facebook.com/hatayama.hideharu
twitter: ... I don’t remember

Contenu connexe

Similaire à Using lucene solr to build advertising systems

Oracle no sql release 3 4 overview
Oracle no sql release 3 4 overviewOracle no sql release 3 4 overview
Oracle no sql release 3 4 overviewAnand Chandak
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogicRakuten Group, Inc.
 
Open-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring SystemOpen-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring SystemYao-Wei Ou
 
Auto scaling and dynamic routing for was liberty collectives
Auto scaling and dynamic routing for was liberty collectivesAuto scaling and dynamic routing for was liberty collectives
Auto scaling and dynamic routing for was liberty collectivessflynn073
 
Orchestrating microservices like a ninja
Orchestrating microservices like a ninjaOrchestrating microservices like a ninja
Orchestrating microservices like a ninjaApigee | Google Cloud
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Lalit Panwar
 
sap basis transaction codes
sap basis transaction codessap basis transaction codes
sap basis transaction codesEOH SAP Services
 
Evolving ALLSTOCKER: Agile increments with Pharo Smalltalk
Evolving ALLSTOCKER: Agile increments with Pharo SmalltalkEvolving ALLSTOCKER: Agile increments with Pharo Smalltalk
Evolving ALLSTOCKER: Agile increments with Pharo SmalltalkESUG
 
Lagom : Reactive microservice framework
Lagom : Reactive microservice frameworkLagom : Reactive microservice framework
Lagom : Reactive microservice frameworkFabrice Sznajderman
 
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMayank Prasad
 
Functional training day1
Functional training day1Functional training day1
Functional training day1Satyamitra maan
 
Splunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for allSplunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for allTimur Bagirov
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for NewbiesTYPO3 CertiFUNcation
 
Eating our Own Dogfood - How Automic Automates
Eating our Own Dogfood - How Automic AutomatesEating our Own Dogfood - How Automic Automates
Eating our Own Dogfood - How Automic AutomatesCA | Automic Software
 
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupPreparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupYashrajNayak4
 

Similaire à Using lucene solr to build advertising systems (20)

Oracle no sql release 3 4 overview
Oracle no sql release 3 4 overviewOracle no sql release 3 4 overview
Oracle no sql release 3 4 overview
 
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
[Rakuten TechConf2014] [C-5] Ichiba Architecture on ExaLogic
 
Solr @ eBay Kleinanzeigen
Solr @ eBay KleinanzeigenSolr @ eBay Kleinanzeigen
Solr @ eBay Kleinanzeigen
 
Solarwinds overview 2013 original
Solarwinds overview 2013 originalSolarwinds overview 2013 original
Solarwinds overview 2013 original
 
Open-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring SystemOpen-Falcon: A Distributed and High-Performance Monitoring System
Open-Falcon: A Distributed and High-Performance Monitoring System
 
mtl_rubykaigi
mtl_rubykaigimtl_rubykaigi
mtl_rubykaigi
 
Auto scaling and dynamic routing for was liberty collectives
Auto scaling and dynamic routing for was liberty collectivesAuto scaling and dynamic routing for was liberty collectives
Auto scaling and dynamic routing for was liberty collectives
 
Orchestrating microservices like a ninja
Orchestrating microservices like a ninjaOrchestrating microservices like a ninja
Orchestrating microservices like a ninja
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Liberty management
Liberty managementLiberty management
Liberty management
 
sap basis transaction codes
sap basis transaction codessap basis transaction codes
sap basis transaction codes
 
Evolving ALLSTOCKER: Agile increments with Pharo Smalltalk
Evolving ALLSTOCKER: Agile increments with Pharo SmalltalkEvolving ALLSTOCKER: Agile increments with Pharo Smalltalk
Evolving ALLSTOCKER: Agile increments with Pharo Smalltalk
 
Lagom : Reactive microservice framework
Lagom : Reactive microservice frameworkLagom : Reactive microservice framework
Lagom : Reactive microservice framework
 
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRsMySQL-Performance Schema- What's new in MySQL-5.7 DMRs
MySQL-Performance Schema- What's new in MySQL-5.7 DMRs
 
Functional training day1
Functional training day1Functional training day1
Functional training day1
 
Splunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for allSplunk in Rakuten: Splunk as a Service for all
Splunk in Rakuten: Splunk as a Service for all
 
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
2018 - CertiFUNcation - Olivier Dobberka: Apache Solr for Newbies
 
Eating our Own Dogfood - How Automic Automates
Eating our Own Dogfood - How Automic AutomatesEating our Own Dogfood - How Automic Automates
Eating our Own Dogfood - How Automic Automates
 
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 MeetupPreparing for Neo - Singapore OutSystems User Group October 2022 Meetup
Preparing for Neo - Singapore OutSystems User Group October 2022 Meetup
 

Plus de lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

Plus de lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 



Using Lucene/Solr to build Advertising Systems

  • 1. Using Lucene/Solr to build Advertising Systems Hide (Hatayama Hideharu) Big Data Department, Targeting Section, Advertising Group Rakuten, Inc. May 2nd 2013
  • 3. Intro: Agenda | www.lucenerevolution.org (http://www.lucenerevolution.org/2013/agenda)
      - 35 min...orz
      - my talk is NOT about... m(_ _)m: NRT, SolrCloud, complicated queries, or other Solr hot topics
      - my talk is just about: an overview of Solr and its most common features; our empirical knowledge about Solr
  • 4. Agenda: 1 Introduction of Me & Rakuten; 2 Solr centered Advertising Systems; 3 Solr performance; 4 Solr plug-in; 5 (Solr with Japanese language)
  • 10. Who am I?
      - Hatayama Hideharu (call me Hide)
      - M.Eng, Tokyo Institute of Technology, Japan
      - Worked with advertising systems at Rakuten for 3 years: ad management system development, ad distribution system development, system architecture design, increasing the performance of systems, increasing the profitability of ad services
      - User of Solr, not implementer
      - http://6109.hidepiy.com/
  • 11. Who are we?
      - Rakuten, Inc.
      - Internet services company
      - Founded: Feb. 7th 1997, Tokyo, Japan
      - The first service: Rakuten Ichiba (shopping mall)
  • 14. Rakuten Ichiba
      - Ichiba: the largest online shopping mall in Japan
      - [screenshot labels: user info, campaign, other services, item search, category navigation, personalized item, item history, sale event, shop history, bookmarked item, service tab]
  • 15. Rakuten's Global Expansion [world map; legend: E-Commerce, eBook, Travel, Other services & businesses, Development center]
  • 17. Types of advertisements on Rakuten Ichiba [1/3]
      - Listing Ad (search word related ad) [screenshot labels: item search, searched ads, searched items]
  • 18. Types of advertisements on Rakuten Ichiba [2/3]
      - Display Ad (placement related ad): where, when ...
      - Targeting Ad (user related ad): sex, age, browsing history ...
  • 19. Types of advertisements on Rakuten Ichiba [3/3]
      - ... Ad? 120 ads on 1 page ...orz
  • 20. ad system function landscape [diagram; labels in extraction order: ad system; Rakuten Owned Media (Web/Email); Owned Ad Network; Rakuten staff; Merchants; Tool; User; Media; External ADNW, AdEx; Other staff; Tenancy Ad (Fixed placement/fee/term); P4P Ad (CPM/CPC/CPA etc.); Ad placement def.; Sales mgmt.; Creative mgmt.; Campaign mgmt.; Budget mgmt.; Bidding; Additional Function; Big Data Analysis; Advanced targeting; Creative optimization; Connect to affiliate network; Programmatic media buying; Attribution; Behavior; Optimization; Delivery mgmt.; Reporting; Merchant Tool; Targeting/media; Reporting; Merchant Tool; Ad server; ad management; ad distribution; Log processing; Targeting (Placement, keyword, behavioral, demographic, etc.); Beacon server; Redirect server; Device x PC, Mobile, Smart phone, Tablet]
  • 21. ad distribution system [1/2] [diagram labels: parameter (placement, keyword, ad type, ...), cookie; ad searching, ad filtering, ad sorting, logging, ... ???; JSON, HTML, JavaScript]
  • 22. ad distribution system [2/2]
      - needs high performance and high availability, e.g., more than 7,000 req/sec per server with 100.00% availability
      - collect & analyze logs, then improve profitability
      - the basic architecture is the same across our various ad types
      - using... Kyoto Tycoon
  • 23. system design: a few years ago [1/5] [architecture diagram; labels: web svr, app svr, slave, master; multipliers x4 and x2; legend: 1 physical server, SLB, 1 server cluster]
  • 24. system design: a few years ago [2/5] [same diagram] 1 server cluster = web server x 4 + app server x 5
  • 25. system design: a few years ago [3/5] [same diagram] SLB connects app <-> Solr
  • 26. system design: a few years ago [4/5] [same diagram]
      - high availability, robust
      - simplified task for each server: the Web server only does "Apaching", the Solr server only searches ...
      - make full use of resources, on-demand provisioning: e.g., add 1 front cluster; e.g., swap a broken Apache server; e.g., tune up performance and decrease app servers 5 -> 3
  • 27. system design: a few years ago [5/5] [same diagram]
      - so many servers, so many configurations; we didn't have automatic deploy or operation tools
      - so much external networking: Apache <-> Tomcat, app <-> Solr ...
      - Apache, Tomcat, Solr, and Redis never died, but performance was our biggest issue.
  • 28. system design: a little bit changed [1/4] [architecture diagram; labels: slave, master; multipliers x4 and x2; legend: 1 physical server, SLB, 1 server cluster]
  • 29. system design: a little bit changed [2/4] [same diagram] merged web & app server: 1 physical server contains both Apache & Tomcat
  • 30. system design: a little bit changed [3/4] [same diagram]
      - easy to understand the whole system network
      - easy to operate
      - easy to deploy or change configurations
  • 31. system design: a little bit changed [4/4] [same diagram]
      - Solr is still far from the apps
  • 32. system design: current [1/4] [architecture diagram; labels: app, master; multiplier x2; legend: 1 physical server, SLB]
  • 33. system design: current [2/4] [same diagram] the Solr slave is included in the app server
  • 34. system design: current [3/4] [same diagram] SLB connects master <-> slave
  • 35. system design: current [4/4] [same diagram]
      - no SPOF (Solr master is redundant)
      - easy to understand the whole system process
      - easy to operate
      - easy to deploy or change configurations
      - easy to scale out
      - good performance (7,000 req/sec per server)
      - but we can't make full use of server resources, e.g., we want 0.7 Solr instance for 1 app instance...
  • 36. system design: in the near future
      - server instance: physical on-premise, private cloud, public cloud; PaaS
      - Apache or Nginx?
      - shared cache
      - master <-> slave or SolrCloud?
      - Solr or Elasticsearch?
      - abolish servlet & Tomcat style?
      - collaborate more with Hadoop family members
  • 37. system design: in the near future: m(_ _)m UNDER CONSTRUCTION
  • 38. operation e.g. Solr schema update [1/8] [architecture diagram; labels: three app servers, master; multiplier x2; legend: 1 physical server, SLB]
  • 39. operation e.g. Solr schema update [2/8]: stop replication of Solr & Redis
  • 40. operation e.g. Solr schema update [3/8]: one app server is separated from the net (Service IN, Service IN, Service OUT)
  • 41. operation e.g. Solr schema update [4/8]: update schema & app (Service IN, Service IN, Service OUT)
  • 42. operation e.g. Solr schema update [5/8]: update schema (Service IN, Service IN, Service OUT)
  • 43. operation e.g. Solr schema update [6/8]: restart replication (Service IN, Service IN, Service OUT)
  • 44. operation e.g. Solr schema update [7/8]: test app functions with reverse proxy (Service IN, Service IN, Service OUT)
  • 45. operation e.g. Solr schema update [8/8]: connected to the net (Service IN, Service IN, Service IN)
  • 47. Solr cache: the various kinds of Lucene/Solr cache
      - fieldCache (Lucene level)
      - fieldValueCache
      - documentCache
      - filterCache
      - queryResultCache
      - HTTP cache
      - and user-defined cache
  • 48. filter cache
      - we're using it to cache the results of filter queries
        <!-- default in solrconfig.xml -->
        <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  • 49. query result cache
      - we used to enable it to avoid useless repeated searches
        <!-- default in solrconfig.xml -->
        <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
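To make the `size` attribute in the two cache definitions above concrete: an LRU cache holds at most `size` entries and, once full, evicts the least recently used entry when a new one is added. The following is a minimal sketch in plain Java. The class name `LruCacheSketch` is hypothetical; it illustrates the eviction policy only and is not Solr's `LRUCache` or `FastLRUCache` implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of LRU eviction; not Solr's implementation. */
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int maxSize) {
        // accessOrder = true: iteration order runs from least to most recently used
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each insertion; returning true evicts the eldest entry
        return size() > maxSize;
    }
}
```

With `maxSize` set to 2, putting `a` and `b`, reading `a`, then putting `c` evicts `b`, because `a` was touched more recently.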
  • 50. application cache (cache on the app side)
      - processing time excluding searching is 0-1 msec
      -> converting from doc to DTO is relatively wasteful
      -> SolrJ with javabin works well, but...
  • 51. sizing & memory usage
      - monitoring -> tuning configuration, memory allocation
      - server: traffic, load, cpu, memory, page, swap
      - Apache: busy, rps, bps, cpu, state, processing time
      - Tomcat: thread, rps, bps, eps, memory, jmx
      - Solr: index size, doc num, memory, cache hit ratio (admin page, admin/Luke, replication?command=details...)
      - tools: server mon, GrowthForecast; Solr admin, command, Luke
  • 52. avoid Full GC
      - with 2GB allocated to a Tomcat heap, a Full GC "Stop the World" pause would be more than 1 sec
      - Concurrent GC (we're still struggling with the tuning), e.g.:
        HEAP_OPTS="-Xmx2g -Xms2g -Xss512k"
        GC_LOG_OPTS="-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
        FULL_GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=32 -XX:TargetSurvivorRatio=90"
        JMX_OPTS="-Dcom.sun.management.config.file=${CATALINA_HOME}/conf/management.properties"
        CATALINA_OPTS="-server ${HEAP_OPTS} ${GC_LOG_OPTS} ${FULL_GC_OPTS} ${JMX_OPTS}"
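Besides reading the GC log that the options above enable, collection counts and accumulated pause time can be read from inside the JVM through the standard `java.lang.management` API (the same data JMX exposes). This is a minimal sketch; the class name `GcStatsSketch` is hypothetical and this is not the speaker's monitoring setup. With the CMS flags above, the collectors are typically reported as `ParNew` and `ConcurrentMarkSweep`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that summarizes GC activity of the running JVM. */
final class GcStatsSketch {
    /** Returns one line per collector: name, collection count, accumulated time in ms. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Sampling this report periodically and watching the old-generation collector's count and time is one way to notice whether Full GCs are still happening after a tuning change.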
  • 54. Solr plugin
      - RequestHandler, SearchHandler
      - SearchComponent, QueryComponent
      - QParserPlugin, PostFilter
      - QueryResponseWriter
      -> we implemented these classes for our own use
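  • 55. RequestHandler & SearchHandler
      - for logging
      - for health checks, like /admin/ calls AdminHandlers

        // package names as of Solr 3.x/4.x
        import org.apache.solr.common.util.NamedList;
        import org.apache.solr.handler.RequestHandlerBase;
        import org.apache.solr.request.SolrQueryRequest;
        import org.apache.solr.response.SolrQueryResponse;
        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        public class OurRequestHandler extends RequestHandlerBase {
          /** Logger */
          private static Logger log = LoggerFactory.getLogger(OurRequestHandler.class);

          @Override
          public void init(NamedList args) {
            super.init(args);
          }

          @Override
          public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
            log.info(req.toString());   // log every incoming request
            rsp.setHttpCaching(false);  // do not tag this response with HTTP caching headers
            ...
          }
        }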
  • 56. Solr index situation [1/2]
      - Solr's indexing is very costly, we thought (just thought...) -> so we separated the data into these two:
      - basic stable data
      - additional unstable data
  • 57. Solr index situation [2/2]
      - Solr index: for searching
          - keyword, placement data (Japan, Ichiba, footer...)
          - a few GB
      - Redis data (previously MySQL): for filtering or sorting
          - ad status (active or not)
          - ad price, ad rank (based on CTR, CVR...)
          - and ad contents data (image path, link, text...)
          - 100MB - 10GB (depends on advertisement types)
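The split above means a search result only yields candidate ad IDs; status and rank come from the key-value side and are applied afterwards. The sketch below shows that filter-then-sort step in plain Java, assuming a `Map` as a stand-in for the Redis lookup. The names `AdFilterSortSketch`, `AdState`, and `filterAndSort` are hypothetical and do not come from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class AdFilterSortSketch {
    /** Hypothetical per-ad state; in the talk this kind of data lives in Redis. */
    static final class AdState {
        final boolean active;
        final double rank;

        AdState(boolean active, double rank) {
            this.active = active;
            this.rank = rank;
        }
    }

    /**
     * Takes ad IDs returned by the search step, drops inactive or unknown ads,
     * and returns up to `rows` IDs ordered by descending rank.
     */
    static List<String> filterAndSort(List<String> searchHits, Map<String, AdState> store, int rows) {
        List<String> kept = new ArrayList<>();
        for (String id : searchHits) {
            AdState s = store.get(id);
            if (s != null && s.active) {
                kept.add(id);
            }
        }
        kept.sort(Comparator.comparingDouble((String id) -> store.get(id).rank).reversed());
        return new ArrayList<>(kept.subList(0, Math.min(rows, kept.size())));
    }
}
```

Keeping the frequently changing fields out of the index means a status or rank change is a key-value write rather than a re-index, which is the motivation the slides give for the split.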
  • 58. searching: handle ads in app [1/2] [diagram labels: handle req, search, filter, sort, ...]
  • 59. searching: handle ads in Solr [2/2] [diagram labels: handle req, search, ...]
  • 60. Solr with Redis data handling [1/2]
      - ResponseWriter -> unsuitable for searching or filtering
      - SearchComponent -> easy to implement and configure -> the basic process is already handled in QueryComponent
  • 61. Solr with Redis data handling [2/2]
      - modify QueryComponent -> good position in terms of functionality -> base for default searching -> relatively big component
      - ConstantScoreQuery with our own Filter?
  • 62. QueryParserPlugin & PostFilter [1/2] e.g.)

        <!-- solrconfig.xml -->
        <!-- put jar file here -->
        <lib dir=".../orochi_search" />
        <!-- define implemented class -->
        <queryParser name="redis" class="...orochi.search.ExtendedQParserPlugin" />

        public class ExtendedQParserPlugin extends QParserPlugin {
          public void init(NamedList args) { /* NOOP */ }

          @Override
          public QParser createParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
            return new QParser(qstr, localParams, params, req) {
              ...
              @Override
              public Query parse() throws ParseException {
                return new RedisPostFilter(rows, preview, currentTimeMillis);
              }
            };
          }
        }
  • 63. QueryParserPlugin & PostFilter [2/2]

        public class RedisPostFilter extends ExtendedQueryBase implements PostFilter {
          public RedisPostFilter(int rows, long preview, long currentTimeMillis) {
            setCache(false);  // post filters must not be cached
            ...
          }

          public boolean isValid(int docId, IndexSearcher indexSearcher) {
            // return whether the document is valid or not.
            document = indexSearcher.doc(docId, fieldSelector);
            ...
          }

          public DelegatingCollector getFilterCollector(final IndexSearcher indexSearcher) {
            return new DelegatingCollector() {
              @Override
              public void collect(int docId) throws IOException {
                // note: docId is segment-relative here; on a multi-segment index
                // the stored-document lookup needs docBase + docId
                if (isValid(docId, indexSearcher)) {
                  super.collect(docId);
                  ...
                }
              }
            };
          }

          @Override
          public int getCost() {
            // a cost of 100 or more makes Solr run this as a post filter
            return Math.max(super.getCost(), 100);
          }
          ...
        }
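The core idea of the post filter above is a collector that wraps another collector and forwards only the documents that pass an extra check. The sketch below isolates that delegation pattern in plain Java with no Solr dependency. `PostFilterSketch`, `DocCollector`, `filtering`, and `run` are hypothetical names for illustration and are not Solr APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class PostFilterSketch {
    /** Minimal stand-in for a collector: receives matching document IDs one by one. */
    interface DocCollector {
        void collect(int docId);
    }

    /** Forwards a document to the wrapped collector only if the predicate accepts it. */
    static DocCollector filtering(DocCollector delegate, IntPredicate isValid) {
        return docId -> {
            if (isValid.test(docId)) {
                delegate.collect(docId);
            }
        };
    }

    /** Runs candidate IDs through the filtering collector and returns the survivors. */
    static List<Integer> run(int[] candidates, IntPredicate isValid) {
        List<Integer> kept = new ArrayList<>();
        DocCollector collector = filtering(kept::add, isValid);
        for (int docId : candidates) {
            collector.collect(docId);
        }
        return kept;
    }
}
```

Because the check runs only on documents that already matched the main query and the cheaper filters, an expensive per-document lookup (such as the Redis validity check in the slide) is paid for as few documents as possible.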
  • 64. Merge Solr & Redis [diagram labels: handle req, search, ...]
  • 66. Japanese linguistics
      - すもももももももも (pronunciation: sumomomomomomomomo)
      - すもも も もも も もも (words: sumomo mo momo mo momo)
      - 李も桃も桃 (meaning: plums and peaches are both kinds of peaches)
  • 67. Japanese linguistics
      - 最中を食べている最中ですm(_ _)m
      - (pronunciation) monakawotabeteirusaichudesu
      - (meaning) I'm eating monaka. (excuse me)
      - how to separate this sentence into tokens for indexing?
  • 68. Tokenize approach: N-gram
      - 最中を食べている最中ですm(_ _)m
      - unigram: 最 中 を 食 べ て い る 最 中 で す m ( _ _ ) m
      - bigram: 最中 中を を食 食べ べて てい いる る最 最中 中で です すm m( (_ _ _ _) )m
      - trigram: 最中を 中を食 を食べ 食べて べてい ている いる最 る最中 最中で 中です ですm すm( m(_ (_ _ _ _) _)m
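The N-gram lists above come from sliding a window of n characters over the text, one position at a time. A minimal sketch in plain Java follows; `NGramSketch` and `ngrams` are hypothetical names, and this is not Lucene's N-gram tokenizer. Unlike the slide, which drops the space inside the emoticon, this sketch treats every character, including spaces, as part of the window.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {
    /** Returns all character n-grams of text, in order; counts code points, not UTF-16 units. */
    static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int[] cps = text.codePoints().toArray();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= cps.length; i++) {
            grams.add(new String(cps, i, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("最中を食べている", 2));
    }
}
```

A text of k characters yields k - n + 1 grams, which is why N-gram indexes grow large: nearly every character position produces a term for each n that is indexed.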
  • 69. Tokenize approach: Morphological Analysis [1/2]
      - 最中を食べている最中ですm(_ _)m
      - using dictionary: 最中 を 食べ て いる 最中 です m(_ _)m

        text     | partOfSpeech         | pronunciation
        最中     | noun-common          | monaka
        を       | particle-case-misc   | o
        食べ     | verb-main            | tabe
        て       | particle-conjunctive | te
        いる     | verb-auxiliary       | iru
        最中     | noun-adverbial       | saichu
        です     | auxiliary-verb       | desu
        m(_ _)m  | -                    | -
  • 70. Tokenize approach: Morphological Analysis [2/2]
      - 最中を食べている最中ですm(_ _)m
  • 71. Tokenize approach: compare 2 ways

                         | N-gram                              | Morphological Analysis
        index size       | big                                 | small
        preparation      | not needed                          | make & maintain word dictionary
        implementation   | very easy                           | hard (NLP, ML, statistics)
        new word         | no problem                          | update dictionary, re-index
        search relevancy | without omission, contains trivial  | with omission, human like
        processing time  | ...                                 | ...
  • 72. Solr with Morphological Analysis
      - up to ver. 3.5: set up component & dictionary manually (Sen, Lucene gosen, ...)
      - ver. 3.6 and later: field type text_ja works well ("kuromoji" is included)
  • 73. issues of kuromoji
      - some adjustments are needed for migration: the supported dictionaries may differ between the previous engine & kuromoji
      - half width & full width characters: Windows8 <-> Ｗｉｎｄｏｗｓ８, AKB48 <-> ＡＫＢ４８
  • 74. Japanese Analyzer
      - JapaneseTokenizer
      - JapaneseBaseFormFilter
      - JapanesePartOfSpeechStopFilter
      - CJKWidthFilter
      - StopFilter
      - JapaneseKatakanaStemFilter
      - LowerCaseFilter
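The half width vs. full width issue is what `CJKWidthFilter` in the chain above addresses: it folds full-width ASCII variants to basic Latin and half-width katakana to regular kana. Outside an analyzer chain, the same effect on a string can be approximated with Unicode NFKC normalization from the Java standard library. This is a sketch under that assumption: NFKC folds more than width variants, so it is a stand-in for illustration and not what `CJKWidthFilter` does internally. The class name `WidthFoldSketch` is hypothetical.

```java
import java.text.Normalizer;

final class WidthFoldSketch {
    /** Folds full-width ASCII variants (and other compatibility forms) via Unicode NFKC. */
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(fold("ＡＫＢ４８"));
    }
}
```

Applying the same folding at both index time and query time is what makes the two spellings match each other.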
  • 76. Thank you, San Diego
      - any questions? any comments? any advice?
      - If you have some, let's talk later (not now...?)
  • 77. Hide (Hatayama Hideharu) Big Data Department, Targeting Section, Advertising Group Rakuten Inc. blog: http://6109.hidepiy.com facebook: http://www.facebook.com/hatayama.hideharu twitter: ... I don’t remember