Thoth is a real-time Solr monitoring system developed at Trulia to understand search infrastructure without accessing logs. It collects Solr request data, indexes it in another Solr core for search and analysis, and provides a dashboard and APIs for monitoring metrics. It also uses machine learning to predict query times and identify query patterns through topic modeling. The system was designed to be modular and its components like data collection, indexing, dashboard and monitoring are open-sourced.
3. Overview
- What is Thoth?
- Data Collection and Thoth Core Indexing
- Thoth API & Thoth Dashboard
- Thoth Monitor
- Thoth ML : Prediction and Topic Modeling
- Special Thanks & Q/A
- Demo
4. What is Thoth?
- Innovation project at Trulia
- Understand our search infrastructure without touching logs
- Troubleshoot search performance issues
- Designed as a modular system
- Set of tools that can help gather info, monitor, understand a search infrastructure
- Open source projects:
  - thoth
  - thoth-ml
  - thoth-api
  - thoth-dashboard
  - thoth-monitor
  - thoth-demo
5. Problem: Know Your Search Infrastructure
- Solr logs are a good source, but sometimes hold only partial information
- Decentralized data (at least 1 log per search server)
- Log rotation
- Not searchable
If we could index all the information... Let's use Solr!
- We can search on it
- We have some handy features for free: facets, stats etc
- It’s scalable
6. Thoth Document
1 Solr Request = 1 Thoth (Solr) Document
Server Info
hostname, port number, core name, pool name
Query Info
timestamp, actual query, qtime, hits, exception?
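As a rough sketch, the document described above could be modeled as follows. The field names and sample values are assumptions made for illustration; the slide lists only the kinds of information stored, not the actual Thoth schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ThothDocument:
    # Server info (field names are hypothetical, not the real Thoth schema)
    hostname: str
    port: int
    core: str
    pool: str
    # Query info
    timestamp: str                    # ISO-8601, e.g. "2014-09-16T18:00:02Z"
    query: str                        # the actual Solr query string
    qtime_ms: int                     # query time reported by Solr
    hits: int
    exception: Optional[str] = None   # set only when the request failed


# One Solr request becomes one document.
doc = ThothDocument(
    hostname="search01", port=8983, core="bar", pool="user",
    timestamp="2014-09-16T18:00:02Z", query="q=city:sf&rows=10",
    qtime_ms=42, hits=95,
)
print(asdict(doc))
```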
7. Data Collection (1/2)
- Should be smooth: collection must not slow down search traffic
- We care about near real-time data
- We care about historical data
- Dataset is growing fast
- Interceptor on each search server
- We use a SolrComponent attached to a Request Handler
- Queue system (e.g. ActiveMQ) to relay and temporarily store messages
- Each search server has a manifest in the solrconfig.xml
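The interceptor-plus-queue idea can be sketched in a few lines. This is not the Thoth SolrComponent: an in-process `queue.Queue` stands in for the message broker, and the function names and the drop-when-full policy are assumptions chosen to illustrate the "no traffic slowing down" requirement.

```python
import queue

# In-process stand-in for the message broker (the slides mention ActiveMQ).
message_queue = queue.Queue(maxsize=1000)


def intercept(request_info: dict) -> bool:
    """Called after each search request; must never block the search path."""
    try:
        message_queue.put_nowait(request_info)
        return True
    except queue.Full:
        # Assumed policy: losing a monitoring message beats slowing search.
        return False


def drain(batch_size: int = 100) -> list:
    """Consumer side: pull up to batch_size messages for indexing."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append(message_queue.get_nowait())
        except queue.Empty:
            break
    return batch


intercept({"hostname": "search01", "qtime_ms": 12, "hits": 3})
intercept({"hostname": "search02", "qtime_ms": 250, "hits": 0})
batch = drain()
print(len(batch))
```

Decoupling the search server from the indexer this way means a slow or unavailable Thoth core delays only the monitoring data, not user queries.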
9. Sizing of Data
- Need for granular information for near real-time data
- Less granularity for historical data
Too much data = slow search, space problem
- Shrinking feature:
  - Create shrunk document
  - Real-time core cleanup
- Shrinking time is configurable
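One way to picture shrinking is as an aggregation of per-request documents into one summary per server and time bucket. The slides do not say what a shrunk document contains, so the fields below (`nqueries`, `avg_qtime_ms`, `zero_hits`) and the hourly bucket are assumptions.

```python
from collections import defaultdict


def shrink(docs, bucket_chars=13):
    """Collapse per-request documents into one summary per server and hour.

    bucket_chars=13 keeps the "YYYY-MM-DDTHH" prefix of an ISO-8601 timestamp.
    """
    groups = defaultdict(list)
    for d in docs:
        groups[(d["hostname"], d["timestamp"][:bucket_chars])].append(d)

    shrunk = []
    for (hostname, bucket), items in sorted(groups.items()):
        qtimes = [d["qtime_ms"] for d in items]
        shrunk.append({
            "hostname": hostname,
            "bucket": bucket,
            "nqueries": len(items),
            "avg_qtime_ms": sum(qtimes) / len(qtimes),
            "zero_hits": sum(1 for d in items if d["hits"] == 0),
        })
    return shrunk


docs = [
    {"hostname": "search01", "timestamp": "2014-09-16T18:00:02Z",
     "qtime_ms": 10, "hits": 5},
    {"hostname": "search01", "timestamp": "2014-09-16T18:15:02Z",
     "qtime_ms": 30, "hits": 0},
    {"hostname": "search01", "timestamp": "2014-09-16T19:00:02Z",
     "qtime_ms": 50, "hits": 2},
]
summary = shrink(docs)
print(summary)
```

After the summaries are indexed, the granular documents older than the configured shrinking time can be removed from the real-time core.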
10. Thoth Index
- Solr 4.7
- Soft commit for near real-time search
- Soft commit maxTime set to 1s
- Auto commit set to 15s
- Update chain set to enforce UUID as PkID
- Use of SolrJ to index data and query
11. Thoth API
- Abstraction for Thoth index and Thoth data
- Read only REST-like API
- JSON response
- Written in Node.js to accommodate socket.io
Example:
thoth:3001/api/server/foo/core/bar/port/portbar/start/NOW-1DAY/end/NOW/count/nqueries
{"numFound":95,"values":[{"timestamp":"2014-09-16T18:00:02Z","value":45337},
{"timestamp":"2014-09-16T18:15:02Z","value":77325},
{"timestamp":"2014-09-16T18:30:02Z","value":109523},
{"timestamp":"2014-09-16T18:45:02Z","value":112279},
{"timestamp":"2014-09-16T19:00:02Z","value":115334}
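A client for an endpoint shaped like the example could look like the sketch below. The helper names, the `http://` scheme, and the closed-off sample body are assumptions; only the path pattern and the `numFound`/`values` fields come from the example above.

```python
import json


def build_url(server, core, port, start, end, attribute, host="thoth:3001"):
    """Assemble a request URL following the path pattern of the example."""
    return (f"http://{host}/api/server/{server}/core/{core}/port/{port}"
            f"/start/{start}/end/{end}/count/{attribute}")


def parse_values(body: str):
    """Return (timestamp, value) pairs from a response shaped like the example."""
    payload = json.loads(body)
    return [(v["timestamp"], v["value"]) for v in payload["values"]]


url = build_url("foo", "bar", "portbar", "NOW-1DAY", "NOW", "nqueries")

# Abbreviated sample body reusing the first two entries of the example.
sample = ('{"numFound":95,"values":['
          '{"timestamp":"2014-09-16T18:00:02Z","value":45337},'
          '{"timestamp":"2014-09-16T18:15:02Z","value":77325}]}')
points = parse_values(sample)
print(url)
print(points)
```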
12. Thoth Dashboard (1/5)
- Visual insight on Thoth data
- Useful graphs divided by server or pool
- Handy list of slow queries and exceptions
- Real-time view for server
- Selecting data based on time
- Shareable URLs (for the Ops team, QA team, Release Eng.)
17. Thoth Monitor
- Continuous monitoring of metrics
- Stateless
- Alerting through email or Nagios
- Examples: QTime, number of zero hits, predictor model health
- Possibility to implement custom monitors
- Reuse StatsComponent [http://wiki.apache.org/solr/StatsComponent] if possible
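A custom monitor can be thought of as a metric function plus a threshold, evaluated statelessly over recent documents. The sketch below is not the thoth-monitor API: the class, the metric functions, and the thresholds are all hypothetical, and the metrics are computed in plain Python rather than through Solr's StatsComponent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Monitor:
    name: str
    metric: Callable[[List[dict]], float]   # computes a value from recent docs
    threshold: float

    def check(self, docs: List[dict]) -> Optional[str]:
        """Stateless: the result depends only on the documents passed in."""
        if not docs:
            return None
        value = self.metric(docs)
        if value > self.threshold:
            return f"ALERT {self.name}: {value:.1f} > {self.threshold:.1f}"
        return None


def mean_qtime(docs):
    return sum(d["qtime_ms"] for d in docs) / len(docs)


def zero_hit_count(docs):
    return float(sum(1 for d in docs if d["hits"] == 0))


monitors = [
    Monitor("mean QTime (ms)", mean_qtime, threshold=100.0),
    Monitor("zero-hit queries", zero_hit_count, threshold=5.0),
]
recent = [{"qtime_ms": 80, "hits": 3}, {"qtime_ms": 220, "hits": 0}]
alerts = [a for a in (m.check(recent) for m in monitors) if a is not None]
print(alerts)
```

The alert strings would then be handed to whatever notification channel is configured, such as email or Nagios.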
18. Thoth ML
What can we do with all this data?
• Rich source of information
• Can we turn it into knowledge?
• How about machine learning?
1. Query time prediction
2. Query pattern recognition
3. Server sizing and resource allocation
19. 1. Query Time Prediction (1/4)
• Goal: appropriately route queries to the slow or fast pool
• Look at query attributes
  • Query text
  • Start parameter
  • Facets, range queries, geospatial searches, etc.
• Train a supervised learning model
• Use the learned model to predict whether a query will be slow or fast
• H2O Machine Learning Library
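The attributes listed above can be turned into a feature row with nothing more than the standard library. This sketch does not use H2O and is not the Thoth feature set: the feature names and the string checks for range and geospatial filters are assumptions, while the 100 ms default for "slow" follows the threshold given on the results slide.

```python
from urllib.parse import parse_qs


def extract_features(query_string: str) -> dict:
    """Derive simple boolean/numeric features from a Solr query string."""
    params = parse_qs(query_string)
    q = params.get("q", [""])[0]
    filters = params.get("fq", [])
    return {
        "query_length": len(q),
        "start": int(params.get("start", ["0"])[0]),
        "has_facet": params.get("facet", ["false"])[0] == "true",
        "has_range_query": any(" TO " in part for part in [q] + filters),
        "has_geo": any("geofilt" in fq or "bbox" in fq for fq in filters),
    }


def label(qtime_ms: int, slow_threshold_ms: int = 100) -> str:
    """Training label: a query is slow when it exceeds the threshold."""
    return "slow" if qtime_ms > slow_threshold_ms else "fast"


features = extract_features(
    "q=city:sf&start=20&facet=true"
    "&fq=price:[100000 TO 500000]"
    "&fq={!geofilt sfield=loc pt=37.7,-122.4 d=5}"
)
print(features, label(150))
```

Rows of this shape, paired with their labels, are the kind of input a supervised classifier would be trained on.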
20. 1. Query Time Prediction (2/4)
Challenges
• Imbalanced dataset
• Frequency of model training
• Type of model
• Minimal delay requirement
21. 1. Query Time Prediction (3/4)
Challenges Addressed
• Imbalanced dataset
  • Stratified sampling
• Frequency of model training
  • Auto-identify relearning frequency
• Type of model
  • Boolean, categorical features -> tree-based
  • High accuracy
  • Gradient Boosted Machine
• Minimal delay requirement
  • User pool queries: 45-50 ms
  • Prediction: 1-3 ms
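Stratified sampling for an imbalanced slow/fast dataset can be sketched as below. The slides say only that stratified sampling was used; drawing an equal number of rows per class, the function name, and the synthetic 95/5 split are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_key, per_class, seed=0):
    """Draw up to per_class rows from each class so that the rare class
    (here, slow queries) is not swamped by the majority class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    sample = []
    for cls in sorted(by_class):
        members = by_class[cls]
        sample.extend(rng.sample(members, min(per_class, len(members))))
    rng.shuffle(sample)
    return sample


# Synthetic data: 950 fast queries and 50 slow ones.
rows = [{"id": i, "label": "fast"} for i in range(950)]
rows += [{"id": i, "label": "slow"} for i in range(950, 1000)]

balanced = stratified_sample(rows, "label", per_class=50)
counts = {c: sum(1 for r in balanced if r["label"] == c)
          for c in ("fast", "slow")}
print(counts)
```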
22. 1. Query Time Prediction (4/4)
• 1000 Gradient Boosted Trees
• Slow queries: > 100 ms (configurable)
• Experimental Results
  • Training on ~3.1 million
  • Test on ~1.4 million
  • AUC: 0.94542
  • Accuracy: 0.9202223
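For readers unfamiliar with the two reported metrics, the sketch below computes them on a toy example. The labels and scores are made up and unrelated to the Thoth results; the 0.5 decision cutoff is also an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random slow query scores higher than a random
    fast one (ties count half). Quadratic time, for illustration only."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = slow, 0 = fast; scores are predicted probabilities of "slow".
y_true = [1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.55, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), auc(y_true, scores))
```

Accuracy depends on the chosen cutoff, while AUC summarizes ranking quality across all cutoffs, which makes it the more informative of the two on an imbalanced dataset.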
25. 2. Query Pattern Recognition
• Exceptions, zero hit queries
• Analyze and find out why
• Probabilistic Topic Modeling
• Using MALLET open source toolkit
28. Future Direction
- Thoth ML improvements:
  • Predicting query time buckets
  • Regression vs. classification
  • Exceptions and zero hit query analysis
  • Sizing and resource allocation
- SolrCloud integration
- Dashboard integration with SolrCloud
- More standard metrics on Thoth Monitor
- More data collection (load, GC)
29. Contributors and Special Thanks
Damiano: dbraga@trulia.com
Praneet: pmhatre@trulia.com
Fork us on GitHub!
github.com/trulia/thoth
JD Cantrell (API, Dashboard)
Giulio Grillanda (API, Dashboard)
Rajendra Shioramwar (Core)
Ying Wang (Design)
Girish Gudla (Monitor)
Alexander Kanarsky
Alex Burmester