1. Data Science at Pebble
Analyzing Data to Make Smarter Watches
June 2, 2015
2. Today’s speakers
Scott Ward
Solutions Architect
Amazon Web Services
Kiyoto Tamura
Head of Marketing
Treasure Data
Susan Holcomb
Head of Analytics
Pebble
4. What is Pebble?
• Customizable smart watch with crowd-pleasing history
• $10.3MM on Kickstarter with first product
• In March, $20MM on Kickstarter with new product
5. Pebble Data Team: Then vs. Now
One year ago…
• No data team
• No analytics infrastructure
• Barely any data
• Barely any insights
Today…
• 5-person team (& growing!)
• Scalable analytics infrastructure via Treasure Data
• ~60MM records per day
• New product influenced by data insights
8. Engagement Definition
• How can we tell someone likes the watch?
– Button presses?
– Apps downloaded / launched?
– Minimized SW bugs?
– A crazy formula combining these?
• Simplest: They are wearing the watch
– Use accelerometer
9. Accessing Data
• 60MM records per day
• Scheduled jobs in TD to post-process & aggregate data
• Ad hoc queries in TD to explore data (Presto, Hive)
• Dashboards
• Standardized output
• Process: ~30 queries to get one result
10. Accelerometer noise threshold
• Accelerometer picks up gestures and net motion (so we can enable cool features)
• Sensitive enough to pick up the vibrations of a passing train
• Goal: Determine a noise threshold so we can assess when the watch is really in use
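The thresholding idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pebble's actual pipeline: the per-minute motion readings, the `NOISE_THRESHOLD` value, and the function name `minutes_in_use` are all hypothetical.

```python
from typing import Iterable

# Hypothetical noise floor; a real value would come from exploring the data.
NOISE_THRESHOLD = 50


def minutes_in_use(motion_per_minute: Iterable[int],
                   threshold: int = NOISE_THRESHOLD) -> int:
    """Count the minutes whose motion reading exceeds the noise threshold."""
    return sum(1 for reading in motion_per_minute if reading > threshold)


# Made-up readings: low values stand in for ambient vibration (for example a
# passing train), high values for a watch that is actually being worn.
sample = [3, 12, 8, 140, 220, 95, 7, 310]
print(minutes_in_use(sample))
```

Raising or lowering `threshold` changes how many minutes count as "in use", which is why the choice of threshold has to be checked against the data.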
13. Raising the threshold
• Peaks shift left; spike remains
• Backlight data matches original threshold!
• Further validated by survey of users
14. Why this worked
• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data
– What is the range?
– Where are the errors?
– Where are the inflection points?
• Few analytics infrastructure tools optimize for this
– Too focused on standardized reporting
– Want to sell you black box that spits out “insights”
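The three questions above (range, errors, inflection points) are the kind of thing a quick profiling helper can answer on each ad hoc pass. The sketch below is a hypothetical Python stand-in for such a query; the function name `profile`, the rule that missing or negative readings count as errors, and the sample values are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional


def profile(values: List[Optional[float]], bin_width: int) -> Dict[str, object]:
    """Summarise range, error count, and a coarse histogram for one column."""
    # Assumption: a reading is an error if it is missing or negative.
    valid = [v for v in values if v is not None and v >= 0]
    # Bucket each valid reading by the lower edge of its bin.
    histogram = Counter(int(v // bin_width) * bin_width for v in valid)
    return {
        "min": min(valid, default=None),
        "max": max(valid, default=None),
        "errors": len(values) - len(valid),
        "histogram": dict(sorted(histogram.items())),
    }


# Made-up readings with one missing and one negative value.
readings = [2, 5, None, 14, 18, -1, 47, 52, 55]
summary = profile(readings, bin_width=10)
print(summary)
```

Gaps or sudden jumps between neighbouring histogram bins are a rough hint at where an inflection point might be worth a closer look.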
15. Problems 2-n
• Building scalable reporting system
• Delivering insights that shaped interface for new product
• Discovering signals on user attrition
• Designing models to segment use cases
• Analyzing dozens of product elements to improve product experience
21. We provide a simple solution
Ingest Analyze Distribute
and more…
22. Why is Treasure Data better?
Ingest
• Streaming or batch ingestion (or both) with Treasure Agent and Embulk
• Don’t worry about changing the way you send data, Treasure Data handles it all
• 99.99% uptime, our team takes care of running the show so you don’t have to
Analyze
• Query all your data using SQL, no schema required
• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines
• Choose Hive or Presto
• Run machine learning at scale with Hivemall
Distribute
• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…
• Connect your favorite BI tool
• Fine-grained user access control to your data
24. Treasure Data on AWS
EC2
• API servers (c3.2xlarge)
• Hadoop workers (c3.8xlarge)
• Generic workers (c3.4xlarge)
S3
• Powers our schema-free, columnar store
• 50 billion events/day
• No capacity planning needed!
RDS
• Both MySQL & PostgreSQL
• Reduced ops cost
• No dedicated devops for 2.5 years
27. Amazon Relational Database Service (RDS)
Amazon RDS is a fully managed relational DB service that is:
– Simple to deploy
– Easy to scale
– Reliable
– Cost-effective
Ease of deployment and patching
Push-button scalability
Choice of DB Engines
Automated backups
User snapshots and cloning
Monitoring and automatic host replacement
PostgreSQL
Amazon RDS for Aurora (Preview)
28. Amazon RDS - Multi-Availability Zone Configuration
• Configure your RDS environment for high availability and DR
• Primary database running in one Availability Zone with Standby in
another
• DNS name is switched to the Standby when the RDS instance or its Availability Zone becomes unhealthy
KEY MESSAGES
Looker is one of the fastest growing data and analytics companies in history—both in terms of customer growth and revenue growth
Organizations that use Looker see incredible levels of engagement by both data analysts and business users
KEY MESSAGES
This is a completely new architecture that fundamentally changes the way you connect, describe, and explore your data
KEY MESSAGE
We’re seeing growth across industries, each of which has its own unique use cases for the tool
Start out
To summarise what Amazon RDS offers, across three ‘flavours’ of the service, you can think about the feature set in three main areas:
Deployment
A choice of database engines and overall application compatibility
Ease of deployment with pre-configured parameters and settings
Management
Automated backups and disaster recovery
User snapshots and cloning, plus software patching and upgrades
Scaling
Push button scaling through the AWS management console
Here we are focusing on a Multi-AZ (multiple Availability Zone) configuration as it relates to RDS.
A Multi-AZ configuration lets you run a primary database in one AZ while a copy of the data is kept in sync on a standby instance in another AZ of the region you are operating in. When a problem is detected with the RDS instance or the production AZ, the DNS records are switched to point at the standby database, and your applications then work against the standby. When the original production instance comes back up, it becomes the new standby.
This functionality exists for MySQL, PostgreSQL, and Aurora.
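The failover behaviour described in these notes can be pictured with a small simulation. The Python sketch below is a toy model only: it does not call any AWS API, and the class names `Instance` and `MultiAZDeployment`, the instance names, and the zone labels are hypothetical. In the real service the health check and DNS switch are performed by RDS itself.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class MultiAZDeployment:
    """Toy model: one DNS name that always resolves to the current primary."""

    def __init__(self, primary: Instance, standby: Instance) -> None:
        self.primary = primary
        self.standby = standby

    def resolve(self) -> Instance:
        """Return the instance the DNS name points to, failing over if needed."""
        if not self.primary.healthy and self.standby.healthy:
            # Swap roles: the standby is promoted and the failed primary
            # becomes the standby once it recovers.
            self.primary, self.standby = self.standby, self.primary
        return self.primary


db = MultiAZDeployment(Instance("db-a", "zone-1"), Instance("db-b", "zone-2"))
before = db.resolve().name      # primary is healthy, no change
db.primary.healthy = False      # simulate an instance or AZ failure
after = db.resolve().name       # DNS name now points at the old standby
print(before, after)
```

Applications keep using the same name throughout; only the instance behind it changes.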
Some databases need to support a large number of read-only operations. Running all these reads against the same production database that handles your writes can slow down every operation. In that case it may be appropriate to run a read replica of your database to take the read load off the production database.
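On the application side, using a read replica usually means sending reads and writes to different endpoints. The following Python sketch shows one simple way to do that routing. It is an assumption-laden illustration: the `QueryRouter` class and the hostnames are hypothetical, and treating every statement that starts with `SELECT` as a read is a deliberate simplification.

```python
import itertools
from typing import List


class QueryRouter:
    """Send reads to replicas (round-robin) and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Cycle through replicas if there are any; otherwise fall back to primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive check: only statements beginning with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = QueryRouter("primary.example", ["replica-1.example", "replica-2.example"])
print(router.endpoint_for("SELECT * FROM events"))
print(router.endpoint_for("INSERT INTO events VALUES (1)"))
```

Because a replica can lag slightly behind the primary, reads that must see the latest write may still need to go to the primary.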