Partner webinar presentation aws pebble_treasure_data

Data Science at Pebble
Analyzing Data to Make Smarter Watches
June 2, 2015

Today’s speakers
Scott Ward, Solutions Architect, Amazon Web Services
Kiyoto Tamura, Head of Marketing, Treasure Data
Susan Holcomb, Head of Analytics, Pebble

What is Pebble?
• Customizable smart watch with crowd-pleasing history
• $10.3MM on Kickstarter with first product
• In March, $20MM on Kickstarter with new product

Pebble Data Team: Then vs. Now
One year ago…
• No data team
• No analytics infrastructure
• Barely any data
• Barely any insights
Today…
• 5-person team (& growing!)
• Scalable analytics infrastructure via Treasure Data
• ~60MM records per day
• New product influenced by data insights

Data Science Workflow
Define the problem → Acquire the data → Fit the model
[Diagram annotations: "the work", "the hype"]

Pebble’s First Problem
How should we measure product success?

Engagement Definition
• How can we tell someone likes the watch?
– Button presses?
– Apps downloaded / launched?
– Minimized SW bugs?
– A crazy formula combining these?
• Simplest: They are wearing the watch
– Use accelerometer
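
The "wearing the watch" definition can be sketched in a few lines. This is a minimal illustration, not Pebble's actual pipeline: the record shape (one reading per minute), the function name `worn_days`, and both cutoff values are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical record shape: (user_id, day, motion), one record per minute.
# noise_threshold and min_active_minutes are illustrative values only.
def worn_days(records, noise_threshold=10, min_active_minutes=3):
    """Return {user_id: set of days on which the watch counts as worn}."""
    active = defaultdict(int)
    for user_id, day, motion in records:
        if motion > noise_threshold:
            active[(user_id, day)] += 1
    worn = defaultdict(set)
    for (user_id, day), minutes in active.items():
        if minutes >= min_active_minutes:
            worn[user_id].add(day)
    return dict(worn)

records = [
    ("u1", "2015-05-01", 40), ("u1", "2015-05-01", 55), ("u1", "2015-05-01", 35),
    ("u1", "2015-05-02", 4), ("u1", "2015-05-02", 6),   # below threshold: noise only
    ("u2", "2015-05-01", 80), ("u2", "2015-05-01", 2),  # only one active minute
]
print(worn_days(records))  # {'u1': {'2015-05-01'}}
```

The result depends entirely on the two cutoffs, which is why the noise threshold needs to be determined from the data.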

Accessing Data
• 60MM records per day
• Scheduled jobs in TD to post-process & aggregate data
• Ad hoc queries in TD to explore data (Presto, Hive)
• Dashboards
• Standardized output
• Process: ~30 queries to get one result

Accelerometer noise threshold
• Accelerometer picks up gestures, net motion (so we can enable cool features)
• Sensitive enough to pick up vibrations of a passing train
• Goal: Determine threshold for noise so we can assess when the watch is really in use

Raising the threshold
• Peaks shift left; spike remains
• Backlight data matches original threshold!!
• Further validated by survey of users
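
One way to picture the threshold sweep: recompute the hours-in-use histogram at several thresholds and watch how it moves. The sketch below uses synthetic readings and hypothetical function names (`hours_in_use`, `usage_histogram`); it only illustrates why peaks move left as the threshold rises and does not reproduce the charts from the talk.

```python
from collections import Counter

# Hypothetical input: for each device-day, a list of per-minute motion readings.
def hours_in_use(minute_readings, threshold):
    """Count minutes above the threshold and convert to whole hours."""
    active_minutes = sum(1 for m in minute_readings if m > threshold)
    return active_minutes // 60

def usage_histogram(device_days, threshold):
    """Histogram of hours-in-use per device-day at a given threshold."""
    return Counter(hours_in_use(day, threshold) for day in device_days)

# Synthetic data: a worn watch (strong motion for 10 h, faint vibration
# for the remaining 14 h) and a watch lying on a table (faint vibration all day).
worn = [50] * 600 + [3] * 840
on_table = [3] * 1440
device_days = [worn, on_table]

for threshold in (0, 5):
    print(threshold, sorted(usage_histogram(device_days, threshold).items()))
# 0 [(24, 2)]
# 5 [(0, 1), (10, 1)]
```

With no threshold, both devices look like they are in use 24 hours a day. At a threshold of 5, the table-bound watch drops to zero hours and the worn watch settles at 10.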

Why this worked
• Rapid, repeated ad hoc querying lets you get an intuitive picture of the data
– What is the range?
– Where are the errors?
– Where are the inflection points?
• Few analytics infrastructure tools optimize for this
– Too focused on standardized reporting
– Want to sell you black box that spits out “insights”
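
The three exploration questions above (range, errors, inflection points) map onto a very small helper. This is a sketch under stated assumptions: `quick_profile` is a hypothetical name, the validity bounds are supplied by the caller, and "largest gap between sorted values" is only a crude stand-in for spotting where a distribution changes character.

```python
def quick_profile(values, valid_min, valid_max):
    """Summarize a numeric column: range, invalid-value count, largest gap."""
    clean = sorted(
        v for v in values if v is not None and valid_min <= v <= valid_max
    )
    errors = len(values) - len(clean)
    # Largest gap between consecutive sorted values, as (gap, low, high).
    # A rough hint at a break in the distribution, not a formal test.
    gaps = [(b - a, a, b) for a, b in zip(clean, clean[1:])]
    biggest_gap = max(gaps) if gaps else None
    return {
        "min": clean[0] if clean else None,
        "max": clean[-1] if clean else None,
        "errors": errors,
        "biggest_gap": biggest_gap,
    }

readings = [2, 3, 3, 4, 48, 50, 52, None, -7, 9999]
print(quick_profile(readings, valid_min=0, valid_max=1000))
# {'min': 2, 'max': 52, 'errors': 3, 'biggest_gap': (44, 4, 48)}
```

Here the gap between 4 and 48 separates what looks like background noise from real motion, the kind of picture that repeated ad hoc queries build up.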

Problems 2-n
• Building scalable reporting system
• Delivering insights that shaped interface for new product
• Discovering signals on user attrition
• Designing models to segment use cases
• Analyzing dozens of product elements to improve product experience

Product Overview
Kiyoto Tamura
Director of Developer Relations

Event Data is Everywhere…
Smartphones, Websites, Home Automation, Wearable Devices, Connected Vehicles

Example event:
{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "luca@treasuredata.com",
    "twitter": "luckymethod"
  }
}
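
To make the shape of such an event concrete, the sketch below parses the sample record and flattens the nested `user` object into dotted column names. The `flatten` helper and the dotted naming are assumptions chosen for illustration. They are not a description of how Treasure Data stores events.

```python
import json
from datetime import datetime, timedelta

raw = """
{
  "timestamp": "2015-05-22T13:50:00-0600",
  "event": "tap",
  "object": "button_32",
  "user": {
    "name": "Luca",
    "email": "luca@treasuredata.com",
    "twitter": "luckymethod"
  }
}
"""

def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names, e.g. user.name."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

event = flatten(json.loads(raw))
# %z accepts the "-0600" offset form used in the sample event.
event["timestamp"] = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S%z")
print(sorted(event))
# ['event', 'object', 'timestamp', 'user.email', 'user.name', 'user.twitter']
```

Because each event carries its own field names, new fields can appear without a schema change on the sending side.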

Connecting the (big) data dots is hard
credit: Matt Turck @ FirstMark Capital

We provide a simple solution
Ingest Analyze Distribute
and more…

Why is Treasure Data better?
Ingest
• Streaming or batch ingestion (or both) with Treasure Agent and Embulk
• Don’t worry about changing the way you send data: Treasure Data handles it all
• 99.99% uptime: our team takes care of running the show so you don’t have to
Analyze
• Query all your data using SQL, no schema required
• Control Treasure Data through our Console, our Command Line Interface or Luigi-TD for complex automated data pipelines
• Choose Hive or Presto
• Run machine learning at scale with Hivemall
Distribute
• Expansive collection of export plugins: send data to Google Docs, Tableau, Excel, PostgreSQL…
• Connect your favorite BI tool
• Fine-grained user access control to your data

Our growing customer base
Commerce, Technology, Gaming, Media & Ad Tech, IoT
Energy Company

Treasure Data on AWS
EC2
• API Servers (c3.2xlarge)
• Hadoop workers (c3.8xlarge)
• Generic workers (c3.4xlarge)
S3
• Powers our schema-free, columnar store
• 50 billion events/day
• No capacity planning needed!
RDS
• Both MySQL & PostgreSQL
• Reduced ops cost
• No dedicated devops for 2.5 years

Amazon Relational Database Service (RDS)
Amazon RDS is a fully managed relational DB service that is:
– Simple to deploy
– Easy to scale
– Reliable
– Cost-effective
Ease of deployment and patching
Push-button scalability
Choice of DB Engines
Automated backups
User snapshots and cloning
Monitoring and automatic host replacement
PostgreSQL
Amazon RDS for Aurora (Preview)

Amazon RDS - Multi-Availability Zone Configuration
• Configure your RDS environment for high availability and DR
• Primary database running in one Availability Zone with Standby in another
• DNS record for the database endpoint is repointed to the Standby when the primary RDS instance or its Availability Zone becomes unhealthy

RDS Multi-Availability Zone Architecture
[Architecture diagram. Labels: Availability Zone #1, Availability Zone #2, Web Tier, App Tier, RDPGW, Auto Scaling group]

Amazon RDS - Read Replicas
[Diagram labels: Region #1, Region #2]

Questions?
Contact us to learn more
Treasure Data: Kiyoto Tamura, @kiyototamura, treasuredata.com
Pebble: Susan Holcomb, getpebble.com
AWS: Scott Ward, aws.amazon.com
