AWS Summit 2013 | India - Petabyte Scale Data Warehousing at Low Cost, Abhishek Sinha and Co-Presented with Jaspersoft

Abhishek Sinha
Business Development Manager
sinhaar@amazon.com
@abysinha
Petabyte Scale Data Warehousing on the Cloud

Data warehousing done the AWS way
• No upfront costs, pay as you go
• Really fast performance at a really low price
• Open and flexible with support for popular tools
• Easy to provision and scale up massively

We set out to build…
A fast and powerful, petabyte-scale data warehouse that is:
Delivered as a managed service
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift

Amazon Redshift dramatically reduces I/O
ID   Age  State
123  20   CA
345  25   WA
678  40   FL
Figure: row storage vs. column storage, showing scan direction

• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
ID   Age  State  Amount
123  20   CA     500
345  25   WA     250
678  40   FL     125
957  37   WA     375
• With row storage, you do unnecessary I/O
• To get the total amount, you have to read everything

• With column storage, you only read the data you need
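The I/O difference can be made concrete with a toy model. This is a minimal sketch, not how Amazon Redshift actually lays out data: it assumes hypothetical fixed field widths (4-byte integers, a 2-byte state code) and ignores compression and block headers. It counts the bytes a scan must touch to compute the total amount under each layout.

```python
# Toy model of bytes scanned for SUM(amount) under row vs. column storage.
# Assumption: hypothetical fixed field widths, no compression, no block headers.
COLUMNS = ("id", "age", "state", "amount")
WIDTHS = {"id": 4, "age": 4, "state": 2, "amount": 4}  # bytes per field (hypothetical)

ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def row_store_bytes_scanned(rows):
    # Row storage keeps whole rows together, so reading `amount`
    # also reads every other field of every row.
    return len(rows) * sum(WIDTHS.values())


def column_store_bytes_scanned(rows, column):
    # Column storage keeps each column in its own blocks,
    # so only the requested column is read.
    return len(rows) * WIDTHS[column]


def total_amount(rows):
    # Pull out just the amount column, as a column store would, and sum it.
    idx = COLUMNS.index("amount")
    return sum(row[idx] for row in rows)


if __name__ == "__main__":
    print(row_store_bytes_scanned(ROWS))               # 56
    print(column_store_bytes_scanned(ROWS, "amount"))  # 16
    print(total_amount(ROWS))                          # 1250
```

In this model the row layout reads 3.5 times as many bytes as the column layout for a single-column aggregate, and the gap widens as tables get wider.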

Zone maps
• Keep track of the minimum and maximum value for each block
• Skip over blocks that don’t contain the data needed for a given query
• Minimize unnecessary I/O
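A zone map can be sketched as a per-block (min, max) summary that is checked before a block is read. This minimal Python sketch assumes a hypothetical block size of four values (real blocks are far larger) and keeps blocks in memory; it illustrates the skipping logic only. Zone maps skip the most blocks when the column is sorted, as in the example.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical values per block; real blocks hold far more


@dataclass
class ZoneMappedColumn:
    blocks: list
    zone_map: list = field(init=False)

    def __post_init__(self):
        # Record (min, max) for every block once, when the data is written.
        self.zone_map = [(min(b), max(b)) for b in self.blocks]

    @classmethod
    def from_values(cls, values):
        # Split the column into fixed-size blocks.
        return cls([values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)])

    def scan_range(self, lo, hi):
        """Return values in [lo, hi] and how many blocks were actually read."""
        matches, blocks_read = [], 0
        for block, (bmin, bmax) in zip(self.blocks, self.zone_map):
            if bmax < lo or bmin > hi:
                continue  # zone map proves no value here can match: skip the read
            blocks_read += 1
            matches.extend(v for v in block if lo <= v <= hi)
        return matches, blocks_read


if __name__ == "__main__":
    col = ZoneMappedColumn.from_values(list(range(1, 17)))
    print(col.scan_range(6, 9))  # ([6, 7, 8, 9], 2): only 2 of 4 blocks read
```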

• Use direct-attached storage to maximize throughput
• Hardware optimized for high-performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you

Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
Figure: cluster diagram showing JDBC/ODBC client connections, the 10 GigE (HPC) interconnect, and ingestion, backup, and restore paths
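The leader/compute split follows a scatter-gather pattern: the leader fans a query out to the compute nodes, each node works on its local data, and the leader combines the partial results. In this minimal sketch, threads stand in for compute nodes and the partition layout is hypothetical; it illustrates the pattern, not Amazon Redshift's query planner.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(local_amounts):
    # Each compute node aggregates only the data on its own local storage.
    return sum(local_amounts)


def leader_node_sum(partitions):
    """Fan a SUM out to every compute node in parallel, then combine.

    partitions: one list of amounts per compute node (hypothetical layout).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(compute_node_sum, partitions))
    # The leader node merges the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    # Two hypothetical compute nodes, each holding part of the sample amounts.
    print(leader_node_sum([[500, 250], [125, 375]]))  # 1250
```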

Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2

Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup
• Restore
• Resize

Load

• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data automatically distributed and sorted according to DDL
• Scales linearly with the number of nodes
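Distributing and sorting "according to DDL" means the table definition (for example, a distribution key and a sort key) decides where each row lives and in what order. This minimal sketch hashes a distribution-key value to pick a slice and then sorts each slice by the sort key. The use of `zlib.crc32` is an assumption made for the sketch, not the hash Amazon Redshift uses.

```python
import zlib
from collections import defaultdict


def distribute_and_sort(rows, dist_key, sort_key, num_slices):
    """Place each row on a slice by hashing its distribution key,
    then sort every slice by the sort key.

    crc32 is a stand-in hash chosen for this sketch only.
    """
    slices = defaultdict(list)
    for row in rows:
        h = zlib.crc32(str(row[dist_key]).encode("utf-8"))
        slices[h % num_slices].append(row)
    # Rows with equal distribution keys always land on the same slice,
    # and each slice's rows are kept in sort-key order.
    return {s: sorted(part, key=lambda r: r[sort_key]) for s, part in slices.items()}


if __name__ == "__main__":
    rows = [
        {"id": 957, "state": "WA", "amount": 375},
        {"id": 123, "state": "CA", "amount": 500},
        {"id": 345, "state": "WA", "amount": 250},
        {"id": 678, "state": "FL", "amount": 125},
    ]
    for slice_id, part in sorted(distribute_and_sort(rows, "state", "id", 2).items()):
        print(slice_id, [r["id"] for r in part])
```

Co-locating equal keys on one slice is what lets each slice work independently during a load or a join on that key.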
Backup and restore

• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on demand
• Streaming restores enable you to resume querying faster
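An incremental backup copies only what changed since the previous snapshot. This minimal sketch illustrates the general idea by comparing a content digest per block against what earlier snapshots stored; the block IDs, the use of SHA-256, and the in-memory "store" are all hypothetical and do not describe Amazon Redshift's internal backup mechanism.

```python
import hashlib


def incremental_snapshot(blocks, backed_up):
    """Return the IDs of blocks that must be copied for a new snapshot.

    blocks: dict of block ID -> current block contents (bytes)
    backed_up: dict of block ID -> digest already stored by earlier
               snapshots; updated in place as blocks are "uploaded"
    """
    to_upload = []
    for block_id, data in sorted(blocks.items()):
        digest = hashlib.sha256(data).hexdigest()
        if backed_up.get(block_id) != digest:
            to_upload.append(block_id)    # new or changed since last snapshot
            backed_up[block_id] = digest  # record it as backed up
    return to_upload


if __name__ == "__main__":
    stored = {}
    cluster = {"b1": b"CA,500", "b2": b"WA,250"}
    print(incremental_snapshot(cluster, stored))  # ['b1', 'b2']: first snapshot copies everything
    cluster["b2"] = b"WA,300"
    print(incremental_snapshot(cluster, stored))  # ['b2']: only the changed block
```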
Resize

• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for the source cluster

• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via AWS Console or API

Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single node (2 TB)
• Cluster of 2-32 nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster of 2-100 nodes (32 TB – 1.6 PB)
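The capacity ranges are simply nodes × per-node storage, so picking a cluster size is a ceiling division bounded by the node limits quoted for each node type. This minimal sketch assumes the data size is already measured after compression and ignores headroom for temporary space, which a real sizing exercise would need; `nodes_needed` is a hypothetical helper.

```python
import math

# Figures from the slide: compressed user storage per node and cluster limits.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 1, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def nodes_needed(compressed_tb, node_type):
    """Smallest node count whose storage covers `compressed_tb` terabytes."""
    spec = NODE_TYPES[node_type]
    n = max(spec["min_nodes"], math.ceil(compressed_tb / spec["tb_per_node"]))
    if n > spec["max_nodes"]:
        raise ValueError(f"{compressed_tb} TB exceeds the largest {node_type} cluster")
    return n


if __name__ == "__main__":
    print(nodes_needed(10, "HS1.XL"))     # 5
    print(nodes_needed(100, "HS1.8XL"))   # 7
    print(nodes_needed(1600, "HS1.8XL"))  # 100, i.e. 1.6 PB
```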

Amazon Redshift is priced to let you analyze all your data
Term                 Price per hour (HS1.XL single node)   Effective hourly price per TB   Effective annual price per TB
On-Demand            $0.850                                $0.425                          $3,723
1-Year Reservation   $0.500                                $0.250                          $2,190
3-Year Reservation   $0.228                                $0.114                          $999

Simple Pricing
• Number of nodes × cost per hour
• No charge for leader node
• No upfront costs
• Pay as you go
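The "effective" columns appear to follow from two assumptions: each HS1.XL node holds 2 TB of compressed user storage, and a year is 8,760 hours (for example, $0.850 / 2 = $0.425 per TB-hour, and $0.425 × 8,760 ≈ $3,723 per TB-year). This minimal sketch reproduces that arithmetic; reservation terms and any associated fees are not modeled.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored
TB_PER_HS1_XL_NODE = 2     # compressed user storage per HS1.XL node


def effective_prices(price_per_node_hour):
    """Return (effective hourly price per TB, effective annual price per TB)."""
    per_tb_hour = price_per_node_hour / TB_PER_HS1_XL_NODE
    return per_tb_hour, per_tb_hour * HOURS_PER_YEAR


def cluster_hourly_cost(num_nodes, price_per_node_hour):
    # Simple pricing: number of compute nodes x hourly rate; the leader node is free.
    return num_nodes * price_per_node_hour


if __name__ == "__main__":
    for label, rate in [("On-Demand", 0.850), ("1 Year", 0.500), ("3 Year", 0.228)]:
        hourly, annual = effective_prices(rate)
        print(f"{label}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```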

Amazon Redshift is easy to use
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups

Provision a data warehouse in minutes

Amazon Redshift integrates with multiple data sources
• Amazon DynamoDB
• Amazon Elastic MapReduce
• Amazon Simple Storage Service (S3)
• Amazon Elastic Compute Cloud (EC2)
• AWS Storage Gateway Service
• Corporate data center
• Amazon Relational Database Service (RDS)
More coming soon…

Amazon Redshift provides multiple data loading options
• Upload to Amazon S3
• AWS Import/Export
• AWS Direct Connect
• Work with a partner: data integration partners and systems integrators
More coming soon…

Amazon Redshift works with your existing analysis tools
• Connect your tools to Amazon Redshift via JDBC/ODBC
More coming soon…

Competing on Time and Information
“The New Factors of Production: Time and Information”
Brian Gentile, Jaspersoft
But business users don’t have access to timely, actionable data.
Why? Most don’t spend their day inside a BI tool…
…nor do they want to!

We Need “Intelligence Inside”
We want information to FIND US, not the other way round
• Pipeline dashboard inside SaaS CRM app
• Performance report inside partner portal
• Salary data visualizations inside HR intranet
• Portfolio analytics inside client website
• Tickets crosstab inside custom helpdesk app
• Interactive charts inside native mobile app
“To make analytics more actionable and pervasively deployed, BI professionals must make analytics more invisible to their users […] through embedded analytic applications at the point of decision or action.”

Jaspersoft: The Intelligence Inside
Self-Service BI + Embeddable + Affordable
“We empower millions of people every day to make better decisions faster by delivering timely, actionable data to them inside their apps and business process through an embeddable, cost-effective reporting and analytics platform.”

Intelligence Inside: Example Customers
Commercial apps, customer portals, cloud apps, internal apps, big data analytics
The Intelligence Inside Business

Strong Partnerships, Broad Recognition
High-Growth Subscription Revenue Company
©2013 Jaspersoft Corporation. Proprietary and Confidential
World’s Most Widely Deployed BI
• Commercial Open Source BI Suite
• Nearly 200 people worldwide
• 16,000,000 downloads
• 325,000 community members
• 130,000 embedded applications
• 1,800 subscription customers
Jaspersoft: High Growth and Momentum
Figure: Magic Quadrant placements, 2010-2013

Winner, Technology of the Year 2013
• Jaspersoft wins alongside iPad Mini, Hadoop, HTML5
• Only business intelligence or analytics vendor to win
“Jaspersoft's powerhouse reporting and analytics platform [….] remains a flexible fit for a broad range of use cases. Whether you're looking to scrub petabytes of data with threat analytics, or just knock out some slick dashboards that drill into customer traffic patterns, Jaspersoft has the right stuff.”
InfoWorld

Design Any Report . . .

… using Any Data Type
Relational, Big Data, Files, POJOs

… bringing Intelligence to Any App

Jaspersoft for AWS Overview
• Jaspersoft is the first BI service that you can buy per hour
• No user limitations, no monthly fee
• Starting at $0.40 an hour
• First BI service to automatically connect to your AWS data
• 10 minutes from purchase to analyzing your data in RDS or Redshift
• AWS security integration

Jaspersoft for AWS In Action
“We've taken the desktop power of data visualization tools, built it to scale on the HTML5 web, and made it embeddable within any app, device or portal”

Jaspersoft on Amazon AWS
Fast Customer Growth
Some Early Stats
- Added 250 paying customers in 3 months
- Currently ~ 30% staying active
- Revenue grew 10X over last month
- Last month usage ~ 70% US, 25% EU, 5% ROW
“This is truly a disruptive product offering. The pricing is extremely cost effective and I had it setup with dashboards in an hour.” (Sage Human Capital)
“Jaspersoft has developed a truly innovative offering with its utility-based pricing model.” (Click Travel)
“I’ve been looking at your product offering for an internal project and the experience has been very positive. I think you guys have the right product, right place, right time.” (Leading Cloud Provider)

Some Early Customers

NEW! Jaspersoft for AWS Promo
What?
• Free Jaspersoft for AWS on XL instance
• $175 of AWS credits for AWS services
When?
• From June 15, 2013 to July 14, 2013
How?
• Go to www.jaspersoft.com/cloud and sign up
• Details: https://aws.amazon.com/marketplace/help/201193990

The Intelligence Inside
Thank You
www.jaspersoft.com/amazon
aws-marketplace@jaspersoft.com
