Organizations storing large volumes of data in Amazon Redshift rely on fast-cycle analysis to quickly uncover actionable insights. As data volumes in Redshift grow, their challenge is finding an analysis solution that removes the headaches of tedious ETL and data wrangling and allows scalable, visual data analysis. These slides, shared during the webinar, demonstrate ClearStory Data’s solution for scalable, fast-cycle, visual data analysis, which is used by CPG, retail, and consumer internet companies on Redshift.
To watch the on-demand webinar, visit:
2. Today’s Speakers
Tina Adams
Senior Product Manager
Amazon Web Services
Andrew Yeung
Director, Product Marketing
ClearStory Data
Scott Anderson
Senior Sales Engineer
ClearStory Data
3. Agenda
• Overview of Amazon Redshift
• Fast Cycle Data Analysis with ClearStory Data on Amazon Redshift
• Demo
• Q&A
5. Amazon Redshift Architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3; load from Amazon DynamoDB or SSH
• Two hardware platforms
– Optimized for data processing
– DW1: HDD; scale from 2TB to 1.6PB
– DW2: SSD; scale from 160GB to 256TB
[Diagram: SQL clients/BI tools connect over JDBC/ODBC to the leader node. The leader node and three compute nodes, each labeled 128GB RAM, 16TB disk, 16 cores, are linked by 10 GigE (HPC). Ingestion, backup, and restore go through Amazon S3 / DynamoDB / SSH.]
6. Amazon Redshift is priced to let you analyze all your data
• Number of nodes x cost per hour
• No charge for leader node
• No upfront costs
• Pay as you go
DW1 (HDD)             Price Per Hour for DW1.XL Single Node    Effective Annual Price per TB
On-Demand             $0.850                                   $3,723
1 Year Reservation    $0.500                                   $2,190
3 Year Reservation    $0.228                                   $999

DW2 (SSD)             Price Per Hour for DW2.L Single Node     Effective Annual Price per TB
On-Demand             $0.250                                   $13,688
1 Year Reservation    $0.161                                   $8,794
3 Year Reservation    $0.100                                   $5,498
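The pricing rule on this slide (number of nodes x cost per hour, no charge for the leader node) and the "effective annual price per TB" column can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the per-node storage figure (2 TB for DW1.XL) is an assumption taken from the minimum cluster size quoted on the architecture slide. Under that assumption the on-demand DW1 row works out to roughly $3,723 per TB per year; other rows may differ slightly, presumably because the hourly prices shown are rounded.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def cluster_cost_per_hour(num_compute_nodes: int, price_per_node_hour: float) -> float:
    """Hourly cluster cost: number of compute nodes x cost per hour.

    The leader node is not counted because the slide says it is free.
    """
    return num_compute_nodes * price_per_node_hour


def annual_price_per_tb(price_per_node_hour: float, node_storage_tb: float) -> float:
    """Cost of running one node for a year, divided by that node's storage in TB.

    node_storage_tb is an assumption (not stated on the pricing slide itself).
    """
    return price_per_node_hour * HOURS_PER_YEAR / node_storage_tb


if __name__ == "__main__":
    # DW1.XL on-demand price from the table, assuming 2 TB of storage per node.
    print(round(annual_price_per_tb(0.850, 2.0)))
    # A hypothetical four-node DW1.XL cluster at the on-demand rate.
    print(round(cluster_cost_per_hour(4, 0.850), 2))
```

The same two functions can be pointed at the reserved prices or at the DW2 rows by swapping in the hourly price and the assumed per-node storage.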
7. Common Customer Use Cases
Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
11. Consider the Following Question…
CPG/Retail
“Are daily product sales being impacted by restocking rate, product freshness, store merchandising, competitor pricing or demographic buying patterns?”
Or…
12. Consider the Following Question…
Consumer Internet
“Who are my users, how long are they on the system, what features are they accessing, how do they decide what purchases to make?”
How would you find an answer, or uncover new insight, on a fast cycle?
13. Hurdles to Fast-Cycle Data Analysis
Limitations of Traditional Solutions
• Data Refresh Velocity Restrictions
• Limited Data Scale & Data Formats
• Slow Decision Times
• Skills Gap
• Rigid Dashboards
• Sampling of data
Resulting Line-of-Business Pains
• Proliferation of inconsistent, siloed views
• Lengthy round trip to ask new questions
• Resort to point solutions, spreadsheets or desktop visualization tools
• Increased blind spots & slow decisions
• No traceability to validate insights
14. ClearStory Data Solution Overview
More LOB Users
• Interactive StoryBoards for fast answers for LOB
More Speed
• Reduce data manipulation
• Automates data blending
• Fast exploration
More Sources
• More internal sources/formats
• Direct access to external data
[Diagram: ClearStory Data solution stack. Application layer: Data Access, Analysis/Exploration, StoryBoards, and Collaboration, used by Data Stewards, Story Authors, and Business Users. Platform layer: Harmonization and Data Inference & Metadata, with inferred types (Date & Time, Location, Text, Currency, Categories, Numbers) and example fields (Product Name, Product SKU, Product Cat, Product Brand, Zip Code, County, State). User & Data Governance runs alongside both layers. Data sources: Internal Data and External Data, labeled Semi-Structured, Structured, Files, API / Web, Premium, Public, and Amazon Redshift.]
15. Why ClearStory for Amazon Redshift?
• Scalability: Scale out as data volume grows – no constraints
• Aggregation: Less pre-processing and data aggregation
• Governance: Data governance, user governance, lineage and traceability
• Speed: Speed of analysis – enabled by ClearStory’s underlying Spark-based in-memory data processing
• Simplicity: Ease-of-use on front-end for any user. Less reliance on users with specialized skillsets
16. Consumer Internet, Online Gaming
Need: Intra-Day Analysis on Large Volume Data Sets
Understand user behavior based on usage patterns on online game.
Analyze drivers of in-app purchase revenue by partner source and user profile.
[Diagram: Data captured from the gaming platform (event-based game data, user profile, awards & promotions, in-app purchases) flows into Amazon Redshift as the centralized data store, followed by intra-day, multi-terabyte analysis with ClearStory Data and collaboration among the partner network, business analysts, and executives.]
17. Leader in Dairy Products
How Are We Performing Daily by Grocery Store and Why?
[Diagram: 10+ data sources blended daily for daily, fast-cycle analysis. Data sources: Internal Supply Chain, Retailer’s Systems, and Syndicated Retail Sales Data; elements shown include Inventory, Demand Planning, Logistics, VMI, Point-of-Sales, Warehouse, Store Shelves, and Fill Rate. Collaboration among retailers/grocers, business analysts, and executives.]
• Holistic customer analysis
• Impacts of promos, placement, price, packaging
• Collaborative insight for key stakeholders and grocers
Converge Disparate Data
Data Platform
• Converge data silos across the entire supply chain
• Spot sales opportunities and competitive threats
• Speed of execution driven by business need
19. Summary
1. More Data
- More Internal/External sources and diverse data formats
- Plus direct access to Amazon Redshift
2. More Speed
- Eliminate data manipulation
- Automate data blending for fast answers
3. More Business Consumption of Data
- New simple user model for any skillset
- Interactive StoryBoards for fast answers for line-of-business