The document discusses big data analytics and the DataCanvas platform. It notes that the big data market was worth $16.9 billion in 2015 and areas like government, communication, banking, and manufacturing are seeing heavy investment in big data. It then describes the DataCanvas platform as providing an intuitive way to build workflows and pipelines to enable flexible data analysis, product analysis, and predictive services through drag and drop functionality and module sharing. The platform aims to address issues with unmanageable configurations and lack of reuse in existing solutions by providing collaboration, reusable modules, and solution templates to help users build and operate advanced analytics.
5. Make data live
Data sitting in storage generates no value
Revenue and profit from data
Applications and solutions to get insights from data
Link insights with business
Don’t stop at visualizations or reports
Advanced analytics is the engine of business solutions
Fraud detection
Customer retention
6. Data analysis
Example: Estimate customer’s life cycle value
User: data scientist
Demands: flexibility to explore and faster iteration
Product analysis
Example: How many female customers visit the website home page and leave within fewer than 5 clicks?
User: product manager, data analyst, marketing team
Demands: no complex coding, a SQL query at most
Predictive service
Example: Is this transaction fraudulent?
User: developer and data scientist
Demands: pipeline processing
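As a sketch of the "SQL at most" product-analysis case above, the home-page question can be phrased as a single query. The table and column names here are illustrative assumptions, not part of the DataCanvas product:

```python
import sqlite3

# Hypothetical clickstream schema; all names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sessions (
        user_gender  TEXT,     -- 'F' or 'M'
        landing_page TEXT,     -- first page of the session
        click_count  INTEGER   -- clicks before leaving
    )
""")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("F", "home", 3), ("F", "home", 7), ("M", "home", 2), ("F", "search", 1)],
)

# "How many female customers visit the home page and leave
#  within fewer than 5 clicks?"
(count,) = conn.execute("""
    SELECT COUNT(*) FROM sessions
    WHERE user_gender = 'F'
      AND landing_page = 'home'
      AND click_count < 5
""").fetchone()
print(count)  # only the ('F', 'home', 3) session qualifies -> 1
```

This is the level of effort the product-analysis persona expects: one declarative query, no pipeline code.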
7. Powering all these scenarios
Data Analysis: Flexible
Product Analysis: Intuitive
Predictive service: Complex processing
Enable applications, solutions, and business processes
DataCanvas
8. Platform to enable applications and connect infrastructure
Application: Recommendation, Anomaly Detection, Operations Analytics
Service
Pipeline
Infrastructure: Hadoop (Hive/Pig), RDBMS, NoSQL, Spark
9. • Big data challenges span services, environments, and even locations, across the whole flow: Data Generation → Storage → Processing → Reporting
• An orchestration platform is required to manage and connect the steps in the pipeline
• Bring the pipeline to the game
10. No more central data store: bring computation to the data, not vice versa!
• Unify resource
• Optimize workload
• Automation
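A minimal sketch of what "manage and connect steps in the pipeline" means: each stage is a reusable module, and the orchestrator chains them in order. All function names here are illustrative assumptions, not DataCanvas APIs:

```python
from functools import reduce

# Illustrative pipeline modules; names are assumptions, not DataCanvas APIs.
def generate(_):
    return [3, -1, 4, -1, 5]               # Data Generation

def store(records):
    return list(records)                   # Storage: persist as-is

def process(records):
    return [r for r in records if r >= 0]  # Processing: drop bad rows

def report(records):
    return {"count": len(records), "total": sum(records)}  # Reporting

# The orchestrator's job: manage and connect the steps.
pipeline = [generate, store, process, report]
result = reduce(lambda data, step: step(data), pipeline, None)
print(result)  # {'count': 3, 'total': 12}
```

The point of the sketch is the shape, not the stages: because modules are composable, the platform can schedule each step on the infrastructure where its data already lives instead of pulling everything into a central store.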
11. Pain points
• Unmanageable: monster configurations
• Redundancy: spaghetti scripts, no reuse
• Hard to iterate fast
• Gap between documentation and the actual workflow: no idea what’s actually running
12. • Drag & drop to run data flow
• Public or private cloud
• Intuitive job management
• Module repository
• Built-in library
• Make your own recipe
• Powering advanced analytics
• Business solution template
• Address common applications
• Fully customizable
• Team collaboration
• Flow sharing
• Module sharing
• This is the BEST documentation
14. Lambda Architecture
• Seamlessly connect to any existing or upcoming computation infrastructure
• Enabler for module management and sharing
• Support Lambda: Processing + Serving + Visualization
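A toy sketch of the Lambda pattern the slide refers to: a batch layer recomputes views over the full dataset, a speed layer covers recent data, and a serving layer merges both at query time. This is the standard Lambda idea, not DataCanvas-specific code; all names are illustrative:

```python
# Toy Lambda architecture: batch view + real-time view merged at query time.
batch_events = [("home", 1), ("home", 1), ("search", 1)]  # already batch-processed
recent_events = [("home", 1)]                             # not yet in the batch view

def count_by_page(events):
    # Batch layer: full recompute over the master dataset.
    view = {}
    for page, n in events:
        view[page] = view.get(page, 0) + n
    return view

def serve(page):
    # Serving layer: merge the batch view with the speed layer's
    # incremental view of recent events.
    batch = count_by_page(batch_events)
    speed = count_by_page(recent_events)
    return batch.get(page, 0) + speed.get(page, 0)

print(serve("home"))    # 2 from batch + 1 from speed = 3
print(serve("search"))  # 1
```

In the platform's terms, "Processing" maps to the batch and speed layers and "Serving + Visualization" to the merged query side.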
15. Feature comparison: AWS DP, Oozie, AzureML, MortarData, Azkaban, DataCanvas (each rated Good / Not that great / Bad or not supported)
• Workflow + scheduling
• Module management
• Solution templates
• Multiple environment support
• Collaboration + sharing
• Cloud service
DataCanvas = ((Workflow + Scheduler) * Drag & drop * Module composition) ^ Solution @ Cloud
16. Subscription
Services are charged on tiers: Free, Startup, Premium, Enterprise
Free: 1 user; unlimited projects; limited workload, good for evaluation; forum support
Startup: unlimited users; unlimited projects; decent workload, 3-5 jobs in parallel; email support
Premium: unlimited users; unlimited projects; significant workload, >20 jobs in parallel; email support
Enterprise: unlimited users; unlimited projects; workload at scale; full support
Annual Support Package
For Premium and Enterprise customers: forum support, email support with SLA, telephone support
17. Data scientist
Assembly line to facilitate exploration
Team collaboration
Analyst
Drag and drop to find insights. Need any more reasons?
Manager
Faster iteration
Shorter time to deliver project
Easier to maintain