Tuomas Autio's and Mikko Mattila's presentation from the "Hadoop & Azure Marketplace - digitalisaation tekijät" ("makers of digitalisation") breakfast seminar on 26 April. Find our blog posts about Hadoop: http://www.bilot.fi/en/explore/?cat=blog&tag=hadoop
2. Hosts
03/05/2016 www.bilot.fi 2
Tuomas Autio
Bilot
Head of Big Data & Business Lead (BI)
tuomas.autio@bilot.fi
@BigDataTuomas
Mikko Mattila
Bilot
Solution Lead, Analytics
mikko.mattila@bilot.fi
@MattilaJMikko
Antti Alila
Microsoft
Product Manager, Azure
antti.alila@microsoft.com
Mats Johansson
Hortonworks
Solution Architect
mjohansson@hortonworks.com
Pasi Vuorela
Hortonworks
Sales Manager Nordics
pvuorela@hortonworks.com
4. Agenda
• Introductions
• Microsoft and Azure Marketplace
• Hadoop and modern data architecture + demo
• Hortonworks, HDP and HDF
• Case study by Hortonworks
• Wrap-up & next steps
5. Key take-aways from today
What to expect
• What Hadoop is
• How Hadoop fits into enterprise architecture
• What Hadoop means for my organizational structure
• Big data is relevant to every industry
• Real-world use cases
“Hadoop plays a significant role in filling that gap in the market. An open standard approach is needed to keep up with the pace. Old technologies are not capable of connecting billions of things.” GE’s CIO Vince Campisi
”Spark [on top of Hadoop] has been ‘instrumental in where we’ve gotten to’”
Vinoth Chandar, Uber
“100% of large (over $1 billion) enterprises will adopt Hadoop by 2020” Forrester’s Principal Analyst Mike Gualtieri
“Hadoop is the most important technological part of the digitalization” SAP’s
CTO Quentin Clark
“Who cares about Hadoop on Linux? Microsoft (yes, really) … We want Azure
to be a place where all operating systems can run” T. K. "Ranga" Rengarajan,
Microsoft's corporate VP, Data Platform
6. Bilot stands for BI
Bilot’s offering for analytics & big data, Tuomas Autio, Bilot
12. (Old) Architectures under pressure
(Diagram)
(1) (Traditional) data sources: ERP, CRM, POS, …
(2) New data sources: social media, click-stream, marketing data, server logs / RFID, sensor / machine data, geo locations, unstructured documents
Data systems: EDW, RDBMS, MPP
Reporting & applications: analytics, custom applications, packaged applications
13. Quick Intro to Hadoop
• Hadoop is an open source framework for distributed storage and processing of large data sets
• Managed by the Apache Software Foundation
• De facto standard for big data
• Enterprise Hadoop distributions:
• Hortonworks HDP (“Red Hat” of Hadoop), HDP for Windows, IBM, Microsoft Azure HDInsight (HDP), Cloudera, MapR, AWS (EMR), Rackspace
• >50% of US Fortune 100 companies use Hadoop; the market is growing at ~60% CAGR (an estimated $50bn by 2020)
• ~25 Hadoop instances in Finland, of which ~10 are known production instances (Finland lags well behind the US and Central European markets)
(Figure: Hadoop 2.x framework)
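A minimal sketch of what "distributed file storage" means in HDFS, assuming the Hadoop 2.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication): a file is cut into fixed-size blocks and each block is stored on several nodes. The round-robin placement and the node names are simplifications for illustration only; real HDFS uses a rack-aware placement policy.

```python
from typing import List

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; Hadoop 2.x default for dfs.blocksize
REPLICATION = 3                 # Hadoop default for dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size in bytes of each block; only the last block may be smaller."""
    if file_size <= 0:
        return []
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    last = file_size - block_size * (n_blocks - 1)
    return [block_size] * (n_blocks - 1) + [last]


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> List[List[str]]:
    """Give each block `replication` distinct nodes, chosen round-robin.

    Simplification: real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [
        [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(n_blocks)
    ]


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([b // MB for b in blocks])  # [128, 128, 44]
    nodes = [f"node{i}" for i in range(1, 6)]  # hypothetical 5-node cluster
    for i, replicas in enumerate(place_replicas(len(blocks), nodes)):
        print(f"block {i}: {replicas}")
```

With three replicas, a 300 MB file occupies about 900 MB of raw disk; that redundancy is what lets the cluster keep serving the file when a node fails.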
14. Key Features
• Runs on a cluster of commodity servers and scales out “infinitely” at low cost
• Performance grows roughly linearly as nodes are added
• Distributed processing
• Schemaless (schema on read)
• Hadoop stores files in a distributed file system (HDFS)
• Fast (for big data): processing is moved to wherever the data is located in the cluster
• Resilient to failure
• Flexible
• Cost effective
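Distributed processing in Hadoop classically follows the MapReduce model: map tasks run on the nodes that hold the data blocks, the framework groups intermediate results by key (the shuffle), and reduce tasks aggregate each group. The single-process Python sketch below only simulates these three phases for a word count; it is not Hadoop's Java MapReduce API, and the input "blocks" are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input block.
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Aggregate all values collected for one key.
    return key, sum(values)


def word_count(blocks: List[str]) -> Dict[str, int]:
    # On a cluster each block would be mapped on the node storing it; here we just loop.
    intermediate = (pair for block in blocks for pair in map_phase(block))
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["Big data big", "data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

Because map tasks only need their own block and reduce tasks only need one key's group, adding nodes adds parallel capacity, which is the intuition behind the "scales out" and "linear growth" points above.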
15.
(Figure: USE CASE)
BUT “Haters to the left! Kill the fear! Just get it started and go!”, Symantec’s Cloud Platform Engineering Leader David Lin
Value compounds with use, as more use cases, sources, and time periods join the data lake
16. ”Hadoop – It’s damn hard to use”, anonymous CXO
Mitigation: the right team and skills!
IT and the Business MUST Work Together to Create Maximum Value
Typical (new) roles needed in the organization:
• The Data Architect
• The Data Scientist
• The Business Analyst
• The Developer
• The Administrator
18. Why Hadoop will succeed
IKEA’s Business Idea
“to offer a wide range of home furnishings with good design and
function at prices so low that as many people as possible will be
able to afford them”
19. Why Hadoop will succeed
“HADOOP IS A SOFTWARE PACKAGE AT SUCH A LOW PRICE
THAT ALMOST EVERY COMPANY IS ABLE TO AFFORD IT
ALREADY”
“HADOOP AND OTHER OPEN SOURCE BIG DATA PROJECTS
PROVIDE A HUGE RANGE OF IT SOFTWARE FOR AREAS OF
DATA MANAGEMENT AND SYSTEM INTEGRATION”
“HADOOP TOOLS ARE DESIGNED TO SOLVE ISSUES
IMPOSSIBLE FOR TRADITIONAL COMMERCIAL TOOLS”
23. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram)
• Traditional organization: traditional enterprise software and files (RDBMS, ERP) → ETL + DW
• Digital organization: online systems (logs or streams) such as webshops, mobile applications, contact centers, ERP and CRM systems, producing (un)structured data & documents, clickstream, server logs / RFID, sentiment, social media and sensor data
24. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram: slide 23 plus a real-time layer for the online systems)
• Real-time stream, log data and RDBMS change capture (Flume or Hortonworks DataFlow)
• Message queue and history (Kafka)
• Complex event processing (Storm, Spark Streaming, Kafka Streams, Flink)
• Real-time machine interface for applications
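Complex event processing engines such as Storm, Spark Streaming, Kafka Streams or Flink evaluate rules over moving time windows of events. The plain-Python sketch below is not the API of any of those engines; it only illustrates the sliding-window idea with a hypothetical rule (the same user viewing the same product at least `threshold` times within `window` seconds) and hypothetical event fields. It assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Tuple


@dataclass(frozen=True)
class ClickEvent:
    timestamp: float  # seconds; events are assumed to arrive in time order
    user_id: str
    product_id: str


class RepeatViewDetector:
    """Fires when one user views the same product `threshold` times within `window` seconds."""

    def __init__(self, threshold: int = 3, window: float = 60.0):
        self.threshold = threshold
        self.window = window
        # (user, product) -> timestamps of recent views, oldest first
        self._recent: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def process(self, event: ClickEvent) -> bool:
        times = self._recent[(event.user_id, event.product_id)]
        times.append(event.timestamp)
        # Slide the window: forget views older than `window` seconds.
        while event.timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


if __name__ == "__main__":
    detector = RepeatViewDetector(threshold=3, window=60)
    for t in (0, 10, 20):
        print(t, detector.process(ClickEvent(t, "u1", "sofa")))
```

In the architecture above, the events would be read from the Kafka message queue and a fired rule would be pushed to applications through the real-time machine interface.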
25. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram: slide 24 plus the Hadoop core)
• File system (HDFS) + core services
• Interactive processing & queries (Spark & Hive)
• Batch processing
• BI user
26. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram: slide 25 plus batch and data science workloads)
• Batch processing (MapReduce & Pig Latin)
• RDBMS → HDFS batch load (Sqoop)
• Statistical analysis (Spark)
• Data scientist
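Sqoop parallelises an RDBMS → HDFS batch load by splitting the table on one column (the primary key by default, or the column given with --split-by) into as many ranges as there are mappers (-m / --num-mappers, 4 by default), after querying that column's minimum and maximum. The sketch below shows the idea for an integer key; it is a simplification, not Sqoop's exact splitting arithmetic, and the `order_id` column is hypothetical.

```python
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_mappers: int = 4) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [min_id, max_id] into near-equal half-open ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_mappers = min(num_mappers, total)  # never more ranges than keys
    size, remainder = divmod(total, num_mappers)
    ranges = []
    low = min_id
    for i in range(num_mappers):
        high = low + size + (1 if i < remainder else 0)
        ranges.append((low, high))  # covers low <= key < high
        low = high
    return ranges


def where_clauses(column: str, ranges: List[Tuple[int, int]]) -> List[str]:
    # One predicate per mapper; each mapper would run its own SELECT with it.
    return [f"{column} >= {low} AND {column} < {high}" for low, high in ranges]


if __name__ == "__main__":
    # Assume SELECT MIN(order_id), MAX(order_id) FROM orders returned 1 and 100.
    for clause in where_clauses("order_id", split_ranges(1, 100, num_mappers=4)):
        print(clause)
```

Even ranges only give even work when keys are evenly distributed; a skewed split column leaves some mappers with most of the rows, which is why the choice of split column matters.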
28. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram: slide 26 plus a NoSQL store)
• NoSQL database for interactive use (HBase)
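HBase keeps rows sorted by row key, so interactive queries are fast when the row key is designed to turn them into point lookups or short prefix scans. The sketch below models that with a sorted in-memory list; it is not the HBase client API. The `customer#reversed-timestamp` key layout (a common HBase pattern for returning the newest rows first) and the `MAX_TS` constant are assumptions for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple

MAX_TS = 10**13  # hypothetical upper bound for millisecond timestamps


def row_key(customer_id: str, ts_millis: int) -> str:
    # Reversed, zero-padded timestamp: newer rows sort before older ones.
    return f"{customer_id}#{MAX_TS - ts_millis:013d}"


class SortedTable:
    """Toy model of a table whose rows are kept in row-key order."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, columns: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)

    def scan_prefix(self, prefix: str, limit: Optional[int] = None) -> List[Tuple[str, dict]]:
        # Jump to the first key >= prefix, then read forward while keys match.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or (limit is not None and len(result) >= limit):
                break
            result.append((key, self._rows[key]))
        return result


if __name__ == "__main__":
    table = SortedTable()
    table.put(row_key("c42", 1000), {"price": 10})
    table.put(row_key("c42", 2000), {"price": 12})
    print(table.scan_prefix("c42#", limit=1))  # newest row for customer c42
```

The "#" separator in the prefix keeps a scan for customer "c42" from also returning rows of customer "c420".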
29. Hadoop ecosystem: All you need for modern analytics architecture as open source
(Diagram: slide 28 plus a logical data warehouse layer)
• Data virtualization: virtual data models / security
• O/JDBC, MDX, REST inbound interfaces
• O/JDBC, MDX, REST outbound interfaces
• Traditional BI tools
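A logical data warehouse exposes one virtual data model over several physical stores without copying the data, and can hide columns as a simple security measure. The sketch below illustrates only that idea with Python's built-in sqlite3: one in-memory database stands in for the enterprise DW and an attached second one for Hive tables on HDFS. The table names, columns and figures are hypothetical, and a real data virtualization product would federate over O/JDBC, MDX or REST instead.

```python
import sqlite3


def build_logical_dw() -> sqlite3.Connection:
    # "main" stands in for the enterprise DW / RDBMS.
    conn = sqlite3.connect(":memory:")
    # A second, separate database stands in for Hive tables on HDFS.
    conn.execute("ATTACH DATABASE ':memory:' AS lake")

    conn.execute("CREATE TABLE main.orders (product_id TEXT, customer_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE lake.page_views (product_id TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?, ?)",
        [("p1", "c1", 100.0), ("p1", "c2", 50.0), ("p2", "c1", 20.0)],
    )
    conn.executemany(
        "INSERT INTO lake.page_views VALUES (?, ?)",
        [("p1", 300), ("p2", 40), ("p3", 10)],
    )

    # The virtual data model: a view over both sources. No data is copied,
    # and customer_id is deliberately not exposed (a simple security rule).
    # TEMP is needed because SQLite only lets temporary views reference
    # tables in other attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW product_performance AS
        SELECT v.product_id AS product_id,
               v.views AS views,
               COALESCE(SUM(o.amount), 0) AS revenue
        FROM lake.page_views AS v
        LEFT JOIN main.orders AS o ON o.product_id = v.product_id
        GROUP BY v.product_id, v.views
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_logical_dw()
    for row in conn.execute("SELECT * FROM product_performance ORDER BY product_id"):
        print(row)
```

BI tools would query only `product_performance`; where each underlying table physically lives can change without the consumers noticing.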
31. Example use case: Dynamic Pricing
Dynamic pricing will become more and more common in the future
Using dynamic pricing should be a business decision – not restricted by your technical capabilities
(Figure: Dynamic Pricing, ranging from “the same price for everyone in every store” to “the more you visit booking pages, the higher the price”)
32. Dynamic OmniChannel Pricing
(Diagram)
• Channels: store, on-line channel, consumption (IoT), consumer buying
• Price cache (SmartPricing Accelerator, SPA) with pricing rules based on: price list, customer, product, basket size, history, warehouse levels, delivery time / type, website activity, IoT consumption
• Real-time inputs: orders / clickstream, sensor data, POS data; MQ; CEP
• Analytics and pricing simulations (SmartPricing), with supply chain management (+ other sources) and batch processing & history
• Second- and minute-level price optimization vs. monthly-level price optimization
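The price cache and pricing rules can be pictured as precomputed prices that expire and are then recomputed by applying rules to the list price. A minimal sketch, assuming hypothetical rules and thresholds (a 5% basket discount, a 10% scarcity markup, a 5% demand markup) over a subset of the inputs listed above; it is not the SmartPricing Accelerator's actual logic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PricingContext:
    """A hypothetical subset of the pricing inputs shown on the slide."""
    list_price: float
    basket_size: int
    warehouse_level: int    # units in stock
    recent_page_views: int  # website activity for this product


Rule = Callable[[PricingContext, float], float]


def basket_discount(ctx: PricingContext, price: float) -> float:
    return price * 0.95 if ctx.basket_size >= 5 else price


def scarcity_markup(ctx: PricingContext, price: float) -> float:
    return price * 1.10 if ctx.warehouse_level < 10 else price


def demand_markup(ctx: PricingContext, price: float) -> float:
    return price * 1.05 if ctx.recent_page_views > 100 else price


DEFAULT_RULES: List[Rule] = [basket_discount, scarcity_markup, demand_markup]


def compute_price(ctx: PricingContext, rules: List[Rule] = DEFAULT_RULES) -> float:
    # Apply each rule in order to the running price.
    price = ctx.list_price
    for rule in rules:
        price = rule(ctx, price)
    return round(price, 2)


class PriceCache:
    """Serves precomputed prices; an entry expires `ttl` seconds after it is stored."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, float]] = {}  # product -> (price, expires_at)

    def put(self, product_id: str, price: float) -> None:
        self._entries[product_id] = (price, self._clock() + self.ttl)

    def get(self, product_id: str) -> Optional[float]:
        entry = self._entries.get(product_id)
        if entry is None or self._clock() >= entry[1]:
            return None  # missing or stale: the caller should recompute
        return entry[0]


if __name__ == "__main__":
    cache = PriceCache(ttl=60)
    ctx = PricingContext(list_price=100.0, basket_size=5, warehouse_level=3, recent_page_views=150)
    if cache.get("sofa") is None:
        cache.put("sofa", compute_price(ctx))
    print(cache.get("sofa"))
```

A short TTL corresponds to the second- and minute-level optimization path, while the monthly-level path would change the rules or the price list rather than individual cache entries.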
33. Demo Scope
(Diagram)
• WebShop (HTML5 + Tomcat server): consumer buying, clickstream
• Clickstream: log file sniffing to a stream (Flume-ng)
• MQ: Kafka
• CEP: identifies the “product page” and the viewed product, and sends a request to increase the price
• Pricing updates: every visit to a “product page” increases the price by 5%
• Data warehouse: orders, product categories, suppliers (MS SQL Server)
• Analytics: HDFS + Hive, MS Power BI
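The demo's pricing rule can be sketched as two steps: recognise product-page visits in the web server log, then raise each visited product's price by 5% per visit. This sketch assumes a hypothetical `/product/<id>` URL pattern (the demo's real URLs are not given) and assumes each 5% increase is applied to the current, already increased price, i.e. compounding.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical URL pattern; the demo web shop's actual URLs are not given.
PRODUCT_PAGE = re.compile(r'"GET /product/(?P<product_id>[\w-]+)')

INCREASE = 0.05  # 5% per product-page visit, as in the demo


def product_views(log_lines: Iterable[str]) -> Counter:
    """Count product-page visits per product from access-log lines."""
    views: Counter = Counter()
    for line in log_lines:
        match = PRODUCT_PAGE.search(line)
        if match:
            views[match.group("product_id")] += 1
    return views


def apply_increases(prices: Dict[str, float], views: Counter) -> Dict[str, float]:
    """Compound a 5% increase onto each visited product's price, once per visit."""
    updated = dict(prices)
    for product_id, count in views.items():
        if product_id in updated:
            updated[product_id] = round(updated[product_id] * (1 + INCREASE) ** count, 2)
    return updated


if __name__ == "__main__":
    log = [
        '10.0.0.1 - - [26/Apr/2016:10:00:00 +0300] "GET /product/sofa-1 HTTP/1.1" 200 512',
        '10.0.0.1 - - [26/Apr/2016:10:00:05 +0300] "GET /product/sofa-1 HTTP/1.1" 200 512',
        '10.0.0.2 - - [26/Apr/2016:10:00:07 +0300] "GET /cart HTTP/1.1" 200 128',
    ]
    print(apply_increases({"sofa-1": 100.0, "lamp-2": 20.0}, product_views(log)))
```

Under this compounding reading, 14 visits roughly double the price (1.05^14 ≈ 1.98); if each visit instead added 5% of the original price, 14 visits would give 1.70 times the original price.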
38. Bilot’s Hadoop Accelerator Program
1. Business Strategy (0.5 day)
• Activities: intro to Hadoop, vision, use cases, prioritization
• Deliverables: insight for Hadoop-enabled business; list of prioritized Hadoop use cases
2. Hadoop bootcamp (1 day)
• Activities: 1 use case; deep dive with business, IT, and operations; business case
• Deliverables: business case for the PoC use case; “How to get there?”
3. Proof of Concept (2-8 weeks)
• Activities: platform deployed on Azure; integrations + use case; look & feel; test drive
• Deliverables: technical: up-and-running system and technical evaluation
4. Proof of Solution (2-3 months)
• Activities: scalability; security; tools and methods; cloud/on-prem; licences/support descriptions
• Deliverables: confirmed business case; plans for a scalable and secure Hadoop solution ready for implementation
5. Build & Implement (3-6 months)
• Activities: implementation; agile dev; roll-out and roadmap; change mgmt. begins
• Deliverables: Hadoop implemented; roadmap for further use cases
6. Run
• Activities: Hadoop as a Service; AMS; data-driven enterprise/organization dev
• Deliverables: fully functional Hadoop environment; continuous support model; organizational adaptation
Stages: PoC / Pilot, then production implementation
Contact Bilot to hear more
39. Interested? Contact us for a tailored demo
and workshop!
Bilot is Hortonworks’ first systems integrator partner in Finland and
Microsoft’s Gold Partner
Real customer use cases and industry examples are available for demo. Contact us for your own tailored session!
In the pre-PoC phase, for sandboxing and light demo purposes, we can use Azure or Bilot’s 5-node on-premises HDP cluster
Mikko Mattila
Solution Lead,
Analytics
Mikko.mattila@bilot.fi
@MattilaJMikko
Tuomas Autio
Head of Big Data & BI
Business Lead
tuomas.autio@bilot.fi
@BigDataTuomas