1. Big Data Solutions on Cloud – the way forward
By: K. A. Kiththi Perera
Chief Enterprise and Wholesale Officer
Sri Lanka Telecom
ITU-TRCSL Symposium on Cloud Computing 2015
Colombo
Session 04: Big Data Strategy in the Cloud and Applications
2. Big Data Analytics and Cloud Computing
• Two ICT initiatives are currently top of mind for organizations:
– Big Data Analytics and
– Cloud Computing
• Big Data Analytics can:
– Offer valuable insights to create competitive advantage
– Spark new innovations and
– Drive revenue
• Cloud Computing can:
– Enhance business agility and productivity
– Enable greater efficiencies and
– Reduce costs
Both technologies continue to evolve
6. What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time in nature
7. Value of Big Data Analytics
• Big Data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
8. “Without big data, you are blind and deaf in the middle of a freeway”
Geoffrey Moore, management consultant and theorist
Big Data needs a high-performance, easy-to-use data transformation and analytics solution
10. Hadoop Functional Blocks
Hive - A data warehouse layer with a SQL-like query language (HiveQL), built on top of MapReduce, for analyzing large data sets.
Pig - Enables the analysis of large data sets using the Pig Latin scripting language.
Sqoop ("SQL to Hadoop") - A Java-based application designed for transferring bulk data between Apache Hadoop and non-Hadoop data stores, such as relational databases.
11. Hadoop Core Components
• HDFS – Hadoop Distributed File System (Distributed Storage):
– Distributed across multiple “nodes”
– Natively redundant
– Self-healing, high-bandwidth clustered storage
– “NameNode” tracks block locations
• MapReduce (Distributed Processing):
– Splits a task across processors
– JobTracker manages TaskTrackers
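The "split a task across processors" idea can be illustrated with a word count, the usual introductory MapReduce example. This is a minimal single-process sketch in plain Python, not Hadoop code: the function names and the sample input are assumptions made for illustration, and a real cluster would run the map step on many nodes in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # On a cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input, standing in for blocks of a file stored in HDFS.
chunks = ["big data on cloud", "cloud data services", "big cloud"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step is the part that can be spread across machines; only the shuffle needs to bring values for the same key together.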
14. Alternatives to Hadoop
• Many believe that Hadoop is the only option for Big Data
• Hadoop's historic focus on batch processing of data was well supported by ‘MapReduce’
• But there is a need for a more flexible developer tool to support:
– The larger market of ‘mid-size data sets’ and
– Use cases that call for ‘real-time processing’
• Apache Spark: Preparing for the Next Wave of Reactive Big Data
18. Economics of Cloud Users
• Pay by use instead of provisioning for peak
[Figure: two plots of resources against time, each showing demand and capacity: "Static data center" (with unused resources marked) and "Data center in the cloud".]
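The difference between provisioning for peak and paying by use comes down to simple arithmetic. The sketch below uses an invented demand profile and an assumed unit price (neither comes from the slides) to compare the server-hours paid for in each model.

```python
# Hypothetical demand (servers needed) in eight consecutive time periods.
demand = [20, 20, 30, 50, 80, 100, 80, 40]
cost_per_server_period = 0.10  # assumed price, arbitrary currency units

# Static data center: capacity is fixed at the peak for every period.
peak_capacity = max(demand)
static_server_periods = peak_capacity * len(demand)

# Cloud: capacity follows demand, so only what is used is billed.
cloud_server_periods = sum(demand)

unused_server_periods = static_server_periods - cloud_server_periods
utilization = cloud_server_periods / static_server_periods

static_cost = static_server_periods * cost_per_server_period
cloud_cost = cloud_server_periods * cost_per_server_period

print(f"Static capacity paid for: {static_server_periods} server-periods")
print(f"Cloud capacity paid for:  {cloud_server_periods} server-periods")
print(f"Unused in static model:   {unused_server_periods} server-periods")
print(f"Static utilization:       {utilization:.1%}")
print(f"Cost, static vs cloud:    {static_cost:.2f} vs {cloud_cost:.2f}")
```

The comparison assumes the same unit price in both models; in practice cloud unit prices are often higher, so the saving depends on how far average demand sits below the peak.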
19. Cloud Computing Modalities
• Hosted applications and services
• Pay-as-you-go model
• Scalability, fault-tolerance, elasticity, and self-manageability
“Can we outsource our IT software and hardware infrastructure?”
• Very large data repositories
• Complex analysis
• Distributed and parallel data processing
“We have terabytes of click-stream data – what can we do with it?”
EDBT 2011 Tutorial
20. Big Data - Cloud Option and Challenges
• Key to big data success:
– Elastic infrastructure and
– Data gravity
• Cloud is emerging as an increasingly popular option for new analytics applications and for processing big data
• Challenge: moving hundreds of terabytes or petabytes of data across the network
– Traditional data is largely located in the enterprise data warehouse
– WAN speed is limited
• New data sets (weather data, census data, machine and sensor data) originate from outside the enterprise
– Cloud becomes the ideal place to capture and process this data
Cloud service providers to offer “Hadoop/Spark as a service” bundled with “high-speed connectivity”
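A rough calculation shows why WAN speed is the bottleneck. The sketch below estimates transfer time from data volume and link rate; the function name, the link speeds and the data sizes are illustrative assumptions, and real transfers would be slower once protocol overhead and shared links are taken into account.

```python
def transfer_days(data_terabytes, link_gbps, efficiency=1.0):
    """Days needed to move data over a link at a sustained rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Gbit/s = 10**9 bits/s.
    `efficiency` is the assumed fraction of the link rate actually achieved.
    """
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * efficiency)
    return seconds / 86_400  # seconds per day

# Hypothetical scenarios.
days_100tb_1g = transfer_days(100, 1)     # 100 TB over a 1 Gbit/s link
days_1pb_10g = transfer_days(1000, 10)    # 1 PB over a 10 Gbit/s link
days_100tb_half = transfer_days(100, 1, efficiency=0.5)

print(f"100 TB at 1 Gbit/s:          {days_100tb_1g:.1f} days")
print(f"1 PB at 10 Gbit/s:           {days_1pb_10g:.1f} days")
print(f"100 TB at 1 Gbit/s, 50% eff: {days_100tb_half:.1f} days")
```

Even at full line rate these transfers take more than a week, which is the practical meaning of data gravity: it is often easier to run the processing where the data already sits.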
21. SLT “akaza” cloud services
• IaaS – Infrastructure as a Service
• SaaS – Software as a Service
• DaaS – Desktop as a Service
• CaaS – Communication as a Service
• PaaS – Platform as a Service
22. Big Data Use Cases
01 Optimize Funnel Conversion
02 Behavioral Analytics
03 Customer Segmentation
04 Predictive Support
05 Market Analysis and Pricing Optimization
06 Predict Security Threats
23. Optimize Funnel Conversion
Big data analytics allows companies to track leads through the entire sales conversion process, from a click on an AdWords ad to the final transaction, in order to uncover insights on how the conversion process can be improved.
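Funnel tracking of this kind reduces to comparing lead counts between consecutive stages. The sketch below uses invented stage names and counts (not data from the presentation) to compute per-stage conversion rates and find the step where most leads are lost.

```python
# Hypothetical funnel stages and lead counts, for illustration only.
funnel = [
    ("ad click", 10_000),
    ("landing page visit", 6_000),
    ("added to cart", 1_200),
    ("transaction", 300),
]

def stage_conversion(funnel):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(funnel, funnel[1:])
    ]

rates = stage_conversion(funnel)
overall = funnel[-1][1] / funnel[0][1]    # first stage to last stage
weakest = min(rates, key=lambda r: r[2])  # step with the lowest conversion rate

for src, dst, rate in rates:
    print(f"{src} -> {dst}: {rate:.1%}")
print(f"Overall conversion: {overall:.1%}")
print(f"Weakest step: {weakest[0]} -> {weakest[1]}")
```

The weakest step is where an improvement would be expected to have the largest effect on the final transaction count.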
25. Behavioral Analytics
With access to data on consumer behavior, companies can learn what prompts a customer to stick around longer, as well as learn more about their customers’ characteristics and purchasing habits, in order to improve marketing efforts and boost profits.
26. Behavioral Analytics
Company: McDonald’s
Industry: Food and Beverage
Employees: 750,000
Type: Behavioral Analytics
PURPOSE:
McDonald’s tracks vast amounts of data in order to improve operations and boost the customer experience. The company looks at factors such as the design of the drive-thru, information provided on the menu, wait times, size of orders and ordering patterns in order to optimize each restaurant to its particular market.
27. Customer Segmentation
By accessing data about the consumer from multiple sources, such as social media data and transaction history, companies can better segment and target their customers and start to make personalized offers to those customers.
29. Predictive Support
Through sensors and other machine-generated data, companies can identify when a malfunction is likely to occur. The company can then proactively order parts and make repairs in order to avoid downtime and lost profits.
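As a very simple illustration of spotting a likely malfunction from sensor data, the sketch below raises a warning when the moving average of recent readings crosses a limit. This is a plain threshold rule, not a trained predictive model; the function name, window size, threshold and readings are all assumptions chosen for the example.

```python
from collections import deque

def first_warning(readings, window=3, threshold=80.0):
    """Return the index of the first reading at which the moving average of
    the last `window` readings exceeds `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None

# Hypothetical temperature readings from one machine.
temps = [70, 72, 71, 75, 79, 84, 88, 93]
warn_at = first_warning(temps)

if warn_at is None:
    print("No warning raised")
else:
    print(f"Warning raised at reading index {warn_at}")
```

Averaging over a window rather than reacting to a single reading reduces false alarms from one-off spikes, at the cost of a slightly later warning.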