© 2013 IBM Corporation
Making Hadoop Ready for the Enterprise
Hadoop Summit, June 27, 2013
Anjul Bhambhri
Vice-President, IBM Big Data Development
Big Data is the next Natural Resource
Harvesting any resource requires Mining, Refining and Delivering
“We have for the first time an economy
based on a key resource (Information)
that is not only renewable, but self-generating.
Running out of it is not a problem, but
drowning in it is.”
— John Naisbitt
Cost efficiently
processing the
growing Volume
300x from 2005 to 2020
Source: IDC
Responding to the
increasing Velocity
19 Billion
RFID
sensors and
counting
Source: RFID Forecasts
Collectively
analyzing the
broadening Variety
Source: IBM Market Information
80% of the
world’s data
is unstructured
Establishing the
Veracity of big
data sources
1 in 3 business leaders don’t trust
the information they use to make
decisions
Source: IBM. BAO for the Intelligent Enterprise
40 ZB
What if…
You could detect neonatal infections sooner?
24-hour earlier detection of infections
Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance
120 children monitored: 120K messages per second, billion messages per day
Solution
Constant Contact Transforming
Marketing Campaign
Effectiveness with IBM Big Data
• Analyze 35 billion annual emails to
guide customers on best dates &
times to send emails for maximum
response
Benefits
• 40 times improvement in analysis
performance
• 15-25% performance increase in
customer email campaigns
• Analysis time reduced from hours to
seconds
Automobile and
Manufacturing Quality
Control and Customer
Satisfaction
• Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve quality and operational efficiency
• Inventory control of parts
• Manufacturing equipment and
assembly line data
• Warranty and services data from
dealers
• Telemetry data from vehicles
• Next generation of Enterprise Data
Warehouse:
Big Data and Technology Platform
Transactional & Application Data, Machine Data, Enterprise Content, Social Data
New Opportunities with Big Data & Analytics
Roles and Analytics
Data Scientist, Business Analyst, User
New Outcomes
Enrich
info base
Improve customer
interaction
Reduce
risk
Gain efficiency
and scale
Optimize
and monetize
Emerging Pattern of Big Data Implementation
Ingest
Landing and Analytics Sandbox Zone
Indexes,
facets
Hive/HBase
Col Stores
Documents
In Variety
of Formats
Analytics
MapReduce
Repository, Workbench
Ingestion and Real-time Analytic Zone
Data
Sinks
Filter, Transform
Ingest
Correlate, Classify
Extract, Annotate
Warehousing Zone
Enterprise
Warehouse
Data Marts
Query
Engines
Cubes
Descriptive,
Predictive
Models
Models
Widgets
Discovery,
Visualizer
Search
Analytics and
Reporting Zone
Metadata and Governance Zone
Connectors
Big Data Exploration
Find, visualize, understand
all big data to improve
decision making
Enhanced 360° View of the Customer
Extend existing customer
views (MDM, CRM, etc) by
incorporating additional
internal and external
information sources
Operations Analysis
Analyze a variety of machine
data for improved business results
Data Warehouse Augmentation
Integrate big data and data warehouse
capabilities to increase operational efficiency
Security/Intelligence
Extension
Lower risk, detect fraud
and monitor cyber security
in real-time
The 5 Key Use Cases
Cloud | Mobile | Security
Big Data Platform and Application Framework
Gather, extract and
explore data using
best of breed
visualization
Speed time to value
with analytic and
application
accelerators
BI / Reporting
Exploration /
Visualization
Functional
App
Industry
App
Predictive
Analytics
Content
Analytics
Analytic Applications
IBM Big Data Platform
Systems
Management
Applications &
Development
Visualization
& Discovery
Analyze streaming
data and large data
bursts for real-time
insights
Govern data quality
and manage
information lifecycle
Cost-effectively
analyze
petabytes of
structured and
unstructured
information
Deliver deep insight
with advanced
in-database analytics
and operational
analytics
Accelerators
Information Integration & Governance
Hadoop
System
Stream
Computing
Data
Warehouse
Contextual
Discovery
Index and federated
discovery for
contextual
collaborative insights
Enterprise Capabilities on Hadoop
Enterprise Capabilities
Administration & Security
Workload Optimization
Connectors
Open source
components
Advanced Engines
Visualization & Exploration
Development Tools
Key Platform Requirements
– Built-in analytics
– Enterprise-grade capabilities
– Integrated with enterprise software
– Ease of installation and management
– Reference hardware configurations
– World-class support
– Full open source compatibility
Business benefits
– Quicker time-to-value
– Reduced operational risk
– Enhanced business knowledge with flexible
analytical platform
– Leverages and complements existing
software investments
IBM-certified
Apache Hadoop
Application
Big SQL Engine
Hadoop
Hive tables, HBase tables, CSV files
Data Sources
SQL Language
JDBC / ODBC Driver
JDBC / ODBC Server
Big Data needs SQL
• Most existing applications in
the enterprise use SQL
• SQL bridges the chasm
between existing apps and Big
Data
• SQL access to all data stored in
Hadoop
• Via JDBC/ODBC
• Using rich standard SQL
• Intelligently leverage
Map/Reduce parallelism
OR direct access for
achieving low-latency
Text Analytics: Getting measurable insights
• Most of the world’s data is in unstructured or semi-structured text.
• Social media is rife with discussions about products and services
• Company Internal Information is locked in blobs, description fields, and sometimes even
discarded
• How do you get a metrics-based understanding of facts from unstructured text?
Healthcare Analytics: E-medical records, hospital reports
Public Sector: Case files, police records, emergency calls…
Automotive Quality Insight: Tech notes, call logs, online media
Insurance Fraud: Insurance claims
Social Media for Marketing: Twitter, Facebook, blogs, forums
Over 80% of stored information is unstructured*
Structural analysis
Mining and visualization
Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.
Netherlands | Striker | Arjen Robben
Keeper | Spain | Iker Casillas
Winger | Andres Iniesta | Spain
World Cup 2010 Highlights
How Text Analytics Works
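The World Cup example above can be approximated with a few dictionaries and a proximity rule. The following is a minimal Python sketch, not the Text Analytics runtime or AQL itself. The dictionaries, the first-name-plus-capitalized-word rule for players, and the "nearest mention wins" pairing are all assumptions made for illustration.

```python
import re

TEXT = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the second half, "
    "Netherlands' striker, Arjen Robben, had a breakaway, but the keeper for Spain, "
    "Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win."
)

# Hypothetical dictionaries; a real extractor would load much larger ones.
POSITIONS = ["striker", "keeper", "winger"]
TEAMS = ["Netherlands", "Spain"]
FIRST_NAMES = ["Arjen", "Iker", "Andres"]


def dict_pattern(words):
    """Regex that matches any dictionary entry as a whole word."""
    return r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"


def find_spans(pattern, text, flags=0):
    """Return (begin, end, matched_text) for every match."""
    return [(m.start(), m.end(), m.group(0)) for m in re.finditer(pattern, text, flags)]


def gap(a, b):
    """Number of characters between two spans (0 if they touch or overlap)."""
    return max(a[0] - b[1], b[0] - a[1], 0)


def nearest(anchor, candidates):
    """Candidate span with the smallest character gap to the anchor."""
    return min(candidates, key=lambda c: gap(anchor, c))


def extract_players(text):
    positions = find_spans(dict_pattern(POSITIONS), text, re.IGNORECASE)
    teams = find_spans(dict_pattern(TEAMS), text)
    # A player is assumed to be a known first name followed by a capitalized word.
    players = find_spans(dict_pattern(FIRST_NAMES) + r" [A-Z][a-z]+", text)
    rows = []
    for pos in positions:
        rows.append({
            "position": pos[2].capitalize(),
            "player": nearest(pos, players)[2],
            "team": nearest(pos, teams)[2],
        })
    return rows


for row in extract_players(TEXT):
    print(row)
```

Each position mention is paired with the closest player and team mention, which is enough for this passage. Longer documents would need sentence boundaries and richer rules.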
Text Analytics Language and Runtime
• Declarative SQL-like language
• Discovery tools for AQL development
Text Analytics Runtime
Input Documents
Offline Runtime
Development Environment
AQL Extractor
• High-throughput
• Small memory footprint
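-- Pair a job title with a company name that follows it within 0 to 20 characters
create view Employment as
  select R.jobType as jobType,
         C.name as companyName
  from Company C, Role R
  where Follows(R.jobType, C.name, 0, 20)
    -- and the 10 characters to the right of the job title contain an association term
    and ContainsDict('EmpAssociation.dict',
                     RightContext(R.jobType, 10));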
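The predicates in the Employment view can be read as character-offset tests on spans. Below is a minimal Python sketch of that reading, not the AQL runtime. It assumes that Follows(a, b, min, max) holds when b begins between min and max characters after a ends, that RightContext(a, n) is the n characters after a, and that ContainsDict does whole-token dictionary matching. The sample sentence and dictionary contents are hypothetical.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    begin: int
    end: int


def follows(first: Span, second: Span, min_chars: int, max_chars: int) -> bool:
    """True if `second` starts min_chars..max_chars characters after `first` ends."""
    return min_chars <= second.begin - first.end <= max_chars


def right_context(text: str, span: Span, n_chars: int) -> Span:
    """Span covering up to n_chars characters immediately to the right of `span`."""
    return Span(span.end, min(span.end + n_chars, len(text)))


def contains_dict(text: str, span: Span, entries) -> bool:
    """True if any dictionary entry appears as a whole token inside `span`."""
    tokens = re.findall(r"\w+", text[span.begin:span.end].lower())
    return any(entry.lower() in tokens for entry in entries)


def find(text: str, phrase: str) -> Span:
    """Span of the first occurrence of `phrase` (stands in for the Role/Company views)."""
    start = text.index(phrase)
    return Span(start, start + len(phrase))


# Hypothetical stand-in for the contents of EmpAssociation.dict.
EMP_ASSOCIATION = ["of", "at", "with"]


def is_employment(text: str, job_phrase: str, company_phrase: str) -> bool:
    job = find(text, job_phrase)
    company = find(text, company_phrase)
    return follows(job, company, 0, 20) and contains_dict(
        text, right_context(text, job, 10), EMP_ASSOCIATION
    )


print(is_employment("She was named Vice-President of IBM last year.", "Vice-President", "IBM"))
print(is_employment("IBM hired a Vice-President.", "Vice-President", "IBM"))
```

The first sentence satisfies both conditions. The second fails because the company name precedes the job title.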
Extracted
Objects
Cost-based
optimization
Dominant Cost is CPU
[Diagram: alternative execution plans, each combining the Role, Company, Dict, Select and Join operators in a different order]
General-Purpose Linguistic Parsers
Dictionaries
Enterprise Data
Tools
Business User
Data Scientist
Business Analyst
Developer
Administrator
Security and compliance in Big Data environments
Structured
Unstructured
Streaming
Big Data Platform
Hadoop Cluster
Clients
• Who is running specific big data
requests?
• What map-reduce jobs are they
running?
• Are these jobs part of an authorized
program list accessing the data?
• Is there an exceptional number of file
permission exceptions?
• Taps for Hadoop
• Collects and streams audit data to Collector
• Provides visibility for HDFS, MapReduce,
RPC, Oozie, HBase, etc.
• Securely stores audit data collected by TAPs
• Provides analytics, reporting & compliance
workflow automation
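Two of the audit questions above, whether jobs are on an authorized program list and whether permission exceptions are unusually frequent, can be sketched over collected audit records. This is a minimal Python illustration, not the collector's actual analytics. The `AuditEvent` fields, the job names, and the threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    user: str                # who issued the request
    job: str                 # MapReduce job or program name
    permission_denied: bool  # True if the request raised a file permission exception


def review_audit_trail(events, authorized_jobs, max_denials_per_user):
    """Return (unauthorized (user, job) pairs, users exceeding the denial threshold)."""
    unauthorized = sorted({(e.user, e.job) for e in events if e.job not in authorized_jobs})
    denials = Counter(e.user for e in events if e.permission_denied)
    flagged = sorted(user for user, count in denials.items() if count > max_denials_per_user)
    return unauthorized, flagged


EVENTS = [
    AuditEvent("alice", "nightly_etl", False),
    AuditEvent("bob", "adhoc_export", False),
    AuditEvent("bob", "nightly_etl", True),
    AuditEvent("bob", "nightly_etl", True),
    AuditEvent("bob", "nightly_etl", True),
]

unauthorized, flagged = review_audit_trail(EVENTS, {"nightly_etl"}, max_denials_per_user=2)
print(unauthorized)
print(flagged)
```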
Data Archiving and Masking on Hadoop
Data Archiving
Database
Hadoop
Data Masking
Before masking: JASON MICHAELS
After masking: ROBERT SMITH
Mask in-database
Extract, Mask
Mask in Hadoop
Archive & Purge
Load
Query-able, Auditable, Restorable Data
Complete Business Objects
Data Integrity
Schema, Metadata
Retention Policies
Archive files
Compress
• Mask confidential data to avoid data
breach & meet privacy compliance
• Protect confidential data while preserving analytics
• Support compliance with privacy regulations
• Cost-effective query-able archiving
• Manage, apply retention policies for
compliance
• Enable business users to query on Hot, Warm
and Cold data
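One common way to protect names while preserving analytics is deterministic substitution: the same input always maps to the same realistic replacement, so joins and counts still line up. The Python sketch below illustrates that general technique only and makes no claim about how the product masks data. The replacement lists and the key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical replacement dictionaries; real ones would be far larger.
FIRST_NAMES = ["ROBERT", "MARY", "DAVID", "LINDA"]
LAST_NAMES = ["SMITH", "JONES", "BROWN", "TAYLOR"]


def _pick(key: bytes, value: str, choices):
    """Choose a replacement deterministically from a keyed hash of the value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return choices[int.from_bytes(digest[:8], "big") % len(choices)]


def mask_name(full_name: str, key: bytes) -> str:
    """Replace a 'FIRST LAST' name with a substitute that is stable for a given key."""
    first, _, last = full_name.strip().upper().partition(" ")
    masked_first = _pick(key, "first:" + first, FIRST_NAMES)
    masked_last = _pick(key, "last:" + last, LAST_NAMES)
    return f"{masked_first} {masked_last}"


SECRET_KEY = b"demo-key-not-for-production"
print(mask_name("JASON MICHAELS", SECRET_KEY))
print(mask_name("jason michaels", SECRET_KEY))  # same output: masking ignores case
```

With replacement lists this small, different people will often collide on the same masked name. Larger dictionaries reduce collisions, and the key must stay secret or the mapping can be rebuilt.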
Simplified Experience
• Designed for easy and quick deployment
• Built-in tools designed for users to derive value quickly
• Easy connectivity to common data warehouse systems
Built-in Expertise
• Enables ‘what-if analysis’ and advanced analytics
• Supports structured, semi-structured, and unstructured data
• Built-in text processing engine and library of annotators
to analyze large volumes of text-based information
• Data can be used in its native format
eliminating need to pre-define and map structures
Integration by Design
• InfoSphere BigInsights software, cluster management, and
IBM System x® servers
• Automatic parallelization and resource optimization to scale
economically
• Enterprise-class security and platform management
Introducing PureData for Hadoop – BigInsights Appliance
From Getting Started to Enterprise Deployment:
InfoSphere BigInsights Brings Hadoop to the Enterprise
Enterprise Edition
Breadth of capabilities
Enterprise class
Sold by # of terabytes managed
PureData for Hadoop
Appliance simplicity for the enterprise
Quick Start Edition
* Pre-announced
Web-based
mgmt console
Jaql
Integrated install
Apache Hadoop
Basic Edition
Free download
Quick Start features
PLUS:
Accelerators
Enterprise Integration
Production support
Production-ready features
Big Sheets
Text Analytics
Big SQL
Workload
optimization/
Query support
Dev tools
Connectors
Mgmt tools
IBM Hadoop
Core
Free download, non-production
Streams - Real Time Analytics
InfoSphere Data Explorer – delivering insights at the point of
impact
Create unified view
of ALL information
for real-time
monitoring
Identify areas of information
risk & ensure data
compliance
Analyze customer data to
unlock true customer
value
Increase productivity &
leverage past work
increasing speed to market
Improve customer
service & reduce
call times
InfoSphere
Data Explorer
Data access & integration
•Index structured & unstructured
data—in place
•Support existing security
•Federate to external sources
•Leverage MDM, governance,
and taxonomies
Discovery & navigation
•Clustering & categorization
•Contextual intelligence
•Easy-to-deploy applications
•All at the scale required for
today’s big data challenges
Providing unified, real-time
access and fusion of big
data unlocks greater
insight and ROI
Organizations are Building Big Data Applications on Data Explorer
Data Explorer App Builder
Warehouse
Structured Enterprise
Data
BigInsights
Data at rest
Data Explorer
Semi- & unstructured
enterprise data
Streams
Data in motion
Get Started on Your Big Data Journey Today
Get Educated
• IBM Big Data: ibm.com/bigdata
• IBMBigDataHub.com
• BigDataUniversity.com
Get Your Hands on Big Data
• Download Quick Start
ibm.co/QuickStart
THINK
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Dernier (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

Making Hadoop Ready for the Enterprise

  • 1. Making Hadoop Ready for the Enterprise. Hadoop Summit, June 27, 2013. Anjul Bhambhri, Vice-President, IBM Big Data Development. © 2013 IBM Corporation
  • 2. Big Data is the next Natural Resource. Harvesting any resource requires Mining, Refining and Delivering. “We have for the first time an economy based on a key resource (Information) that is not only renewable, but self-generating. Running out of it is not a problem, but drowning in it is.” (John Naisbitt) • Volume: cost-efficiently processing the growing Volume; 300x growth from 2005 to 2020; 40 ZB (Source: IDC) • Velocity: responding to the increasing Velocity; 19 billion RFID sensors and counting (Source: RFID Forecasts) • Variety: collectively analyzing the broadening Variety; 80% of the world’s data is unstructured (Source: IBM Market Information) • Veracity: establishing the Veracity of big data sources; 1 in 3 business leaders don’t trust the information they use to make decisions (Source: IBM, BAO for the Intelligent Enterprise)
  • 3. What if… you could detect neonatal infections sooner? 24-hour earlier detection of infections. Solution: Big Data enabled doctors from the University of Ontario Institute of Technology to apply neonatal infant monitoring to predict infection in the ICU 24 hours in advance. 120 children monitored: 120K messages per second, billions of messages per day.
  • 4. Constant Contact: Transforming Marketing Campaign Effectiveness with IBM Big Data • Analyze 35 billion annual emails to guide customers on best dates & times to send emails for maximum response. Benefits: • 40 times improvement in analysis performance • 15-25% performance increase in customer email campaigns • Analysis time reduced from hours to seconds
  • 5. Automobile and Manufacturing: Quality Control and Customer Satisfaction • Inflexibility and scalability limitations of existing IT solutions have been an inhibitor to competitive advantage. A new solution is needed to improve quality and operational efficiency. • Inventory control of parts • Manufacturing equipment and assembly line data • Warranty and services data from dealers • Telemetry data from vehicles • Next generation of Enterprise Data Warehouse:
  • 6. New Opportunities with Big Data & Analytics. Big Data and Technology Platform: Transactional & Application Data; Machine Data; Enterprise Content; Social Data
  • 7. New Opportunities with Big Data & Analytics. Big Data and Technology Platform. Roles and Analytics: Data Scientist; Business Analyst; User
  • 8. New Opportunities with Big Data & Analytics. Big Data and Technology Platform. Roles and Analytics. New Outcomes: Enrich info base; Improve customer interaction; Reduce risk; Gain efficiency and scale; Optimize and monetize
  • 9. Emerging Pattern of Big Data Implementation (architecture diagram). Zones: Landing and Analytics Sandbox Zone; Ingestion and Real-time Analytic Zone; Warehousing Zone; Analytics and Reporting Zone; Metadata and Governance Zone. Other diagram labels, in slide order: Ingest; Indexes, facets; Hive/HBase; Col Stores; Documents in Variety of Formats; Analytics MapReduce; Repository, Workbench; Data Sinks; Filter, Transform; Ingest; Correlate, Classify; Extract, Annotate; Enterprise Warehouse; Data Marts; Query Engines; Cubes; Descriptive, Predictive Models; Models; Widgets; Discovery, Visualizer; Search; Connectors
  • 10. The 5 Key Use Cases • Big Data Exploration: find, visualize, understand all big data to improve decision making • Enhanced 360° View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources • Operations Analysis: analyze a variety of machine data for improved business results • Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency • Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real time
  • 11. Big Data Platform and Application Framework. Analytic Applications: BI / Reporting; Exploration / Visualization; Functional App; Industry App; Predictive Analytics; Content Analytics. IBM Big Data Platform: • Visualization & Discovery: gather, extract and explore data using best of breed visualization • Accelerators: speed time to value with analytic and application accelerators • Stream Computing: analyze streaming data and large data bursts for real-time insights • Information Integration & Governance: govern data quality and manage information lifecycle • Hadoop System: cost-effectively analyze petabytes of structured and unstructured information • Data Warehouse: deliver deep insight with advanced in-database analytics and operational analytics • Contextual Discovery: index and federated discovery for contextual collaborative insights • Systems Management • Applications & Development. Cloud | Mobile | Security
  • 12. Enterprise Capabilities on Hadoop. Diagram labels: Enterprise Capabilities; Administration & Security; Workload Optimization; Connectors; Open source components; Advanced Engines; Visualization & Exploration; Development Tools; IBM-certified Apache Hadoop. Key Platform Requirements: • Built-in analytics • Enterprise-grade capabilities • Integrated with enterprise software • Ease of installation and management • Reference hardware configurations • World-class support • Full open source compatibility. Business benefits: • Quicker time-to-value • Reduced operational risk • Enhanced business knowledge with flexible analytical platform • Leverages and complements existing software investments
  • 13. Big Data needs SQL • Most existing applications in the enterprise use SQL • SQL bridges the chasm between existing apps and Big Data • SQL access to all data stored in Hadoop, via JDBC/ODBC, using rich standard SQL • Intelligently leverage Map/Reduce parallelism OR direct access for achieving low latency. Diagram labels: Application; SQL Language; JDBC / ODBC Driver; JDBC / ODBC Server; Big SQL Engine; Hadoop Data Sources (Hive tables, HBase tables, CSV files)
  • 14. Text Analytics: Getting measurable insights • Most of the world’s data is in unstructured or semi-structured text • Social media is rife with discussions about products and services • Company internal information is locked in blobs, description fields, and sometimes even discarded • How do you get a metrics-based understanding of facts from unstructured text? Examples: • Healthcare Analytics: e-medical records, hospital reports • Public Sector: case files, police records, emergency calls… • Automotive Quality Insight: tech notes, call logs, online media • Insurance Fraud: insurance claims • Social Media for Marketing: Twitter, Facebook, blogs, forums. Over 80% of stored information is unstructured. Diagram labels: Structural analysis; Mining and visualization
  • 15. How Text Analytics Works. Input text: “Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas, made the save. Winger Andres Iniesta scored for Spain for the win.” Extracted World Cup 2010 highlights: • Striker: Arjen Robben (Netherlands) • Keeper: Iker Casillas (Spain) • Winger: Andres Iniesta (Spain)
  • 16. Text Analytics Language and Runtime • Declarative SQL-like language • Discovery tools for AQL development • High-throughput • Small memory footprint. Example AQL: create view Employment as select R.jobType as jobType, C.name as companyName from Company C, Role R where Follows(R.jobType, C.name, 0, 20) and ContainsDict('EmpAssociation.dict', RightContext(R.jobType,10)); Cost-based optimization; dominant cost is CPU (diagram: alternative operator plans combining Select, Join, and Dict over Role and Company). Diagram labels: Input Documents; Offline; Runtime; Development Environment; AQL Extractor; Text Analytics Runtime; Extracted Objects; General-Purpose Linguistic Parsers; Dictionaries
  • 17. Enterprise Data Tools: Business User; Data Scientist; Business Analyst; Developer; Administrator
  • 18. Security and compliance in Big Data environments • Who is running specific big data requests? • What map-reduce jobs are they running? • Are these jobs part of an authorized program list accessing the data? • Is there an exceptional number of file permission exceptions? • TAPs for Hadoop • Collects and streams audit data to Collector • Provides visibility for HDFS, MapReduce, RPC, Oozie, HBase, etc. • Securely stores audit data collected by TAPs • Provides analytics, reporting & compliance workflow automation. Diagram labels: Structured; Unstructured; Streaming; Big Data Platform; Hadoop Cluster; Clients
  • 19. Data Archiving and Masking on Hadoop. Data Masking: mask in-database; extract, mask, load; mask in Hadoop. Before masking / after masking example: JASON MICHAELS, ROBERT SMITH • Mask confidential data to avoid data breach & meet privacy compliance • Protect confidential data while preserving analytics • Support compliance with privacy regulations. Data Archiving (Database to Hadoop): archive & purge; compress; archive files; complete business objects; data integrity; schema, metadata; retention policies; query-able, auditable, restorable data • Cost-effective query-able archiving • Manage, apply retention policies for compliance • Enable business users to query on Hot, Warm and Cold data
  • 20. Introducing PureData for Hadoop (BigInsights Appliance). Simplified Experience: • Designed for easy and quick deployment • Built-in tools designed for users to derive value quickly • Easy connectivity to common data warehouse systems. Built-in Expertise: • Enables ‘what-if analysis’ and advanced analytics • Supports structured, semi-structured, and unstructured data • Built-in text processing engine and library of annotators to analyze large volumes of text-based information • Data can be used in its native format, eliminating the need to pre-define and map structures. Integration by Design: • InfoSphere BigInsights software, cluster management, and IBM System x® servers • Automatic parallelization and resource optimization to scale economically • Enterprise-class security and platform management
  • 21. From Getting Started to Enterprise Deployment: InfoSphere BigInsights Brings Hadoop to the Enterprise. Diagram labels, in slide order: Enterprise Edition; Breadth of capabilities; Enterprise class; Sold by # of terabytes managed; PureData for Hadoop; Appliance simplicity for the enterprise; Quick Start Edition; * Pre-announced; Web-based mgmt console; Jaql; Integrated install; Apache Hadoop; Basic Edition; Free download; Quick Start features PLUS: Accelerators; Enterprise Integration; Production support; Production-ready features; Big Sheets; Text Analytics; Big SQL; Workload optimization / Query support; Dev tools; Connectors; Mgmt tools; IBM Hadoop Core; Free download, non-production
  • 22. Streams - Real Time Analytics
  • 23. InfoSphere Data Explorer: delivering insights at the point of impact • Create unified view of ALL information for real-time monitoring • Identify areas of information risk & ensure data compliance • Analyze customer data to unlock true customer value • Increase productivity & leverage past work, increasing speed to market • Improve customer service & reduce call times. Data access & integration: • Index structured & unstructured data in place • Support existing security • Federate to external sources • Leverage MDM, governance, and taxonomies. Discovery & navigation: • Clustering & categorization • Contextual intelligence • Easy-to-deploy applications • All at the scale required for today’s big data challenges. Providing unified, real-time access and fusion of big data unlocks greater insight and ROI.
  • 24. Organizations are Building Big Data Applications on Data Explorer. Data Explorer App Builder sits over: • Warehouse: structured enterprise data • BigInsights: data at rest • Data Explorer: semi- & unstructured enterprise data • Streams: data in motion
  • 25. Get Started on Your Big Data Journey Today. Get Educated: • IBM Big Data: ibm.com/bigdata • IBMBigDataHub.com • BigDataUniversity.com. Get Your Hands on Big Data: • Download Quick Start ibm.coQuickStart
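
The AQL view on slide 16 can be read as a join over text spans: keep a (jobType, companyName) pair when the company mention starts 0 to 20 characters after the role mention ends, and the 10 characters to the right of the role contain an entry from an employment dictionary. The Python sketch below mimics that logic on a toy sentence. It is not AQL and not the IBM runtime. The dictionaries, the sample sentence, and every function name are hypothetical, and the reading of `Follows`, `RightContext`, and `ContainsDict` is an assumption based on their names and arguments on the slide.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    begin: int  # inclusive character offset
    end: int    # exclusive character offset


# Hypothetical dictionaries standing in for the Role, Company, and
# 'EmpAssociation.dict' dictionaries named on the slide.
ROLE_DICT = ["engineer", "analyst"]
COMPANY_DICT = ["IBM", "Acme"]
EMP_ASSOCIATION_DICT = ["at", "with", "for"]


def find_spans(text: str, entries: List[str]) -> List[Span]:
    """Return a span for every occurrence of every dictionary entry."""
    spans = []
    for entry in entries:
        start = text.find(entry)
        while start != -1:
            spans.append(Span(start, start + len(entry)))
            start = text.find(entry, start + 1)
    return sorted(spans)


def follows(first: Span, second: Span, min_gap: int, max_gap: int) -> bool:
    """Assumed reading of Follows: second begins min..max chars after first ends."""
    return min_gap <= second.begin - first.end <= max_gap


def right_context(text: str, span: Span, length: int) -> str:
    """Assumed reading of RightContext: the `length` chars right of the span."""
    return text[span.end:span.end + length]


def contains_dict(entries: List[str], fragment: str) -> bool:
    """Assumed reading of ContainsDict: some entry appears as a whole word."""
    words = fragment.split()
    return any(entry in words for entry in entries)


def employment(text: str) -> List[Tuple[str, str]]:
    """Toy equivalent of the Employment view: (jobType, companyName) pairs."""
    roles = find_spans(text, ROLE_DICT)
    companies = find_spans(text, COMPANY_DICT)
    return [
        (text[r.begin:r.end], text[c.begin:c.end])
        for r in roles
        for c in companies
        if follows(r, c, 0, 20)
        and contains_dict(EMP_ASSOCIATION_DICT, right_context(text, r, 10))
    ]


if __name__ == "__main__":
    sample = "Maria is an engineer at IBM, and Tom met an analyst near Acme."
    print(employment(sample))  # [('engineer', 'IBM')]
```

In the sample, "analyst ... Acme" passes the distance check but is dropped because "near" is not in the hypothetical employment dictionary. A real extractor would also evaluate the predicates in whichever order is cheapest, which is the point of the cost-based optimization mentioned on the slide; this sketch simply checks every role/company pair.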

Editor's notes

  1. University of Ontario Institute of Technology (UOIT) case study: http://www.youtube.com/watch?v=YosyLqbCrD4 ftp://public.dhe.ibm.com/common/ssi/ecm/en/odc03157usen/ODC03157USEN.PDF Fifteen million babies are born prematurely every year. Of those, over 1 million die, often in the first month of life. Many of these babies are in ICUs, connected to numerous monitors that measure key statistics such as heart rate and temperature. Until recently, these measurements were only sampled and aggregated into 2-3 readings to indicate the health of the baby. IBM collaborated with UOIT to develop a solution that processes 1000 pieces of information/sec, identifies patterns, correlates them with doctors’ notes and family history, and applies predictive analytics. This has allowed us to spot the onset of an infection 24 hours in advance. Same data, but saved lives.
-----------------------------------------------------
To better detect subtle warning signs of complications, clinicians need greater insight into the moment-by-moment condition of neonatal infants in an ICU. According to a global project led by the WHO, fifteen million babies, one in 10 births, are born prematurely every year. Of those, over 1 million die, often in the first 30 days of life. Yet many of these babies are in NICUs, connected to monitors that measure key statistics such as heart rate, skin temperature, and respiration. These measurements add up to 90M per patient per day, yet most of this data is just sampled periodically and written into the patient record, not used for its predictive value.
IBM and UOIT developed a first-of-its-kind analytics solution that uses stream computing to capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues. Early warning gives caregivers the ability to deal proactively with potential complications, such as detecting infections in premature infants up to 24 hours before they exhibit symptoms. The solution monitors 120 children, analyzing 120K messages per second, billions of messages per day. Trials are expanding beyond Canada to include hospitals in the US, China and Australia.
  2. SA_Big_Data_NYC_Feb_18_v10 07/10/13 IBM
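
The throughput figures in note 1 can be cross-checked with simple arithmetic. The sketch below converts 120K messages per second into a daily total and, assuming the load is spread evenly across the 120 monitored children (an assumption the notes do not state), derives per-child rates. The function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def messages_per_day(messages_per_second: int) -> int:
    """Daily message count for a constant per-second rate."""
    return messages_per_second * SECONDS_PER_DAY


total_per_second = 120_000  # "120K messages per second" from the notes
children = 120              # "monitors 120 children" from the notes

total_per_day = messages_per_day(total_per_second)
# Even split across children is an assumption, not a figure from the source.
per_child_per_second = total_per_second // children
per_child_per_day = messages_per_day(per_child_per_second)

print(total_per_day)         # 10368000000
print(per_child_per_second)  # 1000
print(per_child_per_day)     # 86400000
```

Under that even-split assumption, the result is about 10.4 billion messages per day in total, 1000 per second per child, and roughly 86 million per child per day. These line up with the "billions of messages per day", "1000 pieces of information/sec", and "90M per patient per day" figures quoted in the notes.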