© 2015 IBM Corporation
Hadoop and SQL: Delivering Analytics Across the Organization (DHS-2147)
Nicholas Berg, Seagate
Adriana Zubiri, IBM
27-Oct-2015 2:30 PM-3:30 PM
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal
without notice at IBM’s sole discretion.
• Information regarding potential future products is intended to outline our general product direction
and it should not be relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or
legal obligation to deliver any material, code or functionality. Information about potential future
products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our
products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a
controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the
user’s job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated
here.
A New Seagate
SEAGATE is in a unique position to CREATE EVEN MORE VALUE
for our customers by integrating our 35+ years of storage expertise in
HDD with FLASH, SYSTEMS, SERVICES AND CONSUMER DEVICES
to deliver unique solutions that enable our customers to
ENJOY AND GET VALUE FROM THEIR DATA
more than ever before.
HYBRID SOLUTIONS
HDD FLASH
SILICON
BRANDED
SYSTEMS
• $14B Annual Revenue
• 2 billion drives shipped
• Stores more than 40% of the world’s data
• 43,000 Cloud services clients worldwide
• 50,000 Employees, 26 countries
• 9 Manufacturing plants: US, China, Malaysia,
N.Ireland, Singapore, Thailand
• 5 Design centers: US, Singapore, South Korea
• Vertically integrated factories from Silicon
fabrication to Drive assembly
Where to start with Hadoop - find a use case
• Experimented with text analysis of Call Center logs
 Proved out the use case, but Big Data text analytics built into Call
Center support applications met the need without in-house costs
• Marketing organization had some social media Big Data Use
cases
 These are being met by companies specializing in this kind of Big
Data analysis
• Reviewed other potential use cases such as:
 Mining data center support, performance and maintenance logs
 Mining large data sets for IT Security
• Tested loading a volume of factory test log data and running
some analytics on it
 Compelling use case for Hadoop: Deeper and wider analysis of
Factory and Field data
Traditional Data Architecture Pressured
4.4 ZB in 2013
85% from New Data Types
15x Machine Data by 2020
44 ZB by 2020
ZB = 1B TB
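The growth these figures imply can be worked out directly. The sketch below uses only the numbers quoted on the slide; treating the change as steady compound growth over 2013-2020 is an added assumption for illustration, not something the slide states.

```python
# Figures quoted on the slide; the compound-growth framing is an assumption.
ZB_IN_TB = 1_000_000_000  # slide: ZB = 1B TB

volume_2013_zb = 4.4
volume_2020_zb = 44.0
years = 2020 - 2013

growth_factor = volume_2020_zb / volume_2013_zb
annual_growth = growth_factor ** (1 / years) - 1

print(f"2013 volume: {volume_2013_zb * ZB_IN_TB:,.0f} TB")
print(f"Growth 2013-2020: {growth_factor:.0f}x")
print(f"Implied compound annual growth: {annual_growth:.1%}")
```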
Seagate’s high-level plans for Hadoop
• Enterprise Hadoop cluster as extension of EDW
(augmentation)
 Ability to store and analyze 10x-20x Factory and Field data
 Much longer retention of relevant manufacturing data
 Multi-purpose analytic environment supporting hundreds of
potential users across Engineering, Manufacturing and Quality
• Possible local factory Hadoop clusters for special-purpose
processing
• Eventual integration across multiple clusters and sites
• At a high level, Hadoop will enable us to
 Ask questions we could never ask before...
 About data volumes we could never collect and store before…
 Doing analysis we could never perform in reasonable time…
 And connecting data that could never before be retained for
combined analysis
Hadoop: a dynamic and ever changing landscape
• When we first started our Hadoop journey, MapReduce was the
main way to access and query HDFS data
• Two years on, the Hadoop world had changed with SQL being
a major force in Hadoop (Hive, Impala, BigSQL)
• SQL on Hadoop helps address three main Hadoop challenges:
 Addresses a skills gap: Hadoop MapReduce needs Java coders
vs. using existing SQL skills
 SQL provides integration with existing environments and tools
(e.g. databases and BI tools)
 Enables Hadoop to move from batch processing to interactive
analysis
• New memory-based Apache projects such as Apache Spark are
being developed that allow even faster interactive analysis, but
SQL is still core to these too
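The skills-gap point can be illustrated with a small sketch that computes the same per-model pass count twice: once in a hand-written map/reduce style and once as a single SQL statement. This is only an illustration in plain Python, with the standard-library sqlite3 module standing in for a SQL-on-Hadoop engine; the table name, column names, and records are hypothetical and do not come from the presentation.

```python
import sqlite3
from collections import defaultdict

# Hypothetical factory test records: (drive_model, test_result)
records = [("A", "pass"), ("A", "fail"), ("B", "pass"), ("A", "pass"), ("B", "pass")]


def map_phase(rows):
    """Emit (key, 1) for every passing test, as a mapper would."""
    for model, result in rows:
        if result == "pass":
            yield model, 1


def reduce_phase(pairs):
    """Sum the emitted values per key, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


mapreduce_counts = reduce_phase(map_phase(records))

# The same question expressed in SQL (sqlite3 is a stand-in, not Hive or Big SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_log (drive_model TEXT, test_result TEXT)")
conn.executemany("INSERT INTO test_log VALUES (?, ?)", records)
sql_counts = dict(conn.execute(
    "SELECT drive_model, COUNT(*) FROM test_log "
    "WHERE test_result = 'pass' GROUP BY drive_model"
).fetchall())
conn.close()

print(mapreduce_counts == sql_counts)
```

The SQL version needs no custom mapper or reducer code, which is the point the slide makes about reusing existing SQL skills.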
Big data for the enterprise
• We put together a five year Big Data vision statement and
strategy plan
 Socialized strategy plan for feedback
• Decided to conduct a large scale Hadoop pilot
 We wanted to really understand what Hadoop’s real capabilities
and potential were
• Purchased a 60-node cluster: 3 management nodes, 57 data
nodes (now increased to two clusters: 60 + 100 nodes)
• Performed an analysis on which Hadoop distribution to use
• Defined what use cases to run in our large scale pilot
Choosing a Hadoop software distribution
• Two main flavors: open source oriented or more proprietary
• Open source oriented solutions are the most beneficial:
 Portable - easily move your Hadoop cluster from vendor to vendor
 Avoids vendor lock-in to expensive and proprietary technology
 Open source projects ensure interoperability with other open
source projects
• Other important considerations:
 Integration with RDBMSs, BI solutions and other platforms
 R&D investment and support capability
 Consulting and training
• Seagate chose IBM because we believe they have the most
advanced SQL “add-on” Hadoop capability, some other strong
Hadoop tools like BigR and excellent support services
Evolving to a Logical Data Warehouse
• A Logical Data Warehouse combines traditional data
warehouses with big data systems to evolve your analytics
capabilities beyond where you are today
• Hadoop does not replace your EDW. EDW is a good “general
purpose” data management solution for integrating and
conforming enterprise data to produce your everyday business
analytics
• A typical EDW may have 100’s of data feeds, dozens of
integrated applications and run 1000’s to 100,000’s of queries a
day
• Hadoop is more specialized and much less mature. For now it
will have only a few application integration points and run fewer
queries at a lower concurrency, answering different questions
• A Hadoop cluster of 60-100 nodes is a supercomputer. What
would you use a supercomputer for? Probably to answer the
really big questions
Some early practices and learnings
• Incremental phased delivery, or use case by use case
• Form a “data lake” or “data reservoir” for all enterprise data
• Data availability must come first, model and transform the data
in place within Hadoop
 resist moving the data again
• Lots of talk about schema on read but for Data Warehousing
types of uses, this is impractical
 Data modeling is still required but can be simplified
• Have multiple clusters: Development, Test and then two or
more Production, one for Ad Hoc data exploration &
experimentation, one for more governed uses with guaranteed
cluster availability to run important jobs
• Use existing custom query/analytics solution to provide
“transparent” access to Hadoop
Enterprise Hadoop Architecture
The Data Lake: data tiering
Tier 1 / Tier 2 custom data loading application
Data Transport
• Sqoop: Pull EDW data to HDFS Tier 1
• Non-EDW files (Factory push):
• Trickle feed files to staging area
• Unzip, Merge, reZip small files to large files
• Push compacted files to HDFS Tier 1
Data Mapping & Loading
• Match source/target columns
• Detect and handle column changes
• Transform data
• Insert or Update data in Tier 2
• Dual feed to cluster 2 (Tier 1 and Tier 2)
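A minimal sketch of the column matching and change detection steps listed under Data Mapping & Loading, assuming columns are matched by name. The function names and the sample columns are hypothetical; the presentation does not describe how the loading application implements this.

```python
def map_columns(source_columns, target_columns):
    """Compare source and target schemas by column name."""
    source, target = set(source_columns), set(target_columns)
    return {
        "matched": sorted(source & target),
        "new_in_source": sorted(source - target),
        "missing_from_source": sorted(target - source),
    }


def to_target_row(source_row, target_columns):
    """Project a source record onto the target schema, filling gaps with None."""
    return {col: source_row.get(col) for col in target_columns}


# Hypothetical schemas: the source gained "firmware" and no longer sends "site".
source_columns = ["serial", "test_id", "result", "firmware"]
target_columns = ["serial", "test_id", "result", "site"]

changes = map_columns(source_columns, target_columns)
row = to_target_row(
    {"serial": "SN1", "test_id": 7, "result": "pass", "firmware": "F2"},
    target_columns,
)
print(changes)
print(row)
```

A real loader would also have to decide what to do with a new source column (add it to the Tier 2 table or ignore it); that policy is not covered here.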
Scheduling
• Oozie backend
• Configurable frequency
• Currently Daily
• Snapshots (waits for data loads to complete)
• Meta data backups
Compaction
• Major and Minor compaction
• Minor: merges small files to large ones
• Major: remove old versions of data (updates)
• Consolidates HDFS directories
T1/T2 App
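The two compaction modes described above can be sketched as follows. This is a simplified in-memory model, assuming each file is a list of rows and each row carries a primary key and a version number; the field names `pk` and `version` and the row-count limit are hypothetical, and the real application works on HDFS files rather than Python lists.

```python
def minor_compact(small_files, max_rows_per_file):
    """Minor compaction: merge many small files into fewer files of bounded size."""
    merged, current = [], []
    for rows in small_files:
        for row in rows:
            current.append(row)
            if len(current) == max_rows_per_file:
                merged.append(current)
                current = []
    if current:
        merged.append(current)
    return merged


def major_compact(rows):
    """Major compaction: keep only the newest version of each row (by primary key)."""
    latest = {}
    for row in rows:
        kept = latest.get(row["pk"])
        if kept is None or row["version"] > kept["version"]:
            latest[row["pk"]] = row
    return sorted(latest.values(), key=lambda r: r["pk"])


files = minor_compact([[1], [2, 3], [4], [5]], max_rows_per_file=2)

history = [
    {"pk": "SN1", "version": 1, "status": "tested"},
    {"pk": "SN2", "version": 1, "status": "tested"},
    {"pk": "SN1", "version": 2, "status": "shipped"},
]
current_rows = major_compact(history)
print(files)
print(current_rows)
```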
Hadoop cluster data feeding and querying
[Diagram; labels recovered from the slide]
Sources: EDW; Factory Data Systems (UNIX)
Write path: Sqoop; Compact & Load
HDFS tiers: Tier 1 (Delimited Text Files); Tier 2 (Hive Tables); Tier 3 (Derived Hadoop Tables)
Components: MapReduce, Big SQL, Hive, Pig, HCatalog, Big R, Spark, YARN, Ganglia | Nagios
Read path: JDBC | ODBC | Other Drivers
Consumers: Data Science Applications (SAS, Python, ML)
Drive data labels: 10% (Sampled Drive Data); 100%
Adding Hive update support
• Hive is a good structured table format for querying but it does
not support row updates to large fact tables
 This type of capability is known in the database arena as ACID
• ACID (Atomicity, Consistency, Isolation, Durability) is a set of
properties that guarantee that database transactions are
processed reliably
 Atomicity: requires that each transaction be “all or nothing”. If
one part of the transaction fails, the entire transaction fails, and
the database state is left unchanged
 Consistency: any data written to the database must be valid
according to all defined rules, including constraints, cascades,
triggers, and any combination thereof
 Isolation: ensures that concurrent execution of transactions
produces the same result as if the transactions were executed serially
 Durability: means that once a transaction has been committed, it
will remain so, even in the event of power loss, crashes, or errors
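Atomicity is the easiest of the four properties to demonstrate in a few lines. The sketch below uses Python's standard-library sqlite3 module as a stand-in for a transactional database (it is not Hive); the table and values are hypothetical. An update and a failing insert are issued in one transaction, and the update is rolled back along with the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drive_status (serial TEXT PRIMARY KEY, status TEXT NOT NULL)"
)
conn.execute("INSERT INTO drive_status VALUES ('SN1', 'tested')")
conn.commit()

try:
    # Used as a context manager, the connection commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE drive_status SET status = 'shipped' WHERE serial = 'SN1'")
        # Violates the PRIMARY KEY constraint, so the whole transaction fails.
        conn.execute("INSERT INTO drive_status VALUES ('SN1', 'duplicate')")
except sqlite3.IntegrityError:
    pass

status_after = conn.execute(
    "SELECT status FROM drive_status WHERE serial = 'SN1'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM drive_status").fetchone()[0]
print(status_after, row_count)
conn.close()
```

The row still reads 'tested' afterwards: the earlier update in the failed transaction left no trace, which is the "all or nothing" behavior described above.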
Custom SerDe (Hive row serializer/deserializer)
Hive Read Query (MapReduce jobs for Hive read SQL, reading from HDFS)
UpdInputFormat
 Splits input files into fragments for individual map tasks
 Reads index files into memory (helps identify duplicate records)
 Provides RecordReader Factory
UpdRecordReader
 Reads splits and loads data
 Discards old versions of rows using info from the index
 Converts individual records into Writable objects suitable for Mapper
Hive Write Query (MapReduce jobs for Hive write SQL, writing to HDFS)
UpdOutputFormat
 Provides RecordWriter Factory
UpdRecordWriter
 Writes each Writable back to HDFS in user serialization format
 Writes index files to HDFS (with PKs in each version to help identify duplicate records)
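The idea behind the index files can be sketched in a few lines: the writer appends every new row version and records, per primary key, where the newest version lives; the reader then skips any row that the index says has been superseded. This is a simplified in-memory model under that assumption, not the actual Hadoop InputFormat/OutputFormat code, and the function and field names are hypothetical.

```python
def write_records(existing_rows, index, new_rows):
    """Append rows and record, per primary key, the position of the newest version."""
    rows = list(existing_rows)
    index = dict(index)
    for row in new_rows:
        rows.append(row)
        index[row["pk"]] = len(rows) - 1
    return rows, index


def read_records(rows, index):
    """Yield only the newest version of each row, skipping superseded ones."""
    for position, row in enumerate(rows):
        if index.get(row["pk"]) == position:
            yield row


# Initial load, then an "update" written as a new version of SN1.
rows, index = write_records([], {}, [
    {"pk": "SN1", "status": "tested"},
    {"pk": "SN2", "status": "tested"},
])
rows, index = write_records(rows, index, [{"pk": "SN1", "status": "shipped"}])

visible = list(read_records(rows, index))
print(visible)
```

Old versions stay on disk until a major compaction removes them, so updates are cheap appends and the cost of hiding stale rows is paid at read time.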
Hadoop challenges
(an emerging and evolving platform)
• Knowing which Hadoop projects to “bet on”, which data formats
and compression types to use
• Speed of change: probably more code has been written for
Hadoop than for any other IT platform
 Need to upgrade cluster software frequently (once a quarter)
• Gaps: Some things not ready like ACID, real-time queries
• Resource management for different types of workloads
• Lack of BI tools that can really take advantage of huge data
sets and visualize them
• Still very batch processing orientated but interactive is gaining
traction with Spark etc.
• Provisioning large numbers of machines, hardware failures
• Integrating remote clusters, cross cluster data movement and
inter-cluster processing
Hadoop projects – setting expectations
• Completely new and an awful lot to learn, design &
implementation are huge tasks
• Hadoop is still quite immature and lacks robustness
 Exhibits instability and bugs; new code is released too early
• Speed of change: management need to understand that plans will
be dynamic and will change with the evolving technology
 Have less formal schedules, manage expectations to the low side
• Be flexible and adaptable as technology changes and matures
 Be ready to change and adapt to new technology or if support dries
up on a Hadoop project
• Developing IT skills quickly
 Finding experienced and talented Hadoop staff or consultants
 Keeping up with the data scientists
• Convincing security and data center teams to give Hadoop users
UNIX level access
About the IBM Open Platform for Apache Hadoop
• IBM Open Platform – Foundation of 100% pure open source
Apache Hadoop components
• Standardizing as the Open Data Platform
(http://opendataplatform.org)
All Standard Apache Open Source Components
HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig,
Sqoop, HCatalog, Solr/Lucene
ODP
Big SQL At a Glance
Data shared with Hadoop ecosystem
Comprehensive file format support
Superior enablement of IBM software
Enhanced by Third Party software
Modern MPP runtime
Powerful SQL query rewriter
Cost based optimizer
Optimized for concurrent user throughput
Distributed requests to multiple data sources within a single SQL statement
Main data sources supported: DB2, Teradata, Oracle, Netezza, MS SQL Server, Informix
Advanced security/auditing
Resource and workload management
Self tuning memory management
Comprehensive monitoring
Comprehensive SQL Support
IBM SQL PL compatibility
Extensive Analytic Functions
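To make "multiple data sources within a single SQL statement" concrete, here is a small analogy using Python's standard-library sqlite3 module, which can attach a second database and join across both in one query. This is not Big SQL and says nothing about how Big SQL federation is configured; the database alias, tables, and rows are all hypothetical.

```python
import sqlite3

# Two separate in-memory databases stand in for two data sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE main.drive_tests (serial TEXT, result TEXT)")
conn.execute("CREATE TABLE warehouse.drive_master (serial TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO main.drive_tests VALUES (?, ?)",
    [("SN1", "fail"), ("SN2", "pass"), ("SN3", "fail"), ("SN4", "fail")],
)
conn.executemany(
    "INSERT INTO warehouse.drive_master VALUES (?, ?)",
    [("SN1", "A"), ("SN2", "A"), ("SN3", "B"), ("SN4", "A")],
)

# One statement that reads from both sources.
failures_by_model = conn.execute(
    """
    SELECT m.model, COUNT(*)
    FROM main.drive_tests AS t
    JOIN warehouse.drive_master AS m ON m.serial = t.serial
    WHERE t.result = 'fail'
    GROUP BY m.model
    ORDER BY m.model
    """
).fetchall()
conn.close()
print(failures_by_model)
```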
New functionality in Big SQL in 2015
• Ambari Installation and
configuration
• HBase support
• Rich Management User Interface
• Data types
 New primitive data types support
(decimal, char, varbinary)
 Complex data types support (array,
struct, map)
 Enhancements to varchar and date
• Platforms
 Power support
 RHEL 6.6 and 7.1 support
• More Performance
 HDFS caching
 UDFs performance improvements
 ANALYZE enhancements (2.5X
faster than 3.0 FP2)
 Native implementations of key Hive
built-in functions
• SQL Enhancements
 New analytic procedures
 New olap functions and aggregate
functionality
 Offset support for limit and fetch
first
 Ability to directly define and
execute Hive User Defined
Functions (UDFs)
• Other Improvements
 Improved support for concurrent
LOAD operations
 Support for importing with the
Teradata Connector for Hadoop
(TDH)
 Added SQL Server 2012 and
DB2/Z support
 CMX compression now
supported in the native I/O engine
 High Availability (FP2)
• Technical Previews
 Yarn/Slider Integration
 Spark integration (FP2)
Hadoop-DS: Performance Test update:
Big SQL V4.1 vs. Spark SQL 1.5.1 @ 1 TB, single stream*
*Not an official TPC-DS Benchmark.
Big SQL runs more SQL out-of-box
Porting Effort: Big SQL 4.1: 1 hour; Spark SQL 1.5.0: 3-4 weeks
Big SQL can execute all 99 queries with minimal porting effort
Single stream results:
Big SQL was faster than Spark SQL on 76 / 99 Queries
When Big SQL was slower, it was only slower by 1.6X on average
Query vs. Query, Big SQL is on average 5.5X faster
Removing Top 5 / Bottom 5, Big SQL is 2.5X faster
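The summary figures above (queries won, average speedup, and average after removing the top 5 and bottom 5) can be derived from per-query timings along the following lines. The slides do not state exactly how the averages were computed, so treating them as a mean of per-query time ratios is an assumption, and the timings below are made-up illustration data, not benchmark results.

```python
def speedups(big_sql_seconds, spark_sql_seconds):
    """Per-query ratio: how many times faster Big SQL ran than Spark SQL."""
    return [spark / big for big, spark in zip(big_sql_seconds, spark_sql_seconds)]


def mean(values):
    return sum(values) / len(values)


def trimmed_mean(values, trim):
    """Average after dropping the `trim` largest and `trim` smallest values."""
    ordered = sorted(values)
    return mean(ordered[trim:len(ordered) - trim])


# Hypothetical elapsed times in seconds for five queries (not from the benchmark).
big_sql = [10, 20, 5, 40, 8]
spark_sql = [20, 10, 50, 80, 16]

ratios = speedups(big_sql, spark_sql)
wins = sum(1 for r in ratios if r > 1)
average = mean(ratios)
trimmed = trimmed_mean(ratios, trim=1)
print(wins, average, trimmed)
```

With one outlier query running 10X faster, the plain average is well above the trimmed one, which is why the slide reports both numbers.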
But, … what happens when you scale it?
Scale: 1 TB
Single Stream:
• Big SQL was faster on 76 / 99 Queries
• Big SQL averaged 5.5X faster
• Removing Top / Bottom 5, Big SQL averaged 2.5X faster
4 Concurrent Streams:
• Spark SQL FAILED on 3 queries
• Big SQL was 4.4X faster*
Scale: 10 TB
Single Stream:
• Big SQL was faster on 80 / 99 Queries
• Spark SQL FAILED on 7 queries
• Big SQL averaged 6.2X faster*
• Removing Top / Bottom 5, Big SQL averaged 4.6X faster
4 Concurrent Streams:
• Big SQL elapsed time for workload was better than linear
• Spark SQL could not complete the workload (numerous issues). Partial results
possible with only 2 concurrent streams.
*Compares only queries that both Big SQL and Spark SQL could complete (benefits Spark SQL)
Chart axes: More Users (streams), More Data (scale)
What is the verdict? Use the right tool for the right job
Big SQL
• Migrating existing workloads to Hadoop
• Security
• Many Concurrent Users
• Best-in-class Performance
• Ideal tool for BI Data Analysts and production workloads
Spark SQL
• Machine Learning
• Simpler SQL
• Good Performance
• Ideal tool for Data Scientists and discovery
Big SQL Roadmap 2015-2016
HBase Support
Rich management user interface
Complex data types: Array, Struct, Map
Better Performance
New analytic procedures and OLAP
functions
Offset support for limit and fetch first
Ability to directly define and execute
Hive UDFs
Improve support for concurrent LOAD
Support importing w/ Teradata connector
for Hadoop
New federated sources: SQL Server 2012
and DB2/Z, Oracle 12c
Power platform support
Head node High availability
Yarn/slider support
Performance improvements
at large scale
Resiliency improvements
Spark
integration/exploitation
Faster statistics collection
Cumulative statistics
Sampling statistics
HBase update/delete
Hive update/delete
User-defined aggregates
Oracle compatibility improvements
Netezza compatibility
improvements
Integration with Ranger
BLU technology exploitation
Self collecting statistics
zLinux platform support
2015 1H2016
We Value Your Feedback!
Don’t forget to submit your Insight session and speaker
feedback! Your feedback is very important to us – we use it
to continually improve the conference.
Access the Insight Conference Connect tool at
insight2015survey.com to quickly submit your surveys from
your smartphone, laptop or conference kiosk.
Thank You
nick.berg@seagate.com
zubiri@ca.ibm.com

Contenu connexe

Tendances

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseCloudera, Inc.
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overviewjdijcks
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016bzigman
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalDiego Alberto Tamayo
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsRyan Gross
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lakeCapgemini
 
Drive DBMS Transformation with EDB Postgres
Drive DBMS Transformation with EDB PostgresDrive DBMS Transformation with EDB Postgres
Drive DBMS Transformation with EDB PostgresEDB
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDatawarehouse Trainings
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]shuwutong
 
OpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALOpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALinside-BigData.com
 
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductDell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductManuel "Manny" Rodriguez-Perez
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...VMware Tanzu
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17David Spurway
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellHPDutchWorld
 

Tendances (20)

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Hadoop: Extending your Data Warehouse
Hadoop: Extending your Data WarehouseHadoop: Extending your Data Warehouse
Hadoop: Extending your Data Warehouse
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
BarbaraZigmanResume 2016
BarbaraZigmanResume 2016BarbaraZigmanResume 2016
BarbaraZigmanResume 2016
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Data summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data opsData summit connect fall 2020 - rise of data ops
Data summit connect fall 2020 - rise of data ops
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
Drive DBMS Transformation with EDB Postgres
Drive DBMS Transformation with EDB PostgresDrive DBMS Transformation with EDB Postgres
Drive DBMS Transformation with EDB Postgres
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
 
Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]Kb 40 kevin_klineukug_reading20070717[1]
Kb 40 kevin_klineukug_reading20070717[1]
 
OpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORALOpenPOWER Roadmap Toward CORAL
OpenPOWER Roadmap Toward CORAL
 
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your ProductDell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
Dell Technology World - IT as a Business - Multi-Cloud Strategy is your Product
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015
 
IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17IBM Power Systems Update 1Q17
IBM Power Systems Update 1Q17
 
Oracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan HartwellOracle - Next Generation Datacenter - Alan Hartwell
Oracle - Next Generation Datacenter - Alan Hartwell
 
SQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery ImplementationSQL Server Disaster Recovery Implementation
SQL Server Disaster Recovery Implementation
 

En vedette

CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...Seeling Cheung
 
Big Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessBig Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessSeeling Cheung
 
Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Seeling Cheung
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study Seeling Cheung
 
Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneySeeling Cheung
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeeling Cheung
 
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Seeling Cheung
 
How Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversionHow Lazada ranks products to improve customer experience and conversion
How Lazada ranks products to improve customer experience and conversionEugene Yan Ziyou
 
China touch screen industry market demand forecast and investment strategy re...
China touch screen industry market demand forecast and investment strategy re...China touch screen industry market demand forecast and investment strategy re...
China touch screen industry market demand forecast and investment strategy re...Qianzhan Intelligence
 
Dr matthew katz_médias_sociaux_19_avril_2012
Dr matthew katz_médias_sociaux_19_avril_2012Dr matthew katz_médias_sociaux_19_avril_2012
Dr matthew katz_médias_sociaux_19_avril_2012laucyn
 
organization of living things
organization of living thingsorganization of living things
organization of living thingsDuda POp
 
China auto parts and components manufacturing industry in depth market resear...
China auto parts and components manufacturing industry in depth market resear...China auto parts and components manufacturing industry in depth market resear...
China auto parts and components manufacturing industry in depth market resear...Qianzhan Intelligence
 
China fluorine chemical industry indepth research and investment strategic pl...
China fluorine chemical industry indepth research and investment strategic pl...China fluorine chemical industry indepth research and investment strategic pl...
China fluorine chemical industry indepth research and investment strategic pl...Qianzhan Intelligence
 
Angular 2 Crash Course with TypeScript
Angular 2 Crash Course with TypeScriptAngular 2 Crash Course with TypeScript
Angular 2 Crash Course with TypeScriptayman diab
 
Les cahiers de l’ant Créer et/ou animer votre page Facebook
Les cahiers de l’ant Créer et/ou animer votre page FacebookLes cahiers de l’ant Créer et/ou animer votre page Facebook
Les cahiers de l’ant Créer et/ou animer votre page FacebookEmilie Rochat
 
читалићи 2013
читалићи 2013читалићи 2013
читалићи 2013sastavzapet
 

En vedette (18)

CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
CSNI: How State Medicaid Agencies Can Use Analytics to Predict Opioid Abuse a...
 
Big Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data AccessBig Fish Games: Democratizing Data Access
Big Fish Games: Democratizing Data Access
 
Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...Medical University of South Carolina: Using Big Data and Predictive Analytics...
Medical University of South Carolina: Using Big Data and Predictive Analytics...
 
Southwest Power Pool big data case study
Southwest Power Pool big data case study Southwest Power Pool big data case study
Southwest Power Pool big data case study
 
Constant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake JourneyConstant Contact: An Online Marketing Leader’s Data Lake Journey
Constant Contact: An Online Marketing Leader’s Data Lake Journey
 
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data TorrentSeagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
Seagate: Sensor Overload! Taming The Raging Manufacturing Big Data Torrent
 
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
Fiducia & GAD IT AG: From Fraud Detection to Big Data Platform: Bringing Hado...
 
Hadoop and SQL: Delivery Analytics Across the Organization

  • 1. © 2015 IBM Corporation Hadoop and SQL: Delivering Analytics Across the organization (DHS-2147) Nicholas Berg, Seagate Adriana Zubiri, IBM 27-Oct-2015 2:30 PM-3:30 PM
  • 2. Please Note: • IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. • Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. • The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. • The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
  • 3. A New Seagate SEAGATE is in a unique position to CREATE EVEN MORE VALUE for our customers by integrating our 35+ years of storage expertise in HDD with FLASH, SYSTEMS, SERVICES AND CONSUMER DEVICES to deliver unique solutions that enable our customers to ENJOY AND GET VALUE FROM THEIR DATA more than ever before. HYBRID SOLUTIONS HDD FLASH SILICON BRANDED SYSTEMS
  • 4. • $14B Annual Revenue • 2 billion drives shipped • Stores more than 40% of the world’s data • 43,000 Cloud services clients worldwide • 50,000 Employees, 26 countries • 9 Manufacturing plants: US, China, Malaysia, N. Ireland, Singapore, Thailand • 5 Design centers: US, Singapore, South Korea • Vertically integrated factories from Silicon fabrication to Drive assembly
  • 5. Where to start with Hadoop - find a use case • Experimented with text analysis of Call Center logs ➢ Proved out the use case, but Big Data text analytics built into Call Center support applications met the need without in-house costs • Marketing organization had some social media Big Data use cases ➢ These are being met by companies specializing in this kind of Big Data analysis • Reviewed other potential use cases such as: ➢ Mining data center support, performance and maintenance logs ➢ Mining large data sets for IT Security • Tested loading a volume of factory test log data and running some analytics ➢ Compelling use case for Hadoop: Deeper and wider analysis of Factory and Field data
  • 6. Traditional Data Architecture Pressured • 4.4 ZB in 2013 • 44 ZB by 2020 • 85% from New Data Types • 15x Machine Data by 2020 (ZB = 1B TB)
  • 7. Seagate’s high-level plans for Hadoop • Enterprise Hadoop cluster as extension of EDW (augmentation) ➢ Ability to store and analyze 10x-20x Factory and Field data ➢ Much longer retention of relevant manufacturing data ➢ Multi-purpose analytic environment supporting hundreds of potential users across Engineering, Manufacturing and Quality • Possible local factory Hadoop clusters for special-purpose processing • Eventual integration across multiple clusters and sites • At a high level, Hadoop will enable us to ➢ Ask questions we could never ask before... ➢ About data volumes we could never collect and store before… ➢ Doing analysis we could never perform in reasonable time… ➢ And connecting data that could never before be retained for combined analysis
  • 8. Hadoop: a dynamic and ever-changing landscape • When we first started our Hadoop journey, MapReduce was the main way to access and query HDFS data • Two years on, the Hadoop world had changed, with SQL being a major force in Hadoop (Hive, Impala, BigSQL) • SQL on Hadoop helps address three main Hadoop challenges: ➢ Addresses a skills gap: Hadoop MapReduce needs Java coders vs. using existing SQL skills ➢ SQL provides integration with existing environments and tools (e.g. databases and BI tools) ➢ Enables Hadoop to move from batch processing to interactive analysis • New memory-based Apache projects, such as Apache Spark, are being developed that allow for even faster interactive analysis, but SQL is still core to these too
  • 9. Big data for the enterprise • We put together a five-year Big Data vision statement and strategy plan ➢ Socialized strategy plan for feedback • Decided to conduct a large-scale Hadoop pilot ➢ We wanted to really understand what Hadoop’s real capabilities and potential were • Purchased 60-node cluster: 3 management nodes, 57 data nodes. (Now increased to two clusters: 60 + 100 nodes) • Performed an analysis on which Hadoop distribution to use • Defined what use cases to run in our large-scale pilot
  • 10. Choosing a Hadoop software distribution • Two main flavors: open source oriented or more proprietary • Open source oriented solutions are the most beneficial: ➢ Portable - easily move your Hadoop cluster from vendor to vendor ➢ Avoids vendor lock-in to expensive and proprietary technology ➢ Open source projects ensure interoperability with other open source projects • Other important considerations: ➢ Integration with RDBMSs, BI solutions and other platforms ➢ R&D investment and support capability ➢ Consulting and training • Seagate chose IBM because we believe they have the most advanced SQL “add-on” Hadoop capability, some other strong Hadoop tools like BigR and excellent support services
  • 11. Evolving to a Logical Data Warehouse • A Logical Data Warehouse combines traditional data warehouses with big data systems to evolve your analytics capabilities beyond where you are today • Hadoop does not replace your EDW. EDW is a good “general purpose” data management solution for integrating and conforming enterprise data to produce your everyday business analytics • A typical EDW may have 100’s of data feeds, dozens of integrated applications and run 1000’s to 100,000’s of queries a day • Hadoop is more specialized and much less mature. For now it will have only a few application integration points and run fewer queries at a lower concurrency, answering different questions • A Hadoop cluster of 60-100 nodes is a supercomputer. What would you use a supercomputer for? Probably to answer the really big questions
  • 12. Some early practices and learnings • Incremental phased delivery, or use case by use case • Form a “data lake” or “data reservoir” for all enterprise data • Data availability must come first, model and transform the data in place within Hadoop ➢ resist moving the data again • Lots of talk about schema on read but for Data Warehousing types of uses, this is impractical ➢ Data modeling is still required but can be simplified • Have multiple clusters: Development, Test and then two or more Production, one for Ad Hoc data exploration & experimentation, one for more governed uses with guaranteed cluster availability to run important jobs • Use existing custom query/analytics solution to provide “transparent” access to Hadoop
  • 14. The Data Lake: data tiering
  • 15. Tier 1 / Tier 2 custom data loading application (T1/T2 App) Data Transport • Sqoop: Pull EDW data to HDFS Tier 1 • Non-EDW files (Factory push): • Trickle feed files to staging area • Unzip, Merge, reZip small files to large files • Push compacted files to HDFS Tier 1 Data Mapping & Loading • Match source/target columns • Detect and handle column changes • Transform data • Insert or Update data in Tier 2 • Dual feed to cluster 2 Scheduling • Oozie backend • Configurable frequency • Currently Daily • Snapshots (waits for data loads to complete) • Metadata backups Compaction • Major and Minor compaction • Minor: merges small files to large ones • Major: remove old versions of data (updates) • Consolidates HDFS directories
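Slide 15 describes minor compaction as merging small files into large ones. The following is a minimal sketch of how such a merge could be planned, not the actual T1/T2 application: the function name, the greedy batching strategy, and the 128 MB target size are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical target size: 128 MB (an assumption, not stated in the slides).
TARGET_BYTES = 128 * 1024 * 1024


def plan_minor_compaction(file_sizes: Dict[str, int],
                          target_bytes: int = TARGET_BYTES) -> List[List[str]]:
    """Greedily group small files into batches of at most target_bytes.

    Each returned batch would be merged into one larger file. Files already
    at or above the target, and batches of a single file, are left alone.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0

    def flush() -> None:
        # Merging a single file gains nothing, so only keep real batches.
        if len(current) > 1:
            batches.append(list(current))

    # Sort by name so the plan is deterministic.
    for name in sorted(file_sizes):
        size = file_sizes[name]
        if size >= target_bytes:
            continue  # already large enough; nothing to merge
        if current and current_size + size > target_bytes:
            flush()
            current, current_size = [], 0
        current.append(name)
        current_size += size
    flush()
    return batches
```

Calling `plan_minor_compaction({"a": 50, "b": 60, "c": 30, "d": 200, "e": 10}, target_bytes=100)` groups `b`, `c` and `e` into one batch and leaves `a` and `d` untouched.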
  • 16. Hadoop cluster data feeding and querying [Diagram labels: EDW; Factory Data Systems; UNIX; HDFS; Sqoop; Compact & Load; Tier 1 (Delimited Text Files); Tier 2 (Hive Tables); Tier 3 (Derived Hadoop Tables); MapReduce; Big SQL; Hive; Pig; HCatalog; Big R; Spark; YARN; Ganglia | Nagios; READ / WRITE via JDBC | ODBC | Other Drivers; Data Science Applications (SAS, Python, ML); 10% Drive Sampled; Drive Data 100%]
  • 17. Adding Hive update support • Hive is a good structured table format for querying but it does not support row updates to large fact tables ➢ This type of capability is known in the database arena as ACID • ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably ➢ Atomicity: requires that each transaction be “all or nothing”. If one part of the transaction fails, the entire transaction fails, and the database state is left unchanged ➢ Consistency: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof ➢ Isolation: ensures that the concurrent execution of transactions produces the same results as if the transactions were executed serially ➢ Durability: means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
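To make the atomicity property on slide 17 concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite stands in for a transactional database here and is not part of the Hadoop stack described in the deck; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of drive test results.
conn.execute(
    "CREATE TABLE drive_test (serial TEXT PRIMARY KEY, result TEXT NOT NULL)"
)
conn.execute("INSERT INTO drive_test VALUES ('D1', 'pass')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception escapes it.
    with conn:
        conn.execute("UPDATE drive_test SET result = 'fail' WHERE serial = 'D1'")
        # Duplicate primary key: this statement fails, so the whole
        # transaction, including the UPDATE above, is rolled back.
        conn.execute("INSERT INTO drive_test VALUES ('D1', 'pass')")
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT serial, result FROM drive_test").fetchall()
```

After the failed transaction, `rows` still holds the original `('D1', 'pass')` record: the update was undone together with the failing insert, which is the "all or nothing" behavior the slide describes.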
  • 18. Custom SerDe (Hive row serializer/deserializer) • Hive Read Query (MapReduce jobs for Hive read SQL, reading from HDFS) • UpdInputFormat ➢ Splits input files into fragments for individual map tasks ➢ Reads index files into memory (helps identify duplicate records) ➢ Provides RecordReader Factory • UpdRecordReader ➢ Reads splits and loads data ➢ Discards old versions of rows using info from the index ➢ Converts individual records into Writable objects suitable for Mapper • Hive Write Query (MapReduce jobs for Hive write SQL) • UpdOutputFormat ➢ Provides RecordWriter Factory • UpdRecordWriter ➢ Writes each Writable back to HDFS in user serialization format ➢ Writes index files to HDFS (with PKs in each version to help identify duplicate records)
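Slide 18 says the reader discards old versions of rows using an index keyed on primary keys. The sketch below illustrates that idea in plain Python under stated assumptions: the `(primary_key, version, payload)` row layout, the integer version numbers, and both function names are hypothetical, and the real implementation lives in Hive InputFormat/RecordReader classes rather than in code like this.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row layout: (primary_key, version, payload).
Row = Tuple[str, int, str]


def build_index(rows: Iterable[Row]) -> Dict[str, int]:
    """Map each primary key to the highest version seen (versions assumed >= 0)."""
    index: Dict[str, int] = {}
    for pk, version, _payload in rows:
        if version > index.get(pk, -1):
            index[pk] = version
    return index


def read_latest(rows: Iterable[Row], index: Dict[str, int]) -> Iterator[Row]:
    """Yield only the newest version of each row, discarding older versions."""
    for row in rows:
        pk, version, _payload = row
        if index.get(pk) == version:
            yield row


stored = [("d1", 1, "pass"), ("d2", 1, "fail"), ("d1", 2, "fail")]
index = build_index(stored)
latest = list(read_latest(stored, index))
```

Here `latest` contains version 1 of `d2` and version 2 of `d1`; the superseded version 1 of `d1` is dropped at read time, so updates can be appended without rewriting existing files.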
  • 19. Hadoop challenges (an emerging and evolving platform) • Knowing which Hadoop projects to “bet on”, which data formats and compression types to use • Speed of change: probably has had more code written for it than any other IT platform ➢ Need to upgrade cluster software frequently (once a quarter) • Gaps: Some things not ready like ACID, real-time queries • Resource management for different types of workloads • Lack of BI tools that can really take advantage of huge data sets and visualize them • Still very batch-processing oriented but interactive is gaining traction with Spark etc. • Provisioning large numbers of machines, hardware failures • Integrating remote clusters, cross-cluster data movement and inter-cluster processing
  • 20. Hadoop projects – setting expectations • Completely new and an awful lot to learn; design & implementation are huge tasks • Hadoop is still quite immature and lacks robustness ➢ Exhibits instability, is buggy, new code released too early • Speed of change: management needs to understand that plans will be dynamic and will change with the evolving technology ➢ Have less formal schedules, manage expectations to the low side • Be flexible and adaptable as technology changes and matures ➢ Be ready to change and adapt to new technology or if support dries up on a Hadoop project • Developing IT skills quickly ➢ Finding experienced and talented Hadoop staff or consultants ➢ Keeping up with the data scientists • Convincing security and data center teams to give Hadoop users UNIX-level access
  • 21. About the IBM Open Platform for Apache Hadoop • IBM Open Platform – Foundation of 100% pure open source Apache Hadoop components • Standardizing as the Open Data Platform (ODP, http://opendataplatform.org) • All Standard Apache Open Source Components: HDFS, YARN, MapReduce, Ambari, HBase, Spark, Flume, Hive, Pig, Sqoop, HCatalog, Solr/Lucene
  • 22. Big SQL At a Glance • Data shared with Hadoop ecosystem • Comprehensive file format support • Superior enablement of IBM software • Enhanced by Third Party software • Modern MPP runtime • Powerful SQL query rewriter • Cost based optimizer • Optimized for concurrent user throughput • Distributed requests to multiple data sources within a single SQL statement • Main data sources supported: DB2, Teradata, Oracle, Netezza, MS SQL Server, Informix • Advanced security/auditing • Resource and workload management • Self tuning memory management • Comprehensive monitoring • Comprehensive SQL Support • IBM SQL PL compatibility • Extensive Analytic Functions
  • 23. New functionality in Big SQL in 2015 • Ambari Installation and configuration • HBase support • Rich Management User Interface • Data types ➢ New primitive data types support (decimal, char, varbinary) ➢ Complex data types support (array, struct, map) ➢ Enhancements to varchar and date • Platforms ➢ Power support ➢ RHEL 6.6 and 7.1 support • More Performance ➢ HDFS caching ➢ UDFs performance improvements ➢ ANALYZE enhancements (2.5X faster than 3.0 FP2) ➢ Native implementations of key Hive built-in functions • SQL Enhancements ➢ New analytic procedures ➢ New OLAP functions and aggregate functionality ➢ Offset support for limit and fetch first ➢ Ability to directly define and execute Hive User Defined Functions (UDFs) • Other Improvements ➢ Improved support for concurrent LOAD operations ➢ Support for importing with the Teradata Connector for Hadoop (TDH) ➢ Added SQL Server 2012 and DB2/Z support ➢ CMX compression now supported in the native I/O engine ➢ High Availability (FP2) • Technical Previews ➢ Yarn/Slider Integration ➢ Spark integration (FP2)
  • 24. Hadoop-DS: Performance Test update: Big SQL V4.1 vs. Spark SQL 1.5.1 @ 1 TB, single stream* (*Not an official TPC-DS Benchmark.)
  • 25. Big SQL runs more SQL out-of-box • Porting effort: Big SQL 4.1: 1 hour; Spark SQL 1.5.0: 3-4 weeks • Big SQL can execute all 99 queries with minimal porting effort • Single stream results: Big SQL was faster than Spark SQL on 76 / 99 queries • When Big SQL was slower, it was only slower by 1.6X on average • Query vs. query, Big SQL is on average 5.5X faster • Removing Top 5 / Bottom 5, Big SQL is 2.5X faster
  • 26. But, … what happens when you scale it? (More users: single stream vs. 4 concurrent streams; more data: 1 TB vs. 10 TB) • 1 TB, single stream: Big SQL was faster on 76 / 99 queries; Big SQL averaged 5.5X faster; removing Top / Bottom 5, Big SQL averaged 2.5X faster • 1 TB, 4 concurrent streams: Spark SQL FAILED on 3 queries; Big SQL was 4.4X faster* • 10 TB, single stream: Big SQL was faster on 80 / 99 queries; Spark SQL FAILED on 7 queries; Big SQL averaged 6.2X faster*; removing Top / Bottom 5, Big SQL averaged 4.6X faster • 10 TB, 4 concurrent streams: Big SQL elapsed time for workload was better than linear; Spark SQL could not complete the workload (numerous issues), partial results possible with only 2 concurrent streams • *Compares only queries that both Big SQL and Spark SQL could complete (benefits Spark SQL)
  • 27. What is the verdict? Use the right tool for the right job • Big SQL: Migrating existing workloads to Hadoop; Security; Many Concurrent Users; Best in-class Performance. Ideal tool for BI Data Analysts and production workloads • Spark SQL: Machine Learning; Simpler SQL; Good Performance. Ideal tool for Data Scientists and discovery
  • 28. Big SQL Roadmap 2015-2016 (timeline columns: 2015, 1H2016) • HBase support • Rich management user interface • Complex data types: Array, Struct, Map • Better Performance • New analytic procedures and OLAP functions • Offset support for limit and fetch first • Ability to directly define and execute Hive UDFs • Improve support for concurrent LOAD • Support importing w/ Teradata connector for Hadoop • New federated sources: SQL Server 2012 and DB2/Z, Oracle 12c • Power platform support • Head node High availability • Yarn/slider support • Performance improvements at large scale • Resiliency improvements • Spark integration/exploitation • Faster statistics collection • Cumulative statistics • Sampling statistics • HBase update/delete • Hive update/delete • User-defined aggregates • Oracle compatibility improvements • Netezza compatibility improvements • Integration with Ranger • BLU technology exploitation • Self collecting statistics • zLinux platform support
  • 29. We Value Your Feedback! Don’t forget to submit your Insight session and speaker feedback! Your feedback is very important to us – we use it to continually improve the conference. Access the Insight Conference Connect tool at insight2015survey.com to quickly submit your surveys from your smartphone, laptop or conference kiosk.
  • 30. © 2015 IBM Corporation Thank You nick.berg@seagate.com zubiri@ca.ibm.com