SQL on Hadoop
Paul Groom
RAM not Disk
create external script LM_PRODUCT_FORECAST environment rsint
receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
partition by PRODNO order by PRODNO, ROW_ID
sends ( R_OUTPUT varchar )
isolate partitions
script S'endofr( # Simple R script to run a linear fit on daily sales
prod1<-read.csv(file=file("stdin"), header=FALSE,row.names=1)
colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
dim1<-dim(prod1)
daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
daily1[,2]<-daily1[,2]/sum(daily1[,2])
basesales<-array(0,c(dim1[1],2))
basesales[,1]<-prod1$ID
basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
colnames(basesales)<-c("ID","BASESALES")
fit1=lm(BASESALES ~ ID,as.data.frame(basesales))
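The R fragment above takes the median daily sales per day of week, normalises those medians into weights, divides each day's sales by its weight, and fits a straight line to the result. A minimal Python sketch of the same steps follows. The function name `fit_base_sales` and the tuple layout are hypothetical, the fit is a hand-written ordinary least squares rather than R's `lm`, and the input is assumed to be the rows of a single PRODNO partition with at least two distinct row ids and non-zero medians.

```python
from collections import defaultdict
from statistics import median


def fit_base_sales(rows):
    """Fit base sales against row id for ONE product partition.

    rows: iterable of (dow, row_id, daily_sales) tuples (hypothetical layout).
    Returns (slope, intercept) of the least-squares line.
    """
    rows = list(rows)

    # Median daily sales per day of week (the aggregate(..., median) step).
    sales_by_dow = defaultdict(list)
    for dow, _, sales in rows:
        sales_by_dow[dow].append(sales)
    medians = {dow: median(values) for dow, values in sales_by_dow.items()}

    # Normalise the medians so the weights sum to 1.
    total = sum(medians.values())
    weights = {dow: m / total for dow, m in medians.items()}

    # Base sales: remove the day-of-week effect from each observation.
    xs = [row_id for _, row_id, _ in rows]
    ys = [sales / weights[dow] for dow, _, sales in rows]

    # Ordinary least squares for ys ~ xs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

The weights are looked up by day-of-week key here, so the sketch does not rely on the positional `DOW+1` indexing used in the R fragment.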
select Trans_Year, Num_Trans,
count(distinct Account_ID) Num_Accts,
sum(count( distinct Account_ID)) over (partition by Trans_Year
cast(sum(total_spend)/1000 as int) Total_Spend,
cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
rank() over (partition by Trans_Year order by count(distinct A
rank() over (partition by Trans_Year order by sum(total_spend)
from( select Account_ID,
Extract(Year from Effective_Date) Trans_Year,
count(Transaction_ID) Num_Trans,
select dept, sum(sales)
from sales_fact
where period between date '01-05-2006' and date '31-05-2006'
group by dept
having sum(sales) > 50000;
select sum(sales)
from sales_history
where year = 2006 and month = 5 and region=1;
select total_sales
from summary
where year = 2006 and month = 5 and region=1;
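The last two queries above ask for the same number in two ways: one aggregates detail rows on demand, the other fetches a value that was computed ahead of time. The sketch below reproduces that contrast with Python's built-in `sqlite3` module on an in-memory database. The table and column names follow the slides; the rows and the way `summary` is populated are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a detail table and a pre-built summary."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table sales_history (year int, month int, region int, sales real)"
    )
    # Illustrative rows only, not data from the presentation.
    con.executemany(
        "insert into sales_history values (?, ?, ?, ?)",
        [
            (2006, 5, 1, 100.0),
            (2006, 5, 1, 250.0),
            (2006, 5, 2, 75.0),
            (2006, 6, 1, 40.0),
        ],
    )
    # The summary table is the pre-aggregated answer, built in a batch step.
    con.execute(
        """create table summary as
           select year, month, region, sum(sales) as total_sales
           from sales_history
           group by year, month, region"""
    )
    return con


def aggregated_on_demand(con):
    """Dynamic aggregation: scan the detail rows at query time."""
    return con.execute(
        "select sum(sales) from sales_history "
        "where year = 2006 and month = 5 and region = 1"
    ).fetchone()[0]


def fetched_from_summary(con):
    """Simple fetch: read the value that was computed in advance."""
    return con.execute(
        "select total_sales from summary "
        "where year = 2006 and month = 5 and region = 1"
    ).fetchone()[0]
```

Both functions return the same total. The summary lookup only answers questions that were anticipated when the summary was built, while the on-demand aggregate can answer any grouping at the cost of scanning the detail rows.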
Behind the numbers
Machine learning algorithms
Dynamic Simulation
Statistical Analysis
Clustering
Behaviour modelling
Faster, deeper, insight
Reporting & BPM
Fraud detection
Dynamic Interaction
Technology/Automation
Analytical Complexity
Campaign Management
Time to influence
Reaction – what? – potential value
Action – opportunity - interaction
BI is becoming democratized
Innovate
Consolidate
I need….
Dynamic access
Drill unlimited
Data Discovery tools
Business [Intelligence] Desires
More timely
Lower latency
More granularity
More users interactions
Richer data model
Self service
“What percentage of business pertinent data
is in your Hadoop today?”
“How will you improve that percentage?”
Merv Adrian @merv
@ratesberger mindless #Hadumping is IT's equivalent of
fast food - and just as well-balanced. Forethought and
planning still matter. 8:43 PM - 12 Mar 13
Oliver Ratzesberger @ratesberger
Too much talk about #Hadoop being the end of ETL and
then turned into the corporate #BigData dumpster.
8:40 PM - 12 Mar 13
But…
Are you just Hadumping?
data
Hadumping
Data Lake
Enterprise Integration
Awareness &
Structured Access
Investigative effort
Planning
Value
Data
So…
engage with that data
…but Hadoop too slow
for interactive BI
…loss of train-of-thought
still
Business [Intelligence] Desires
in relation to Big Data
More timely
Lower latency
More granularity
More users interactions
Richer data model
Self service
Complex Analytics & Data Science
more math
…a lot more math
It’s all about getting work
done
Bottlenecks
Tasks evolving:
Used to be simple fetch of value
Then dynamic aggregation
Now complex algorithms!
Bottlenecks
Must get more
out of Hadoop!
Need better
SQL integration
SQL support
…degrees of
What about ad-hoc, on-demand now…not batch!
BI users want a lot more than just ANSI ’89 or ’92 support
What about ’99, 2003, 2006, 2008 and now 2011?
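One concrete example of what lies beyond SQL-92 is the window function, standardised in SQL:2003 and used by the account-ranking query earlier in the deck (`rank() over (partition by Trans_Year ...)`). The sketch below runs a small ranking query through Python's `sqlite3` module, which supports window functions when the bundled SQLite is version 3.25 or later. The table `spend`, its columns, and the function name are hypothetical stand-ins, not the schema from the slides.

```python
import sqlite3


def rank_accounts_by_spend(rows):
    """Rank accounts by spend within each year using a window function.

    rows: iterable of (trans_year, account_id, total_spend) tuples.
    Returns a list of (trans_year, account_id, spend_rank) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table spend (trans_year int, account_id int, total_spend real)"
    )
    con.executemany("insert into spend values (?, ?, ?)", rows)
    # rank() restarts at 1 for every trans_year partition.
    return con.execute(
        """select trans_year, account_id,
                  rank() over (partition by trans_year
                               order by total_spend desc) as spend_rank
           from spend
           order by trans_year, spend_rank, account_id"""
    ).fetchall()
```

An engine limited to SQL-92 would need a self-join or a correlated subquery to produce the same ranking, which is the kind of gap the slide is pointing at.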
SQL performance
…degrees of
Are you thinking about lots of these?
When you should be thinking about lots of these?
Problem
RAM
Let’s talk about: Flash is not RAM
Let’s talk about: in-memory vs cache
In-memory misunderstood
DRAM: Dynamic Random Access Memory
select count(*) from T1;
mov ebx, base(T1)
mov ecx, num
top:
mov eax, const
cmp eax, *ebx
jne next
inc count
next:
add ebx, len(row)
loop ecx, top
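The pseudo-assembly above walks fixed-length rows held contiguously in memory, compares a value in each row against a constant, and increments a counter on a match. The Python sketch below mirrors that loop over a byte buffer. The two-integer row layout, the helper `pack_rows`, and the function names are assumptions made for illustration; the slide does not describe the actual row format.

```python
import struct

# Hypothetical fixed-length row: two little-endian 32-bit integers (key, value).
ROW = struct.Struct("<ii")


def pack_rows(rows):
    """Lay rows out back to back in one contiguous buffer, as they would sit in RAM."""
    return b"".join(ROW.pack(key, value) for key, value in rows)


def count_matches(buf, const):
    """Scan the buffer row by row and count rows whose key equals const."""
    count = 0
    offset = 0                      # plays the role of ebx (current row address)
    while offset < len(buf):        # loop until every row has been visited
        key, _ = ROW.unpack_from(buf, offset)
        if key == const:            # cmp / jne next
            count += 1              # inc count
        offset += ROW.size          # add ebx, len(row)
    return count
```

For example, `count_matches(pack_rows([(1, 10), (2, 20), (1, 30)]), 1)` visits three rows and counts two matches. The point of the slide carries over: when the rows are already in RAM, the scan is a handful of instructions per row with no I/O in the loop.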
Let’s talk about: scale-out vs scale-up
Larger RAM with few cores does not help
Scale-out with consistent
RAM-to-Core ratio
memory
13 We fetch rows back into an internal interpreter structure.
14 We drop the temporary table TT2.
15 We prepare the interpreter to execute another query.
16 We get values from a lookup table to prequalify the loading of
EDW_RESPD_EXPSR_QHR_FACT. This is performed by the following steps, up
to 'We fetch rows back into an internal interpreter structure'.
17 We create an empty temporary table TT3 in RAM which will be randomly
distributed.
18 We select rows from the replicated table EDW_SRVC_MKT_SEG_DIM(6490) with
local conditions applied. From these rows, a result set will be
generated containing 2 columns. The results will be inserted into the
randomly distributed temporary table TT3 in RAM only. Approximately 14
rows will be in the result set with an estimated cost of 0.011.
19 We select rows from the randomly distributed temporary table TT3. From
these rows, a result set will be generated containing 1 column. The
results will be prepared to be fetched by the interpreter.
Approximately 14 rows will be in the result set with an estimated cost
of 0.023.
20 We fetch rows back into an internal interpreter structure.
Optimize
Optimizer
Good News: The Price of RAM
[Chart: price of RAM on a log10 scale, 1987 to 2010]
DDR4
Greater throughput to feed more CPU cores
…and thus do more analysis
Pertinence comes through analytics;
Analytics comes through processing
…and not just occasional batch runs.
So leave no core idling – query from RAM
So remember in-memory is about lots of these?
Business Integration - Analytical Platform
Analytical Platform Layer
Near-line Storage (optional)
Application & Client Layer
All BI Tools, All OLAP Clients, Excel
Persistence Layer
Hadoop Clusters
Enterprise Data Warehouses
Legacy Systems
Kognitio Storage
Reporting
Cloud Storage
Building corporate information architecture
“Information Anywhere”:
Acquire all data
Structured Hadoop repository
In-memory analytical platform
Business Intelligence tools
Analytical tools
Functional SQL interconnects
Building blocks for information discovery and extraction
Epilogue
Inevitable commoditization
“vendors always commoditize
storage platforms …again and again”
In 2013 Kinetic hard drives first launched
Direct access over Ethernet
Direct object access via key value pairs
The HDFS versions followed a few years later
…now map-reduce going into firmware?
Innovate
Consolidate
connect
kognitio.com
kognitio.tel
kognitio.com/blog
twitter.com/kognitio
linkedin.com/companies/kognitio
tinyurl.com/kognitio
youtube.com/kognitio
contact
Michael Hiskey
VP, Marketing & Business Development
michael.hiskey@kognitio.com
Paul Groom
Chief Innovation Officer
paul.groom@kognitio.com
Steve Friedberg - press contact
MMI Communications
steve@mmicomm.com
Kognitio is an Exabyte Sponsor of Strata Hadoop World – see us at booth #409

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 

Sql on hadoop the secret presentation.3pptx

  • 1. SQL on Hadoop Paul Groom RAM not Disk
  • 2.
  • 3. create external script LM_PRODUCT_FORECAST environment rsint
       receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES
       partition by PRODNO order by PRODNO, ROW_ID
       sends ( R_OUTPUT varchar )
       isolate partitions
       script S'endofr( # Simple R script to run a linear fit on daily sales
       prod1<-read.csv(file=file("stdin"), header=FALSE,row.names=1)
       colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
       dim1<-dim(prod1)
       daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
       daily1[,2]<-daily1[,2]/sum(daily1[,2])
       basesales<-array(0,c(dim1[1],2))
       basesales[,1]<-prod1$ID
       basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
       colnames(basesales)<-c("ID","BASESALES")
       fit1=lm(BASESALES ~ ID,as.data.frame(basesales))

       select Trans_Year, Num_Trans,
       count(distinct Account_ID) Num_Accts,
       sum(count( distinct Account_ID)) over (partition by Trans_Year
       cast(sum(total_spend)/1000 as int) Total_Spend,
       cast(sum(total_spend)/1000 as int) / count(distinct Account_ID
       rank() over (partition by Trans_Year order by count(distinct A
       rank() over (partition by Trans_Year order by sum(total_spend)
       from( select Account_ID,
       Extract(Year from Effective_Date) Trans_Year,
       count(Transaction_ID) Num_Trans,

       select dept, sum(sales)
       from sales_fact
       where period between date '01-05-2006' and date '31-05-2006'
       group by dept
       having sum(sales) > 50000;

       select sum(sales)
       from sales_history
       where year = 2006 and month = 5 and region=1;

       select total_sales
       from summary
       where year = 2006 and month = 5 and region=1;

       Behind the numbers
  • 4. Machine learning algorithms Dynamic Simulation Statistical Analysis Clustering Behaviour modelling Faster, deeper, insight Reporting & BPM Fraud detection Dynamic Interaction Technology/Automation AnalyticalComplexity Campaign Management
  • 5.
  • 6.
  • 7. Time to influence Reaction – what? – potential value Action – opportunity - interaction BI is becoming democratized
  • 11. Business [Intelligence] Desires More timely Lower latency More granularity More users interactions Richer data model Self service
  • 12.
  • 13. “What percentage of business pertinent data is in your Hadoop today? How will you improve that percentage?”
  • 14.
  • 15. Merv Adrian @merv @ratesberger mindless #Hadumping is IT's equivalent of fast food - and just as well-balanced. Forethought and planning still matter. 8:43 PM - 12 Mar 13 Oliver Ratzesberger @ratesberger Too much talk about #Hadoop being the end of ETL and then turned into the corporate #BigData dumpster. 8:40 PM - 12 Mar 13 But… Are you just Hadumping? data
  • 16. Hadumping Data Lake Enterprise Integration Awareness & Structured Access Investigative effort Planning Value Data
  • 18. …but Hadoop too slow for interactive BI …loss of train-of-thought still
  • 19. Business [Intelligence] Desires in relation to Big Data More timely Lower latency More granularity More users interactions Richer data model Self service
  • 20. Complex Analytics & Data Science more math …a lot more math
  • 21. It’s all about getting work done Bottlenecks Used to be simple fetch of value Tasks evolving: Then dynamic aggregation Now complex algorithms! Bottlenecks
  • 22. Must get more out of Hadoop! Need better SQL integration
  • 23. SQL support …degrees of What about ad-hoc, on-demand now…not batch! BI Users want a lot more than just ANSI ‘89 or ’92 support What about ‘99, 2003, 2006, 2008 and now 2011?
  • 25.
  • 26. Are you thinking about lots of these?
  • 27. When you should be thinking about lots of these?
  • 29. RAM
  • 30. Let’s talk about: Flash is not RAM
  • 31. Let’s talk about: in-memory V cache
  • 32. In-memory misunderstood DRAM Dynamic Random Access
        select count(*) from T1;
        mov ebx, base(T1)
        mov ecx, num
        top: mov eax, const
        cmp eax, *ebx
        jne next
        inc count
        next: add ebx, len(row)
        loop ecx, top
  • 33. Let’s talk about: scale-out V scale-up Larger RAM few cores does not help Scale-out with consistent RAM-to-Core ratio memory
  • 34. Optimize Optimizer
        13 We fetch rows back into an internal interpreter structure.
        14 We drop the temporary table TT2.
        15 We prepare the interpreter to execute another query.
        16 We get values from a lookup table to prequalify the loading of EDW_RESPD_EXPSR_QHR_FACT. This is performed by the following steps, up to 'We fetch rows back into an internal interpreter structure'.
        17 We create an empty temporary table TT3 in RAM which will be randomly distributed.
        18 We select rows from the replicated table EDW_SRVC_MKT_SEG_DIM(6490) with local conditions applied. From these rows, a result set will be generated containing 2 columns. The results will be inserted into the randomly distributed temporary table TT3 in RAM only. Approximately 14 rows will be in the result set with an estimated cost of 0.011.
        19 We select rows from the randomly distributed temporary table TT3. From these rows, a result set will be generated containing 1 column. The results will be prepared to be fetched by the interpreter. Approximately 14 rows will be in the result set with an estimated cost of 0.023.
        20 We fetch rows back into an internal interpreter structure.
  • 35. Good News: The Price of RAM [Chart: price of RAM (log10 scale) by year, 1987 to 2010]
  • 36. DDR4 Greater throughput to feed more CPU cores …and thus do more analysis
  • 37. Pertinence comes through analytics; Analytics comes through processing …and not just occasional batch runs. So leave no core idling – query from RAM
  • 38. So remember in-memory is about lots of these?
  • 39. Business Integration - Analytical Platform Analytical Platform Layer Near-line Storage (optional) Application & Client Layer All BI Tools All OLAP Clients Excel Persistence Layer Hadoop Clusters Enterprise Data Warehouses Legacy Systems Kognitio Storage Reporting Cloud Storage
  • 40. Building corporate information architecture “Information Anywhere”: Acquire all data Structured Hadoop repository In-memory analytical platform Business Intelligence tools Analytical tools Functional SQL interconnects Building blocks for information discovery and extraction
  • 43. “vendors always commoditize storage platforms …again and again” In 2013 Kinetic hard drives first launched Direct access over Ethernet Direct object access via key value pairs The HDFS versions followed a few years later …now map-reduce going into firmware?
  • 45. connect kognitio.com kognitio.tel kognitio.com/blog twitter.com/kognitio linkedin.com/companies/kognitio tinyurl.com/kognitio youtube.com/kognitio contact Michael Hiskey VP, Marketing & Business Development michael.hiskey@kognitio.com Paul Groom Chief Innovation Officer paul.groom@kognitio.com Steve Friedberg - press contact MMI Communications steve@mmicomm.com Kognitio is an Exabyte Sponsor of Strata Hadoop World – see us at booth #409
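
Slide 32 sketches `select count(*) from T1` as a tight loop over rows already held in RAM: step a pointer forward by the row length, compare, count. The same idea can be sketched in Python. This is only an illustration under stated assumptions: the row layout (`ROW_FORMAT`), the helper names and the compared field are hypothetical, since the slide does not specify them.

```python
import struct

# Hypothetical fixed-width row layout (not from the slide):
# a 4-byte integer id followed by a 1-byte flag, no padding.
ROW_FORMAT = "<iB"
ROW_LEN = struct.calcsize(ROW_FORMAT)


def build_table(rows):
    """Pack (id, flag) tuples into one contiguous in-memory buffer."""
    return b"".join(struct.pack(ROW_FORMAT, rid, flag) for rid, flag in rows)


def count_matching(table, const):
    """Walk the buffer row by row and count rows whose flag equals const.

    Mirrors the slide's loop: `offset` plays the role of ebx,
    the loop counter plays the role of ecx.
    """
    view = memoryview(table)
    count = 0
    offset = 0
    for _ in range(len(table) // ROW_LEN):
        _, flag = struct.unpack_from(ROW_FORMAT, view, offset)
        if flag == const:          # cmp / jne next
            count += 1             # inc count
        offset += ROW_LEN          # add ebx, len(row)
    return count


if __name__ == "__main__":
    t1 = build_table([(1, 0), (2, 1), (3, 1)])
    print(count_matching(t1, 1))
```

There is no cache lookup and no I/O request anywhere in the loop; every row is reached by plain address arithmetic, which is the point the slide makes about in-memory processing.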

Editor's Notes

  1. Note: 2 click build RAM - misunderstood In an industry hooked on a cute little yellow elephant – let’s establish a mental placeholder for the changes to come Big ram with attitude!
  2. Note: 1 click build If you have a trickle of data – time to leave the room Is this your reality – a huge flow of data?
  3. Note: 1-Click Build Is this increasing complexity of query your problem? BI mostly focuses (sells) on presentation – Graphics, pictures, Visualisation BUT behind the scenes a lot of heavy lifting has to be done This workload has changed over time from the simple to complex
  4. No Build Do your users aspire to more than just simple reports? Richer more complex low latency analytics Lots of applications utilise SQL to get their data and run complex queries Evaluating, clustering, Scoring – on the fly No longer background low frequency but foreground high frequency Machine learning – fraud detection/gaming Web Analytics – Dynamic content/bid management Modelling – traditional clustering/behavioural for marketing/product development/resource optimisation Investigative Reporting (Dashboards and reports with granular data access) Data Model
  5. Note: 2 click build Is this the scale of user community? - Lucky you – I’d watch the one on the left – looks like trouble Or is reality more like this – lots of users Or more like this bunch – they think it should all be one click away…why…
  6. Have your users been subtly brainwashed by this Innocuous high performance little box?
  7. Note: no build But most importantly – is there a time imperative? Time to deliver Latency in process SLAs to meet Volumes to support in operational windows - time ‘needed’ to influence – reaction - what - the time ‘now ‘to influence – action – opportunity Two contexts - time to influence peers and managers - time to influence customers
  8. Note: Progressive build, then 1 click change Exciting times Industry cyclic behaviour – definitely in the innovate phase at present – lots of tech disruption Lightbulb – innovation - Europe only allows low energy bulbs JOKE: “How many Hadoop engineers does it take to change a lightbulb?” None, as there are at least 2 other redundant light bulbs and we’ll add a new section of ceiling if you need more light Castle good icon for consolidation But sand castles are the only castles built these days! Same plight for Data warehouses with Hadoop disruption
  9. Note: 2 click build OK lets talk about Jeff he has to innovate. We hear at conferences about hoodies and suits – I don’t see many of either in this room. Jeff is a data person – head of BI, head of analytics, CIO – pick a title but data dominates his thoughts, in fact getting value from data dominates his thoughts. Jeff has a suit in his life – Jeff I want to improve sales, grow revenue, make things more efficient, make our customer happier
  10. Note: no build Common factor SQL Lots of great tools – Note these are BUSINESS tools Not just HADOOP tools – used ACROSS the business Business tools rather than Hadoop tools – existing investment Plugs into traditional DB, DW etc. NOTE That: Platfora, Datameer, Hadapt are only Hadoop centric. Visokio – omniscope Datawatch - panopticon New players like Domo, changing players like Alteryx
  11. Note: 1 click build of text It’s rarely about more charts, more colours, more report styles Lower latency – speed of access to new data - real time access More timely also ‘faster’ where’s the value – in the data and in the access Build and they will come – it’s more about interactions per user than raw users (concurrency debate)
  12. Note: no Build Enter Hadoop – takes on the big data challenge Introduces a new economic model
  13. Note: no Build If you are at this event you own, want to own or have been told to own a Hadoop implementation!
  14. Note: no Build What is pertinence – lots of synonyms http://thesaurus.com/browse/pertinent
  15. Note: Auto build (no click) Grabbing, holding Planning – to drive value to improve pertinence!
  16. What is the difference between hadumping and data lake – serenity and desirability See the ripple that’s the business sticking a toe into the lake! Enterprise integration is the goal – remember pertinence – but not just for the few, for the masses! It’s only meaningful if value is derived
  17. Note: No build, next slide swipes over “Investigative effort” – requires action to engage with data
  18. 1-Click Build Sorry – even with 2.0 and YARN – there is a long way to go Train-of-thought, drag-and-drop, google effect Remember the rise of data discovery Fine for big trawls Not good for low latency iterations, high frequency access There, I have dared to say it! Does not accelerate BI quite in the way business was sold by the EDW Loss of “interactivity” A decade of being sold train-of-thought Hadoop - Not hands on, not desktop, not agile
  19. Note: Auto Build So a quick check point – where are we More timely – no – too much effort to work out what to do? Batch processing gets in the way of interactive access Self-serve if you are knowledgeable enough Winning in some areas but not in all
  20. Note: 1 click progressive build Remember the point about innovation – well BI is being rapidly pushed/dragged into Complex analytics and Data Science
  21. Note: 1 click build What the business cares about is getting work done DW is now a bottleneck – its rigour and model get in the way! They really don’t care about how it is stored or where it is stored! It’s not about raw individual speed, it’s about throughput Address the bottlenecks Too many vendors play games that just shift the bottleneck
  22. Note: 2 click Build Back to Jeff – ready to swim in the data lake – the value is in there somewhere Jeff wants to exploit existing business software stacks not rebuild from scratch
  23. Note: 1 click progressive build SQL is so old – no trendy mascot or logo Utterly embedded
  24. Note: No Build NASA Juno space probe that will study Jupiter Recent Earth slingshot made it the fastest man-made object ever - 25 miles/sec
  25. Note: no build So in the hallowed computer halls – all that latent power [Google NC Data Center]
  26. Note: No build – transition to next That’s just your data dumpster – the store, its passive Stop thinking storage and start think analyzing
  27. Note: no build Confusion about in-memory – it’s cores, dummy! CPUs do the work – they can do continuous work if fed quickly enough They reshape data, filter data, summarize and compute They help find and shape pertinent data Good parallelism
  28. Note: no build Too far apart CPU is hungry CPUs are available en masse so get them working on the compute requirements Processor barely idling waiting for data What sits between them
  29. Note: 1 Click build, delayed overlay Ram picture then overlay Dell PowerEdge R620 rack server With lots of RAM
  30. Note: 2 click build SSDs great for Random I/O not so good for sequential scanning Still page based access and disk controller access mechanism
  31. Note: 1 click build ***Key Point ***** Another misunderstood word! Cache is optimistic – it’s B+ trying hard Not deterministic Still requires a lot of code to say “is data in cache, copy to main memory, then use”
  32. Note: 2 click build ***Key Point ***** Even analyst community has taken time to catch-up No code to check cache or request I/O – just code to access data That’s in-memory processing
  33. Note: 2 click build ***Key Point ***** SMP and NUMA - Noooooooo! - HP Kraken - to help SAP It’s not about making huge RAM - need to scale controllers and CPUs Scale out MPP in-memory is hard! Same message as Hadoop platform itself - scale out on commodity platform - others struggle
  34. Note: 1 click build Let’s revisit cash quickly – price of RAM has plummeted
  35. Note: 1 click build Innovation – faster – but like the bulb lower energy cost! DDR4 now in mass production Faster clock frequencies and data transfer rates (2133–4266 MT/s compared to DDR3’s 800 to 2133 MT/s) 2,667 Mb/s 16GB double data rate-4 (DDR4), registered dual inline memory modules (RDIMMs), which at the outset are designed for use in enterprise class servers, and it’s expected to reach twice the current 1,600Mbps throughput of DDR3 - plus lower power (30-40% reduction)
  36. Note: no Build
  37. Note: no build Confusion about in-memory – it’s cores, dummy! CPUs do the work – they can do continuous work if fed quickly enough They reshape data, filter data, summarize and compute They help find and shape pertinent data Good parallelism
  38. Note: no Build SQL on Hadoop and on DW and on cloud Must also be inclusive Hadoop is not the only store of data Don’t forget the cloud Don’t forget the DW and surrounding marts Don’t forget the operational systems
  39. Note: 1 Click Build Jeff – will be happy – fewer headaches No plethora of components and tech - plug and play - hey just like his SQL database Focus on the data and data quality And information extraction – data pertinence
  40. Remember old TV programmes - Epilogue
  41. Note: No Build Where Hadoop is headed The openness and accessibility. It already runs on a commodity platform! Every refinement in functionality and provisioning makes Hadoop more commoditized. Only a few major suppliers but must operate to the open standards of functionality. Every program that does code generation eliminates need for programmers. No one has the Oracle or Teradata market capture and licence model to make a fortune.
  42. Notes: 2 click build Industry already pushing down into components Seagate – Terascale drives – functional network device Data centric access – key store commoditised platform What about AWS? – cloud based commoditization http://forums.theregister.co.uk/forum/1/2013/10/23/seagate_terascale_is_first_kinetic_drive/
  43. Note: no build Industry cyclic behaviour will soon cycle back to Consolidation Rationalise computing real-estate, consolidate applications and services, Hadoop is exciting now but it’s eclectic and fiddly which requires knowledge and skill to traverse great for programmers, not so great for business Every step forward is a step towards commoditization Hadoop is not the “be all and end all” lots of other data platforms
  44. Kognitio
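
Notes 31 and 32 contrast a cache (check whether the data is there, fetch and copy it if not, then use it) with in-memory processing (just access the data). A minimal sketch of that difference follows. The class names, the `slow_store` dict standing in for disk, and the simple first-in-first-out eviction are all assumptions made for illustration; they do not describe any particular product.

```python
class CachedTable:
    """Cache path: every read checks the cache and may fall back to storage."""

    def __init__(self, slow_store, capacity):
        self.slow_store = slow_store   # hypothetical stand-in for disk
        self.capacity = capacity
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:                       # is the data in cache?
            self.misses += 1
            if len(self.cache) >= self.capacity:
                # Evict the oldest inserted entry (dicts keep insertion order).
                self.cache.pop(next(iter(self.cache)))
            self.cache[key] = self.slow_store[key]      # copy to main memory
        return self.cache[key]                          # then use


class InMemoryTable:
    """In-memory path: all rows are loaded once, reads are plain lookups."""

    def __init__(self, slow_store):
        self.rows = dict(slow_store)

    def get(self, key):
        return self.rows[key]          # no cache check, no I/O request


if __name__ == "__main__":
    store = {i: i * 10 for i in range(4)}
    cached = CachedTable(store, capacity=2)
    for k in (0, 1, 2, 0):
        cached.get(k)
    print("cache misses:", cached.misses)
    print("in-memory read:", InMemoryTable(store).get(3))
```

With a working set larger than the cache, the cached path keeps falling back to storage, so its cost depends on the access pattern; the in-memory path costs the same on every read, which is the "not deterministic" point made in note 31.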