BI, Hive or Big Data Analytics?

© 2012 Datameer, Inc. All rights reserved.
View the Recording of these Slides
You can view the full recording of this on-demand webinar with slides at:
http://info.datameer.com/Slideshare-BI-HiveBig-Data-Analytics.html

About our Speaker
Todd Nash

Todd is a founding Principal at CBIG Consulting, a professional services firm that helps clients leverage their data assets to produce timely, effective business strategies and tactical decisions. Todd leads CBIG's eastern region consulting practice in the development, implementation, and execution of business intelligence and Big Data methodologies, cloud-based analytics strategies, and complex data warehousing solutions.

Todd graduated from Clemson University with a Bachelor of Science degree in Management Information Systems.

About our Speaker
Eduardo Rosas

Eduardo Rosas is Vice President of Services at Datameer and brings over 12 years of software implementation experience to the table.

In this role, Eduardo is focused on delivering a repeatable, high-quality level of service and support to help clients achieve their goals.

Prior to Datameer, Eduardo spent 11 years at Trintech, where he focused on managing a team of Technical Consultants and implementing global Java web-based solutions. Eduardo is originally from San Jose, CA and graduated from Santa Clara University.

Agenda
•  Problem Statement – Business & Technical
•  POC Technical Solution – High-level and Detailed
•  Results
•  Lessons Learned

Copyright © 2013 CBIG Consulting

PROBLEM STATEMENT

Business Problem Statement
A Real Estate .com business makes money in two ways:
1.  Property Owners advertise properties
2.  Ancillary businesses advertise services
This site needs the analytics to show customers the return on their investment

[Diagram labels: SEARCH, IMPRESSIONS, CLICK-THRU, LEAD]

Breadth:
•  Searches to Impressions to Click Thru to Leads
•  Website optimization
•  Customer optimization & upgrades
•  Market optimization
Depth:
•  Can the search criteria be optimized?
•  Conversion of impressions based on refinement of search?
•  Which product mix of impressions get the greatest click thru
•  What is the impact of amenities to leads?
•  What additional features get used to convert to leads?
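
The search-to-lead funnel named above can be illustrated with a small calculation. This is only a sketch: the stage names follow the slide, but the function name, the dictionary layout, and every count are hypothetical and are not figures from the POC.

```python
from typing import Dict, List, Tuple

# Funnel stages in the order named on the slide.
STAGES: List[str] = ["search", "impression", "click_thru", "lead"]


def funnel_conversion(counts: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for upstream, downstream in zip(STAGES, STAGES[1:]):
        base = counts[upstream]
        # Guard against an empty upstream stage instead of dividing by zero.
        rate = counts[downstream] / base if base else 0.0
        rates.append((upstream, downstream, rate))
    return rates


if __name__ == "__main__":
    # Hypothetical counts for one advertiser; not data from the POC.
    example = {"search": 10_000, "impression": 40_000,
               "click_thru": 2_000, "lead": 100}
    for upstream, downstream, rate in funnel_conversion(example):
        print(f"{upstream} -> {downstream}: {rate:.2%}")
```

A rate above 100% between searches and impressions is expected, since one search can produce several impressions.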
  
Technical Problem Statement
[Diagram labels: Source (Web Activity, Master Data, Lookup Data, Search Service); Data Movement; Search & Impression ODS; Search & Impression EDW; Search Cube; Sales Cube; Marketing Cube]
•  Search & Impressions volume too large to build cube and provide deep analytics
•  This has a negative impact on all reporting and performance of the entire system
•  The business is unable to determine the value of all the data; has requests to add more
•  Evaluating options to increase environment or look for alternatives
•  POC to evaluate how Hadoop, Amazon cloud and Datameer could support challenge

Problem Statement – Success Criteria
Objective:
To prove that the Hadoop architecture is an excellent option for the business to interact with large data and find datasets and relationships that require deeper analytics.

Original Scope & Goals:
•  Bring in one year's worth of data from 6 tables into the Amazon Cloud Hadoop environment.
•  IT resources will be able to extract the data from these tables and load them into .CSV files.
•  The success criteria for this stream of work will be:
   ✓  Amazon Hadoop cloud environment & account is set up.
   ✓  Search Analytics data loaded into the Amazon Hadoop cloud
   ✓  Business is able to execute and perform analytics on Search Analytics data that is stored in Hadoop with acceptable performance.
   ✓  Gain analytical insights with new solution

POC TECHNICAL SOLUTION

POC Technical Solution – High Level
[Diagram labels: Web Activity History; Lookup Data; AWS S3; AWS EMR (Hadoop); Datameer (Data Discovery); Web Portal (Widget Based UI); Amazon Web Services (Cloud)]

POC Technical Solution – Detailed
[Diagram labels:
 Amazon Cloud; Data Movement; S3; Hadoop
 Data sets: AllLeads, WebClicks, Web Impressions, WebLead, WebSearch, WebVisit, Generic Activity, LR Apts IMPS, Other Leads, EmailLeads, Phone Leads, Site, SubSite, Lead Type, PageType, Event Type, PhoneType, Email Type, Container Type, Affiliate, Product ID, Property List, SearchType
 Data Workbooks: AllLeads, WebClicks, Web Impressions, WebLeads, WebSearch, WebVisits
 Use Case Workbooks: Use Case 1, Use Case 2
 Additional Data Workbooks; Additional Use Cases]

RESULTS

POC Results
Success Criteria: Hadoop, Amazon, Datameer environment setup
Results: Environment setup within the 1st couple of days

Success Criteria: Able to load 1 year's worth of data – nearly 1.3 TB
Results: Loaded significantly more data than planned for more robust analytics

Success Criteria: Business able to execute and perform analytics
Results: Business leveraged Datameer to execute use cases; executed ~20 additional without IT help

Success Criteria: Users provided acceptable performance
Results: Queries executed to completion. Some took seconds, some took minutes and some required overnight.

Success Criteria: Gain new insights
Results: 1st time able to run these analytics. Found patterns and relationships contrary to assumptions. Will be updating service offerings & marketing plans because of POC

LESSONS LEARNED

Lessons Learned
GETTING DATA TO HADOOP
•  Hadoop is file structure
   –  Finding the right delimiter
•  Integrating data
   –  Requires ETL
•  Data cleansing can be big
   –  Several iterations required
CLOUD
•  Cloud flexible
   –  Easy setup and scaling
•  Performance & sizing
   –  Sizing the cloud is challenging
•  Cost for performance
   –  TBs with support becomes costly
HADOOP
•  Hadoop is batch
   –  Answers one thing at a time
•  Analytics
   –  Move to database w/ tools
PEOPLE
•  Remember change mgmt
   –  Education new methods & tools
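
The "finding the right delimiter" lesson can be made concrete with a short sketch. It assumes the extract is available as rows of string fields before it is written to a flat file; the candidate list, function names, and sample rows are all hypothetical. The idea is simply to pick a delimiter that never occurs inside any field, so that files in Hadoop split into the expected columns.

```python
from typing import Iterable, List, Optional, Sequence

# Candidate delimiters, tried in order. "\x01" (Ctrl-A) is the default
# field delimiter Hive uses for delimited text tables.
CANDIDATES: Sequence[str] = [",", "\t", "|", "\x01"]


def pick_delimiter(rows: Iterable[Sequence[str]]) -> Optional[str]:
    """Return the first candidate that appears in no field, else None."""
    materialized: List[Sequence[str]] = list(rows)
    for candidate in CANDIDATES:
        if not any(candidate in field for row in materialized for field in row):
            return candidate
    return None


def write_delimited(rows: Iterable[Sequence[str]], delimiter: str) -> str:
    """Join fields with the delimiter, refusing values that would break a row."""
    lines = []
    for row in rows:
        for field in row:
            if delimiter in field or "\n" in field:
                raise ValueError(f"field {field!r} would corrupt the row layout")
        lines.append(delimiter.join(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical property rows: commas and pipes occur inside the values.
    sample = [["1001", "12 Main St, Apt 4", "pool|gym"],
              ["1002", "9 Oak Ave", "parking"]]
    chosen = pick_delimiter(sample)
    print(repr(chosen))
    print(write_delimited(sample, chosen))
```

On real extracts this check would run over the full data set (or a large sample) before the export format is fixed, which is one reason several iterations are usually needed.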
  

So what about open source tools like Hive?

Hive…

Goal of Hive
•  Eases the complexity of writing MapReduce jobs by giving the technical user a set of tools they are already more familiar with, via SQL

Who can use Hive?
•  SQL users can pick up HQL basics fairly quickly

Prerequisites
•  Must have data in Hadoop
•  The data must be CLEAN
•  Schema must be applied to the data by creating a Hive table
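
As an illustration of the last prerequisite, the sketch below builds a Hive table definition from a column list instead of typing it by hand. The helper name, the table and column names, and the S3 path are hypothetical; the emitted statement uses the standard `CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED` form for delimited text files. Generating the column list avoids two easy mistakes in long hand-written definitions: a trailing comma before the closing parenthesis and a column declared twice.

```python
from typing import List, Tuple


def hive_external_table_ddl(table: str,
                            columns: List[Tuple[str, str]],
                            location: str,
                            delimiter: str = "\\t") -> str:
    """Build a CREATE EXTERNAL TABLE statement for delimited text files."""
    # Hive column names are case-insensitive, so compare them lowercased.
    names = [name.lower() for name, _ in columns]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate column names: {duplicates}")
    # Joining with ",\n" guarantees there is no comma after the last column.
    column_lines = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical table, columns, and bucket, for illustration only.
    print(hive_external_table_ddl(
        "web_search",
        [("search_id", "bigint"), ("city", "string"), ("dt", "string")],
        "s3://example-bucket/web_search/",
    ))
```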

What is Hive really good at?
•  Hive is good in environments where clean, prepared data that doesn't change often is already in Hadoop
•  Resembles a language that many IT folks are already familiar with
•  Hive can help a user trying to identify a reporting trend
•  User-defined functions (UDFs) can be used to reuse logic

Some troubles
<< - Start of Hive script ->>
--Create a TEMP Housing Table

CREATE EXTERNAL TABLE MY_TABLE(
  num_ods string,
  num_bus_id int,
  um_ctry_cd int,
  prod_id string,
  rng_svc_cd string,
  rng6 string,
  bin string,
  bin_bus_id_enr int,
  bin_ctry_cd int,
  cd_fmt_a_2 string,
  cd_enr string,
  rsn_us_ind string,
  x_bus_id int,
  flg_enr string,
  my_dt string,
  user_id string,
  mthd_cd_enr string,
  tran_seq_id string,
  cd_enr2 string,
  us_amt string,
  moto_cd string,
  fee_curr_cd int,
  fee_desc_num string,
  fee_sgn_amt string,
  us_fee_sgn_amt string,
  mkt_spec string,
  catg_cd int,
  city_enr string,
  ctry_cd_enr int,
  dba_id int,
  nm_dscrptr string,
  geo_id int,
  geo_phone_num string,
  tier_cd string,
  msa string,
  nrmlzd_id int,
  pstl_cd string,
  b_st_cd_enr string,
  b_store_id string,
  b_vrfcn_val string,
  ntwrk_id int,
  site string,
  entry_mode_cd string,
  term_cpbty_cd string,
  sub_typ_cd string,
  dt string,
  id_num_enr int,
  prod_num int,
  prod_ppd_sub_typ_cd string,
  prod_typ_cd_enr string,
  prod_typ_ext_enr string,
  promo_cd string,
  promo_typ string,
  rwds_pgm_id_enr string,
  tran_cd string,
  tran_gmt_dt string,
  tran_gmt_tm string,
  tran_id string,
  unfrzn_acct_num_bus_id_enr int,
  unfrzn_arn_bin_bus_id_enr int,
  usage_cd_enr string,
  Other_amt string,
  curr_cd int,
  dt string
)COMMENT "THIS IS MY TEMP TABLE";

--INSERT DATA INTO MY_TABLE

INSERT OVERWRITE TABLE MY_TABLE
select * ,
  SUM(us_tran_amt) AS SALES_VOL,
  SUM(US_FEE_SGN_AMT) AS US_FEE_SGN_AMT,
  COUNT(*) AS TRAN_COUNT,
  MIN(ACTIVE_DT) AS FIRST_ACTIVE_DT,
  MAX(SEARCH_DT) AS LAST_SEARCH_DT,
  MAX(customer_biz_id) AS customer_biz_id,
  MAX(PGM_ID_ENR) AS PGM_ID_ENR,
  MAX(CUST_PROD_ID) AS CUST_PROD_ID,
  MAX(POD_ID_NUM_ENR) AS POD_ID_NUM_ENR,
  MAX(PROD_TYPE) AS PROD_TYPE,
  MAX(SUB_TYPE) AS SUB_TYPE,
  1 as ID
from MY_TABLE
WHERE dt like '2012%'
GROUP BY
  customer_biz_id,
  PGM_ID_ENR,
  CUST_PROD_ID,
  eci_moto_cd,
  catg_cd,
  city_enr,
  ctry_cd_enr,
  pstl_cd,
  pod_id,
  prod_num,
  SUB_TYPE;

--CREATE TEMP LOOKUP TABLE

CREATE EXTERNAL TABLE TEMP_LOOKUP(
  acct_num bigint,
  acct_sta_cd string,
  acct_zip_cd string,
  rwrd_pgm_id string,
  pgm_ref_cd string,
  acct_prod_id string,
  bus_id int,
  bin int,
  status string,
  pgm_eff_dt string,
  dt string
)COMMENT "THIS IS TEMP LOOKUP TABLE";

--INSERT DATA INTO IT

INSERT OVERWRITE TABLE MY_LOOKUP
SELECT *, 1 as cmf_ind
FROM LOOKUP
WHERE DT = '201211';

--Do a Full Outer Join

SELECT * FROM MY_TABLE mt
FULL OUTER JOIN
MY_LOOKUP ml
ON mt.member_id = ml.member_id;

•  No way to get data into Hadoop
•  No data validation / may throw data away
•  Security
•  Sharing code across teams is a challenge
•  No visualization
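
The "no data validation" point can be illustrated with a pre-load check. Hive applies the table schema when the data is read, and a field that cannot be converted to its declared type typically comes back as NULL rather than raising an error. The sketch below, with a hypothetical two-column schema, hypothetical function names, and made-up rows, separates rows that match the schema from rows that would silently lose values, so the rejects can be inspected before loading.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Strict integer pattern: optional sign followed by ASCII digits only.
INTEGER = re.compile(r"-?[0-9]+")


def is_valid(value: str, hive_type: str) -> bool:
    """Check one field against a (simplified) Hive column type."""
    if hive_type in ("int", "bigint"):
        return INTEGER.fullmatch(value) is not None
    # Other types are treated as free-form strings in this sketch.
    return True


def split_valid(lines: Iterable[str],
                schema: Sequence[Tuple[str, str]],
                delimiter: str = "\t") -> Tuple[List[str], List[str]]:
    """Return (good_lines, rejected_lines) for the given schema."""
    good: List[str] = []
    rejected: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        ok = len(fields) == len(schema) and all(
            is_valid(value, hive_type)
            for value, (_, hive_type) in zip(fields, schema)
        )
        (good if ok else rejected).append(line)
    return good, rejected


if __name__ == "__main__":
    # Hypothetical schema and rows, for illustration only.
    schema = [("catg_cd", "int"), ("city_enr", "string")]
    rows = ["12\tAustin", "N/A\tBoston", "7"]
    good, rejected = split_valid(rows, schema)
    print(len(good), "good;", len(rejected), "rejected")
```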

… but it's free, right?

"Time to create Hive":
•  Any machine-generated data (or anything semi/unstructured) must first be parsed by writing MapReduce or Pig/Python programs. Time-to-market disadvantage.
•  Table definition is a manual effort (though this can be made easier by 3rd party tools).

"Time to maintain Hive":
•  Hive data models (tables) are most likely static, shared objects maintained and controlled by a few people who own the schema
•  Hive is also more of a black box for new employees coming in (so employee churn creates more maintenance effort).

Cost to implement Hive:
•  This is mostly down to the human capital (expensive developers), and don't forget the prerequisite cost of implementing the data ingestion stage of the pipeline (populating the warehouse by writing MapReduce programs or other programs parsing/loading the data).
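
To give a sense of the parsing work described above, here is a minimal sketch of the kind of program that has to exist before Hive can query machine-generated data. The log line format (`timestamp event key=value ...`), the output columns, and the function names are all assumptions made for this example; real web activity logs will differ and usually need many more special cases.

```python
import re
from typing import Dict, Optional

# Assumed line shape: "<timestamp> <event> key=value key=value ..."
LINE_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<event>\w+)(?:\s+(?P<params>.*))?$")

# Columns of the delimited output that a Hive table would be defined over.
OUTPUT_COLUMNS = ["ts", "event", "city", "beds"]


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one raw log line into a dict, or None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    record = {"ts": match.group("ts"), "event": match.group("event")}
    for pair in (match.group("params") or "").split():
        key, separator, value = pair.partition("=")
        if separator:  # ignore tokens that are not key=value pairs
            record[key] = value
    return record


def to_row(record: Dict[str, str], delimiter: str = "\t") -> str:
    """Emit the record as one delimited row, with blanks for missing keys."""
    return delimiter.join(record.get(column, "") for column in OUTPUT_COLUMNS)


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    for raw in ["2012-11-05T10:15:00 search city=Austin beds=2",
                "2012-11-05T10:16:00 click",
                "garbage"]:
        parsed = parse_line(raw)
        print(to_row(parsed) if parsed else f"REJECTED: {raw}")
```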

Business decision

Do I train my engineers on a language, or eliminate the need for this by taking the problem directly to the business user?

So what would my Hive resource need to know?

•  Hive QL (a different dialect from ANSI standard SQL)
•  MapReduce TUNING parameters (to name a few):
   –  Data block size
   –  Number of mappers/reducers
   –  Compression at map output level; result compression; which codec to use
   –  io.sort.factor
•  Access to Hive is mainly done via command line interface
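
To show how two of these parameters interact, the sketch below estimates the number of map tasks from the input size and the split size. It rests on a simplifying assumption: the input is splittable and each split is one block, so the task count is roughly the input size divided by the block size. The real number depends on the input format, the number and size of files, and whether the compression codec is splittable (gzip, for example, is not). The function name is hypothetical.

```python
def estimate_map_tasks(total_bytes: int, split_bytes: int) -> int:
    """Rough map-task count: input size divided by split size, rounded up."""
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-total_bytes // split_bytes)


if __name__ == "__main__":
    MB = 1024 ** 2
    TB = 1024 ** 4
    # One terabyte of splittable input at two common block sizes.
    for block_mb in (64, 128):
        tasks = estimate_map_tasks(1 * TB, block_mb * MB)
        print(f"{block_mb} MB blocks -> about {tasks} map tasks")
```

Doubling the block size halves the estimated task count, which is why block size and the number of mappers are usually tuned together.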

How does Datameer do it differently?

Questions and Answers

Online Resources
•  Try Datameer: www.datameer.com
•  Follow us on Twitter @datameer

AWS May Webinar Series - Industry Trends and Best Practices for Cloud AdoptionAWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
AWS May Webinar Series - Industry Trends and Best Practices for Cloud Adoption
 
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
ATAGTR2017 Performance Testing and Non-Functional Testing Strategy for Big Da...
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data Solutions Executive Overview
Big Data Solutions Executive OverviewBig Data Solutions Executive Overview
Big Data Solutions Executive Overview
 

Plus de Datameer

Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data AnalyticsDatameer
 
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...Datameer
 
Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Datameer
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarDatameer
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarDatameer
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndDatameer
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Datameer
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Datameer
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?Datameer
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarDatameer
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisDatameer
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsDatameer
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Datameer
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopDatameer
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseDatameer
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on HadoopDatameer
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataDatameer
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerDatameer
 

Plus de Datameer (20)

Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2Datameer6 for prospects - june 2016_v2
Datameer6 for prospects - june 2016_v2
 
Extending BI with Big Data Analytics
Extending BI with Big Data AnalyticsExtending BI with Big Data Analytics
Extending BI with Big Data Analytics
 
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
The State of Big Data Adoption: A Glance at Top Industries Adopting Big Data ...
 
Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data Understand Your Customer Buying Journey with Big Data
Understand Your Customer Buying Journey with Big Data
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop Webinar
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics Webinar
 
Webinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-EndWebinar - Introducing Datameer 4.0: Visual, End-to-End
Webinar - Introducing Datameer 4.0: Visual, End-to-End
 
Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User Webinar - Big Data: Power to the User
Webinar - Big Data: Power to the User
 
Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?Why Use Hadoop for Big Data Analytics?
Why Use Hadoop for Big Data Analytics?
 
Why Use Hadoop?
Why Use Hadoop?Why Use Hadoop?
Why Use Hadoop?
 
Online Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics WebinarOnline Fraud Detection Using Big Data Analytics Webinar
Online Fraud Detection Using Big Data Analytics Webinar
 
Instant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of AnalysisInstant Visualizations in Every Step of Analysis
Instant Visualizations in Every Step of Analysis
 
Customer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data AnalyticsCustomer Case Studies of Self-Service Big Data Analytics
Customer Case Studies of Self-Service Big Data Analytics
 
Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Complement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & HadoopComplement Your Existing Data Warehouse with Big Data & Hadoop
Complement Your Existing Data Warehouse with Big Data & Hadoop
 
Lean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use CaseLean Production Meets Big Data: A Next Generation Use Case
Lean Production Meets Big Data: A Next Generation Use Case
 
The Economics of SQL on Hadoop
The Economics of SQL on HadoopThe Economics of SQL on Hadoop
The Economics of SQL on Hadoop
 
Top 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big DataTop 3 Considerations for Machine Learning on Big Data
Top 3 Considerations for Machine Learning on Big Data
 
Best Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by DatameerBest Practices for Big Data Analytics with Machine Learning by Datameer
Best Practices for Big Data Analytics with Machine Learning by Datameer
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 

Dernier (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 

BI, Hive or Big Data Analytics? Which is Best for Your Business

  • 1. BI, Hive or Big Data Analytics? © 2012 Datameer, Inc. All rights reserved.
  • 2. View the Recording of these Slides: You can view the full recording of this on-demand webinar with slides at: http://info.datameer.com/Slideshare-BI-HiveBig-Data-Analytics.html
  • 3. About our Speaker: Todd Nash
       Todd is a founding Principal at CBIG Consulting, a professional services firm that helps clients leverage their data assets to produce timely, effective business strategies and tactical decisions. Todd leads CBIG’s eastern region consulting practice in the development, implementation, and execution of business intelligence and Big Data methodologies, cloud-based analytics strategies, and complex data warehousing solutions.
       Todd graduated from Clemson University with a Bachelor of Science degree in Management Information Systems.
  • 4. About our Speaker: Eduardo Rosas
       Eduardo Rosas is Vice President of Services at Datameer and brings over 12 years of software implementation experience to the table.
       In this role, Eduardo is focused on delivering a repeatable, high-quality level of service and support to help clients achieve their goals.
       Prior to Datameer, Eduardo spent 11 years at Trintech, where he focused on managing a team of Technical Consultants and implementing global Java web-based solutions. Eduardo is originally from San Jose, CA and graduated from Santa Clara University.
  • 5. Agenda
       • Problem Statement - Business & Technical
       • POC Technical Solution - High-level and Detailed
       • Results
       • Lessons Learned
       Copyright © 2013 CBIG Consulting
  • 6. PROBLEM STATEMENT
  • 7. Business Problem Statement
       A real estate .com business makes money in two ways:
       1. Property owners advertise properties
       2. Ancillary businesses advertise services
       The site needs analytics that show customers the return on their investment.
       Funnel: SEARCH → IMPRESSIONS → CLICK-THRU → LEAD
       Breadth:
       • Searches to impressions to click-thrus to leads
       • Website optimization
       • Customer optimization & upgrades
       • Market optimization
       Depth:
       • Can the search criteria be optimized?
       • Conversion of impressions based on refinement of search?
       • Which product mix of impressions gets the greatest click-thru?
       • What is the impact of amenities on leads?
       • What additional features get used to convert to leads?
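The funnel on this slide can be made concrete with a small calculation. The following is a minimal Python sketch, assuming made-up stage counts (the deck does not publish real figures) and hypothetical function names; it only illustrates the step-by-step and end-to-end conversion rates the business is asking for.

```python
# Hypothetical stage counts for illustration only; not taken from the slides.
FUNNEL = [
    ("search", 100_000),
    ("impression", 40_000),
    ("click_thru", 4_000),
    ("lead", 200),
]


def stage_conversion(funnel):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        # Guard against an empty upstream stage to avoid dividing by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


def overall_conversion(funnel):
    """Return the share of first-stage events that reach the last stage."""
    first, last = funnel[0][1], funnel[-1][1]
    return last / first if first else 0.0


if __name__ == "__main__":
    for src, dst, rate in stage_conversion(FUNNEL):
        print(f"{src} -> {dst}: {rate:.2%}")
    print(f"overall: {overall_conversion(FUNNEL):.2%}")
```

Splitting the same calculation by search criteria, product mix or amenities would address the "depth" questions, at the cost of much larger intermediate data.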
  • 8. Technical Problem Statement
       [Diagram labels: Source (x4); Web Activity; Master Data; Lookup Data; Service Search; Data Movement; Search & Impression ODS; Search & Impression EDW; Search Cube; Sales Cube; Marketing Cube]
       • Search & Impressions volume is too large to build a cube and provide deep analytics
       • This has a negative impact on all reporting and on the performance of the entire system
       • The business is unable to determine the value of all the data, and has requests to add more
       • Evaluating options to expand the environment or look for alternatives
       • POC to evaluate how Hadoop, the Amazon cloud and Datameer could support the challenge
  • 9. Technical Problem Statement
       [Diagram labels: Source (x4); Web Activity; Master Data; Lookup Data; Service Search; Data Movement; Search & Impression ODS; Search EDW; Sales Cube; Marketing Cube]
       • Search & Impressions volume is too large to build a cube and provide deep analytics
       • This has a negative impact on all reporting and on the performance of the entire system
       • The business is unable to determine the value of all the data, and has requests to add more
       • Evaluating options to expand the environment or look for alternatives
       • POC to evaluate how Hadoop, the Amazon cloud and Datameer could support the challenge
  • 10. Problem Statement - Success Criteria
        Objective:
        To prove that the Hadoop architecture is an excellent option for the business to interact with large data and find datasets and relationships that require deeper analytics.
        Original Scope & Goals:
        • Bring in one year's worth of data from 6 tables into the Amazon Cloud Hadoop environment.
        • IT resources will be able to extract the data from these tables and load it into .CSV files.
        • The success criteria for this stream of work will be:
          ✓ Amazon Hadoop cloud environment & account are set up.
          ✓ Search Analytics data is loaded into the Amazon Hadoop cloud.
          ✓ Business is able to execute and perform analytics on Search Analytics data stored in Hadoop with acceptable performance.
          ✓ Gain analytical insights with the new solution.
  • 11. POC TECHNICAL SOLUTION
  • 12. POC Technical Solution - High Level
        [Diagram labels: Web Activity History; Lookup Data; AWS S3; AWS EMR (Hadoop); Datameer (Data Discovery); Web Portal (Widget-Based UI); Amazon Web Services (Cloud)]
  • 13. POC Technical Solution - Detailed
        [Diagram: source tables copied via Data Movement into the Amazon Cloud, first to S3 and then to Hadoop]
        Activity tables: AllLeads, WebClicks, Web Impressions, WebLead, WebSearch, WebVisit, Generic Activity, LR Apts IMPS, Other Leads, EmailLeads, Phone Leads
        Lookup tables: Site, SubSite, Lead Type, PageType, Event Type, PhoneType, Email Type, Container Type, Affiliate, Product ID, Property List, SearchType
        Data Workbooks: AllLeads, WebClicks, Web Impressions, WebLeads, WebSearch, WebVisits
        Use Case Workbooks: Use Case 1, Use Case 2
        Additional Data Workbooks
        Additional Use Cases
  • 14. RESULTS
  • 15. POC Results
        Success criterion: Hadoop, Amazon, Datameer environment setup
        Result: Environment set up within the first couple of days
        Success criterion: Able to load 1 year's worth of data, nearly 1.3 TB
        Result: Loaded significantly more data than planned, for more robust analytics
        Success criterion: Business able to execute and perform analytics
        Result: Business leveraged Datameer to execute use cases; executed ~20 additional ones without IT help
        Success criterion: Users provided with acceptable performance
        Result: Queries executed to completion. Some took seconds, some took minutes and some required an overnight run.
        Success criterion: Gain new insights
        Result: First time able to run these analytics. Found patterns and relationships contrary to assumptions. Will be updating service offerings & marketing plans because of the POC.
  • 16. LESSONS LEARNED
  • 17. Lessons Learned
        GETTING DATA TO HADOOP
        • Hadoop is a file structure: finding the right delimiter
        • Integrating data: requires ETL
        • Data cleansing can be big: several iterations required
        CLOUD
        • Cloud is flexible: easy setup and scaling
        • Performance & sizing: sizing the cloud is challenging
        • Cost for performance: TBs with support become costly
        HADOOP
        • Hadoop is batch: answers one thing at a time
        • Analytics: move to a database w/ tools
        PEOPLE
        • Remember change mgmt: education on new methods & tools
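The "finding the right delimiter" lesson can be illustrated with a short check run before files are exported. This is a minimal Python sketch, assuming hypothetical sample rows and a hypothetical candidate list; it picks the first delimiter that never occurs inside a field, so that a plain split on the Hadoop side cannot shift columns.

```python
# Candidate delimiters, tried in order. "\x01" (Ctrl-A) is Hive's default
# field delimiter for text files; the others are common export choices.
CANDIDATES = [",", "|", "\t", "\x01"]


def pick_delimiter(rows, candidates=CANDIDATES):
    """Return the first candidate that appears in no field value, or None."""
    for delim in candidates:
        if not any(delim in str(field) for row in rows for field in row):
            return delim
    return None


def to_lines(rows, delim):
    """Serialise rows as delimited text lines without quoting.

    Assumes fields contain no newlines; embedded newlines would need
    cleansing before export.
    """
    return [delim.join(str(field) for field in row) for row in rows]


if __name__ == "__main__":
    # Hypothetical rows: free-text fields already contain "," and "|".
    sample = [
        ["1001", "12 Main St, Apt 4", "pool|gym"],
        ["1002", "9 Oak Ave", "parking"],
    ]
    chosen = pick_delimiter(sample)
    print(repr(chosen))
    for line in to_lines(sample, chosen):
        print(repr(line))
```

On real extracts the check would need to scan the full data set rather than a sample, which is one reason the slide reports several iterations of cleansing.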
  • 18. So what about open source tools like Hive?
  • 19. Hive…
        Goal of Hive
        • Eases the complexity of writing MapReduce jobs by giving technical users a set of tools that is already familiar to them via SQL
        Who can use Hive?
        • SQL users can pick up HQL basics fairly quickly
        Prerequisites
        • Must have data in Hadoop
        • The data must be CLEAN
        • A schema must be applied to the data by creating a Hive table
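The last prerequisite, applying a schema by creating a Hive table, can be sketched as a small generator for the table definition. This is a minimal Python sketch under stated assumptions: the table name, column list and S3 location are hypothetical, and the generated statement uses standard HiveQL clauses (CREATE EXTERNAL TABLE, ROW FORMAT DELIMITED, LOCATION) for delimited text files.

```python
def hive_ddl(table, columns, location, delimiter_literal=","):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files.

    `columns` is a list of (name, hive_type) pairs. `delimiter_literal` is
    written into the statement verbatim, so pass "\\t" for a tab.
    """
    cols = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter_literal}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Hypothetical table, columns and bucket, for illustration only.
    print(
        hive_ddl(
            "web_search",
            [("search_id", "STRING"), ("site_id", "INT"), ("search_dt", "STRING")],
            "s3://example-bucket/websearch/",
            delimiter_literal="\\t",
        )
    )
```

The schema is only applied when the files are read, so rows that do not match the declared types surface as NULLs at query time, which is why the slide insists on clean data.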
  • 20. What is Hive really good at?
        • Hive is good in environments where clean, prepared data that doesn't change often is already in Hadoop
        • It resembles a language that many IT folks are already familiar with
        • Hive can help a user trying to identify a reporting trend
        • User-defined functions (UDFs) can be used to make custom logic reusable
  • 21. Some troubles
      << - Start of Hive script ->>
      --Create a TEMP Housing Table
      CREATE EXTERNAL TABLE MY_TABLE(
        num_ods string, num_bus_id int, um_ctry_cd int, prod_id string,
        rng_svc_cd string, rng6 string, bin string, bin_bus_id_enr int,
        bin_ctry_cd int, cd_fmt_a_2 string, cd_enr string, rsn_us_ind string,
        x_bus_id int, flg_enr string, my_dt string, user_id string,
        mthd_cd_enr string, tran_seq_id string, cd_enr2 string, us_amt string,
        moto_cd string, fee_curr_cd int, fee_desc_num string, fee_sgn_amt string,
        us_fee_sgn_amt string, mkt_spec string, catg_cd int, city_enr string,
        ctry_cd_enr int, dba_id int, nm_dscrptr string, geo_id int,
        geo_phone_num string, tier_cd string, msa string, nrmlzd_id int,
        pstl_cd string, b_st_cd_enr string, b_store_id string, b_vrfcn_val string,
        ntwrk_id int, site string, entry_mode_cd string, term_cpbty_cd string,
        sub_typ_cd string, dt string, id_num_enr int, prod_num int,
        prod_ppd_sub_typ_cd string, prod_typ_cd_enr string, prod_typ_ext_enr string,
        promo_cd string, promo_typ string, rwds_pgm_id_enr string, tran_cd string,
        tran_gmt_dt string, tran_gmt_tm string, tran_id string,
        unfrzn_acct_num_bus_id_enr int, unfrzn_arn_bin_bus_id_enr int,
        usage_cd_enr string, Other_amt string, curr_cd int, dt string
      ) COMMENT "THIS IS MY TEMP TABLE";

      --INSERT DATA INTO MY_TABLE
      INSERT OVERWRITE TABLE MY_TABLE
      SELECT *,
        SUM(us_tran_amt) AS SALES_VOL,
        SUM(US_FEE_SGN_AMT) AS US_FEE_SGN_AMT,
        COUNT(*) AS TRAN_COUNT,
        MIN(ACTIVE_DT) AS FIRST_ACTIVE_DT,
        MAX(SEARCH_DT) AS LAST_SEARCH_DT,
        MAX(customer_biz_id) AS customer_biz_id,
        MAX(PGM_ID_ENR) AS PGM_ID_ENR,
        MAX(CUST_PROD_ID) AS CUST_PROD_ID,
        MAX(POD_ID_NUM_ENR) AS POD_ID_NUM_ENR,
        MAX(PROD_TYPE) AS PROD_TYPE,
        MAX(SUB_TYPE) AS SUB_TYPE,
        1 AS ID
      FROM MY_TABLE
      WHERE dt LIKE '2012%'
      GROUP BY customer_biz_id, PGM_ID_ENR, CUST_PROD_ID, eci_moto_cd, catg_cd,
        city_enr, ctry_cd_enr, pstl_cd, pod_id, prod_num, SUB_TYPE;

      --CREATE TEMP LOOKUP TABLE
      CREATE EXTERNAL TABLE TEMP_LOOKUP(
        acct_num bigint, acct_sta_cd string, acct_zip_cd string,
        rwrd_pgm_id string, pgm_ref_cd string, acct_prod_id string,
        bus_id int, bin int, status string, pgm_eff_dt string, dt string
      ) COMMENT "THIS IS TEMP LOOKUP TABLE";

      --INSERT DATA INTO IT
      INSERT OVERWRITE TABLE MY_LOOKUP
      SELECT *, 1 AS cmf_ind FROM LOOKUP WHERE DT = '201211';

      --Do a Full Outer Join
      SELECT * FROM MY_TABLE mt
      FULL OUTER JOIN MY_LOOKUP ml ON mt.member_id = ml.member_id;

      •  No way to get data into Hadoop
      •  No data validation / may throw data away
      •  Security
      •  Sharing code across teams is a challenge
      •  No visualization
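The "no data validation / may throw data away" point can be made concrete with a small sketch. With Hive's default delimited-text reader, a field that cannot be read as the declared column type comes back as NULL instead of raising an error. The Python below only imitates that effect so it can be counted before loading; it is not Hive code. The three-column schema is a hypothetical slice of the table in the script, and the function names are illustrative.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical three-column slice of MY_TABLE: column name -> parser.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "prod_id": str,
    "catg_cd": int,
    "dt": str,
}


def read_like_hive(line: str, delimiter: str = "\t") -> Dict[str, Optional[object]]:
    """Parse one delimited row; unparseable or missing fields become None."""
    fields = line.rstrip("\n").split(delimiter)
    row: Dict[str, Optional[object]] = {}
    for index, (name, parse) in enumerate(SCHEMA.items()):
        try:
            row[name] = parse(fields[index])
        except (IndexError, ValueError):
            # No error is surfaced to the caller: the value is simply lost.
            row[name] = None
    return row


def count_silent_nulls(lines: List[str]) -> Dict[str, int]:
    """Count, per column, how many values would be silently dropped."""
    counts = {name: 0 for name in SCHEMA}
    for line in lines:
        for name, value in read_like_hive(line).items():
            if value is None:
                counts[name] += 1
    return counts
```

Running `count_silent_nulls` over a sample of the raw file before loading gives a rough measure of how much data a query would quietly lose.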
  • 22. … but it’s free, right?
      "Time to create Hive":
      •  Any machine-generated data (or anything semi-structured or unstructured) must first be parsed by writing MapReduce or Pig/Python programs. This is a time-to-market disadvantage.
      •  Table definition is a manual effort (though third-party tools can make this easier).
      "Time to maintain Hive":
      •  Hive data models (tables) are most likely static, shared objects maintained and controlled by the few people who own the schema.
      •  Hive is also more of a black box for new employees, so employee churn creates more maintenance effort.
      Cost to implement Hive:
      •  This is mostly down to human capital (expensive developers). Don't forget the prerequisite cost of implementing the data ingestion stage of the pipeline: populating the warehouse by writing MapReduce or other programs that parse and load the data.
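To give a feel for the parsing step this slide describes, here is a minimal Python sketch that turns one machine-generated log line into a tab-delimited row that a Hive table could sit on top of. The log layout, the field names, and the function name `to_delimited` are all assumptions made for illustration; a real feed needs its own pattern and its own handling of rejected lines.

```python
import re
from typing import Optional

# Hypothetical log layout:
#   "<ISO timestamp> <LEVEL> user=<id> amount=<decimal>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"user=(?P<user_id>\w+)\s+amount=(?P<amount>-?\d+(?:\.\d+)?)$"
)


def to_delimited(line: str, delimiter: str = "\t") -> Optional[str]:
    """Return the line as delimited fields, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        # The caller decides whether to log, count, or quarantine rejects.
        return None
    return delimiter.join(match.group("ts", "level", "user_id", "amount"))
```

Even this toy version shows where the effort goes: every new log format needs a new pattern, and someone has to decide what happens to the lines that do not match.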
  • 23. Business decision
      •  Do I train my engineers on a language, or do I eliminate that need by taking the problem directly to the business user?
  • 24. So what would my Hive resource need to know?
      •  HiveQL (a different dialect from ANSI-standard SQL)
      •  MapReduce tuning parameters, to name a few:
            •  Data block size
            •  Number of mappers/reducers
            •  Compression at the map-output level; result compression; which codec to use
            •  io.sort.factor
      •  Access to Hive is mainly through the command-line interface.
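As a rough illustration of what those tuning parameters look like in practice, the sketch below renders a set of session-level `SET` statements of the kind typically placed at the top of a Hive script. Only `io.sort.factor` is named on the slide. The other property names are the Hadoop 1.x / early Hive names assumed here to match the bullets, and later releases renamed several of them. Every value is a placeholder, not a recommendation.

```python
from typing import Dict, List

# Illustrative values only; sensible settings depend on the cluster and data.
# Property names are assumed Hadoop 1.x-era names, apart from io.sort.factor.
TUNING: Dict[str, str] = {
    "dfs.block.size": str(128 * 1024 * 1024),   # data block size, in bytes
    "mapred.reduce.tasks": "32",                # number of reducers
    "hive.exec.compress.intermediate": "true",  # compress data between stages
    "hive.exec.compress.output": "true",        # compress the final result
    "mapred.output.compression.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "io.sort.factor": "25",                     # streams merged at once in a sort
}


def render_set_statements(settings: Dict[str, str]) -> List[str]:
    """Render each setting as a Hive 'SET key=value;' statement."""
    return [f"SET {key}={value};" for key, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(render_set_statements(TUNING)))
```

Each of these knobs interacts with the others and with the data volume, which is part of why the slide lists them as required knowledge for a Hive developer.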
  • 25. How does Datameer do it differently?
  • 26. Questions and Answers
  • 27. Online Resources
      •  Try Datameer: www.datameer.com
      •  Follow us on Twitter: @datameer