@joe_Caserta @BizAnalyticsTT
Architecting for Big Data:
Trends, Tips, and Deployment Options
Joe Caserta
President
Caserta Concepts
New York City
Top 20 Big Data Consulting - CIO Review
Joe Caserta Timeline (1986 to 2014)
• Began consulting in database programming and data modeling; 25+ years of hands-on experience building database solutions
• Dedicated to Data Warehousing and Business Intelligence since 1996
• Founded Caserta Concepts in NYC
• Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley)
• Web log analytics solution published in Intelligent Enterprise
• Formalized alliances/partnerships with system integrators
• Partnered with Big Data vendors: Cloudera, Hortonworks, IBM, Cisco, Datameer, Basho, and more
• Launched Big Data practice
• Launched Training practice, teaching data concepts worldwide
• Laser focus on extending Data Warehouses with Big Data solutions
• Launched Big Data Warehousing Meetup in NYC (~1,500 members)
• Established best practices for big data ecosystem implementation: Healthcare, Finance, Insurance
• Top 20 Most Powerful Big Data consulting firms
• Dedicated to Data Governance Techniques on Big Data (Innovation)
About Caserta Concepts
• Technology services company with expertise in data analysis:
  • Big Data Solutions
  • Data Warehousing
  • Business Intelligence
• Core focus in the following industries:
  • eCommerce / Retail / Digital Marketing
  • Financial Services / Insurance
  • Healthcare / Higher Education
• Established in 2001:
  • Increased growth year-over-year
  • Industry-recognized workforce
• Strategy, Implementation, Analytics
• Writing, Education, Mentoring
• Data Science & Analytics
• Cloud Computing
• Data Interaction & Visualization
The Evolution of Enterprise Data?
[Figure: Sales, Marketing, Finance, and other sources feed, via ETL, an Enterprise Data Warehouse serving Traditional BI (ad-hoc/canned reporting), alongside a Big Data Cluster: a horizontally scalable environment optimized for analytics, with Spark, MapReduce, and Pig/Hive running on the Hadoop Distributed File System (HDFS) across nodes N1 to N5, plus NoSQL databases, supporting data exploration, data science, and Big Data analytics.]
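The Big Data cluster in the figure processes data where it sits, spread across HDFS nodes, using engines such as MapReduce and Spark. The sketch below is a single-process illustration of the map, shuffle, and reduce phases those engines distribute across a cluster; it does not use Hadoop or Spark APIs, and the partitions standing in for node-local blocks are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reducer: combine all values for a key (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions: list[list[str]]) -> dict[str, int]:
    # Each partition stands in for a block stored on a different node.
    # On a real cluster the map phase would run on each node in parallel.
    mapped = [pair for part in partitions for pair in map_phase(part)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    blocks = [["sales marketing finance"], ["sales data", "data science"]]
    print(word_count(blocks))
```

The design point is that only the shuffle needs data to move between nodes; mapping and reducing can run locally on each node's share of the data.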
Tools and Technologies
Best Practices
• Data Warehousing/ETL/Data Integration
• BI/Visualization/Analytics
• Big Data Analytics
The ones you need to know…
• Hadoop distributions: Cloudera, Hortonworks, MapR, Pivotal HD, IBM
• Tools:
  • Hive: Maps data to table structures and queries it with SQL-like statements
  • Pig: Data transformation language for big data
  • Sqoop: Extracts data from external sources, such as relational databases, and loads it into Hadoop
  • Spark: General-purpose cluster computing framework
  • Storm: Real-time stream processing (real-time ETL)
• NoSQL:
  • Document: MongoDB, CouchDB
  • Graph: Neo4j, Titan
  • Key-value: Riak, Redis
  • Columnar: Cassandra, HBase
• Search: Lucene, Solr, Elasticsearch
• Languages: Python (with SciPy), Java, R, Scala
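Hive's core idea, mapping a table structure onto files that already exist and querying them with SQL-like statements, can be approximated locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (it is not Hive or HiveQL): raw tab-delimited web-log lines are parsed into columns at read time, loaded into an in-memory table, and aggregated with SQL. The log layout and the `web_logs` table are hypothetical.

```python
import sqlite3

# Hypothetical raw web-log lines: timestamp, user id, page, response bytes (tab-delimited).
RAW_LINES = [
    "2014-05-01T10:00:00\tu1\t/home\t512",
    "2014-05-01T10:00:05\tu2\t/products\t2048",
    "2014-05-01T10:01:00\tu1\t/products\t1024",
    "malformed line",
]


def parse(line: str):
    """Apply the schema at read time; return None for lines that don't fit it."""
    fields = line.split("\t")
    if len(fields) != 4:
        return None
    ts, user, page, size = fields
    return ts, user, page, int(size)


def page_traffic(lines: list[str]) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_logs (ts TEXT, user_id TEXT, page TEXT, bytes INTEGER)"
    )
    rows = [r for r in (parse(line) for line in lines) if r is not None]
    conn.executemany("INSERT INTO web_logs VALUES (?, ?, ?, ?)", rows)
    # The kind of aggregate query Hive would run over the mapped files.
    result = conn.execute(
        "SELECT page, COUNT(*) AS hits, SUM(bytes) AS total_bytes "
        "FROM web_logs GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in page_traffic(RAW_LINES):
        print(row)
```

In Hive itself the parsing step would be handled by a table definition over files in HDFS rather than by Python code; the sketch only shows the schema-on-read pattern.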
Why are we Changing?
• Advertising: real-time interactive queries on massive audience datasets in the cloud
• 360° Customer: cross-channel customer linking to improve the customer experience and increase sales
• Recommendation Engines: “You chose… you might also like…”
• Real-Time: aggregation, monitoring, and alerting on events at extremely high message rates (~1M msgs/sec)
• Big Data Warehouse: extending the EDW with Hadoop; governing data from the “lake” to the EDW
Client industries shown: Personal/Commercial Banking, Investment/Trading Bank, Quick Service Restaurant (QSR), Cable Television, Audience-based Advertising
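The slide lists recommendation engines (“You chose… you might also like…”) without describing an algorithm. One simple, widely used approach, assumed here rather than taken from the deck, is item-to-item co-occurrence: count how often items appear in the same basket and suggest the most frequent companions. The baskets and function names below are hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets: list[set[str]]) -> dict[str, Counter]:
    """For each item, count how often every other item shares a basket with it."""
    co: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def you_might_also_like(item: str, co: dict[str, Counter], k: int = 3) -> list[str]:
    """Return up to k items most often chosen together with `item`."""
    counts = co.get(item, Counter())
    # Highest count first; ties broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


if __name__ == "__main__":
    orders = [
        {"burger", "fries", "soda"},
        {"burger", "fries"},
        {"burger", "shake"},
        {"salad", "soda"},
    ]
    co = build_cooccurrence(orders)
    print("You chose burger... you might also like:", you_might_also_like("burger", co))
```

At Big Data scale the pair counting is the expensive part and is a natural fit for a distributed map/reduce job; the ranking step stays the same.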
The Big Data Pyramid
• Hadoop has different demands at each tier.
• Only the top tier of the pyramid is fully governed and ready for enterprise BI.
[Figure: pyramid of four tiers, from top to bottom:]
• Big Data Warehouse: fully data governed (trusted); arbitrary queries and reporting by the user community
  • Metadata → Catalog
  • ILM → who has access, how long to “manage it”
  • Data Quality and Monitoring → monitoring of completeness of data
• Data Science Workspace: agile business insight through data munging, machine learning, blending with external data, and development of to-be BDW facts
• Data Lake – Integrated Sandbox: data is organized, well defined, and complete, ready to be turned into information
  • Metadata → Catalog
  • ILM → who has access, how long do we “manage it”
  • Data Quality and Monitoring → monitor completeness of data
• Landing Area – Source Data in “Full Fidelity”: raw machine data collection; collect everything
  • Metadata → Catalog
  • ILM → who has access, how long do we “manage it”
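The pyramid attaches “Data Quality and Monitoring” with a completeness check to the governed tiers. A minimal sketch of such a check follows, assuming records arrive as dictionaries and that “complete” means a non-null, non-empty value; the column names and the 95% threshold are hypothetical.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], columns: list[str]) -> dict[str, float]:
    """Share of records with a non-null, non-empty value in each expected column."""
    if not records:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for r in records if r.get(col) not in (None, "")) / len(records)
        for col in columns
    }


def completeness_alerts(
    records: list[dict[str, Any]], columns: list[str], threshold: float = 0.95
) -> list[str]:
    """Columns whose completeness falls below the threshold, sorted by name."""
    scores = completeness(records, columns)
    return sorted(col for col, score in scores.items() if score < threshold)


if __name__ == "__main__":
    batch = [
        {"customer_id": "c1", "email": "a@example.com", "region": "West"},
        {"customer_id": "c2", "email": "", "region": None},
        {"customer_id": "c3", "email": "c@example.com"},
        {"customer_id": "c4", "email": "d@example.com", "region": "South"},
    ]
    cols = ["customer_id", "email", "region"]
    print(completeness(batch, cols))
    print("Below threshold:", completeness_alerts(batch, cols))
```

A governed tier might run a check like this on every load and hold back batches that raise alerts, while the landing area simply records the scores.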
BI is About to be Disrupted!
• The Big Data movement breaks the relational database barrier and enables analysis of massive amounts of structured and unstructured data.
• NoSQL calls the value of SQL-based relational databases into question. This disruption is forging a new road for the progress and advancement of scalable data analytics.
• The value of legacy Business Intelligence also comes into question.
• Rather than forcing data users to become technologists, BI must make data analysis available to the masses.
Who Does BI Today?
• The role of the ‘Business Analyst’, the primary user of the BI tool, is being replaced by two types of data users:
  1. Highly technical Data Scientists
  2. Non-technical Business Persons
• New analytics (BI) platforms must be created to accommodate these new users. These very distinct users rely on very different technologies.
• Perhaps legacy BI tools will not go away, but the market is absolutely about to be disrupted.
Empower the Data Scientist
• Data Scientists have deep technical knowledge.
• They enjoy writing code and mining data.
• The best way to serve a data scientist is to provide access to raw data and then get out of their way.
What does a Data Scientist Do, Anyway?
• Writes really cool and sophisticated algorithms that impact the way the business runs.
  NOT
• Much of a Data Scientist’s time is spent:
  • Searching for the data they need
  • Making sense of the data
  • Figuring out why the data looks the way it does and assessing its validity
  • Cleaning up all the garbage within the data so it represents the true business
  • Combining events with reference data to give them context
  • Correlating event data with other events
  • Finally, writing algorithms to perform mining, clustering, and predictive analytics: the sexy stuff
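Most of the list above is preparation work. As a rough sketch of two of those steps, cleaning raw events and combining them with reference data to give them context, the code below drops records that cannot represent a real sale and joins the rest to a store lookup table. The `Event` type, the `STORES` reference table, and the cleaning rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    store_id: str
    amount: float


# Hypothetical reference data that gives raw sales events business context.
STORES = {
    "S1": {"region": "Northeast", "format": "Drive-thru"},
    "S2": {"region": "West", "format": "Mall"},
}
UNKNOWN = {"region": "UNKNOWN", "format": "UNKNOWN"}


def clean(raw: dict) -> Optional[Event]:
    """Normalize one raw event; return None if it cannot represent a real sale."""
    store = str(raw.get("store_id") or "").strip().upper()
    try:
        amount = float(raw.get("amount"))
    except (TypeError, ValueError):
        return None
    if not store or amount < 0:
        return None
    return Event(store, amount)


def enrich(events: list[Event]) -> list[dict]:
    """Join cleaned events to reference data; unmatched stores are flagged, not dropped."""
    return [
        {"store_id": e.store_id, "amount": e.amount, **STORES.get(e.store_id, UNKNOWN)}
        for e in events
    ]


if __name__ == "__main__":
    raw_events = [
        {"store_id": " s1 ", "amount": "12.50"},
        {"store_id": "S2", "amount": "oops"},
        {"store_id": "S9", "amount": 3},
        {"store_id": "S2", "amount": -1},
    ]
    cleaned = [e for e in map(clean, raw_events) if e is not None]
    for row in enrich(cleaned):
        print(row)
```

Flagging unmatched stores instead of dropping them keeps the gap visible, which supports the “assessing its validity” step.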
Empower the Business Person
• Business users don’t have, and don’t want to have, the technical wherewithal to interact with ‘data’.
  • “We have a business to run! Programming should be done by people in rooms with no windows.”
  • “I need information at my fingertips and I should not need a PhD in SQL to get it.”
  • “It’s a myth that BI tools will solve my problems; I still need IT to get new reports. This is unacceptable.”
• Every business professional on the planet knows how to search for needed information via a Google search bar.
• Business people want to be able to ‘Google’ their corporate data for the information they need.
The Future of BI (if the Business gets its way)…
Navigating Data in BI…
[Figure: facets created automatically based on relevant data]
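Automatic facet creation can be sketched as counting the distinct values of each candidate field across the current result set and keeping only the fields that actually split the results. This is a simplified, in-memory stand-in for what a search engine such as Solr does with faceting; the “more than one distinct value” relevance rule, the function names, and the sample records are assumptions.

```python
from collections import Counter


def build_facets(
    results: list[dict[str, str]], fields: list[str], max_values: int = 5
) -> dict[str, list[tuple[str, int]]]:
    """Derive facets as (value, count) pairs for each field from the current result set."""
    facets: dict[str, list[tuple[str, int]]] = {}
    for field in fields:
        counts = Counter(r[field] for r in results if field in r)
        # Assumed relevance rule: a field with a single value cannot narrow results.
        if len(counts) > 1:
            ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            facets[field] = ranked[:max_values]
    return facets


def drill_down(results: list[dict[str, str]], field: str, value: str) -> list[dict[str, str]]:
    """Apply a facet selection, as a user clicking a facet value would."""
    return [r for r in results if r.get(field) == value]


if __name__ == "__main__":
    hits = [
        {"region": "Northeast", "product": "Records", "year": "2012"},
        {"region": "West", "product": "Records", "year": "2012"},
        {"region": "Northeast", "product": "Perfume", "year": "2012"},
    ]
    print(build_facets(hits, ["region", "product", "year"]))
    print(build_facets(drill_down(hits, "region", "Northeast"), ["product", "year"]))
```

Recomputing facets after each drill-down is what keeps the navigation “relevant”: options that would return nothing new simply disappear.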
Why do we make it so difficult?
• During normal BI implementations, much time is spent (or wasted) selecting the best way to graphically represent a set of metrics.
• We can embed algorithms that are statistically proven to best represent information depending on the type of question being asked.
• The user should be able to preview and change the default infographic as easily as clicking ‘next’ on a Yahoo! slideshow.
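The deck does not say which algorithms pick the default infographic. As an illustration only, the sketch below uses a few common rules of thumb (not statistically validated methods) to rank chart types by the shape of the question: the number of dimensions and measures, whether time is involved, and how many categories there are. A small `ChartPicker` class then cycles to the next option, as the slide describes. All names and thresholds are hypothetical.

```python
def chart_options(dimensions: int, measures: int, over_time: bool = False,
                  categories: int = 0) -> list[str]:
    """Rank candidate chart types for a question; the first entry is the default."""
    # Rules of thumb only; thresholds are illustrative assumptions.
    if over_time and measures >= 1:
        return ["line", "area", "table"]
    if dimensions == 0 and measures == 1:
        return ["single value", "table"]
    if dimensions == 0 and measures == 2:
        return ["scatter", "table"]
    if dimensions == 1 and measures == 1:
        # A handful of categories compares well as bars; long lists read better as a table.
        return ["bar", "pie", "table"] if categories <= 12 else ["table", "bar"]
    if dimensions == 2 and measures == 1:
        return ["heatmap", "grouped bar", "table"]
    return ["table"]


class ChartPicker:
    """Show the default chart first; next() cycles through the alternatives."""

    def __init__(self, options: list[str]) -> None:
        self._options = options
        self._index = 0

    @property
    def current(self) -> str:
        return self._options[self._index]

    def next(self) -> str:
        self._index = (self._index + 1) % len(self._options)
        return self.current


if __name__ == "__main__":
    # "sales by state by customer age": two dimensions, one measure.
    picker = ChartPicker(chart_options(dimensions=2, measures=1))
    print(picker.current)
    print(picker.next())
```

Keeping the ranking separate from the picker means a better (for example, statistically tested) ranking function could be swapped in without changing the user interaction.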
Imagine the Possibilities…
[Mock-up: a search box containing “Lady Gaga sales by state by customer age” with a Go! button; joe@casertaconcepts.com shown on screen; facets for Region (Northeast, Midwest, South, West), Product (Records, Perfume, Clothes, Performances), and Dates (2009 to 2013); a DOWNLOAD TO EXCEL button]
Building the Future of BI (Hint: it’s Big Data)
• Angular
  • Modern web application framework
  • Developed and supported by Google
  • Bootstrap used for mobile
• D3.js
  • JavaScript library for data visualization
  • Exposes the full capabilities of CSS3, HTML5, and SVG; extremely fast
  • Supports large datasets and dynamic behaviors for interaction
• Python
  • The “glue” that brings the other components together
  • The ‘engine’ that transforms search strings into queries
  • Integrated with the Customer Metadata repository
• Solr
  • Full-text and faceted-search engine and database
  • The backbone of the application
• Cassandra
  • Customer Metadata repository: stores all business rules (default facets, etc.) and user preferences (default graph types, etc.)
  • Cassandra may not be the ultimate selection
• AWS
  • Amazon Web Services
  • Queree will be a zero-footprint, cloud-based solution
  • The user experience is the same as Googling information
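The Python “engine” that turns search strings into queries is not specified further. A minimal sketch, assuming queries follow the “SUBJECT MEASURE by DIMENSION by DIMENSION” pattern of the “Lady Gaga sales by state by customer age” example and that a metadata lookup maps business terms to Solr field names, might build a Solr select URL like this. `q`, `rows`, `wt`, `facet`, `facet.pivot`, `stats`, and `stats.field` are standard Solr request parameters; the `sales` collection, the field names in `FIELD_MAP`, and the parsing rule are hypothetical, and the URL is only constructed, not sent.

```python
import re
from urllib.parse import urlencode

# Hypothetical stand-in for the Customer Metadata repository:
# business terms mapped to Solr field names.
FIELD_MAP = {
    "sales": "sales_amount_d",
    "state": "state_s",
    "customer age": "age_band_s",
    "region": "region_s",
}


def parse_search(text: str) -> dict:
    """Split 'SUBJECT MEASURE by DIMENSION by DIMENSION' into its parts."""
    parts = [p.strip() for p in re.split(r"\bby\b", text.strip(), flags=re.IGNORECASE)]
    head = parts[0]
    dimensions = [p.lower() for p in parts[1:] if p]
    words = head.split()
    # Treat the last word of the head as the measure if the metadata knows it.
    if words and words[-1].lower() in FIELD_MAP:
        measure = words[-1].lower()
        subject = " ".join(words[:-1])
    else:
        measure, subject = None, head
    return {"subject": subject, "measure": measure, "dimensions": dimensions}


def to_solr_url(query: dict,
                base: str = "http://localhost:8983/solr/sales/select") -> str:
    """Build (but do not send) a Solr select request for a parsed query."""
    params = [("q", f'"{query["subject"]}"'), ("rows", "0"), ("wt", "json")]
    fields = [FIELD_MAP[d] for d in query["dimensions"] if d in FIELD_MAP]
    if fields:
        # Nested breakdown, e.g. state, then age band within each state.
        params += [("facet", "true"), ("facet.pivot", ",".join(fields))]
    if query["measure"]:
        # Summary statistics (sum, min, max, ...) for the measure field.
        params += [("stats", "true"), ("stats.field", FIELD_MAP[query["measure"]])]
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    parsed = parse_search("Lady Gaga sales by state by customer age")
    print(parsed)
    print(to_solr_url(parsed))
```

The pivot facet returns record counts per state and age band; per-bucket sales totals would need additional Solr features not shown here.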
Closing Thought
Innovation is the only sustainable competitive advantage a company can have.
Challenge the status quo!
Thank You
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
(914) 261-3648
@joe_Caserta

Contenu connexe

En vedette

Telematics and Big Data: Next Generation Automotive Technology
Telematics and Big Data: Next Generation Automotive TechnologyTelematics and Big Data: Next Generation Automotive Technology
Telematics and Big Data: Next Generation Automotive TechnologyHCL Technologies
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceData Science Thailand
 
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...Daedal Research
 
Autonomous Vehicles: Technologies, Economics, and Opportunities
Autonomous Vehicles: Technologies, Economics, and OpportunitiesAutonomous Vehicles: Technologies, Economics, and Opportunities
Autonomous Vehicles: Technologies, Economics, and OpportunitiesJeffrey Funk
 
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...Yole Developpement
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingHealth Catalyst
 

En vedette (9)

Big Data Analytics for the Car of the Future
Big Data Analytics for the Car of the FutureBig Data Analytics for the Car of the Future
Big Data Analytics for the Car of the Future
 
Telematics and Big Data: Next Generation Automotive Technology
Telematics and Big Data: Next Generation Automotive TechnologyTelematics and Big Data: Next Generation Automotive Technology
Telematics and Big Data: Next Generation Automotive Technology
 
Introduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data ScienceIntroduction to Big Data Analytics and Data Science
Introduction to Big Data Analytics and Data Science
 
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...
Global Advanced Driver Assistance Systems (ADAS) Market: Trends and Opportuni...
 
Autonomous Vehicles: Technologies, Economics, and Opportunities
Autonomous Vehicles: Technologies, Economics, and OpportunitiesAutonomous Vehicles: Technologies, Economics, and Opportunities
Autonomous Vehicles: Technologies, Economics, and Opportunities
 
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...
Sensors and Data Management for Autonomous Vehicles report 2015 by Yole Devel...
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s GoingBig Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
Big Data in Healthcare Made Simple: Where It Stands Today and Where It’s Going
 

Plus de Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

Plus de Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Dernier

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 

Dernier (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
Architecting for Big Data: Trends, Tips, and Deployment Options

  • 1. @joe_Caserta@BizAnalyticsTT Architecting for Big Data: Trends, Tips, and Deployment Options. Joe Caserta, President, Caserta Concepts, New York City
  • 2. Joe Caserta Timeline (1986 to 2014). 25+ years of hands-on experience building database solutions. • Began consulting in database programming and data modeling • Dedicated to Data Warehousing and Business Intelligence since 1996 • Founded Caserta Concepts in NYC • Co-author, with Ralph Kimball, of The Data Warehouse ETL Toolkit (Wiley) • Web log analytics solution published in Intelligent Enterprise • Launched Training practice, teaching data concepts worldwide • Launched Big Data practice • Formalized alliances/partnerships with system integrators • Launched Big Data Warehousing Meetup in NYC (~1,500 members) • Partnered with Big Data vendors Cloudera, Hortonworks, IBM, Cisco, Datameer, Basho, and more • Laser focus on extending data warehouses with Big Data solutions • Established best practices for Big Data ecosystem implementation in Healthcare, Finance, and Insurance • Dedicated to data governance techniques on Big Data (innovation) • Named one of the Top 20 Most Powerful Big Data consulting firms (CIO Review)
  • 3. About Caserta Concepts • Technology services company with expertise in data analysis: • Big Data Solutions • Data Warehousing • Business Intelligence • Core focus in the following industries: • eCommerce / Retail / Digital Marketing • Financial Services / Insurance • Healthcare / Higher Education • Established in 2001: • Increased growth year-over-year • Industry-recognized workforce • Strategy, Implementation, Analytics • Writing, Education, Mentoring • Data Science & Analytics • Cloud Computing • Data Interaction & Visualization
  • 4. The Evolution of Enterprise Data? [Diagram. Components: Sales, Marketing, and Finance sources; ETL; Enterprise Data Warehouse; Ad-Hoc/Canned Reporting (Traditional BI); a Horizontally Scalable Environment, Optimized for Analytics, made up of a Big Data Cluster on the Hadoop Distributed File System (HDFS) across nodes N1 to N5, running Spark, MapReduce, Pig/Hive, and others, plus NoSQL Databases; Big Data Analytics; Data Exploration; Data Science]
  • 5. [Diagram: Tools and Technologies; Best Practices; Data Warehousing/ETL/Data Integration; BI/Visualization/Analytics; Big Data Analytics]
  • 7. The ones you need to know… • Hadoop distributions: Cloudera, Hortonworks, MapR, Pivotal HD, IBM • Tools: • Hive: maps data to structures and supports SQL-like queries • Pig: data transformation language for big data • Sqoop: extracts data from external sources and loads it into Hadoop • Spark: general-purpose cluster computing framework • Storm: real-time ETL • NoSQL: • Document: MongoDB, CouchDB • Graph: Neo4j, Titan • Key-value: Riak, Redis • Columnar: Cassandra, HBase • Search: Lucene, Solr, Elasticsearch • Languages: Python (with SciPy), Java, R, Scala
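The Hive entry describes mapping raw data to a structure and then querying it with SQL-like syntax. The sketch below imitates that idea with the Python standard library only: a schema is applied to raw tab-delimited text at the moment it is read, and SQLite (standing in for Hive, which is not used here) runs the SQL. The column names, types, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Raw tab-delimited text as it might land in the cluster; no schema is stored with it.
RAW = "2013-01-05\tNortheast\t120.50\n2013-01-05\tWest\t80.00\n2013-01-06\tNortheast\t42.25\n"

# Hypothetical schema, supplied only when the data is read.
SCHEMA = [("sale_date", str), ("region", str), ("amount", float)]


def read_with_schema(raw_text, schema):
    """Parse raw delimited text into typed tuples using a read-time schema."""
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    for fields in reader:
        yield tuple(cast(value) for (_, cast), value in zip(schema, fields))


# Load the typed rows into an in-memory table so they can be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", read_with_schema(RAW, SCHEMA))

rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('Northeast', 162.75), ('West', 80.0)]
```

Keeping the raw text untouched and applying structure on read means a different schema can be tried later without reloading the source data.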
  • 8. Why are we Changing? • Advertising: real-time interactive queries on massive audience datasets in the cloud • 360° Customer: cross-channel customer linking to improve the customer experience and increase sales • Recommendation Engines: “You chose… you might also like…” • Real-Time Aggregation, Monitoring & Alerting on events at extremely high message rates (~1M msgs/sec) • Big Data Warehouse: extending the EDW with Hadoop; governing data from the “lake” to the EDW • Industries shown: Personal/Commercial Banking, Investment/Trading Bank, Quick Service Restaurant (QSR), Cable Television, Audience-based Advertising
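The slide does not say how its recommendation engines worked. One common, simple technique for “you chose… you might also like…” is item-to-item co-occurrence counting, sketched below. The purchase baskets are hypothetical, and the product names are borrowed from the later mock-up slide for illustration only.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, chosen, k=2):
    """Return up to k items most often bought together with `chosen`."""
    return [item for item, _ in co[chosen].most_common(k)]


# Hypothetical purchase history.
baskets = [
    ["records", "perfume"],
    ["records", "clothes"],
    ["records", "perfume", "clothes"],
    ["perfume", "clothes"],
    ["records", "perfume"],
]

co = build_cooccurrence(baskets)
print(recommend(co, "records"))  # ['perfume', 'clothes']
```

Production recommenders usually normalize these raw counts so that very popular items do not dominate every list, and at high volume the counting would run on the cluster rather than in a single process.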
  • 9. The Big Data Pyramid • Hadoop has different demands at each tier. • Only the top tier of the pyramid is fully governed and ready for Enterprise BI. • Big Data Warehouse: fully data governed (trusted); user community arbitrary queries and reporting • Data Science Workspace: agile business insight through data munging, machine learning, blending with external data, and development of to-be BDW facts. Governance: Metadata: catalog; ILM: who has access, how long do we “manage it”; Data Quality and Monitoring: monitor completeness of data • Data Lake (Integrated Sandbox): data is ready to be turned into information (organized, well defined, complete). Governance: Metadata: catalog; ILM: who has access, how long do we “manage it”; Data Quality and Monitoring: monitor completeness of data • Landing Area (source data in “full fidelity”): raw machine data collection; collect everything. Governance: Metadata: catalog; ILM: who has access, how long do we “manage it”
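The middle tiers of the pyramid call for “Data Quality and Monitoring: monitor completeness of data.” Below is a minimal sketch of one way to measure that, assuming completeness means every required field is populated (not None and not an empty string). The field names, the sample batch, and the 0.95 alert threshold are all hypothetical.

```python
def is_populated(value):
    """Treat None and empty strings as missing; anything else counts as present."""
    return value not in (None, "")


def completeness(records, required_fields):
    """Share of records in which every required field is populated."""
    if not records:
        return 1.0
    complete = sum(1 for r in records if all(is_populated(r.get(f)) for f in required_fields))
    return complete / len(records)


def fill_rates(records, required_fields):
    """Per-field share of records where that field is populated."""
    n = len(records) or 1  # avoid dividing by zero on an empty batch
    return {f: sum(1 for r in records if is_populated(r.get(f))) / n for f in required_fields}


REQUIRED = ["customer_id", "region", "amount"]
MIN_COMPLETENESS = 0.95  # hypothetical threshold

batch = [
    {"customer_id": "C1", "region": "Northeast", "amount": 10.0},
    {"customer_id": "C2", "region": "", "amount": 5.0},
    {"customer_id": None, "region": "West", "amount": 7.5},
    {"customer_id": "C4", "region": "South", "amount": 2.0},
]

score = completeness(batch, REQUIRED)
print(score)                        # 0.5
print(fill_rates(batch, REQUIRED))  # {'customer_id': 0.75, 'region': 0.75, 'amount': 1.0}
if score < MIN_COMPLETENESS:
    print("ALERT: batch is below the completeness threshold")
```

The per-field rates point to which source is causing the gap, which is usually the first question once an alert fires.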
  • 10. BI is About to be Disrupted! • The Big Data movement breaks the relational database barrier and enables analysis on massive amounts of structured and unstructured data. • NoSQL puts the value of SQL-based relational databases into question. This disruption is forging a new road for the progress and advancement of scalable data analytics. • The value of legacy Business Intelligence comes into question. • Rather than forcing data users to become technologists, BI must make data analysis available to the masses.
  • 11. Who Does BI Today? • The role of the ‘Business Analyst’, the primary user of the BI tool, is being replaced by two types of data users: 1. Highly technical Data Scientists 2. Non-technical Business Persons • New analytics (BI) platforms must be created to accommodate the new users. We see these very distinct users using very different technologies. • Perhaps legacy BI tools will not go away, but the market is absolutely about to be disrupted.
  • 12. Empower the Data Scientist • Data Scientists have deep technical knowledge. • They enjoy writing code and mining data. • The best way to serve a data scientist is to provide access to raw data and then get out of their way.
  • 13. What does a Data Scientist Do, Anyway? Much of a Data Scientist's time is spent: • Searching for the data they need • Making sense of the data • Figuring out why the data looks the way it does and assessing its validity • Cleaning up all the garbage within the data so it represents the true business • Combining events with reference data to give them context • Correlating event data with other events • Finally, writing algorithms to perform mining, clustering, and predictive analytics (the sexy stuff). NOT: • Writing really cool and sophisticated algorithms that impact the way the business runs.
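“Combining events with reference data to give them context” is essentially a join between event records and a lookup table. Below is a minimal sketch with hypothetical store events and store reference data. Unmatched events are kept and labeled UNKNOWN (a left-outer-join style choice), so gaps in the reference data stay visible while the data's validity is being assessed instead of silently disappearing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    store_id: str
    item: str
    qty: int


# Hypothetical reference data keyed by store_id.
STORES = {
    "S1": {"city": "New York", "region": "Northeast"},
    "S2": {"city": "Chicago", "region": "Midwest"},
}


def enrich(events, reference):
    """Attach reference attributes to each event; unmatched keys become UNKNOWN."""
    rows = []
    for e in events:
        ref = reference.get(e.store_id, {})
        rows.append({
            "store_id": e.store_id,
            "item": e.item,
            "qty": e.qty,
            "city": ref.get("city", "UNKNOWN"),
            "region": ref.get("region", "UNKNOWN"),
        })
    return rows


events = [Event("S1", "records", 2), Event("S2", "perfume", 1), Event("S9", "clothes", 3)]
rows = enrich(events, STORES)

# Aggregate the enriched events by a reference attribute.
totals = Counter()
for row in rows:
    totals[row["region"]] += row["qty"]
print(dict(totals))  # {'Northeast': 2, 'Midwest': 1, 'UNKNOWN': 3}
```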
  • 14. Empower the Business Person • Business users don’t have, and don’t want to have, the technical wherewithal to interact with ‘data’. • “We have a business to run! Programming should be done by people in rooms with no windows.” • “I need information at my fingertips and I should not need a PhD in SQL to get it.” • “It’s a myth that BI tools will solve my problems; I still need IT to get new reports. This is unacceptable.” • Every business professional on the planet knows how to search for needed information via a Google search bar. • Business people want to be able to ‘Google’ their corporate data for the information they need.
  • 15. The Future of BI (if the Business gets its way)…
  • 17. Why do we make it so difficult? • During normal BI implementations, much time is spent (or wasted) on selecting the best way to graphically represent a set of metrics. • We can embed algorithms that are statistically proven to best represent information depending on the type of question being asked. • The user should be able to preview and change the default infographic as easily as clicking ‘next’ on a Yahoo! slideshow.
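The slide proposes choosing a default chart automatically from the type of question, then letting the user click ‘next’ through alternatives. The rules below are hypothetical rule-of-thumb heuristics, not the statistically derived algorithms the slide refers to; the sketch only shows the rank-then-cycle mechanism.

```python
from itertools import cycle

TIME_DIMENSIONS = {"date", "month", "year"}
GEO_DIMENSIONS = {"state", "region"}


def candidate_charts(dimensions, measures):
    """Rank chart types for a question using simple, hypothetical rules."""
    dims = set(dimensions)
    ranked = []
    if dims & TIME_DIMENSIONS:
        ranked.append("line")            # trends over time
    if dims & GEO_DIMENSIONS:
        ranked.append("choropleth map")  # values by geography
    if len(measures) >= 2:
        ranked.append("scatter")         # relationship between two measures
    ranked += ["bar", "table"]           # fallbacks that fit almost any question
    return ranked


# "sales by state by customer age": two dimensions, one measure.
charts = candidate_charts(["state", "customer_age"], ["sales"])
viewer = cycle(charts)
print(next(viewer))  # choropleth map  (the default)
print(next(viewer))  # bar             (after the user clicks 'next')
```

Because `cycle` wraps around, clicking ‘next’ past the last option returns to the default.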
  • 18. Imagine the Possibilities… [Mock-up of a search interface: the query “Lady gaga sales by state by customer age” with a Go! button; user joe@casertaconcepts.com; filters: Region (Northeast, Midwest, South, West), Product (Records, Perfume, Clothes, Performances), Dates (2009 to 2013); a DOWNLOAD TO EXCEL button]
  • 19. Building the Future of BI (Hint: it’s Big Data) • Angular: modern web application framework, developed and supported by Google; Bootstrap used for mobile • D3.js: JavaScript library for data visualization; exposes the full capability of CSS3, HTML5, and SVG; is extremely fast; supports large datasets and dynamic behaviors for interaction • Python: the “glue” that brings the other components together; the ‘engine’ that transforms search strings into queries; integrated with the Customer Metadata repository • Solr: full-text and faceted-search engine and database; the backbone of the application • Cassandra: Customer Metadata repository; stores all business rules (default facets, etc.) and user preferences (default graph types, etc.); Cassandra may not be the ultimate selection • AWS: Amazon Web Services; Queree will be a zero-footprint, cloud-based solution; the user experience is the same as Googling info
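Slide 19 describes Python as the engine that turns a search string, like the one in the mock-up, into a query against Solr. The sketch below is one possible approach under heavy assumptions: the grammar “terms, measure, then ‘by’ phrases”, the measure list, and the phrase-to-field mapping (a stand-in for the Cassandra metadata repository) are all hypothetical. It only builds the query string from Solr's standard `q`, `rows`, `facet`, and `facet.field` request parameters; it does not contact a Solr server.

```python
from urllib.parse import urlencode

# Hypothetical metadata that would live in the Customer Metadata repository.
MEASURES = {"sales", "units"}
FIELD_FOR_PHRASE = {"state": "state", "customer age": "customer_age", "region": "region"}


def parse_search(text):
    """Split '<terms> <measure> by <phrase> by <phrase>' into its parts."""
    head, *group_phrases = [part.strip() for part in text.lower().split(" by ")]
    words = head.split()
    measure = words[-1] if words and words[-1] in MEASURES else None
    terms = " ".join(words[:-1] if measure else words)
    # Group-by phrases missing from the metadata are ignored in this sketch.
    fields = [FIELD_FOR_PHRASE[p] for p in group_phrases if p in FIELD_FOR_PHRASE]
    return terms, measure, fields


def to_solr_params(terms, fields):
    """Encode standard Solr /select parameters for a faceted query."""
    params = {
        "q": terms or "*:*",    # match everything when no search terms remain
        "rows": 0,              # only facet counts are wanted, not documents
        "facet": "true",
        "facet.field": fields,  # repeated once per field because doseq=True
    }
    return urlencode(params, doseq=True)


terms, measure, fields = parse_search("Lady gaga sales by state by customer age")
print(terms, measure, fields)  # lady gaga sales ['state', 'customer_age']
print(to_solr_params(terms, fields))
# q=lady+gaga&rows=0&facet=true&facet.field=state&facet.field=customer_age
```

Facet counts tally matching documents. Summing the sales measure per state would need further Solr features (for example the stats component or JSON Facet API aggregations), which this sketch does not show.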
  • 20. Closing Thought: Innovation is the only sustainable competitive advantage a company can have. Challenge the status quo!
  • 21. Thank You • Joe Caserta, President, Caserta Concepts • joe@casertaconcepts.com • (914) 261-3648 • @joe_Caserta