Introduction to Big Data
An analogy between Sugar Cane & Big Data

Image Sources: alternative-energy-fuels.com, MicFarris.com

Jean-Marc Desvaux – March 2012
Session Abstract:

What is Big Data? Where does it apply?
What are the technologies behind it?
Is it going to replace your RDBMS? …
Big Data: it’s all Silicon Valley is talking about. It’s
the new buzzword after ‘cloud’.

“Everybody is speaking of it and many are
convinced it is the only way forward. As always,
such dramatic statements are not only dangerous
but serve to put some people off the concept.”
Source: Tom Kyte’s Big Data Are you ready? presentation
What is Big Data?
Big Data is data that exceeds the processing
capacity of conventional database systems.

It’s too big, moves too fast, or does not fit the
structures of conventional database architectures.
To gain value from this type of data, you need
an alternative way to process it.

Why is this happening?
Data is growing faster than computers are
getting bigger.
“Big Data” is a catch-all term.
It includes social network data, Web logs, MP3s,
unstructured Web page content, XML, GPS
tracking data, vehicle telemetry, financial market
data and many more…

It can be characterized by the 3 Vs:

Image Source: Tom Kyte’s Big Data Are you ready? presentation
Volume
Data is growing faster than machines are getting
bigger.
Data sources keep adding up.

Velocity
The rate of acquisition and the desired rate of
consumption.

Variety
Extends beyond structured data to include
unstructured data of all varieties.

Image Source: Tom Kyte’s Big Data Are you ready? presentation
Where does Big Data apply?
The value of Big Data to an organisation falls into two
main categories:

Analytical use

Enabling new products
and services
Analytical Use
Reveals insights that were previously hidden because
the data was hard to record and exploit.
Gives an edge over classic analytics, which rely on
sampling and on more “static”, predetermined reports.
Promotes an investigative approach to
data and puts the data scientist and analyst
in the spotlight.

Hal Varian, chief economist at Google:
“I keep saying that the sexy job in the next 10 years
will be statisticians”
Some terms linked to the Analytical Use of Big Data


Sentiment analysis:
Mining the Web in real time to get a quick read of what people are thinking.
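
To make the idea concrete, here is a minimal sketch of lexicon-based sentiment scoring. The word lists, the function names and the sample messages are all hypothetical and chosen for illustration; production systems typically rely on much larger lexicons or trained models, and the slides do not prescribe any particular method.

```python
import re

# Hypothetical toy lexicon (assumption): real lexicons are far larger.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "awful", "sad", "terrible"}

def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical messages standing in for a live feed.
messages = [
    "I love the new phone, great battery",
    "Terrible service, I hate waiting",
    "The store opens at nine",
]
labels = [label(m) for m in messages]
```

Applied to a stream of messages, counting the labels over time gives the "quick read" the slide refers to.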


Named-entity recognition (NER), also known as entity
identification and entity extraction, is a subtask of information extraction that
seeks to locate and classify atomic elements in text into predefined categories
such as the names of persons, organizations, locations, expressions of time,
quantities, monetary values, percentages, etc.
Example: “Big B” in a tweet can stand for Big Brother or Amitabh Bachchan.
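
The “Big B” ambiguity can be sketched with a small dictionary-based tagger that uses the surrounding words to pick a category. This is only one simple approach, assumed here for illustration; the gazetteer, the hint words and the category names are hypothetical, and real NER systems usually use statistical or neural models.

```python
import re

# Hypothetical gazetteer (assumption): surface form -> candidate entities,
# each with a category and a set of context words that hint at it.
GAZETTEER = {
    "big b": [
        ("Amitabh Bachchan", "PERSON", {"film", "movie", "actor", "bollywood"}),
        ("Big Brother", "TV_SHOW", {"show", "episode", "housemates", "eviction"}),
    ],
    "google": [("Google", "ORGANIZATION", set())],
}

def recognize(text):
    """Return a list of (surface form, entity, category) found in text."""
    lowered = text.lower()
    context = set(re.findall(r"[a-z]+", lowered))
    found = []
    for surface, candidates in GAZETTEER.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            # Choose the candidate whose hint words overlap most with the
            # text; on a tie, the first candidate listed wins.
            best = max(candidates, key=lambda c: len(c[2] & context))
            found.append((surface, best[0], best[1]))
    return found

tweet_a = "Big B was brilliant in that movie"
tweet_b = "Watching the Big B eviction episode tonight"
```

Here `recognize(tweet_a)` resolves “Big B” to the person and `recognize(tweet_b)` resolves it to the TV show, because of the words “movie” and “eviction episode” respectively.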
Product/Service Enabler

Some products and services cannot exist unless they are
backed by Big Data technologies. They:
- need to scale
- need a fast feedback loop on complex
analytics.

Highly successful Web startups that pioneered Big
Data technologies through R&D to enable new
types of products are a good example:
Google, Yahoo, Amazon, Facebook.
Sectors with Fast Adoption and High Potential

Financial Sector
Telecommunications
Government
Health
Retail
Big Data Sources:
Internal sources and
data marketplaces.
Internal sources

Time and attendance logs
RFID sensor logs
Security logs
Vehicle GPS tracking
Machinery/telemetry logs
Pictures & videos
Enterprise social networks
Service forums/discussions
…

Mostly anything unstructured or simply structured
External Sources (feeders/data marketplaces)
Examples: Infochimps.com, DataSift.com, datamarket.azure.com

Source: DataSift.com
An Enterprise Architecture for Big Data
An analogy with a Sugar Cane Factory
A Sugar Factory

SUGAR CANE FIELDS
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE/SUGAR
BOTTOM LINE = VALUE
An Enterprise Big Data Factory

DATA SOURCES (RDBMS & Data Marketplaces)

ACQUIRE (HARVEST):
HDFS (Hadoop Distributed FS) | NoSQL Database (Hadoop Distributed FS) | RDBMS (Enterprise Applications)

ORGANIZE (EXTRACT):
Map Reduce (Hadoop) | Big Data Connectors | RDBMS Connectors

ANALYSE (SHRED/DISTILL/BOIL):
Data Warehousing / RDBMS stores

BUSINESS INTELLIGENCE (DECIDE):
Analytic Applications, the sweet part (sugar/rum)

BOTTOM LINE = VALUE
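
The ORGANIZE stage names Map Reduce. As a rough, single-process sketch of that programming model (not Hadoop's actual API), the example below counts hits per page from a few raw log lines. The log format, the sample data and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical raw web-log lines (assumption), standing in for acquired files.
raw_logs = [
    "2012-03-01 /home 200",
    "2012-03-01 /cart 500",
    "2012-03-02 /home 200",
    "2012-03-02 /home 404",
]

def map_phase(line):
    """Emit (key, value) pairs: one count per requested page."""
    _date, page, _status = line.split()
    yield page, 1

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

mapped = [pair for line in raw_logs for pair in map_phase(line)]
hits_per_page = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

The small aggregated result (`hits_per_page`) is the kind of organized output that can then be loaded into the data warehouse for the ANALYSE stage. In a real cluster the map and reduce calls run in parallel across many machines; here they run sequentially to keep the sketch self-contained.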
Some Factories & Architectures from Vendors
Greenplum (EMC2)
An Example of a Turnkey Factory Solution
Another “Turnkey Factory” Example from Oracle
Targeting high-end Analytics

Stages shown on the architecture image, left to right:
ACQUIRE (HARVEST) and ORGANIZE (EXTRACT)
ORGANIZE (EXTRACT) and ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)

Image Source: Tom Kyte’s Big Data Are you ready? presentation
The Microsoft way




Of course, you can also build your own factory from the
widely available open-source components on which most
turnkey factories are built.
Technologies behind Big Data
Factory blocks & screws used for engineering solutions
Will NoSQL kill SQL?
Will it turn the RDBMS into a legacy data store?

Not at all.

We need the RDBMS to store high-value data and for its
feature-rich approach (feature first).

NoSQL (scale first) is not a superset of RDBMS
technologies (a bit like Einstein’s relativity to Newtonian
physics).

Remember: NoSQL does not mean “No SQL” but “Not Only SQL”.
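
The contrast between “feature first” and “scale first” can be illustrated with a small sketch. The relational half uses Python's built-in `sqlite3` module; the key-value half is a hypothetical toy store written for this example (the shard count, key naming and hashing rule are assumptions), not the API of any real NoSQL product.

```python
import sqlite3

# Feature first: a relational store enforces a schema and answers ad hoc
# queries such as joins and aggregates.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL, amount REAL NOT NULL)"
)
db.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Raj")])
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0)],
)
totals = db.execute(
    "SELECT c.name, SUM(o.amount) FROM customer c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name ORDER BY c.name"
).fetchall()

# Scale first: a toy key-value store (hypothetical) offers only get/put by
# key, which makes it easy to spread keys across many shards (machines).
NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key):
    """Pick a shard deterministically from the key's bytes."""
    return shards[sum(key.encode()) % NUM_SHARDS]

def put(key, value):
    shard_for(key)[key] = value

def get(key):
    return shard_for(key).get(key)

put("clicks:ana", {"page": "/home", "count": 3})
```

The relational side can answer a question nobody planned for (total spend per customer) in one query, while the key-value side can only return what was stored under a known key, but each shard can live on a separate machine. That trade-off is why the two complement rather than replace each other.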
Big Data future
Rise of data marketplaces
Development of data science tools:
more powerful and expressive toolsets for analysis
Emerging tools for streaming data processing
(Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI

Further cloud enablement
Easier integration with enterprise sources
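
To give a feel for what streaming tools in the list above compute, here is a minimal sketch of a sliding-window counter that keeps a live count of recent events as they arrive. It is a single-process illustration, not the API of Storm, S4 or StreamBase; the class name, the window length and the sample timestamps are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the live count for the window."""
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the window. The newest event
        # is never evicted, so the deque is never empty inside this loop.
        while self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)

counter = SlidingWindowCounter(window_seconds=10)
# Hypothetical event times in seconds.
counts = [counter.add(t) for t in [0, 3, 8, 12, 25]]
```

Each call to `add` returns an up-to-date figure without rescanning stored data, which is the property that makes live BI dashboards possible.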
Conclusion
To leverage Big Data you need something like a sugar
factory.
It can be a very entry-level factory (Excel – Azure source)
or a more complex one.
The more complex and complete the factory, the more value at the
end of the processing chain.

To turn Big Data technologies from developer-centric
solutions into enterprise solutions, they must be
combined with SQL solutions into a single proven
infrastructure that meets the manageability and security
requirements of enterprises.
The challenge for enterprises is to simplify Big Data
integration/engineering and leverage it where possible
to improve their processes at tactical and strategic
levels.

Architects & DBAs will be able to choose among
datastore technologies and will need to understand
where one is better than the other.

Big Data has to be part of the enterprise applications
ecosystem, where it will be turned into value.
Thank you.

More Related Content

What's hot

Sudhir hadoop and Data warehousing resume
Sudhir hadoop and Data warehousing resume Sudhir hadoop and Data warehousing resume
Sudhir hadoop and Data warehousing resume Sudhir Saxena
 
Data Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIData Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIDenodo
 
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InSnapLogic
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond Rajesh Kumar
 
Improve your Tech Quotient
Improve your Tech QuotientImprove your Tech Quotient
Improve your Tech QuotientTarence DSouza
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle TechnologiesOleksii Movchaniuk
 
Cloud Computing Big Data Is Future Of It
Cloud Computing Big  Data Is Future Of ItCloud Computing Big  Data Is Future Of It
Cloud Computing Big Data Is Future Of ItAman Ghei
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and howbobosenthil
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLTushar Shende
 
Data virtualization
Data virtualizationData virtualization
Data virtualizationHamed Hatami
 
Microsof azure class 1- intro
Microsof azure   class 1- introMicrosof azure   class 1- intro
Microsof azure class 1- introMHMuhammadAli1
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
 
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]Shirshanka Das
 
Modern Data Platforms
Modern Data Platforms Modern Data Platforms
Modern Data Platforms Arne Roßmann
 
Study notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerStudy notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerDavid Sweigert
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 

What's hot (20)

Sudhir hadoop and Data warehousing resume
Sudhir hadoop and Data warehousing resume Sudhir hadoop and Data warehousing resume
Sudhir hadoop and Data warehousing resume
 
Data Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AIData Science Operationalization: The Journey of Enterprise AI
Data Science Operationalization: The Journey of Enterprise AI
 
Data lake ppt
Data lake pptData lake ppt
Data lake ppt
 
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQLDataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
DataStax GeekNet Webinar - Apache Cassandra: Enterprise NoSQL
 
Building the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump InBuilding the Enterprise Data Lake - Important Considerations Before You Jump In
Building the Enterprise Data Lake - Important Considerations Before You Jump In
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
Improve your Tech Quotient
Improve your Tech QuotientImprove your Tech Quotient
Improve your Tech Quotient
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 
Cloud Computing Big Data Is Future Of It
Cloud Computing Big  Data Is Future Of ItCloud Computing Big  Data Is Future Of It
Cloud Computing Big Data Is Future Of It
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
 
Introduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQLIntroduction to Bigdata and NoSQL
Introduction to Bigdata and NoSQL
 
Data virtualization
Data virtualizationData virtualization
Data virtualization
 
Microsof azure class 1- intro
Microsof azure   class 1- introMicrosof azure   class 1- intro
Microsof azure class 1- intro
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]
The Evolution of Metadata: LinkedIn's Story [Strata NYC 2019]
 
Modern Data Platforms
Modern Data Platforms Modern Data Platforms
Modern Data Platforms
 
Hadoop dev 01
Hadoop dev 01Hadoop dev 01
Hadoop dev 01
 
Study notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security PractitionerStudy notes for CompTIA Certified Advanced Security Practitioner
Study notes for CompTIA Certified Advanced Security Practitioner
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 

Viewers also liked

U.S. Cane Sugar Market. Analysis And Forecast to 2020
U.S. Cane Sugar Market. Analysis And Forecast to 2020U.S. Cane Sugar Market. Analysis And Forecast to 2020
U.S. Cane Sugar Market. Analysis And Forecast to 2020IndexBox Marketing
 
monsanto 12-01-08
monsanto 12-01-08monsanto 12-01-08
monsanto 12-01-08finance28
 
monsanto_rd_platform_aquisition
monsanto_rd_platform_aquisitionmonsanto_rd_platform_aquisition
monsanto_rd_platform_aquisitionfinance28
 
Marketing Strategy - Daurala Sugar Works
Marketing Strategy - Daurala Sugar WorksMarketing Strategy - Daurala Sugar Works
Marketing Strategy - Daurala Sugar WorksSharad Srivastava
 
Digital Business Models 101
Digital Business Models 101Digital Business Models 101
Digital Business Models 101Willy Braun
 
Sugarcane cultivation
Sugarcane cultivationSugarcane cultivation
Sugarcane cultivationsugarmills
 
constraints in sugarcane production and strategies to overcome
constraints in sugarcane production and strategies to overcomeconstraints in sugarcane production and strategies to overcome
constraints in sugarcane production and strategies to overcomeSameera Deshan
 
Lean Canvas Process and Examples
Lean Canvas Process and ExamplesLean Canvas Process and Examples
Lean Canvas Process and Examplesde-pe
 
Business Model Canvas
Business Model CanvasBusiness Model Canvas
Business Model Canvassvanebjerg
 
Business Model Canvas 101
Business Model Canvas 101Business Model Canvas 101
Business Model Canvas 101Emad Saif
 

Viewers also liked (13)

U.S. Cane Sugar Market. Analysis And Forecast to 2020
U.S. Cane Sugar Market. Analysis And Forecast to 2020U.S. Cane Sugar Market. Analysis And Forecast to 2020
U.S. Cane Sugar Market. Analysis And Forecast to 2020
 
monsanto 12-01-08
monsanto 12-01-08monsanto 12-01-08
monsanto 12-01-08
 
monsanto_rd_platform_aquisition
monsanto_rd_platform_aquisitionmonsanto_rd_platform_aquisition
monsanto_rd_platform_aquisition
 
Marketing Strategy - Daurala Sugar Works
Marketing Strategy - Daurala Sugar WorksMarketing Strategy - Daurala Sugar Works
Marketing Strategy - Daurala Sugar Works
 
sugarcane pests
sugarcane pests sugarcane pests
sugarcane pests
 
Digital Business Models 101
Digital Business Models 101Digital Business Models 101
Digital Business Models 101
 
Canvas examples
Canvas examplesCanvas examples
Canvas examples
 
Sugarcane cultivation
Sugarcane cultivationSugarcane cultivation
Sugarcane cultivation
 
Sugarcane crop-ebook
Sugarcane crop-ebookSugarcane crop-ebook
Sugarcane crop-ebook
 
constraints in sugarcane production and strategies to overcome
constraints in sugarcane production and strategies to overcomeconstraints in sugarcane production and strategies to overcome
constraints in sugarcane production and strategies to overcome
 
Lean Canvas Process and Examples
Lean Canvas Process and ExamplesLean Canvas Process and Examples
Lean Canvas Process and Examples
 
Business Model Canvas
Business Model CanvasBusiness Model Canvas
Business Model Canvas
 
Business Model Canvas 101
Business Model Canvas 101Business Model Canvas 101
Business Model Canvas 101
 

Similar to Introduction to Big Data - An Overview of Big Data Concepts and Technologies

Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course pptNjain85
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattooMohamed Magdy
 
Big data peresintaion
Big data peresintaion Big data peresintaion
Big data peresintaion ahmed alshikh
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Managementrightsize
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Prof.Balakrishnan S
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a securityTyrone Systems
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationDenodo
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaSanjeev Kumar
 

Similar to Introduction to Big Data - An Overview of Big Data Concepts and Technologies (20)

Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
Big Data
Big DataBig Data
Big Data
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
Hadoop for beginners free course ppt
Hadoop for beginners   free course pptHadoop for beginners   free course ppt
Hadoop for beginners free course ppt
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 
Big data peresintaion
Big data peresintaion Big data peresintaion
Big data peresintaion
 
Big Data: an introduction
Big Data: an introductionBig Data: an introduction
Big Data: an introduction
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
Big Data
Big DataBig Data
Big Data
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
How to tackle big data from a security
How to tackle big data from a securityHow to tackle big data from a security
How to tackle big data from a security
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow Presentation
 
Hadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - InformaticaHadoop India Summit, Feb 2011 - Informatica
Hadoop India Summit, Feb 2011 - Informatica
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf

Introduction to Big Data - An Overview of Big Data Concepts and Technologies

  • 1. Introduction to Big Data An analogy between Sugar Cane & Big Data Image Source: alternative-energy-fuels.com Image Source: MicFarris.com Jean-Marc Desvaux – March 2012
  • 2. Session Abstract: What is Big Data? Where does it apply? What are the technologies behind it? Is it going to replace your RDBMS? …
  • 3. Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud’. “Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”
  • 4. Source: Tom Kyte’s Big Data Are you ready ? presentation
  • 5. What is Big Data?
  • 6. Big Data is data that exceeds the processing capacity of conventional database systems. It’s too big, too fast or does not fit the structures of database architectures. To gain value from this type of data you need an alternative way to process it. Why is this happening? Data is growing faster than computers are getting bigger.
  • 7. A catch-all term. Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more… Can be characterized by the 3 Vs. Image Source: Tom Kyte’s Big Data Are you ready ? presentation
  • 8. Volume: data is growing faster than machines are getting bigger, and data sources keep adding up. Velocity: rate of acquisition and desired rate of consumption. Variety: extends beyond structured data to include unstructured data of all varieties. Image Source: Tom Kyte’s Big Data Are you ready ? presentation
  • 9. Where does Big Data apply?
  • 10. The value of Big Data to an Organisation falls into two main categories: (1) Analytical Use; (2) Enabling new products and services.
  • 11. Analytical Use: To reveal insights previously hidden because the data was hard to record and exploit. An edge over classic Analytics based on sampling and more “static”, predetermined reports. It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight. Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians”
  • 12. Some terms linked to the Analytical Use of Big Data. Sentiment Analysis: mining the Web in real time and getting a quick read of what people are thinking. Named-entity recognition (NER), also known as entity identification and entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (e.g. does “Big B” in a tweet stand for Big Brother or Amitabh Bachchan?)
  • 13. Product/Service Enabler: Some products and services cannot exist unless backed by Big Data technologies. They need to scale, and they need a fast feedback loop on complex analytics. Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are a good example: Google, Yahoo, Amazon, Facebook.
  • 14. Sectors with Fast Adoption and High Potential: Financial Sector, Telecommunications, Government, Health, Retail
  • 15. Big Data Sources: Internal & Data Marketplaces.
  • 16. Internal sources: Time Attendance logs, RFID sensor logs, Security Logs, Vehicle GPS tracking, Machinery/Telemetry Logs, Pictures & videos, Enterprise Social Networks, Service Forums/Discussions, … Mostly anything unstructured or simply structured
  • 17. External Sources (feeders/data marketplaces) Examples: Infochimps.com, DataSift.com, datamarket.azure.com Source: DataSift.com
  • 18. An Enterprise Architecture for Big Data: An analogy with a Sugar Cane Factory
  • 19. A Sugar Factory: SUGAR CANE FIELDS → ACQUIRE (HARVEST) → EXTRACT/SHRED → EVAPORATE/DISTILL/BOIL → DRY/STORE/SUGAR → BOTTOM LINE = VALUE
  • 20. An Enterprise Big Data Factory: DATA SOURCES (RDBMS & Data Marketplaces) → ACQUIRE (HARVEST): HDFS (Hadoop Distributed FS), NoSQL Database, RDBMS (Enterprise Applications) → ORGANIZE (EXTRACT): MapReduce (Hadoop), Big Data Connectors, RDBMS Connectors → ANALYSE (SHRED/DISTILL/BOIL): Data Warehousing / RDBMS stores, Analytic Applications → BUSINESS INTELLIGENCE (DECIDE): the sweet part (sugar/rum) → BOTTOM LINE = VALUE
  • 21. Some Factories & architectures from vendors
  • 22. Greenplum (EMC2): An Example of a Turnkey Factory Solution
  • 23. Another “Turnkey Factory” Example from Oracle, Targeting high-end Analytics. Stages: ACQUIRE (HARVEST), ORGANIZE (EXTRACT), ANALYSE (SHRED/DISTILL/BOIL), BUSINESS INTELLIGENCE (DECIDE). Image Source: Tom Kyte’s Big Data Are you ready ? presentation
  • 24. The Microsoft way. Of course, you can also build your own factory using the widely available open-source software on which most turnkey factories are built.
  • 26. Factory blocks & screws used for engineering solutions
  • 27. Will NoSQL kill SQL?!
  • 28. Turning the RDBMS into a legacy data store? Not at all. We need the RDBMS to store high-value data and for its feature-rich approach (feature first). NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity to Newtonian physics). Remember, NoSQL is not “No SQL” but “Not Only SQL”.
  • 30. Rise of Data Marketplaces. Data Science tools development: more powerful & expressive toolsets for analysis. Emerging streaming data processing tools (Twitter Storm, Yahoo S4, StreamBase): real-time enablement / live BI. Further cloud-enablement. Ease of integration with Enterprise Sources.
  • 32. To leverage Big Data you need something like a Sugar Factory. It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain. To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure meeting the manageability and security requirements of enterprises.
  • 33. The challenge for Enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels. Architects & DBAs will be able to make choices between datastore technologies and will need to understand where one is better than the other. Big Data has to be part of the Enterprise Applications EcoSystem, where it will be turned into value.
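
As an illustration of the ORGANIZE (EXTRACT) stage named on slide 20, the MapReduce idea can be sketched in a few lines of plain Python. This is a toy, single-process sketch and not how Hadoop is actually programmed or deployed: the log lines, their four-field format and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions made up for the example. It only shows the shape of the computation: map each raw record to key/value pairs, group the pairs by key, then reduce each group to a summary that a warehouse or RDBMS could store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw web-log lines standing in for an acquired data source.
RAW_LOGS = [
    "2012-03-01 GET /home 200",
    "2012-03-01 GET /cart 500",
    "2012-03-02 GET /home 200",
    "2012-03-02 GET /home 404",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (url, 1) for each well-formed line; skip anything else."""
    parts = line.split()
    if len(parts) == 4:  # assumed format: date, method, url, status
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values of one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


hits_per_url = run_job(RAW_LOGS)
print(hits_per_url)  # {'/home': 3, '/cart': 1}
```

In a real cluster the map and reduce functions run in parallel on many nodes over data held in a distributed file system; the small, structured result (here, hits per URL) is the kind of output that would then be loaded into the ANALYSE stage.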