BIG DATA
BY: ZEESHAN ALAM KHAN(MCA, AMU)
Big Data: A definition
• Big data is a collection of data sets so large and complex
that it becomes difficult to process using on-hand
database management tools. The challenges include
capture, curation, storage, search, sharing, analysis, and
visualization. The trend to larger data sets is due to the
additional information derivable from analysis of a single
large set of related data, as compared to separate smaller
sets with the same total amount of data, allowing
correlations to be found to "spot business trends,
determine quality of research, prevent diseases, link legal
citations, combat crime, and determine real-time roadway
traffic conditions." (Wikipedia)
Big Data: A definition
• Put another way, big data is the realization of greater
business intelligence by storing, processing, and
analyzing data that was previously ignored due to the
limitations of traditional data management
technologies
Source: Harness the Power of Big Data: The IBM Big Data Platform
Lots of data
• 2.5 quintillion bytes of data are generated every day!
– A quintillion is 10^18
• Data come from many quarters.
– Social media sites
– Sensors
– Digital photos
– Business transactions
– Location-based data
Source: IBM http://www-01.ibm.com/software/data/bigdata/
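To put the daily figure in more familiar units, here is a small back-of-the-envelope conversion. It assumes decimal (SI) prefixes; the names `BYTES_PER_DAY`, `UNITS`, and `convert` are illustrative only.

```python
# Convert the quoted daily volume into larger decimal (SI) units.
BYTES_PER_DAY = 2.5 * 10**18  # 2.5 quintillion bytes, as quoted on the slide

# Assumption: decimal prefixes (1 exabyte = 10^18 bytes), not binary ones.
UNITS = {"terabytes": 10**12, "petabytes": 10**15, "exabytes": 10**18}


def convert(n_bytes: float) -> dict:
    """Return the byte count expressed in each unit listed in UNITS."""
    return {name: n_bytes / size for name, size in UNITS.items()}


daily = convert(BYTES_PER_DAY)
for name, value in daily.items():
    print(f"{value:,.1f} {name} per day")
```

Under these assumptions, 2.5 quintillion bytes per day is 2.5 exabytes per day.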
The four dimensions of Big Data
• Volume: Large volumes of data
• Velocity: Quickly moving data
• Variety: structured, unstructured, images, etc.
• Veracity: Trust in the data and its integrity is both a
challenge and a must, and is as important for big data as
for traditional relational DBs
Source: IBM http://www-01.ibm.com/software/data/bigdata/
The four dimensions of use
• Aspects of the way in which users want to interact
with their data…
– Totality: Users have an increased desire to process and
analyze all available data
– Exploration: Users apply analytic approaches where the
schema is defined in response to the nature of the query
– Frequency: Users have a desire to increase the rate of
analysis in order to generate more accurate and timely
business intelligence
– Dependency: Users need to balance investment in existing
technologies and skills with the adoption of new techniques
Source: IBM http://www-01.ibm.com/software/data/bigdata/
So, in a nutshell
• Big Data is about better analytics!
Why Big Data and BI
Source: Business Intelligence Strategy: A Framework for
Achieving BI Excellence
Source: Business Intelligence Strategy: A Framework for
Achieving BI Excellence
Big Data Conundrum
• Problems:
– Although there is a massive spike in available data, the
percentage of the data that an enterprise can understand is
on the decline
– The data that the enterprise is trying to understand is
saturated with both useful signals and lots of noise
Source: IBM http://www-01.ibm.com/software/data/bigdata/
The Big Data platform Manifesto
imperatives and underlying technologies
IBM’s Big Data Platform
Some concepts
• NoSQL (Not Only SQL): Databases that “move
beyond” relational data models (i.e., no tables, limited
or no use of SQL)
– Focus on retrieval of data and appending new data (not
necessarily tables)
– Focus on key-value data stores that can be used to locate
data objects
– Focus on supporting storage of large quantities of
unstructured data
– SQL is often not the primary interface for storage or
retrieval of data
– ACID (atomicity, consistency, isolation, durability)
guarantees are often relaxed
NoSQL
• NoSQL focuses on a schema-less architecture (i.e.,
the data structure is not predefined)
• In contrast, traditional relational DBs require the
schema to be defined before the database is built and
populated.
– Data are structured
– Limited in scope
– Designed around ACID principles.
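The schema contrast above can be sketched in a few lines. This is a toy illustration, not a real NoSQL product: the relational side uses Python's built-in `sqlite3`, and a plain dict stands in for a key-value store. The table, keys, and field names are made up for the example.

```python
import sqlite3

# Relational side: the schema must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
conn.execute("INSERT INTO person (name, phone) VALUES (?, ?)", ("Ada", "555-0100"))

try:
    # 'email' was never declared in the schema, so the insert is rejected.
    conn.execute(
        "INSERT INTO person (name, email) VALUES (?, ?)",
        ("Bob", "bob@example.com"),
    )
    relational_accepted = True
except sqlite3.OperationalError:
    relational_accepted = False

# Schema-less side: a dict standing in for a key-value store (assumption).
# Each value may carry a different set of fields; nothing is predefined.
store = {}
store["person:1"] = {"name": "Ada", "phone": "555-0100"}
store["person:2"] = {"name": "Bob", "email": "bob@example.com", "tags": ["new"]}

print("relational insert with undeclared column accepted:", relational_accepted)
print("fields stored for person:2:", sorted(store["person:2"]))
```

The trade-off is that the key-value side accepts any shape of record, so checks a relational schema would enforce move into application code.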
Hadoop
• Hadoop is a distributed file system and data processing
engine that is designed to handle extremely high volumes
of data in any structure.
• Hadoop has two components:
– The Hadoop distributed file system (HDFS), which supports data
in structured relational form, in unstructured form, and in any
form in between
– The MapReduce programming paradigm for managing
applications on multiple distributed servers
• The focus is on supporting redundancy, distributed
architectures, and parallel processing
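The MapReduce idea can be shown with the classic word-count example. This is a single-process sketch in plain Python, not the Hadoop API: on a real cluster the map and reduce calls would run in parallel on different nodes, and the function names here are illustrative.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two input splits, as if stored on two different nodes.
splits = ["big data is big", "data about data"]
counts = word_count(splits)
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread over many servers without the steps needing to coordinate.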
Some Hadoop-Related Names to Know
• Apache Avro: designed for communication between
Hadoop nodes through data serialization
• Cassandra and HBase: non-relational databases that can be
used with Hadoop
• Hive: a query language similar to SQL (HiveQL) but
compatible with Hadoop
• Mahout: an AI tool designed for machine learning; that is,
to assist with filtering data for analysis and exploration
• Pig Latin: A data-flow language and execution framework
for parallel computation
• ZooKeeper: Keeps all the parts coordinated and working
together
What to do with the data
Parallels with Data Warehousing
Data Warehouses
• Extraction
• Transformation
• Load
• Connector
• Processing
• User Management
Connector Framework
• Supports access to data by creating indexes that can
be used for access to the data in its native repository
(i.e., it does not manage the data, it keeps track of
where it is located)
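A minimal sketch of the "keep track of where it is located" idea, assuming two made-up repositories held in a dict. The class and method names (`Connector`, `register`, `locate`, `fetch`) are hypothetical and do not come from any IBM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    repository: str
    path: str


# Hypothetical native repositories; the connector never copies their contents.
REPOSITORIES = {
    "crm": {"/accounts/42": "Acme Corp, renewal due in March"},
    "wiki": {"/pages/hadoop": "Notes on the Hadoop cluster setup"},
}


class Connector:
    def __init__(self):
        self._index = {}  # term -> set of Location pointers

    def register(self, repository: str, path: str) -> None:
        """Read an item once and record only where its terms can be found."""
        text = REPOSITORIES[repository][path]
        for term in text.lower().replace(",", " ").split():
            self._index.setdefault(term, set()).add(Location(repository, path))

    def locate(self, term: str) -> set:
        """Return pointers to the items containing the term."""
        return self._index.get(term.lower(), set())

    def fetch(self, location: Location) -> str:
        """Go back to the native repository for the actual content."""
        return REPOSITORIES[location.repository][location.path]


connector = Connector()
connector.register("crm", "/accounts/42")
connector.register("wiki", "/pages/hadoop")

hits = connector.locate("Hadoop")
for hit in hits:
    print(hit, "->", connector.fetch(hit))
```

The index holds only `Location` pointers, so the content itself stays in, and is served from, its original repository.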
Processing Layer
• Two primary functions:
– Indexes content: data are crawled, parsed, and analyzed
with the result that contents are indexed and located
– Processes queries: manages access to various servers
hosting the indexed and searchable content
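Both functions of the processing layer can be sketched with a small inverted index. This is a simplified, in-memory illustration: the documents, the tokenizer, and the AND-style query semantics are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical content to be crawled.
DOCUMENTS = {
    "doc1": "Hadoop stores data in HDFS",
    "doc2": "MapReduce processes data in parallel",
    "doc3": "HDFS replicates blocks",
}


def parse(text: str) -> list:
    """Parse: split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict) -> dict:
    """Crawl every document and record which documents contain each token."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in parse(text):
            index[token].add(doc_id)
    return index


def query(index: dict, text: str) -> set:
    """Return the documents that contain every token of the query."""
    tokens = parse(text)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(DOCUMENTS)
print(sorted(query(index, "data hdfs")))
```

Indexing happens once up front, so each query is a handful of set intersections rather than a scan over all content.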
Annotated Query Language
• AQL is an SQL-like declarative language for
performing text analysis and extraction
create view PersonPhone as
select P.name as person, PN.number as phone
from Person P, Phone PN, Sentence S
where Follows(P.name, PN.number, 0, 30)
and Contains(S.sentence, P.name)
and Contains(S.sentence, PN.number)
and ContainsRegex(/\b(phone|at)\b/, SpanBetween(P.name, PN.number));
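The logic of the rule above can be approximated in plain Python. This is not AQL and not IBM's implementation. It assumes that `Follows(..., 0, 30)` means the phone number starts 0 to 30 characters after the name ends, and it swaps in deliberately naive regular expressions for the `Person`, `Phone`, and `Sentence` views.

```python
import re

# Hypothetical stand-ins for the Person and Phone views.
PERSON = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")  # "First Last"
PHONE = re.compile(r"\b\d{3}-\d{4}\b")               # e.g. 555-0134
CUE = re.compile(r"\b(phone|at)\b")                  # cue word between the spans


def person_phone(text: str) -> list:
    """Pair a name with a phone number that follows it within 0-30 characters
    in the same sentence, with 'phone' or 'at' appearing between them."""
    results = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for person in PERSON.finditer(sentence):
            for phone in PHONE.finditer(sentence):
                gap = phone.start() - person.end()
                if not 0 <= gap <= 30:
                    continue
                between = sentence[person.end():phone.start()]
                if CUE.search(between):
                    results.append((person.group(), phone.group()))
    return results


text = (
    "Please call Jane Smith at 555-0134 today. "
    "John Brown left; the code 555-0199 is unrelated."
)
matches = person_phone(text)
print(matches)
```

The second sentence yields no pair because neither cue word appears between the name and the number, which is the filtering done by the `ContainsRegex` clause.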
The provenance viewer
Machine data analysis
Some resources
• BigInsights Wiki
• Information Management Bookstore
• BigData University
