SlideShare une entreprise Scribd logo
1  sur  13
Introduction to Big Data
& Basic Data Analysis
Big Data EveryWhere!
 Lots of data is being collected
and warehoused
 Web data, e-commerce
 purchases at department/
grocery stores
 Bank/Credit Card
transactions
 Social Network
How much data?
Google processes 20 PB a day (2008)
Wayback Machine has 3 PB + 100 TB/month
(3/2009)
Facebook has 2.5 PB of user data + 15 TB/day
(4/2009)
eBay has 6.5 PB of user data + 50 TB/day
(5/2009)
CERN’s Large Hydron Collider (LHC) generates
15 PB a year
640K ought to be
enough for anybody.
Maximilien Brice, © CERN
The Earthscope
• The Earthscope is the world's largest science
project. Designed to track North America's
geological evolution, this observatory records
data over 3.8 million square miles, amassing 67
terabytes of data. It analyzes seismic slips in the
San Andreas fault, sure, but also the plume of
magma underneath Yellowstone and much, much
more.
(http://www.msnbc.msn.com/id/44363598/ns/tech
nology_and_science-
future_of_technology/#.TmetOdQ--uI)
Type of Data
 Relational Data (Tables/Transaction/Legacy Data)
 Text Data (Web)
 Semi-structured Data (XML)
 Graph Data
 Social Network, Semantic Web (RDF), …
 Streaming Data
 You can only scan the data once
What to do with these data?
 Aggregation and Statistics
 Data warehouse and OLAP
 Indexing, Searching, and Querying
 Keyword based search
 Pattern matching (XML/RDF)
 Knowledge discovery
 Data Mining
 Statistical Modeling
What is Data Mining?
 Discovery of useful, possibly unexpected, patterns in data
 Non-trivial extraction of implicit, previously unknown and potentially useful
information from data
 Exploration & analysis, by automatic or
semi-automatic means, of large quantities of data in order to discover
meaningful patterns
Data Mining Tasks
 Classification [Predictive]
 Clustering [Descriptive]
 Association Rule Discovery [Descriptive]
 Sequential Pattern Discovery [Descriptive]
 Regression [Predictive]
 Deviation Detection [Predictive]
 Collaborative Filter [Predictive]
Classification: Definition
 Given a collection of records (training set )
 Each record contains a set of attributes, one of the attributes is the class.
 Find a model for class attribute as a function of the values of other
attributes.
 Goal: previously unseen records should be assigned a class as
accurately as possible.
 A test set is used to determine the accuracy of the model. Usually, the given
data set is divided into training and test sets, with training set used to build
the model and test set used to validate it.
Other Types of Mining
Text mining: application of data mining
to textual documents
cluster Web pages to find related pages
cluster pages a user has visited to organize
their visit history
classify Web pages automatically into a Web
directory
Graph Mining:
Deal with graph data
11
Data Streams
 What are Data Streams?
 Continuous streams
 Huge, Fast, and Changing
 Why Data Streams?
 The arriving speed of streams and the huge amount of data
are beyond our capability to store them.
 “Real-time” processing
 Window Models
 Landscape window (Entire Data Stream)
 Sliding Window
 Damped Window
 Mining Data Stream
12
Streaming Sample Problem
 Scan the dataset once
 Sample K records
 Each one has equally probability to be sampled
 Total N record: K/N
ExcelR Data analytics courses

Contenu connexe

Tendances

Publishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataPublishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataRoberto García
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...Xiaogang (Marshall) Ma
 
Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020François Belleau
 
Database novelty detection
Database novelty detectionDatabase novelty detection
Database novelty detectionMostafaAliAbbas
 
Serving Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataServing Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataChristophe Debruyne
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semanticsplan4all
 
Mca535 data mining and data warehousing
Mca535 data mining and data warehousingMca535 data mining and data warehousing
Mca535 data mining and data warehousingBiswa Nayak
 
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...Chris Freeland
 
Plays Well with Others, or What I’ve learned as a data provider in an intero...
Plays Well with Others, or What I’ve learned as a data provider in an intero...Plays Well with Others, or What I’ve learned as a data provider in an intero...
Plays Well with Others, or What I’ve learned as a data provider in an intero...Chris Freeland
 
Providing Research Graph data in JSON-LD using Schema.org
Providing Research Graph data in JSON-LD using Schema.orgProviding Research Graph data in JSON-LD using Schema.org
Providing Research Graph data in JSON-LD using Schema.orgJingbo Wang
 
Collaboratively Conceived, Designed and Implemented: Matching Visualization ...
Collaboratively Conceived, Designed and Implemented:  Matching Visualization ...Collaboratively Conceived, Designed and Implemented:  Matching Visualization ...
Collaboratively Conceived, Designed and Implemented: Matching Visualization ...Nancy Hoebelheinrich
 
2016 web mapping track: colorado oil & gas wells over time by will carter
2016 web mapping track: colorado oil & gas wells over time by will carter2016 web mapping track: colorado oil & gas wells over time by will carter
2016 web mapping track: colorado oil & gas wells over time by will carterGIS in the Rockies
 
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Brandi Davis-Dusenbery
 
Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesUCD Library
 
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...Blake Regalia
 

Tendances (20)

Publishing XBRL as Linked Open Data
Publishing XBRL as Linked Open DataPublishing XBRL as Linked Open Data
Publishing XBRL as Linked Open Data
 
Or2013 poster
Or2013 posterOr2013 poster
Or2013 poster
 
DBpedia mobile
DBpedia mobileDBpedia mobile
DBpedia mobile
 
A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...A use case-driven iterative method for building a provenance-aware GCIS onto...
A use case-driven iterative method for building a provenance-aware GCIS onto...
 
Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020Pitch Reactome2json_ld @ swat4hcls 2020
Pitch Reactome2json_ld @ swat4hcls 2020
 
Geospatial data
Geospatial dataGeospatial data
Geospatial data
 
Database novelty detection
Database novelty detectionDatabase novelty detection
Database novelty detection
 
Serving Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataServing Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked Data
 
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar   Intro to Linked Data and SemanticsINSPIRE Hackathon Webinar   Intro to Linked Data and Semantics
INSPIRE Hackathon Webinar Intro to Linked Data and Semantics
 
Mca535 data mining and data warehousing
Mca535 data mining and data warehousingMca535 data mining and data warehousing
Mca535 data mining and data warehousing
 
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...
An evaluation of taxonomic name finding & next steps in Biodiversity Heritage...
 
Plays Well with Others, or What I’ve learned as a data provider in an intero...
Plays Well with Others, or What I’ve learned as a data provider in an intero...Plays Well with Others, or What I’ve learned as a data provider in an intero...
Plays Well with Others, or What I’ve learned as a data provider in an intero...
 
Providing Research Graph data in JSON-LD using Schema.org
Providing Research Graph data in JSON-LD using Schema.orgProviding Research Graph data in JSON-LD using Schema.org
Providing Research Graph data in JSON-LD using Schema.org
 
Collaboratively Conceived, Designed and Implemented: Matching Visualization ...
Collaboratively Conceived, Designed and Implemented:  Matching Visualization ...Collaboratively Conceived, Designed and Implemented:  Matching Visualization ...
Collaboratively Conceived, Designed and Implemented: Matching Visualization ...
 
2016 web mapping track: colorado oil & gas wells over time by will carter
2016 web mapping track: colorado oil & gas wells over time by will carter2016 web mapping track: colorado oil & gas wells over time by will carter
2016 web mapping track: colorado oil & gas wells over time by will carter
 
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
Scalable, Collaborative, Reproducible, and Extensible analysis of TCGA data i...
 
Learn about Your Location (Using ALL Your Data)
Learn about Your Location (Using ALL Your Data)Learn about Your Location (Using ALL Your Data)
Learn about Your Location (Using ALL Your Data)
 
Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access Resources
 
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...
GNIS-LD: Serving and Visualizing the Geographic Names Information System Gaze...
 
Iugonet 20100706
Iugonet 20100706Iugonet 20100706
Iugonet 20100706
 

Similaire à Data science courses

Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiVijay Susheedran C G
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introductionNeeraj Tewari
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
Data-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptxData-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptxParvathyparu25
 
Data-Mining-ppt.pptx
Data-Mining-ppt.pptxData-Mining-ppt.pptx
Data-Mining-ppt.pptxayush309565
 
sybca-bigdata-ppt.pptx
sybca-bigdata-ppt.pptxsybca-bigdata-ppt.pptx
sybca-bigdata-ppt.pptxcalf_ville86
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data Sharjeel Imtiaz
 
Big Data in Practice.pdf
Big Data in Practice.pdfBig Data in Practice.pdf
Big Data in Practice.pdfTom Tan
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAcscpconf
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data csandit
 
Image retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsImage retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsunyil96
 
Image retrieval from the world wide web
Image retrieval from the world wide webImage retrieval from the world wide web
Image retrieval from the world wide webunyil96
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introductionamiyadash
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animations
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animationsRoots tech 2013 Big Data at Ancestry (3-22-2013) - no animations
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animationsWilliam Yetman
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 

Similaire à Data science courses (20)

Big Data Real Time Training in Chennai
Big Data Real Time Training in ChennaiBig Data Real Time Training in Chennai
Big Data Real Time Training in Chennai
 
Big Data 101 - An introduction
Big Data 101 - An introductionBig Data 101 - An introduction
Big Data 101 - An introduction
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
Data-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptxData-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptx
 
Data-Mining-ppt.pptx
Data-Mining-ppt.pptxData-Mining-ppt.pptx
Data-Mining-ppt.pptx
 
BIG DATA AND HADOOP.pdf
BIG DATA AND HADOOP.pdfBIG DATA AND HADOOP.pdf
BIG DATA AND HADOOP.pdf
 
sybca-bigdata-ppt.pptx
sybca-bigdata-ppt.pptxsybca-bigdata-ppt.pptx
sybca-bigdata-ppt.pptx
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data
 
Big Data in Practice.pdf
Big Data in Practice.pdfBig Data in Practice.pdf
Big Data in Practice.pdf
 
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATAMINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
MINING FUZZY ASSOCIATION RULES FROM WEB USAGE QUANTITATIVE DATA
 
Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data Mining Fuzzy Association Rules from Web Usage Quantitative Data
Mining Fuzzy Association Rules from Web Usage Quantitative Data
 
Image retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systemsImage retrieval from the world wide web issues, techniques, and systems
Image retrieval from the world wide web issues, techniques, and systems
 
Image retrieval from the world wide web
Image retrieval from the world wide webImage retrieval from the world wide web
Image retrieval from the world wide web
 
data.2.pptx
data.2.pptxdata.2.pptx
data.2.pptx
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animations
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animationsRoots tech 2013 Big Data at Ancestry (3-22-2013) - no animations
Roots tech 2013 Big Data at Ancestry (3-22-2013) - no animations
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 

Plus de priyankaravilla

Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalorepriyankaravilla
 
Pmp certification bangalore
Pmp certification bangalorePmp certification bangalore
Pmp certification bangalorepriyankaravilla
 
Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalorepriyankaravilla
 
Data science training in bangalore
Data science training in bangaloreData science training in bangalore
Data science training in bangalorepriyankaravilla
 
Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalorepriyankaravilla
 
Digital Marketing Course
Digital Marketing CourseDigital Marketing Course
Digital Marketing Coursepriyankaravilla
 
Machine Learning course in bangalore
Machine Learning course in bangaloreMachine Learning course in bangalore
Machine Learning course in bangalorepriyankaravilla
 
Machine Learning Course Training
Machine Learning Course TrainingMachine Learning Course Training
Machine Learning Course Trainingpriyankaravilla
 

Plus de priyankaravilla (14)

Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalore
 
Pmp certification bangalore
Pmp certification bangalorePmp certification bangalore
Pmp certification bangalore
 
Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalore
 
Data science training in bangalore
Data science training in bangaloreData science training in bangalore
Data science training in bangalore
 
Data science courses
Data science coursesData science courses
Data science courses
 
Pmp institute in bangalore
Pmp institute in bangalorePmp institute in bangalore
Pmp institute in bangalore
 
PMP Certifiation
PMP CertifiationPMP Certifiation
PMP Certifiation
 
Data analytics courses
Data analytics coursesData analytics courses
Data analytics courses
 
Data analytics courses
Data analytics coursesData analytics courses
Data analytics courses
 
Pmp training bangalore
Pmp training bangalorePmp training bangalore
Pmp training bangalore
 
Digital Marketing Course
Digital Marketing CourseDigital Marketing Course
Digital Marketing Course
 
Machine Learning course in bangalore
Machine Learning course in bangaloreMachine Learning course in bangalore
Machine Learning course in bangalore
 
Machine Learning Course Training
Machine Learning Course TrainingMachine Learning Course Training
Machine Learning Course Training
 
Machine learning
Machine learningMachine learning
Machine learning
 

Dernier

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Dernier (20)

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Data science courses

  • 1. Introduction to Big Data & Basic Data Analysis
  • 2. Big Data EveryWhere!  Lots of data is being collected and warehoused  Web data, e-commerce  purchases at department/ grocery stores  Bank/Credit Card transactions  Social Network
  • 3. How much data? Google processes 20 PB a day (2008) Wayback Machine has 3 PB + 100 TB/month (3/2009) Facebook has 2.5 PB of user data + 15 TB/day (4/2009) eBay has 6.5 PB of user data + 50 TB/day (5/2009) CERN’s Large Hydron Collider (LHC) generates 15 PB a year 640K ought to be enough for anybody.
  • 5. The Earthscope • The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (http://www.msnbc.msn.com/id/44363598/ns/tech nology_and_science- future_of_technology/#.TmetOdQ--uI)
  • 6. Type of Data  Relational Data (Tables/Transaction/Legacy Data)  Text Data (Web)  Semi-structured Data (XML)  Graph Data  Social Network, Semantic Web (RDF), …  Streaming Data  You can only scan the data once
  • 7. What to do with these data?  Aggregation and Statistics  Data warehouse and OLAP  Indexing, Searching, and Querying  Keyword based search  Pattern matching (XML/RDF)  Knowledge discovery  Data Mining  Statistical Modeling
  • 8. What is Data Mining?  Discovery of useful, possibly unexpected, patterns in data  Non-trivial extraction of implicit, previously unknown and potentially useful information from data  Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
  • 9. Data Mining Tasks  Classification [Predictive]  Clustering [Descriptive]  Association Rule Discovery [Descriptive]  Sequential Pattern Discovery [Descriptive]  Regression [Predictive]  Deviation Detection [Predictive]  Collaborative Filter [Predictive]
  • 10. Classification: Definition  Given a collection of records (training set )  Each record contains a set of attributes, one of the attributes is the class.  Find a model for class attribute as a function of the values of other attributes.  Goal: previously unseen records should be assigned a class as accurately as possible.  A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it.
  • 11. Other Types of Mining Text mining: application of data mining to textual documents cluster Web pages to find related pages cluster pages a user has visited to organize their visit history classify Web pages automatically into a Web directory Graph Mining: Deal with graph data 11
  • 12. Data Streams  What are Data Streams?  Continuous streams  Huge, Fast, and Changing  Why Data Streams?  The arriving speed of streams and the huge amount of data are beyond our capability to store them.  “Real-time” processing  Window Models  Landscape window (Entire Data Stream)  Sliding Window  Damped Window  Mining Data Stream 12
  • 13. Streaming Sample Problem  Scan the dataset once  Sample K records  Each one has equally probability to be sampled  Total N record: K/N ExcelR Data analytics courses