The Science of Data Science
(Data plus Semantics yields Knowledge)
Prof. James Hendler
Tetherless World Constellation Chair of Computer, Web and Cognitive Sciences
Director, The Rensselaer IDEA
The Rensselaer Institute for Data Exploration and Applications
Performance Plan to Budget Presentation
February 2015
The Rensselaer Institute for Data Exploration and Applications (IDEA) is a breakthrough initiative that brings together key research areas and advanced technologies to revolutionize the way we use data in science, engineering, and virtually every other research and educational discipline. By bridging the gaps between analytics, modeling, and simulation, we continue the Rensselaer tradition as a leader in applying critical technologies to improve everyday life and meet the challenges of the future.
The Rensselaer Institute for Data Exploration and Applications
• Business Systems
• Built and Natural Environments
• Cyber-Resiliency
• Policy, Ethics and Stewardship
• Materials Informatics
• Data-driven Physical/Life Sciences
• Healthcare Analytics and Mobile Health
• Social Network Analytics
• Agents and Augmented Reality
IDEA project examples
• Healthcare in Context: data mining and analytics to improve public health from a systems perspective, at scales from the individual to the national.
• Data-Centric Engineering Design: data-driven design and control under uncertainty via data fusion across multiple scales and sources.
• Supply Chain Resilience through Information Visibility: demonstrate uses of supply chain information visibility for anticipating, mitigating, and recovering from disruptive events.
• Accelerated design of functional materials/Material Ontology: address data-based informatics for basic materials processing of complex, multifunctional (often nano) materials.
• Biome-informatics: develop data aggregation and computational tools to integrate disparate datasets into large ecosystem models, using data collected on the microbial communities that inhabit the base of most ecosystems.
• Deducing Structure to Function in Biomedicine: develop systematic data-resourced methods for discovering and exploiting structure-to-function relationships.
KDD Pipeline – as usually presented
[Figure: pipeline ending in a single data store (Big Data Warehouse)]
KDD Pipeline – in the real world
Data is increasingly brought in from external sources, with mixed provenance, increasingly outside the analyzers’ control, and at increasing rates and scales.
[Figure: many separate data stores fed by sensors and apps, social media, customer behaviors, the Web, partners, and open data; each feed needs formatting, standards use, data cleansing, data bias analysis, …]
Tough data integration challenges
[Figure: enterprise analytics and open data integration: hard problems!]
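The per-source steps named above (formatting, standards use, data cleansing) can be sketched as normalizing every feed into one record shape while keeping track of where each value came from. This is a minimal illustration only: the field names `id` and `value` and the averaging rule for conflicting values are hypothetical assumptions, it does not attempt bias analysis, and it is not IDEA's pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A cleaned value plus the list of feeds it came from (its provenance)."""
    key: str
    value: float
    sources: list = field(default_factory=list)


def clean(raw, source):
    """Normalize one raw row, or return None if it cannot be cleaned.

    The field names 'id' and 'value' are hypothetical; each real feed
    (sensors, social media, partners, open data) needs its own mapping.
    """
    key = str(raw.get("id", "")).strip().lower()
    try:
        value = float(raw.get("value"))
    except (TypeError, ValueError):
        return None  # missing or unparseable value
    if not key:
        return None  # no usable identifier
    return Record(key, value, [source])


def integrate(feeds):
    """Merge cleaned rows from every feed, keyed by normalized id.

    Conflicting values are averaged purely for illustration; every feed that
    contributed is kept in `sources` so provenance is not lost.
    """
    merged = {}
    for source, rows in feeds.items():
        for raw in rows:
            rec = clean(raw, source)
            if rec is None:
                continue
            if rec.key in merged:
                old = merged[rec.key]
                n = len(old.sources)
                old.value = (old.value * n + rec.value) / (n + 1)  # running mean
                old.sources.append(source)
            else:
                merged[rec.key] = rec
    return merged
```

For example, feeding `{"sensors": [{"id": "A1", "value": "10"}], "partners": [{"id": " a1 ", "value": 20}]}` yields a single record `a1` with value 15.0 and sources `["sensors", "partners"]`.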
Closing the loop on (big) data
IDEA is focusing on key data science areas that are revolutionizing engineering, science, and business, with significant social impact:
• Predictive Analytics
• Discovery Informatics
• Data Exploration
Theme 1: Predictive Analytics
From “what is” to “what if”
Courtesy of Eric Schadt, Mount Sinai
Example: Healthcare Data Analytics
The Digital Universe of Data to Better Diagnose and Treat Patients
Courtesy of Eric Schadt, Mount Sinai
Identifying predictive features in data
• Each factor must be separately analyzed for its “predictivity”
  • Mutual information measure
• The “black art” of predictive analytics is finding the right ones
  • Use too few, and the model is weak
  • Use too many, and the model becomes slow and dominated by noise
• Algorithms are required to do this because the overwhelming number of “weak” factors defies human abilities to combine them
  • Machine learning identifies key features
    • some require “roll ups”
    • some require “pull outs”
  • Mathematical techniques then reduce the dimensionality
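Scoring each factor separately by mutual information, as the slide suggests, can be sketched for discrete features by simple counting. The feature names below are hypothetical, and continuous measurements would first need binning or a different estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(features, labels):
    """Score each feature separately against the labels, most predictive first."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On a balanced binary label, a feature identical to the label scores 1 bit and an unrelated one scores 0. This plug-in estimate is biased upward on small samples, which is one reason the many weak factors are hard to judge by eye.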
Predictive analytics in sensors
Extend-o-hand
(Josh Shinavier, PhD)
Classification of the sensor data (via machine learning) allows predictive recognition of different gestures (i.e., before the gesture is finished).
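One simple way to recognize a gesture before it finishes is to compare the samples seen so far against the opening portion of each known gesture. The sketch below uses nearest-template matching on a single hypothetical sensor axis as a stand-in for a trained classifier; it is not the Extend-o-hand method.

```python
import math


def prefix_distance(observed, template):
    """Euclidean distance between the samples seen so far and the matching
    prefix of a template (zip stops at the shorter of the two)."""
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(observed, template)))


def classify_early(observed, templates):
    """Label of the template whose opening best matches the partial gesture,
    so a guess is available before the gesture is finished."""
    return min(templates, key=lambda label: prefix_distance(observed, templates[label]))
```

With hypothetical templates `{"swipe": [0, 1, 2, 3, 4, 5], "tap": [0, 3, 0, 3, 0, 3]}`, the partial reading `[0, 1, 2]` is already classified as `"swipe"`. In practice the templates might be per-class averages learned from labelled gestures, and a real system would also need a rule for when it is confident enough to commit.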
Predictive analytics in large-scale behaviors
“List clusters at risk for Asian Clams <1 mile from Cook’s Bay.”
Machine learning predicts future distributions of invasive species in Lake George based on current distributions and bathymetric similarity.
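Predicting where a species may appear from similarity to already-surveyed sites can be illustrated with a k-nearest-neighbour vote over bathymetry features. This is a swapped-in, minimal technique; the feature pair (depth in metres, bottom slope in degrees) and the survey values in the test are hypothetical, not Lake George data.

```python
import math
from collections import Counter


def knn_presence(surveyed, site, k=3):
    """Predict presence at an unsurveyed site by majority vote among the k
    surveyed sites whose bathymetry features are closest (Euclidean).

    `surveyed` is a list of (features, present) pairs, e.g. ((depth, slope), True).
    """
    nearest = sorted(surveyed, key=lambda entry: math.dist(entry[0], site))[:k]
    votes = Counter(present for _, present in nearest)
    return votes.most_common(1)[0][0]
```

Because the features are not rescaled here, depth dominates the distance; a real model would normalize features and would likely add current-distribution terms such as distance to known colonies.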
Predictive Social Network Analytics (with RPI NeST center)
Social Networks in Action
• Analyzing cascading failures
• Modeling (supply chain) networks… and predicting (cascading) network risks
• Modeling network stressors (including the human cognitive element)
• Understanding network dynamics
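Cascading risk in a supply network can be explored with a simple threshold model: a node fails once a large enough share of its suppliers has failed, and each new failure is then checked against its own customers. The network, node names, and the 50% threshold are hypothetical choices for illustration, not the NeST center's models.

```python
from collections import deque


def cascade(suppliers, initial_failures, threshold=0.5):
    """Return the set of failed nodes after a failure cascade settles.

    `suppliers` maps each node to the list of nodes it depends on. A node
    fails once the fraction of its failed suppliers reaches `threshold`
    (a hypothetical rule chosen for illustration).
    """
    # Invert the dependency graph: who is affected when a node goes down?
    customers = {node: [] for node in suppliers}
    for node, deps in suppliers.items():
        for dep in deps:
            customers.setdefault(dep, []).append(node)

    failed = set(initial_failures)
    queue = deque(initial_failures)
    while queue:
        down = queue.popleft()
        for customer in customers.get(down, []):
            if customer in failed:
                continue
            deps = suppliers[customer]
            share = sum(dep in failed for dep in deps) / len(deps)
            if share >= threshold:
                failed.add(customer)
                queue.append(customer)  # its own customers are checked next
    return failed
```

Raising the threshold makes nodes more tolerant of partial supplier loss, so the same initial disruption can either stay local or propagate through the whole chain.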
Data Science Research Center: tools for data analytics
Theory & Algorithms
• Randomized
• Optimization
• Approximation
• Multilinear Algebra
Statistics
• Multivariate analysis
• Optimal Experimental Design
Computational concerns
• Scaling
• Cyber Security for Data
Applications
• Dimension reduction by randomized algorithms for numerical linear algebra, to identify significant components and visualize petabyte-scale data matrices (P. Drineas, CSCI)
• Parallel Factor Analysis for tensor systems provides a scalable solution, on AMOS, for a critical data-processing component of analytics for large graphs (B. Yener, CSCI)
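Randomized dimension reduction can be illustrated with a Gaussian random projection, one of the simplest randomized techniques: multiplying the data by a random matrix with entries scaled by 1/sqrt(k) approximately preserves pairwise distances (the Johnson-Lindenstrauss lemma). This is a stand-in to show the idea, not the specific algorithms developed in the cited work, and it is written in pure Python rather than for petabyte-scale matrices.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Map each d-dimensional row to k dimensions with a Gaussian random matrix.

    Entries are drawn from N(0, 1/k), so squared Euclidean distances are
    preserved in expectation (Johnson-Lindenstrauss).
    """
    d = len(rows[0])
    rng = random.Random(seed)
    scale = 1 / math.sqrt(k)
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    # Each output coordinate j is the dot product of the row with column j.
    return [[sum(r[i] * proj[i][j] for i in range(d)) for j in range(k)] for r in rows]
```

The target dimension k depends on the number of points and the tolerated distortion, not on the original dimension d, which is what makes the approach attractive for very wide data.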
Adding Semantics: Discovery Informatics
From “what if” to “why”
Scientific data: Microbiome informatics
• Human Biome
• Environmental Biome
• Built Environment
• Data Analytics
• Semantic Data Integration
While microbes are among the smallest organisms on the planet, they also exert the largest influence on mass and nutrient transport in the biosphere. They are the base of most natural ecosystems, as well as the purveyors of air and water quality. Microbes also primarily govern disease transfer and human health in our built environments.
Materials Processing Ontology (cMDIS/IDEA)
The materials field has made much progress in systematically understanding materials structure-to-property relationships, but it lacks an organized model of processing-to-property relations.
This is a critical need for the systematic development of new materials technologies!
Goal: Create a (machine-readable) ontology for materials processing.
By combining our expertise in data science, materials, and manufacturing, we are creating a key missing link in the Materials Genome Initiative.
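The processing-to-property links such an ontology would capture can be sketched as subject-predicate-object triples with a transitive query. This toy store is a stand-in for a real RDF/OWL ontology and reasoner; the predicate `influences` and the example terms in the test are hypothetical and illustrative only.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory subject-predicate-object store (toy stand-in for RDF)."""

    def __init__(self):
        self.by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self.by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Direct objects of `subject` under `predicate`."""
        return [o for p, o in self.by_subject.get(subject, []) if p == predicate]

    def reachable(self, subject, predicate):
        """All objects reachable from `subject` by following `predicate` repeatedly."""
        seen, stack = set(), [subject]
        while stack:
            for obj in self.objects(stack.pop(), predicate):
                if obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen
```

Chaining `Annealing influences GrainSize` with `GrainSize influences YieldStrength` lets a query from a processing step reach a property it only affects indirectly, which is the kind of processing-to-property link the slide says is missing.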
Some questions need a qualitative answer
Platform for Experimental Collaborative Ethnography
Discovery Informatics Requires Unstructured Data
• Integration of text analytics, natural language processing, network-based multimedia analysis, and structured/unstructured data integration
• Requires unstructured data (real-time feeds, images, video)
DOE SEAB report on HPC:
How might a neuromorphic “accelerator” type processor be used to improve the application performance, power consumption and overall system reliability of future exascale systems?
• Power Consumption (w/IBM)
• Network Learning (sensors)
• Sparse Distributed Representations
• Hybrid Neural/Symbolic Systems
Neuromorphic Computing: software systems that implement models inspired by neural systems to analyze data tied to perception, motor control, or multisensory integration.
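A sparse distributed representation (SDR) encodes an item as a small set of active bits in a large binary vector, and similarity is measured by how many active bits two representations share. The sizes below (2048 bits, 40 active) and the helper names are assumptions chosen for illustration; the slide does not say how the project uses SDRs.

```python
import random


def random_sdr(size=2048, active=40, seed=None):
    """An SDR stored as the set of indices of its active bits
    (about 2% sparsity with the default sizes)."""
    rng = random.Random(seed)
    return frozenset(rng.sample(range(size), active))


def overlap(a, b):
    """Similarity between two SDRs: the number of shared active bits."""
    return len(a & b)


def noisy_copy(sdr, flips, size=2048, seed=None):
    """Move `flips` active bits to randomly chosen inactive positions."""
    rng = random.Random(seed)
    keep = set(rng.sample(sorted(sdr), len(sdr) - flips))
    inactive = [i for i in range(size) if i not in sdr]
    return frozenset(keep | set(rng.sample(inactive, flips)))
```

A copy with 10 of its 40 bits moved still shares 30 active bits with the original, whereas two unrelated random SDRs of this size share fewer than one bit on average, which is why SDRs tolerate noise well.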
Neuromorphic Computing (CCI/IDEA)
• Joint CCI/IDEA project using a supercomputer to model state-of-the-art neuromorphic processors
  • Use for improving AMOS energy use (akin to autonomic control)
  • Use for exploring inputs from data-sensing systems (extrinsic control)
• Neuromorphic computing requires critical Rensselaer technologies
  • Integrating data analytics (on the fly) with simulation and modeling
  • CCI (AMOS) allows us to explore new variants of neuromorphic approaches
  • IDEA provides learning models and analytics capabilities for evaluation
  • Together, these allow us to attack audio/visual streaming data
Theme 3: Data Exploration
From “why” to “what is”
From visualization to exploration
… Unfortunately, visualization too often becomes an end product of scientific analysis,
rather than an exploration tool that scientists can use throughout the research life cycle.
However, new database technologies, coupled with emerging Web-based technologies,
may hold the key to lowering the cost of visualization generation and allow it to become
a more integral part of the scientific process.
From what is, to what if, to why (and back)
These capabilities are critical in “closing the loop” between data,
simulation and modeling in scientific discovery, engineering
design, and business innovation.
A “Data Science” Research Agenda
Multiscale, Sparsity, Abductive, Agent-oriented
• Gathering and representing information from multiple sources
  • topic of CODS talk
• Systematic (and scalable) methods for predictive analytics
  • example: parallel search for the best kernel functions
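A parallel search over candidate kernels can be sketched by scoring each candidate concurrently with a leave-one-out error and keeping the best. The model (Nadaraya-Watson kernel regression), the RBF kernel family, and the gamma values are hypothetical choices for illustration, not the method referred to on the slide.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rbf(gamma):
    """Return a 1-D radial basis function kernel with width parameter gamma."""
    return lambda x, y: math.exp(-gamma * (x - y) ** 2)


def loo_error(kernel, xs, ys):
    """Mean leave-one-out squared error of Nadaraya-Watson kernel regression."""
    err = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        weights = [kernel(x, xj) for j, xj in enumerate(xs) if j != i]
        targets = [yj for j, yj in enumerate(ys) if j != i]
        total = sum(weights)
        pred = sum(w * t for w, t in zip(weights, targets)) / total if total > 0 else 0.0
        err += (pred - y) ** 2
    return err / len(xs)


def best_kernel(candidates, xs, ys, workers=4):
    """Score every named candidate kernel concurrently; return (best name, all scores)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(loo_error, k, xs, ys) for name, k in candidates.items()}
        scores = {name: f.result() for name, f in futures.items()}
    return min(scores, key=scores.get), scores
```

Threads keep the example short, but CPU-bound pure-Python work does not run in parallel under CPython's global interpreter lock; a `ProcessPoolExecutor` with top-level (picklable) scoring functions would be needed for a real speedup.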
Supporting the Scientific agenda
• New Data Exploration platforms
  • example: patent pending on a new multi-user collaborative device
• Cognitive and immersive platforms
• Data sharing standards
  • Research Data Alliance
  • W3C
The Rensselaer IDEA
Summary
• Data is not just the “oil” of the new generation
  • information is the new power source generated from that “oil”
• Using data for prediction is becoming less of an art, but it still needs systematicity
  • Scaling tools beyond MapReduce
  • Better methods for rapid customization
• Turning data into causal or design knowledge is in its early stages
  • Closing the loop from data to design requires new informatics, new mathematics, and new ways of thinking beyond data mining

 
Future of the World WIde Web (India)
Future of the World WIde Web (India)Future of the World WIde Web (India)
Future of the World WIde Web (India)James Hendler
 

Plus de James Hendler (20)

Knowing what AI Systems Don't know and Why it matters
Knowing what AI  Systems Don't know and Why it mattersKnowing what AI  Systems Don't know and Why it matters
Knowing what AI Systems Don't know and Why it matters
 
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
Exploring the Boundaries of Artificial Intelligence (or "Modern AI")
 
Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)Tragedy of the Data Commons (ODSC-East, 2021)
Tragedy of the Data Commons (ODSC-East, 2021)
 
Tragedy of the (Data) Commons
Tragedy of the (Data) CommonsTragedy of the (Data) Commons
Tragedy of the (Data) Commons
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
 
The Future(s) of the World Wide Web
The Future(s) of the World Wide WebThe Future(s) of the World Wide Web
The Future(s) of the World Wide Web
 
Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs Enhancing Precision Wellness with Personal Health Knowledge Graphs
Enhancing Precision Wellness with Personal Health Knowledge Graphs
 
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
The Future of AI: Going BeyondDeep Learning, Watson, and the Semantic WebThe Future of AI: Going BeyondDeep Learning, Watson, and the Semantic Web
The Future of AI: Going Beyond Deep Learning, Watson, and the Semantic Web
 
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...Capacity Building: Data Science in the University  At Rensselaer Polytechnic ...
Capacity Building: Data Science in the University At Rensselaer Polytechnic ...
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
 
KR in the age of Deep Learning
KR in the age of Deep LearningKR in the age of Deep Learning
KR in the age of Deep Learning
 
Digital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AIDigital Archiving, The Semantic Web, and Modern AI
Digital Archiving, The Semantic Web, and Modern AI
 
The Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of MetadataThe Unreasonable Effectiveness of Metadata
The Unreasonable Effectiveness of Metadata
 
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
Artificial Intelligence: Existential Threat or Our Best Hope for the Future?
 
Facilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic MarkupFacilitating Web Science Collaboration through Semantic Markup
Facilitating Web Science Collaboration through Semantic Markup
 
Why Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspectiveWhy Watson Won: A cognitive perspective
Why Watson Won: A cognitive perspective
 
Watson at RPI - Summer 2013
Watson at RPI - Summer 2013Watson at RPI - Summer 2013
Watson at RPI - Summer 2013
 
Future of the World WIde Web (India)
Future of the World WIde Web (India)Future of the World WIde Web (India)
Future of the World WIde Web (India)
 

Dernier

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Dernier (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

The Science of Data Science

  • 1. The Science of Data Science (Data plus Semantics yields Knowledge). Prof. James Hendler, Tetherless World Constellation Chair of Computer, Web and Cognitive Sciences; Director, The Rensselaer IDEA
  • 2. The Rensselaer Institute for Data Exploration and Applications: Performance Plan to Budget Presentation, February 2015. The Rensselaer Institute for Data Exploration and Applications (IDEA) is a breakthrough initiative that brings together key research areas and advanced technologies to revolutionize the way we use data in science, engineering, and virtually every other research and educational discipline. By bridging the gaps between analytics, modeling, and simulation, we continue the Rensselaer tradition as a leader in applying critical technologies to improving everyday life and meeting the challenges of the future.
  • 3. The Rensselaer Institute for Data Exploration and Applications: Business Systems; Built and Natural Environments; Cyber-Resiliency; Policy, Ethics and Stewardship; Materials Informatics; Data-driven Physical/Life Sciences; Healthcare Analytics and Mobile Health; Social Network Analytics; Agents and Augmented Reality.
  • 4. IDEA project examples: • Healthcare in Context: data mining/analytics to improve public health from a systems perspective, at scales from the individual to the national. • Data-Centric Engineering Design: data-driven design and control under uncertainty via data fusion across multiple scales and sources. • Supply Chain Resilience through Information Visibility: demonstrate uses of supply chain information visibility for anticipating, mitigating and recovering from disruptive events. • Accelerated design of functional materials/Material Ontology: address basic materials processing data-based informatics for complex, multifunctional (often nano) materials. • Biome-informatics: develop data aggregation and computational tools to integrate disparate datasets into large ecosystem models, using data collected on the microbial communities that inhabit the base of most ecosystems. • Deducing Structure to Function in Biomedicine: develop systematic data-resourced methods for discovering and exploiting structure-to-function relationships.
  • 5. KDD Pipeline – as usually presented: Data Storage (Big Data Warehouse)
  • 6. KDD Pipeline – in the real world: Data is increasingly brought in from external sources, with mixed provenance, increasingly outside the analysts’ control, and at increasing rates and scales. Sources (sensors and apps, social media, customer behaviors, Web partners, open data) each feed separate data stores, which require formatting, standards use, data cleansing, data bias analysis, …
  • 7. Tough data integration challenges: enterprise analytics, open data integration. Hard problems!
  • 8. Closing the loop on (big) data: IDEA is focusing on key data science areas that are revolutionizing engineering, science and business with significant social impact: Predictive Analytics, Discovery Informatics, and Data Exploration.
  • 9. Theme 1: Predictive Analytics. From “what is” to “what if”
  • 10. Example: Healthcare Data Analytics. The Digital Universe of Data to Better Diagnose and Treat Patients (courtesy of Eric Schadt, Mount Sinai).
  • 11. Identifying predictive features in data: Each factor must be separately analyzed for its “predictivity” (e.g., with a mutual information measure). The “black art” of predictive analytics is finding the right ones: use too few and the model is weak; use too many and the model becomes slow and dominated by noise. Algorithms are required because the overwhelming number of “weak” factors defies the human ability to combine them: machine learning identifies the key features (some require “roll ups”, some require “pull outs”), and mathematical techniques then reduce the dimensionality.
  • 12. Predictive analytics in sensors: Extend-o-hand (Josh Shinavier, PhD). Classification of the sensor data (via machine learning) allows predictive recognition of different gestures, i.e., before the gesture is finished.
  • 13. Predictive analytics in large-scale behaviors: “List clusters at risk for Asian clams <1 mile from Cook’s Bay.” Machine learning predicts future distributions of invasive species in Lake George based on current distributions and bathymetry similarity.
  • 14. Predictive Social Network Analytics (with the RPI NeST center): social networks in action; analyzing cascading failures; modeling (supply chain) networks… and predicting (cascading) network risks; modeling network stressors (including the human cognitive element); understanding network dynamics.
  • 15. Data Science Research Center: tools for data analytics. Theory & algorithms: randomized, optimization, approximation, multilinear algebra. Applications. Statistics: multivariate analysis, optimal experimental design. Dimension reduction by randomized algorithms for numerical linear algebra, for identifying significant components and visualizing petabyte-scale data matrices (P. Drineas, CSCI). Parallel Factor Analysis for tensor systems creates a scalable solution, on AMOS, for a critical data-processing component of data analytics for large graphs (B. Yener, CSCI). Computational concerns: scaling; cyber security for data.
  • 16. Adding Semantics: Discovery Informatics. From “what if” to “why”
  • 17. Scientific data: Microbiome informatics (human biome, environmental biome, built environment; data analytics; semantic data integration). While microbes are among the smallest organisms on the planet, they also have the largest influence on mass and nutrient transport in the biosphere. They are the base of most natural ecosystems, as well as the purveyors of air and water quality. It is also microbes that primarily govern disease transfer and human health in our built environments.
  • 18. Materials Processing Ontology (cMDIS/IDEA): The materials field has made much progress on systematically understanding materials structure-to-property relationships, but lacks an organized model of processing-to-property relations. A critical need for systematic development of new materials technologies! Goal: create a (machine-readable) ontology for materials processing. By combining our expertise in data science, materials and manufacturing, we are creating a key missing link in the Materials Genome Initiative.
  • 19. Some questions need a qualitative answer: Platform for Experimental Collaborative Ethnography (PECE).
  • 20. Discovery informatics requires unstructured data: integration of text analytics, natural language processing, network-based multimedia analysis and structured/unstructured data integration.
  • 21. Requires unstructured data (real-time feeds/images/video). DOE SEAB report on HPC: “How might a neuromorphic ‘accelerator’ type processor be used to improve the application performance, power consumption and overall system reliability of future exascale systems?” Topics: power consumption (with IBM), network learning (sensors), sparse distributed representations, hybrid neural/symbolic systems. Neuromorphic computing: software systems that implement models inspired by neural systems to analyze data tied to perception, motor control, or multisensory integration.
  • 22. Neuromorphic Computing (CCI/IDEA): a joint CCI/IDEA project to use the supercomputer to model state-of-the-art neuromorphic processors, both for improving AMOS energy use (like autonomic control) and for exploring inputs from data-sensing systems (extrinsic control). Neuromorphic computing requires critical Rensselaer technologies that integrate data analytics (on the fly) with simulation and modeling: CCI (AMOS) allows us to explore new variants on neuromorphic approaches, and IDEA provides learning models and analytics capabilities for evaluation. Together, they allow us to attack audio/visual streaming data.
  • 23. Theme 3: Data Exploration. From “why” to “what is”
  • 24–25. From visualization to exploration: Unfortunately, visualization too often becomes an end product of scientific analysis, rather than an exploration tool that scientists can use throughout the research life cycle. However, new database technologies, coupled with emerging Web-based technologies, may hold the key to lowering the cost of visualization generation, allowing it to become a more integral part of the scientific process.
  • 26. From what is, to what if, to why (and back): These capabilities are critical in “closing the loop” between data, simulation and modeling in scientific discovery, engineering design, and business innovation.
  • 27. A “Data Science” Research Agenda: multiscale, sparsity, abductive, agent-oriented.
  • 28. Supporting the scientific agenda: gathering and representing information from multiple sources (topic of CODS talk); systematic (and scalable) methods for predictive analytics (example: parallel search for the best kernel functions); new data exploration platforms (example: patent pending on a new multi-user collaborative device); cognitive and immersive platforms; data sharing standards (Research Data Alliance, W3C).
  • 29. The Rensselaer IDEA summary: Data is not just the “oil” of the new generation; information is the new power source generated from that “oil”. Using data for prediction is becoming less of an art, but still needs systematicity: scaling tools beyond MapReduce, and better methods for rapid customization. Turning data into causal or design knowledge is in its early stages. Closing the loop from data to design requires new informatics, new mathematics, and new ways of thinking beyond data mining.
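Slide 11 names a mutual information measure for scoring each factor's “predictivity”. The Python below is a minimal sketch of that scoring step, assuming discrete (indicator or categorical) features; continuous features would first need binning or a dedicated estimator. The feature names and data in the usage section are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, label):
    """Mutual information I(X; Y), in bits, between two discrete sequences."""
    if not feature or len(feature) != len(label):
        raise ValueError("feature and label must be non-empty and of equal length")
    n = len(feature)
    joint = Counter(zip(feature, label))
    count_x = Counter(feature)
    count_y = Counter(label)
    mi = 0.0
    for (x, y), c_xy in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c_xy / n) * math.log2(c_xy * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, label):
    """Score every feature column against the label, most predictive first."""
    scores = {name: mutual_information(values, label) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical binary outcome and three hypothetical indicator features.
    outcome = [1, 1, 1, 1, 0, 0, 0, 0]
    features = {
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],
        "weak_signal": [1, 1, 1, 0, 0, 0, 0, 1],
        "strong_signal": [1, 1, 1, 1, 0, 0, 0, 0],
    }
    for name, score in rank_features(features, outcome):
        print(f"{name}: {score:.3f} bits")
```

Ranking is only the first step the slide describes: picking a cut-off from the ranked list is where the “too few” versus “too many” trade-off is made, typically by validating models built on the top-ranked features.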
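Slide 15 mentions dimension reduction by randomized algorithms for numerical linear algebra. One standard technique in that family is length-squared column sampling: keep only c randomly chosen, rescaled columns of a wide matrix A so that the small sketch C satisfies E[C C^T] = A A^T. The Python below illustrates that technique only; it is an assumption, not a description of the specific algorithms in the work cited on the slide, and the matrix size and seed are arbitrary.

```python
import math
import random


def sample_columns(a, c, rng):
    """Length-squared sampling of c columns from a row-major matrix `a`.

    Column j is drawn (with replacement) with probability p_j proportional
    to its squared norm and rescaled by 1/sqrt(c * p_j), which makes
    C C^T an unbiased estimator of A A^T."""
    n_cols = len(a[0])
    norms_sq = [sum(row[j] ** 2 for row in a) for j in range(n_cols)]
    total = sum(norms_sq)
    if total == 0:
        raise ValueError("matrix has no non-zero entries")
    probs = [v / total for v in norms_sq]
    picks = rng.choices(range(n_cols), weights=probs, k=c)
    sketch = [[0.0] * c for _ in a]
    for t, j in enumerate(picks):
        scale = 1.0 / math.sqrt(c * probs[j])
        for i, row in enumerate(a):
            sketch[i][t] = row[j] * scale
    return sketch


def gram(m):
    """Return M M^T for a row-major matrix."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in m] for r1 in m]


def relative_error(exact, approx):
    """Frobenius-norm error of `approx`, relative to the norm of `exact`."""
    diff = sum((e - a) ** 2 for row_e, row_a in zip(exact, approx) for e, a in zip(row_e, row_a))
    base = sum(e ** 2 for row in exact for e in row)
    return math.sqrt(diff / base)


if __name__ == "__main__":
    rng = random.Random(0)
    # A hypothetical 5 x 2000 data matrix: 5 variables, 2000 observations.
    a = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(5)]
    exact = gram(a)
    for c in (50, 200, 800):
        approx = gram(sample_columns(a, c, rng))
        print(f"c={c:4d}  relative error={relative_error(exact, approx):.3f}")
```

The error typically shrinks roughly like 1/sqrt(c) as more columns are kept. The saving is that computing C C^T, or the left singular vectors of C (the significant components), touches only c columns instead of all of them.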
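Slide 18 sets the goal of a machine-readable ontology linking materials processing to properties. As a minimal sketch of why that link matters, the Python below stores facts as subject-predicate-object triples (the data model behind RDF) and answers processing-to-property questions by chaining two relations. The predicate names (`produces`, `exhibits`) and the example facts are illustrative assumptions, not terms from the cMDIS/IDEA ontology; a real system would use an RDF store and a query language such as SPARQL.

```python
from collections import defaultdict


class TripleStore:
    """A tiny in-memory store of (subject, predicate, object) facts."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def subjects(self):
        return sorted(self._by_subject)

    def objects(self, subject, predicate):
        facts = self._by_subject.get(subject, set())  # .get avoids inserting empty keys
        return sorted(o for p, o in facts if p == predicate)


def properties_from_process(store, process):
    """Chain process -produces-> structure -exhibits-> property."""
    found = set()
    for structure in store.objects(process, "produces"):
        found.update(store.objects(structure, "exhibits"))
    return sorted(found)


def processes_for_property(store, prop):
    """Inverse question: which processes lead (via some structure) to `prop`?"""
    return [s for s in store.subjects() if prop in properties_from_process(store, s)]


if __name__ == "__main__":
    kb = TripleStore()
    # Illustrative, simplified facts with hypothetical identifiers.
    kb.add("annealing", "produces", "grain_growth")
    kb.add("grain_growth", "exhibits", "lower_yield_strength")
    kb.add("cold_rolling", "produces", "high_dislocation_density")
    kb.add("high_dislocation_density", "exhibits", "higher_yield_strength")
    print(properties_from_process(kb, "annealing"))  # ['lower_yield_strength']
    print(processes_for_property(kb, "higher_yield_strength"))  # ['cold_rolling']
```

The inverse query is the design-oriented question: starting from a desired property and asking which processing routes could produce it.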
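Slide 28 gives “parallel search for best kernel functions” as an example of systematic, scalable predictive analytics. The Python below is one hedged reading of that idea, not the method used at IDEA: it scores a grid of (kernel, bandwidth) pairs for Nadaraya-Watson kernel regression by leave-one-out error and evaluates the grid concurrently. The kernel set, the bandwidth grid, the leave-one-out criterion, and the data are all assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Candidate kernels (unnormalized; the constant cancels in Nadaraya-Watson).
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u),
    "epanechnikov": lambda u: max(0.0, 1.0 - u * u),
    "triangular": lambda u: max(0.0, 1.0 - abs(u)),
}


def loo_error(xs, ys, kernel, bandwidth):
    """Leave-one-out mean squared error of Nadaraya-Watson kernel regression."""
    k = KERNELS[kernel]
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if i == j:
                continue
            w = k((xi - xj) / bandwidth)
            num += w * yj
            den += w
        if den == 0.0:
            return math.inf  # a point has no neighbours inside the kernel's support
        total += (yi - num / den) ** 2
    return total / len(xs)


def best_kernel(xs, ys, bandwidths, workers=4):
    """Score every (kernel, bandwidth) pair concurrently; return the best pair and its error."""
    grid = list(product(KERNELS, bandwidths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        errors = list(pool.map(lambda pair: loo_error(xs, ys, *pair), grid))
    best = min(range(len(grid)), key=lambda idx: errors[idx])
    return grid[best], errors[best]


if __name__ == "__main__":
    # Hypothetical 1-D data: a smooth trend plus a high-frequency wiggle.
    xs = [i / 20 for i in range(60)]
    ys = [math.sin(2 * x) + 0.1 * math.cos(17 * x) for x in xs]
    (kernel, bw), err = best_kernel(xs, ys, [0.05, 0.1, 0.2, 0.4])
    print(f"best kernel={kernel}, bandwidth={bw}, LOO MSE={err:.5f}")
```

Threads keep the sketch portable, but CPython's global interpreter lock limits speed-ups for pure-Python numeric work. Swapping in `concurrent.futures.ProcessPoolExecutor` (with module-level functions instead of lambdas, since work sent to other processes must be picklable) or a cluster scheduler is the usual route to real parallelism.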

Editor's Notes

  1. Ones with numbers are secondary diagnosis indicator variables; * indicates categorical variables. In practice, during modeling there is one “predictivity” index.
  2. Working with faculty from SoS, SoE, HASS and SoA
  3. (in the UCTE power grid network, employing capacity-limited current flows in resistor networks).
  4. Put it on a slide: show an example of someone using it. The current version of PECE is running over 4 different projects (Disaster STS Network, The Asthma Files, World PECE, World Academia). The largest site (DSTS Network) has 35 users from universities all over the country and 14 different user groups. “Feature” lists of movable functionality and modules can be ported to any Drupal site.
  5. This is the data science agenda. Basically, these are the hard problems in closing the loop: how to go from the correlation on one side to the causal on the other. I don’t love the term agent-oriented, but we mean a combination of unstructured data, AI, etc. Abductive is usually where I talk about these being hard inverse problems, where we don’t know a specific function but rather are looking for an explanation.
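Slide 14 lists analyzing cascading failures, and one of the editor's notes (note 3) refers to capacity-limited current flows in resistor networks on the UCTE power grid. That flow model is beyond a short example, so the Python below swaps in a much simpler load-redistribution cascade (each failed node's load is shared equally among its surviving neighbours) to show the core mechanism: a local failure propagates once neighbours exceed their capacity. The graph, loads, and capacities are hypothetical.

```python
from collections import deque


def cascade(adjacency, load, capacity, initial_failures):
    """Simulate a simple load-redistribution cascade; return the set of failed nodes.

    A failed node hands its current load, in equal shares, to its surviving
    neighbours; any neighbour whose load then exceeds its capacity fails too."""
    load = dict(load)  # work on a copy so the caller's loads are untouched
    failed = set(initial_failures)
    queue = deque(failed)
    while queue:
        node = queue.popleft()
        alive = [n for n in adjacency[node] if n not in failed]
        if not alive:
            continue  # nowhere to shed load; it is simply lost
        share = load[node] / len(alive)
        load[node] = 0.0
        for n in alive:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                queue.append(n)
    return failed


if __name__ == "__main__":
    # Hypothetical five-node chain 0-1-2-3-4 with unit load everywhere.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    loads = {n: 1.0 for n in chain}
    for tolerance in (0.6, 0.4):
        caps = {n: (1 + tolerance) * loads[n] for n in chain}
        print(tolerance, sorted(cascade(chain, loads, caps, [2])))
    # 0.6 [2]
    # 0.4 [0, 1, 2, 3, 4]
```

A small change in the capacity margin flips the outcome from a contained single failure to the loss of every node, which is why predicting cascading risk needs a network model rather than per-node analysis.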