Bill Howe
Information School
Computer Science & Engineering
University of Washington
Big Data + Big Sim:
Query Processing over
Unstructured CFD Models
8/7/2017 Bill Howe, UW 1
Scott Moe
Applied Math
University of Washington
This morning…
• Data-intensive science in oceanography
• Background on databases and query
algebras
• Regridding: Integrating ocean models using
a database-style algebra
• If time: Responsible data science
Motivation Algebraic Optimization Regridding End
My position for this talk…
• Simulations are sources of data
• Analysis requires querying across
heterogeneous data sources, including
simulations
• The CS database community has the
right set of concepts and approaches
…but ultimately we’re just plumbers
The Fourth Paradigm
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Jim Gray
Nearly every field of discovery is transitioning
from “data poor” to “data rich”
Astronomy: LSST
Physics: LHC
Oceanography: OOI
Social Sciences
Biology: Sequencing
Economics
Neuroscience: EEG, fMRI
Complex
System
“Little linear windows”
Academic research
Practitioners
One view of “data science” is to streamline the discovery, interpretation,
and operationalization of semi-robust local patterns that have predictive
power for some task.
In general, these don’t exist. But in specific situations, they do.
slide: John Delaney, UW
Regional Scale Nodes
John Delaney
10s of Gigabits/second from the ocean floor
17 federal organizations named as partners
11 Regional Associations
“a strategy for incorporating observation systems from …
near shore waters as part of … a network of observatories.”
Center for Coastal Margin
Observation and Prediction (CMOP)
Antonio Baptista
Virtual Mekong Basin
img src: Mark Stoermer, UW Center for Environmental Visualization
Jeff Richey
So what?
• Geosciences are transitioning from
expedition-based to observatory-based
science
• Enormous investments in integrating
sensors and models
• The big problem: ad hoc queries over
large, heterogeneous, distributed datasets
and models
So what do we do about querying across
heterogeneous sources?
Raise the level of abstraction and let the
system handle the details
Digression: Relational Database History
Pre-Relational: if your data changed, your application broke.
Early RDBMS were buggy and slow (and often reviled), but
required only 5% of the application code.
“Activities of users at terminals and most application programs should
remain unaffected when the internal representation of data is changed and
even when some aspects of the external representation are changed.”
-- Codd 1970
Key Idea: Programs that manipulate tabular data exhibit an algebraic
structure allowing reasoning and manipulation independently of physical
data representation
Key Idea: An Algebra of Tables
Core operators: select, project, join
Other operators: aggregate, union, difference, cross product
Review: Algebraic Optimization
N = ((4*2)+((4*3)+0))/1
Algebraic Laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x
3. (*) distributes: (n*x+n*y) = n*(x+y)
4. (*) commutes: x*y = y*x
Apply rules 1, 3, 4, 2: N = (2+3)*4
two operations instead of five, no division operator
Same idea works with very large tables, but the payoff is much higher
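The rewrite above can be sketched as a tiny rule-based simplifier. This is an illustrative sketch only: the tuple encoding of expressions and the names `simplify`, `count_ops`, and `evaluate` are assumptions made for the example, not part of the talk.

```python
# Expressions are nested tuples (op, left, right) or plain integers.
# This encoding is a hypothetical choice made for illustration.

def simplify(e):
    """Apply the four algebraic laws bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = simplify(a), simplify(b)
    if op == "+" and b == 0:      # law 1: x + 0 = x
        return a
    if op == "/" and b == 1:      # law 2: x / 1 = x
        return a
    # law 3: n*x + n*y = n*(x + y), then law 4 to write it as (x + y)*n
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == "*" and b[0] == "*" and a[1] == b[1]):
        return ("*", ("+", a[2], b[2]), a[1])
    return (op, a, b)

def count_ops(e):
    """Number of operators in an expression tree."""
    if not isinstance(e, tuple):
        return 0
    return 1 + count_ops(e[1]) + count_ops(e[2])

def evaluate(e):
    """Evaluate an expression tree numerically."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    x, y = evaluate(a), evaluate(b)
    if op == "+":
        return x + y
    if op == "*":
        return x * y
    return x / y

# N = ((4*2)+((4*3)+0))/1
original = ("/", ("+", ("*", 4, 2), ("+", ("*", 4, 3), 0)), 1)
optimized = simplify(original)

print(optimized, count_ops(original), count_ops(optimized))
```

Both trees evaluate to the same number, but the optimized one has fewer operators and no division. A query optimizer does the same with operators over tables.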
Algebraic Optimization:
Find a better logical plan
δ
  Π x.name, z.name
    σ price>100 and city=‘Seattle’
      join cid=cid
        join pid=pid
          Product
          Purchase
        Customer
Product(pid, name, price)
Purchase(pid, cid, store)
Customer(cid, name, city)
SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = z.cid and
x.price > 100 and z.city = ‘Seattle’
Algebraic Optimization:
Find a better logical plan
δ
  Π x.name, z.name
    join cid=cid
      join pid=pid
        σ price>100
          Product
        Purchase
      σ city=‘Seattle’
        Customer
Query optimization =
finding cheaper,
equivalent expressions
SELECT DISTINCT x.name, z.name
FROM Product x, Purchase y, Customer z
WHERE x.pid = y.pid and y.cid = z.cid and
x.price > 100 and z.city = ‘Seattle’
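A small sketch of why pushing selections below the joins helps. The toy rows and the function names `plan_late_selection` and `plan_pushed_selection` are made up for illustration; only the schema and the query come from the slides. Each plan returns the query answer together with the number of intermediate rows it materialized.

```python
# Toy instances of the three relations (hypothetical data).
product = [  # Product(pid, name, price)
    {"pid": 1, "name": "laptop", "price": 1200},
    {"pid": 2, "name": "mouse", "price": 25},
    {"pid": 3, "name": "monitor", "price": 300},
]
purchase = [  # Purchase(pid, cid, store)
    {"pid": 1, "cid": 10, "store": "A"},
    {"pid": 2, "cid": 10, "store": "A"},
    {"pid": 3, "cid": 11, "store": "B"},
    {"pid": 1, "cid": 11, "store": "B"},
]
customer = [  # Customer(cid, name, city)
    {"cid": 10, "name": "Ann", "city": "Seattle"},
    {"cid": 11, "name": "Bob", "city": "Portland"},
]

def plan_late_selection():
    """Join everything first, then filter (first plan)."""
    xy = [(x, y) for x in product for y in purchase if x["pid"] == y["pid"]]
    xyz = [(x, y, z) for (x, y) in xy for z in customer
           if y["cid"] == z["cid"]]
    kept = [(x, z) for (x, y, z) in xyz
            if x["price"] > 100 and z["city"] == "Seattle"]
    # The set provides the DISTINCT (delta) step.
    return {(x["name"], z["name"]) for (x, z) in kept}, len(xy) + len(xyz)

def plan_pushed_selection():
    """Filter each input first, then join (second plan)."""
    x_sel = [x for x in product if x["price"] > 100]
    z_sel = [z for z in customer if z["city"] == "Seattle"]
    xy = [(x, y) for x in x_sel for y in purchase if x["pid"] == y["pid"]]
    xyz = [(x, y, z) for (x, y) in xy for z in z_sel
           if y["cid"] == z["cid"]]
    return {(x["name"], z["name"]) for (x, y, z) in xyz}, len(xy) + len(xyz)

late_result, late_rows = plan_late_selection()
pushed_result, pushed_rows = plan_pushed_selection()
print(late_result, late_rows, pushed_rows)
```

The two plans are equivalent expressions: they return the same answer, but the second touches fewer intermediate rows.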
Same logical expression, different physical
algorithms
Which is faster?
SELECT *
FROM Order o, Item i
WHERE o.order = i.order
Logical plan:
join (o.order = i.order)
  scan Order o
  scan Item i

Option 1 (nested loops), taking N = |Item| and M = |Order|:
for each record i in Item:            O(N)
  for each record o in Order:         O(M)
    if o.order = i.order:             O(1)
      emit (o, i)                     O(1)
overall: O(N*M)

Option 2 (hash join):
for each record i in Item:                      O(N)
  insert into hashtable                         O(1)
for each record o in Order:                     O(M)
  lookup corresponding records in hashtable     O(~1)
return matching pairs
overall: O(N+M)
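The two options can be sketched in Python as follows. This is a minimal illustration under assumed inputs: rows are dictionaries, and the sample data, the `sku` field, and the function names are hypothetical.

```python
from collections import defaultdict

def nested_loop_join(orders, items):
    """Option 1: compare every Item record with every Order record."""
    return [(o, i) for i in items for o in orders
            if o["order"] == i["order"]]

def hash_join(orders, items):
    """Option 2: build a hash table on Item, then probe it with Order."""
    table = defaultdict(list)
    for i in items:                       # build phase
        table[i["order"]].append(i)
    return [(o, i) for o in orders        # probe phase
            for i in table.get(o["order"], [])]

# Hypothetical sample data.
orders = [
    {"order": 1, "customer": "Ann"},
    {"order": 2, "customer": "Bob"},
    {"order": 3, "customer": "Cy"},
]
items = [
    {"order": 1, "sku": "a"},
    {"order": 1, "sku": "b"},
    {"order": 2, "sku": "c"},
    {"order": 4, "sku": "d"},
]

def by_key(pair):
    """Sort key so the two result lists can be compared."""
    return (pair[0]["order"], pair[1]["sku"])

nested = sorted(nested_loop_join(orders, items), key=by_key)
hashed = sorted(hash_join(orders, items), key=by_key)
print(len(nested), nested == hashed)
```

Both functions implement the same logical join; only the physical algorithm differs, which is the choice a query optimizer makes on the user's behalf.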

H0 : (x,y,b) V0 : (z)
A
restrict(0, z >b)
B
color is depth
Algebraic Manipulation of Scientific Datasets,
B. Howe, D. Maier, VLDBJ 2005

H0 : (x,y,b) V0 : ( )
apply(0, z=(surf  b) *  )
bind(0, surf)
C
color is salinity
GridFields: An Algebra of Meshes
Example (1)
H = Scan(context, "H")
rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H)
[Figures: H (full grid) and rH (restricted grid); color: bathymetry. Callouts on the Restrict call label its predicate and dimension arguments.]
8/7/2017 howeb@stccmop.org
Example: Transect
P
Transect: Bad Query Plan

H(x,y,b)
V(z)
r(z>b) b(s) regrid

P
P  V
1) Construct full-size 3D grid
2) Construct 2D transect grid
3) Interpolate 1) onto 2)
Transect: Optimized Plan
P  V
V(z)
P
H(x,y,b)
regrid b(s) regrid

1) Find 2D cells containing points
2) Create “stacks” of 2D cells carrying data
3) Create 2D transect grid
4) Interpolate 2) onto 3)
1) Find cells containing points in P
1) Find cells containing points in P
2) Construct “stacks” of cells
4) Interpolate
Transect: Results
[Bar chart: running time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; dataset: 800 MB (1 timestep)]
Back to integrating models:
What is the right abstraction?
• Claim: Everything reduces to regridding
• Model-data comparison (skill assessment)?
Regrid observations onto model mesh
• Model-model comparison?
Regrid one model’s mesh onto the other’s
• Model coupling?
Regrid a meso-scale atmospheric model onto your regional ocean model
• Visualization?
Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
Status Quo
• “FTP + MATLAB”
• “Nascent Databases”
– File-based, format-specific API
– UniData’s NetCDF, HDF5
– Some IO optimization, some indexing
• “Data Servers”
– Same as file-based systems, but with RPC support
– Example: Hyrax
None of this scales
- up with data volumes
- up with number of sources
- down with developer expertise
Summary so far
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Our goal
• Define a “universal regridding” operator with nice algebraic
properties
• Use it to implement efficient distributed data sharing applications,
parallel algorithms, and more
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of Algorithms
Washington
Oregon
Columbia River Estuary
SciDB
Hyrax
GridFields
ESMF
VTK/Paraview
easy; good support hard; poor support
Structured grids are easy
• The data model…
(Cartesian products of coordinate variables)
• …immediately implies a representation,
(multidimensional arrays)
• …an API,
(reading and writing subslabs)
• …and an efficient implementation
(address calculation using array “shape”)
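The address calculation mentioned above can be sketched in a few lines. This assumes a row-major (C-order) layout in a flat list; the function names `strides`, `address`, and `read_subslab` are hypothetical and not taken from any particular library.

```python
import itertools

def strides(shape):
    """Row-major strides, in elements, derived from the array shape."""
    out = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        out[d] = out[d + 1] * shape[d + 1]
    return out

def address(index, shape):
    """Flat offset of a multidimensional index."""
    return sum(i * s for i, s in zip(index, strides(shape)))

def read_subslab(flat, shape, start, count):
    """Read a rectangular sub-block given its start corner and extent."""
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    return [flat[address(idx, shape)] for idx in itertools.product(*ranges)]

shape = (2, 3, 4)
flat = list(range(2 * 3 * 4))   # each cell stores its own flat offset

print(strides(shape))
print(address((1, 2, 3), shape))
print(read_subslab(flat, shape, start=(0, 1, 2), count=(2, 1, 2)))
```

No index structure is needed: the shape alone determines where every cell lives. An unstructured grid has no such formula, which is why it needs explicit topology.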
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of Algorithms
Naïve Method: Interpolation (Spatial Join)
For each vertex in the target grid,
Find containing cell in the source grid,
Evaluate the basis functions to interpolate
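The three steps above can be sketched for a 2D triangular source mesh with linear basis functions, which for a triangle are the barycentric coordinates. The mesh, the names `barycentric` and `interpolate`, and the linear scan over cells are assumptions for illustration; a real system would locate the containing cell with a spatial index.

```python
def barycentric(p, tri):
    """Barycentric coordinates of point p in triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    return l1, l2, 1.0 - l1 - l2

def interpolate(target_points, vertices, cells, values, eps=1e-12):
    """For each target vertex: find the containing source cell, then
    evaluate the linear basis functions. Returns None outside the mesh."""
    result = []
    for p in target_points:
        value = None
        for cell in cells:                      # naive linear search
            tri = [vertices[v] for v in cell]
            weights = barycentric(p, tri)
            if all(w >= -eps for w in weights):  # p lies inside this cell
                value = sum(w * values[v] for w, v in zip(weights, cell))
                break
        result.append(value)
    return result

# Hypothetical source mesh: the unit square split into two triangles,
# carrying the linear field f(x, y) = x + 2y at its vertices.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0, 1, 2), (0, 2, 3)]
values = [x + 2 * y for x, y in vertices]

targets = [(0.5, 0.25), (0.25, 0.75), (2.0, 2.0)]
interpolated = interpolate(targets, vertices, cells, values)
print(interpolated)
```

Linear interpolation reproduces a linear field at the sampled points, but nothing in this procedure preserves the integral of the field, which is the motivation for conservative methods.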
Supermeshing [Farrell 10]
For each cell in the target grid,
Find overlapping cells in the source grid,
Compute their intersections
Derive new coefficients to minimize L2 norm
* Guaranteed Conservative
* Minimizes Error
But:
Domains must match exactly
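A heavily simplified 1D sketch of the idea, assuming piecewise-constant fields on interval meshes. In that special case the L2-optimal target coefficients are the overlap-weighted cell averages. [Farrell 10] addresses general unstructured meshes in higher dimensions; the function names and data here are hypothetical.

```python
def supermesh_regrid(src_edges, src_values, tgt_edges):
    """Conservative remap of a piecewise-constant 1D field.

    For each target cell, intersect it with every source cell and
    accumulate value * overlap length, then divide by the cell length."""
    tgt_values = []
    for t in range(len(tgt_edges) - 1):
        t_lo, t_hi = tgt_edges[t], tgt_edges[t + 1]
        integral = 0.0
        for s in range(len(src_edges) - 1):
            lo = max(t_lo, src_edges[s])
            hi = min(t_hi, src_edges[s + 1])
            if hi > lo:                      # the two cells overlap
                integral += src_values[s] * (hi - lo)
        tgt_values.append(integral / (t_hi - t_lo))
    return tgt_values

def total(edges, values):
    """Integral of a piecewise-constant field over its mesh."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values))

# Source and target meshes cover the same domain [0, 4].
src_edges = [0.0, 1.0, 2.0, 4.0]
src_values = [1.0, 3.0, 2.0]
tgt_edges = [0.0, 1.5, 4.0]

tgt_values = supermesh_regrid(src_edges, src_values, tgt_edges)
print(tgt_values, total(src_edges, src_values), total(tgt_edges, tgt_values))
```

The integral of the field is the same before and after regridding. If the target domain extended beyond the source, the uncovered part would contribute nothing to the integral and bias the averages, which illustrates why the domains must match.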
What are some complexities we want to
hide?
• Unstructured Grids
• Numerical Conservation
• Choice of algorithms
Finding mesh intersections
Commutativity of Regrid and Restrict:
Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y))
G0 = Regrid(Restrict0(X), Restrict0(Y))
G1 = Regrid(Restrict1(X), Restrict1(Y))
:
GN = Regrid(RestrictN(X), RestrictN(Y))
R = Stitch(G0, G1, ..., GN)
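A 1D sketch of how this identity enables partitioned, parallelizable regridding. It assumes piecewise-constant fields, an overlap-weighted `regrid`, a `restrict` that keeps cells overlapping an interval, and partition boundaries aligned with target cell edges. All three functions are hypothetical stand-ins, not the GridFields operators.

```python
def regrid(src_cells, tgt_cells):
    """Overlap-weighted remap. src_cells: (lo, hi, value); tgt_cells: (lo, hi)."""
    out = []
    for t_lo, t_hi in tgt_cells:
        integral = sum(v * max(0.0, min(t_hi, hi) - max(t_lo, lo))
                       for lo, hi, v in src_cells)
        out.append((t_lo, t_hi, integral / (t_hi - t_lo)))
    return out

def restrict(cells, lo, hi):
    """Keep the cells that overlap the interval [lo, hi)."""
    return [c for c in cells if c[0] < hi and c[1] > lo]

def stitch(*parts):
    """Concatenate independently regridded pieces."""
    return [c for part in parts for c in part]

X = [(0.0, 1.0, 1.0), (1.0, 2.0, 3.0), (2.0, 4.0, 2.0)]   # source field
Y = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.0)]                   # target cells

# Regrid everything at once.
whole = regrid(X, Y)

# Regrid each partition independently, then stitch. Each piece could
# run on a different worker.
partitions = [(0.0, 1.5), (1.5, 4.0)]
pieces = [regrid(restrict(X, lo, hi), restrict(Y, lo, hi))
          for lo, hi in partitions]
stitched = stitch(*pieces)

print(whole)
print(stitched)
```

Because each piece only needs the source cells that overlap its own region, no partition has to load the full source grid.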
“Lumping”
• Globally conservative
• Parallelizable
• Commutes with user-selected restrictions
• Masking to handle mismatched domains
Todos:
• Characterize the error relative to plain supermeshing
• Universal Regridding-as-a-Service
Outreach and Usage
• Code is available, but in transition to github
– Search “gridfields” on google code
– http://code.google.com/p/gridfields/
– C++ with Python bindings
• Integrated into the Hyrax Data Server
– OPULS project funded by NOAA
– Server-side processing of unstructured grids
• Other users
– US Geological Survey
– NOAA
• Screenshot of OPeNDAP demo
http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?
ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
Wrap up
• Integration of big data and big models is the game
• Database-style systems are about hiding complexity
and raising the level of abstraction
• A database-style query algebra for FEMs emphasizing
interpolation and regridding across data and models
made sense to us
• But more broadly: a richer infrastructure for comparing
and sharing model results and data
• One idea: “Virtual datasets” where the model is
executed in response to queries, perhaps with simpler
grids and relaxed assumptions
Propublica, May 2016
Motivation Regridding Supermeshing Database Algebras Evaluation Numerical conservation
Responsible Data Science
The Special Committee on Criminal Justice Reform's
hearing of reducing the pre-trial jail population.
Technical.ly, September 2016
Philadelphia is grappling with the prospect of a racist computer algorithm
Any background signal in the
data of institutional racism is
amplified by the algorithm
operationalized by the algorithm
legitimized by the algorithm
“Should I be afraid of risk assessment tools?”
“No, you gotta tell me a lot more about yourself.
At what age were you first arrested?
What is the date of your most recent crime?”
“And what’s the culture of policing in the
neighborhood in which I grew up in?”
Amazon Prime Now Delivery Area: Atlanta Bloomberg, 2016
Amazon Prime Now Delivery Area: Boston Bloomberg, 2016
Amazon Prime Now Delivery Area: Chicago Bloomberg, 2016
The way I think about this… (1)
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?
The way I think about this…. (2)
Decisions are based on two sources of information:
1. Past examples
e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints
e.g., “we must avoid racial discrimination”
8/7/2017 Data, Responsibly / SciTech NW 62
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints
The way I think about this… (3)
How do we apply societal constraints to algorithmic
decision-making?
Option 1: Rely on human oversight
Ex: EU General Data Protection Regulation requires that a
human be involved in legally binding algorithmic decision-making
Ex: Wisconsin Supreme Court says a human must review
algorithmic decisions made by recidivism models
Issues with scalability, prejudice
Option 2: Build systems to help enforce these constraints
This is the approach we are exploring
The way I think about this…(4)
On transparency vs. accountability:
• For human decision-making, sometimes explanations are
required, improving transparency
– Supreme court decisions
– Employee reprimands/termination
• But when transparency is difficult, accountability takes over
– medical emergencies, business decisions
• As we shift decisions to algorithms, we lose both
transparency AND accountability
• “The buck stops where?”
So what do we do about it?
Fides: A platform for responsible data science
joint with Stoyanovich [US], Abiteboul [FR], Miklau [US], Sahuguet [US], Weikum [DE]
Data Curation
novel features to support:
• Fairness
• Accountability
• Transparency
• Privacy
• Reproducibility

Contenu connexe

Tendances

President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017Jongwook Woo
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflowsSSSW
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
Data Trajectories: tracking the reuse of published data for transitive credi...
Data Trajectories: tracking the reuse of published datafor transitive credi...Data Trajectories: tracking the reuse of published datafor transitive credi...
Data Trajectories: tracking the reuse of published data for transitive credi...Paolo Missier
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Geoffrey Fox
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big DataArjen de Vries
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Sciencedatasciencekorea
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsJason Riedy
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)Toshiyuki Shimono
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and StatisticsStructured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and StatisticsWes McKinney
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesCodePolitan
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Dippy Aggarwal
 

Tendances (20)

Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Data Trajectories: tracking the reuse of published data for transitive credi...
Data Trajectories: tracking the reuse of published datafor transitive credi...Data Trajectories: tracking the reuse of published datafor transitive credi...
Data Trajectories: tracking the reuse of published data for transitive credi...
 
Big data
Big dataBig data
Big data
 
Big Data - Gerami
Big Data - GeramiBig Data - Gerami
Big Data - Gerami
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
PUC Masterclass Big Data
PUC Masterclass Big DataPUC Masterclass Big Data
PUC Masterclass Big Data
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Science
 
V3 i35
V3 i35V3 i35
V3 i35
 
Sildes big-data-ia-may
Sildes big-data-ia-maySildes big-data-ia-may
Sildes big-data-ia-may
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
 
NoSQL (Not Only SQL)
NoSQL (Not Only SQL)NoSQL (Not Only SQL)
NoSQL (Not Only SQL)
 
A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)A Hacking Toolset for Big Tabular Files (3)
A Hacking Toolset for Big Tabular Files (3)
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
Structured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and StatisticsStructured Data Challenges in Finance and Statistics
Structured Data Challenges in Finance and Statistics
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Machine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & OpportunitiesMachine Learning - Challenges, Learnings & Opportunities
Machine Learning - Challenges, Learnings & Opportunities
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...
 

Similaire à Big Data + Big Sim: Query Processing over Unstructured CFD Models

MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)University of Washington
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overviewdgarijo
 
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & ManagementAstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & ManagementNeo4j
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyRichard Zijdeman
 
AutoML for Data Science Productivity and Toward Better Digital Decisions
AutoML for Data Science Productivity and Toward Better Digital DecisionsAutoML for Data Science Productivity and Toward Better Digital Decisions
AutoML for Data Science Productivity and Toward Better Digital DecisionsSteven Gustafson
 
H2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgH2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgSri Ambati
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingUniversity of Washington
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
 

Similaire à Big Data + Big Sim: Query Processing over Unstructured CFD Models (20)

MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
MMDS 2014: Myria (and Scalable Graph Clustering with RelaxMap)
 
EDBT 2015: Summer School Overview
EDBT 2015: Summer School OverviewEDBT 2015: Summer School Overview
EDBT 2015: Summer School Overview
 
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
CLIM Program: Remote Sensing Workshop, Foundations Session: A Discussion - Br...
 
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & ManagementAstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management
AstraZeneca - Re-imagining the Data Landscape in Compound Synthesis & Management
 
Kenett On Information NYU-Poly 2013
Kenett On Information NYU-Poly 2013Kenett On Information NYU-Poly 2013
Kenett On Information NYU-Poly 2013
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
AutoML for Data Science Productivity and Toward Better Digital Decisions
AutoML for Data Science Productivity and Toward Better Digital DecisionsAutoML for Data Science Productivity and Toward Better Digital Decisions
AutoML for Data Science Productivity and Toward Better Digital Decisions
 
Bill howe 2_databases
Bill howe 2_databasesBill howe 2_databases
Bill howe 2_databases
 
H2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgH2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! Aalborg
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
 
Slides barcelona risk data
Slides barcelona risk dataSlides barcelona risk data
Slides barcelona risk data
 
Vldb14
Vldb14Vldb14
Vldb14
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
Democratizing Data Science in the Cloud
Democratizing Data Science in the CloudDemocratizing Data Science in the Cloud
Democratizing Data Science in the Cloud
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 
19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE19CS3052R-CO1-7-S7 ECE
19CS3052R-CO1-7-S7 ECE
 
Why Data Science is a Science
Why Data Science is a ScienceWhy Data Science is a Science
Why Data Science is a Science
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Bicod2017
Bicod2017Bicod2017
Bicod2017
 

Plus de University of Washington

Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)Database Agnostic Workload Management (CIDR 2019)
Database Agnostic Workload Management (CIDR 2019)University of Washington
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Big Data + Big Sim: Query Processing over Unstructured CFD Models

  • 1. Bill Howe Information School Computer Science & Engineering University of Washington Big Data + Big Sim: Query Processing over Unstructured CFD Models 8/7/2017 Bill Howe, UW 1 Scott Moe Applied Math University of Washington
  • 2. This morning… • Data-intensive science in oceanography • Background on databases and query algebras • Regridding: Integrating ocean models using a database-style algebra • If time: Responsible data science 8/7/2017 Bill Howe, UW 2 Motivation Algebraic Optimization Regridding End
  • 3. My position for this talk… • Simulations are sources of data • Analysis requires querying across heterogeneous data sources, including simulations • The CS database community has the right set of concepts and approaches …but ultimately we’re just plumbers
  • 4. The Fourth Paradigm 1. Empirical + experimental 2. Theoretical 3. Computational 4. Data-Intensive Jim Gray
  • 5. Nearly every field of discovery is transitioning from “data poor” to “data rich” Astronomy: LSST; Physics: LHC; Oceanography: OOI; Social Sciences; Biology: Sequencing; Economics; Neuroscience: EEG, fMRI
  • 6. Complex System “Little linear windows” Academic research Practitioners One view of “data science” is to streamline the discovery, interpretation, and operationalization of semi-robust local patterns that have predictive power for some task. In general, these don’t exist. But in specific situations, they do.
  • 7. slide: John Delaney, UW
  • 8. Regional Scale Nodes John Delaney 10s of Gigabits/second from the ocean floor
  • 9. 17 federal organizations named as partners 11 Regional Associations “a strategy for incorporating observation systems from … near shore waters as part of … a network of observatories.”
  • 10. Center for Coastal Margin Observation and Prediction (CMOP) Antonio Baptista
  • 11. Virtual Mekong Basin img src: Mark Stoermer, UW Center for Environmental Visualization Jeff Richey
  • 12. So what? • Geosciences are transitioning from expedition-based to observatory-based science • Enormous investments in integrating sensors and models • The big problem: ad hoc queries over large, heterogeneous, distributed datasets and models
  • 13. So what do we do about querying across heterogeneous sources? Raise the level of abstraction and let the system handle the details
  • 14. Digression: Relational Database History Pre-Relational: if your data changed, your application broke. Early RDBMS were buggy and slow (and often reviled), but required only 5% of the application code. “Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.” -- Codd 1979 Key Idea: Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation
  • 15. Key Idea: An Algebra of Tables Operators: select, project, join Other operators: aggregate, union, difference, cross product
  • 16. Review: Algebraic Optimization N = ((4*2)+((4*3)+0))/1 Algebraic Laws: 1. (+) identity: x+0 = x 2. (/) identity: x/1 = x 3. (*) distributes: (n*x+n*y) = n*(x+y) 4. (*) commutes: x*y = y*x Apply rules 1, 3, 4, 2: N = (2+3)*4 Two operations instead of five, no division operator. Same idea works with very large tables, but the payoff is much higher
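The rewrite on this slide can be checked mechanically. A minimal sketch in Python; the function names `original` and `optimized` are illustrative and do not come from the talk:

```python
def original(n, x, y):
    # ((n*x) + ((n*y) + 0)) / 1 : five operations, including a division
    return ((n * x) + ((n * y) + 0)) / 1


def optimized(n, x, y):
    # (x + y) * n : one addition and one multiplication, no division
    return (x + y) * n


# The slide's instance: n = 4, x = 2, y = 3
print(original(4, 2, 3) == optimized(4, 2, 3))
```

Both expressions denote the same value for any inputs, which is what lets an optimizer swap one for the other without looking at the data.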
  • 17. Algebraic Optimization: Find a better logical plan Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’ Plan: δ(Π x.name,z.name (σ price>100 and city=‘Seattle’ ((Product ⋈ pid=pid Purchase) ⋈ cid=cid Customer)))
  • 18. Algebraic Optimization: Find a better logical plan SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’ Plan: δ(Π x.name,z.name ((σ price>100 (Product) ⋈ pid=pid Purchase) ⋈ cid=cid σ city=‘Seattle’ (Customer))) Query optimization = finding cheaper, equivalent expressions
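To make the two plans concrete, here is a small sketch in plain Python over in-memory lists. The sample rows and the function names are invented for illustration; only the schema and the predicates come from the slides. Both plans return the same answer, but the plan with the selections pushed down builds fewer intermediate rows.

```python
product = [
    {"pid": 1, "name": "laptop", "price": 900},
    {"pid": 2, "name": "cable", "price": 10},
]
purchase = [
    {"pid": 1, "cid": 10, "store": "A"},
    {"pid": 2, "cid": 10, "store": "A"},
    {"pid": 1, "cid": 11, "store": "B"},
]
customer = [
    {"cid": 10, "name": "Ann", "city": "Seattle"},
    {"cid": 11, "name": "Bob", "city": "Portland"},
]


def three_way_join(products, purchases, customers):
    # (Product join Purchase on pid) join Customer on cid
    return [
        (x, y, z)
        for x in products
        for y in purchases
        if x["pid"] == y["pid"]
        for z in customers
        if y["cid"] == z["cid"]
    ]


def plan_late_filter():
    # Join first, then apply both predicates on top of the join.
    rows = three_way_join(product, purchase, customer)
    result = {
        (x["name"], z["name"])
        for x, y, z in rows
        if x["price"] > 100 and z["city"] == "Seattle"
    }
    return result, len(rows)


def plan_pushed_filter():
    # Apply each predicate to its base table before joining.
    expensive = [x for x in product if x["price"] > 100]
    in_seattle = [z for z in customer if z["city"] == "Seattle"]
    rows = three_way_join(expensive, purchase, in_seattle)
    # The set plays the role of DISTINCT.
    return {(x["name"], z["name"]) for x, y, z in rows}, len(rows)


late_result, late_rows = plan_late_filter()
pushed_result, pushed_rows = plan_pushed_filter()
print(late_result == pushed_result, late_rows, pushed_rows)
```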
  • 19. Same logical expression, different physical algorithms. Which is faster? SELECT * FROM Order o, Item i WHERE o.order = i.order Plan: join (o.order = i.order) over scan(Order o) and scan(Item i) Option 1: for each record i in Item: for each record o in Order: if o.order = i.order: return (o, i) Overall: O(N*M) Option 2: for each record i in Item: insert into hashtable; for each record o in Order: lookup corresponding records in hashtable, return matching pairs Overall: O(N+M)
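The two options can be written out as runnable Python. This is a sketch under simple assumptions: rows are tuples whose first field is the order key, the sample data is made up, and the O(N+M) figure for the hash join assumes that hash buckets stay small.

```python
from collections import defaultdict


def nested_loop_join(orders, items):
    """Option 1: compare every Item row with every Order row, O(N*M)."""
    pairs = []
    for i in items:
        for o in orders:
            if o[0] == i[0]:
                pairs.append((o, i))
    return pairs


def hash_join(orders, items):
    """Option 2: build a hash table on Item, then probe it once per Order row."""
    table = defaultdict(list)
    for i in items:
        table[i[0]].append(i)
    pairs = []
    for o in orders:
        for i in table.get(o[0], []):
            pairs.append((o, i))
    return pairs


orders = [(1, "ann"), (2, "bob"), (3, "cy")]
items = [(1, "pen"), (1, "ink"), (3, "cup"), (4, "hat")]

# Same logical result, possibly in a different row order.
print(sorted(nested_loop_join(orders, items)) == sorted(hash_join(orders, items)))
```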
  • 20. GridFields: An Algebra of Meshes (Algebraic Manipulation of Scientific Datasets, B. Howe, D. Maier, VLDBJ 2005) A: H0 : (x,y,b) V0 : (z) B: restrict(0, z >b), color is depth C: H0 : (x,y,b) V0 : ( ) apply(0, z=(surf  b) *  ) bind(0, surf), color is salinity 3/12/09 Bill Howe, eScience Institute 20
  • 21. Example (1) H = Scan(context, "H") rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H) Restrict arguments: predicate, dimension [Figures: H and rH; color: bathymetry]
  • 22. 8/7/2017 howeb@stccmop.org Example: Transect P
  • 23. Transect: Bad Query Plan [Plan diagram labels: H(x,y,b), V(z), r(z>b), b(s), regrid, P, V] 1) Construct full-size 3D grid 2) Construct 2D transect grid 3) Interpolate 1) onto 2)
  • 24. Transect: Optimized Plan [Plan diagram labels: P, V(z), H(x,y,b), regrid, b(s), regrid] 1) Find 2D cells containing points 2) Create “stacks” of 2D cells carrying data 3) Create 2D transect grid 4) Interpolate 2) onto 3)
  • 25. 1) Find cells containing points in P
  • 26. 1) Find cells containing points in P 2) Construct “stacks” of cells 4) Interpolate
  • 27. Transect: Results [Bar chart: running time in seconds for vtk(3D), interpolate, simple, interp_o, simple_o; 800 MB (1 timestep)]
  • 28. Back to integrating models: What is the right abstraction? • Claim: Everything reduces to regridding • Model-data comparison and skill assessment? Regrid observations onto model mesh • Model-model comparison? Regrid one model’s mesh onto the other’s • Model coupling? Regrid a meso-scale atmospheric model onto your regional ocean model • Visualization? Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
  • 29. Status Quo • “FTP + MATLAB” • “Nascent Databases” – File-based, format-specific API – Unidata’s NetCDF, HDF5 – Some IO optimization, some indexing • “Data Servers” (e.g., Hyrax) – Same as file-based systems, but supports RPC None of this scales: – up with data volumes – up with number of sources – down with developer expertise
  • 30. Summary so far • “Integration” means “regridding” – mesh to pixels, mesh to mesh, trajectory to mesh – satellites to models, models to models, observations to models • Regridding is hard – Must be easy, tolerant of unusual grids, numerically conservative, efficient Our goal • Define a “universal regridding” operator with nice algebraic properties • Use it to implement efficient distributed data sharing applications, parallel algorithms, and more
  • 31. What are some complexities we want to hide? • Unstructured Grids • Numerical Conservation • Choice of Algorithms
  • 33. [Map labels: Washington, Oregon, Columbia River Estuary]
  • 34. [Map labels: Washington, Oregon, Columbia River Estuary]
  • 35. [Figure labels: SciDB, Hyrax, GridFields, ESMF, VTK/Paraview; axis from “easy; good support” to “hard; poor support”]
  • 36. Structured grids are easy • The data model… (Cartesian products of coordinate variables) • …immediately implies a representation, (multidimensional arrays) • …an API, (reading and writing subslabs) • …and an efficient implementation (address calculation using array “shape”)
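The last point, address calculation from the array shape, is short enough to spell out. A minimal sketch assuming row-major (C-order) storage; `flat_index` is a name chosen here for illustration:

```python
def flat_index(index, shape):
    """Row-major offset of a multidimensional index, computed from the shape alone."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        # Horner-style accumulation: each dimension scales what came before it.
        offset = offset * n + i
    return offset


# A 2 x 3 x 4 grid stored as one flat array of 24 values.
print(flat_index((1, 2, 3), (2, 3, 4)))
```

No lookup table or topology is needed: the coordinates of a cell determine where its value lives. Unstructured grids have no such formula, which is why they need explicit connectivity.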
  • 37. What are some complexities we want to hide? • Unstructured Grids • Numerical Conservation • Choice of Algorithms
  • 38. Naïve Method: Interpolation (Spatial Join) For each vertex in the target grid, find the containing cell in the source grid, then evaluate the basis functions to interpolate
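A sketch of this loop for the simplest case: a 2D triangular source mesh with linear basis functions, so the basis functions evaluated at a point are its barycentric coordinates. The brute-force cell search, the function names, and the sample mesh are assumptions made for illustration; a real system would use a spatial index for the "find containing cell" step.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def interpolate(targets, vertices, triangles, values, eps=1e-12):
    """For each target vertex, find the containing source cell and interpolate."""
    out = []
    for p in targets:
        for tri in triangles:  # brute-force search over source cells
            weights = barycentric(p, *(vertices[k] for k in tri))
            if all(w >= -eps for w in weights):  # p lies inside this cell
                out.append(sum(w * values[k] for w, k in zip(weights, tri)))
                break
        else:
            out.append(None)  # target vertex lies outside the source mesh
    return out


# Unit square split into two triangles, carrying the linear field f = x + 2y.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
values = [x + 2 * y for x, y in vertices]

result = interpolate([(0.5, 0.25), (2.0, 2.0)], vertices, triangles, values)
print(result)
```

Linear interpolation reproduces a linear field exactly, but nothing in this loop preserves integrals of the field, which is the conservation problem the following slides address.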
  • 40. Supermeshing [Farrell 10] For each cell in the target grid, find overlapping cells in the source grid, compute their intersections, and derive new coefficients to minimize the L2 norm * Guaranteed conservative * Minimizes error But: domains must match exactly
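The intersection idea is easiest to see in one dimension. The sketch below is a deliberately simplified stand-in, not the supermeshing algorithm of [Farrell 10]: it assumes 1D cells given by their edges and piecewise-constant fields, so the L2-optimal coefficient for each target cell is just the overlap-weighted average. All names are illustrative.

```python
def conservative_regrid_1d(src_edges, src_vals, dst_edges):
    """Overlap-weighted averaging of a piecewise-constant field onto new cells."""
    dst_vals = []
    for lo, hi in zip(dst_edges[:-1], dst_edges[1:]):
        total = 0.0
        for a, b, v in zip(src_edges[:-1], src_edges[1:], src_vals):
            overlap = min(hi, b) - max(lo, a)  # length of the cell intersection
            if overlap > 0:
                total += v * overlap
        dst_vals.append(total / (hi - lo))
    return dst_vals


def integral(edges, vals):
    """Integral of a piecewise-constant field over its grid."""
    return sum(v * (b - a) for a, b, v in zip(edges[:-1], edges[1:], vals))


src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_vals = [1.0, 2.0, 3.0, 4.0]
dst_edges = [0.0, 1.5, 4.0]  # same domain, different cells

dst_vals = conservative_regrid_1d(src_edges, src_vals, dst_edges)
print(integral(src_edges, src_vals), integral(dst_edges, dst_vals))
```

The two integrals agree because the source and target grids cover exactly the same interval. If the target extended past the source, part of each boundary cell would have no overlap and the totals would drift apart, which is the "domains must match exactly" caveat on the slide.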
  • 42. What are some complexities we want to hide? • Unstructured Grids • Numerical Conservation • Choice of algorithms
  • 44. Finding mesh intersections
  • 47. Commutativity of Regrid and Restrict: Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y)) G0 = Regrid(Restrict0(X), Restrict0(Y)) G1 = Regrid(Restrict1(X), Restrict1(Y)) … GN = Regrid(RestrictN(X), RestrictN(Y)) R = Stitch(G0, G1, …, GN)
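The identity above is what allows a regrid to be split into independent pieces. A toy 1D sketch, assuming piecewise-constant fields on cells given by their edges; `regrid`, `restrict`, and `partitioned_regrid` are hypothetical helpers written for this example, not the GridFields API. Each partition restricts both grids to one region, regrids locally, and the pieces are stitched by concatenation.

```python
def regrid(src_edges, src_vals, dst_edges):
    """Overlap-weighted regridding of a piecewise-constant 1D field."""
    out = []
    for lo, hi in zip(dst_edges[:-1], dst_edges[1:]):
        total = 0.0
        for a, b, v in zip(src_edges[:-1], src_edges[1:], src_vals):
            overlap = min(hi, b) - max(lo, a)
            if overlap > 0:
                total += v * overlap
        out.append(total / (hi - lo))
    return out


def restrict(edges, vals, lo, hi):
    """Keep only the cells that overlap [lo, hi]; assumes at least one does."""
    keep = [k for k in range(len(vals)) if edges[k] < hi and edges[k + 1] > lo]
    first, last = keep[0], keep[-1]
    return edges[first:last + 2], vals[first:last + 1]


def partitioned_regrid(src_edges, src_vals, dst_edges, breaks):
    """Regrid each target partition independently, then stitch the pieces."""
    pieces = []
    for i, j in zip(breaks[:-1], breaks[1:]):
        part_dst = dst_edges[i:j + 1]  # restrict the target grid
        part_src_edges, part_src_vals = restrict(
            src_edges, src_vals, part_dst[0], part_dst[-1]
        )  # restrict the source grid to the same region
        pieces.append(regrid(part_src_edges, part_src_vals, part_dst))
    return [v for piece in pieces for v in piece]  # stitch


src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_vals = [1.0, 2.0, 3.0, 4.0]
dst_edges = [0.0, 1.5, 2.5, 4.0]

whole = regrid(src_edges, src_vals, dst_edges)
stitched = partitioned_regrid(src_edges, src_vals, dst_edges, breaks=[0, 1, 3])
print(whole, stitched)
```

Because each piece only touches the source cells that overlap its own region, the loop body could run on separate workers, with the stitch as the only step that needs all results.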
  • 49. “Lumping”
  • 52. • Globally conservative • Parallelizable • Commutes with user-selected restrictions • Masking to handle mismatched domains Todos: • Characterize the error relative to plain supermeshing • Universal Regridding-as-a-Service
  • 53. Outreach and Usage • Code is available, but in transition to GitHub – Search “gridfields” on Google Code – http://code.google.com/p/gridfields/ – C++ with Python bindings • Integrated into the Hyrax Data Server – OPULS project funded by NOAA – Server-side processing of unstructured grids • Other users – US Geological Survey – NOAA
  • 54. • Screenshot of OPeNDAP demo http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
  • 55. Wrap up • Integration of big data and big models is the game • Database-style systems are about hiding complexity and raising the level of abstraction • A database-style query algebra for FEMs emphasizing interpolation and regridding across data and models made sense to us • But more broadly: a richer infrastructure for comparing and sharing model results and data • One idea: “Virtual datasets” where the model is executed in response to queries, perhaps with simpler grids and relaxed assumptions
  • 56. ProPublica, May 2016 Motivation Regridding Supermeshing Database Algebras Evaluation Numerical conservation Responsible Data Science
  • 57. Philadelphia is grappling with the prospect of a racist computer algorithm (Technical.ly, September 2016) [Photo caption: The Special Committee on Criminal Justice Reform's hearing on reducing the pre-trial jail population.] Any background signal in the data of institutional racism is • amplified by the algorithm • operationalized by the algorithm • legitimized by the algorithm “Should I be afraid of risk assessment tools?” “No, you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?” “And what’s the culture of policing in the neighborhood in which I grew up in?”
  • 58. Amazon Prime Now Delivery Area: Atlanta Bloomberg, 2016
  • 59. Amazon Prime Now Delivery Area: Boston Bloomberg, 2016
  • 60. Amazon Prime Now Delivery Area: Chicago Bloomberg, 2016
  • 61. The way I think about this… (1) First decade of Data Science research and practice: What can we do with massive, noisy, heterogeneous datasets? Next decade of Data Science research and practice: What should we do with massive, noisy, heterogeneous datasets?
  • 62. The way I think about this… (2) Decisions are based on two sources of information: 1. Past examples, e.g., “prior arrests tend to increase likelihood of future arrests” 2. Societal constraints, e.g., “we must avoid racial discrimination” We’ve become very good at automating the use of past examples. We’ve only just started to think about incorporating societal constraints. 8/7/2017 Data, Responsibly / SciTech NW 62
  • 63. The way I think about this… (3) How do we apply societal constraints to algorithmic decision-making? Option 1: Rely on human oversight Ex: EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making Ex: Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models Issues with scalability, prejudice Option 2: Build systems to help enforce these constraints This is the approach we are exploring
  • 64. The way I think about this… (4) On transparency vs. accountability: • For human decision-making, sometimes explanations are required, improving transparency – Supreme Court decisions – Employee reprimands/termination • But when transparency is difficult, accountability takes over – medical emergencies, business decisions • As we shift decisions to algorithms, we lose both transparency AND accountability • “The buck stops where?”
  • 65. So what do we do about it? Fides: A platform for responsible data science, joint with Stoyanovich [US], Abiteboul [FR], Miklau [US], Sahuguet [US], Weikum [DE] Data Curation; novel features to support: Fairness, Accountability, Transparency, Privacy, Reproducibility