SlideShare une entreprise Scribd logo
1  sur  51
Télécharger pour lire hors ligne
What	
  is	
  a	
  Data	
  Commons	
  and	
  	
  
Why	
  Should	
  You	
  Care?	
  
Robert	
  Grossman	
  
University	
  of	
  Chicago	
  
Open	
  Cloud	
  Consor@um	
  
April	
  22,	
  2015	
  
NASA	
  IS&T	
  Colloquium	
  
Collect	
  data	
  and	
  
distribute	
  files	
  via	
  DAAC	
  
and	
  apply	
  data	
  mining	
  	
  
Make	
  data	
  available	
  via	
  
open	
  APIs	
  and	
  
apply	
  data	
  science	
  
2000	
  
2010-­‐2015	
  
2020	
  
???	
  
1.	
  Data	
  Commons	
  
We	
  have	
  a	
  problem	
  …	
  
The	
  commodi@za@on	
  of	
  
sensors	
  is	
  crea@ng	
  an	
  
explosive	
  growth	
  of	
  data	
  
It	
  can	
  take	
  weeks	
  to	
  
download	
  large	
  geo-­‐
spa@al	
  datasets	
  
Analyzing	
  the	
  data	
  is	
  more	
  
expensive	
  than	
  producing	
  it	
  
There	
  is	
  not	
  enough	
  funding	
  for	
  
every	
  researcher	
  to	
  house	
  all	
  
the	
  data	
  they	
  need	
  
Data	
  Commons	
  
Data	
  commons	
  co-­‐locate	
  data,	
  storage	
  and	
  compu@ng	
  
infrastructure,	
  and	
  commonly	
  used	
  tools	
  for	
  analyzing	
  
and	
  sharing	
  data	
  to	
  create	
  a	
  resource	
  for	
  the	
  research	
  
community.	
  
Source:	
  Interior	
  of	
  one	
  of	
  Google’s	
  Data	
  Center,	
  www.google.com/about/datacenters/	
  
The	
  Tragedy	
  of	
  the	
  Commons	
  
Source:	
  GarreY	
  Hardin,	
  The	
  Tragedy	
  of	
  the	
  Commons,	
  Science,	
  Volume	
  162,	
  Number	
  3859,	
  pages	
  
1243-­‐1248,	
  13	
  December	
  1968.	
  
Individuals	
  when	
  they	
  act	
  independently	
  following	
  their	
  
self	
  interests	
  can	
  deplete	
  a	
  deplete	
  a	
  common	
  resource,	
  
contrary	
  to	
  a	
  whole	
  group's	
  long-­‐term	
  best	
  interests.	
  
GarreY	
  Hardin	
  
7	
  
www.opencloudconsor@um.org	
  
•  U.S	
  based	
  not-­‐for-­‐profit	
  corpora@on.	
  
•  Manages	
  cloud	
  compu@ng	
  infrastructure	
  to	
  support	
  
scien@fic	
  research:	
  Open	
  Science	
  Data	
  Cloud,	
  Project	
  
Matsu,	
  &	
  OCC	
  NOAA	
  Data	
  Commons.	
  
•  Manages	
  cloud	
  compu@ng	
  infrastructure	
  to	
  support	
  
medical	
  and	
  health	
  care	
  research:	
  Biomedical	
  
Commons	
  Cloud.	
  	
  	
  
•  Manages	
  cloud	
  compu@ng	
  testbeds:	
  Open	
  Cloud	
  
Testbed.	
  
	
  
What	
  Scale?	
  
•  New	
  data	
  centers	
  are	
  
some@mes	
  divided	
  into	
  
“pods,”	
  which	
  can	
  be	
  built	
  
out	
  as	
  needed.	
  
•  A	
  reasonable	
  scale	
  for	
  what	
  is	
  
needed	
  for	
  a	
  commons	
  is	
  one	
  
of	
  these	
  pods	
  (“cyberpod”)	
  
•  Let’s	
  use	
  the	
  term	
  “datapod”	
  
for	
  the	
  analy@c	
  infrastructure	
  
that	
  scales	
  to	
  a	
  cyberpod.	
  
•  Think	
  of	
  as	
  the	
  scale	
  out	
  of	
  a	
  
database.	
  
Pod	
  A	
   Pod	
  B	
  
experimental	
  
science	
  
simula@on	
  
science	
  
data	
  science	
  
1609	
  
30x	
  
1670	
  
250x	
  
1976	
  
10x-­‐100x	
  
2004	
  
10x-­‐100x	
  
Core	
  Data	
  Commons	
  Services	
  
•  Digital	
  IDs	
  
•  Metadata	
  services	
  
•  High	
  performance	
  transport	
  
•  Data	
  export	
  
•  Pay	
  for	
  compute	
  with	
  images/containers	
  containing	
  
commonly	
  used	
  tools,	
  applica@ons	
  and	
  services,	
  
specialized	
  for	
  each	
  research	
  community	
  
Cloud	
  1	
   Cloud	
  3	
  
Data	
  Commons	
  	
  1	
  
Commons	
  
provide	
  data	
  to	
  
other	
  commons	
  
and	
  to	
  clouds	
  
Research	
  
projects	
  
producing	
  data	
  
Research	
  scien@sts	
  at	
  
research	
  center	
  B	
  
Research	
  scien@sts	
  at	
  
research	
  center	
  C	
  
Research	
  scien@sts	
  at	
  
research	
  center	
  A	
  
downloading	
  data	
  
Community	
  develops	
  
open	
  source	
  sojware	
  
stacks	
  for	
  commons	
  
and	
  clouds	
  
Cloud	
  2	
  
Data	
  
Commons	
  2	
  
Complex	
  sta@s@cal	
  
models	
  over	
  small	
  data	
  
that	
  are	
  highly	
  manual	
  
and	
  update	
  infrequently.	
  
Simpler	
  sta@s@cal	
  
models	
  over	
  large	
  data	
  
that	
  are	
  highly	
  
automated	
  and	
  update	
  
frequently.	
  
memory	
   databases	
  
GB	
   TB	
   PB	
  
W	
   KW	
   MW	
  
datapods	
  
cyber	
  pods	
  
Is	
  More	
  Different?	
  	
  Do	
  New	
  Phenomena	
  
Emerge	
  at	
  Scale	
  in	
  Biomedical	
  Data?	
  
Source:	
  P.	
  W.	
  Anderson,	
  More	
  is	
  Different,	
  Science,	
  Volume	
  177,	
  Number	
  4047,	
  4	
  August	
  1972,	
  pages	
  393-­‐396.	
  
2.	
  OCC	
  Data	
  Commons	
  
matsu.opensciencedatacloud.org	
  
OCC-­‐NASA	
  Collabora@on	
  2009	
  -­‐	
  present	
  	
  
•  Public-­‐private	
  data	
  collabora@ve	
  announced	
  April	
  21,	
  2015	
  
by	
  Secretary	
  of	
  Commerce	
  Pritzker.	
  	
  
•  AWS,	
  Google,	
  Microsoj	
  and	
  Open	
  Cloud	
  Consor@um	
  will	
  
form	
  four	
  collabora@ons.	
  
OSDC	
  Commons	
  Architecture	
  
Object	
  storage	
  
(permanent)	
  
Scalable	
  light	
  
weight	
  workflow	
  
Community	
  data	
  products	
  
(data	
  harmoniza@on)	
  
Data	
  
submission	
  
portal	
  
Open	
  APIs	
  
for	
  data	
  
access	
  and	
  
data	
  access	
  
portal	
  
Co-­‐located	
  
“pay	
  for	
  
compute”	
  
Digital	
  ID	
  Service	
  
&	
  Metadata	
  Service	
  
Devops	
  suppor@ng	
  virtual	
  machines	
  	
  and	
  containers	
  
3.	
  Scanning	
  Queries	
  over	
  Commons	
  	
  
and	
  the	
  Matsu	
  Wheel	
  
What	
  is	
  the	
  Project	
  Matsu?	
  
Matsu	
  is	
  an	
  open	
  source	
  
project	
  for	
  processing	
  satellite	
  
imagery	
  to	
  support	
  earth	
  
sciences	
  researchers	
  using	
  a	
  
data	
  commons.	
  
Matsu	
  is	
  a	
  joint	
  project	
  
between	
  the	
  Open	
  Cloud	
  
Consor@um	
  and	
  NASA’s	
  EO-­‐1	
  
Mission	
  (Dan	
  Mandl,	
  Lead)	
  
All	
  available	
  L1G	
  images	
  (2010-­‐now)	
  
NASA’s	
  Matsu	
  Mashup	
  
Flood/Drought	
  Dashboard	
  Examples	
  
GeoSocial	
  API	
  Consumer	
  embedded	
  in	
  Dashboard	
  
Ini@al	
  crowdsourcing	
  func@onality	
  
(pictures,	
  GPS	
  features	
  and	
  water	
  
edge	
  loca@ons	
  )	
  
GeoSocial	
  API	
  used	
  to	
  discover	
  
Radarsat	
  product	
  in	
  area	
  (User	
  can	
  
see	
  registra@on	
  error)	
  
1.	
  Open	
  Science	
  Data	
  
Cloud	
  (OSDC)	
  stores	
  
Level	
  0	
  data	
  from	
  EO-­‐1	
  
and	
  uses	
  an	
  OpenStack-­‐
based	
  cloud	
  to	
  create	
  
Level	
  1	
  data.	
  
2.	
  OSDC	
  also	
  
provides	
  OpenStack	
  
resources	
  for	
  the	
  
Nambia	
  Flood	
  
Dashboard	
  
developed	
  by	
  Dan	
  
Mandl’s	
  team.	
  
3.	
  Project	
  Matsu	
  uses	
  
a	
  Hadoop/Accumulo	
  
system	
  to	
  run	
  
analy@cs	
  nightly	
  and	
  
to	
  create	
  @les	
  with	
  
OGC-­‐compliant	
  
WMTS.	
  
Amount	
  of	
  data	
  retrieved	
  
Number	
  of	
  queries	
  
mashup	
  
re-­‐analysis	
  
“wheel”	
  
row-­‐oriented	
   column-­‐oriented	
  
done	
  by	
  staff	
  
self-­‐service	
  by	
  
community	
  
Matsu	
  Hadoop	
  Architecture	
  
Hadoop	
  HDFS	
  
Matsu	
  Web	
  Map	
  Tile	
  
Service	
  
Matsu	
  MR-­‐based	
  
Tiling	
  Service	
  
NoSQL	
  
Database(Accumulo)	
  
Images	
  at	
  different	
  zoom	
  layers	
  
suitable	
  for	
  OGC	
  Web	
  Mapping	
  Server	
  
Level	
  0,	
  Level	
  1	
  and	
  Level	
  2	
  
images	
  
MapReduce	
  used	
  to	
  process	
  Level	
  n	
  to	
  Level	
  n+1	
  
data	
  and	
  to	
  par@@on	
  images	
  for	
  different	
  zoom	
  
levels	
  
NoSQL-­‐based	
  
Analy@c	
  Services	
  
Streaming	
  Analy@c	
  
Services	
  
MR-­‐based	
  Analy@c	
  
Services	
  
Analy@c	
  Services	
   Storage	
  for	
  WMTS	
  @les	
  and	
  
derived	
  data	
  products	
  
Presenta@on	
  Services	
  
Web	
  Coverage	
  
Processing	
  Service	
  
(WCPS)	
  
Workflow	
  Services	
  
The Matsu Wheel
for analyzing large volumes of hyperspectral image data
Spectral anomaly detected:
Barren Island active volcano, Feb, 2014
Spectral anomaly detected:
Nishinoshima active volcano, Dec, 2014
Spectral anomaly detected:
North Sentinel Island fires, May, 2014
Spectral anomaly detected:
North Sentinel Island fires, May, 2014
Spectral anomaly detected:
Colima Volcano, April 14, 2015
Spectral anomaly detected:
Colima Volcano, April 14, 2015
Matsu Wheel Spectral Anomaly Detector
§  “Contours and Clusters” – looks for physical contours around
spectral clusters
§  PCA analysis applied to the set of reflectivity values (spectra) for
every pixel, and the top 5 components are extracted for further
analysis.
Matsu Wheel Spectral Anomaly Detector
§  “Contours and Clusters” – looks for physical contours around
spectral clusters
§  PCA analysis applied to the set of reflectivity values (spectra) for
every pixel, and the top 5 components are extracted for further
analysis.
§  Pixels are clustered in the transformed 5-D spectral space using a
k-means clustering algorithm.
§  For each image, k = 50 spectral clusters are formed and ranked
from most to least extreme using the Mahalanobis distance of the
cluster from the spectral center.
§  For each spectral cluster, adjacent pixels are grouped together into
contiguous objects.
à returns geographic regions of spectral anomalies that are scored
again as anomalous (0 least , 1000 most) compared to a set of
“normal” spectra, constructed for comparison over a baseline of time
Wheel analytic (beta):
SVM-based land cover classifier
Matsu Wheel Land Cover Classifier
§  Support Vector Machine supervised classifier (uses Python’s Sci-
kit learn)
§  “Training set” constructed from a variety of scenes with range
of locations, time of year, sun angle
§  Cloud, Desert/ Dry Land, Water, Vegetation are manually
(visually) classified using RGB image
à returns classified image
4.	
  Data	
  Peering	
  
Tier	
  1	
  ISPs	
  “Created”	
  the	
  Internet	
  
Amount	
  of	
  data	
  retrieved	
  
Number	
  of	
  	
  
queries	
  
Number	
  of	
  sites	
  
download	
  data	
  
data	
  peering	
  
Cloud	
  1	
  
Data	
  Commons	
  	
  1	
  
Data	
  Commons	
  	
  2	
  
Data	
  Peering	
  
•  Tier	
  1	
  Commons	
  exchange	
  data	
  for	
  the	
  research	
  
community	
  at	
  no	
  charge.	
  
Three	
  Requirements	
  
Two	
  Research	
  Data	
  Commons	
  with	
  a	
  Tier	
  1	
  data	
  
peering	
  rela@onship	
  agree	
  as	
  follows:	
  	
  
1.  To	
  transfer	
  research	
  data	
  between	
  them	
  at	
  no	
  cost.	
  	
  	
  	
  	
  
2.  To	
  peer	
  with	
  at	
  least	
  two	
  other	
  Tier	
  1	
  Research	
  Data	
  
Commons	
  at	
  10	
  Gbps	
  or	
  higher.	
  
3.  To	
  support	
  Digital	
  IDs	
  (of	
  a	
  form	
  to	
  be	
  determined	
  
by	
  mutual	
  agreement)	
  so	
  that	
  a	
  researcher	
  using	
  
infrastructure	
  associated	
  with	
  one	
  Tier	
  1	
  Research	
  
Data	
  Commons	
  can	
  access	
  data	
  transparently	
  from	
  
any	
  of	
  the	
  Tier	
  1	
  Research	
  Data	
  Commons	
  that	
  hold	
  
the	
  desired	
  data.	
  	
  
	
  
5.	
  Five	
  Challenges	
  for	
  Data	
  Commons	
  
The	
  5P	
  Challenges	
  
•  Permanent	
  objects	
  with	
  Digital	
  IDs	
  
•  Cyber	
  Pods	
  with	
  scalable	
  storage	
  and	
  analy@cs	
  
•  Data	
  Peering	
  
•  Portable	
  data	
  
•  Support	
  for	
  Pay	
  for	
  compute	
  
Challenge	
  1:	
  Permanent	
  Secure	
  Objects	
  
•  How	
  do	
  I	
  assign	
  Digital	
  IDs	
  and	
  key	
  metadata	
  to	
  
“controlled	
  access”	
  data	
  objects	
  and	
  collec@ons	
  of	
  data	
  
objects	
  to	
  support	
  distributed	
  computa@on	
  of	
  large	
  
datasets	
  by	
  communi@es	
  of	
  researchers?	
  
–  Metadata	
  may	
  be	
  both	
  public	
  and	
  controlled	
  access	
  
–  Objects	
  must	
  be	
  secure	
  
•  Think	
  of	
  this	
  as	
  a	
  “dns	
  for	
  data.”	
  
•  The	
  test:	
  One	
  commons	
  serving	
  the	
  earth	
  science	
  
community	
  can	
  transfer	
  1	
  PB	
  of	
  data	
  files	
  to	
  another	
  
commons	
  and	
  no	
  data	
  scien@st	
  needs	
  to	
  change	
  their	
  
code	
  
Challenge	
  2:	
  Cyber	
  Pods	
  and	
  Datapods	
  
•  How	
  can	
  I	
  add	
  a	
  rack	
  of	
  compu@ng/storage/networking	
  
equipment	
  to	
  a	
  cyber	
  pod	
  (that	
  has	
  a	
  manifest)	
  so	
  that	
  	
  
–  Ajer	
  aYaching	
  to	
  power	
  
–  Ajer	
  aYaching	
  to	
  network	
  
–  No	
  other	
  manual	
  configura@on	
  is	
  required	
  
–  The	
  data	
  services	
  can	
  make	
  use	
  of	
  the	
  addi@onal	
  
infrastructure	
  
–  The	
  compute	
  services	
  can	
  make	
  use	
  of	
  the	
  addi@onal	
  
infrastructure	
  	
  
•  In	
  other	
  words,	
  we	
  need	
  an	
  open	
  source	
  sojware	
  stack	
  
that	
  scales	
  to	
  cyberpods	
  and	
  data	
  analysis	
  that	
  scales	
  to	
  
datapods.	
  
Challenge	
  3:	
  Data	
  Peering	
  
•  How	
  can	
  a	
  cri@cal	
  mass	
  of	
  data	
  commons	
  support	
  
data	
  peering	
  so	
  that	
  a	
  research	
  at	
  one	
  of	
  the	
  
commons	
  can	
  transparently	
  access	
  data	
  managed	
  by	
  
one	
  of	
  the	
  other	
  commons	
  
– We	
  need	
  to	
  access	
  data	
  independent	
  of	
  where	
  it	
  
is	
  stored	
  
– “Tier	
  1	
  data	
  commons”	
  need	
  to	
  pass	
  community	
  
managed	
  data	
  at	
  no	
  cost	
  
– We	
  need	
  to	
  be	
  able	
  to	
  transport	
  large	
  data	
  
efficiently	
  “end	
  to	
  end”	
  between	
  commons	
  
Challenge	
  4:	
  Data	
  Portability	
  	
  
•  We	
  need	
  an	
  “Indigo	
  BuYon”	
  to	
  move	
  our	
  data	
  
between	
  two	
  commons	
  that	
  peer.	
  
Challenge	
  5:	
  Pay	
  for	
  Compute	
  Challenges	
  –	
  
Low	
  Cost	
  Data	
  Integra@on	
  
•  Commons	
  should	
  support	
  a	
  “free	
  storage	
  for	
  research	
  
data,	
  pay	
  for	
  compute,”	
  model	
  perhaps	
  with	
  “chits”	
  
available	
  to	
  researchers.	
  
•  Today,	
  we	
  by	
  and	
  large	
  integrate	
  data	
  with	
  graduate	
  
students	
  and	
  technical	
  staff	
  
•  How	
  can	
  two	
  datasets	
  from	
  two	
  different	
  commons	
  be	
  
“joined”	
  at	
  “low	
  cost”	
  
–  Linked	
  data	
  
–  Controlled	
  Vocabularies	
  
–  Dataspaces	
  
–  Universal	
  Correla@on	
  Keys	
  
–  Sta@s@cal	
  methods	
  
Collect	
  data	
  and	
  
distribute	
  files	
  via	
  DAAC	
  
and	
  apply	
  data	
  mining	
  	
  
Make	
  data	
  available	
  via	
  
open	
  APIs	
  and	
  
apply	
  data	
  science	
  
2000	
  
2010-­‐2015	
  
2020	
  
Operate	
  data	
  
commons	
  &	
  support	
  
data	
  peering	
  
Ques@ons?	
  
51	
  
For	
  more	
  informa@on:	
  
rgrossman.com	
  
@bobgrossman	
  

Contenu connexe

Tendances

Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Robert Grossman
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedRobert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?Robert Grossman
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsRobert Grossman
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Robert Grossman
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationKan Yuenyong
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Robert Grossman
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Robert Grossman
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...Databricks
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Role of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicRole of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicDatabricks
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Robert Grossman
 

Tendances (20)

Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
Cri big data
Cri big dataCri big data
Cri big data
 
Big Data
Big Data Big Data
Big Data
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)Health & Status Monitoring (2010-v8)
Health & Status Monitoring (2010-v8)
 
Multipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendationMultipleregression covidmobility and Covid-19 policy recommendation
Multipleregression covidmobility and Covid-19 policy recommendation
 
Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11Open Science Data Cloud - CCA 11
Open Science Data Cloud - CCA 11
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Role of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicRole of Data Accessibility During Pandemic
Role of Data Accessibility During Pandemic
 
V3 i35
V3 i35V3 i35
V3 i35
 
Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)Bionimbus Cambridge Workshop (3-28-11, v7)
Bionimbus Cambridge Workshop (3-28-11, v7)
 

En vedette

AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Robert Grossman
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataRobert Grossman
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Robert Grossman
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Robert Grossman
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Robert Grossman
 
present trans
present transpresent trans
present transPTF
 
Crea una cuenta de e mail
Crea una cuenta de e mailCrea una cuenta de e mail
Crea una cuenta de e mailfranklin060
 
CIENCIAS POLITICAS
CIENCIAS POLITICASCIENCIAS POLITICAS
CIENCIAS POLITICASjuantoro527
 

En vedette (20)

AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
Adapt or Go extinct
Adapt or Go extinctAdapt or Go extinct
Adapt or Go extinct
 
Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)Big Data - Lab A1 (SC 11 Tutorial)
Big Data - Lab A1 (SC 11 Tutorial)
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
UMANG Resume
UMANG ResumeUMANG Resume
UMANG Resume
 
la nacencia.
la nacencia.la nacencia.
la nacencia.
 
present trans
present transpresent trans
present trans
 
Lecturascríticas9
Lecturascríticas9Lecturascríticas9
Lecturascríticas9
 
Senior seminar finals
Senior seminar finalsSenior seminar finals
Senior seminar finals
 
Crea una cuenta de e mail
Crea una cuenta de e mailCrea una cuenta de e mail
Crea una cuenta de e mail
 
Regla 5 y handball
Regla 5 y handballRegla 5 y handball
Regla 5 y handball
 
CIENCIAS POLITICAS
CIENCIAS POLITICASCIENCIAS POLITICAS
CIENCIAS POLITICAS
 

Similaire à What is a Data Commons and Why Should You Care?

Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudAmazon Web Services
 
Unlocking Open Data in the Cloud
Unlocking Open Data in the CloudUnlocking Open Data in the Cloud
Unlocking Open Data in the CloudAmazon Web Services
 
HPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudHPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudAmazon Web Services
 
Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudJamie Kinney
 
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...inside-BigData.com
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Robert Grossman
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsARDC
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsAmazon Web Services
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudGlobus
 
Open Cloud Consortium Overview (01-10-10 V6)
Open Cloud Consortium Overview (01-10-10 V6)Open Cloud Consortium Overview (01-10-10 V6)
Open Cloud Consortium Overview (01-10-10 V6)Robert Grossman
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingCollin Bennett
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 
Mike Warren Keynote
Mike Warren KeynoteMike Warren Keynote
Mike Warren KeynoteData Con LA
 
Scientific
Scientific Scientific
Scientific marpierc
 
Colloborative computing
Colloborative computing Colloborative computing
Colloborative computing Cisco
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 

Similaire à What is a Data Commons and Why Should You Care? (20)

Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
Unlocking Open Data in the Cloud
Unlocking Open Data in the CloudUnlocking Open Data in the Cloud
Unlocking Open Data in the Cloud
 
HPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite CloudHPC Clusters in the (almost) Infinite Cloud
HPC Clusters in the (almost) Infinite Cloud
 
Accelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the CloudAccelerating Time to Science: Transforming Research in the Cloud
Accelerating Time to Science: Transforming Research in the Cloud
 
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simu...
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collections
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Earth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data PlatformsEarth on AWS - Next-Generation Open Data Platforms
Earth on AWS - Next-Generation Open Data Platforms
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
 
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science CloudAccelerating Science with Cloud Technologies in the ABoVE Science Cloud
Accelerating Science with Cloud Technologies in the ABoVE Science Cloud
 
Open Cloud Consortium Overview (01-10-10 V6)
Open Cloud Consortium Overview (01-10-10 V6)Open Cloud Consortium Overview (01-10-10 V6)
Open Cloud Consortium Overview (01-10-10 V6)
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive Computing
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
Mike Warren Keynote
Mike Warren KeynoteMike Warren Keynote
Mike Warren Keynote
 
Scientific
Scientific Scientific
Scientific
 
Colloborative computing
Colloborative computing Colloborative computing
Colloborative computing
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 

Dernier

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 

Dernier (20)

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 

What is a Data Commons and Why Should You Care?

  • 1. What  is  a  Data  Commons  and     Why  Should  You  Care?   Robert  Grossman   University  of  Chicago   Open  Cloud  Consor@um   April  22,  2015   NASA  IS&T  Colloquium  
  • 2. Collect  data  and   distribute  files  via  DAAC   and  apply  data  mining     Make  data  available  via   open  APIs  and   apply  data  science   2000   2010-­‐2015   2020   ???  
  • 4. We  have  a  problem  …   The  commodi@za@on  of   sensors  is  crea@ng  an   explosive  growth  of  data   It  can  take  weeks  to   download  large  geo-­‐ spa@al  datasets   Analyzing  the  data  is  more   expensive  than  producing  it   There  is  not  enough  funding  for   every  researcher  to  house  all   the  data  they  need  
  • 5. Data  Commons   Data  commons  co-­‐locate  data,  storage  and  compu@ng   infrastructure,  and  commonly  used  tools  for  analyzing   and  sharing  data  to  create  a  resource  for  the  research   community.   Source:  Interior  of  one  of  Google’s  Data  Center,  www.google.com/about/datacenters/  
  • 6. The  Tragedy  of  the  Commons   Source:  GarreY  Hardin,  The  Tragedy  of  the  Commons,  Science,  Volume  162,  Number  3859,  pages   1243-­‐1248,  13  December  1968.   Individuals  when  they  act  independently  following  their   self  interests  can  deplete  a  deplete  a  common  resource,   contrary  to  a  whole  group's  long-­‐term  best  interests.   GarreY  Hardin  
  • 7. 7   www.opencloudconsor@um.org   •  U.S  based  not-­‐for-­‐profit  corpora@on.   •  Manages  cloud  compu@ng  infrastructure  to  support   scien@fic  research:  Open  Science  Data  Cloud,  Project   Matsu,  &  OCC  NOAA  Data  Commons.   •  Manages  cloud  compu@ng  infrastructure  to  support   medical  and  health  care  research:  Biomedical   Commons  Cloud.       •  Manages  cloud  compu@ng  testbeds:  Open  Cloud   Testbed.    
  • 8. What  Scale?   •  New  data  centers  are   some@mes  divided  into   “pods,”  which  can  be  built   out  as  needed.   •  A  reasonable  scale  for  what  is   needed  for  a  commons  is  one   of  these  pods  (“cyberpod”)   •  Let’s  use  the  term  “datapod”   for  the  analy@c  infrastructure   that  scales  to  a  cyberpod.   •  Think  of  as  the  scale  out  of  a   database.   Pod  A   Pod  B  
  • 9. experimental   science   simula@on   science   data  science   1609   30x   1670   250x   1976   10x-­‐100x   2004   10x-­‐100x  
  • 10. Core  Data  Commons  Services   •  Digital  IDs   •  Metadata  services   •  High  performance  transport   •  Data  export   •  Pay  for  compute  with  images/containers  containing   commonly  used  tools,  applica@ons  and  services,   specialized  for  each  research  community  
  • 11. Cloud  1   Cloud  3   Data  Commons    1   Commons   provide  data  to   other  commons   and  to  clouds   Research   projects   producing  data   Research  scien@sts  at   research  center  B   Research  scien@sts  at   research  center  C   Research  scien@sts  at   research  center  A   downloading  data   Community  develops   open  source  sojware   stacks  for  commons   and  clouds   Cloud  2   Data   Commons  2  
  • 12. Complex  sta@s@cal   models  over  small  data   that  are  highly  manual   and  update  infrequently.   Simpler  sta@s@cal   models  over  large  data   that  are  highly   automated  and  update   frequently.   memory   databases   GB   TB   PB   W   KW   MW   datapods   cyber  pods  
  • 13. Is  More  Different?    Do  New  Phenomena   Emerge  at  Scale  in  Biomedical  Data?   Source:  P.  W.  Anderson,  More  is  Different,  Science,  Volume  177,  Number  4047,  4  August  1972,  pages  393-­‐396.  
  • 14. 2.  OCC  Data  Commons  
  • 16. •  Public-­‐private  data  collabora@ve  announced  April  21,  2015   by  Secretary  of  Commerce  Pritzker.     •  AWS,  Google,  Microsoj  and  Open  Cloud  Consor@um  will   form  four  collabora@ons.  
  • 17.
  • 18. OSDC  Commons  Architecture   Object  storage   (permanent)   Scalable  light   weight  workflow   Community  data  products   (data  harmoniza@on)   Data   submission   portal   Open  APIs   for  data   access  and   data  access   portal   Co-­‐located   “pay  for   compute”   Digital  ID  Service   &  Metadata  Service   Devops  suppor@ng  virtual  machines    and  containers  
  • 19. 3.  Scanning  Queries  over  Commons     and  the  Matsu  Wheel  
  • 20. What  is  the  Project  Matsu?   Matsu  is  an  open  source   project  for  processing  satellite   imagery  to  support  earth   sciences  researchers  using  a   data  commons.   Matsu  is  a  joint  project   between  the  Open  Cloud   Consor@um  and  NASA’s  EO-­‐1   Mission  (Dan  Mandl,  Lead)  
  • 21. All  available  L1G  images  (2010-­‐now)  
  • 23. Flood/Drought  Dashboard  Examples   GeoSocial  API  Consumer  embedded  in  Dashboard   Ini@al  crowdsourcing  func@onality   (pictures,  GPS  features  and  water   edge  loca@ons  )   GeoSocial  API  used  to  discover   Radarsat  product  in  area  (User  can   see  registra@on  error)  
  • 24. 1.  Open  Science  Data   Cloud  (OSDC)  stores   Level  0  data  from  EO-­‐1   and  uses  an  OpenStack-­‐ based  cloud  to  create   Level  1  data.   2.  OSDC  also   provides  OpenStack   resources  for  the   Nambia  Flood   Dashboard   developed  by  Dan   Mandl’s  team.   3.  Project  Matsu  uses   a  Hadoop/Accumulo   system  to  run   analy@cs  nightly  and   to  create  @les  with   OGC-­‐compliant   WMTS.  
  • 25. Amount  of  data  retrieved   Number  of  queries   mashup   re-­‐analysis   “wheel”   row-­‐oriented   column-­‐oriented   done  by  staff   self-­‐service  by   community  
  • 26. Matsu  Hadoop  Architecture   Hadoop  HDFS   Matsu  Web  Map  Tile   Service   Matsu  MR-­‐based   Tiling  Service   NoSQL   Database(Accumulo)   Images  at  different  zoom  layers   suitable  for  OGC  Web  Mapping  Server   Level  0,  Level  1  and  Level  2   images   MapReduce  used  to  process  Level  n  to  Level  n+1   data  and  to  par@@on  images  for  different  zoom   levels   NoSQL-­‐based   Analy@c  Services   Streaming  Analy@c   Services   MR-­‐based  Analy@c   Services   Analy@c  Services   Storage  for  WMTS  @les  and   derived  data  products   Presenta@on  Services   Web  Coverage   Processing  Service   (WCPS)   Workflow  Services  
  • 27. The Matsu Wheel for analyzing large volumes of hyperspectral image data
  • 28. Spectral anomaly detected: Barren Island active volcano, Feb, 2014
  • 29. Spectral anomaly detected: Nishinoshima active volcano, Dec, 2014
  • 30. Spectral anomaly detected: North Sentinel Island fires, May, 2014
  • 31. Spectral anomaly detected: North Sentinel Island fires, May, 2014
  • 32. Spectral anomaly detected: Colima Volcano, April 14, 2015
  • 33. Spectral anomaly detected: Colima Volcano, April 14, 2015
  • 34. Matsu Wheel Spectral Anomaly Detector §  “Contours and Clusters” – looks for physical contours around spectral clusters §  PCA analysis applied to the set of reflectivity values (spectra) for every pixel, and the top 5 components are extracted for further analysis.
  • 35. Matsu Wheel Spectral Anomaly Detector §  “Contours and Clusters” – looks for physical contours around spectral clusters §  PCA analysis applied to the set of reflectivity values (spectra) for every pixel, and the top 5 components are extracted for further analysis. §  Pixels are clustered in the transformed 5-D spectral space using a k-means clustering algorithm. §  For each image, k = 50 spectral clusters are formed and ranked from most to least extreme using the Mahalanobis distance of the cluster from the spectral center. §  For each spectral cluster, adjacent pixels are grouped together into contiguous objects. à returns geographic regions of spectral anomalies that are scored again as anomalous (0 least , 1000 most) compared to a set of “normal” spectra, constructed for comparison over a baseline of time
  • 36. Wheel analytic (beta): SVM-based land cover classifier
  • 37. Matsu Wheel Land Cover Classifier §  Support Vector Machine supervised classifier (uses Python’s Sci- kit learn) §  “Training set” constructed from a variety of scenes with range of locations, time of year, sun angle §  Cloud, Desert/ Dry Land, Water, Vegetation are manually (visually) classified using RGB image à returns classified image
  • 39. Tier  1  ISPs  “Created”  the  Internet  
  • 40. Amount  of  data  retrieved   Number  of     queries   Number  of  sites   download  data   data  peering  
  • 41. Cloud  1   Data  Commons    1   Data  Commons    2   Data  Peering   •  Tier  1  Commons  exchange  data  for  the  research   community  at  no  charge.  
  • 42. Three  Requirements   Two  Research  Data  Commons  with  a  Tier  1  data   peering  rela@onship  agree  as  follows:     1.  To  transfer  research  data  between  them  at  no  cost.           2.  To  peer  with  at  least  two  other  Tier  1  Research  Data   Commons  at  10  Gbps  or  higher.   3.  To  support  Digital  IDs  (of  a  form  to  be  determined   by  mutual  agreement)  so  that  a  researcher  using   infrastructure  associated  with  one  Tier  1  Research   Data  Commons  can  access  data  transparently  from   any  of  the  Tier  1  Research  Data  Commons  that  hold   the  desired  data.      
  • 43. 5.  Five  Challenges  for  Data  Commons  
  • 44. The  5P  Challenges   •  Permanent  objects  with  Digital  IDs   •  Cyber  Pods  with  scalable  storage  and  analy@cs   •  Data  Peering   •  Portable  data   •  Support  for  Pay  for  compute  
  • 45. Challenge  1:  Permanent  Secure  Objects   •  How  do  I  assign  Digital  IDs  and  key  metadata  to   “controlled  access”  data  objects  and  collec@ons  of  data   objects  to  support  distributed  computa@on  of  large   datasets  by  communi@es  of  researchers?   –  Metadata  may  be  both  public  and  controlled  access   –  Objects  must  be  secure   •  Think  of  this  as  a  “dns  for  data.”   •  The  test:  One  commons  serving  the  earth  science   community  can  transfer  1  PB  of  data  files  to  another   commons  and  no  data  scien@st  needs  to  change  their   code  
  • 46. Challenge  2:  Cyber  Pods  and  Datapods   •  How  can  I  add  a  rack  of  compu@ng/storage/networking   equipment  to  a  cyber  pod  (that  has  a  manifest)  so  that     –  Ajer  aYaching  to  power   –  Ajer  aYaching  to  network   –  No  other  manual  configura@on  is  required   –  The  data  services  can  make  use  of  the  addi@onal   infrastructure   –  The  compute  services  can  make  use  of  the  addi@onal   infrastructure     •  In  other  words,  we  need  an  open  source  sojware  stack   that  scales  to  cyberpods  and  data  analysis  that  scales  to   datapods.  
  • 47. Challenge  3:  Data  Peering   •  How  can  a  cri@cal  mass  of  data  commons  support   data  peering  so  that  a  research  at  one  of  the   commons  can  transparently  access  data  managed  by   one  of  the  other  commons   – We  need  to  access  data  independent  of  where  it   is  stored   – “Tier  1  data  commons”  need  to  pass  community   managed  data  at  no  cost   – We  need  to  be  able  to  transport  large  data   efficiently  “end  to  end”  between  commons  
  • 48. Challenge  4:  Data  Portability     •  We  need  an  “Indigo  BuYon”  to  move  our  data   between  two  commons  that  peer.  
  • 49. Challenge  5:  Pay  for  Compute  Challenges  –   Low  Cost  Data  Integra@on   •  Commons  should  support  a  “free  storage  for  research   data,  pay  for  compute,”  model  perhaps  with  “chits”   available  to  researchers.   •  Today,  we  by  and  large  integrate  data  with  graduate   students  and  technical  staff   •  How  can  two  datasets  from  two  different  commons  be   “joined”  at  “low  cost”   –  Linked  data   –  Controlled  Vocabularies   –  Dataspaces   –  Universal  Correla@on  Keys   –  Sta@s@cal  methods  
  • 50. Collect  data  and   distribute  files  via  DAAC   and  apply  data  mining     Make  data  available  via   open  APIs  and   apply  data  science   2000   2010-­‐2015   2020   Operate  data   commons  &  support   data  peering  
  • 51. Ques@ons?   51   For  more  informa@on:   rgrossman.com   @bobgrossman