Review of 3DPAS Theme
Daniel S. Katz, University of Chicago & Argonne National Laboratory
Shantenu Jha, Rutgers University
Neil Chue Hong, University of Edinburgh
Simon Dobson, University of St. Andrews
Andre Luckow, Louisiana State University
Omer Rana, University of Cardiff
Yogesh Simmhan, University of Southern California


                                                                      www.ci.anl.gov
                                                                      www.ci.uchicago.edu
Outline
•   e-SI
•   DPA theme
•   3DPAS theme
•   Report in-progress
    –   Application Scenarios
    –   Understanding Distributed Dynamic Data
    –   Vectors
    –   Infrastructure
    –   Programming Systems and Abstractions
•   Future Steps

3DPAS review for D3Science – d.katz@ieee.org
e-Science Institute (e-SI)
•   A 10-year project (Aug 2001 – July 2011), located in Edinburgh
•   Aimed at, but not limited to, UK
•   http://www.esi.ac.uk/
•   Tagline – time & space to think
•   Mission: to stimulate the creation of new insights in e-Science and computing
    science by bringing together international experts and enabling them to
    successfully address significant and diverse challenges
•   Research themes formed the core of eSI’s activity
     – Theme: connected programme of visitors, workshops and events
     – Conceived and driven by Theme Leader
     – Focusing on a specific issue in e-Science that crosses boundaries and raises new
       research questions
     – Goals:
          o Identify research issues
          o Rally a community of researchers
          o Map a path of future research that will make best progress towards new e-Science methods
            and capabilities.



Context – Data and Science
•   Data has always been important to science
•   Some use the concept of paradigms
    – First (a thousand years ago) – empirical – describe natural phenomena
    – Second (a few hundred years ago) – theoretical – use models and generalizations
    – Third (a few decades ago) – computational – solve complex problems
    – Fourth (a few years ago) – data exploration – gain knowledge directly from data
      from experiment, theory, and simulation
• Problem – we cannot keep declaring new paradigms at an
  exponentially increasing rate
• But it’s true that there is an emerging science of
  “listening to data”, as defined by Jim Gray, Google, etc.
Distributed Programming Abstractions
•   DPA theme at eSI
     –   http://wiki.esi.ac.uk/Distributed_Programming_Abstractions
•   Series of workshops
•   Led to book in progress: Shantenu Jha, Daniel S. Katz, Manish Parashar, Omer
    Rana, and Jon Weissman, “Abstractions for Distributed Applications and
    Systems,” to be published by Wiley in 2012
•   And multiple papers, including: S. Jha, D. S. Katz, M. Parashar, O. Rana, and J.
    Weissman, "Critical Perspectives on Large-Scale Distributed Applications and
    Production Grids," (Best Paper Award Winner), Proceedings of the 10th
    IEEE/ACM International Conference on Grid Computing (Grid 2009), 2009.
•   Idea – start with distributed science and engineering applications – analyze
    them (determine ‘vectors’); examine interaction with infrastructures and
    tools; find abstractions
     – Tech report on infrastructures (much of Chapter 3) available now:
       http://www.ci.uchicago.edu/research/papers/CI-TR-7-0811
     – Vectors: Execution Unit, Coordination, Communication, Execution Environment
•   In the process, we realized that data intensive applications had some unique
    challenges and issues

Dynamic Distributed Data-intensive Programming
Systems and Applications (3DPAS)
•   This led to 3DPAS theme at eSI
    –   http://wiki.esi.ac.uk/3DPAS
•   Similar idea to DPA
    – Start with science and engineering applications
    – See if the DPA vectors suffice or if new vectors are
      needed
    – Examine what is different with respect to
      infrastructures and programming systems
•   Initially done through workshops at eSI
•   Continuing through weekly teleconferences
•   Driving towards a report/paper
D3 (data intensive, distributed, dynamic)
•   Data intensive: order of magnitude of large data and large computing
     –   Exascale data and petascale computing
     –   Petascale data and exascale computing
     –   Exascale data and exascale computing.
•   Distributed: number, dispersion, and replication of distributed data or
    computation resources
     –   Low in a cloud or cluster that resides in a single building
     –   High in a grid that spans multiple geographically-separated administrative
         domains, or multiple data centers
•   Dynamic: perhaps both data and computation
     –  Data may emerge at runtime
     – Mechanisms to handle data during application execution, e.g., data
        transfer, scheduling
     – Application components may be launched at runtime in response to
        data, application, or environment dynamics
•   All may vary in different stages of an application
     –   Most applications have data collection, storage, and analysis stages

Value/Impact
•   Not all data-intensive applications have
    dynamic and distributed elements today
•   However, as scales increase, applications will
    have to be distributed and dynamic
    –   And these issues will be increasingly correlated
•   Analyzing current D3 applications should impact
    many future applications
    –   And lead to lessons about and requirements on
        future infrastructures and programming systems


Applications Process
•   Asked questions about possible applications
    1. What is the purpose of the application?
    2. How is the application used to do this?
    3. What infrastructure is used? (including compute, data, network,
       instruments, etc.)
    4. What dynamic data is used in the application?
        a. What are the types of data?
        b. What is the size of the data set(s)?
    5. How does the application get the data?
    6. What are the time (or quality) constraints on the application?
    7. How much diverse data integration is involved?
    8. How diverse is the data?
    9. Please feel free to also talk about the current state of the
      application, if it exists today, and any specific gaps that you know
      need to be overcome

Applications Process (2)
• In workshops, discussed current applications, and
  considered whether each new application “felt” the same as a
  previous application in terms of the answers to the
  questions
• Came to 14 applications
• Noted they fall into different categories
     –   Traditional applications: a single program that is run by a user
     –   Archetypical applications: a group of applications,
         independent programs, written by different authors, may be
         competing, usually not intended to run together
     –   Infrastructural applications: set of applications (or
         archetypical applications) that need to be run in series
         (perhaps in different phases), may be run by different groups
         that do not frequently interact


Applications
Application                       Area              Type             Lead Person/Site
Metagenomics                      Biosciences       Archetypical     Amsterdam Medical Centre, Netherlands
ATLAS experiment (WLCG)           Particle Physics  Infrastructural  CERN & Daresbury Lab + RAL, UK
Large Synoptic Sky Survey (LSST)  Astrophysics      Infrastructural  University of Edinburgh – Institute of Astronomy, UK
Virtual Astronomy                 Astrophysics      Archetypical     University of Edinburgh – Institute of Astronomy, UK
Cosmic Microwave Background       Astrophysics      Traditional      Lawrence Berkeley National Laboratory, USA
Marine (Sea Mammal) Sensors       Biosciences       Infrastructural  University of St. Andrews, UK
Climate                           Earth Science     Infrastructural  National Center for Atmospheric Research, USA

Applications (2)
Application                                                Area                Type             Lead Person/Site
Interactive Exploration of Environmental Data              Earth Science       Archetypical     University of Reading, UK
Power Grids                                                Energy Informatics  Infrastructural  University of Southern California, USA
Fusion (International Thermonuclear Experimental Reactor)  Chemistry/Physics   Traditional      Oak Ridge National Laboratory & Rutgers University, USA
Industrial Incident Notification and Response              Emergency Response  Infrastructural  THALES, The Netherlands
MODIS Data Processing                                      Earth Science       Traditional      Lawrence Berkeley National Laboratory, USA
Floating Sensors                                           Earth Science       Infrastructural  Lawrence Berkeley National Laboratory, USA
Distributed Network Intrusion Detection                    Security            Infrastructural  University of Minnesota, USA
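As a cross-check on the two tables above, the rows can be written down as a small data structure and tallied by type. This is a sketch for illustration only: the `Application` record and its field names are assumptions introduced here, and the rows are simply copied from the tables.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """One row of the application tables (field names are illustrative)."""
    name: str
    area: str
    kind: str  # "Traditional", "Archetypical", or "Infrastructural"


APPLICATIONS = [
    Application("Metagenomics", "Biosciences", "Archetypical"),
    Application("ATLAS experiment (WLCG)", "Particle Physics", "Infrastructural"),
    Application("LSST", "Astrophysics", "Infrastructural"),
    Application("Virtual Astronomy", "Astrophysics", "Archetypical"),
    Application("Cosmic Microwave Background", "Astrophysics", "Traditional"),
    Application("Marine (Sea Mammal) Sensors", "Biosciences", "Infrastructural"),
    Application("Climate", "Earth Science", "Infrastructural"),
    Application("Interactive Exploration of Environmental Data",
                "Earth Science", "Archetypical"),
    Application("Power Grids", "Energy Informatics", "Infrastructural"),
    Application("Fusion (ITER)", "Chemistry/Physics", "Traditional"),
    Application("Industrial Incident Notification and Response",
                "Emergency Response", "Infrastructural"),
    Application("MODIS Data Processing", "Earth Science", "Traditional"),
    Application("Floating Sensors", "Earth Science", "Infrastructural"),
    Application("Distributed Network Intrusion Detection",
                "Security", "Infrastructural"),
]

# Tally the applications by category.
counts_by_kind = Counter(app.kind for app in APPLICATIONS)

if __name__ == "__main__":
    for kind, count in counts_by_kind.most_common():
        print(f"{kind}: {count}")
```

Tallied this way, the 14 applications split into 8 infrastructural, 3 archetypical, and 3 traditional.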
Climate (infrastructural)
  •    CMIP/IPCC process runs and analyses climate
       models in 3 stages
  •    Data are generated by distributed HPC centers
  •    Data are stored by distributed ESGF gateways
       and data nodes
  •    Data are analyzed by distributed
       researchers, who search for particular
       data, gather them to a site, process them
  •    Resources for analysis can be dynamic, as can
       data stored in data nodes
Thanks: Don Middleton
Fusion (traditional)
   • ITER needs a variety of codes
   • Codes run on distributed set of leadership-class
     facilities, using advance reservations to co-schedule
     the simulations
   • Codes read and write data files, using ADIOS and
     HDF5
   • Files output by each code are transformed and
     transferred to be used as inputs by other
     codes, linking the codes into a single coupled
     simulation
   • Data generated are too large to be written to disk
     for post-run analysis; in-situ analysis and
     visualization tools are being developed
Thanks: Scott Klasky
Metagenomics (archetypical)
•    Analysis of genome sequence data being
     produced by next gen devices
•    Sequencers are producing data at a rate
     increasing faster than computing capability
•    Sequencers are distributed; data produced
     cannot all be co-located
•    Multiple analyses (using different software) by
     multiple users need to make best use of
     available computing resources, understanding
     location and access issues with respect to datasets
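One way to read the location and access issue is as a placement problem: run each analysis at the site where the least data has to be moved. The following is a minimal sketch under that assumption. The site names, dataset sizes, and the `choose_site` helper are all hypothetical and are not taken from the metagenomics application itself.

```python
def choose_site(dataset_sizes_gb, dataset_locations, candidate_sites):
    """Pick the compute site that minimizes the volume of data to transfer.

    dataset_sizes_gb:  dict mapping dataset name -> size in GB
    dataset_locations: dict mapping dataset name -> site currently holding it
    candidate_sites:   iterable of site names with free compute capacity
    Returns (best_site, gb_to_move); (None, inf) if there are no candidates.
    """
    best_site, best_cost = None, float("inf")
    for site in sorted(candidate_sites):  # sorted for a deterministic tie-break
        # Only datasets that are not already at this site need to be moved.
        cost = sum(size for name, size in dataset_sizes_gb.items()
                   if dataset_locations[name] != site)
        if cost < best_cost:
            best_site, best_cost = site, cost
    return best_site, best_cost


# Hypothetical example: two sequencing runs and a reference set at two sites.
sizes = {"run_a": 500, "run_b": 120, "reference": 30}
locations = {"run_a": "amsterdam", "run_b": "edinburgh", "reference": "edinburgh"}
site, moved = choose_site(sizes, locations, ["amsterdam", "edinburgh"])
```

A real scheduler would also have to weigh access rights, network bandwidth, and queue times, which this sketch ignores.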
CMB (traditional)
  •    The Cosmic Microwave Background (CMB) application performs data simulation and
       analysis to understand the Universe 400,000 years after the Big Bang
         – Detectors take O(10^12 - 10^15) time-ordered sequences
         – Observations reduced to map of O(10^6 - 10^8) sky pixels
         – Pixels reduced to O(10^3 - 10^4) angular power spectrum coefficients
         – Coefficients reduced to O(10) cosmological parameters
  •    Computationally most expensive step is from map to angular power spectrum
         – Exact solution is O(pixels^3) – prohibitive
         – Approximate solution: sets of O(10^4) Monte Carlo realizations of observed sky to
           remove biases and quantify uncertainties, each of which involves simulating and
           mapping the time-ordered data
         – Map-making is applied to both real and simulated data, but O(10^4) more times to
           simulated data (uses on-the-fly simulation module – simulations performed when
           requested)
  •    Currently uses single HPC system, but would be faster with distributed systems
  •    Central system that builds map would launch data simulations on available
       remote resources; output data from the simulations would be asynchronously
       delivered back to that central system as files incorporated in map as they are
       produced
Thanks: Julian Borrill
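The distributed variant described above (a central system launches simulations remotely and incorporates their output as it arrives) can be sketched roughly as follows. This is not the CMB code: a local thread pool stands in for the remote resources, `simulate_realization` is a hypothetical placeholder that returns random numbers, and a running mean stands in for the real map-making step.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulate_realization(seed, n_pixels):
    """Placeholder for one remote Monte Carlo simulation: returns a fake sky map."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]


def build_mean_map(n_realizations, n_pixels, max_workers=4):
    """Launch simulations and fold each result into a running mean map
    as soon as it arrives, in whatever order the workers finish."""
    mean_map = [0.0] * n_pixels
    incorporated = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(simulate_realization, seed, n_pixels)
                   for seed in range(n_realizations)]
        for future in as_completed(futures):
            sky = future.result()
            incorporated += 1
            # Incremental mean update: no realization has to be kept around.
            for i, value in enumerate(sky):
                mean_map[i] += (value - mean_map[i]) / incorporated
    return mean_map, incorporated


if __name__ == "__main__":
    mean_map, count = build_mean_map(n_realizations=20, n_pixels=8)
    print(count, [round(v, 3) for v in mean_map])
```

The point of the sketch is the control flow: the central loop never waits for a particular simulation, so slow or late remote resources only delay their own contribution.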
Some Additional Applications
•    ATLAS/WLCG (Infrastructural)
      – Hierarchy of systems; data centrally stored, and locally cached (and copied to
        where they likely will be used), perhaps at various levels of the hierarchy
      – Processing is done by applications that are independent of each other
      – Processing of one data file is independent of processing of another file, but
        groups of processing results are collected to obtain statistical outputs about the
        data
•    LSST (Infrastructural)
      –   Data taken by a telescope
      –   Quick analysis is done at the telescope site for interesting (urgent) events (which
          may involve comparing new data with previous data)
      –   System can get more data from other observatories if needed; request other
          observatories to take more data; or call a human
      –   Data then transferred to an archive site, may be at observatory, where data are
          analyzed, reduced, and classified, some of which may be farmed out to grid
          resources
      –   Detailed analysis of new data vs. archived data is performed
      –   Reanalysis of all data is done periodically
      –   Data are stored in files and databases
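The ATLAS/WLCG pattern above (data stored centrally and cached locally at various levels of a hierarchy) can be illustrated with a toy lookup that walks up the hierarchy and leaves a copy at each level on the way back. The `Tier` class and the tier and file names are assumptions made for this sketch. They do not describe the actual WLCG software.

```python
class Tier:
    """One level of a storage hierarchy with an optional parent tier."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.files = {}

    def get(self, filename):
        """Return (data, name of the tier that served it), caching on the way down."""
        if filename in self.files:
            return self.files[filename], self.name
        if self.parent is None:
            raise KeyError(filename)
        data, source = self.parent.get(filename)
        self.files[filename] = data  # keep a local copy for later requests
        return data, source


# Hypothetical three-level hierarchy with one centrally stored file.
central = Tier("central")
regional = Tier("regional", parent=central)
local = Tier("local", parent=regional)
central.files["events_001"] = b"raw detector events"

first = local.get("events_001")   # served from the central store
second = local.get("events_001")  # served from the local cache
```

The first request is served by the central tier and the second by the local cache. Cache eviction and proactive copying to where data are likely to be used are left out.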

Some More Additional Applications
•    Virtual Astronomy (Archetypical)
      –   Services are orchestrated through a pipeline, including a data retrieval
          service that is used to share data across VO sites
      –   Data are moved through the pipeline, and intermediate and final
          products can be stored in Grid storage service
•    Marine (Sea Mammal) Sensors (Infrastructural)
      –   Data are brought to a central site when sensors periodically transmit
      –   Stored data are analyzed using statistical techniques, then visualized with
          tools such as Google Earth
•    Power Grids (Infrastructural)
      –   Diverse streams arrive at a central utility private cloud at dynamic rates
          controlled by the application
      –   Real-time event detection pipeline can trigger load curtailment
          operations
      –   Data mining is performed on current and historical data for forecasting
      –   Partial application execution on remote micro-grid sites is possible.
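For the power grid case, a real-time event detection step might look like the sketch below. The slides do not say how events are detected, so the rolling-mean threshold, the window length, and the sample readings are all assumptions chosen only to make the idea concrete.

```python
from collections import deque


def detect_curtailment_events(readings_kw, window=3, threshold_kw=100.0):
    """Yield the index of each reading at which the rolling mean load over
    the last `window` readings exceeds `threshold_kw`."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for index, reading in enumerate(readings_kw):
        recent.append(reading)
        if len(recent) == window and sum(recent) / window > threshold_kw:
            yield index


# Hypothetical stream of load readings in kW.
stream = [80, 90, 95, 110, 120, 130, 90, 70]
events = list(detect_curtailment_events(stream))
```

With these sample readings the detector fires at indices 4, 5, and 6. In a deployed pipeline each such event would be a candidate trigger for a load curtailment operation.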


Even More Additional Applications
•    Industrial Incident Notification and Response (Infrastructural)
     – Data are streamed from diverse sources, and sometimes manually
       entered into the system
     – Disaster detection causes additional information sources to be
       requested from that region and applications to be composed based
       on available data
     – Some applications run on remote sites for data privacy
     – Escalation can cause more humans in the loop and additional
       operations
•    MODIS Data Processing (Traditional)
     – Data brought into system from various FTP servers
     – Pipeline of initial standardized processing steps on data is done on
       clouds or HPC resources
     – Scientists can then submit executables that do further custom
       processing on subsets of the data, which likely include some
       summarization processing (building graphs)

3DPAS Vectors
•    DPA vectors
     – Execution Unit
     – Communication
     – Coordination
     – Execution Environment
•    What changes for D3 applications?
     –   DPA already assumed distributed; data-intensive is somewhat
         orthogonal to vectors, last D is dynamic
•    So, what can be dynamic?
     – Data (in value or type)
     – Application (for archetypical and infrastructural applications)
     – Execution Environment
•    And how can the application respond?
     –   All 3 vectors can change (under user control, or autonomically)

Infrastructure
•    Software infrastructure to support D3 applications and users exists at three
     levels:
      – System-level software capabilities (e.g., notifications, file system consistency)
      – Middleware (e.g., databases, metadata servers)
      – Programming systems, services and tools (e.g., data-centric workflows)
•    Strong connection between software infrastructure and execution units
      –   Infrastructure supports the communication between and coordination of
          execution units, e.g., to allow co-scheduling
•    What changes for D3 applications?
      –   Boundary between infrastructure and application often blurred
           o   e.g., a catalog may be provided by underlying infrastructure or implemented in application
      –   Sometimes infrastructure requires knowledge of data models
           o   e.g., to support semantic information integration, triggers, optimized data transport
•    General need for infrastructure components to support
      – Data management: sources, storage, access, movement, discovery, notification,
        provenance
      – Data analysis: conversion, enrichment, analysis, workflow, calibration, integration
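Two of the data management needs listed above, discovery and notification, can be combined in a very small catalog. The sketch below is illustrative only: `DataCatalog`, its methods, and the topic, dataset, and site names are hypothetical and do not correspond to any infrastructure named in this talk.

```python
class DataCatalog:
    """Toy catalog that records where datasets live and notifies subscribers."""

    def __init__(self):
        self._locations = {}    # dataset name -> list of sites holding a copy
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Ask to be called whenever a dataset is registered under `topic`."""
        self._subscribers.setdefault(topic, []).append(callback)

    def register(self, topic, dataset, location):
        """Record a new copy of `dataset` and notify the topic's subscribers."""
        self._locations.setdefault(dataset, []).append(location)
        for callback in self._subscribers.get(topic, []):
            callback(dataset, location)

    def locate(self, dataset):
        """Return the known locations of `dataset` (empty list if unknown)."""
        return list(self._locations.get(dataset, []))


catalog = DataCatalog()
arrivals = []
catalog.subscribe("sensor-uploads",
                  lambda dataset, site: arrivals.append((dataset, site)))
catalog.register("sensor-uploads", "tag_0042", "st-andrews")
catalog.register("sensor-uploads", "tag_0042", "edinburgh-mirror")
catalog.register("model-output", "run_17", "ncar")  # no subscriber, no callback
```

Whether such a catalog lives in the infrastructure or inside the application is exactly the blurred boundary noted above.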



Programming Systems
• Pipelines/workflows a key concept
• Loosely, 3 stages for many applications – data collection, data
  storage, data analysis
     –   But the order varies: Sometimes analysis is done during collection to
         reduce storage
• Some stages are built from legacy (heritage) applications
• Some applications don’t include all stages (some stages happen
  elsewhere; data is just “there”)
• Stream processing also is important to some applications (or some
  stages) – the complete data can never be stored, and can only be
  accessed once in time
• Issues that programming systems should address
     –   Programmatic provisioning of resources
     –   Use of existing services, or building of new services
     –   How to adapt to changes? Autonomics?
     –   Recording provenance
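The stream-processing constraint above (the complete data can never be stored and can be accessed only once) forces single-pass algorithms. As one standard example, Welford's algorithm computes a mean and variance while seeing each value exactly once. The class and function names below are chosen for this sketch.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(stream):
    """Consume an iterator once, keeping only constant-size state."""
    stats = RunningStats()
    for value in stream:  # each value is seen exactly once, then discarded
        stats.update(value)
    return stats


stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Memory use stays constant however long the stream runs, which is the property a streaming stage needs.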

Programming Systems (2)




•    Possible change: replace ad hoc and scripted approaches by more formal
     workflow tools
      –   Potential benefits: efficiency, productivity, reproducibility, increased software
          reuse, ability to add provenance tracking
      –   Potential issues: can application-specific knowledge be used by generic tools?
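To make the provenance benefit concrete, here is a minimal sketch of how a workflow layer could record provenance without changing the steps themselves. It is not modeled on any particular workflow tool. The `tracked` decorator, the log format, and the three toy steps (collection, cleaning, analysis) are assumptions for illustration.

```python
import functools
import hashlib
import json

PROVENANCE_LOG = []


def _digest(obj):
    """Short, stable fingerprint of a JSON-serializable value."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def tracked(step):
    """Decorator that records which step ran, on what inputs, producing what."""
    @functools.wraps(step)
    def wrapper(*args):
        result = step(*args)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "inputs": [_digest(arg) for arg in args],
            "output": _digest(result),
        })
        return result
    return wrapper


@tracked
def collect():
    return [3, 1, 2]


@tracked
def clean(values):
    return sorted(values)


@tracked
def analyze(values):
    return {"max": max(values), "n": len(values)}


result = analyze(clean(collect()))
```

Because each record stores fingerprints of its inputs and output, the log can be chained backwards from a result to the data it came from. An ad hoc script gives no such record unless each author adds one by hand.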

Conclusions
•    D3 applications exist, and their number is increasing
•    There are some similarities across some applications
     – Stages, streaming, dynamism and adaptivity
     – Probably means there are generic abstractions that could be used
•    Programming systems are somewhat ad hoc
•    We want generic tools that
     –   Allow applications to adapt to dynamism in various elements
          o   E.g., developers can find and use available systems at
              runtime, applications can run in the best location with respect to data
              sources
     –   Provide good performance
•    Further research needed
     – How do we abstract the set of distributed systems to allow this?
     – What middleware and tools are needed?




Contenu connexe

Tendances

Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problemsinside-BigData.com
 
IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs Club 澳洲互联网俱乐部
 
Data Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information ScienceData Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information ScienceJian Qin
 
NSF SI2 program discussion at 2014 SI2 PI meeting
NSF SI2 program discussion at 2014 SI2 PI meetingNSF SI2 program discussion at 2014 SI2 PI meeting
NSF SI2 program discussion at 2014 SI2 PI meetingDaniel S. Katz
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Jian Qin
 
Big Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data setsBig Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data setsBoston Consulting Group
 
Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark_Yashar_Resume_2017
Mark_Yashar_Resume_2017Mark Yashar
 
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Bryan Heidorn
 
The Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark DataThe Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark Datavbrant
 
OntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific SoftwareOntoSoft: A Distributed Semantic Registry for Scientific Software
OntoSoft: A Distributed Semantic Registry for Scientific Softwaredgarijo
 
From Data to Knowledge with Workflows & Provenance
From Data to Knowledge with Workflows & ProvenanceFrom Data to Knowledge with Workflows & Provenance
From Data to Knowledge with Workflows & ProvenanceBertram Ludäscher
 
Software and Education at NSF/ACI
Software and Education at NSF/ACISoftware and Education at NSF/ACI
Software and Education at NSF/ACIDaniel S. Katz
 
Cse new graduate_students2011
Cse new graduate_students2011Cse new graduate_students2011
Cse new graduate_students2011Masoud Nikravesh
 
Building the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceBuilding the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceLarry Smarr
 
Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Jian Qin
 

Tendances (20)

Massive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World ProblemsMassive-Scale Analytics Applied to Real-World Problems
Massive-Scale Analytics Applied to Real-World Problems
 
IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research
 
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
2019 Triangle Machine Learning Day - Biomedical Image Understanding and EHRs ...
 
Data Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information ScienceData Science and What It Means to Library and Information Science
Data Science and What It Means to Library and Information Science
 
NSF SI2 program discussion at 2014 SI2 PI meeting
NSF SI2 program discussion at 2014 SI2 PI meetingNSF SI2 program discussion at 2014 SI2 PI meeting
NSF SI2 program discussion at 2014 SI2 PI meeting
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...
 
Big Data: tools and techniques for working with large data sets
Summary of 3DPAS

  • 1. Review of 3DPAS Theme Daniel S. Katz, University of Chicago & Argonne National Laboratory Shantenu Jha, Rutgers University Neil Chue Hong, University of Edinburgh Simon Dobson, University of St. Andrews Andre Luckow, Louisiana State University Omer Rana, University of Cardiff Yogesh Simmhan, University of Southern California www.ci.anl.gov www.ci.uchicago.edu
  • 2. Outline • e-SI • DPA theme • 3DPAS theme • Report in-progress – Application Scenarios – Understanding Distributed Dynamic Data – Vectors – Infrastructure – Programming Systems and Abstractions • Future Steps www.ci.anl.gov 2 3DPAS review for D3Science – d.katz@ieee.org www.ci.uchicago.edu
  • 3. e-Science Institute (e-SI) • A 10-year project (Aug 2001 – July 2011), located in Edinburgh • Aimed at, but not limited to, UK • http://www.esi.ac.uk/ • Tagline – time & space to think • Mission: to stimulate the creation of new insights in e-Science and computing science by bringing together international experts and enabling them to successfully address significant and diverse challenges • Research themes formed the core of eSI’s activity – Theme: connected programme of visitors, workshops and events – Conceived and driven by Theme Leader – Focusing on a specific issue in e-Science that crosses boundaries and raises new research questions – Goals: o Identify research issues o Rally a community of researchers o Map a path of future research that will make best progress towards new e-Science methods and capabilities.
  • 4. Context – Data and Science • Data has always been important to science • Some use the concept of paradigms – First (a thousand years ago) – empirical – describe natural phenomena – Second (a few hundred years ago) – theoretical – use models and generalizations – Third (a few decades ago) – computational – solve complex problems – Fourth (a few years ago) – data exploration – gain knowledge directly from data from experiment, theory, simulation • Problem – we cannot keep declaring new paradigms at an exponentially increasing rate • But it’s true that there is an emerging science of “listening to data”, as defined by Jim Gray, Google, etc.
  • 5. Distributed Programming Abstractions • DPA theme at eSI – http://wiki.esi.ac.uk/Distributed_Programming_Abstractions • Series of workshops • Led to book in progress: Shantenu Jha, Daniel S. Katz, Manish Parashar, Omer Rana, and Jon Weissman, “Abstractions for Distributed Applications and Systems,” to be published by Wiley in 2012 • And multiple papers, including: S. Jha, D. S. Katz, M. Parashar, O. Rana, and J. Weissman, "Critical Perspectives on Large-Scale Distributed Applications and Production Grids," (Best Paper Award Winner), Proceedings of the 10th IEEE/ACM International Conference on Grid Computing (Grid 2009), 2009. • Idea – start with distributed science and engineering applications – analyze them (determine “vectors”); examine interaction with infrastructures and tools; find abstractions – Tech report on infrastructures (much of Chapter 3) available now: http://www.ci.uchicago.edu/research/papers/CI-TR-7-0811 – Vectors: Execution Unit, Coordination, Communication, Execution Environment • In the process, we realized that data-intensive applications had some unique challenges and issues
  • 6. Dynamic Distributed Data-intensive Programming Systems and Applications (3DPAS) • This led to 3DPAS theme at eSI – http://wiki.esi.ac.uk/3DPAS • Similar idea to DPA – Start with science and engineering applications – See if DPA vectors suffice or if new vectors are needed – Examine what is different with respect to infrastructures and programming systems • Initially done through workshops at eSI • Continuing through weekly teleconferences • Driving towards a report/paper
  • 7. D3 (data intensive, distributed, dynamic) • Data intensive: order of magnitude of large data and large computing – Exascale data and petascale computing – Petascale data and exascale computing – Exascale data and exascale computing. • Distributed: number, dispersion, and replication of distributed data or computation resources – Low in a cloud or cluster that resides in a single building – High in a grid that spans multiple geographically-separated administrative domains, or multiple data centers • Dynamic: perhaps both data and computation – Data may emerge at runtime – Mechanisms to handle data during application execution, e.g., data transfer, scheduling – Application components may be launched at runtime in response to data, application, or environment dynamics • All may vary in different stages of an application • Most applications have data collection, storage, analysis stages
  • 8. Value/Impact • Not all data-intensive applications have dynamic and distributed elements today • However, as scales increase, applications will have to be distributed and dynamic – And these issues will be increasingly correlated • Analyzing current D3 applications should impact many future applications – And lead to lessons about and requirements on future infrastructures and programming systems
  • 9. Applications Process • Asked questions about possible applications 1. What is the purpose of the application? 2. How is the application used to do this? 3. What infrastructure is used? (including compute, data, network, instruments, etc.) 4. What dynamic data is used in the application? a. What are the types of data? b. What is the size of the data set(s)? 5. How does the application get the data? 6. What are the time (or quality) constraints on the application? 7. How much diverse data integration is involved? 8. How diverse is the data? 9. Please feel free to also talk about the current state of the application, if it exists today, and any specific gaps that you know need to be overcome
  • 10. Applications Process (2) • In workshops, discussed current applications, and considered whether each new application “felt” the same as a previous application in terms of the answers to the questions • Came to 14 applications • Noted they fall into different categories – Traditional applications: a single program that is run by a user – Archetypical applications: a group of applications, independent programs, written by different authors, may be competing, usually not intended to run together – Infrastructural applications: a set of applications (or archetypical applications) that need to be run in series (perhaps in different phases), may be run by different groups that do not frequently interact
  • 11. Applications • Columns: Application; Area; Type; Lead Person/Site • Metagenomics; Biosciences; Archetypical; Amsterdam Medical Centre, Netherlands • ATLAS experiment (WLCG); Particle Physics; Infrastructural; CERN & Daresbury Lab + RAL, UK • Large Synoptic Sky Survey (LSST); Astrophysics; Infrastructural; University of Edinburgh, Institute of Astronomy, UK • Virtual Astronomy; Astrophysics; Archetypical; University of Edinburgh, Institute of Astronomy, UK • Cosmic Microwave Background; Astrophysics; Traditional; Lawrence Berkeley National Laboratory, USA • Marine (Sea Mammal) Sensors; Biosciences; Infrastructural; University of St. Andrews, UK • Climate; Earth Science; Infrastructural; National Center for Atmospheric Research, USA
  • 12. Applications (2) • Columns: Application; Area; Type; Lead Person/Site • Interactive Exploration of Environmental Data; Earth Science; Archetypical; University of Reading, UK • Power Grids; Energy Informatics; Infrastructural; University of Southern California, USA • Fusion (International Thermonuclear Experimental Reactor); Chemistry/Physics; Traditional; Oak Ridge National Laboratory & Rutgers University, USA • Industrial Incident Notification and Response; Emergency Response; Infrastructural; THALES, The Netherlands • MODIS Data Processing; Earth Science; Traditional; Lawrence Berkeley National Laboratory, USA • Floating Sensors; Earth Science; Infrastructural; Lawrence Berkeley National Laboratory, USA • Distributed Network Intrusion Detection; Security; Infrastructural; University of Minnesota, USA
  • 13. Climate (infrastructural) • CMIP/IPCC process runs and analyzes climate models in 3 stages • Data are generated by distributed HPC centers • Data are stored by distributed ESGF gateways and data nodes • Data are analyzed by distributed researchers, who search for particular data, gather them to a site, and process them • Resources for analysis can be dynamic, as can data stored in data nodes Thanks: Don Middleton
  • 14. Fusion (traditional) • ITER needs a variety of codes • Codes run on a distributed set of leadership-class facilities, using advance reservations to co-schedule the simulations • Codes read and write data files, using ADIOS and HDF5 • Files output by each code are transformed and transferred to be used as inputs by other codes, linking the codes into a single coupled simulation • Data generated are too large to be written to disk for post-run analysis; in-situ analysis and visualization tools are being developed Thanks: Scott Klasky
  • 15. Metagenomics (archetypical) • Analysis of genome sequence data being produced by next-generation devices • Sequencers are producing data at a rate increasing faster than computing capability • Sequencers are distributed; data produced cannot all be co-located • Multiple analyses (using different software) by multiple users need to make best use of available computing resources, understanding location and access issues with respect to datasets
  • 16. CMB (traditional) • Cosmic Microwave Background (CMB) analysis performs data simulation and analysis to understand the Universe 400,000 years after the Big Bang – Detectors take O(10^12 - 10^15) time-ordered sequences – Observations reduced to map of O(10^6 - 10^8) sky pixels – Pixels reduced to O(10^3 - 10^4) angular power spectrum coefficients – Coefficients reduced to O(10) cosmological parameters • Computationally most expensive step is from map to angular power spectrum – Exact solution is O(pixels^3) – prohibitive – Approximate solution: sets of O(10^4) Monte Carlo realizations of observed sky to remove biases and quantify uncertainties, each of which involves simulating and mapping the time-ordered data – Map-making is applied to both real and simulated data, but O(10^4) more times to simulated data (uses on-the-fly simulation module – simulations performed when requested) • Currently uses single HPC system, but would be faster with distributed systems • Central system that builds map would launch data simulations on available remote resources; output data from the simulations would be asynchronously delivered back to that central system as files and incorporated in the map as they are produced Thanks: Julian Borrill
  • 17. Some Additional Applications • ATLAS/WLCG (Infrastructural) – Hierarchy of systems; data centrally stored, and locally cached (and copied to where they likely will be used), perhaps at various levels of the hierarchy – Processing is done by applications that are independent of each other – Processing of one data file is independent of processing of another file, but groups of processing results are collected to obtain statistical outputs about the data • LSST (Infrastructural) – Data taken by a telescope – Quick analysis is done at the telescope site for interesting (urgent) events (which may involve comparing new data with previous data) – System can get more data from other observatories if needed; request other observatories to take more data; or call a human – Data are then transferred to an archive site, which may be at the observatory, where data are analyzed, reduced, and classified, some of which may be farmed out to grid resources – Detailed analysis of new data vs. archived data is performed – Reanalysis of all data is done periodically – Data are stored in files and databases
  • 18. Some More Additional Applications • Virtual Astronomy (Archetypical) – Services are orchestrated through a pipeline, including a data retrieval service that is used to share data across VO sites – Data are moved through the pipeline, and intermediate and final products can be stored in Grid storage service • Marine (Sea Mammal) Sensors (Infrastructural) – Data are brought to a central site when sensors periodically transmit – Stored data are analyzed using statistical techniques, then visualized with tools such as Google Earth • Power Grids (Infrastructural) – Diverse streams arrive at a central utility private cloud at dynamic rates controlled by the application – Real-time event detection pipeline can trigger load curtailment operations – Data mining is performed on current and historical data for forecasting – Partial application execution on remote micro-grid sites is possible.
  • 19. Even More Additional Applications • Industrial Incident Notification and Response (Infrastructural) – Data are streamed from diverse sources, and sometimes manually entered into the system – Disaster detection causes additional information sources to be requested from that region and applications to be composed based on available data – Some applications run on remote sites for data privacy – Escalation can cause more humans in the loop and additional operations • MODIS Data Processing (Traditional) – Data brought into system from various FTP servers – Pipeline of initial standardized processing steps on data is done on clouds or HPC resources – Scientists can then submit executables that do further custom processing on subsets of the data, which likely include some summarization processing (building graphs)
  • 20. 3DPAS Vectors • DPA vectors – Execution Unit – Communication – Coordination – Execution Environment • What changes for D3 applications? – DPA already assumed distributed; data-intensive is somewhat orthogonal to vectors, last D is dynamic • So, what can be dynamic? – Data (in value or type) – Application (for archetypical and infrastructural applications) – Execution Environment • And how can the application respond? – All 3 vectors can change (under user control, or autonomically)
  • 21. Infrastructure • Software infrastructure to support D3 applications and users exists at three levels: – System-level software capabilities (e.g., notifications, file system consistency) – Middleware (e.g., databases, metadata servers) – Programming systems, services and tools (e.g., data-centric workflows) • Strong connection between software infrastructure and execution units – Infrastructure supports the communication between and coordination of execution units, e.g., to allow co-scheduling • What changes for D3 applications? – Boundary between infrastructure and application often blurred o e.g., a catalog may be provided by underlying infrastructure or implemented in application – Sometimes infrastructure requires knowledge of data models o e.g., to support semantic information integration, triggers, optimized data transport • General need for infrastructure components to support – Data management: sources, storage, access, movement, discovery, notification, provenance – Data analysis: conversion, enrichment, analysis, workflow, calibration, integration
  • 22. Programming Systems • Pipelines/workflows are a key concept • Loosely, 3 stages for many applications – data collection, data storage, data analysis – But the order varies: sometimes analysis is done during collection to reduce storage • Some stages are built from legacy (heritage) applications • Some applications don’t include all stages (some stages happen elsewhere; data is just “there”) • Stream processing also is important to some applications (or some stages) – the complete data can never be stored, and can only be accessed once in time • Issues that programming systems should address – Programming provisioning of resources – Use of existing services, or building of new services – How to adapt to changes? Autonomics? – Recording provenance
  • 23. Programming Systems (2) • Possible change: replace ad hoc and scripted approaches by more formal workflow tools – Potential benefits: efficiency, productivity, reproducibility, increased software reuse, ability to add provenance tracking – Potential issues: can application-specific knowledge be used by generic tools?
  • 24. Conclusions • D3 applications exist, and the number is increasing • There are some similarities across some applications – Stages, streaming, dynamism and adaptivity – Probably means there are generic abstractions that could be used • Programming systems are somewhat ad hoc • We want generic tools that – Allow applications to adapt to dynamism in various elements o E.g., developers can find and use available systems at runtime, applications can run in the best location with respect to data sources – Provide good performance • Further research needed – How do we abstract the set of distributed systems to allow this? – What middleware and tools are needed?
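
Slide 22 notes that in stream processing the complete data can never be stored and each value can only be accessed once. The sketch below illustrates what that constraint might look like in practice. It is not taken from the slides: the `sensor_readings` generator is a hypothetical stand-in for a live feed, and the single-pass update is Welford's online algorithm, chosen here purely as an example of a computation that keeps constant state per stream.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Yield (count, mean, sample variance) after each value, in one pass.

    Only three numbers are kept as state, so the stream itself never has
    to be stored or revisited.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        variance = m2 / (count - 1) if count > 1 else 0.0
        yield count, mean, variance


def sensor_readings() -> Iterator[float]:
    # Hypothetical data source; a real application would read from a
    # sensor, socket, or message queue instead.
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


# Consume the stream once, keeping only the latest summary.
final = None
for final in running_stats(sensor_readings()):
    pass
```

After the loop, `final` holds the count, mean, and sample variance of everything seen so far. A downstream stage (for example, event detection) could read each intermediate tuple as it is yielded rather than waiting for the stream to end.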