SlideShare une entreprise Scribd logo
1  sur  19
searching for KDD in MDS standards…
        …the DAME experience
 Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele D’Abrusco, George
     S. Djorgovski, Ciro Donalek, Mauro Garofalo , Marisa Guglielmo, Omar Laurino,
 Giuseppe Longo, Ashish Mahabal, Ettore Mancini, Francesco Manna, Amata Mercurio,
  Alfonso Nocella, Maurizio Paolillo, Luca Pellecchia, Sandro Riccardi, Giovanni Vebber,
                                      Civita Vellucci.

                      Department of Physics – University Federico II – Napoli
     INAF – National Institute of Astrophysics – Capodimonte Astronomical Observatory – Napoli
                       CALTECH – California Institute of Technology - Pasadena
Data Mining (KDD) as the Fourth
      Paradigm Of Science




Definition
DM is the exploration and analysis of large quantities of data in order
to discover meaningful patterns and rules
                 M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The BoK’s Problem
Limited number of problems due to limited number of reliable BoKs

 So far
 • Limited number of BoK (and of limited scope) available
 • Painstaking work for each application (es. spectroscopic redshifts for photometric
    redshifts training)
 • Fine tuning on specific data sets needed (e.g., if you add a band you need to re-train the
    methods)
  • There’s a need of standardization and interoperability between data together
  with DM application


                  Community believes AI/DM methods are black boxes
            You feed in something, and obtain patters, trends, i.e. knowledge….




                         M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
What DAME is
DAME Program is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at
implementing (as web applications and services) a scientific gateway for massive data analysis,
exploration and mining, on top of a virtualized distributed computing environment.

                                                                    http://dame.dsf.unina.it/
                                                                    Technical and management info
                                                                    Documents
                                                                    Science cases
                                                                    Newsletters




  http://dame.dsf.unina.it/beta_info.html
  DAMEWARE Web application Beta Version
                         M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
DM 4-rule virtuous cycle
                                                       •     Virtuous cycle implementation steps:
•  Finding patterns is not enough                             – Transforming data into information
•  Science business must:                                        via:
– Respond to patterns by taking action                             • Hypothesis testing
                                                                   • Profiling
– Turning:
                                                                   • Predictive modeling
         • Data into Information                              – Taking action
         • Information into Action                                 • Model deployment
         • Action into Value                                       • Scoring
• Hence, the Virtuous Cycle of DM:                            – Measurement
                                                                   • Assessing a model’s stability &
                                                                      effectiveness before it is used
             1.    Identify the problem

             2.    Mining data to transform it into actionable information

             3.    Acting on the information

             4.    Measuring the results


                      M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
DM: 11-step Methodology
 The four rules reflect into an 11-step exploded strategy, at the base of DAME concept


1.    Translate any opportunity (science case) into DM opportunity (problem)
2.    Select appropriate data
3.    Get to know the data
4.    Create a model set
5.    Fix problems with the data
6.    Transform data to bring information
7.    Build models
8.    Assess models
9.    Deploy models
10.   Assess results
11.   Begin again (GOTO 1)
                       M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
Effective DM process break-down




         M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The Black box Infrastructure
In this scenario DAME (Data Mining & Exploration) project, starting from astrophysics
requirements domain, has investigated the Massive Data Sets (MDS) exploration by
producing a taxonomy of data mining applications (hereinafter called functionalities)
and collected a set of machine learning algorithms (hereinafter called models).

This association functionality-model is made of what we defined "use case", easily
configurable by the user through specific tutorials. At low level, any experiment
launched on the DAME framework, externally configurable through dynamical
interactive web pages, is treated in a standard way, making completely transparent to
the user the specific computing infrastructure used and specific data format given as
input.

So the user doesn’t need to know anything about the computing infrastructure and
almost nothing about the internal mechanisms of the chosen machine learning
model..




                      M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
DAME Infrastructure
                                    DR Storage DR Execution

           GRID SE                                         GRID UI                    GRID CE
User & Data Archives                                                                 DM Models Job Execution
(300 TB dedicated)                                                                           (300 multi-core
                                                                                                 processors)




                                                                                        Cloud facilities
                                                                                        16 TB
                                                                                        15 processors

                       M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
DAME SW Architecture




    M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The Available Services
DAMEWARE Web Application Resource
Main service providing via browser a list of algorithms and tools to configure and launch
experiments as complete workflows (dataset creation, model setup and run, graphical/text
output):
• Functionalities: Regression, Classification, Image Segmentation, Multi-layer Clustering;
• Models: MLP+BP, MLP+GA, SVM, MLP+QNA, K-Means (through KNIME), PPS, SOM, NEXT-II;

VOGCLUSTERS
Web Application for data and text mining on globular clusters;

STraDiWA (Sky Transient Discovery Web Application)
detect variable objects from real or simulated images (under R&D);

WFXT (Wide Field X-Ray Telescope) Transient Calculator
Web service to estimate the number of transient and variable sources that can be detected by
WFXT within the 3 main planned extragalactic surveys, with a given significant threshold;

SDSS (Sloan Digital Sky Survey)
Local mirror website hosting a complete SDSS Data Archive and Exploration System;
                         M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
K-Means (through KNIME)




                                                                             KNIME WORKFLOW
                                                                                  Offline
                                                                                  creation
   OUTPUT

                               DM PLUG-IN COMPONENT

                                                                                   Offline
              EXECUTION
                                                                  Offline          creation
                                                                  creation




                                                          DMM API COMPONENT
CLOUD EXE/STORAGE ENVIRONMENT
               M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
Web 2.0 Features in DAME
Web 2.0? It is a system that breaks with the old model of centralized Web sites
and moves the power of the Web/Internet to the desktop. [J. Robb]
the Web becomes a universal, standards-based integration platform. [S. Dietzen]




                     M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
VO Interoperability scenarios
DA1       SAMP                Full interoperability between DA (Desktop Applications)
                              Local user desktop fully involved (requires computing
           DA2                power)


                              Full WA  DA interoperability
          WSC                 Partial DA  WA interoperability (such as remote file
 DA
                              storing)
                              MDS must be moved between local and remote apps
MASSIVE    WA
                              Local user desktop partially involved (requires minor
 DATA
                              computing and storage power)
 SETS

                              Except from URI exchange, no standard interoperabilty
WA1       URI?
                              Different accounting policy
                              MDS must be moved between remote apps (but larger
MASSIVE    WA2
                              bandwidth)
 DATA
                              No local computing power required
 SETS
                 M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
Our vision: improving aspects
                                    DAs has to become WAs
                                    Unique accounting policy (google/Microsoft like)
   WA1           plugins
                                    To overcome MDS flow apps must be plug&play (e.g.
                                    any WAx feature should be pluggable in WAy on
                   WA2              demand)
                                    No local computing power required. Also smartphones
                                    can run VO apps


                            Requirements
• Standard accounting system;
• No more MDS moving on the web, but just moving Apps, structured as plugin
repositories and execution environments;
• standard modeling of WA and components to obtain the maximum level of granularity;
• Evolution of SAMP architecture to extend web interoperability (in particular for the
migration of the plugins);



                       M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
Our vision: plugin granularity flow

    WAx                                                                     WAy
   Px-1                                                                    Py-1

   Px-2                                                                    Py-2

   Px-3                                                                    Py-…

   Px-…                                                                    Py-n

   Px-n                                                                    Px-3
                          3. Way execute Px-3



          This scheme could be iterated and extended
              involving all standardized web apps



             M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The Lernaean Hydra VO KDD App




      M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
The Lernaean Hydra VO KDD App
After a certain number of such iterations…

                               The VO KDD App scenario
         WAx                         will become:                                   WAy
                              No different WAs, but simply
       Px-1                    one WA with several sites                           Py-1
                             (eventually with different GUIs
       Px-2                                                                        Py-2
                             and computing environments)
       Px-3                     All WA sites can become a                          Py-…
                                mirror site of all the others
       Px-…                                                                        Py-n

       Px-n                   The synchronization of plugin
                                                                                   Px-1
                                releases between WAs is
       Py-1                    performed at request time                           Px-2

       Py-2                   Minimization of data exchange                        Px-3
                             flow (just few plugins in case of
       Py-…                     synchronization between                            Px-…
                                          mirrors)
       Py-n                                                                        Px-n
                     M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
Conclusions
DAME was not originally conceived (for the lack of suitable standards) to
be interoperable with the VO, but offers a good benchmark to plan for
the future developments of KDD on MDS in a VO environment.

1. DAME is just an example of what new ICT (Web 2.0) can do for A&A
KDD problems.
2. A new vision of the KDD App approach, suitable for VO must be based
on the minimization of data transfer and maximization of
interoperability within the VO community.
3. If implemented, the new scheme can reach a wider science
community by giving the opportunity to share data and apps worldwide,
without any particular infrastructure requirements (i.e. by using a
simple smartphone with a low-band connection).
 DAME group is currently involved in the definition of standards and rules and is working to
 modify and adapt the present infrastructure to become compliant with the VO.
                        M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011

Contenu connexe

En vedette

Design Domicile Showcase
Design Domicile ShowcaseDesign Domicile Showcase
Design Domicile ShowcaseMichael M Grant
 
0.0 sds course introduction vezzoli 10-11 (46)
0.0 sds course introduction vezzoli 10-11 (46)0.0 sds course introduction vezzoli 10-11 (46)
0.0 sds course introduction vezzoli 10-11 (46)LeNS_slide
 
Steve Tagger and the Minnesota Digital Library
Steve Tagger and the Minnesota Digital LibrarySteve Tagger and the Minnesota Digital Library
Steve Tagger and the Minnesota Digital Libraryscottsayre
 
Brescia program management_dame-na-pre-0030
Brescia program management_dame-na-pre-0030Brescia program management_dame-na-pre-0030
Brescia program management_dame-na-pre-0030INAF-OAC
 
Citizen Volunteerism and Urban Interaction Design
Citizen Volunteerism and Urban Interaction DesignCitizen Volunteerism and Urban Interaction Design
Citizen Volunteerism and Urban Interaction Designsbisker
 
Cadei - input2012
Cadei - input2012Cadei - input2012
Cadei - input2012INPUT 2012
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 

En vedette (9)

Design Domicile Showcase
Design Domicile ShowcaseDesign Domicile Showcase
Design Domicile Showcase
 
0.0 sds course introduction vezzoli 10-11 (46)
0.0 sds course introduction vezzoli 10-11 (46)0.0 sds course introduction vezzoli 10-11 (46)
0.0 sds course introduction vezzoli 10-11 (46)
 
Steve Tagger and the Minnesota Digital Library
Steve Tagger and the Minnesota Digital LibrarySteve Tagger and the Minnesota Digital Library
Steve Tagger and the Minnesota Digital Library
 
Brescia program management_dame-na-pre-0030
Brescia program management_dame-na-pre-0030Brescia program management_dame-na-pre-0030
Brescia program management_dame-na-pre-0030
 
C2C Network Good Practice Handbook
C2C Network Good Practice HandbookC2C Network Good Practice Handbook
C2C Network Good Practice Handbook
 
Citizen Volunteerism and Urban Interaction Design
Citizen Volunteerism and Urban Interaction DesignCitizen Volunteerism and Urban Interaction Design
Citizen Volunteerism and Urban Interaction Design
 
Bresciaclass Report Long
Bresciaclass Report LongBresciaclass Report Long
Bresciaclass Report Long
 
Cadei - input2012
Cadei - input2012Cadei - input2012
Cadei - input2012
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 

Similaire à Dame ivoa interop_brescia_naples2011

Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Alexandru Iosup
 
Ectel nods v2
Ectel nods v2Ectel nods v2
Ectel nods v2nodenot
 
A Cloud Multimedia Platform
A Cloud Multimedia PlatformA Cloud Multimedia Platform
A Cloud Multimedia PlatformDejan Kovachev
 
IPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationIPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationNikolay Milovanov
 
Mmsys slideshare-intel-nokia
Mmsys slideshare-intel-nokiaMmsys slideshare-intel-nokia
Mmsys slideshare-intel-nokiaRufael Mekuria
 
Image transformation using grid(synopsis)
Image transformation using grid(synopsis)Image transformation using grid(synopsis)
Image transformation using grid(synopsis)Mumbai Academisc
 
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfThe Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfFörderverein Technische Fakultät
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureOdinot Stanislas
 
Zsl cloud-application migration-8_phased_approach
Zsl cloud-application migration-8_phased_approachZsl cloud-application migration-8_phased_approach
Zsl cloud-application migration-8_phased_approachzslmarketing
 
Population Management in Clouds is a Do-It-Yourself Technology
Population Management in Clouds is a Do-It-Yourself TechnologyPopulation Management in Clouds is a Do-It-Yourself Technology
Population Management in Clouds is a Do-It-Yourself TechnologyTokyo University of Science
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011marpierc
 
1-s2.0-S0957417422020759-main.pdf
1-s2.0-S0957417422020759-main.pdf1-s2.0-S0957417422020759-main.pdf
1-s2.0-S0957417422020759-main.pdfarchurssu
 
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...Paolo Nesi
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitaebutest
 
Application-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentApplication-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentSafayet Hossain
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTERN Australia
 

Similaire à Dame ivoa interop_brescia_naples2011 (20)

Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.Cloud Programming Models: eScience, Big Data, etc.
Cloud Programming Models: eScience, Big Data, etc.
 
Work Package 3 - Month 6 by Christian Morbidoni
Work Package 3 - Month 6 by Christian MorbidoniWork Package 3 - Month 6 by Christian Morbidoni
Work Package 3 - Month 6 by Christian Morbidoni
 
Ectel nods v2
Ectel nods v2Ectel nods v2
Ectel nods v2
 
2017 dagstuhl-nfv-rothenberg
2017 dagstuhl-nfv-rothenberg2017 dagstuhl-nfv-rothenberg
2017 dagstuhl-nfv-rothenberg
 
A Cloud Multimedia Platform
A Cloud Multimedia PlatformA Cloud Multimedia Platform
A Cloud Multimedia Platform
 
IPv4 to IPv6 network transformation
IPv4 to IPv6 network transformationIPv4 to IPv6 network transformation
IPv4 to IPv6 network transformation
 
Dynamic formation of the distributed micro clouds
Dynamic formation of the distributed micro cloudsDynamic formation of the distributed micro clouds
Dynamic formation of the distributed micro clouds
 
Mmsys slideshare-intel-nokia
Mmsys slideshare-intel-nokiaMmsys slideshare-intel-nokia
Mmsys slideshare-intel-nokia
 
Image transformation using grid(synopsis)
Image transformation using grid(synopsis)Image transformation using grid(synopsis)
Image transformation using grid(synopsis)
 
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdfThe Role of Machine Learning in Fluid Network Control and Data Planes.pdf
The Role of Machine Learning in Fluid Network Control and Data Planes.pdf
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
Zsl cloud-application migration-8_phased_approach
Zsl cloud-application migration-8_phased_approachZsl cloud-application migration-8_phased_approach
Zsl cloud-application migration-8_phased_approach
 
Simms-fsci-madmps-2017
Simms-fsci-madmps-2017Simms-fsci-madmps-2017
Simms-fsci-madmps-2017
 
Population Management in Clouds is a Do-It-Yourself Technology
Population Management in Clouds is a Do-It-Yourself TechnologyPopulation Management in Clouds is a Do-It-Yourself Technology
Population Management in Clouds is a Do-It-Yourself Technology
 
ACES QuakeSim 2011
ACES QuakeSim 2011ACES QuakeSim 2011
ACES QuakeSim 2011
 
1-s2.0-S0957417422020759-main.pdf
1-s2.0-S0957417422020759-main.pdf1-s2.0-S0957417422020759-main.pdf
1-s2.0-S0957417422020759-main.pdf
 
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...
NLP on Hadoop: A Distributed Framework for NLP-Based Keyword and Keyphrase Ex...
 
Curriculum Vitae
Curriculum VitaeCurriculum Vitae
Curriculum Vitae
 
Application-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentApplication-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud Environment
 
Tim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasetsTim Malthus_Towards standards for the exchange of field spectral datasets
Tim Malthus_Towards standards for the exchange of field spectral datasets
 

Dernier

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Dernier (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

Dame ivoa interop_brescia_naples2011

  • 1. searching for KDD in MDS standards… …the DAME experience Marianna Annunziatella, Massimo Brescia, Stefano Cavuoti, Raffaele D’Abrusco, George S. Djorgovski, Ciro Donalek, Mauro Garofalo , Marisa Guglielmo, Omar Laurino, Giuseppe Longo, Ashish Mahabal, Ettore Mancini, Francesco Manna, Amata Mercurio, Alfonso Nocella, Maurizio Paolillo, Luca Pellecchia, Sandro Riccardi, Giovanni Vebber, Civita Vellucci. Department of Physics – University Federico II – Napoli INAF – National Institute of Astrophysics – Capodimonte Astronomical Observatory – Napoli CALTECH – California Institute of Technology - Pasadena
  • 2. Data Mining (KDD) as the Fourth Paradigm Of Science Definition DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 3. The BoK’s Problem Limited number of problems due to limited number of reliable BoKs So far • Limited number of BoK (and of limited scope) available • Painstaking work for each application (es. spectroscopic redshifts for photometric redshifts training) • Fine tuning on specific data sets needed (e.g., if you add a band you need to re-train the methods) • There’s a need of standardization and interoperability between data together with DM application Community believes AI/DM methods are black boxes You feed in something, and obtain patters, trends, i.e. knowledge…. M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 4. What DAME is DAME Program is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at implementing (as web applications and services) a scientific gateway for massive data analysis, exploration and mining, on top of a virtualized distributed computing environment. http://dame.dsf.unina.it/ Technical and management info Documents Science cases Newsletters http://dame.dsf.unina.it/beta_info.html DAMEWARE Web application Beta Version M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 5. DM 4-rule virtuous cycle • Virtuous cycle implementation steps: • Finding patterns is not enough – Transforming data into information • Science business must: via: – Respond to patterns by taking action • Hypothesis testing • Profiling – Turning: • Predictive modeling • Data into Information – Taking action • Information into Action • Model deployment • Action into Value • Scoring • Hence, the Virtuous Cycle of DM: – Measurement • Assessing a model’s stability & effectiveness before it is used 1. Identify the problem 2. Mining data to transform it into actionable information 3. Acting on the information 4. Measuring the results M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 6. DM: 11-step Methodology The four rules reflect into an 11-step exploded strategy, at the base of DAME concept 1. Translate any opportunity (science case) into DM opportunity (problem) 2. Select appropriate data 3. Get to know the data 4. Create a model set 5. Fix problems with the data 6. Transform data to bring information 7. Build models 8. Assess models 9. Deploy models 10. Assess results 11. Begin again (GOTO 1) M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 7. Effective DM process break-down M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 8. The Black box Infrastructure In this scenario DAME (Data Mining & Exploration) project, starting from astrophysics requirements domain, has investigated the Massive Data Sets (MDS) exploration by producing a taxonomy of data mining applications (hereinafter called functionalities) and collected a set of machine learning algorithms (hereinafter called models). This association functionality-model is made of what we defined "use case", easily configurable by the user through specific tutorials. At low level, any experiment launched on the DAME framework, externally configurable through dynamical interactive web pages, is treated in a standard way, making completely transparent to the user the specific computing infrastructure used and specific data format given as input. So the user doesn’t need to know anything about the computing infrastructure and almost nothing about the internal mechanisms of the chosen machine learning model.. M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 9. DAME Infrastructure DR Storage DR Execution GRID SE GRID UI GRID CE User & Data Archives DM Models Job Execution (300 TB dedicated) (300 multi-core processors) Cloud facilities 16 TB 15 processors M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 10. DAME SW Architecture M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 11. The Available Services DAMEWARE Web Application Resource Main service providing via browser a list of algorithms and tools to configure and launch experiments as complete workflows (dataset creation, model setup and run, graphical/text output): • Functionalities: Regression, Classification, Image Segmentation, Multi-layer Clustering; • Models: MLP+BP, MLP+GA, SVM, MLP+QNA, K-Means (through KNIME), PPS, SOM, NEXT-II; VOGCLUSTERS Web Application for data and text mining on globular clusters; STraDiWA (Sky Transient Discovery Web Application) detect variable objects from real or simulated images (under R&D); WFXT (Wide Field X-Ray Telescope) Transient Calculator Web service to estimate the number of transient and variable sources that can be detected by WFXT within the 3 main planned extragalactic surveys, with a given significant threshold; SDSS (Sloan Digital Sky Survey) Local mirror website hosting a complete SDSS Data Archive and Exploration System; M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 12. K-Means (through KNIME) KNIME WORKFLOW Offline creation OUTPUT DM PLUG-IN COMPONENT Offline EXECUTION Offline creation creation DMM API COMPONENT CLOUD EXE/STORAGE ENVIRONMENT M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 13. Web 2.0 Features in DAME Web 2.0? It is a system that breaks with the old model of centralized Web sites and moves the power of the Web/Internet to the desktop. [J. Robb] the Web becomes a universal, standards-based integration platform. [S. Dietzen] M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 14. VO Interoperability scenarios DA1 SAMP Full interoperability between DA (Desktop Applications) Local user desktop fully involved (requires computing DA2 power) Full WA  DA interoperability WSC Partial DA  WA interoperability (such as remote file DA storing) MDS must be moved between local and remote apps MASSIVE WA Local user desktop partially involved (requires minor DATA computing and storage power) SETS Except from URI exchange, no standard interoperabilty WA1 URI? Different accounting policy MDS must be moved between remote apps (but larger MASSIVE WA2 bandwidth) DATA No local computing power required SETS M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 15. Our vision: improving aspects DAs has to become WAs Unique accounting policy (google/Microsoft like) WA1 plugins To overcome MDS flow apps must be plug&play (e.g. any WAx feature should be pluggable in WAy on WA2 demand) No local computing power required. Also smartphones can run VO apps Requirements • Standard accounting system; • No more MDS moving on the web, but just moving Apps, structured as plugin repositories and execution environments; • standard modeling of WA and components to obtain the maximum level of granularity; • Evolution of SAMP architecture to extend web interoperability (in particular for the migration of the plugins); M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 16. Our vision: plugin granularity flow WAx WAy Px-1 Py-1 Px-2 Py-2 Px-3 Py-… Px-… Py-n Px-n Px-3 3. Way execute Px-3 This scheme could be iterated and extended involving all standardized web apps M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 17. The Lernaean Hydra VO KDD App M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 18. The Lernaean Hydra VO KDD App After a certain number of such iterations… The VO KDD App scenario WAx will become: WAy No different WAs, but simply Px-1 one WA with several sites Py-1 (eventually with different GUIs Px-2 Py-2 and computing environments) Px-3 All WA sites can become a Py-… mirror site of all the others Px-… Py-n Px-n The synchronization of plugin Px-1 releases between WAs is Py-1 performed at request time Px-2 Py-2 Minimization of data exchange Px-3 flow (just few plugins in case of Py-… synchronization between Px-… mirrors) Py-n Px-n M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011
  • 19. Conclusions DAME was not originally conceived (for the lack of suitable standards) to be interoperable with the VO, but offers a good benchmark to plan for the future developments of KDD on MDS in a VO environment. 1. DAME is just an example of what new ICT (Web 2.0) can do for A&A KDD problems. 2. A new vision of the KDD App approach, suitable for VO must be based on the minimization of data transfer and maximization of interoperability within the VO community. 3. If implemented, the new scheme can reach a wider science community by giving the opportunity to share data and apps worldwide, without any particular infrastructure requirements (i.e. by using a simple smartphone with a low-band connection). DAME group is currently involved in the definition of standards and rules and is working to modify and adapt the present infrastructure to become compliant with the VO. M. Brescia et al. – IVOA Interop Meeting – Napoli, May 2011