SlideShare une entreprise Scribd logo
1  sur  70
DataFinder:  Organizing the Data Chaos of Scientists PyCon UK 2008 (September 12 th , 2008, Birmingham) Andreas Schreiber < Andreas.Schreiber@dlr.de> German Aerospace Center (DLR), Cologne http://www.dlr.de/sc
[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],Sites and employees    Köln    Lampoldshausen    Stuttgart    Oberpfaffenhofen Braunschweig       Göttingen Berlin -      Bonn Trauen      Hamburg    Neustrelitz Weilheim    Bremen -  
Short Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction ,[object Object],[object Object]
Introduction Background ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Introduction Data Management Problem ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder History Search for solution for scientific data management ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Development From Java Prototype to Python Product… ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Python for Scientists and Engineers Reasons for Python in Research and Industry ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],“ I want to design  planes,  not software!”
“ Python has the cleanest,  most-scientist- or engineer  friendly syntax and semantics. Paul F. Dubois Paul F. Dubois. Ten good practices in scientific programming. Comp. In Sci. Eng., Jan/Feb 1999, pp.7-11
DataFinder Overview Basic Concept ,[object Object],[object Object],[object Object]
WebDAV Web-based Distributed Authoring & Versioning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Overview Client and Server ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Client Graphical User Interfaces User Client Administrator Client Implementation in Python with Qt/PyQt
DataFinder Server Supported WebDAV servers ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV / Meta Data Server (1) Tamino WebDAV Server ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV / Meta Data Server (2) Apache + mod_dav ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Apache Core Server mod_http mod_auth_ldap mod_dav mod_dav_fs File system
WebDAV / Meta Data Server (3) Catacomb ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Apache Core Server mod_http mod_auth_ldap mod_dav mod_dav_fs File system DB (MySQL) Catacomb mod_dav_repos
Mass Data Storage Data Stores Logical   View User   Client Storage  Locations
DataFinder  Technical Aspects ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Python API  User Client Extension with GUI import   threading from  datafinder.application  import  search_support from  datafinder.gui.user  import  facade def  searchAndDisplayResult(): &quot;&quot;&quot;Searches and displays the result in the  search result logging window. &quot;&quot;&quot; query =  &quot;displayname contains ‘test’ OR displayname == ‘ab’&quot; result = search_support.performSearch(query) resultLogger = facade.getSearchResultLogger() for path in result.keys(): resultLogger.info( &quot;Found item %s.&quot;  % path)  thread = threading.Thread(target=searchAndDisplayResult) thread.start()
Python API  Command Line Example (without GUI) # Get API from  datafinder.application  import  ExternalFacade externalFacade = ExternalFacade.getInstance() # Connect to a repository externalFacade.performBasicDatafinderSetup(username,    password,    startUrl) # Download the whole content rootItem = externalFacade.getRootWebdavServerItem() items = externalFacade.getCollectionContents(rootItem) for item in items: externalFacade.downloadFile(item, baseDirectory)
Additional “Batteries”… Used Libraries beyond the Python Standard Library (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Additional “Batteries”… Used Libraries beyond the Python Standard Library (2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
WebDAV Client Library Support for DAV Extensions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Simple Use Case: File Upload and Search
 
 
 
 
 
 
 
 
 
 
 
 
Working with DataFinder…
Configuration and Customization Preparing DataFinder for certain “use cases” ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Configuration Data Model and Data Stores ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Structure Mapping of Organizational Data Structures User Object (collection) Object (file) Relation Attributes (meta data) Project A Project B Project C File 1 File 2 Simulation I Experiment Simulation II Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value
Meta Data ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Impact for Users ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],“ Damn! I’m a great scientist! I want freedom to have  my own directory layout…”
Customization   Python-Scripting for Extension and Automation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
DataFinder Scripting  Downloading File and Starting Application # Download the selected file and try to execute it. from  datafinder.application  import  ExternalFacade from  guitools.easygui  import  * import  os from  tempfile  import  * from  win32api  import  ShellExecute # Get instance of ExternalFacade to access DataFinder API facade = ExternalFacade.getInstance() # Get currently selected collection in DataFinder Server-View  resource = facade.getSelectedResource() if  resource != None: tmpFile = mktemp(ressource.name) facade.downloadFile(resource, tmpFile) if  os.path.exists(tmpFile): ShellExecute(0, None, tmpFile,  &quot;&quot; ,  &quot;&quot; , 1) else : msgbox( &quot;No file selected to execute.&quot; )
Examples…
Example 1: Turbine Simulation
Example 1:  Fluid Dynamics Simulation Turbine Simulation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Turbine Simulation Data Model
Turbine Simulation: Graphical User Interface
Turbine Simulation: Customized GUI Extensions ,[object Object],[object Object],[object Object],[object Object],[object Object],1 2 3 4 5
Turbine Simulation  Starting External Applications ,[object Object],[object Object],[object Object],1 2 3
Example 2: Automobile Supplier
Example 2:  Automobile Supplier DataFinder for Simulation and Data Management  ,[object Object],[object Object],[object Object],[object Object]
Automobile Supplier Data Model
Automobile Supplier Configuration of Customers Parameters
Automobile Supplier Management of Simulations ,[object Object],[object Object],[object Object],[object Object]
Automobile Supplier Upload, Download, and Versioning of Files ,[object Object],[object Object],[object Object]
Example 3: Air Traffic Management
Example 3:  Air Traffic Monitoring   Database for Air Traffic Monitoring ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Database for Air Traffic Monitoring Data Model and Data Migration
Database for Air Traffic Monitoring Data Import Wizard ,[object Object],[object Object],[object Object]
Database for Air Traffic Monitoring Search Results
Current Work and Future Plans  ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Am Ende… Hinweise ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Availability ,[object Object],[object Object],[object Object],[object Object]
Links ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Questions?

Contenu connexe

Tendances

Tendances (20)

Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, LucidworksIntroduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
Introduction to Lucidworks Fusion - Alexander Kanarsky, Lucidworks
 
RSI at the HDF & HDF-EOS Workshop VI
RSI at the HDF & HDF-EOS Workshop VIRSI at the HDF & HDF-EOS Workshop VI
RSI at the HDF & HDF-EOS Workshop VI
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Introducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data ScienceIntroducing DataFrames in Spark for Large Scale Data Science
Introducing DataFrames in Spark for Large Scale Data Science
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Open Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and ExchangeOpen Archives Initiative Object Reuse and Exchange
Open Archives Initiative Object Reuse and Exchange
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™GraphFrames: DataFrame-based graphs for Apache® Spark™
GraphFrames: DataFrame-based graphs for Apache® Spark™
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
 
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, LucidworksLifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
Lifecycle of a Solr Search Request - Chris "Hoss" Hostetter, Lucidworks
 
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
Automated Multiplatform Compilation and Validation of a Collaborative Reposit...
 
Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling Up-front Design Considerations in FHIR Data Modeling
Up-front Design Considerations in FHIR Data Modeling
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Big data or big deal
Big data or big dealBig data or big deal
Big data or big deal
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Applied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL ServerApplied Semantic Search with Microsoft SQL Server
Applied Semantic Search with Microsoft SQL Server
 

Similaire à Organizing the Data Chaos of Scientists

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
Data Finder
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01
Tony Frame
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
HostedbyConfluent
 

Similaire à Organizing the Data Chaos of Scientists (20)

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Company Visitor Management System Report.docx
Company Visitor Management System Report.docxCompany Visitor Management System Report.docx
Company Visitor Management System Report.docx
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Data Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platformsData Ingestion in Big Data and IoT platforms
Data Ingestion in Big Data and IoT platforms
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAE
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Datalake Architecture
Datalake ArchitectureDatalake Architecture
Datalake Architecture
 
Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01Googleappengineintro 110410190620-phpapp01
Googleappengineintro 110410190620-phpapp01
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data Staging
 
Off-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier DataOff-Label Data Mesh: A Prescription for Healthier Data
Off-Label Data Mesh: A Prescription for Healthier Data
 
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010Understanding the Windows Azure Platform - Dec 2010
Understanding the Windows Azure Platform - Dec 2010
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
ArcReady - Architecting For The Cloud
ArcReady - Architecting For The CloudArcReady - Architecting For The Cloud
ArcReady - Architecting For The Cloud
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 

Plus de Andreas Schreiber

Plus de Andreas Schreiber (20)

Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
Provenance-based Security Audits and its Application to COVID-19 Contact Trac...
 
Visualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented RealityVisualization of Software Architectures in Virtual Reality and Augmented Reality
Visualization of Software Architectures in Virtual Reality and Augmented Reality
 
Provenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructureProvenance as a building block for an open science infrastructure
Provenance as a building block for an open science infrastructure
 
Raising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace CenterRaising Awareness about Open Source Licensing at the German Aerospace Center
Raising Awareness about Open Source Licensing at the German Aerospace Center
 
Open Source Licensing for Rocket Scientists
Open Source Licensing for Rocket ScientistsOpen Source Licensing for Rocket Scientists
Open Source Licensing for Rocket Scientists
 
Interactive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality HeadsetsInteractive Visualization of Software Components with Virtual Reality Headsets
Interactive Visualization of Software Components with Virtual Reality Headsets
 
Provenance for Reproducible Data Science
Provenance for Reproducible Data ScienceProvenance for Reproducible Data Science
Provenance for Reproducible Data Science
 
Visualizing Provenance using Comics
Visualizing Provenance using ComicsVisualizing Provenance using Comics
Visualizing Provenance using Comics
 
Quantified Self Comics
Quantified Self ComicsQuantified Self Comics
Quantified Self Comics
 
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-VerletzungenNachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
Nachvollziehbarkeit mit Hinblick auf Privacy-Verletzungen
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with Python
 
Python at Warp Speed
Python at Warp SpeedPython at Warp Speed
Python at Warp Speed
 
A Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self DataA Provenance Model for Quantified Self Data
A Provenance Model for Quantified Self Data
 
Open Source im DLR
Open Source im DLROpen Source im DLR
Open Source im DLR
 
Tracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The RestTracking after Stroke: Doctors, Dogs and All The Rest
Tracking after Stroke: Doctors, Dogs and All The Rest
 
High Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris DataHigh Throughput Processing of Space Debris Data
High Throughput Processing of Space Debris Data
 
Bericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & ExpositionBericht von der QS15 Conference & Exposition
Bericht von der QS15 Conference & Exposition
 
Telemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermannTelemedizin: Gesundheit, messbar für jedermann
Telemedizin: Gesundheit, messbar für jedermann
 
Big Python
Big PythonBig Python
Big Python
 
Quantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-SensorenQuantified Self mit Wearable Devices und Smartphone-Sensoren
Quantified Self mit Wearable Devices und Smartphone-Sensoren
 

Dernier

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Organizing the Data Chaos of Scientists

  • 1. DataFinder: Organizing the Data Chaos of Scientists PyCon UK 2008 (September 12 th , 2008, Birmingham) Andreas Schreiber < Andreas.Schreiber@dlr.de> German Aerospace Center (DLR), Cologne http://www.dlr.de/sc
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. “ Python has the cleanest, most-scientist- or engineer friendly syntax and semantics. Paul F. Dubois Paul F. Dubois. Ten good practices in scientific programming. Comp. In Sci. Eng., Jan/Feb 1999, pp.7-11
  • 12.
  • 13.
  • 14.
  • 15. DataFinder Client Graphical User Interfaces User Client Administrator Client Implementation in Python with Qt/PyQt
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. Mass Data Storage Data Stores Logical View User Client Storage Locations
  • 21.
  • 22. Python API User Client Extension with GUI import threading from datafinder.application import search_support from datafinder.gui.user import facade def searchAndDisplayResult(): &quot;&quot;&quot;Searches and displays the result in the search result logging window. &quot;&quot;&quot; query = &quot;displayname contains ‘test’ OR displayname == ‘ab’&quot; result = search_support.performSearch(query) resultLogger = facade.getSearchResultLogger() for path in result.keys(): resultLogger.info( &quot;Found item %s.&quot; % path) thread = threading.Thread(target=searchAndDisplayResult) thread.start()
  • 23. Python API Command Line Example (without GUI) # Get API from datafinder.application import ExternalFacade externalFacade = ExternalFacade.getInstance() # Connect to a repository externalFacade.performBasicDatafinderSetup(username, password, startUrl) # Download the whole content rootItem = externalFacade.getRootWebdavServerItem() items = externalFacade.getCollectionContents(rootItem) for item in items: externalFacade.downloadFile(item, baseDirectory)
  • 24.
  • 25.
  • 26.
  • 27. Simple Use Case: File Upload and Search
  • 28.  
  • 29.  
  • 30.  
  • 31.  
  • 32.  
  • 33.  
  • 34.  
  • 35.  
  • 36.  
  • 37.  
  • 38.  
  • 39.  
  • 41.
  • 42.
  • 43. Data Structure Mapping of Organizational Data Structures User Object (collection) Object (file) Relation Attributes (meta data) Project A Project B Project C File 1 File 2 Simulation I Experiment Simulation II Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value Project Mega Code Ultra User Eddie Key Value
  • 44.
  • 45.
  • 46.
  • 47. DataFinder Scripting Downloading File and Starting Application # Download the selected file and try to execute it. from datafinder.application import ExternalFacade from guitools.easygui import * import os from tempfile import * from win32api import ShellExecute # Get instance of ExternalFacade to access DataFinder API facade = ExternalFacade.getInstance() # Get currently selected collection in DataFinder Server-View resource = facade.getSelectedResource() if resource != None: tmpFile = mktemp(ressource.name) facade.downloadFile(resource, tmpFile) if os.path.exists(tmpFile): ShellExecute(0, None, tmpFile, &quot;&quot; , &quot;&quot; , 1) else : msgbox( &quot;No file selected to execute.&quot; )
  • 49. Example 1: Turbine Simulation
  • 50.
  • 51.
  • 53.
  • 54.
  • 56.
  • 58. Automobile Supplier Configuration of Customers Parameters
  • 59.
  • 60.
  • 61. Example 3: Air Traffic Management
  • 62.
  • 63. Database for Air Traffic Monitoring Data Model and Data Migration
  • 64.
  • 65. Database for Air Traffic Monitoring Search Results
  • 66.
  • 67.
  • 68.
  • 69.